Deep Learning for Single-Molecule Science

Albrecht, T., Slabaugh, G. G., Alonso, E. & Al-Arif, M. R. (2017). Deep Learning for Single-Molecule Science. Nanotechnology, doi: 10.1088/1361-6528/aa8334

Text - Accepted Version
Restricted to Repository staff only until 1 August 2018.

Abstract

Exploring and making predictions based on single-molecule data can be challenging, not only because of the sheer size of the datasets, but also because a priori knowledge about the signal characteristics is typically limited and the signal-to-noise ratio is often poor. For example, hypothesis-driven data exploration, informed by an expectation of the signal characteristics, can lead to interpretation bias or loss of information. Equally, even when the different data categories are known, e.g., the four bases in DNA sequencing, it is often difficult to know how to make best use of the available information content. The latest developments in Machine Learning (ML), so-called Deep Learning (DL), offer interesting new avenues to address such challenges. In some applications, such as speech and image recognition, DL has been able to outperform conventional ML strategies and even human performance. However, to date DL has not been applied much in single-molecule science, presumably in part because relatively little is known about the 'internal workings' of such DL tools within single-molecule science as a field. In this Tutorial, we attempt to illustrate, in a step-by-step guide, how one of those tools, a Convolutional Neural Network, may be used for base calling in DNA sequencing applications. We compare it with a Support Vector Machine as a more conventional ML method, and discuss some of the strengths and weaknesses of the approach. In particular, a 'deep' neural network has many features of a 'black box', which has important implications for how we look at and interpret data.

Item Type: Article
Additional Information: This is an author-created, un-copyedited version of an article accepted for publication/published in Nanotechnology. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at https://doi.org/10.1088/1361-6528/aa8334.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QC Physics
Divisions: School of Informatics > Department of Computing
URI: http://openaccess.city.ac.uk/id/eprint/17949
