Analysis of nanopore detector measurements using Machine-Learning methods, with application to single-molecule kinetic analysis

Abstract

<p>Abstract</p> <p>Background</p> <p>A nanopore detector has a nanometer-scale trans-membrane channel across which a potential difference is established, resulting in an ionic current through the channel in the pA-nA range. A distinctive channel current blockade signal is created as individually "captured" DNA molecules interact with the channel and modulate the channel's ionic current. The nanopore detector is sensitive enough that nearly identical DNA molecules can be classified with very high accuracy using machine learning techniques such as Hidden Markov Models (HMMs) and Support Vector Machines (SVMs).</p> <p>Results</p> <p>A non-standard implementation of an HMM, emission inversion, is used for improved classification. Additional features are considered for the feature vector employed by the SVM for classification as well: The addition of a single feature representing spike density is shown to notably improve classification results. Another, much larger, feature set expansion was studied (2500 additional features instead of 1), deriving from including all the HMM's transition probabilities. The expanded features can introduce redundant, noisy information (as well as diagnostic information) into the current feature set, and thus degrade classification performance. A hybrid Adaptive Boosting approach was used for feature selection to alleviate this problem.</p> <p>Conclusion</p> <p>The methods shown here, for more informed feature extraction, improve both classification and provide biologists and chemists with tools for obtaining a better understanding of the kinetic properties of molecules of interest.</p

    Similar works