106 research outputs found

    Analysis of nanopore detector measurements using Machine-Learning methods, with application to single-molecule kinetic analysis

    BACKGROUND: A nanopore detector has a nanometer-scale trans-membrane channel across which a potential difference is established, resulting in an ionic current through the channel in the pA-nA range. A distinctive channel current blockade signal is created as individually "captured" DNA molecules interact with the channel and modulate the channel's ionic current. The nanopore detector is sensitive enough that nearly identical DNA molecules can be classified with very high accuracy using machine learning techniques such as Hidden Markov Models (HMMs) and Support Vector Machines (SVMs). RESULTS: A non-standard implementation of an HMM, emission inversion, is used for improved classification. Additional features are also considered for the feature vector employed by the SVM for classification: the addition of a single feature representing spike density is shown to notably improve classification results. Another, much larger, feature set expansion was studied (2500 additional features instead of 1), derived from including all the HMM's transition probabilities. The expanded features can introduce redundant, noisy information (as well as diagnostic information) into the current feature set, and thus degrade classification performance. A hybrid Adaptive Boosting approach was used for feature selection to alleviate this problem. CONCLUSION: The methods shown here, for more informed feature extraction, both improve classification and provide biologists and chemists with tools for obtaining a better understanding of the kinetic properties of molecules of interest.

    Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory

    BACKGROUND: The Baum-Welch learning procedure for Hidden Markov Models (HMMs) provides a powerful tool for tailoring HMM topologies to data for use in knowledge discovery and clustering. A linear memory procedure recently proposed by Miklós, I. and Meyer, I.M. describes a memory-sparse version of the Baum-Welch algorithm with modifications to the original probabilistic table topologies to make memory use independent of sequence length (and linearly dependent on state number). The original description of the technique has some errors that we amend. We then compare the corrected implementation on a variety of data sets with conventional and checkpointing implementations. RESULTS: We provide a correct recurrence relation for the emission parameter estimate and extend it to parameter estimates of the Normal distribution. To accelerate estimation of the prior state probabilities, and decrease memory use, we reverse the originally proposed forward sweep. We describe different scaling strategies necessary in all real implementations of the algorithm to prevent underflow. In this paper we also describe our approach to a linear memory implementation of the Viterbi decoding algorithm (with linearity in the sequence length, while memory use is approximately independent of state number). We demonstrate the use of the linear memory implementation on an extended Duration Hidden Markov Model (DHMM) and on an HMM with a spike detection topology. Comparing the various implementations of the Baum-Welch procedure, we find that the checkpointing algorithm produces the best overall tradeoff between memory use and speed. In cases where sequence length is very large (for Baum-Welch), or state number is very large (for Viterbi), the linear memory methods outlined may offer some utility. CONCLUSION: Our performance-optimized Java implementations of the Baum-Welch algorithm are available at http://logos.cs.uno.edu/~achurban. The described method and implementations will aid sequence alignment, gene structure prediction, HMM profile training, nanopore ionic flow blockade analysis and many other domains that require efficient HMM training with EM.
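The abstract above notes that scaling is needed in any real implementation to prevent underflow. As a minimal sketch of one common strategy (per-step normalization of the forward variables), and not the authors' Java code or their linear-memory Baum-Welch recurrences, the following Python computes a sequence log-likelihood while keeping only one column of forward values. The function name `scaled_forward_loglik` and the discrete-emission parameter layout are assumptions made for illustration.

```python
import math


def scaled_forward_loglik(obs, init, trans, emit):
    """Log-likelihood of `obs` under a discrete-emission HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    (This parameter layout is an illustrative assumption.)
    """
    if not obs:
        return 0.0
    n = len(init)
    # Only the current column of forward values is stored, so memory
    # grows with the number of states, not with the sequence length.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale the column to sum to 1 and accumulate the log of the
        # scale factor; the factors multiply to the sequence likelihood.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik
```

Without the rescaling step, the raw forward values shrink geometrically with sequence length and eventually underflow to zero in double precision; accumulating log scale factors avoids that while giving the same likelihood.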

    Development of Solid-State Nanopore Technology for Life Detection

    Biomarkers for life on Earth are an important starting point to guide the search for life elsewhere. However, the search for life beyond Earth should incorporate technologies capable of recognizing an array of potential biomarkers beyond what we see on Earth, in order to minimize the risk of false negatives from life detection missions. With this in mind, charged linear polymers may be a universal signature for life, due to their ability to store information while also inherently reducing the tendency to form complex tertiary structures that significantly inhibit replication. Thus, these molecules are attractive targets for biosignature detection as potential "self-sustaining chemical signatures." Examples of charged linear polymers, or polyelectrolytes, include deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) as well as synthetic polyelectrolytes that could potentially support life, including threose nucleic acid (TNA) and other xenonucleic acids (XNAs). Nanopore analysis is a novel technology that has been developed for single-molecule sequencing with exquisite single-nucleotide resolution and is also well suited for analysis of polyelectrolyte molecules. Nanopore analysis has the ability to detect repeating sequences of electrical charges in organic linear polymers, and it is not molecule-specific (i.e., it is not restricted to only DNA or RNA). In this sense, it is a better life detection technique than approaches that are based on specific molecules, such as the polymerase chain reaction (PCR), which requires that the molecule being detected be composed of DNA

    Unzipping Kinetics of Double-Stranded DNA in a Nanopore

    We studied the unzipping kinetics of single molecules of double-stranded DNA by pulling one of their two strands through a narrow protein pore. PCR analysis yielded the first direct proof of DNA unzipping in such a system. The time to unzip each molecule was inferred from the ionic current signature of DNA traversal. The distribution of times to unzip under various experimental conditions fit a simple kinetic model. Using this model, we estimated the enthalpy barriers to unzipping and the effective charge of a nucleotide in the pore, which was considerably smaller than previously assumed. Comment: 10 pages, 5 figures. Accepted: Physical Review Letters

    Duration learning for analysis of nanopore ionic current blockades

    BACKGROUND: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single-molecule properties, with potential implications for DNA sequencing. The alpha-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern. Typically, recorded current blockade signals have several levels of blockade, with various durations, all obeying a fixed statistical profile for a given molecule. Hidden Markov Model (HMM) based duration learning experiments on artificial two-level Gaussian blockade signals helped us identify a proper modeling framework. We then apply our framework to the real multi-level DNA hairpin blockade signal. RESULTS: The identified upper-level blockade state is observed with durations that are geometrically distributed (consistent with a physical decay process for remaining in any given state). We show that a mixture of convolution chains of geometrically distributed states is better for representing multimodal long-tailed duration phenomena. Based on learned HMM profiles we are able to classify 9 base-pair DNA hairpins with accuracy up to 99.5% on signals from same-day experiments. CONCLUSION: We have demonstrated several implementations for de novo estimation of the duration distribution probability density function within an HMM framework and applied our model topology to the real data. The proposed design could be useful in molecular analysis based on nanopore current blockade signals.
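The geometric durations mentioned in this abstract follow from a standard HMM property: a state with self-transition probability a is occupied for exactly d steps with probability a^(d-1)(1-a). The sketch below computes that distribution, and the distribution for a chain of identical states visited in series, which is the convolution of the individual geometric distributions. The function names and the value a = 0.8 are hypothetical, and this illustrates the general idea only, not the topology or mixture model used in the paper.

```python
def geometric_pmf(stay, max_d):
    """P(duration = d) for one state with self-transition probability `stay`.

    Index d of the returned list holds P(duration = d); index 0 is unused.
    """
    pmf = [0.0] * (max_d + 1)
    for d in range(1, max_d + 1):
        pmf[d] = stay ** (d - 1) * (1.0 - stay)
    return pmf


def chain_pmf(stay, k, max_d):
    """Duration pmf for k identical states visited in series.

    The total duration is the sum of k independent geometric durations,
    so its pmf is the k-fold convolution of the single-state pmf.
    Values are exact for every d <= max_d; larger durations are dropped.
    """
    single = geometric_pmf(stay, max_d)
    total = single
    for _ in range(k - 1):
        nxt = [0.0] * (max_d + 1)
        for d1 in range(1, max_d + 1):
            for d2 in range(1, max_d + 1 - d1):
                nxt[d1 + d2] += total[d1] * single[d2]
        total = nxt
    return total


single = geometric_pmf(0.8, 400)   # always most probable at d = 1
chain = chain_pmf(0.8, 2, 400)     # zero at d = 1, rises before it decays
```

A single state can only produce a duration distribution that decreases from d = 1, whereas a chain can peak at longer durations, which is one reason chained states are used when observed durations are not monotonically decreasing.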

    DNA Molecule Classification Using Feature Primitives

    BACKGROUND: We present a novel strategy for classification of DNA molecules using measurements from an alpha-Hemolysin channel detector. The proposed approach provides excellent classification performance for five different DNA hairpins that differ in only one base-pair. For multi-class DNA classification problems, practitioners usually adopt approaches that use decision trees consisting of binary classifiers. Finding the best tree topology requires exploring all possible tree topologies and is computationally prohibitive. We propose a computational framework based on feature primitives that eliminates the need for a decision tree of binary classifiers. In the first phase, we generate a pool of weak features from nanopore blockade current measurements by using HMM analysis, principal component analysis and various wavelet filters. In the next phase, feature selection is performed using AdaBoost. AdaBoost provides an ensemble of weak learners of various types learned from feature primitives. RESULTS AND CONCLUSION: We show that our technique, despite its inherent simplicity, provides performance comparable to recent multi-class DNA molecule classification results. Unlike the approach presented by Winters-Hilt et al., where weaker data is dropped to obtain better classification, the proposed approach provides comparable classification accuracy without any need for rejection of weak data. A weakness of this approach, on the other hand, is the very "hands-on" tuning and feature selection that is required to obtain good generalization. Simply put, this method obtains a more informed set of features and provides better results for that reason. The strength of this approach appears to be in its ability to identify strong features, an area where further results are actively being sought
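To make the idea of AdaBoost-driven feature selection concrete, here is a minimal sketch of discrete AdaBoost with single-feature threshold stumps, where each feature is scored by the total vote weight of the stumps that use it. The choice of stumps as weak learners, all function names, and the toy data are assumptions for illustration; the paper's feature primitives (HMM, PCA and wavelet features) and its actual weak learners are not reproduced here.

```python
import math


def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when row[feature] >= threshold, else -polarity.
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if (pol if row[f] >= thr else -pol) != yi
                )
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Up-weight misclassified samples, down-weight correct ones.
        preds = [pol if row[f] >= thr else -pol for row in X]
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def feature_scores(ensemble, n_features):
    """Sum the vote weights of the stumps that use each feature."""
    scores = [0.0] * n_features
    for alpha, f, _, _ in ensemble:
        scores[f] += alpha
    return scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.7, 2], [0.8, 5], [0.9, 1]]
y = [-1, -1, -1, 1, 1, 1]
scores = feature_scores(adaboost(X, y, rounds=3), n_features=2)
```

Features that never receive any vote weight are candidates for removal, which is the sense in which boosting can act as a feature selector over a large pool of weak features.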

    Nanopore-based kinetics analysis of individual antibody-channel and antibody-antigen interactions

    BACKGROUND: The UNO/RIC Nanopore Detector provides a new way to study the binding and conformational changes of individual antibodies. Many critical questions regarding antibody function are still unresolved, questions that can be approached in a new way with the nanopore detector. RESULTS: We present evidence that different forms of channel blockade can be associated with the same antibody; we associate these different blockades with different orientations of "capture" of an antibody in the detector's nanometer-scale channel. We directly detect the presence of antibodies via reductions in channel current. Changes to blockade patterns upon addition of antigen suggest indirect detection of antibody/antigen binding. Similarly, DNA-hairpin anchored antibodies have been studied, where the DNA linkage is to the carboxy-terminus at the base of the antibody's Fc region, with significantly fewer types of (lengthy) capture blockades than were observed for free (un-bound) IgG antibody. The effects of introducing chaotropic agents on protein-protein interactions have also been observed. CONCLUSION: Nanopore-based approaches may eventually provide a direct analysis of the complex conformational "negotiations" that occur upon binding between proteins.

    Support Vector Machine Implementations for Classification & Clustering

    BACKGROUND: We describe Support Vector Machine (SVM) applications to classification and clustering of channel current data. SVMs are variational-calculus-based methods that are constrained to have structural risk minimization (SRM), i.e., they provide noise-tolerant solutions for pattern recognition. The SVM approach encapsulates a significant amount of model-fitting information in the choice of its kernel. In work thus far, novel information-theoretic kernels have been successfully employed for notably better performance over standard kernels. Currently there are two approaches for implementing multiclass SVMs. One, called external multiclass, arranges several binary classifiers as a decision tree such that they perform a single-class decision-making function, with each leaf corresponding to a unique class. The second approach, namely internal multiclass, involves solving a single optimization problem corresponding to the entire data set (with multiple hyperplanes). RESULTS: Each SVM approach encapsulates a significant amount of model-fitting information in its choice of kernel. In work thus far, novel information-theoretic kernels were successfully employed for notably better performance over standard kernels. Two SVM approaches to multiclass discrimination are described: (1) internal multiclass (with a single optimization), and (2) external multiclass (using an optimized decision tree). We describe benefits of the internal-SVM approach, along with further refinements to the internal-multiclass SVM algorithms that offer significant improvement in training time without sacrificing accuracy. In situations where the data isn't clearly separable, making for poor discrimination, signal clustering is used to provide robust and useful information; to this end, novel SVM-based clustering methods are also described. As with the classification, there are Internal and External SVM Clustering algorithms, both of which are briefly described