989 research outputs found

    An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis

    Get PDF
    Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but many of them lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time. Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance

    Computationally Tractable Algorithms for Finding a Subset of Non-defective Items from a Large Population

    Full text link
    In the classical non-adaptive group testing setup, pools of items are tested together, and the main goal of a recovery algorithm is to identify the "complete defective set" given the outcomes of different group tests. In contrast, the main goal of a "non-defective subset recovery" algorithm is to identify a "subset" of non-defective items given the test outcomes. In this paper, we present a suite of computationally efficient and analytically tractable non-defective subset recovery algorithms. By analyzing the probability of error of the algorithms, we obtain bounds on the number of tests required for non-defective subset recovery with arbitrarily small probability of error. Our analysis accounts for the impact of both the additive noise (false positives) and dilution noise (false negatives). By comparing with the information theoretic lower bounds, we show that the upper bounds on the number of tests are order-wise tight up to a log2K\log^2K factor, where KK is the number of defective items. We also provide simulation results that compare the relative performance of the different algorithms and provide further insights into their practical utility. The proposed algorithms significantly outperform the straightforward approaches of testing items one-by-one, and of first identifying the defective set and then choosing the non-defective items from the complement set, in terms of the number of measurements required to ensure a given success rate.Comment: In this revision: Unified some proofs and reorganized the paper, corrected a small mistake in one of the proofs, added more reference

    Compressed Genotyping

    Full text link
    Significant volumes of knowledge have been accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders from Cystic Fibrosis to mental retardation. Nevertheless, there are still great challenges in applying this knowledge routinely in the clinic, largely due to the relatively tedious and expensive process of DNA sequencing. Since the genetic polymorphisms that underlie these disorders are relatively rare in the human population, the presence or absence of a disease-linked polymorphism can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective genotyping protocol. In particular, we have adapted our scheme to a recently developed class of high throughput DNA sequencing technologies, and assembled a mathematical framework that has some important distinctions from 'traditional' compressed sensing ideas in order to address different biological and technical constraints.Comment: Submitted to IEEE Transaction on Information Theory - Special Issue on Molecular Biology and Neuroscienc

    INGOT-DR: An Interpretable Classifier for Predicting Drug Resistance in M. Tuberculosis

    Get PDF
    Motivation Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via The Python Package Index (Pypi) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library

    Expectation propagation on the diluted Bayesian classifier

    Get PDF
    Efficient feature selection from high-dimensional datasets is a very important challenge in many data-driven fields of science and engineering. We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification by leveraging a computational scheme known as expectation propagation (EP). The algorithm is used in order to train a continuous-weights perceptron learning a classification rule from a set of (possibly partly mislabeled) examples provided by a teacher perceptron with diluted continuous weights. We test the method in the Bayes optimal setting under a variety of conditions and compare it to other state-of-the-art algorithms based on message passing and on expectation maximization approximate inference schemes. Overall, our simulations show that EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity, especially when the student perceptron is trained from correlated patterns that prevent other iterative methods from converging. Furthermore, our numerical tests demonstrate that the algorithm is capable of learning online the unknown values of prior parameters, such as the dilution level of the weights of the teacher perceptron and the fraction of mislabeled examples, quite accurately. This is achieved by means of a simple maximum likelihood strategy that consists in minimizing the free energy associated with the EP algorithm.Comment: 24 pages, 6 figure

    Advanced photonic and electronic systems WILGA 2018

    Get PDF
    WILGA annual symposium on advanced photonic and electronic systems has been organized by young scientist for young scientists since two decades. It traditionally gathers around 400 young researchers and their tutors. Ph.D students and graduates present their recent achievements during well attended oral sessions. Wilga is a very good digest of Ph.D. works carried out at technical universities in electronics and photonics, as well as information sciences throughout Poland and some neighboring countries. Publishing patronage over Wilga keep Elektronika technical journal by SEP, IJET and Proceedings of SPIE. The latter world editorial series publishes annually more than 200 papers from Wilga. Wilga 2018 was the XLII edition of this meeting. The following topical tracks were distinguished: photonics, electronics, information technologies and system research. The article is a digest of some chosen works presented during Wilga 2018 symposium. WILGA 2017 works were published in Proc. SPIE vol.10445. WILGA 2018 works were published in Proc. SPIE vol.10808
    corecore