989 research outputs found
An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis
Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but many of them lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time.
Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance
Computationally Tractable Algorithms for Finding a Subset of Non-defective Items from a Large Population
In the classical non-adaptive group testing setup, pools of items are tested
together, and the main goal of a recovery algorithm is to identify the
"complete defective set" given the outcomes of different group tests. In
contrast, the main goal of a "non-defective subset recovery" algorithm is to
identify a "subset" of non-defective items given the test outcomes. In this
paper, we present a suite of computationally efficient and analytically
tractable non-defective subset recovery algorithms. By analyzing the
probability of error of the algorithms, we obtain bounds on the number of tests
required for non-defective subset recovery with arbitrarily small probability
of error. Our analysis accounts for the impact of both the additive noise
(false positives) and dilution noise (false negatives). By comparing with the
information theoretic lower bounds, we show that the upper bounds on the number
of tests are order-wise tight up to a factor, where is the number
of defective items. We also provide simulation results that compare the
relative performance of the different algorithms and provide further insights
into their practical utility. The proposed algorithms significantly outperform
the straightforward approaches of testing items one-by-one, and of first
identifying the defective set and then choosing the non-defective items from
the complement set, in terms of the number of measurements required to ensure a
given success rate.Comment: In this revision: Unified some proofs and reorganized the paper,
corrected a small mistake in one of the proofs, added more reference
Compressed Genotyping
Significant volumes of knowledge have been accumulated in recent years
linking subtle genetic variations to a wide variety of medical disorders from
Cystic Fibrosis to mental retardation. Nevertheless, there are still great
challenges in applying this knowledge routinely in the clinic, largely due to
the relatively tedious and expensive process of DNA sequencing. Since the
genetic polymorphisms that underlie these disorders are relatively rare in the
human population, the presence or absence of a disease-linked polymorphism can
be thought of as a sparse signal. Using methods and ideas from compressed
sensing and group testing, we have developed a cost-effective genotyping
protocol. In particular, we have adapted our scheme to a recently developed
class of high throughput DNA sequencing technologies, and assembled a
mathematical framework that has some important distinctions from 'traditional'
compressed sensing ideas in order to address different biological and technical
constraints.Comment: Submitted to IEEE Transaction on Information Theory - Special Issue
on Molecular Biology and Neuroscienc
INGOT-DR: An Interpretable Classifier for Predicting Drug Resistance in M. Tuberculosis
Motivation
Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data.
Contribution
In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time.
Results
We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via The Python Package Index (Pypi) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library
Expectation propagation on the diluted Bayesian classifier
Efficient feature selection from high-dimensional datasets is a very
important challenge in many data-driven fields of science and engineering. We
introduce a statistical mechanics inspired strategy that addresses the problem
of sparse feature selection in the context of binary classification by
leveraging a computational scheme known as expectation propagation (EP). The
algorithm is used in order to train a continuous-weights perceptron learning a
classification rule from a set of (possibly partly mislabeled) examples
provided by a teacher perceptron with diluted continuous weights. We test the
method in the Bayes optimal setting under a variety of conditions and compare
it to other state-of-the-art algorithms based on message passing and on
expectation maximization approximate inference schemes. Overall, our
simulations show that EP is a robust and competitive algorithm in terms of
variable selection properties, estimation accuracy and computational
complexity, especially when the student perceptron is trained from correlated
patterns that prevent other iterative methods from converging. Furthermore, our
numerical tests demonstrate that the algorithm is capable of learning online
the unknown values of prior parameters, such as the dilution level of the
weights of the teacher perceptron and the fraction of mislabeled examples,
quite accurately. This is achieved by means of a simple maximum likelihood
strategy that consists in minimizing the free energy associated with the EP
algorithm.Comment: 24 pages, 6 figure
Advanced photonic and electronic systems WILGA 2018
WILGA annual symposium on advanced photonic and electronic systems has been organized by young scientist for young scientists since two decades. It traditionally gathers around 400 young researchers and their tutors. Ph.D students and graduates present their recent achievements during well attended oral sessions. Wilga is a very good digest of Ph.D. works carried out at technical universities in electronics and photonics, as well as information sciences throughout Poland and some neighboring countries. Publishing patronage over Wilga keep Elektronika technical journal by SEP, IJET and Proceedings of SPIE. The latter world editorial series publishes annually more than 200 papers from Wilga. Wilga 2018 was the XLII edition of this meeting. The following topical tracks were distinguished: photonics, electronics, information technologies and system research. The article is a digest of some chosen works presented during Wilga 2018 symposium. WILGA 2017 works were published in Proc. SPIE vol.10445. WILGA 2018 works were published in Proc. SPIE vol.10808
- …