Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data
Microbial identification is a central issue in microbiology, in particular in
the fields of infectious diseases diagnosis and industrial quality control. The
concept of species is tightly linked to the concept of biological and clinical
classification where the proximity between species is generally measured in
terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the
information provided by this well-known hierarchical structure is rarely used
by machine learning-based automatic microbial identification systems.
Structured machine learning methods were recently proposed for taking into
account the structure embedded in a hierarchy and using it as additional a
priori information, and could therefore help improve microbial
identification systems. We test and compare several state-of-the-art machine
learning methods for microbial identification on a new Matrix-Assisted Laser
Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) dataset.
The benchmark includes standard methods as well as structured methods that leverage the
knowledge of the underlying hierarchical structure in the learning process. Our
results show that although some methods perform better than others, structured
methods do not consistently perform better than their "flat" counterparts. We
postulate that this is partly due to the fact that standard methods already
reach a high level of accuracy in this context, and that they mainly confuse
species close to each other in the tree, a case where using the known hierarchy
is not helpful.
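One way to quantify the observation that flat classifiers mostly confuse species that are close in the tree is to measure the tree distance between the true and the predicted label. The sketch below is illustrative only: the toy taxonomy, the node names, and the helper functions `ancestors` and `tree_distance` are assumptions and do not come from the benchmark itself.

```python
def ancestors(node, parent):
    """Return the path from `node` up to the root, `node` included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between nodes a and b in the tree."""
    path_a = ancestors(a, parent)
    depth_in_a = {n: depth for depth, n in enumerate(path_a)}
    for depth_b, n in enumerate(ancestors(b, parent)):
        if n in depth_in_a:
            # n is the lowest common ancestor of a and b
            return depth_in_a[n] + depth_b
    raise ValueError("nodes do not belong to the same tree")


# Hypothetical toy taxonomy given as a child -> parent map.
PARENT = {
    "species_a1": "genus_a",
    "species_a2": "genus_a",
    "species_b1": "genus_b",
    "genus_a": "family_x",
    "genus_b": "family_x",
}

# (true label, predicted label) pairs, made up for illustration.
pairs = [
    ("species_a1", "species_a1"),  # correct prediction
    ("species_a1", "species_a2"),  # confusion within a genus
    ("species_a2", "species_b1"),  # confusion across genera
]
distances = [tree_distance(t, p, PARENT) for t, p in pairs]
```

If most errors have a small tree distance, a hierarchy-aware method has little room to improve on a flat one, which is consistent with the explanation postulated in the abstract.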
Updating the Northern Tsetse Limit in Burkina Faso (1949–2009): Impact of Global Change
The northern distribution limit of tsetse flies in Burkina Faso was updated and compared with previous limits, revising a map of these vectors of African trypanosomiases that was several decades old. Between 1949 and 2009, the limit shifted 25 to 150 km southward. Tsetse are now discontinuously distributed in Burkina Faso, with a western and an eastern tsetse belt. This range shift can be explained by a combination of decreased rainfall and increased human density. In a context of international control, this study provides a better understanding of the factors influencing the distribution of tsetse flies.
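Medical geography and the French expeditions to Brazil: a description of the naval station of Brazil and the River Plate (1868-1870) [A geografia médica e as expedições francesas para o Brasil: uma descrição da estação naval do Brasil e da Prata (1868-1870)]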
Structured variable selection by joint regularization in a multi-task setting [Sélection de variables structurée par régularisation jointe dans un cadre multi-tâches]
Motivated by diagnostic applications in clinical microbiology, we introduce a joint input/output regularization method to perform structured variable selection in a multi-task setting where tasks can exhibit various degrees of correlation. Our approach relies extensively on the tree-structured group-lasso penalty and explicitly combines hierarchical structures defined across features and tasks by means of the Cartesian product of graphs to induce a global hierarchical group structure. A vectorization procedure is then used to solve the resulting multi-task problem with standard single-task optimization algorithms developed for the overlapping group-lasso problem. Experimental results on simulated and real data demonstrate the value of the approach.
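As a rough illustration of the tree-structured group-lasso penalty mentioned above, the sketch below evaluates a sum of weighted Euclidean norms over nested groups of coefficients. It covers only the generic penalty on a single coefficient vector; it does not reproduce the Cartesian-product construction or the vectorization step of the paper. The function names, the toy groups, and the unit weights are assumptions.

```python
import math


def is_tree_structured(groups):
    """True when any two groups are either disjoint or nested."""
    sets = [set(g) for g in groups]
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


def tree_group_lasso_penalty(w, groups, weights=None):
    """Sum over groups of weight * Euclidean norm of w restricted to the group."""
    if weights is None:
        weights = [1.0] * len(groups)  # assumption: unit weight per group
    total = 0.0
    for group, lam in zip(groups, weights):
        total += lam * math.sqrt(sum(w[i] ** 2 for i in group))
    return total


# Hypothetical tree over 4 features: a root, two internal nodes, four leaves.
GROUPS = [[0, 1, 2, 3], [0, 1], [2, 3], [0], [1], [2], [3]]
w = [3.0, 4.0, 0.0, 0.0]

penalty = tree_group_lasso_penalty(w, GROUPS)
```

Because every group that contains a nonzero coefficient contributes to the penalty, setting a whole subtree to zero (here features 2 and 3) removes all of its terms at once, which is what encourages selection along the hierarchy.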
On learning matrices with orthogonal columns or disjoint supports
We investigate new matrix penalties to jointly learn linear models with orthogonality constraints, generalizing the work of Xiao et al. [24], who proposed a strictly convex matrix norm for orthogonal transfer. We show that this norm converges to a particular atomic norm when its convexity parameter decreases, leading to new algorithmic solutions to minimize it. We also investigate concave formulations of this norm, corresponding to more aggressive strategies to induce orthogonality, and show how these penalties can also be used to learn sparse models with disjoint supports.
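The two target structures named in the title are related: columns with disjoint supports are necessarily orthogonal, while orthogonal columns need not have disjoint supports. The sketch below only checks these two properties on small matrices. It is not the norm of Xiao et al. nor any penalty from the paper, and the helper names `orthogonality_gap` and `has_disjoint_supports` are assumptions.

```python
def columns(W):
    """Return the columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*W)]


def orthogonality_gap(W):
    """Sum of |<w_i, w_j>| over distinct column pairs.

    The value is 0 exactly when the columns are pairwise orthogonal.
    """
    cols = columns(W)
    gap = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            gap += abs(sum(a * b for a, b in zip(cols[i], cols[j])))
    return gap


def has_disjoint_supports(W):
    """True when no row has more than one nonzero entry."""
    return all(sum(1 for x in row if x != 0) <= 1 for row in W)


# Toy matrices; each column plays the role of one linear model.
W_disjoint = [[1, 0], [2, 0], [0, 3]]   # disjoint supports, hence orthogonal
W_orthogonal = [[1, 1], [1, -1]]        # orthogonal, supports overlap
W_correlated = [[1, 1], [0, 1]]         # neither property holds
```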
Large-scale Machine Learning for Metagenomics Sequence Classification
Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Due to the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. In this work, we investigate the potential of modern, large-scale machine learning implementations for taxonomic assignment of next-generation sequencing reads based on their k-mer profile. We show that machine learning-based compositional approaches benefit from increasing the number of fragments sampled from reference genomes to tune their parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning these models involves training a machine learning model on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting models are competitive in terms of accuracy with well-established alignment tools for problems involving a small to moderate number of candidate species, and for reasonable amounts of sequencing errors. We show, however, that compositional approaches are still limited in their ability to deal with problems involving a greater number of species, and are more sensitive to sequencing errors. We finally confirm that compositional approaches achieve faster prediction times, with a gain of 3 to 15 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise.
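The k-mer profile on which compositional approaches rely can be sketched in a few lines. The example below is a minimal illustration, not the pipeline used in the paper: the function names, the choice to skip k-mers containing ambiguous bases, and the toy read are assumptions. It also shows why k = 12 leads to a feature space on the order of 10^7 dimensions, since 4^12 = 16,777,216.

```python
from collections import Counter

ALPHABET = "ACGT"


def kmer_profile(read, k):
    """Count every overlapping k-mer of `read` made only of A, C, G and T."""
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        # Assumption: k-mers with ambiguous bases such as N are ignored.
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def feature_space_size(k):
    """Number of distinct k-mers over the 4-letter DNA alphabet."""
    return len(ALPHABET) ** k


# Toy read, made up for illustration.
profile = kmer_profile("ACGTACGTNACG", k=4)
dimensions_for_k12 = feature_space_size(12)
```

Each read then becomes a sparse vector indexed by k-mer, which a large-scale linear classifier can consume directly.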
Therapy-related Myeloid Neoplasms in Patients With Chronic Lymphocytic Leukemia Who Received FCR/FC as Frontline Therapy
Bulletin Bibliographique
To all readers and contributors of the Bulletin Bibliographique of the ASSR: for the first time this year, we are publishing the reviews online on our Revues.org site on a half-yearly schedule (June / December), which matches the deadlines of the Bulletin Bibliographique. All reviews for the year 2009 will be published in our issue 148 (October-December).