48 research outputs found

    Sampling ARG of multiple populations under complex configurations of subdivision and admixture.

    Motivation: Simulating complex evolution scenarios of multiple populations is an important task for answering many basic questions in population genomics. Apart from the population samples, the underlying Ancestral Recombination Graph (ARG) is an additional important resource for hypothesis checking and reconstruction studies. Furthermore, complex simulations require a plethora of interdependent parameters, which makes even the scenario specification highly non-trivial. Results: We present an algorithm, SimRA, that simulates a generic multiple-population evolution model with admixture. It is based on random graphs that dramatically improve on the time and space requirements of the classical single-population algorithm. Using the underlying random graph model, we also derive closed forms of the expected values of the ARG characteristics, i.e., height of the graph, number of recombinations, number of mutations and population diversity, in terms of its defining parameters. This is crucial in helping the user specify meaningful parameters for complex scenario simulations through intelligent parameter estimation rather than trial and error based on raw compute power. To the best of our knowledge, this is the first time closed-form expressions have been computed for the ARG properties. We show through simulations that the expected values closely match the empirical values. Finally, we demonstrate through extensive experiments that SimRA produces the ARG in compact forms without compromising accuracy. Availability and implementation: SimRA (Simulation based on Random graph Algorithms) source, executable, user manual and sample input-output sets are available for download at: https://github.com/ComputationalGenomics/SimRA Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
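    The idea of checking a closed-form expectation against simulation can be illustrated with a much simpler textbook case than the one in the abstract. The sketch below is not SimRA and does not use its formulas: it assumes the standard single-population Kingman coalescent without recombination, where the expected tree height for n samples is 2(1 - 1/n) in coalescent time units. All function names are hypothetical.

```python
import random


def expected_height(n):
    """Closed-form expected height of a Kingman coalescent tree with n samples.

    Sum of expected waiting times 2 / (k * (k - 1)) for k = n, ..., 2,
    which telescopes to 2 * (1 - 1 / n). Time is in coalescent units.
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_height(n, rng):
    """Draw one tree height: while k lineages remain, the waiting time to the
    next coalescence is exponential with rate k * (k - 1) / 2."""
    height = 0.0
    for k in range(n, 1, -1):
        height += rng.expovariate(k * (k - 1) / 2.0)
    return height


def mean_simulated_height(n, replicates, seed=0):
    """Average simulated height over independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_height(n, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    n = 10
    print("closed form:", expected_height(n))
    print("simulated  :", mean_simulated_height(n, 20000))
```

    With enough replicates the simulated mean should settle near the closed-form value, which is the same kind of agreement the abstract reports for the ARG characteristics.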

    Combining explainable machine learning, demographic and multi-omic data to inform precision medicine strategies for inflammatory bowel disease.

    Inflammatory bowel diseases (IBDs), including ulcerative colitis and Crohn's disease, affect several million individuals worldwide. These diseases are heterogeneous at the clinical, immunological and genetic levels and result from complex host and environmental interactions. Investigating drug efficacy for IBD can improve our understanding of why treatment response can vary between patients. We propose an explainable machine learning (ML) approach that combines bioinformatics and domain insight, to integrate multi-modal data and predict inter-patient variation in drug response. Using explanation of our models, we interpret the ML models' predictions to infer unique combinations of important features associated with pharmacological responses obtained during preclinical testing of drug candidates in ex vivo patient-derived fresh tissues. Our inferred multi-modal features that are predictive of drug efficacy include multi-omic data (genomic and transcriptomic), demographic, medicinal and pharmacological data. Our aim is to understand variation in patient responses before a drug candidate moves forward to clinical trials. As a pharmacological measure of drug efficacy, we measured the reduction in the release of the inflammatory cytokine TNFα from the fresh IBD tissues in the presence/absence of test drugs. We initially explored the effects of a mitogen-activated protein kinase (MAPK) inhibitor; however, we later showed our approach can be applied to other targets, test drugs or mechanisms of interest. Our best model predicted TNFα levels from demographic, medicinal and genomic features with an error of only 4.98% on unseen patients. We incorporated transcriptomic data to validate insights from genomic features. 
Our results showed variations in drug effectiveness (measured by ex vivo assays) between patients that differed in gender, age or condition, and linked new genetic polymorphisms to variation in patient response to the anti-inflammatory treatment BIRB796 (Doramapimod). Our approach models IBD drug response while also identifying its most predictive features as part of a transparent ML precision medicine strategy.

    Mosquito, Bird and Human Surveillance of West Nile and Usutu Viruses in Emilia-Romagna Region (Italy) in 2010

    Background: In 2008, after the first West Nile virus (WNV) detection in the Emilia-Romagna region, a surveillance system, including mosquito- and bird-based surveillance, was established to evaluate the virus presence. Surveillance was improved in the following years by extending the monitoring to larger areas and increasing the numbers of mosquitoes and birds tested. Methodology/Principal Findings: A network of mosquito traps, evenly distributed and regularly activated, was set up within the surveyed area. A total of 438,558 mosquitoes, grouped in 3,111 pools, and 1,276 birds (1,130 actively sampled and 146 from passive surveillance) were tested by biomolecular analysis. The survey detected WNV in 3 Culex pipiens pools, while Usutu virus (USUV) was found in 89 Cx. pipiens pools and in 2 Aedes albopictus pools. Two birds were WNV-positive and 12 were USUV-positive. Furthermore, 30 human cases of acute meningoencephalitis, possibly caused by WNV or USUV, were evaluated for both viruses and 1,053 blood bags were tested for WNV, without any positive result. Conclusions/Significance: Although no symptomatic human WNV infections were found during 2010, the persistence of the virus, probably due to overwintering, was confirmed through viral circulation in mosquitoes and birds; the same held for USUV. In 2010, circulation of the two viruses was lower and more delayed than in 2009, but this decrease was not explained by the relative abundance of Cx. pipiens mosquitoes, which was greater in 2010. The USUV detection in mosquito species confirms the role of Cx. pipiens as the main vector and the possible involvement of Ae. albopictus in the virus cycle. The effects of meteorological conditions on the presence of USUV-positive mosquito pools were considered, and an association was found with drought conditions and a wide temperature range. The output produced by the surveillance system demonstrated its usefulness and reliability in terms of planning public health policies.

    Interpreting machine learning models to investigate circadian regulation and facilitate exploration of clock function

    The circadian clock is an important adaptation to life on Earth. Here, we use machine learning to predict complex, temporal, and circadian gene expression patterns in Arabidopsis. Most significantly, we classify circadian genes using DNA sequence features generated de novo from public genomic resources, facilitating downstream application of our methods with no experimental work or prior knowledge needed. We use transcript-specific local model explanation to rank DNA sequence features, providing a detailed profile of the potential circadian regulatory mechanisms for each transcript. Furthermore, we can discriminate the temporal phase of transcript expression using the local, explanation-derived, and ranked DNA sequence features, revealing hidden subclasses within the circadian class. Model interpretation/explanation provides the backbone of our methodological advances, giving insight into biological processes and experimental design. Next, we use model interpretation to optimize sampling strategies when we predict circadian transcripts using reduced numbers of transcriptomic timepoints. Finally, we predict the circadian time from a single transcriptomic timepoint, deriving marker transcripts that are most impactful for accurate prediction; this could facilitate the identification of altered clock function from existing datasets.

    Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes

    The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint compared to previous approaches. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data. Peer reviewed
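    For readers unfamiliar with the metric, the following is a minimal sketch of what Faith's PD measures: the total branch length of the part of the tree that connects the observed taxa to the root. It is a naive reference computation, not the SFPhD algorithm described in the abstract. The tree encoding (child-to-parent and child-to-branch-length dictionaries), the toy tree, and the function name `faith_pd` are assumptions made for illustration, as is the convention of including the path up to the root.

```python
# Toy rooted tree (hypothetical): root -> X (1.0), root -> C (3.0),
# X -> A (1.0), X -> B (2.0). The root has no entry in PARENT.
PARENT = {"A": "X", "B": "X", "X": "root", "C": "root"}
LENGTH = {"A": 1.0, "B": 2.0, "X": 1.0, "C": 3.0}  # branch from node to its parent


def faith_pd(parent, length, observed_tips):
    """Sum the lengths of all branches lying on a path from an observed tip to the root.

    Each branch is identified by its child node and counted once, however
    many observed tips descend from it.
    """
    counted = set()
    for tip in observed_tips:
        node = tip
        # Walk towards the root; stop early once we reach a branch that is
        # already counted, since everything above it is counted too.
        while node in parent and node not in counted:
            counted.add(node)
            node = parent[node]
    return sum(length[node] for node in counted)


if __name__ == "__main__":
    print(faith_pd(PARENT, LENGTH, ["A", "B"]))       # branches A, B, X
    print(faith_pd(PARENT, LENGTH, ["A", "C"]))       # branches A, X, C
    print(faith_pd(PARENT, LENGTH, ["A", "B", "C"]))  # whole tree
```

    Computing this independently for every sample repeats the same upward walks many times, which is one reason a dedicated algorithm is needed for trees with millions of vertices.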