144 research outputs found

    A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods. Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA. Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is being used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.

    An Integrative Genomic and Epigenomic Approach for the Study of Transcriptional Regulation

    The molecular heterogeneity of acute leukemias and other tumors constitutes a major obstacle towards understanding disease pathogenesis and developing new targeted therapies. Aberrant gene regulation is a hallmark of cancer and plays a central role in determining tumor phenotype. We predicted that integration of different genome-wide epigenetic regulatory marks along with gene expression levels would provide greater power in capturing biological differences between leukemia subtypes. Gene expression, cytosine methylation and histone H3 lysine 9 (H3K9) acetylation were measured using high-density oligonucleotide microarrays in primary human acute myeloid leukemia (AML) and acute lymphocytic leukemia (ALL) specimens. We found that DNA methylation and H3K9 acetylation distinguished these leukemias of distinct cell lineage, as expected, but that an integrative analysis combining the information from each platform revealed hundreds of additional differentially expressed genes that were missed by gene expression arrays alone. This integrated analysis also enhanced the detection and statistical significance of biological pathways dysregulated in AML and ALL. Integrative epigenomic studies are thus feasible using clinical samples and provide better detection of aberrant transcriptional programming than single-platform microarray studies.

    Saffold Virus, a Human Theiler's-Like Cardiovirus, Is Ubiquitous and Causes Infection Early in Life

    The family Picornaviridae contains well-known human pathogens (e.g., poliovirus, coxsackievirus, rhinovirus, and parechovirus). In addition, this family contains a number of viruses that infect animals, including members of the genus Cardiovirus such as Encephalomyocarditis virus (EMCV) and Theiler's murine encephalomyelitis virus (TMEV). The latter are important murine pathogens that cause myocarditis, type 1 diabetes and chronic inflammation in the brain, mimicking multiple sclerosis. Recently, a new picornavirus was isolated from humans, named Saffold virus (SAFV). The virus is genetically related to Theiler's virus and classified as a new species in the genus Cardiovirus, which until the discovery of SAFV did not contain human viruses. By analogy with the rodent cardioviruses, SAFV may be a relevant new human pathogen. Thus far, SAFVs have sporadically been detected by molecular techniques in respiratory and fecal specimens, but the epidemiology and clinical significance remained unclear. Here we describe the first cultivated SAFV type 3 (SAFV-3) isolate, its growth characteristics, full-length sequence, and epidemiology. Unlike the previously isolated SAFV-1 and -2 viruses, SAFV-3 showed efficient growth in several cell lines with a clear cytopathic effect. The latter allowed us to conduct a large-scale serological survey by a virus-neutralization assay. This survey showed that infection by SAFV-3 occurs early in life (>75% positive at 24 months) and that the seroprevalence reaches >90% in older children and adults. Neutralizing antibodies were found in serum samples collected in several countries in Europe, Africa, and Asia. In conclusion, this study describes the first cultivated SAFV-3 isolate, its full-length sequence, and epidemiology. SAFV-3 is a highly common and widespread human virus causing infection in early childhood. This finding has important implications for understanding the impact of these ubiquitous viruses and their possible role in acute and/or chronic disease.

    Proteasome inhibition for treatment of leishmaniasis, Chagas disease and sleeping sickness

    Chagas disease, leishmaniasis and sleeping sickness affect 20 million people worldwide and lead to more than 50,000 deaths annually. The diseases are caused by infection with the kinetoplastid parasites Trypanosoma cruzi, Leishmania spp. and Trypanosoma brucei spp., respectively. These parasites have similar biology and genomic sequence, suggesting that all three diseases could be cured with drugs that modulate the activity of a conserved parasite target. However, no such molecular targets or broad spectrum drugs have been identified to date. Here we describe a selective inhibitor of the kinetoplastid proteasome (GNF6702) with unprecedented in vivo efficacy, which cleared parasites from mice in all three models of infection. GNF6702 inhibits the kinetoplastid proteasome through a non-competitive mechanism, does not inhibit the mammalian proteasome or growth of mammalian cells, and is well tolerated in mice. Our data provide genetic and chemical validation of the parasite proteasome as a promising therapeutic target for treatment of kinetoplastid infections, and underscore the possibility of developing a single class of drugs for these neglected diseases.

    Genome-Wide Analysis of Transcriptional Reprogramming in Mouse Models of Acute Myeloid Leukaemia

    Acute leukaemias are commonly caused by mutations that corrupt the transcriptional circuitry of haematopoietic stem/progenitor cells. However, the mechanisms underlying large-scale transcriptional reprogramming remain largely unknown. Here we investigated transcriptional reprogramming at genome-scale in mouse retroviral transplant models of acute myeloid leukaemia (AML) using both gene-expression profiling and ChIP-sequencing. We identified several thousand candidate regulatory regions with altered levels of histone acetylation that were characterised by differential distribution of consensus motifs for key haematopoietic transcription factors including Gata2, Gfi1 and Sfpi1/Pu.1. In particular, downregulation of Gata2 expression was mirrored by abundant GATA motifs in regions of reduced histone acetylation, suggesting an important role in leukaemogenic transcriptional reprogramming. Forced re-expression of Gata2 was not compatible with sustained growth of leukaemic cells, suggesting a previously unrecognised role for Gata2 downregulation in the development of AML. Additionally, large scale human AML datasets revealed significantly higher expression of GATA2 in CD34+ cells from healthy controls compared with AML blast cells. The integrated genome-scale analysis applied in this study represents a valuable and widely applicable approach to study the transcriptional control of both normal and aberrant haematopoiesis and to identify critical factors responsible for transcriptional reprogramming in human cancer.