337 research outputs found

    The FAST-AIMS Clinical Mass Spectrometry Analysis System

    Within clinical proteomics, mass spectrometry analysis of biological samples is emerging as an important high-throughput technology, capable of producing powerful diagnostic and prognostic models and identifying important disease biomarkers. As interest in this area grows, and the number of such proteomics datasets continues to increase, the need has developed for efficient, comprehensive, reproducible methods of mass spectrometry data analysis by both experts and nonexperts. We have designed and implemented a stand-alone software system, FAST-AIMS, which seeks to meet this need through automation of data preprocessing, feature selection, classification model generation, and performance estimation. FAST-AIMS is an efficient and user-friendly stand-alone software for predictive analysis of mass spectrometry data. The present resource review paper will describe the features and use of the FAST-AIMS system. The system is freely available for download for noncommercial use

    Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective

    Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high-dimensionality in such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them

    Prostate cancer screening research can benefit from network medicine: an emerging awareness

    To date, screening for prostate cancer (PCa) remains one of the most appealing but also one of the most controversial topics in the urological community. PCa is the second most common cancer in men worldwide and is universally acknowledged as a complex disease with a multi-factorial etiology. The pathway of PCa diagnosis has changed dramatically in the last few years, with multiparametric magnetic resonance imaging (mpMRI) playing a starring role following the introduction of the “MRI Pathway”. In this scenario, the basic tenet of network medicine (NM), which sees disease as a perturbation of a network of interconnected molecules and pathways, seems to fit perfectly with the challenges that PCa early detection must face to advance towards a more reliable technique. Tests on body fluids, tissue samples, grading/staging classification, physiological parameters, MR multiparametric imaging and molecular profiling technologies must be integrated in a broader vision of “disease” and its complexity, with a focus on early signs. PCa screening research can greatly benefit from the NM vision, since it provides a sound interpretation of data and a common language, facilitating the exchange of ideas between clinicians and data analysts for exploring new research pathways in a rational, highly reliable, and reproducible way.

    Comparative genomics and transcriptomics elucidate virulence mechanisms and host responses in infectious diseases

    The main thematic area of the present thesis is the development and application of bioinformatics pipelines, namely whole-genome sequence (WGS) analysis and transcriptome profile analysis. These pipelines were applied to study the fungal pathogen Aspergillus fumigatus (Manuscripts I, III, and IV) and the early human immune mechanisms activated in response to different types of pathogens (bacteria, fungi, and co-infections) in sepsis patients (Manuscript II). The comparative genomic and transcriptomic analyses applied in my thesis have significantly improved our understanding of fungal pathogenicity as well as the pathogen-specific immune response mechanisms of the human host. Next to a number of novel insights, my work included in this thesis has generated a large number of new hypotheses based on big-data analysis, offering the scientific community the possibility to design exciting new research to confirm them in future experimental studies and bring us closer to actual precision medicine for infectious diseases

    Stratification of patients with clear cell renal cell carcinoma to facilitate drug repositioning

    Clear cell renal cell carcinoma (ccRCC) is the most common histological type of kidney cancer and has high heterogeneity. Stratification of ccRCC is important since distinct subtypes differ in prognosis and treatment. Here, we applied a systems biology approach to stratify ccRCC into three molecular subtypes with different mRNA expression patterns and prognosis of patients. Further, we developed a set of biomarkers that could robustly classify the patients into each of the three subtypes and predict the prognosis of patients. Then, we reconstructed subtype-specific metabolic models and performed essential gene analysis to identify the potential drug targets. We identified four drug targets, including SOAT1, CRLS1, and ACACB, essential in all the three subtypes and GPD2, exclusively essential to subtype 1. Finally, we repositioned mitotane, an FDA-approved SOAT1 inhibitor, to treat ccRCC and showed that it decreased tumor cell viability and inhibited tumor cell growth based on in vitro experiments

    Genes related to apoptosis predict necrosis of the liver as a phenotype observed in rats exposed to a compendium of hepatotoxicants

    BACKGROUND: Some of the biochemical events that lead to necrosis of the liver are well-known. However, the pathogenesis of necrosis of the liver from exposure to hepatotoxicants is a complex biological response to the injury. We hypothesize that gene expression profiles can serve as a signature to predict the level of necrosis elicited by acute exposure of rats to a variety of hepatotoxicants and postulate that the expression profiles of the predictor genes in the signature can provide insight to some of the biological processes and molecular pathways that may be involved in the manifestation of necrosis of the rat liver. RESULTS: Rats were treated individually with one of seven known hepatotoxicants and were analyzed for gene expression by microarray. Liver samples were grouped by the level of necrosis exhibited in the tissue. Analysis of significantly differentially expressed genes between adjacent necrosis levels revealed that inflammation follows programmed cell death in response to the agents. Using a Random Forest classifier with feature selection, 21 informative genes were identified which achieved 90%, 80% and 60% prediction accuracies of necrosis against independent test data derived from the livers of rats exposed to acetaminophen, carbon tetrachloride, and allyl alcohol, respectively. Pathway and gene network analyses of the genes in the signature revealed several gene interactions suggestive of apoptosis as a process possibly involved in the manifestation of necrosis of the liver from exposure to the hepatotoxicants. Cytotoxic effects of TNF-α, as well as transcriptional regulation by JUN and TP53, and apoptosis-related genes possibly lead to necrosis. CONCLUSION: The data analysis, gene selection and prediction approaches permitted grouping of the classes of rat liver samples exhibiting necrosis to improve the accuracy of predicting the level of necrosis as a phenotypic end-point observed from the exposure. The strategy, along with pathway analysis and gene network reconstruction, led to the identification of 1) expression profiles of genes as a signature of necrosis and 2) perturbed regulatory processes that exhibited biological relevance to the manifestation of necrosis from exposure of rat livers to the compendium of hepatotoxicants.

    A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets

    BACKGROUND: Gene selection is an important step when building predictors of disease state based on gene expression data. Gene selection generally improves performance and identifies a relevant subset of genes. Many univariate and multivariate gene selection approaches have been proposed. Frequently the claim is made that genes are co-regulated (due to pathway dependencies) and that multivariate approaches are therefore by definition more desirable than univariate selection approaches. Based on the published performances of all these approaches, a fair comparison of the available results cannot be made. This mainly stems from two factors. First, the results are often biased, since the validation set is in one way or another involved in training the predictor, resulting in optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. Consequently, no generally applicable conclusions can be drawn. RESULTS: In this study we adopted an unbiased protocol to perform a fair comparison of frequently used multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across several cancer types. CONCLUSION: Our experiments illustrate that, contrary to several previous studies, in five of the seven datasets univariate selection approaches yield consistently better results than multivariate approaches. The simplest multivariate selection approach, the Top Scoring method, achieves the best results on the remaining two datasets. We conclude that the correlation structures, if present, are difficult to extract due to the small number of samples, and that consequently, overly-complex gene selection algorithms that attempt to extract these structures are prone to overtraining.
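
The bias described in the abstract above (validation samples leaking into predictor training) is commonly avoided by repeating gene selection inside each cross-validation training fold. The following is a minimal sketch of that idea only, not the protocol used in the study: the scoring rule, the nearest-centroid classifier, the function names, and the synthetic data are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev


def univariate_scores(X, y):
    """Score each feature by class-mean difference over summed class spread."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return scores


def select_top(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = univariate_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def predict(X_train, y_train, x, features):
    """Nearest-centroid prediction restricted to the selected features."""
    distances = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        distances[label] = sum(
            (x[j] - mean(r[j] for r in rows)) ** 2 for j in features
        )
    return min(distances, key=distances.get)


def cv_accuracy(X, y, k_features=2, n_folds=5):
    """k-fold accuracy with feature selection redone inside every training fold."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Selection sees only the training fold; held-out samples stay untouched.
        features = select_top(X_tr, y_tr, k_features)
        correct += sum(predict(X_tr, y_tr, X[i], features) == y[i] for i in test)
    return correct / len(X)


# Synthetic data (made up for illustration): 40 samples, 20 features,
# and only feature 0 carries class signal.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(5.0 * label if j == 0 else 0.0, 1.0) for j in range(20)] for label in y]
accuracy = cv_accuracy(X, y)
```

Calling `select_top` once on the full dataset before cross-validation would reintroduce the optimistic bias the abstract warns about, because the held-out samples would then have influenced which genes were chosen.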

    Computational models and approaches for lung cancer diagnosis

    The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results.

    A Computational Pipeline for the Development of Multi-Marker Bio-Signature Panels and Ensemble Classifiers

    BACKGROUND: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? RESULTS: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. CONCLUSION: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
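
The two aggregation methods named in the abstract above can be sketched roughly as follows. The abstract does not give their exact definitions, so the function names, the 0.5 cutoff, and the `min_votes` parameter are assumptions made for illustration, as are the example probabilities.

```python
from statistics import mean


def average_probability(probs, cutoff=0.5):
    """Call rejection when the mean of the classifiers' probabilities reaches the cutoff."""
    return mean(probs) >= cutoff


def vote_threshold(probs, min_votes=1, cutoff=0.5):
    """Call rejection when at least min_votes classifiers individually reach the cutoff."""
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes


# Hypothetical probabilities of acute rejection from three classifiers for one patient.
patient = [0.2, 0.3, 0.9]
by_average = average_probability(patient)
by_vote = vote_threshold(patient, min_votes=1)
```

In this sketch a single confident classifier is enough to flag the patient under a low vote threshold, while the averaged probability stays below the cutoff. That is one plausible reading of why a vote-based rule could raise sensitivity while lowering specificity.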