
    Systematic Interpretation of High-Throughput Biological Data

    Missing value imputation improves clustering and interpretation of gene expression microarray data

    Background: Missing values frequently pose problems in gene expression microarray experiments, as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods, since their performance seems to vary drastically depending on the dataset being used. Results: We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods. Conclusion: The results demonstrate that, while missing values still severely complicate microarray data analysis, their impact on the discovery of biologically meaningful gene groups can, up to a certain degree, be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA).
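The abstract contrasts advanced imputation with replacing missing entries by zeros or averages. The sketch below is not BPCA; it swaps in k-nearest-neighbour (KNN) imputation, a different and simpler widely used method, purely to illustrate the contrast with row-average filling. It assumes a genes × arrays matrix held as Python lists with missing entries stored as `None`; the function names `row_mean_impute` and `knn_impute`, the default `k=2`, and the inverse-distance weighting are illustrative choices, not taken from the study.

```python
import math


def row_mean_impute(matrix):
    """Baseline: replace each missing value (None) with the mean of its row (gene)."""
    result = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        result.append([mean if v is None else v for v in row])
    return result


def knn_impute(matrix, k=2, eps=1e-9):
    """Illustrative KNN imputation (not BPCA).

    Each missing entry (gene i, array j) is filled with the inverse-distance
    weighted average of array j in the k genes most similar to gene i, where
    similarity is the root-mean-square difference over arrays both genes observe.
    """
    baseline = row_mean_impute(matrix)
    result = [list(row) for row in matrix]  # copy so the input is not modified
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for g, other in enumerate(matrix):
                if g == i or other[j] is None:
                    continue  # a neighbour must observe the array we are filling
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                neighbours.append((dist, other[j]))
            neighbours.sort(key=lambda pair: pair[0])
            nearest = neighbours[:k]
            if not nearest:
                result[i][j] = baseline[i][j]  # no usable neighbour: fall back to row mean
                continue
            weights = [1.0 / (d + eps) for d, _ in nearest]
            result[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return result
```

For example, with rows `[1.0, 2.0, 3.0, 4.0]`, `[1.1, 2.1, None, 4.1]`, `[0.9, 1.9, 2.9, 3.9]` and `[5.0, 1.0, 0.0, 2.0]`, `knn_impute` fills the gap with about 2.97 (drawing on the two closely matching genes), whereas `row_mean_impute` gives about 2.43 because it ignores the gene's expression profile across arrays.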

    Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories

    Background: The use of gene expression profiling in both clinical and laboratory settings would be enhanced by better characterization of variance due to individual, environmental, and technical factors. Meta-analysis of microarray data from untreated or vehicle-treated animals within the control arm of toxicogenomics studies could yield useful information on baseline fluctuations in gene expression, although control animal data have not been available on a scale and in a form best suited for data mining. Results: A dataset of control animal microarray expression data was assembled by a working group of the Health and Environmental Sciences Institute's Technical Committee on the Application of Genomics in Mechanism Based Risk Assessment in order to provide a public resource for assessments of variability in baseline gene expression. Data from over 500 Affymetrix microarrays from control rat liver and kidney were collected from 16 different institutions. Thirty-five biological and technical factors were obtained for each animal, describing a wide range of study characteristics, and a subset was evaluated in detail for its contribution to total variability using multivariate statistical and graphical techniques. Conclusion: The study factors that emerged as key sources of variability included gender, organ section, strain, and fasting state. These and other study factors were identified as key descriptors that should be included in the minimal information about a toxicogenomics study needed for interpretation of results by an independent source. Genes that are the most and least variable, gender-selective, or altered by fasting were also identified and functionally categorized. Better characterization of gene expression variability in control animals will aid in the design of toxicogenomics studies and in the interpretation of their results.
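The abstract does not detail which multivariate techniques were used. As a simpler, univariate stand-in (an assumption, not the working group's method), one can score how much of a single gene's variance across animals is explained by each recorded study factor using eta-squared: the between-group share of the total sum of squares from a one-way ANOVA. The function names `variance_explained` and `rank_factors` are hypothetical.

```python
from collections import defaultdict


def variance_explained(values, labels):
    """Eta-squared: between-group sum of squares / total sum of squares.

    values: expression of one gene across animals.
    labels: the level of one study factor for each animal (e.g. gender).
    """
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variance to explain
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[label].append(v)
    between_ss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    return between_ss / total_ss


def rank_factors(values, factors):
    """Return (factor name, eta-squared) pairs, most explanatory factor first."""
    scores = {name: variance_explained(values, labels) for name, labels in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For a gene measured as 10.0, 10.2, 12.0 and 12.2 in four animals, grouping by gender (F, F, M, M) explains about 99% of the variance, while fasting state (fed, fasted, fed, fasted) explains about 1%, so gender ranks first. Unlike a multivariate model, this scores each factor in isolation and cannot separate confounded factors.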

    Comprehensive plasma proteomic profiling reveals biomarkers for active tuberculosis

    BACKGROUND. Tuberculosis (TB) kills more people than any other infection, and new diagnostic tests to identify active cases are required. We aimed to discover and verify novel markers for TB in nondepleted plasma. / METHODS. We applied an optimized quantitative proteomics discovery methodology based on multidimensional and orthogonal liquid chromatographic separation combined with high-resolution mass spectrometry to study nondepleted plasma of 11 patients with active TB compared with 10 healthy controls. Prioritized candidates were verified in independent UK (n = 118) and South African (n = 203) cohorts. / RESULTS. We generated the most comprehensive TB plasma proteome to date, profiling 5022 proteins with diverse biochemical and molecular properties across a concentration range spanning 11 orders of magnitude. We analyzed the predominantly low–molecular weight subproteome, identifying 46 proteins with significantly increased and 90 with decreased abundance (peptide FDR ≤ 1%, q ≤ 0.05). Verification was performed for novel candidate biomarkers (CFHR5, ILF2) in 2 independent cohorts. Receiver operating characteristic (ROC) analyses using a 5-protein panel (CFHR5, LRG1, CRP, LBP, and SAA1) showed discriminatory power in distinguishing TB from other respiratory diseases (AUC = 0.81). / CONCLUSION. We report the most comprehensive TB plasma proteome to date, identifying novel markers with verification in 2 independent cohorts, leading to a 5-protein biosignature with potential to improve TB diagnosis. With further development, these biomarkers have potential as a diagnostic triage test. / FUNDING. Colciencias, Medical Research Council, Innovate UK, NIHR, Academy of Medical Sciences, Program for Advanced Research Capacities for AIDS, Wellcome Centre for Infectious Diseases Research
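The reported AUC of 0.81 summarizes how well a panel score separates TB from other respiratory diseases. A minimal sketch, assuming one numeric score per sample, computes the AUC as the Mann-Whitney probability that a randomly chosen TB case scores higher than a randomly chosen control, with ties counting one half. How the study combined CFHR5, LRG1, CRP, LBP and SAA1 into a single score is not stated in the abstract; the weighted sum in the hypothetical `panel_score` helper, and any weights passed to it, are assumptions for illustration.

```python
def panel_score(levels, weights):
    """Hypothetical linear panel: weighted sum of protein levels for one sample.

    levels and weights are dicts keyed by protein name (e.g. "CRP", "SAA1");
    the weights here are illustrative, not the study's fitted values.
    """
    return sum(weights[name] * levels[name] for name in weights)


def roc_auc(scores_positive, scores_negative):
    """AUC = P(score of a random case > score of a random control), ties count 1/2.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))
```

For instance, `roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])` returns 8/9 (about 0.889), because 8 of the 9 case-control pairs are correctly ordered. The pairwise loop is O(n·m), which is adequate for cohorts of a few hundred samples; a rank-based computation scales better for larger sets.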

    Reverse engineering of gene regulatory networks governing cell-cell communication in the microenvironment of pancreatic cancer

    Background: Pancreatic ductal adenocarcinoma (PDAC) is one of the leading causes of cancer death, with a five-year survival rate of <5% and a median survival of 6 months. An extensive desmoplastic reaction is a characteristic feature and a prognostic factor of PDAC, and it confers its resistance. Desmoplastic stroma accounts for approx. 90% of tumor volume and consists predominantly of non-malignant fibroblasts (pancreatic stellate cells, PSC). Previous studies have revealed the mesenchymal origin of PSC, their capacity to switch between quiescent and activated states, their proinflammatory features, their expression of soluble factors, and their ability to migrate and phagocytose.
    State of the art: The abundance of stroma has prompted previous attempts to dissect the interactions between PSC and tumor cells (TC), producing a common picture of a microenvironment that supports PDAC development. Unfortunately, snapshot-like analyses have proven difficult to translate into therapeutic advances, as they discard the dynamic interactions in the microenvironment as well as the temporal dynamics of gene expression itself. Gene regulatory networks (GRNs) adapt to environmental cues by rewiring connections between genes; these induced modulations effectively lead to state transitions, e.g. PSC activation, or produce mutually exclusive cell-fate decisions, e.g. differentiation, senescence, or death. We recognize that the cell-specific assignment of stimuli, the identification of the genes forming the GRNs, and the identification of cellular state changes remain unresolved. We hypothesize that at an early stage the quiescent → activated PSC transition yields a steady-state PSC GRN, but that the subsequent succession of impulse responses along the TC→PSC→TC interaction axis drives both cell types into unstable states that are maintained only for the duration of direct TC-PSC contact.
    Aims: Through the application of a high-throughput complexity reduction approach and in silico modeling, I aim to reconstruct the GRNs underlying cell-cell communication and to identify the key soluble factors shaping the double-paracrine interactions. I aim to use the models to gain mechanistic and functional insight into how the cues are integrated and how they affect GRN maintenance. I hope to capture cell-fate decisions and identify key dynamic changes, with the ultimate goal of finding genetic markers to aid the development of novel therapeutic options for this deadly malignancy.
    Results: We individually stimulated PSC and TC with conditioned supernatant from the respective other cell type and recorded a time series (1-24 h), from which genome-wide microarray expression data were generated. In this dissertation, I used the time-resolved expression profiles to identify significant gene kinetics through an approach involving gene ranking, filtering, and clustering, followed by gene ontology and pathway analysis. I identified key gene interactions using a genetic algorithm embedded in a continuous-time recurrent neural network (CTRNN) modeling scheme. I then used the derived GRNs to produce a picture of unique intercellular interactions. Through in silico simulations with the resulting models, and subsequent data analysis and interpretation, I delivered targets for experimental testing at both the inter- and intracellular levels. Experimental validation of the selected gene targets using gene silencing and qRT-PCR confirmed the in silico predicted TC network behavior; validation of the intercellular connections confirmed their dependence on the identified networks.
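The dissertation fits GRNs with a genetic algorithm embedded in a CTRNN modeling scheme. The sketch below covers only the model half, under stated assumptions: it uses the common CTRNN form τ_i dy_i/dt = -y_i + Σ_j w_ji σ(y_j + θ_j) + I_i, integrated with forward Euler steps, plus a mean-squared-error `fitness` that a genetic algorithm could minimise against measured time courses. The dissertation's exact parameterization is not given in the abstract, and the names `simulate_ctrnn` and `fitness` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_ctrnn(y0, weights, biases, taus, inputs, dt, steps):
    """Forward-Euler integration of a CTRNN (generic form, assumed here):

        tau_i * dy_i/dt = -y_i + sum_j weights[j][i] * sigmoid(y_j + biases[j]) + inputs[i]

    weights[j][i] is the influence of node (gene) j on node i.
    Returns the list of states, starting with y0 (steps + 1 entries).
    """
    n = len(y0)
    y = list(y0)
    trajectory = [list(y)]
    for _ in range(steps):
        activations = [sigmoid(y[j] + biases[j]) for j in range(n)]
        dy = [
            (-y[i] + sum(weights[j][i] * activations[j] for j in range(n)) + inputs[i]) / taus[i]
            for i in range(n)
        ]
        y = [y[i] + dt * dy[i] for i in range(n)]
        trajectory.append(list(y))
    return trajectory


def fitness(trajectory, observed, sample_every):
    """Mean squared error between simulated and observed states (lower is better).

    observed[k] is compared with trajectory[k * sample_every], so sample_every
    maps measurement time points onto integration steps.
    """
    errors = []
    for k, target in enumerate(observed):
        state = trajectory[k * sample_every]
        errors.extend((s - t) ** 2 for s, t in zip(state, target))
    return sum(errors) / len(errors)
```

With all weights and inputs at zero, each state decays by a factor (1 - dt/τ) per step; with a constant input I and zero weights, it settles at I. A genetic algorithm would mutate and recombine candidate `weights`, `biases` and `taus`, keeping those whose simulated trajectories give the lowest `fitness` against the 1-24 h expression profiles.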

    Identifying the molecular components that matter: a statistical modelling approach to linking functional genomics data to cell physiology

    Functional genomics technologies, in which thousands of mRNAs, proteins, or metabolites can be measured in a single experiment, have helped reshape biological investigation. One of the most important issues in analyzing the resulting large datasets is the selection of relatively small subsets of variables that are predictive of the physiological state of a cell or tissue. In this thesis, a truly multivariate variable selection framework for diverse functional genomics data has been developed, characterized, and tested. This framework has also been used to demonstrate that it is possible to predict the physiological state of a tumour from the molecular state of adjacent normal cells, which allows us to identify novel genes involved in cell-to-cell communication. Then, using a network inference technique, networks representing cell-cell communication in prostate cancer were inferred. Analysis of these networks revealed properties that suggest a crucial role for directional signals in controlling cell-to-cell communication between normal and tumour cells. Experimental verification performed in our laboratory has provided evidence that one of the identified genes could be a novel tumour suppressor gene. In conclusion, the findings and methods reported in this thesis contribute to a further understanding of cell-to-cell interaction and multivariate variable selection, not only by applying and extending previous work but also by proposing novel approaches that can be applied to any functional genomics data.
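The abstract does not name its network inference technique. As an illustration only, the sketch below swaps in one of the simplest approaches, a correlation (relevance) network: genes are nodes, and an edge joins two genes whose expression profiles have an absolute Pearson correlation at or above a chosen threshold. The threshold and the function names `pearson` and `relevance_network` are assumptions. Such edges are undirected, so this method alone could not reveal the directional signals the thesis describes.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def relevance_network(profiles, threshold=0.8):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    profiles maps gene name -> list of expression values over the same samples.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges
```

With profiles G1 = [1, 2, 3, 4], G2 = [2, 4, 6, 8], G3 = [4, 3, 2, 1] and G4 = [1, 3, 2, 4] and a threshold of 0.9, the edges are G1-G2, G1-G3 and G2-G3; G4 correlates with each of the others at |r| = 0.8 and stays unconnected.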