2,045 research outputs found

    An exploratory data analysis method to reveal modular latent structures in high-throughput data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations.</p> <p>Results</p> <p>We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes.</p> <p>Conclusions</p> <p>Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. The R code is available at <url>http://userwww.service.emory.edu/~tyu8/MLSA/</url>.</p

    Probabilistic analysis of the human transcriptome with side information

    Get PDF
    Understanding functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views to the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights to cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories it has been possible to solve some the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected based on individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community.Comment: Doctoral thesis. 103 pages, 11 figure

    Improving gene expression data interpretation by finding latent factors that co-regulate gene modules with clinical factors

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In the analysis of high-throughput data with a clinical outcome, researchers mostly focus on genes/proteins that show first-order relations with the clinical outcome. While this approach yields biomarkers and biological mechanisms that are easily interpretable, it may miss information that is important to the understanding of disease mechanism and/or treatment response. Here we test the hypothesis that unobserved factors can be mobilized by the living system to coordinate the response to the clinical factors.</p> <p>Results</p> <p>We developed a computational method named Guided Latent Factor Discovery (GLFD) to identify hidden factors that act in combination with the observed clinical factors to control gene modules. In simulation studies, the method recovered masked factors effectively. Using real microarray data, we demonstrate that the method identifies latent factors that are biologically relevant, and extracts more information than analyzing only the first-order response to the clinical outcome.</p> <p>Conclusions</p> <p>Finding latent factors using GLFD brings extra insight into the mechanisms of the disease/drug response. The R code of the method is available at <url>http://userwww.service.emory.edu/~tyu8/GLFD</url>.</p

    Multivariate Analysis in Metabolomics

    Get PDF
    Metabolomics aims to provide a global snapshot of all small-molecule metabolites in cells and biological fluids, free of observational biases inherent to more focused studies of metabolism. However, the staggeringly high information content of such global analyses introduces a challenge of its own; efficiently forming biologically relevant conclusions from any given metabolomics dataset indeed requires specialized forms of data analysis. One approach to finding meaning in metabolomics datasets involves multivariate analysis (MVA) methods such as principal component analysis (PCA) and partial least squares projection to latent structures (PLS), where spectral features contributing most to variation or separation are identified for further analysis. However, as with any mathematical treatment, these methods are not a panacea; this review discusses the use of multivariate analysis for metabolomics, as well as common pitfalls and misconceptions

    Multivariate Analysis in Metabolomics

    Get PDF
    Metabolomics aims to provide a global snapshot of all small-molecule metabolites in cells and biological fluids, free of observational biases inherent to more focused studies of metabolism. However, the staggeringly high information content of such global analyses introduces a challenge of its own; efficiently forming biologically relevant conclusions from any given metabolomics dataset indeed requires specialized forms of data analysis. One approach to finding meaning in metabolomics datasets involves multivariate analysis (MVA) methods such as principal component analysis (PCA) and partial least squares projection to latent structures (PLS), where spectral features contributing most to variation or separation are identified for further analysis. However, as with any mathematical treatment, these methods are not a panacea; this review discusses the use of multivariate analysis for metabolomics, as well as common pitfalls and misconceptions

    Where have all the flowers gone?: a modular systems perspective of IT infrastructure design and productivity

    Get PDF
    Assessing value of IT infrastructure investments has been both difficult and ambiguous. This research develops and tests a conceptual framework to understand the productivity process. A lagged and recursive framework is used to trace the relationship between IT infrastructure investments, infrastructure design, and organizational productivity along with contingencies of IT management and the environment. A major contribution of this study is the use of the systems perspective to disaggregate the concepts of IT infrastructure and productivity into collectively exhaustive types. Findings reveal that IT investments do not significant affect productivity but do so when used to develop an IT infrastructure design. IT management is seen to strongly influence IT infrastructure design. Similarly, organizational environment appears to significantly influence the type of productivity focus for a firm. The study adds to the existing body of knowledge through a holistic investigation of the multi-level relationship between IT infrastructure configurations, contingencies, and productivity

    Processing and Visualization of Metabolomics Data Using R

    Get PDF
    Metabolomics provides a wealth of information about the biochemical status of cells, tissues, and other biological systems. However, for many researchers, processing the large quantities of data generated in typical metabolomics experiments poses a formidable challenge. Robust computational tools are required for all data processing steps, from handling raw data to high level statistical analysis and interpretation. This chapter describes several established methods for processing and analyzing metabolomics data within the R statistical programming environment. The focus is on processing LCMS data but the methods can be applied virtually to any analytical platform. We provide a step-by-step workflow to demonstrate how to integrate, analyze, and visualize LCMS-based metabolomics data using computational tools available in R. These concepts and methods will allow specialists and nonspecialists alike to develop and evaluate their own data more critically
    • …
    corecore