Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics
Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc
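As a rough illustration of the conjugate prior mentioned in the abstract above, the following sketch shows the standard normal-gamma update for a univariate Gaussian with unknown mean and precision, and the resulting closed-form log marginal likelihood. It is not the GBHC implementation: the function names, the hyperparameter defaults (mu0, kappa0, alpha0, beta0), and the simplified merge test are assumptions made for this example, and the full BHC merge rule also involves prior merge weights that are omitted here.

```python
from math import lgamma, log, pi


def posterior(data, mu0, kappa0, alpha0, beta0):
    """Normal-gamma posterior parameters for 1-D data (standard conjugate update)."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def log_marginal(data, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of the data under the normal-gamma prior.

    The default hyperparameters are illustrative assumptions, not GBHC's values.
    """
    n = len(data)
    _, kappa_n, alpha_n, beta_n = posterior(data, mu0, kappa0, alpha0, beta0)
    return (lgamma(alpha_n) - lgamma(alpha0)
            + alpha0 * log(beta0) - alpha_n * log(beta_n)
            + 0.5 * (log(kappa0) - log(kappa_n))
            - 0.5 * n * log(2 * pi))


def prefers_merge(a, b):
    """Simplified model comparison: one shared Gaussian versus two separate ones.

    Hypothetical helper; it ignores the prior merge weights used in full BHC.
    """
    return log_marginal(a + b) > log_marginal(a) + log_marginal(b)


near = prefers_merge([0.0, 0.1, -0.1], [0.05, -0.05, 0.0])
far = prefers_merge([0.0, 0.1, -0.1], [10.0, 10.1, 9.9])
```

With these toy inputs, the comparison favors merging the two nearby groups and keeping the distant groups apart, which is the kind of model-selection decision that lets a hierarchical method stop merging without a preset number of clusters.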
Using Pre-existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance
Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. 
We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a freely available software package that is immediately applicable to any human microarray study.
Repression of Mitochondrial Translation, Respiration and a Metabolic Cycle-Regulated Gene, SLF1, by the Yeast Pumilio-Family Protein Puf3p
Synthesis and assembly of the mitochondrial oxidative phosphorylation (OXPHOS) system requires genes located in both the nuclear and mitochondrial genomes, but how gene expression is coordinated between these two compartments is not fully understood. One level of control is through regulated expression of mitochondrial ribosomal proteins and other factors required for mitochondrial translation and OXPHOS assembly, which are all products of nuclear genes that are subsequently imported into mitochondria. Interestingly, this cadre of genes in budding yeast has in common a 3′-UTR element that is bound by the Pumilio family protein, Puf3p, and is coordinately regulated under many conditions, including during the yeast metabolic cycle. Multiple functions have been assigned to Puf3p, including promoting mRNA degradation, localizing nucleus-encoded mitochondrial transcripts to the outer mitochondrial membrane, and facilitating mitochondria-cytoskeletal interactions and motility. Here we show that Puf3p has a general repressive effect on mitochondrial OXPHOS abundance, translation, and respiration that does not involve changes in overall mitochondrial biogenesis and is largely independent of TORC1-mitochondrial signaling. We also identified the cytoplasmic translation factor Slf1p as a yeast metabolic cycle-regulated gene that is repressed by Puf3p at the post-transcriptional level and promotes respiration and extension of yeast chronological life span when over-expressed. Altogether, these results should facilitate future studies on which of the many functions of Puf3p is most relevant for regulating mitochondrial gene expression and the role of nuclear-mitochondrial communication in aging and longevity.
Systematic Analysis of Pleiotropy in C. elegans Early Embryogenesis
Pleiotropy refers to the phenomenon in which a single gene controls several distinct, and seemingly unrelated, phenotypic effects. We use C. elegans early embryogenesis as a model to conduct systematic studies of pleiotropy. We analyze high-throughput RNA interference (RNAi) data from C. elegans and identify “phenotypic signatures”, which are sets of cellular defects indicative of certain biological functions. By matching phenotypic profiles to our identified signatures, we assign genes with complex phenotypic profiles to multiple functional classes. Overall, we observe that pleiotropy occurs extensively among genes involved in early embryogenesis, and a small proportion of these genes are highly pleiotropic. We hypothesize that genes involved in early embryogenesis are organized into partially overlapping functional modules, and that pleiotropic genes represent “connectors” between these modules. In support of this hypothesis, we find that highly pleiotropic genes tend to reside in central positions in protein-protein interaction networks, suggesting that pleiotropic genes act as connecting points between different protein complexes or pathways.
Extracting expression modules from perturbational gene expression compendia
Background
Compendia of gene expression profiles under chemical and genetic perturbations constitute an invaluable resource from a systems biology perspective. However, the perturbational nature of such data imposes specific challenges on the computational methods used to analyze them. In particular, traditional clustering algorithms have difficulties in handling one of the prominent features of perturbational compendia, namely partial coexpression relationships between genes. Biclustering methods on the other hand are specifically designed to capture such partial coexpression patterns, but they show a variety of other drawbacks. For instance, some biclustering methods are less suited to identify overlapping biclusters, while others generate highly redundant biclusters. Also, none of the existing biclustering tools takes advantage of the staple of perturbational expression data analysis: the identification of differentially expressed genes.
Results
We introduce a novel method, called ENIGMA, that addresses some of these issues. ENIGMA leverages differential expression analysis results to extract expression modules from perturbational gene expression data. The core parameters of the ENIGMA clustering procedure are automatically optimized to reduce the redundancy between modules. In contrast to the biclusters produced by most other methods, ENIGMA modules may show internal substructure, i.e. subsets of genes with distinct but significantly related expression patterns. The grouping of these (often functionally) related patterns in one module greatly aids in the biological interpretation of the data. We show that ENIGMA outperforms other methods on artificial datasets, using a quality criterion that, unlike other criteria, can be used for algorithms that generate overlapping clusters and that can be modified to take redundancy between clusters into account. Finally, we apply ENIGMA to the Rosetta compendium of expression profiles for Saccharomyces cerevisiae and we analyze one pheromone response-related module in more detail, demonstrating the potential of ENIGMA to generate detailed predictions.
Conclusion
It is increasingly recognized that perturbational expression compendia are essential to identify the gene networks underlying cellular function, and efforts to build these for different organisms are currently underway. We show that ENIGMA constitutes a valuable addition to the repertoire of methods to analyze such data.
Extracting the abstraction pyramid from complex networks
Background
At present, the organization of system modules is typically limited to either a multilevel hierarchy that describes the "vertical" relationships between modules at different levels (e.g., module A at level two is included in module B at level one), or a single-level graph that represents the "horizontal" relationships among modules (e.g., genetic interactions between module A and module B). Both types of organizations fail to provide a broader and deeper view of the complex systems that arise from an integration of vertical and horizontal relationships.
Results
We propose a complex network analysis tool, Pyramabs, which was developed to integrate vertical and horizontal relationships and extract information at various granularities to create a pyramid from a complex system of interacting objects. The pyramid depicts the nested structure implied in a complex system, and shows the vertical relationships between abstract networks at different levels. In addition, at each level the abstract network of modules, which are connected by weighted links, represents the modules' horizontal relationships. We first tested Pyramabs on hierarchical random networks to verify its ability to find the module organization pre-embedded in the networks. We later tested it on a protein-protein interaction (PPI) network and a metabolic network. According to Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), the vertical relationships identified from the PPI and metabolic pathways correctly characterized the inclusion (i.e., part-of) relationship, and the horizontal relationships provided a good indication of the functional closeness between modules. Our experiments with Pyramabs demonstrated its ability to perform knowledge mining in complex systems.
Conclusions
Networks are a flexible and convenient method of representing interactions in a complex system, and an increasing amount of information in real-world situations is described by complex networks. We considered the analysis of a complex network as an iterative process for extracting meaningful information at multiple granularities from a system of interacting objects. The quality of the interpretation of the networks depends on the completeness and expressiveness of the extracted knowledge representations. Pyramabs was designed to interpret a complex network through a disclosure of a pyramid of abstractions. The abstraction pyramid is a new knowledge representation that combines vertical and horizontal viewpoints at different degrees of abstraction. Interpretations in this form are more accurate and more meaningful than multilevel dendrograms or single-level graphs. Pyramabs can be accessed at http://140.113.166.165/pyramabs.php/.
Consistency analysis of metabolic correlation networks
Background
Metabolic correlation networks are derived from the covariance of metabolites in replicates of metabolomics experiments. They constitute an interesting intermediate between topology (i.e. the system's architecture defined by the set of reactions between metabolites) and dynamics (i.e. the metabolic concentrations observed as fluctuations around steady-state values in the metabolic network).
Results
Here we analyze how such a correlation network changes over time, and compare the relative positions of metabolites in the correlation networks with those in established metabolic networks derived from genome databases. We find that network similarity indeed decreases with an increasing time difference between these networks during a day/night course and, counterintuitively, that proximity of metabolites in the correlation network is no indicator of proximity of the metabolites in the metabolic network.
Conclusion
The organizing principles of correlation networks are distinct from those of metabolic reaction maps. Time courses of correlation networks may in the future prove an important data source for understanding these organizing principles.
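To make the idea of a metabolic correlation network more concrete, the sketch below builds one from replicate measurements and compares the networks of two time points. The abstract does not state how edges were defined or how network similarity was measured, so the Pearson threshold of 0.8, the Jaccard index on edge sets, the function names, and the toy concentrations are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def correlation_network(profiles, threshold=0.8):
    """Edge set linking metabolites whose |r| across replicates meets the threshold.

    profiles maps a metabolite name to its concentrations across replicates.
    The threshold is an assumed cutoff, not a value taken from the study.
    """
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            edges.add((a, b))
    return edges


def jaccard(edges_1, edges_2):
    """Shared edges divided by all edges; one possible network similarity measure."""
    union = edges_1 | edges_2
    return len(edges_1 & edges_2) / len(union) if union else 1.0


# Hypothetical replicate measurements at two time points.
t1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]}
t2 = {"A": [1, 2, 3, 4], "B": [2.1, 3.9, 6.2, 8.0], "C": [1.1, 2.0, 2.9, 4.2]}

edges_t1 = correlation_network(t1)
edges_t2 = correlation_network(t2)
similarity = jaccard(edges_t1, edges_t2)
```

Computing such a similarity for every pair of time points and plotting it against the time difference would be one way to examine the trend described in the Results, under the assumptions stated above.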
The home environment and childhood obesity in low-income households: indirect effects via sleep duration and screen time
Background
Childhood obesity disproportionally affects children from low-income households. With the aim of informing interventions, this study examined pathways through which the physical and social home environment may promote childhood overweight/obesity in low-income households.
Methods
Data on health behaviors and the home environment were collected at home visits in low-income, urban households with either only normal weight (n = 48) or predominantly overweight/obese (n = 55) children aged 6–13 years. Research staff conducted comprehensive, in-person audits of the foods, media, and sports equipment in each household. Anthropometric measurements were collected, and children’s physical activity was assessed through accelerometry. Caregivers and children jointly reported on child sleep duration, screen time, and dietary intake of foods previously implicated in childhood obesity risk. Path analysis was used to test direct and indirect associations between the home environment and child weight status via the health behaviors assessed.
Results
Sleep duration was the only health behavior associated with child weight status (OR = 0.45, 95% CI: 0.27, 0.77), with normal weight children sleeping 33.3 minutes/day longer on average than overweight/obese children. The best-fitting path model explained 26% of variance in child weight status, and included paths linking chaos in the home environment, lower caregiver screen time monitoring, inconsistent implementation of bedtime routines, and the presence of a television in children’s bedrooms to childhood overweight/obesity through effects on screen time and sleep duration.
Conclusions
This study adds to the existing literature by identifying aspects of the home environment that influence childhood weight status via indirect effects on screen time and sleep duration in children from low-income households. Pediatric weight management interventions for low-income households may be improved by targeting aspects of the physical and social home environment associated with sleep.
Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
- …