
    Population genomics of domestic and wild yeasts

    The natural genetics of an organism is determined by the distribution of sequences of its genome. Here we present one- to four-fold coverage (deeper for some isolates) of the genome sequences of over seventy isolates of the domesticated baker's yeast, _Saccharomyces cerevisiae_, and its closest relative, the wild _S. paradoxus_, which has never been associated with human activity. These isolates were collected from numerous geographic locations and sources (including wild, clinical, baking, wine, laboratory and food spoilage). These sequences provide an unprecedented view of the population structure, natural (and artificial) selection and genome evolution in these species. Variation in gene content, SNPs, indels, copy numbers and transposable elements provides insights into the evolution of different lineages. Phenotypic variation broadly correlates with global genome-wide phylogenetic relationships; however, there is no correlation with source. _S. paradoxus_ populations are well delineated along geographic boundaries, while the variation among worldwide _S. cerevisiae_ isolates shows less differentiation and is comparable to that within a single _S. paradoxus_ population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of _S. cerevisiae_ shows a few well-defined, geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and the production of new combinations of pre-existing variation.
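The abstract compares levels of variation within and between the two species. One common statistic for such comparisons is nucleotide diversity (π), the mean proportion of differing sites over all pairs of sequences. The abstract does not say which statistics the authors used, so the function below is only a minimal sketch, assuming pre-aligned, equal-length, upper-case sequences in which gaps (`-`) and ambiguous bases (`N`) are skipped pair by pair; `nucleotide_diversity` is a hypothetical name.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) over aligned sequences.

    Sites where either sequence of a pair has a gap ('-') or an ambiguous
    base ('N') are skipped for that pair only (an assumption of this sketch).
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    total, pairs = 0.0, 0
    for a, b in combinations(aligned, 2):
        compared = differing = 0
        for x, y in zip(a, b):
            if x in "-N" or y in "-N":
                continue
            compared += 1
            differing += x != y
        if compared:
            total += differing / compared
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy alignment of three hypothetical isolates.
    print(nucleotide_diversity(["ACGTACGT", "ACGTACGA", "ACGAACGT"]))
```

Comparing π computed over all _S. cerevisiae_ isolates with π within one _S. paradoxus_ population is one way to express the kind of comparison the abstract reports.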

    MCL-CAw: A refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure

    Background: The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher-level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue an enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show a lack of correlation with each other and also contain a substantial number of false positives (noise). Over these years, several affinity scoring schemes have also been devised to improve the quality of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein-protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In the attempt to tackle this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or contain additional proteins that reduce the accuracy of correctly predicted complexes. Results: Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for the reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks.
The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives a larger number of yeast complexes, and with better accuracy, than MCL, particularly in the presence of natural noise; (ii) affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and their ability to form complexes. Conclusions: We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw.
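MCL-CAw refines the output of plain MCL, which simulates random walks on the network by alternating expansion (raising the column-stochastic matrix to a power) and inflation (raising each entry to a power, then renormalizing the columns). The sketch below shows only that plain MCL loop, not the core-attachment refinement, and it is neither the authors' code nor the reference `mcl` program. The self-loop weight of 1.0, the default parameters and the tolerances are assumptions; clusters are read off as the connected components of the converged matrix.

```python
def _normalize_columns(m: list[list[float]]) -> list[list[float]]:
    """Scale every column to sum to 1, giving a column-stochastic matrix."""
    size = len(m)
    for j in range(size):
        total = sum(m[i][j] for i in range(size))
        if total > 0:
            for i in range(size):
                m[i][j] /= total
    return m


def _matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mcl(weights: dict[tuple[str, str], float], expansion: int = 2,
        inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-9,
        threshold: float = 1e-5) -> list[frozenset[str]]:
    """Plain Markov Clustering of an undirected weighted graph (a sketch)."""
    nodes = sorted({n for edge in weights for n in edge})
    if not nodes:
        return []
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    m = [[0.0] * size for _ in range(size)]
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = w
    for i in range(size):
        m[i][i] = 1.0  # assumed self-loop weight
    m = _normalize_columns(m)
    for _iteration in range(max_iter):
        previous = m
        expanded = m
        for _power in range(expansion - 1):  # expansion: m ** expansion
            expanded = _matmul(expanded, m)
        # Inflation: elementwise power, then renormalize columns.
        m = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        if max(abs(m[i][j] - previous[i][j])
               for i in range(size) for j in range(size)) < tol:
            break
    # Clusters = connected components over entries that stay above threshold.
    neighbours = {n: set() for n in nodes}
    for i in range(size):
        for j in range(size):
            if m[i][j] > threshold:
                neighbours[nodes[i]].add(nodes[j])
                neighbours[nodes[j]].add(nodes[i])
    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(frozenset(component))
    return clusters
```

On two triangles joined by one weak edge, this sketch separates the triangles. Raising `inflation` generally yields smaller, tighter clusters; the pure-Python matrix product is only suitable for toy networks.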

    Multiple organism algorithm for finding ultraconserved elements

    Background: Ultraconserved elements are nucleotide or protein sequences with 100% identity (no mismatches, insertions, or deletions) in the same organism or between two or more organisms. Studies indicate that these conserved regions are associated with microRNAs, mRNA processing, development and transcription regulation. Identifying and characterizing these elements across genomes is necessary to further understand their function. Results: We describe an algorithm, and provide freely available software, that can find all of the ultraconserved sequences between the genomes of multiple organisms. Our algorithm takes a combinatorial approach that finds all such sequences without requiring the genomes to be aligned. The algorithm is significantly faster than BLAST and is designed to handle very large genomes efficiently. We ran our algorithm on several large comparative analyses to evaluate its effectiveness: one compared 17 vertebrate genomes and found 123 ultraconserved elements longer than 40 bp shared by all of the organisms; another compared the human body louse, _Pediculus humanus humanus_, against itself and selected insects and found thousands of non-coding, potentially functional sequences. Conclusion: Whole-genome comparative analysis of multiple organisms is both feasible and desirable in our search for biological knowledge. We argue that bioinformatic programs should be forward-thinking, assuming the analysis of multiple (and possibly large) genomes in the design and implementation of algorithms. Our algorithm shows how a compromise design that trades disk space against memory space allows efficient computation with only modest computer resources, while providing benefits not available with other software.
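The core criterion here, 100% identity found without aligning the genomes, can be illustrated by intersecting the sets of fixed-length substrings (k-mers) of each genome. This is a minimal in-memory sketch under stated assumptions: one strand only, upper-cased input, k-mers containing `N` skipped, and no extension of shared k-mers into maximal elements. It does not reproduce the authors' combinatorial algorithm or its disk-versus-memory trade-off, and the function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Distinct length-k substrings of seq (one strand), skipping any with 'N'."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)
            if "N" not in seq[i:i + k]}


def shared_kmers(genomes: list[str], k: int) -> set[str]:
    """k-mers present with 100% identity in every genome; no alignment needed."""
    if not genomes:
        return set()
    common = kmers(genomes[0], k)
    for genome in genomes[1:]:
        # Intersecting genome by genome keeps the candidate set shrinking.
        common &= kmers(genome, k)
        if not common:
            break
    return common


if __name__ == "__main__":
    toy = ["ACGTTGCA", "TTACGTTG", "GACGTTGG"]  # hypothetical sequences
    print(sorted(shared_kmers(toy, 5)))
```

Holding every k-mer set in memory is exactly what does not scale to vertebrate genomes, which is the constraint the abstract's disk-versus-memory design addresses.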

    PETALS: Proteomic Evaluation and Topological Analysis of a mutated Locus' Signaling

    Background: Colon cancer is driven by mutations in a number of genes, the most notorious of which is _Apc_. Though much of _Apc_'s signaling has been mechanistically identified over the years, it is not always clear which functions or interactions are operative in a particular tumor. This is confounded by the presence of mutations in a number of other putative cancer driver (CAN) genes, which often synergize with mutations in _Apc_. Computational methods are, thus, required to predict which pathways are likely to be operative when a particular mutation in _Apc_ is observed. Results: We developed a pipeline, PETALS, to predict and test likely signaling pathways connecting _Apc_ to other CAN-genes, where the interaction network originating at _Apc_ is defined as a "blossom," with each _Apc_-CAN-gene subnetwork referred to as a "petal." Known and predicted protein interactions are used to identify an Apc blossom with 24 petals. Then, using a novel measure of bimodality, the coexpression of each petal is evaluated against proteomic (two-dimensional differential in-gel electrophoresis, 2D-DIGE) measurements from the _Apc_^(1638N+/-) mouse to test the network-based hypotheses. Conclusions: The predicted pathways linking _Apc_ and _Hapln1_ exhibited the highest level of bimodal coexpression with the proteomic targets, prioritizing the _Apc-Hapln1_ petal over other CAN-gene pairs and suggesting that this petal may be involved in regulating the observed proteome-level effects. These results not only demonstrate how functional 'omics data can be employed to test _in silico_ predictions of CAN-gene pathways, but also reveal an approach to integrating models of upstream genetic interference with measured, downstream effects.
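PETALS scores each petal with its own novel bimodality measure, which this abstract does not define. As a stand-in only, the sketch below computes Sarle's bimodality coefficient, a different and well-known statistic: BC = (g1² + 1) / (g2 + 3(n − 1)² / ((n − 2)(n − 3))), where g1 is the skewness and g2 the excess kurtosis of n observations; values above 5/9 are conventionally read as a hint of bimodality. Plain (biased) moment estimators for g1 and g2 are an assumption here; bias-corrected versions are also common.

```python
def bimodality_coefficient(values: list[float]) -> float:
    """Sarle's bimodality coefficient with plain moment estimators (a stand-in,
    not the PETALS measure)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    # Small-sample correction term from the usual BC formula.
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skewness ** 2 + 1.0) / (excess_kurtosis + correction)


if __name__ == "__main__":
    two_groups = [0.0] * 10 + [10.0] * 10      # clearly two modes
    one_peak = [1, 2, 2, 3, 3, 3, 4, 4, 5]     # single mode
    print(bimodality_coefficient(two_groups) > 5 / 9)
    print(bimodality_coefficient(one_peak) > 5 / 9)
```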

    Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology

    Genome-wide association studies have uncovered hundreds of DNA changes associated with complex disease. The ultimate promise of these studies is the understanding of disease biology; this goal, however, is not easily achieved because each disease has yielded numerous associations, each one pointing to a region of the genome rather than to a specific causal mutation. Presumably, the causal variants affect components of common molecular processes, and a first step in understanding the disease biology perturbed in patients is to identify connections among regions associated with disease. Since it has been reported in numerous Mendelian diseases that the protein products of causal genes tend to physically bind each other, we approached this problem using known protein–protein interactions to test whether any of the products of genes in loci associated with five complex traits bind each other. We applied several permutation methods and found robustly significant connectivity within four of the traits. In Crohn's disease and rheumatoid arthritis, we are able to show that these genes are co-expressed and that other proteins emerging in the network are enriched for association with disease. These findings suggest that, for the complex traits studied here, associated loci contain variants that affect common molecular processes, rather than distinct mechanisms specific to each association.
    Funding: Massachusetts Institute of Technology (MIT IDEA2 Program); Harvard University. Biological and Biomedical Sciences Program; Eunice Kennedy Shriver National Institute of Child Health and Human Development (U.S.) (NICHD RO1 grant HD055150-03); National Institute of Arthritis and Musculoskeletal and Skin Diseases (U.S.) (K08 NIH-NIAMS career development award (AR055688)); National Institute of Diabetes and Digestive and Kidney Diseases (U.S.) (DK083756); National Institute of Diabetes and Digestive and Kidney Diseases (U.S.) (DK086502); Denmark. Forskningsradet for Sundhed og Sygdom; Center for the Study of Inflammatory Bowel Disease.
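The connectivity test described above can be sketched as a simple node-label permutation: count the interactions among the proteins encoded in the associated loci, then compare that count with random protein sets of the same size drawn from the same network. This is a minimal sketch under assumptions; the abstract does not specify which permutation schemes were used, and this naive version ignores node degree and the one-gene-per-locus structure that a careful test would preserve. The function names are hypothetical.

```python
import random


def internal_edges(proteins: set[str], edges: set[frozenset[str]]) -> int:
    """Number of interactions whose partners all lie in `proteins`."""
    return sum(1 for edge in edges if edge <= proteins)


def connectivity_p_value(seed_proteins: set[str], edges: set[frozenset[str]],
                         n_perm: int = 1000, seed: int = 0) -> tuple[int, float]:
    """Observed internal edge count and an empirical one-sided p-value."""
    rng = random.Random(seed)
    universe = sorted({p for edge in edges for p in edge})
    # Proteins absent from the interaction network cannot contribute edges.
    seeds = set(seed_proteins) & set(universe)
    observed = internal_edges(seeds, edges)
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(universe, len(seeds)))
        if internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Degree-preserving permutations, which sample random proteins with interaction counts similar to the seeds, are the usual refinement when hubs could drive the observed count.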

    Genetics Meets Metabolomics: A Genome-Wide Association Study of Metabolite Profiles in Human Serum

    The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion and modification, but should also provide access to the biochemical context of such variations, in particular when enzyme-coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics, based on the quantitative measurement of 363 metabolites in the serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values of 10^-16 to 10^-21). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differences in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may influence the risk of a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.
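To make the "variance explained" figures concrete: for a single SNP, the fraction of trait variance explained by a simple linear regression on allele dosage (0, 1 or 2 copies) equals the squared Pearson correlation between dosage and trait. The sketch below applies this to a metabolite ratio used as an enzyme-activity proxy, as in the abstract. Log-transforming the ratio, the additive dosage coding and the absence of covariates are all assumptions; this is not the study's analysis pipeline, and `variance_explained` is a hypothetical name.

```python
import math


def variance_explained(dosages: list[int], numerator: list[float],
                       denominator: list[float]) -> float:
    """R^2 of log(numerator / denominator) regressed on allele dosage (0/1/2)."""
    if not len(dosages) == len(numerator) == len(denominator):
        raise ValueError("inputs must have equal length")
    if len(dosages) < 2:
        raise ValueError("need at least two samples")
    x = [float(d) for d in dosages]
    y = [math.log(a / b) for a, b in zip(numerator, denominator)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in genotype or in the ratio
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)
```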

    Identifying Hubs in Protein Interaction Networks

    In spite of the scale-free degree distribution that characterizes most protein interaction networks (PINs), it is common to define "hub" proteins, which are assigned special topological and functional significance, by an ad hoc degree threshold. This raises the concern that some conclusions about the functional significance of proteins based on network properties may not be robust. In this paper we present three objective methods to define hub proteins in PINs: one purely topological method and two others based on gene expression and function. By applying these methods to four distinct PINs, we examine the extent of agreement among these methods and the implications of these results for network construction. We find that the methods agree well for networks that contain a balance between error-free and unbiased interactions, indicating that the hub concept is meaningful for such networks.
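For contrast with a fixed ad hoc cutoff, the sketch below flags hubs with a purely data-derived degree threshold: nodes whose degree lies more than `z` standard deviations above the network mean. This rule is a hypothetical illustration only; the abstract does not describe the paper's topological method, and with heavy-tailed degree distributions a z-score cutoff is itself a debatable choice. The sketch also assumes each interaction is listed once.

```python
from collections import Counter
from statistics import mean, pstdev


def degree_hubs(edges: list[tuple[str, str]], z: float = 2.0) -> set[str]:
    """Nodes whose degree z-score exceeds `z` (hypothetical hub rule)."""
    degree = Counter()
    for a, b in edges:
        if a != b:  # ignore self-interactions
            degree[a] += 1
            degree[b] += 1
    if not degree:
        return set()
    values = list(degree.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()  # all nodes have the same degree: no hubs
    return {node for node, d in degree.items() if (d - mu) / sigma > z}


if __name__ == "__main__":
    # A star (one well-connected node) next to a simple chain.
    star = [("H", f"L{i}") for i in range(10)]
    chain = [(f"X{i}", f"X{i + 1}") for i in range(9)]
    print(degree_hubs(star + chain))
```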

    Systematic identification of yeast cell cycle transcription factors using multiple data sources

    Background: The eukaryotic cell cycle is a complex process that is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Results: We developed a method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor binding site (TFBS), and cell cycle gene expression data. We identified 17 cell cycle TFs, 12 of which are known cell cycle TFs, while the remaining five (Ash1, Rlm1, Ste12, Stp1, Tec1) are putative novel cell cycle TFs. For each cell cycle TF, we assigned the specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. We also identified 178 novel cell cycle-regulated genes, 59 of which have unknown functions but may now be annotated as cell cycle-regulated. Most of our predictions are supported by previous experimental or computational studies. Furthermore, a high-confidence TF-gene regulatory matrix is derived as a byproduct of our method. Each TF-gene regulatory relationship in this matrix is supported by at least three data sources: gene expression, TFBS, and ChIP-chip and/or mutant data. We show that our method performs better than four existing methods for identifying yeast cell cycle TFs. Finally, applying our method to different cell cycle gene expression datasets suggests that it is robust. Conclusion: Our method is effective for identifying yeast cell cycle TFs and cell cycle-regulated genes. Many of our predictions are validated by the literature. Our study shows that integrating multiple data sources is a powerful approach to studying complex biological systems.
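The support rule stated for the high-confidence TF-gene matrix (gene expression, TFBS, and ChIP-chip and/or mutant evidence) maps directly onto set operations over (TF, gene) pairs. This is a minimal sketch, assuming each data source has already been reduced to a set of supported pairs; how the method derives those sets is not shown, and the identifiers in the example are placeholders.

```python
Pair = tuple[str, str]  # (transcription factor, target gene)


def high_confidence_pairs(expression: set[Pair], tfbs: set[Pair],
                          chip_chip: set[Pair], mutant: set[Pair]) -> set[Pair]:
    """Keep pairs supported by expression AND TFBS AND (ChIP-chip OR mutant)."""
    return expression & tfbs & (chip_chip | mutant)


if __name__ == "__main__":
    # Placeholder evidence sets, not real yeast data.
    expression = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
    tfbs = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
    chip_chip = {("TF1", "geneA")}
    mutant = {("TF2", "geneC")}
    print(sorted(high_confidence_pairs(expression, tfbs, chip_chip, mutant)))
```

Here ("TF1", "geneB") is dropped because it has neither ChIP-chip nor mutant support.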

    Design of a randomized controlled trial of physical training and cancer (Phys-Can) – the impact of exercise intensity on cancer related fatigue, quality of life and disease outcome

    Background: Cancer-related fatigue is a common problem in persons with cancer, influencing health-related quality of life and posing a considerable challenge to society. Current evidence supports the beneficial effects of physical exercise in reducing fatigue, but the results across studies are not consistent, especially in terms of exercise intensity. It is also unclear whether the use of behaviour change techniques can further increase exercise adherence and maintain physical activity behaviour. This study will investigate whether exercise intensity affects fatigue and health-related quality of life in persons undergoing adjuvant cancer treatment. In addition, it will examine the effects of exercise intensity on mood disturbance, adherence to oncological treatment, adverse effects from treatment, activities of daily living after treatment completion and return to work, as well as the effect of behaviour change techniques on exercise adherence. We will also investigate whether exercise intensity influences inflammatory markers and cytokines, and whether gene expression following training serves as a mediator of the effects of exercise on fatigue and health-related quality of life. Methods/design: Six hundred newly diagnosed persons with breast, colorectal or prostate cancer undergoing adjuvant therapy will be randomized in a 2 × 2 factorial design to the following conditions: A) individually tailored low-to-moderate intensity exercise with or without behaviour change techniques, or B) individually tailored high intensity exercise with or without behaviour change techniques. The training consists of both resistance and endurance exercise sessions under the guidance of trained coaches. The primary outcomes, fatigue and health-related quality of life, are measured by self-report.
Secondary outcomes include fitness, mood disturbance, adherence to the cancer treatment, adverse effects, return to activities of daily living after treatment completion, return to work, and inflammatory markers, cytokines and gene expression. Discussion: The study will contribute to our understanding of the value of exercise and exercise intensity in reducing fatigue and improving health-related quality of life and, potentially, clinical outcomes. The value of behaviour change techniques in terms of adherence to and maintenance of physical exercise behaviour in persons with cancer will also be evaluated.
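The 2 × 2 factorial allocation (exercise intensity crossed with behaviour change techniques, BCT) can be illustrated with permuted-block randomization, in which every block of four contains each arm once. The abstract does not state the trial's allocation procedure, stratification or block size, so this is a hypothetical sketch of the design, not the Phys-Can protocol; `block_randomize` and its parameters are illustrative names.

```python
import random
from itertools import product

INTENSITY = ("low-to-moderate intensity", "high intensity")
BCT = ("with BCT", "without BCT")
ARMS = list(product(INTENSITY, BCT))  # the four cells of the 2 x 2 design


def block_randomize(participant_ids, seed=None, repeats=1):
    """Map each participant ID to an (intensity, BCT) arm using permuted blocks.

    Each block holds every arm `repeats` times, so arm sizes never differ by
    more than one block's worth while enrolment is ongoing.
    """
    rng = random.Random(seed)
    allocation = {}
    block = []
    for pid in participant_ids:
        if not block:
            block = ARMS * repeats  # fresh block containing every arm
            rng.shuffle(block)
        allocation[pid] = block.pop()
    return allocation


if __name__ == "__main__":
    from collections import Counter
    allocation = block_randomize(range(600), seed=1)
    print(Counter(allocation.values()))
```

With 600 participants and blocks of four, each arm receives exactly 150 persons.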

    Identification of InuR, a new Zn(II)2Cys6 transcriptional activator involved in the regulation of inulinolytic genes in Aspergillus niger

    The expression of inulinolytic genes in _Aspergillus niger_ is co-regulated and induced by inulin and sucrose. We have identified a positive-acting transcription factor, InuR, which is required for the induced expression of inulinolytic genes. InuR is a member of the fungal-specific class of transcription factors of the Zn(II)2Cys6 type. Involvement of InuR in inulin and sucrose metabolism was suspected because the inuR gene is clustered with sucB, which encodes an intracellular invertase with transfructosylation activity, and with a putative sugar transporter gene (An15g00310). Deletion of the inuR gene resulted in a strain displaying a severe reduction in growth on inulin and sucrose media. Northern analysis revealed that the expression of inulinolytic and sucrolytic genes, e.g., inuE, inuA and sucA, as well as the putative sugar transporter gene (An15g00310), is dependent on InuR. Genome-wide expression analysis revealed three additional putative sugar transporter genes (An15g04060, An15g03940 and An17g01710), which were strongly induced by sucrose in an InuR-dependent manner. _In silico_ analysis of the promoter sequences of strongly InuR-regulated genes suggests that InuR might bind as a dimer to two CGG triplets separated by eight nucleotides.
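The proposed binding site, two CGG triplets separated by eight nucleotides, can be scanned for with a regular expression. This is a minimal sketch under stated assumptions: the triplets are treated as a direct repeat (CGG-N8-CGG), which the abstract does not specify; the minus strand is covered by also searching for the reverse complement, CCG-N8-CCG; and a match is only a candidate site, not evidence of binding. The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
# Lookaheads let finditer report overlapping candidate sites.
_FORWARD = re.compile(r"(?=(CGG[ACGT]{8}CGG))")
_REVERSE = re.compile(r"(?=(CCG[ACGT]{8}CCG))")  # reverse complement of the above


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_sites(promoter: str) -> list[tuple[int, str, str]]:
    """(start, strand, site) for CGG-N8-CGG direct repeats on either strand.

    `start` is a 0-based forward-strand coordinate; minus-strand sites are
    reported as their reverse complement, i.e. read 5'->3' on that strand.
    """
    seq = promoter.upper()
    hits = [(m.start(), "+", m.group(1)) for m in _FORWARD.finditer(seq)]
    hits += [(m.start(), "-", reverse_complement(m.group(1)))
             for m in _REVERSE.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    toy_promoter = "AAACGGTTTTAAAACGGAAA"  # hypothetical sequence
    print(find_candidate_sites(toy_promoter))
```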