337 research outputs found

    Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function

    Picking Pyknons out of the Human Genome

    In a recent paper in PNAS, Rigoutsos et al. (2006) describe a nonrandom pattern of repeated elements, called pyknons, which are found more frequently in the 3′ untranslated regions of genes than in other regions of the human genome. Although it is unclear how pyknons might have arisen, it is possible that they may be involved in a new form of gene regulation

    Big Data in Biology: How EMBL delivers big data for biology, and some highlights of its application to human disease biology

    Molecular biology is now a leading example of a data intensive science, with both pragmatic and theoretical challenges being raised by the volume and dimensionality of the data. These changes are present in both “large scale” consortia science and small scale science, and now across a broad range of applications, from human health through to agriculture and ecosystems. All of molecular life science is feeling this effect. The European Molecular Biology Laboratory (EMBL), Europe’s only intergovernmental research organisation in the life sciences, is at the forefront of these developments, both performing excellent research and providing world leading services to enable science across Europe. This shift in modality is creating a wealth of new opportunities and has some accompanying challenges. In particular, there is a continued need for a robust information infrastructure for molecular biology. This ranges from the physical aspects of dealing with data volume through to the more statistically challenging aspects of interpreting it. A particular problem is finding causal relationships in the high level of correlative data. Genetic data are particularly useful in resolving these issues. I will present how EMBL pursues this science and give examples from my own research, which spans human genetics research through to partnering for clinical application. Book of abstract: 4th Belgrade Bioinformatics Conference, June 19-23, 202

    Automated generation of heuristics for biological sequence comparison

    BACKGROUND: Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce bounded sparse dynamic programming (BSDP) to allow rapid approximation to exhaustive alignment. This is used within a framework whereby the alignment algorithms are described in terms of their underlying model, to allow automated development of efficient heuristic implementations which may be applied to a general set of sequence comparison problems. RESULTS: The speed and accuracy of this approach compare favourably with existing methods. Examples of its use in the context of genome annotation are given. CONCLUSIONS: This system allows rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate.

    Genomic information infrastructure after the deluge

    Maintaining up-to-date annotation on reference genomes is becoming more important, not less, as the ability to rapidly and cheaply resequence genomes expands

    The consequence of natural selection on genetic variation in the mouse

    Laboratory mouse strains are known to have emerged from recent interbreeding between individuals of isolated Mus musculus populations. As a result of this breeding history, the collection of polymorphisms observed between laboratory mouse strains is likely to harbor the effects of natural selection between reproductively isolated populations. Until now no study has systematically investigated the consequences of this breeding history on gene evolution. Here we have used a novel, unbiased evolutionary approach to predict the founder origin of laboratory mouse strains and to assess the balance between ancient and newly emerged mutations in the founder subspecies. Our results confirm a contribution from at least four distinct subspecies. Additionally, our method allowed us to identify regions of relaxed selective constraint among laboratory mouse strains. This unique structure of variation is likely to have significant consequences on the use of mouse to find genes underlying phenotypic variation.

    Epigenome-wide Association Studies and the Interpretation of Disease -Omics

    Epigenome-wide association studies represent one means of applying genome-wide assays to identify molecular events that could be associated with human phenotypes. The epigenome is especially intriguing as a target for study, as epigenetic regulatory processes are, by definition, heritable from parent to daughter cells and are found to have transcriptional regulatory properties. As such, the epigenome is an attractive candidate for mediating long-term responses to cellular stimuli, such as environmental effects modifying disease risk. Such epigenomic studies represent a broader category of disease -omics, which suffer from multiple problems in design and execution that severely limit their interpretability. Here we define many of the problems with current epigenomic studies and propose solutions that can be applied to allow this and other disease -omics studies to achieve their potential for generating valuable insights

    Considerations for the inclusion of 2x mammalian genomes in phylogenetic analyses

    Comment on Milinkovitch et al.: http://genomebiology.com/2010/11/2/R1

    Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels

    Motivation: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo, taking into account possible alternative isoforms and the dynamic range of expression values