Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function
Picking Pyknons out of the Human Genome
In a recent paper in PNAS, Rigoutsos et al. (2006) describe a nonrandom pattern of repeated elements, called pyknons, which are found more frequently in the 3′ untranslated regions of genes than in other regions of the human genome. Although it is unclear how pyknons might have arisen, it is possible that they may be involved in a new form of gene regulation
Big Data in Biology: How EMBL delivers big data for biology, and some highlights of its application to human disease biology
Molecular biology is now a leading example of a data-intensive science, with both pragmatic and theoretical challenges being raised by the volume and dimensionality of the data. These changes are present in both “large scale” consortia science and small scale science, and now across a broad range of applications, from human health through to agriculture and ecosystems. All of molecular life science is feeling this effect. The European Molecular Biology Laboratory (EMBL), Europe’s only intergovernmental research organisation in the life sciences, is at the forefront of these developments, performing excellent research and providing world-leading services to enable science across Europe. This shift in modality is creating a wealth of new opportunities and has some accompanying challenges. In particular, there is a continued need for a robust information infrastructure for molecular biology. This ranges from the physical aspects of dealing with data volume through to the more statistically challenging aspects of interpreting it. A particular problem is finding causal relationships in the high level of correlative data. Genetic data are particularly useful in resolving these issues. I will present how EMBL pursues this science and give examples from my own research, which spans human genetics research through to partnering for clinical application. Book of abstract: 4th Belgrade Bioinformatics Conference, June 19-23, 202
Automated generation of heuristics for biological sequence comparison
BACKGROUND: Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce bounded sparse dynamic programming (BSDP) to allow rapid approximation to exhaustive alignment. This is used within a framework whereby the alignment algorithms are described in terms of their underlying model, to allow automated development of efficient heuristic implementations which may be applied to a general set of sequence comparison problems. RESULTS: The speed and accuracy of this approach compare favourably with existing methods. Examples of its use in the context of genome annotation are given. CONCLUSIONS: This system allows rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate
Genomic information infrastructure after the deluge
Maintaining up-to-date annotation on reference genomes is becoming more important, not less, as the ability to rapidly and cheaply resequence genomes expands
The consequence of natural selection on genetic variation in the mouse
Laboratory mouse strains are known to have emerged from recent interbreeding between individuals of Mus musculus isolated populations. As a result of this breeding history, the collection of polymorphisms observed between laboratory mouse strains is likely to harbor the effects of natural selection between reproductively isolated populations. Until now no study has systematically investigated the consequences of this breeding history on gene evolution. Here we have used a novel, unbiased evolutionary approach to predict the founder origin of laboratory mouse strains and to assess the balance between ancient and newly emerged mutations in the founder subspecies. Our results confirm a contribution from at least four distinct subspecies. Additionally, our method allowed us to identify regions of relaxed selective constraint among laboratory mouse strains. This unique structure of variation is likely to have significant consequences on the use of mouse to find genes underlying phenotypic variation
Epigenome-wide Association Studies and the Interpretation of Disease -Omics
Epigenome-wide association studies represent one means of applying genome-wide assays to identify molecular events that could be associated with human phenotypes. The epigenome is especially intriguing as a target for study, as epigenetic regulatory processes are, by definition, heritable from parent to daughter cells and are found to have transcriptional regulatory properties. As such, the epigenome is an attractive candidate for mediating long-term responses to cellular stimuli, such as environmental effects modifying disease risk. Such epigenomic studies represent a broader category of disease -omics, which suffer from multiple problems in design and execution that severely limit their interpretability. Here we define many of the problems with current epigenomic studies and propose solutions that can be applied to allow this and other disease -omics studies to achieve their potential for generating valuable insights
Considerations for the inclusion of 2x mammalian genomes in phylogenetic analyses
Comment on Milinkovitch et al.: http://genomebiology.com/2010/11/2/R1
Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels
Motivation: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo, taking into account possible alternative isoforms and the dynamic range of expression values
- …