
    A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    Perspectives on ENCODE

    The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These include genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks), and transcript isoforms. These marks delineate candidate cis-regulatory elements (cCREs) that may play functional roles in regulating gene expression (1). The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly one million cCRE annotations have been generated for human and more than 300,000 for mouse, and these have provided a valuable resource for the scientific community.

    Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction

    Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVMs) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
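    To give a flavor of the "online large-margin" training mentioned above, the sketch below shows a generic passive-aggressive-style update on feature vectors. This is not CRAIG's actual algorithm (which decodes a semi-Markov CRF over full gene structures); the feature vectors and loss value here are hypothetical placeholders.

```python
# Illustrative sketch only: a generic online large-margin (passive-aggressive
# style) update, shown for structured outputs abstracted as feature vectors.
import numpy as np

def large_margin_update(w, feat_gold, feat_pred, loss):
    """Move w just enough that the gold structure outscores the prediction by `loss`."""
    delta = feat_gold - feat_pred
    if not delta.any():
        return w                              # prediction already matches the gold features
    margin = w @ delta                        # current score advantage of the gold structure
    if margin >= loss:
        return w                              # margin constraint already satisfied
    tau = (loss - margin) / (delta @ delta)   # minimal corrective step
    return w + tau * delta

# Toy usage with 3 features: the gold parse is under-scored, so w is adjusted.
w = np.zeros(3)
w = large_margin_update(w, feat_gold=np.array([1.0, 0.0, 2.0]),
                        feat_pred=np.array([0.0, 1.0, 1.0]), loss=1.0)
print(w)
```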

    Identification and Analysis of Genes and Pseudogenes within Duplicated Regions in the Human and Mouse Genomes

    The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore correspond to likely novel mammalian genes. Some of these correspond to partial duplicates with less than half of the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as on the fate of duplicated genes.

    Hardy-Weinberg Equilibrium Testing of Biological Ascertainment for Mendelian Randomization Studies

    Mendelian randomization (MR) permits causal inference between exposures and a disease. It can be compared with randomized controlled trials. Whereas in a randomized controlled trial the randomization occurs at entry into the trial, in MR the randomization occurs during gamete formation and conception. Several factors, including time since conception and sampling variation, are relevant to the interpretation of an MR test. Particularly important is consideration of the "missingness" of genotypes, which can originate from chance, genotyping errors, or clinical ascertainment. Testing for Hardy-Weinberg equilibrium (HWE) is a genetic approach that permits evaluation of missingness. In this paper, the authors demonstrate evidence of nonconformity with HWE in real data. They also perform simulations to characterize the sensitivity of HWE tests to missingness. Unresolved missingness could lead to a false rejection of causality in an MR investigation of trait-disease association. These results indicate that large-scale studies, very high quality genotyping data, and detailed knowledge of the life-course genetics of the alleles/genotypes studied will largely mitigate this risk. The authors also present a Web program (http://www.oege.org/software/hwe-mr-calc.shtml) for estimating possible missingness and an approach to evaluating missingness under different genetic models.
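    The HWE check underlying these analyses is simple arithmetic on genotype counts. As a point of reference only (this is not the authors' web program, which additionally models missingness), a minimal Pearson chi-square sketch for a biallelic locus might look like this:

```python
# Minimal sketch: Pearson chi-square test for departure from Hardy-Weinberg
# equilibrium, given observed genotype counts at a biallelic locus.
from scipy.stats import chi2

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return (chi-square statistic, p-value) for departure from HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)           # frequency of allele A
    q = 1 - p                                  # frequency of allele a
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2.sf(stat, df=1)           # one degree of freedom

# Example counts with a heterozygote deficit, the kind of departure that can
# signal missingness or genotyping error rather than a true biological effect.
print(hwe_chi_square(360, 180, 60))
```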

    Modeling associations between genetic markers using Bayesian networks

    Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) have been proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated it. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, inference and parameter estimation for such models remain computationally challenging.
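    For context, the "static" pairwise coefficients that the abstract contrasts with generative models (D, D', and r²) are straightforward to compute once haplotype frequencies have been estimated. A minimal illustrative sketch, with the frequencies assumed known rather than estimated from data:

```python
# Minimal sketch of classic pairwise LD coefficients for two biallelic loci,
# given the frequency of haplotype AB and the marginal allele frequencies.
def ld_measures(p_AB, p_A, p_B):
    """Return D, D' and r^2 for alleles A and B."""
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = 0.0 if d_max == 0 else D / d_max
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2

# Two loci in strong LD: the AB haplotype is over-represented.
print(ld_measures(p_AB=0.40, p_A=0.50, p_B=0.50))
```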

    Novel Bayes Factors That Capture Expert Uncertainty in Prior Density Specification in Genetic Association Studies.

    Bayes factors (BFs) are becoming increasingly important tools in genetic association studies, partly because they provide a natural framework for including prior information. The Wakefield BF (WBF) approximation is easy to calculate and assumes a normal prior on the log odds ratio (logOR) with a mean of zero. However, the prior variance (W) must be specified. Because of the potentially high sensitivity of the WBF to the choice of W, we propose several new BF approximations with logOR ∼ N(0, W), but allow W to take a probability distribution rather than a fixed value. We provide several prior distributions for W which lead to BFs that can be calculated easily in freely available software packages. These priors allow a wide range of densities for W and provide considerable flexibility. We examine some properties of the priors and BFs and show how to determine the most appropriate prior based on elicited quantiles of the prior odds ratio (OR). We show by simulation that our novel BFs have superior true-positive rates at low false-positive rates compared to those from both P-value and WBF analyses across a range of sample sizes and ORs. We give an example of utilizing our BFs to fine-map the CASP8 region using genotype data on approximately 46,000 breast cancer case and 43,000 healthy control samples from the Collaborative Oncological Gene-environment Study (COGS) Consortium, and compare the single-nucleotide polymorphism ranks to those obtained using WBFs and P-values from univariate logistic regression.
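    For reference, the fixed-W Wakefield approximation that the proposed BFs generalize has a simple closed form under the usual normal approximation (the estimated logOR is treated as N(logOR, V), with prior logOR ∼ N(0, W)). The sketch below uses illustrative numbers, not values from the paper; the new BFs instead average over a prior distribution on W.

```python
# Minimal sketch of the Wakefield approximate Bayes factor with a fixed prior
# variance W on the log odds ratio.
import math

def wakefield_bf(beta_hat, se, W):
    """BF in favor of association (H1 over H0) for an estimated logOR and its SE."""
    V = se ** 2                                # sampling variance of the logOR estimate
    z2 = (beta_hat / se) ** 2                  # squared Wald statistic
    bf_null = math.sqrt(1 + W / V) * math.exp(-0.5 * z2 * W / (V + W))
    return 1.0 / bf_null                       # reciprocal gives evidence for H1

# Hypothetical SNP: estimated logOR 0.10 (OR ~ 1.11), standard error 0.02,
# prior variance W = 0.04 (prior SD 0.2 on the log odds ratio).
print(wakefield_bf(beta_hat=0.10, se=0.02, W=0.04))
```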
