
1,471 research outputs found

A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Author: The ENCODE Project Consortium
Publication date: 01/01/2011

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating, and we illustrate the application of ENCODE data to interpret the human genome.

Carolina Digital Repository

Novel Bayes Factors That Capture Expert Uncertainty in Prior Density Specification in Genetic Association Studies.

Author: Boyle
ENCODE Project Consortium
Fawcett
Kass
Lee
Maller
Marchini
Michailidou
Morris
Slager
Spencer
Stephens
Udler
Verdinelli
Vignal
Wakefield
Wang
Publication venue: Wiley
Publication date: 27/02/2015

Bayes factors (BFs) are becoming increasingly important tools in genetic association studies, partly because they provide a natural framework for including prior information. The Wakefield BF (WBF) approximation is easy to calculate and assumes a normal prior on the log odds ratio (logOR) with a mean of zero. However, the prior variance (W) must be specified. Because the WBF is potentially highly sensitive to the choice of W, we propose several new BF approximations with logOR ∼ N(0, W), but we allow W to follow a probability distribution rather than take a fixed value. We provide several prior distributions for W that lead to BFs that can be calculated easily in freely available software packages. These priors allow a wide range of densities for W and provide considerable flexibility. We examine some properties of the priors and BFs and show how to determine the most appropriate prior based on elicited quantiles of the prior odds ratio (OR). We show by simulation that our novel BFs have superior true-positive rates at low false-positive rates compared with both P-value and WBF analyses, across a range of sample sizes and ORs. We give an example of using our BFs to fine-map the CASP8 region with genotype data on approximately 46,000 breast cancer cases and 43,000 healthy controls from the Collaborative Oncological Gene-environment Study (COGS) Consortium, and we compare the single-nucleotide polymorphism ranks with those obtained using WBFs and P-values from univariate logistic regression.
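The Wakefield approximation has a closed form. Given an estimated logOR β̂ with variance V and the prior logOR ∼ N(0, W), the BF in favour of association is the ratio of two normal densities, N(β̂; 0, V + W) / N(β̂; 0, V). Wakefield's original approximate BF is the reciprocal, comparing the null with the alternative. V does not depend on W, so placing a prior on W is equivalent to averaging this ratio over that prior. The Python sketch below illustrates this. For illustration only, it assumes a hypothetical discrete prior on W, given as a small grid of values with weights. This is not one of the paper's proposed priors, and the function names are hypothetical.

```python
import math


def wakefield_bf10(beta_hat, v, w):
    """Approximate BF for association (H1 over H0) with logOR ~ N(0, w) under H1.

    beta_hat: estimated log odds ratio; v: its variance (squared standard error);
    w: prior variance of the logOR. Equals N(beta_hat; 0, v + w) / N(beta_hat; 0, v).
    """
    z2 = beta_hat ** 2 / v
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))


def mixture_bf10(beta_hat, v, w_values, weights):
    """BF when W has a (hypothetical) discrete prior: average BF10 over that prior.

    Valid because the H0 likelihood N(beta_hat; 0, v) does not depend on W.
    """
    total = sum(weights)
    return sum(p * wakefield_bf10(beta_hat, v, w)
               for w, p in zip(w_values, weights)) / total


if __name__ == "__main__":
    v = 0.05 ** 2              # hypothetical squared standard error of the logOR
    beta_hat = math.log(1.2)   # hypothetical estimated OR of 1.2
    print(wakefield_bf10(beta_hat, v, w=0.2 ** 2))
    print(mixture_bf10(beta_hat, v, [0.1 ** 2, 0.2 ** 2, 0.4 ** 2], [1, 1, 1]))
```

A continuous prior on W would replace the weighted sum with an integral. That integral can be evaluated numerically or, for convenient priors, in closed form.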

Crossref

White Rose Research Online

MSR1 repeats modulate gene expression and affect risk of breast and prostate cancer

Author: Batra
Bhavsar
Borgoño
Cullen
Doolittle
Eatemadi
Ecker
Emami
ENCODE Project Consortium
Graur
Grimwood
Jurka
Karolchik
Kerr
Lai
Lose
McLean
Rabien
Rose
Wheeler
Wong
Yousef
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

[Background] MSR1 repeats are 36–38 bp minisatellite elements that have recently been implicated in the regulation of gene expression through copy number variation (CNV).

[Patients and methods] Bioinformatic and experimental methods were used to assess the distribution of MSR1 across the genome, to evaluate the regulatory potential of these elements, and to explore the role of MSR1 elements in cancer, particularly non-familial breast cancer and prostate cancer.

[Results] MSR1s are predominantly located on chromosome 19 and are functionally enriched in regulatory regions of the genome, particularly regions implicated in short-range regulatory activities (H3K27ac, H3K4me1 and H3K4me3). MSR1-regulated genes were found to have specific molecular roles, such as serine-protease activity (P = 4.80 × 10⁻⁷) and ion channel activity (P = 2.7 × 10⁻⁴). The kallikrein locus was found to contain a large number of MSR1 clusters, and at least six of these showed CNV. An MSR1 cluster was identified within KLK14, with 9 and 11 copies being normal variants. A significant association between the 9-copy allele and non-familial breast cancer was found in two independent populations (P = 0.004; P = 0.03). In the white British population, the minor allele conferred a 1.21–3.51-fold increased risk for all non-familial disease, or a 1.7–5.3-fold increased risk in early-onset disease. The 9-copy allele was also associated with increased risk of prostate cancer in an independent population (odds ratio = 1.27–1.56; P = 0.009).

[Conclusions] MSR1 repeats act as molecular switches that modulate gene expression. CNV of MSR1 is likely to affect the risk of developing various forms of cancer, including breast and prostate cancer. The MSR1 cluster at KLK14 represents the strongest risk factor identified to date in non-familial breast cancer and a significant risk factor for prostate cancer. Analysis of MSR1 genotype will allow the development of precise stratification of disease risk and provide a novel target for therapeutic agents.

The prostate cancer study is supported by a National Health and Medical Research Council (NHMRC) grant and Career Development Fellowship APP1090505 to JB. The Australian Prostate Cancer BioResource is supported by the NHMRC Enabling Grant APP614296 and by a grant from the Prostate Cancer Foundation, Australia.
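The association results in this abstract are reported as odds ratios for the 9-copy allele in cases versus controls. The Python sketch below shows the standard allelic odds ratio from a 2×2 table of allele counts, with a Woolf (log-scale) 95% confidence interval. It is an illustration only: the function name and the counts in the usage line are hypothetical, and this is not the study's analysis pipeline.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio and Woolf confidence interval from allele counts.

    case_risk / case_other: counts of the risk allele (e.g. a 9-copy allele)
    and of all other alleles among cases; ctrl_*: the same for controls.
    All four counts must be positive (no zero cells).
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Woolf's standard error of log(OR)
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, not data from the study
    print(allelic_odds_ratio(120, 880, 90, 910))
```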

Crossref

ACU Research Bank

OPUS - University of Technology Sydney

Queensland University of Technology ePrints Archive

UCL Discovery

Griffith Research Online

Digital.CSIC

Discovery Research Portal

RISalud-ANDALUCÍA

Genome-wide associations of gene expression variation in humans

Author: Andrew G Clark
Barbara E Stranger
Brenda Kahl
David Allison
Emmanouil T Dermitzakis
ENCODE Project Consortium
Mark J Minichiello
Matthew S Forrest
Panagiotis Deloukas
Robert Lyle
Samuel Deutsch
Sarah Hunt
Simon Tavaré
Stylianos E Antonarakis
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2005

The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits depends heavily on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe, using the publicly available phase I data of the International HapMap Project. The genes are located in regions of the human genome with elevated functional annotation and disease interest, including the ENCODE regions spanning 1% of the genome, chromosome 21, and chromosome 20q12-13.2. We apply three methods of multiple-testing correction: Bonferroni, false discovery rate, and permutation. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Across statistical methods, signals proximal (cis) to the genes of interest are more abundant and more stable than distal (trans) signals. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and to interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
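Two of the corrections named in this abstract are simple to express in code. The Python sketch below applies Bonferroni and the Benjamini–Hochberg step-up procedure to a list of per-SNP P-values. Benjamini–Hochberg is one common way to control the false discovery rate, but the abstract does not say which FDR method was used. The permutation approach is not shown, and the P-values in the usage line are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for each test whose P-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Finds the largest rank k with p_(k) <= k * q / m and rejects the k
    smallest P-values. Returns booleans in the original input order.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-SNP P-values for one gene's expression trait
    pvals = [0.0004, 0.02, 0.03, 0.5, 0.8]
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two. Benjamini–Hochberg typically rejects more hypotheses, which partly explains why results can differ across correction methods.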

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

The Francis Crick Institute

A Geometric Framework for Evaluating Rare Variant Tests of Association

Author: 1000 Genomes Project Consortium
Asimit
Bansal
Basu
Cooper
Dai
Dering
Feng
Gibson
Han
Ionita-Laza
Ladouceur
Li
Li
Li
Lin
Luedtke
Madsen
Mayer-Jochimsen
Morgenthaler
Morris
Neale
Nelson
Pan
Powers
Price
Quintana
Rivas
Sul
Sun
Tennessen
The ENCODE Project Consortium
Tintle
Torgerson
Wu
Yi
Zawistowski
Zhang
Publication venue: Wiley
Publication date: 01/05/2013

The wave of next-generation sequencing data has arrived. However, many questions remain about how best to analyze sequence data, particularly regarding the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relations among these tests are often not well understood. We present a geometric representation of rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or the angles of the two vectors. We demonstrate that the genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests that are robust to noncausal and protective variants. The geometric framework introduces a novel method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
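The geometric representation can be made concrete in a few lines. Treat the per-site rare allele counts in cases and in controls as two vectors, then compute their lengths and the angle between them. The Python sketch below illustrates only these quantities. It does not reproduce the paper's test statistics or their null distributions, and the function names and usage counts are hypothetical. The Euclidean distance between the two vectors changes when either the lengths or the angle differ, which loosely mirrors the idea behind the joint tests.

```python
import math


def vector_length(counts):
    """Euclidean length of a vector of per-site rare allele counts."""
    return math.sqrt(sum(c * c for c in counts))


def geometry(case_counts, control_counts):
    """Return (case length, control length, angle in radians, distance).

    Both vectors must be non-zero and have one entry per variant site.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("vectors must cover the same variant sites")
    len_case = vector_length(case_counts)
    len_ctrl = vector_length(control_counts)
    dot = sum(a * b for a, b in zip(case_counts, control_counts))
    # Clamp to guard against rounding slightly outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, dot / (len_case * len_ctrl)))
    angle = math.acos(cos_angle)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(case_counts, control_counts)))
    return len_case, len_ctrl, angle, distance


if __name__ == "__main__":
    # Hypothetical rare allele counts at four variant sites
    print(geometry([5, 0, 2, 1], [1, 3, 0, 1]))
```

By the law of cosines, the squared distance equals the sum of the squared lengths minus twice their product times the cosine of the angle. This identity shows how length and angle jointly determine it.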

Mapping the Shh long-range regulatory domain

Author: Amano
Belloni
Bickmore
Chuong
Davis
Dixon
Echelard
Epstein
Hecksher-Sorensen
Jeong
Klopocki
Kokubu
Lettice
Liu
Marinić
Mates
Montavon
Nagy
Niedermaier
Osoegawa
Paek
Riddle
Ruf
Sagai
Sharpe
Shen
Smallwood
Spitz
Sun
Symmons
The ENCODE Consortium Project
Tsukiji
Publication venue: The Company of Biologists
Publication date: 01/10/2014

Coordinated gene expression controlled by long-distance enhancers is orchestrated by DNA regulatory sequences involving transcription factors and layers of control mechanisms. The Shh gene and its well-established regulators exemplify a genomic arrangement in which enhancers, residing in a large gene desert that extends into neighbouring genes, control the spatiotemporal pattern of expression. Exploiting the local hopping activity of the Sleeping Beauty transposon, we dispersed the lacZ reporter gene throughout the Shh region to systematically map the genomic features responsible for expression activity. We found that enhancer activities are retained inside a genomic region that corresponds to the topologically associating domain (TAD) defined by Hi-C. This domain of approximately 900 kb is in an open conformation over its length and is generally susceptible to all Shh enhancers. Like the distal enhancers, an enhancer residing within the second intron of Shh activates the reporter gene at distances of hundreds of kilobases, suggesting that both proximal and distal enhancers can survey the Shh topological domain to recognise potential promoters. The widely expressed Rnf32 gene, which lies within the Shh domain, evades enhancer activities by a process that may be common among other housekeeping genes residing in large regulatory domains. Finally, the boundaries of the Shh TAD do not represent absolute limits of enhancer activity, as expression activity is lost stepwise at a number of genomic positions at the edges of the domain.

Crossref

PubMed Central

Edinburgh Research Explorer

Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction

Author: Artemis Hatzigeorgiou
Axel Bernal
David Haussler
ENCODE Project Consortium
Fernando Pereira
Koby Crammer
Publication venue: Public Library of Science
Publication date: 01/03/2007

Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and they can be trained fairly efficiently. However, this type of piecewise training does not optimize prediction accuracy and has difficulty accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVMs) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
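The abstract says CRAIG is trained with an online large-margin algorithm related to multiclass SVMs. The Python sketch below illustrates what a single online step of that kind can look like. It implements a single-constraint passive-aggressive (PA-I) update on sparse feature vectors: the weights move just enough for the gold gene structure to outscore the predicted one by a margin equal to the loss, with the step size capped at c. This is a simplified stand-in, not CRAIG's training code. The feature names, loss values, and function name are hypothetical. The decoding step that produces the prediction (for example, semi-Markov Viterbi decoding) is omitted.

```python
def pa_update(weights, gold_feats, pred_feats, loss, c=1.0):
    """One PA-I step on sparse feature dicts; updates weights in place.

    gold_feats / pred_feats: feature counts of the correct and the predicted
    gene structure; loss: how wrong the prediction is (e.g. a Hamming loss
    over labels); c: aggressiveness cap. Returns the step size tau.
    """
    # Feature difference phi(gold) - phi(predicted)
    diff = dict(gold_feats)
    for f, v in pred_feats.items():
        diff[f] = diff.get(f, 0.0) - v
    margin = sum(weights.get(f, 0.0) * v for f, v in diff.items())
    sq_norm = sum(v * v for v in diff.values())
    if sq_norm == 0.0:
        return 0.0  # identical feature vectors: nothing to separate
    # Smallest step that achieves score(gold) - score(pred) >= loss, capped at c
    tau = min(c, max(0.0, loss - margin) / sq_norm)
    for f, v in diff.items():
        weights[f] = weights.get(f, 0.0) + tau * v
    return tau


if __name__ == "__main__":
    w = {}
    gold = {"exon_len=120": 1.0, "donor=GT": 1.0}   # hypothetical features
    pred = {"exon_len=90": 1.0, "donor=GT": 1.0}
    print(pa_update(w, gold, pred, loss=2.0), w)
```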

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

ScholarlyCommons@Penn

Identification and Analysis of Genes and Pseudogenes within Duplicated Regions in the Human and Mouse Genomes

Author: David Torrents
ENCODE Project Consortium
Eoghan Harrington
MGSC
Mikita Suyama
Peer Bork
Roderic Guigo
Publication venue: Public Library of Science
Publication date: 01/01/2005

The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore likely correspond to novel mammalian genes. Some of these are partial duplicates less than half the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as on the fate of duplicated genes.
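Classifying duplicates by coding-region integrity usually means looking for disablements that break the reading frame. The Python sketch below assumes the two most common disablements: a premature in-frame stop codon, and a frameshift, which is crudely approximated here as a length that is not a multiple of three. The paper's pipeline works from homology alignments and may use different criteria. The function name and rules are hypothetical simplifications.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_integrity(cds):
    """Classify a candidate coding sequence as intact or disabled.

    Hypothetical simplified rules: a length that is not a multiple of three
    is treated as a frameshift, and any stop codon before the final codon
    is treated as a premature in-frame stop. Returns (is_intact, reason).
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return False, "frameshift: length not a multiple of 3"
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # The last codon may legitimately be the terminal stop, so skip it
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            return False, f"premature stop codon at codon {index}"
    return True, "intact reading frame"


if __name__ == "__main__":
    for seq in ["ATGGCTAAATAA", "ATGTAAGCTTAA", "ATGGCTAATAA"]:
        print(seq, coding_integrity(seq))
```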

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

GIVE: portable genome browsers for personal websites.

Author: Alvin Zheng
B Sridhar
C Tyner
D Barrios
D Comer
E Lieberman-Aiden
E Sharma
F Ozsolak
F Yue
FH Biase
JD Buenrostro
JG Aw
JT Robinson
LD Stein
ME Skinner
MJ Fullwood
Qiuyang Wu
R Bayer
R Li
R Mourad
S Carrere
Sheng Zhong
TC Nguyen
The ENCODE Project Consortium
VW Zhou
WJ Kent
X Li
X Zhou
Xiaoyi Cao
Z Lu
Zhangming Yan
Publication venue: eScholarship, University of California
Publication date: 01/07/2018

The growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open-source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add interactive visualization of multiple types of genomics data to a personal webpage, including genome annotation, "linear" quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data; this code can be copied and pasted into the user's personal website or saved and shared with collaborators. GIVE is available at https://www.givengine.org/

Crossref

Directory of Open Access Journals

eScholarship - University of California