Genome Network and FANTOM3: Assessing the Complexity of the Transcriptome

Author: ENCODE Project Consortium
Frith MC, Wilming LG, Forrest A, Kawaji H, Tan SL, et al.
Liu J, Gough J, Rost B
Piero Carninci
van Nimwegen E, Paul N, Sheridan R, Zavolan M
Yoshihide Hayashizaki
Publication venue: Public Library of Science
Publication date: 01/04/2006

Crossref

Directory of Open Access Journals

PubMed Central

ENCODE whole-genome data in the UCSC Genome Browser

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
B. J. Raney
B. Rhead
Celniker
D. Haussler
D. Karolchik
G. P. Barber
K. E. Smith
K. Learned
K. R. Rosenbloom
L. R. Meyer
M. Pheasant
P. A. Fujita
R. M. Kuhn
T. R. Dreszer
T. Wang
The ENCODE Project Consortium
W. J. Kent
Weinstock
Publication venue: Oxford University Press
Publication date: 01/01/2010

The Encyclopedia of DNA Elements (ENCODE) project is an international consortium of investigators funded to analyze the human genome with the goal of producing a comprehensive catalog of functional elements. The ENCODE Data Coordination Center at The University of California, Santa Cruz (UCSC) is the primary repository for experimental results generated by ENCODE investigators. These results are captured in the UCSC Genome Bioinformatics database and download server for visualization and data mining via the UCSC Genome Browser and companion tools (Rhead et al. The UCSC Genome Browser Database: update 2010, in this issue). The ENCODE web portal at UCSC (http://encodeproject.org or http://genome.ucsc.edu/ENCODE) provides information about the ENCODE data and convenient links for access.

Crossref

PubMed Central

University of Queensland eSpace

Fine Mapping of Type 2 Diabetes Susceptibility Loci

Crossref

Differential analysis for high density tiling microarray data

Background: High-density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased and thus ensures consistent sampling across coding and non-coding regions of the genome. These arrays are increasingly used to study the related processes of transcription, transcription factor binding and chromatin structure, and the associations among them. Studies of differential expression and/or regulation provide critical insight into the mechanics of transcription and regulation that occur during the developmental program of a cell. The time-course experiment, which comprises an in vivo system and the proposed analyses, is used to determine whether annotated and unannotated portions of the genome show a coordinated differential response to the induced developmental program.

Results: We propose a novel approach, based on a piece-wise function, to analyze genome-wide differential response. This enables segmentation of the response into protein-coding and non-coding regions; for genes, the method also partitions the differential response by 5' versus 3' versus intragenic bias.

Conclusion: The algorithm, built on the framework of Significance Analysis of Microarrays (SAM), uses a generalized logic to define regions and patterns of coordinated differential change. By not adhering to the gene-centric paradigm, the method identified discordant differential expression patterns between exons and introns at an FDR below 12 percent. Co-localization of differential binding between RNA Polymerase II and tetra-acetylated histone was quantified at p < 0.003; it is most significant at the 5' end of genes (p < 10⁻¹³). The prototype R code is available as supplementary material (see Additional file 1).

Additional file 1: gsam_prototypercode.zip (1471-2105-8-359-S1.zip). File archive comprising the prototype R code for the gSAM implementation, including a readme and examples.
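As a rough illustration of the per-probe score that SAM-style methods build on, the sketch below computes the standard SAM relative difference d = (mean_B - mean_A) / (s + s0), where s is the pooled within-group standard error and s0 is a small positive constant that keeps near-zero-variance probes from dominating. It is not the gSAM piece-wise extension proposed above; the function name `sam_relative_difference` and the toy intensities are hypothetical.

```python
import math
import statistics
from typing import Sequence


def sam_relative_difference(group_a: Sequence[float], group_b: Sequence[float], s0: float) -> float:
    """SAM-style relative difference for one probe: (mean_b - mean_a) / (s + s0)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two replicates")
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    # Within-group sum of squared deviations, pooled across both conditions
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    # Pooled standard error of the difference in means
    s = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    # s0 (the "fudge factor") damps scores for probes whose variance is near zero
    return (mean_b - mean_a) / (s + s0)


if __name__ == "__main__":
    # Hypothetical log-intensities for a single probe at two time points
    before = [1.0, 2.0, 3.0]
    after = [4.0, 5.0, 6.0]
    print(round(sam_relative_difference(before, after, s0=0.0), 3))  # 3.674
    print(round(sam_relative_difference(before, after, s0=1.0), 3))
```

In SAM, the observed scores are then compared against scores obtained from permuted sample labels to estimate the FDR; that permutation step is omitted here.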

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Law of Genome Evolution Direction: Coding Information Quantity Grows

Author: A. F. A. Smit
A. G. Matera
A. Mira
B. Charlesworth
C. L. Organ
C. Nusbaum
D. A. Petrov
D. L. Des Marais
D. R. Scannell
E. Schrödinger
E. T. Dermitzakis
F. Clark
G. Bejerano
G. Liu
G. Storz
H. H. Chou
H. H. Kazazian
H. Ozkan
H. Winter
I. J. Leitch
I. Wapinski
I. Wickelgren
International Human Genome Sequencing Consortium
J. Filkowski
J.M. Aury
K. M. Devos
L. F. Luo
L. He
L. Patthy
L. R. Zhang
Liao-fu Luo
R. J. Taft
R. P. Bininda-Emonds
S. E. Peters
T. C. Stadtman
T. Kouzarides
T. R. Gregory
The ENCODE Project Consortium
W. Deng
W. Enard
W. H. Li
W. Makalowski
X. Xu
Publication venue: Springer Science and Business Media LLC
Publication date: 25/08/2008

The problem of the directionality of genome evolution is studied. Based on an analysis of the C-value paradox and the evolution of genome size, we propose that the function-coding information quantity of a genome always grows in the course of evolution through sequence duplication, expansion of code, and gene transfer from outside. The function-coding information quantity of a genome consists of two parts: p-coding information quantity, which encodes functional proteins, and n-coding information quantity, which encodes functional elements other than amino acid sequences. Evidence for this evolutionary law of function-coding information quantity is listed. The need for function is the driving force behind the expansion of coding information quantity, and this expansion is the means by which a species achieves functional innovation and extension. Thus, the increase in the coding information quantity of a genome is a measure of newly acquired function, and it determines the directionality of genome evolution.

Comment: 16 pages

arXiv.org e-Print Archive

Crossref

The Diploid Genome Sequence of an Individual Human

Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome with the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels; 1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
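The event counts in this abstract can be cross-checked with simple arithmetic. The sketch below sums only the variant classes that the abstract enumerates numerically (segmental duplications and copy number variation regions are not counted there, so they are left out) and reproduces both the "more than 4.1 million" total and the 22% non-SNP share of events. The 74% share of variant bases cannot be checked this way, because per-class base counts are not given.

```python
# Variant-class event counts as listed in the abstract
counts = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

total_events = sum(counts.values())
non_snp_events = total_events - counts["SNPs"]
non_snp_share = non_snp_events / total_events

print(f"total events:   {total_events:,}")    # 4,118,889
print(f"non-SNP events: {non_snp_events:,}")  # 905,488
print(f"non-SNP share:  {non_snp_share:.1%}")  # 22.0%
```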

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Diposit Digital de la Universitat de Barcelona

ScholarBank@NUS

The UCSC Genome Browser database: update 2010

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
Austin
B. Giardine
B. J. Raney
B. Rhead
Berman
Blanchette
D. Haussler
D. Karolchik
F. Hsu
Feuk
G. P. Barber
H. Clawson
Hsu
Iafrate
J. Hillman-Jackson
Jain
K. E. Smith
K. Learned
K. R. Rosenbloom
Kaiser
Karolchik
Kent
L. R. Meyer
M. Diekhans
M. Pheasant
Nord
P. A. Fujita
Pettersen
R. A. Harte
R. M. Kuhn
Sherry
T. R. Dreszer
The ENCODE Project Consortium
The MGC Project Team
W. J. Kent
Publication venue: Oxford University Press
Publication date: 01/01/2010

The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.
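The organism breakdown in this abstract is internally consistent: a quick tally of the listed groups, taking "a yeast" as one organism, reproduces the stated total of 47.

```python
# Organisms with Genome Browser assemblies as of September 2009, by group (from the abstract)
organisms_by_group = {
    "mammals": 14,
    "non-mammal vertebrates": 10,
    "invertebrate deuterostomes": 3,
    "insects": 13,
    "worms": 6,
    "yeast": 1,
}

total_organisms = sum(organisms_by_group.values())
print(total_organisms)  # 47
```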

CiteSeerX

Crossref

PubMed Central

University of Queensland eSpace

The UCSC Genome Browser Database: update 2009

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
B. Giardine
B. J. Raney
B. Rhead
Bellen
Blanchette
D. Haussler
D. Karolchik
F. Hsu
G. P. Barber
H. Clawson
Hinrichs
Hsu
Iafrate
K. E. Smith
K. R. Rosenbloom
Karolchik
Kent
L. Meyer
M. Diekhans
M. Pheasant
Mattes
Nord
P. Fujita
R. A. Harte
R. M. Kuhn
Sherry
T. Dreszer
T. Wang
The ENCODE Project Consortium
The MGC Project Team
W. J. Kent
Yang
Zhu
Publication venue: Oxford University Press

The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs.

CiteSeerX

Crossref

PubMed Central

Tracking and coordinating an international curation effort for the CCDS Project

Author: A. Frankish
B. Aken
Bab
Baertsch
Brogna
Buhler
C. M. Farrell
C. Wallin
Church
Crowe
D. Barrell
Eberle
Green
Hwang
J. E. Loveland
J. Harrow
Jackson
K. D. Pruitt
Kim
Kozak
L. Wilming
Lee
Luukkonen
M. Diekhans
M.-M. Suner
Morris
Natsoulis
Nicholson
Parla
Prakash
R. A. Harte
S. Searle
Silva
Simeone
The ENCODE Project Consortium
Udby
Wethmar
Wu
Publication venue: Oxford University Press
Publication date: 12/02/2013

The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a ‘gold standard’ definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID) and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent in discussing proposed annotation updates. Here, we present the current status of the CCDS database and details on our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines.
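The core matching step (finding CDS annotations that are identical across independent whole-genome annotation builds and giving each match a tracking identifier) can be sketched as follows. This is a hypothetical simplification, not the CCDS pipeline: it assumes each annotation reduces to chromosome, strand and a set of CDS exon coordinates, treats exact coordinate identity as the match criterion, and issues made-up IDs with the prefix `HYPO`. The class and function names are illustrative, and the quality assurance and manual curation stages are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Exon = Tuple[int, int]  # (start, end) of one CDS exon
CdsKey = Tuple[str, str, Tuple[Exon, ...]]


@dataclass(frozen=True)
class CdsAnnotation:
    build: str  # which annotation build the record came from (hypothetical labels)
    transcript_id: str
    chrom: str
    strand: str
    cds_exons: Tuple[Exon, ...]

    def key(self) -> CdsKey:
        # Identity is defined purely by CDS structure, not by transcript name
        return (self.chrom, self.strand, tuple(sorted(self.cds_exons)))


def assign_tracking_ids(build_a: Iterable[CdsAnnotation],
                        build_b: Iterable[CdsAnnotation],
                        prefix: str = "HYPO") -> Dict[str, List[str]]:
    """Give one tracking ID to each CDS structure present in both builds."""
    a_by_key: Dict[CdsKey, List[CdsAnnotation]] = {}
    for ann in build_a:
        a_by_key.setdefault(ann.key(), []).append(ann)

    b_by_key: Dict[CdsKey, List[CdsAnnotation]] = {}
    for ann in build_b:
        b_by_key.setdefault(ann.key(), []).append(ann)

    tracked: Dict[str, List[str]] = {}
    # Sorting the shared keys makes ID assignment deterministic
    for n, key in enumerate(sorted(a_by_key.keys() & b_by_key.keys()), start=1):
        members = a_by_key[key] + b_by_key[key]
        tracked[f"{prefix}{n}"] = [f"{m.build}:{m.transcript_id}" for m in members]
    return tracked
```

A CDS whose exon boundaries differ between builds, even by a few bases, gets no tracking ID here; in the real project such cases go to curators rather than being silently dropped.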

CiteSeerX

Crossref

PubMed Central

DGW: an exploratory data analysis tool for clustering and visualisation of epigenomic marks

Author: A Barski
A Kundaje
BW Matthews
C Taslim
D Benveniste
ENCODE Project Consortium
G Jurman
G Schweikert
Gabriele B. Schweikert
GJ Filion
Guido Sanguinetti
H Sakoe
M Müller
MB Eisen
NI Bieberstein
Roberto Visintainer
Saulius Lukauskas
TA Knijnenburg
TS Furey
Y Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods such as ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extending over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent.

Results: We present DGW, an open source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project.

Conclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open source Python package.
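DGW's central idea, aligning genomic profiles with Dynamic Time Warping so that similarly shaped signals match even when stretched or compressed along the genome, can be illustrated with the textbook DTW recurrence. The sketch below is not DGW's code: it aligns two one-dimensional profiles (DGW handles several marks at once), uses absolute difference as the local cost, and the function name `dtw` and the toy profiles are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw(x: Sequence[float], y: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW distance between profiles x and y and the optimal warping path."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("profiles must be non-empty")

    # cost[i][j] = cheapest alignment of x[:i] with y[:j]; row/column 0 are padding
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend from a diagonal match, or stretch one of the two profiles
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Trace the optimal path back from the ends of both profiles
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Hypothetical signal profiles: the second is a stretched copy of the first peak
    peak = [0, 1, 2, 1, 0]
    stretched = [0, 0, 1, 2, 1, 0]
    distance, path = dtw(peak, stretched)
    print(distance)  # 0.0
    print(path)
```

The stretched copy aligns at zero cost even though the two profiles differ in length, which is the property that lets shape-based clustering group regions whose marks are spread over different genomic distances. The full cost table takes O(nm) time and memory.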

Crossref

Springer - Publisher Connector

Archivio della ricerca - Fondazione Bruno Kessler

PubMed Central

Edinburgh Research Explorer

University of Dundee Online Publications