A User's Guide to the Encyclopedia of DNA Elements (ENCODE)
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
National Human Genome Research Institute (U.S.); National Institutes of Health (U.S.)
Evolution of Exon-Intron Structure and Alternative Splicing
Despite significant advances in high-throughput DNA sequencing, many important species remain understudied at the genome level. In this study we addressed the question of what can be predicted about the genome-wide characteristics of less-studied species from the genomic data of completely sequenced species. Using NCBI databases, we performed a comparative genome-wide analysis of characteristics such as alternative splicing and the numbers of genes, gene products and exons in 36 completely sequenced model species. We created statistical regression models to fit these data and applied them to loblolly pine (Pinus taeda L.), an important species whose genome has not yet been completely sequenced. Using these models, genome-wide characteristics such as the total numbers of genes and exons can be roughly predicted from parameters estimated from the limited genomic data available, e.g. exon length and the exon/gene ratio.
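The abstract does not state the form of the regression models. The following is a minimal sketch that assumes a simple linear model with one predictor, such as mean exon length, which is one of the parameters the abstract names. The model is fitted by ordinary least squares. The sketch also uses the definition of the exon/gene ratio (exons per gene) to turn a predicted gene count into a predicted exon count. The helper names `fit_line`, `predict` and `exons_from_genes` are hypothetical, and the numbers in the usage example are synthetic, not species data.

```python
# Hypothetical helpers: a one-predictor least-squares fit, plus the identity
# exons = genes * (exons per gene). Not the study's actual models.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) model at x."""
    intercept, slope = model
    return intercept + slope * x


def exons_from_genes(n_genes, exon_gene_ratio):
    """Total exons implied by a gene count and a mean exons-per-gene ratio."""
    return n_genes * exon_gene_ratio
```

For example, fitting the synthetic points (0, 1), (1, 3), (2, 5), (3, 7) recovers intercept 1 and slope 2. A predicted 1,000 genes with an exon/gene ratio of 5 implies 5,000 exons.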
Comparative Analysis of Human Protein-Coding and Noncoding RNAs between Brain and 10 Mixed Cell Lines by RNA-Seq
In the course of expression, different genes can generate diverse functional products, including various protein-coding and noncoding RNAs. Here, using two transcriptome sequencing datasets, one from human brain tissues and one from 10 mixed cell lines, we investigated the protein-coding capacity of known human genes and the expression levels of their isoforms, as well as the conservation and disease association of long noncoding RNAs (ncRNAs). Comparative analysis revealed that about two-thirds of the genes expressed in brain and in the cell lines are the same, but fewer than one-third of their isoforms are identical. Apart from the genes expressed specifically in brain or in the cell lines, about 66% of the genes expressed in both samples encoded different isoforms. Moreover, most genes predominantly expressed a single isoform, and some genes generated only protein-coding (or only noncoding) RNAs in one sample but not in the other. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of these were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were significantly correlated with certain breast cancer- or lung cancer-related genes, indicating the important biological relevance of long ncRNAs to human cancers. Our findings reveal that differences in gene expression profiles between samples result mainly from the expressed gene isoforms, and they highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.
A Family of Tree-Based Generators for Bubbles in Directed Graphs
Bubbles are pairs of internally vertex-disjoint (s, t)-paths in a directed graph. In de Bruijn graphs built from RNA and DNA sequencing reads, bubbles represent interesting biological events, such as alternative splicing (AS) and allelic differences (SNPs and indels). However, the set of all bubbles in a de Bruijn graph built from real data is usually too large to be enumerated and analysed efficiently in practice. In particular, despite significant research in this area, listing bubbles remains the main bottleneck for tools that detect AS events in a reference-free context. The concept of a bubble generator was recently introduced in [1] as a way of obtaining a compact representation of the bubble space of a graph. Although this generator was quite effective in finding AS events, preliminary experiments showed that it is about 5 times slower than state-of-the-art methods. In this paper we propose a new family of bubble generators that improve substantially on the previous generator: generators in this new family are about two orders of magnitude faster and still achieve similar precision in identifying AS events. To highlight the practical value of our new generators, we also report experimental results on a real dataset.
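The bubble definition above can be checked directly. This sketch tests whether two paths form a bubble: they must be distinct, share both endpoints s and t, and have no internal vertex in common. It also lists all bubbles between a given s and t by brute force. That brute-force listing is the exhaustive approach the abstract describes as too costly on real de Bruijn graphs, not the paper's generator method. The function names and the adjacency-dict graph representation are assumptions made for illustration.

```python
from itertools import combinations


def is_bubble(p, q):
    """True if p and q are distinct (s, t)-paths (same first and last vertex)
    that are internally vertex-disjoint (share no vertex other than s and t)."""
    if len(p) < 2 or len(q) < 2:
        return False
    if p[0] != q[0] or p[-1] != q[-1] or p == q:
        return False
    return not (set(p[1:-1]) & set(q[1:-1]))


def simple_paths(graph, node, t, path=None):
    """Yield every simple directed path from node to t.
    graph maps each vertex to a list of successors. Exponential: toy graphs only."""
    path = [node] if path is None else path
    if node == t:
        yield list(path)
        return
    for nxt in graph.get(node, ()):
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(graph, nxt, t, path)
            path.pop()


def bubbles(graph, s, t):
    """All pairs of (s, t)-paths that form a bubble (brute force)."""
    paths = list(simple_paths(graph, s, t))
    return [(p, q) for p, q in combinations(paths, 2) if is_bubble(p, q)]
```

Take the graph `{"s": ["a", "b"], "a": ["t", "b"], "b": ["t"]}`, which has three (s, t)-paths. Only `["s", "a", "t"]` and `["s", "b", "t"]` form a bubble, because the other pairs share vertex a or b. The number of path pairs grows quickly even in small graphs, which is why a compact generator is attractive.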
Genetic effects on gene expression across human tissues
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
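The abstract does not describe how genetic effects on expression were tested. As a generic illustration only, the simplest form of such a test regresses a gene's expression on the allele dosage of a variant (0, 1 or 2 copies of an allele) across individuals. A non-zero slope then suggests a genetic effect. The function name and inputs below are hypothetical, and this is not the GTEx pipeline. Real analyses add covariates and multiple-testing control, which this sketch omits.

```python
import math


def dosage_association(dosages, expression):
    """Hypothetical helper: least-squares slope of expression on allele dosage
    (0, 1 or 2 copies) and the Pearson correlation between the two."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sgg = sum((g - mean_g) ** 2 for g in dosages)
    see = sum((e - mean_e) ** 2 for e in expression)
    sge = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sgg == 0 or see == 0:
        raise ValueError("genotype or expression has zero variance")
    slope = sge / sgg                   # expression change per extra allele copy
    r = sge / math.sqrt(sgg * see)      # strength of the linear association
    return slope, r
```

Consider synthetic expression values that rise by exactly one unit per allele copy, such as dosages `[0, 0, 1, 1, 2, 2]` with expression `[1, 1, 2, 2, 3, 3]`. These give a slope of 1 and a correlation of 1.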
Rapid threshold estimation using the chained-stimuli technique for auditory brain stem response measurement.
The chained-stimuli technique for rapid auditory brain stem response (ABR) threshold estimation involves lengthening the averaging time window and presenting a series (chain) of click stimuli. In addition to a silent interval, each stimulus chain contains click stimuli of 10, 20, 30, 40, 50, 60, and 70 dB nHL separated by 10 msec intervals. With this method, the single averaged response to the chained stimulus contains up to seven individual ABRs, and the responses elicited by each click level within the chain can be analyzed separately. In this study, chained-stimuli ABR threshold estimates for normal hearers were essentially equivalent to those obtained with an automated conventional ABR method. The data for a seven-point latency-intensity function were obtained with the chained-stimuli technique in a mean time of only 8 min per ear.
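The abstract says that one averaged sweep holds up to seven ABRs that can be analyzed separately. This sketch shows the timing arithmetic behind that. The click levels and the 10 msec spacing come from the abstract. The following are assumptions made for illustration:

* the first click is at t = 0;
* the silent interval is not modelled, since its position is not stated;
* each level is cut out with a window equal to the click spacing, so segments do not overlap;
* the sampling rate is set by the caller.

The function names are hypothetical.

```python
# Click levels (dB nHL) and 10 ms spacing as given in the abstract.
# Assumed: first click at t = 0, silent interval not modelled,
# analysis window per level equal to the spacing.
LEVELS_DB_NHL = (10, 20, 30, 40, 50, 60, 70)
SPACING_MS = 10.0


def click_onsets_ms(levels=LEVELS_DB_NHL, spacing_ms=SPACING_MS):
    """Onset time (ms from the first click) of each click in one chain."""
    return {level: i * spacing_ms for i, level in enumerate(levels)}


def split_chain_response(averaged, fs_hz, levels=LEVELS_DB_NHL,
                         spacing_ms=SPACING_MS, window_ms=None):
    """Cut one averaged chained-stimulus sweep into one segment per click level.

    averaged: sequence of samples, with sample 0 at the first click onset.
    fs_hz: sampling rate in Hz.
    """
    window_ms = spacing_ms if window_ms is None else window_ms
    window_samples = round(window_ms * fs_hz / 1000)
    segments = {}
    for level, onset_ms in click_onsets_ms(levels, spacing_ms).items():
        start = round(onset_ms * fs_hz / 1000)
        stop = start + window_samples
        if stop > len(averaged):
            raise ValueError("averaged sweep is shorter than the click chain")
        segments[level] = averaged[start:stop]
    return segments
```

At a 10 kHz sampling rate, each 10 ms window is 100 samples. A 700-sample sweep therefore splits into seven segments, and the 70 dB nHL segment begins at sample 600.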
- …