Long Noncoding RNAs are Rarely Translated in Two Human Cell Lines
Data from the Encyclopedia of DNA Elements (ENCODE) project show over 9640 human genome loci classified as long noncoding RNAs (lncRNAs), yet only ~100 have been deeply characterized to determine their role in the cell. To measure the protein-coding output from these RNAs, we jointly analyzed two recent data sets produced in the ENCODE project: tandem mass spectrometry (MS/MS) data mapping expressed peptides to their encoding genomic loci, and RNA-seq data generated by ENCODE in long polyA+ and polyA– fractions in the cell lines K562 and GM12878. We used the machine-learning algorithm RuleFit3 to regress the peptide data against RNA expression data. The most important covariate for predicting translation was, surprisingly, the Cytosol polyA– fraction in both cell lines. LncRNAs are ~13-fold less likely to produce detectable peptides than similar mRNAs, indicating that ~92% of GENCODE v7 lncRNAs are not translated in these two ENCODE cell lines. Intersecting 9640 lncRNA loci with 79,333 peptides yielded 85 unique peptides matching 69 lncRNAs. Most cases were due to a coding transcript misannotated as lncRNA. Two exceptions were an unprocessed pseudogene and a bona fide lncRNA gene, both with open reading frames (ORFs) compromised by upstream stop codons. All potentially translatable lncRNA ORFs had only a single peptide match, indicating low protein abundance and/or false-positive peptide matches. We conclude that with very few exceptions, ribosomes are able to distinguish coding from noncoding transcripts and, hence, that ectopic translation and cryptic mRNAs are rare in the human lncRNAome.
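The two headline figures in this abstract can be related with simple arithmetic. The sketch below is only a back-of-the-envelope check, assuming that "~13-fold less likely" means lncRNAs yield detectable peptides at 1/13 the rate of comparable mRNAs; the abstract does not state how the ~92% figure was derived, and the function name is ours.

```python
# Back-of-the-envelope check of figures quoted in the abstract.
# Assumption (ours, not the authors'): "13-fold less likely" is read as
# lncRNAs producing detectable peptides at 1/13 the rate of similar mRNAs.

def untranslated_fraction(fold_reduction: float) -> float:
    """Fraction left untranslated if translation is `fold_reduction` times rarer."""
    return 1.0 - 1.0 / fold_reduction

frac = untranslated_fraction(13)  # 12/13
hit_rate = 69 / 9640              # lncRNA loci with at least one matching peptide

print(f"untranslated fraction: {frac:.1%}")
print(f"lncRNA loci with a peptide match: {hit_rate:.2%}")
```

Under this reading, 12/13 is about 92%, consistent with the quoted figure, while the raw peptide intersection touches well under 1% of the 9640 loci.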
Identification of Novel Susceptibility Loci and Genes for Prostate Cancer Risk: A Transcriptome-Wide Association Study in over 140,000 European Descendants
Genome-wide association study–identified prostate cancer risk variants explain only a relatively small fraction of its familial relative risk, and the genes responsible for many of these identified associations remain unknown. To discover novel prostate cancer genetic loci and possible causal genes at previously identified risk loci, we performed a transcriptome-wide association study in 79,194 cases and 61,112 controls of European ancestry. Using data from the Genotype-Tissue Expression Project, we established genetic models to predict gene expression across the transcriptome for both prostate models and cross-tissue models and evaluated model performance using two independent datasets. We identified significant associations for 137 genes at P < 2.61 × 10−6, a Bonferroni-corrected threshold, including nine genes that remained significant at P < 2.61 × 10−6 after adjusting for all known prostate cancer risk variants in nearby regions. Of the 128 remaining associated genes, 94 have not yet been reported as potential target genes at known loci. We silenced 14 genes and many showed a consistent effect on viability and colony-forming efficiency in three cell lines. Our study provides substantial new information to advance our understanding of prostate cancer genetics and biology.
SIGNIFICANCE: This study identifies novel prostate cancer genetic loci and possible causal genes, advancing our understanding of the molecular mechanisms that drive prostate cancer
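For readers unfamiliar with the correction, a Bonferroni threshold is the family-wise significance level divided by the number of tests. The sketch below assumes a family-wise level of 0.05, which the abstract does not state, and back-calculates the number of gene-level tests that the quoted threshold of 2.61 × 10−6 would imply; the function names are illustrative.

```python
# Bonferroni correction: per-test threshold = alpha / number of tests.
# Assumption: family-wise alpha = 0.05 (not stated in the abstract).

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def implied_tests(alpha: float, threshold: float) -> float:
    """Number of tests that would produce `threshold` at family-wise `alpha`."""
    return alpha / threshold

n_tests = round(implied_tests(0.05, 2.61e-6))
print(n_tests)
print(f"{bonferroni_threshold(0.05, n_tests):.3g}")
```

If the assumption holds, the threshold corresponds to roughly 19,000 tested genes, a plausible scale for a transcriptome-wide analysis.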
Exploring the Plasmodium falciparum Transcriptome Using Hypergeometric Analysis of Time Series (HATS)
Malaria poses a significant public health and economic threat in many regions of the world, disproportionately affecting children in sub-Saharan Africa under the age of five. Though success has been celebrated in lowering infection rates, it remains a serious challenge, causing at least 200 million infections and 655,000 deaths per year, with deleterious effects on economic growth and development. Investigation of the malaria parasite Plasmodium falciparum has entered the post-genomics age, with several strains sequenced and many microarray gene expression studies performed. Gene expression studies allow a full sampling of the genomic repertoire of a parasite, and their detailed analysis will prove invaluable in deciphering novel parasite biology as well as the modes of action of antimalarial drug resistance.
We have developed a computational pipeline that converts a series of fluorescence readings from a DNA microarray into a meaningful set of biological hypotheses based on the comparison of two lines, generally one that is drug sensitive and one that is drug resistant. Each step of the computational pipeline is described in detail in this thesis, beginning with data normalization and alignment, followed by visualization through dimensionality reduction, and finally a direct analysis of the differences and similarities between the two lines. Comparisons and analyses were performed at both the individual gene and gene set level. An important component of the analytical methods we have developed is a suite of visualization tools that help to easily identify outliers and experimental flaws, measure the significance of predictions, show how lines relate and how well they can be aligned, and demonstrate the results of an analysis.
These visualization tools should be used as a starting point for further biological study to test the resulting hypotheses. We also developed a software tool, Gene Attribute and Set Enrichment Ranking (GASER), which combines a wealth of genomic data from the TDR Targets web site along with expression data from a variety of sources, and allows researchers to create sophisticated weighted queries to uncover potential drug targets. Queries in our system can be updated in real time, along with their accompanying gene and gene set lists. We analyzed all possible pair-wise combinations of 11 parasite lines to create baseline distributions for gene and gene set enrichment. Using the baseline as a comparison, we identified and discarded spurious results and recognized stochastic genes and gene sets.
We analyzed three major sets of parasite lines: those involving manipulation of the multidrug resistance-1 (PfMDR1) transporter, a key resistance determinant; those involving manipulation of the P. falciparum chloroquine resistance transporter (PfCRT), another important resistance determinant; and finally a set of parasites that had varying sensitivity to artemisinins. This analysis resulted in a rich library of high scoring genes that may merit further exploration as potential modes of action of resistance. More specifically, we found that manipulation of pfcrt expression resulted in an up-regulation of tRNA synthetases, which might serve to increase protein production in response to reduced amino acid availability from degraded hemoglobin. We observed that a copy number increase in pfmdr1 resulted in increases in glycerophospholipid metabolism and up-regulation of a number of ABC transporters. Finally, when comparing artemisinin sensitive to artemisinin tolerant lines, we found an increased abundance of redox metabolites and the transcripts involved in redox regulation, and significant reduction in transcription and altered expression of transcripts encoding for core histone proteins. These alterations could help confer an increased tolerance to drug induced redox perturbation by lowering endogenous redox stress.
We also offer a robust computational tool, Hypergeometric Analysis of Time Series (HATS), to handle challenging biological questions related to comparison of time series experiments. Our pipeline provides a rigorous method for aligning expression experiments and then determining which genes and gene sets differ most between them. The changes in gene expression level between drug-sensitive and drug-resistant lines offer important clues in our quest for understanding mechanisms of resistance and identifying new drug targets. Our pipeline allows for comparison of future lines with our base set and holds potential for other organisms, especially those similar to Plasmodium with a strong time-dependent component. The full Excel files of all the analyses performed in this thesis can be found at (http://www.fidock.org/dan).
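The abstract does not describe how HATS scores gene sets, but its name points to the hypergeometric distribution, which is commonly used for over-representation tests. The sketch below is a generic version of such a test and not the thesis's implementation; the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k: int, population: int, set_size: int, draws: int) -> float:
    """P(X >= k): chance of seeing at least k gene-set members among `draws`
    genes sampled without replacement from `population` genes, of which
    `set_size` belong to the gene set."""
    upper = min(set_size, draws)
    total = comb(population, draws)
    hits = sum(
        comb(set_size, i) * comb(population - set_size, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

# Hypothetical example: 10 genes in total, a gene set of 4, and 3 genes
# flagged as differing between two lines, 2 of which fall in the set.
p = hypergeom_enrichment_p(k=2, population=10, set_size=4, draws=3)
print(f"enrichment p-value: {p:.4f}")
```

A small p-value would suggest that the gene set is over-represented among the flagged genes. A time-series method would additionally need the alignment step the abstract mentions, which this sketch does not attempt.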
The long noncoding RNA mimi scaffolds neuronal granules to maintain nervous system maturity
RNA binding proteins and messenger RNAs (mRNAs) assemble into ribonucleoprotein granules that regulate mRNA trafficking, local translation, and turnover. The dysregulation of RNA-protein condensation disturbs synaptic plasticity and neuron survival and has been widely associated with human neurological disease. Neuronal granules are thought to condense around particular proteins that dictate the identity and composition of each granule type. Here, we show in Drosophila that a previously uncharacterized long noncoding RNA, mimi, is required to scaffold large neuronal granules in the adult nervous system. Neuronal ELAV-like proteins directly bind mimi and mediate granule assembly, while Staufen maintains condensate integrity. mimi granules contain mRNAs and proteins involved in synaptic processes; granule loss in mimi mutant flies impairs nervous system maturity and neuropeptide-mediated signaling and causes phenotypes of neurodegeneration. Our work reports an architectural RNA for a neuronal granule and provides a handle to interrogate functions of a condensate independently of those of its constituent proteins.
Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs
MicroRNAs (miRNAs) are small ∼22-nt RNAs that are important regulators of posttranscriptional gene expression. Since their initial discovery, they have been shown to be involved in many cellular processes, and their misexpression is associated with disease etiology. Currently, nearly 2,800 human miRNAs are annotated in public repositories. A key question in miRNA research is how many miRNAs are harbored by the human genome. To answer this question, we examined 1,323 short RNA sequence samples and identified 3,707 novel miRNAs, many of which are human-specific and tissue-specific. Our findings suggest that the human genome expresses a greater number of miRNAs than has previously been appreciated and that many more miRNA molecules may play key roles in disease etiology
Multi-omics phenotyping of the gut-liver axis reveals metabolic perturbations from a low-dose pesticide mixture in rats
Health effects of pesticides are not always accurately detected using the current battery of regulatory toxicity tests. We compared standard histopathology and serum biochemistry measures and multi-omics analyses in a subchronic toxicity test of a mixture of six pesticides frequently detected in foodstuffs (azoxystrobin, boscalid, chlorpyrifos, glyphosate, imidacloprid and thiabendazole) in Sprague-Dawley rats. Analysis of water and feed consumption, body weight, histopathology and serum biochemistry showed little effect. Contrastingly, serum and caecum metabolomics revealed that nicotinamide and tryptophan metabolism were affected, which suggested activation of an oxidative stress response. This was not reflected by gut microbial community composition changes evaluated by shotgun metagenomics. Transcriptomics of the liver showed that 257 genes had their expression changed. Gene functions affected included the regulation of response to steroid hormones and the activation of stress response pathways. Genome-wide DNA methylation analysis of the same liver samples showed that 4,255 CpG sites were differentially methylated. Overall, we demonstrated that in-depth molecular profiling in laboratory animals exposed to low concentrations of pesticides allows the detection of metabolic perturbations that would remain undetected by standard regulatory biochemical measures and which could thus improve the predictability of health risks from exposure to chemical pollutants
The Genetics of Primary Open-Angle Glaucoma: A Complex Human Disease
Glaucoma is a chronic ocular neuropathy and a leading cause of blindness worldwide. Primary open-angle glaucoma (POAG) is the most common subtype, with an estimated 2 million affected individuals in the United States. POAG is a heritable complex trait. Understanding the genetics of POAG may increase our ability to predict disease onset and help elucidate the underlying biological mechanisms responsible for the development of the disease. With this overall goal, three different approaches are presented here.
First, the genetics of an important POAG-associated trait, central corneal thickness (CCT), was investigated using genome-wide single nucleotide polymorphism (SNP) data available from the NEIGHBOR and GLAUGEN consortia to identify novel POAG candidate genes. Twenty previously published CCT-associated SNPs were tested for association with both CCT (N = 1,117) and POAG (N = 6,470). While several of these variants were significantly associated with CCT in our dataset (top SNP = rs12447690, near ZNF469; beta = -5.08 µm/allele; p = 0.001), none were associated with POAG. A CCT genome-wide association study was conducted. Using a p-value threshold of 1 × 10−4, 50 candidate SNPs were tested for association with POAG. One SNP, rs7481514, within the NTM gene was significantly associated with POAG in a low tension subset of cases (odds ratio (OR) = 1.28; p = 0.001). Additionally, SNPs in the CNTNAP4 gene showed suggestive evidence of association with POAG (top SNP = rs1428758; OR = 0.84; p = 0.018). A gene expression analysis showed evidence of NTM and CNTNAP4 gene expression in relevant ocular tissues. This study suggests previously reported CCT loci do not increase POAG susceptibility. However, by using a two-step gene mapping approach, the cell adhesion molecules NTM and CNTNAP4 were identified as potential POAG candidate genes in a subset of cases.
The second study aimed to identify functional alleles within a POAG candidate gene. Previous association studies identified a significant association between POAG and the SIX6 locus (top SNP = rs10483727, OR = 1.32, p = 3.87 × 10−11). SIX6 plays a role in ocular development and has been associated with the morphology of the optic nerve. Sequencing of the SIX6 coding and regulatory regions in 262 POAG cases and 256 controls identified six nonsynonymous coding variants. Of these six, five were rare (minor allele frequency (MAF) < 0.002) and one, Asn141His (rs33912345), was a common variant that showed a significant association with POAG (OR = 1.27, p = 4.2 × 10−10) in the NEIGHBOR/GLAUGEN dataset. These variants were tested in an in vivo zebrafish complementation assay to evaluate ocular metrics. Five of the six alleles had a functional effect on the protein. These five variants, found primarily in POAG cases, were hypomorphic or null, while a sixth variant, found only in controls, was benign. One variant in the SIX6 enhancer increased expression of the SIX6 gene and disrupted its regulation. Using optical coherence tomography, the retinal thickness of POAG patients with and without the common SIX6 risk allele, Asn141His (rs33912345), was measured. Patients who are homozygous for the SIX6 risk allele (His141) have a significantly thinner retinal nerve fiber layer than patients homozygous for the SIX6 non-risk allele (Asn141). These results, in combination with previous SIX6 work, lead us to hypothesize that SIX6 risk variants disrupt the development of the neural retina and result in a reduced number of retinal ganglion cells generated during development, thereby increasing the risk of glaucoma-associated vision loss later in life.
Next, the transcriptional landscape of three POAG-related tissues (the trabecular meshwork, the cornea, and the ciliary body) was evaluated using an RNA sequencing (RNA-seq) approach. Tissues were selected from two fetal and four adult human donor samples with no known history of ocular disease. Deep RNA-seq was performed, and the total number of paired reads per sample ranged from 32,137,380 to 59,784,117. A descriptive analysis was conducted and included the identification of the most highly expressed genes in each tissue and the distribution of gene expression values. Additionally, gene expression of selected POAG candidate genes (CDKN2B, CDKN2A, CDKN2B-AS, SIX6, SRBD1, ATOH, CAV1, CAV2, ELOVL5, and TMCO1) was evaluated. Most of these genes showed high expression values in the trabecular meshwork and cornea. ATOH was found to be expressed only in the fetal trabecular meshwork and, interestingly, SIX6 was shown to be highly expressed in the adult and fetal ciliary body. CDKN2B-AS was not found to be expressed in any of the tissues evaluated. Finally, the RNA-seq data were used to identify potential novel isoforms of these candidate genes. Using a stringent threshold, five novel isoforms were identified in CDKN2B, SRBD1, and SIX6. The data generated as part of this study can be used to develop novel hypotheses and guide future work, and are broadly applicable for ocular research because the tissues included in this analysis are essential for normal vision and play important roles in ocular diseases.
In this dissertation, three different approaches (assessment of a quantitative risk factor, candidate gene functional analysis, and the assessment of the transcriptional landscape of relevant ocular tissues) were used to study the common blinding disorder, primary open-angle glaucoma. Continued research in this field is essential. There is a need for increased functional follow-up of genetic association studies in order to identify true causal susceptibility genes and improve our understanding of POAG biology. Additionally, researchers should focus on building and implementing accurate prediction models to increase POAG diagnosis rates and preemptive treatment.
Dissertatio
- …