62 research outputs found
In-Depth Transcriptome Analysis Reveals Novel TARs and Prevalent Antisense Transcription in Human Cell Lines
Several recent studies have indicated that transcription is pervasive in regions outside of protein coding genes and that short antisense transcripts can originate from the promoter and terminator regions of genes. Here we investigate transcription of fragments longer than 200 nucleotides, focusing on antisense transcription for known protein coding genes and intergenic transcription. We find that roughly 12% to 16% of all reads that originate from promoter and terminator regions, respectively, map antisense to the gene in question. Furthermore, we detect a high number of novel transcriptionally active regions (TARs) that are generally expressed at a lower level than protein coding genes. We find that the correlation between RNA-seq data and microarray data is dependent on the gene length, with longer genes showing a better correlation. We detect high antisense transcriptional activity from promoter, terminator and intron regions of protein-coding genes and identify a vast number of previously unidentified TARs, including putative novel EGFR transcripts. This shows that in-depth analysis of the transcriptome using RNA-seq is a valuable tool for understanding complex transcriptional events. Furthermore, the development of new algorithms for estimation of gene expression from RNA-seq data is necessary to minimize length bias
Cone-setting in spruce is regulated by conserved elements of the age-dependent flowering pathway
Reproductive phase change is well characterized in angiosperm model species, but less studied in gymnosperms. We utilize the early cone-setting acrocona mutant to study reproductive phase change in the conifer Picea abies (Norway spruce), a gymnosperm. The acrocona mutant frequently initiates cone-like structures, called transition shoots, in positions where wild-type P. abies always produces vegetative shoots. We collect acrocona and wild-type samples, and RNA-sequence their messenger RNA (mRNA) and microRNA (miRNA) fractions. We establish gene expression patterns and then use allele-specific transcript assembly to identify mutations in acrocona. We genotype a segregating population of inbred acrocona trees. A member of the SQUAMOSA BINDING PROTEIN-LIKE (SPL) gene family, PaSPL1, is active in reproductive meristems, whereas two putative negative regulators of PaSPL1, miRNA156 and the conifer specific miRNA529, are upregulated in vegetative and transition shoot meristems. We identify a mutation in a putative miRNA156/529 binding site of the acrocona PaSPL1 allele and show that the mutation renders the acrocona allele tolerant to these miRNAs. We show co-segregation between the early cone-setting phenotype and trees homozygous for the acrocona mutation. In conclusion, we demonstrate evolutionary conservation of the age-dependent flowering pathway and involvement of this pathway in regulating reproductive phase change in the conifer P. abies
Genome-wide identification of physically clustered genes suggests chromatin-level co-regulation in male reproductive development in Arabidopsis thaliana
Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation.
Analysis of transcript and protein overlap in a human osteosarcoma cell line
Background: Comparing the overlap between the transcriptome and the proteome is an active field of research in genomics and proteomics. Recently, the tools to analyse gene and protein expression on a whole-genome scale have improved, including the availability of new-generation sequencing instruments and high-throughput antibody-based methods to analyze the presence and localization of proteins. In this study, we used massive transcriptome sequencing (RNA-seq) to investigate the transcriptome of a human osteosarcoma cell line and compared the expression levels with in situ protein data obtained from antibody-based immunohistochemistry (IHC) and immunofluorescence microscopy (IF). Results: A large-scale analysis based on 2749 genes was performed, corresponding to approximately 13% of the protein-coding genes in the human genome. We found both RNA and protein present for a large fraction of the analyzed genes, with 60% of the analyzed human genes detected by all three methods. Only 34 genes (1.2%) were not detected at the transcriptional or protein level with any method. Our data suggest that the majority of human genes are expressed at detectable transcript or protein levels in this cell line. Since the reliability of antibodies depends on possible cross-reactivity, we compared the RNA and protein data using antibodies with different reliability scores based on various criteria, including Western blot analysis. Gene products detected on all three platforms generally have good antibody validation scores, while those detected only by antibodies, but not by RNA sequencing, generally involve more low-scoring antibodies. Conclusion: This suggests that some antibodies stain the cells in an unspecific manner, and that assessment of transcript presence by RNA-seq can provide guidance for validation of the corresponding antibodies.
Sorting Signals, N-Terminal Modifications and Abundance of the Chloroplast Proteome
Characterization of the chloroplast proteome is needed to understand the essential contribution of the chloroplast to plant growth and development. Here we present a large scale analysis by nanoLC-Q-TOF and nanoLC-LTQ-Orbitrap mass spectrometry (MS) of ten independent chloroplast preparations from Arabidopsis thaliana which unambiguously identified 1325 proteins. Novel proteins include various kinases and putative nucleotide binding proteins. Based on repeated and independent MS based protein identifications requiring multiple matched peptide sequences, as well as literature, 916 nuclear-encoded proteins were assigned with high confidence to the plastid, of which 86% had a predicted chloroplast transit peptide (cTP). The protein abundance of soluble stromal proteins was calculated from normalized spectral counts from LTQ-Orbitrap analysis and was found to cover four orders of magnitude. Comparison to gel-based quantification demonstrates that "spectral counting" can provide large scale protein quantification for Arabidopsis. This quantitative information was used to determine possible biases for protein targeting prediction by TargetP and also to understand the significance of protein contaminants. The abundance data for 550 stromal proteins was used to understand abundance of metabolic pathways and chloroplast processes. We highlight the abundance of 48 stromal proteins involved in post-translational proteome homeostasis (including aminopeptidases, proteases, deformylases, chaperones, protein sorting components) and discuss the biological implications. N-terminal modifications were identified for a subset of nuclear- and chloroplast-encoded proteins and a novel N-terminal acetylation motif was discovered. Analysis of cTPs and their cleavage sites of Arabidopsis chloroplast proteins, as well as their predicted rice homologues, identified new species-dependent features, which will facilitate improved subcellular localization prediction. No evidence was found for suggested targeting via the secretory system. This study provides the most comprehensive chloroplast proteome analysis to date and an expanded Plant Proteome Database (PPDB) in which all MS data are projected on identified gene models.
Engaging and activating students with inspiration from conferences: examination through poster presentation
In a research-oriented, 7.5-credit master's-level course in bioinformatics at KTH, just over half of the course consists of a project carried out in groups of three students. Each project has its own assignment, with no or only marginal overlap with the other groups' assignments. The projects are based almost exclusively on current research questions from the teaching team's own research groups or groups close to them. The project is reported partly through a poster presentation and partly through an individual web-based project diary. All course participants attend the poster session, which runs for three hours at the end of the examination period. As far as possible, we try to mimic the situation in which an authentic research result is presented at a real conference. Each participant (student) is therefore expected to study every other group's poster, as happens at most scientific conferences. We carry out a simple peer assessment at the poster level, in which each student gives a short, confidential comment on each of the other posters. The course teachers also assess the posters. One of the difficulties is assigning individual grades. Here we use the individual project diaries, which give guidance on each individual's contribution to the project. We have tried this over four course rounds with at most seven projects. The form of examination is enjoyable and motivating for both students and teachers.
Sequencing Degraded RNA Addressed by 3' Tag Counting
RNA sequencing has become widely used in gene expression profiling experiments. Prior to any RNA sequencing experiment the quality of the RNA must be measured to assess whether or not it can be used for further downstream analysis. The RNA integrity number (RIN) is a scale used to measure the quality of RNA that runs from 1 (completely degraded) to 10 (intact). Ideally, samples with high RIN (>8) are used in RNA sequencing experiments. RNA, however, is a fragile molecule which is susceptible to degradation and obtaining high quality RNA is often hard, or even impossible when extracting RNA from certain clinical tissues. Thus, occasionally, working with low quality RNA is the only option the researcher has. Here we investigate the effects of RIN on RNA sequencing and suggest a computational method to handle data from samples with low quality RNA which also enables reanalysis of published datasets. Using RNA from a human cell line we generated and sequenced samples with varying RINs and illustrate what effect the RIN has on the basic procedure of RNA sequencing; both quality aspects and differential expression. We show that the RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads. We introduce 3' tag counting (3TC) as a computational approach to reliably estimate differential expression for samples with low RIN. We show that using the 3TC method in differential expression analysis significantly reduces false positives when comparing samples with different RIN, while retaining reasonable sensitivity.
ClusTrast: a short read de novo transcript isoform assembler guided by clustered contigs
Background: Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms. Results: We present the de novo transcript isoform assembler ClusTrast, which takes short read RNA-seq data as input, assembles a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. For recall, ClusTrast was on top in the lower end of expression levels (below the 15th percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35–69% for the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58–81%) were reconstructed with contigs that exhibited polymorphism, measured on a subset of reliably predicted contigs. ClusTrast recall increased when using a union of assembled transcripts from more than one assembly tool as primary assembly. Conclusion: We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.
Analysis of Curated and Predicted Plastid Subproteomes of Arabidopsis. Subcellular Compartmentalization Leads to Distinctive Proteome Properties
Carefully curated proteomes of the inner envelope membrane, the thylakoid membrane, and the thylakoid lumen of chloroplasts from Arabidopsis were assembled based on published, well-documented localizations. These curated proteomes were evaluated for distribution of physical-chemical parameters, with the goal of extracting parameters for improved subcellular prediction and subsequent identification of additional (low abundant) components of each membrane system. The assembly of rigorously curated subcellular proteomes is in itself also important as a parts list for plant and systems biology. Transmembrane and subcellular prediction strategies were evaluated using the curated data sets. The three curated proteomes differ strongly in average isoelectric point and protein size, as well as transmembrane distribution. Removal of the cleavable, N-terminal transit peptide sequences greatly affected isoelectric point and size distribution. Unexpectedly, the Cys content was much lower for the thylakoid proteomes than for the inner envelope. This likely relates to the role of the thylakoid membrane in light-driven electron transport and helps to avoid unwanted oxidation-reduction reactions. A rule of thumb for discriminating between the predicted integral inner envelope membrane and integral thylakoid membrane proteins is suggested. Using a combination of predictors and experimentally derived parameters, four plastid subproteomes were predicted from the fully annotated Arabidopsis genome. These predicted subproteomes were analyzed for their properties and compared to the curated proteomes. The sensitivity and accuracy of the prediction strategies are discussed. Data can be extracted from the new plastid proteome database (http://ppdb.tc.cornell.edu)