23,131 research outputs found
PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets
Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs that play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been used extensively to identify lncRNAs. Accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their functions in the genome, yet most coding potential computation (CPC) tools fail to identify them accurately in transcriptomic data. Well-known CPC tools such as CPC2, lncScore and CPAT are primarily designed to predict lncRNAs based on the GENCODE, NONCODE and CANTATAdb databases, and their prediction accuracy often drops when they are tested on transcriptomic datasets. This leads to more false positives and to inaccurate functional annotation. In this study, we present a novel tool, PLIT, for the identification of lncRNAs in plant RNA-seq datasets. PLIT implements a feature selection method based on L1 regularization and iterative Random Forests (iRF) classification for selection of optimal features. Based on sequence and codon-bias features, it classifies RNA-seq derived FASTA sequences into coding or long non-coding transcripts. Using L1 regularization, 31 optimal features were obtained from lncRNA and protein-coding transcripts of 8 plant species. The performance of the tool was evaluated on 7 plant RNA-seq datasets using 10-fold cross-validation. PLIT exhibited superior accuracy when evaluated against currently available state-of-the-art CPC tools.
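As a rough illustration of what "sequence features" computed from FASTA input can look like, here is a minimal sketch. It is not PLIT's implementation: the abstract does not list the 31 selected features, so the four features below (length, GC content, longest forward-strand ORF, ORF coverage) and all function names are hypothetical stand-ins chosen for illustration.

```python
from typing import Dict, Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, uppercase sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            header = line[1:].split()
            # Keep only the first token of the header as the identifier.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG-to-stop ORF
    in the three forward reading frames."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def sequence_features(seq: str) -> Dict[str, float]:
    """Hypothetical feature vector; NOT the feature set selected by PLIT."""
    length = len(seq)
    orf = longest_orf_length(seq)
    gc = (seq.count("G") + seq.count("C")) / length if length else 0.0
    return {
        "length": length,
        "gc_content": gc,
        "orf_length": orf,
        "orf_coverage": orf / length if length else 0.0,
    }
```

Feature dictionaries like these could then be passed to any feature-selection and classification step; a short relative ORF is a common, though not decisive, hint that a transcript is non-coding.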
Nuclear RNA sequencing of the mouse erythroid cell transcriptome.
In addition to protein-coding genes, a substantial proportion of mammalian genomes is transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II (RNAPII), we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-Seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein-coding genes showed good correlation between RNAPII occupancy and transcriptional output; genome-wide, however, we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII, which correspond to transcription factor-bound regulatory regions, and to a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.
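The comparison of RNAPII occupancy with transcriptional output described above is a correlation analysis. The abstract does not say which coefficient was used, so the following is only a minimal sketch that assumes Spearman rank correlation; the function names are hypothetical and no real data are implied.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(occupancy: Sequence[float], output: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(occupancy) != len(output) or len(occupancy) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(occupancy), average_ranks(output))
```

A rank-based coefficient is one reasonable choice here because occupancy and expression values typically span several orders of magnitude; computing it separately for highly expressed genes and for all regions would mirror the two comparisons in the abstract.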
Analysis of long non-coding RNAs of the rice blast fungus and their interaction with small non-coding RNAs
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Agriculture and Life Sciences, Interdisciplinary Program in Agricultural Genomics, 2023. 2. Yong-Hwan Lee.
Transcription occurs in protein-coding regions as well as in regions where no protein-coding sequence is present. Although non-coding RNAs lack coding potential, they play roles in transcriptional, post-transcriptional, translational, and post-translational regulation by controlling protein-coding genes. Non-coding RNAs longer than 200 nucleotides are considered long non-coding RNAs (lncRNAs). As sequencing technology has advanced, a repertoire of lncRNA transcriptomes has accumulated and individual lncRNAs have been functionally characterized. LncRNAs have been reported to participate in development, responses to abiotic stresses, and host-microbe interactions. However, their role in plant pathogenic fungi is poorly understood because only a limited range of species has been studied. In this study, we profiled lncRNAs of the rice blast fungus, Magnaporthe oryzae, during disease development to decipher the role of lncRNAs in response to the host. We identified lncRNAs and analyzed their genomic features and expression patterns to understand properties that could be related to their functions. Moreover, lncRNAs specifically expressed in infection stages and their target genes were identified to investigate functional lncRNAs. The analysis of target gene functions suggests that these lncRNAs play roles in pathogenesis, such as cell wall degradation and evasion of host immunity. LncRNAs can function alone or in cooperation with small RNAs (sRNAs). LncRNAs generally interact with sRNAs in three ways: they can be precursors of sRNAs, be regulated by sRNAs, or regulate sRNA activity. However, these interactions are not well understood in fungi. To unravel them in M. oryzae, we profiled lncRNAs and sRNAs in the absence of sRNA biogenesis machinery genes. We selected sRNAs processed by the RNA interference machinery in order to filter out debris. The analysis of genes targeted by non-coding RNAs suggests that the two classes of non-coding RNAs are involved in different biological processes depending on the type of interaction. This study provides a repertoire of non-coding RNAs and a foundation for functional studies to elucidate their biological roles. It also helps to understand the crosstalk between the two classes of non-coding RNAs and suggests that non-coding RNAs can be key regulators in biological processes including pathogenesis. Taken together, this work sheds light on the complex regulatory network in plant pathogenic fungi.
CHAPTER I. Long non-coding RNA in fungi
ABSTRACT
INTRODUCTION
I. LncRNA profiling in fungi
II. Biological roles of lncRNAs in fungi
PERSPECTIVE
LITERATURE CITED
CHAPTER II. Genome-wide profiling of long non-coding RNA of the rice blast fungus Magnaporthe oryzae during infection
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
I. RNA extraction and strand-specific sequencing
II. Collection of in planta RNA-seq data
III. Transcriptome assembly
IV. LncRNA identification
V. LncRNA conservation analysis
VI. Assessment of stage specificity and prediction of stage-specific lncRNAs
VII. Target gene prediction
VIII. Validation of lncRNA transcript production
RESULTS
I. Genome-wide identification of lncRNAs in M. oryzae
II. Genomic features of M. oryzae lncRNAs
III. Expression of lncRNA transcripts during infection
IV. Prediction of stage-specifically expressed lncRNA
V. Verification of lncRNA production
DISCUSSION
LITERATURE CITED
CHAPTER III. Comprehensive genome-wide analysis of non-coding RNAs reveals functions of lncRNA-sRNA crosstalk in the rice blast fungus Magnaporthe oryzae
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
I. Collection of RNA-seq and sRNA-seq data
II. RNA-seq data analysis
III. sRNA-seq data analysis
IV. Target gene prediction and analysis
RESULTS
I. Identification of lncRNAs and Dicer-dependent sRNAs
II. Identification of small RNAs originating from lncRNAs
III. Identification of sRNAs regulating lncRNA expression
IV. Construction of a lncRNA-sRNA-mRNA network
DISCUSSION
LITERATURE CITED
ABSTRACT (in Korean)
Using Pan RNA-Seq Analysis to Reveal the Ubiquitous Existence of 5′ and 3′ End Small RNAs
In this study, we used pan RNA-seq analysis to reveal the ubiquitous existence of both 5′ and 3′ end small RNAs (5′ and 3′ sRNAs). 5′ and 3′ sRNAs alone can be used to annotate nuclear non-coding and mitochondrial genes at 1-bp resolution and to identify new steady RNAs, which are usually transcribed from functional genes. We thus provided a simple and cost-effective way to annotate nuclear non-coding and mitochondrial genes and to identify new steady RNAs, particularly long non-coding RNAs (lncRNAs). Using 5′ and 3′ sRNAs, the annotation of the human mitochondrial genome was corrected and a novel ncRNA named non-coding mitochondrial RNA 1 (ncMT1) was reported for the first time in this study. We also found that most human tRNA genes have downstream lncRNA genes, such as lncTRS-TGA1-1, and corrected how these were misinterpreted in previous studies. Using 5′, 3′, and intronic sRNAs, we reported for the first time that enzymatic double-stranded RNA (dsRNA) cleavage and RNA interference (RNAi) might be involved in the RNA degradation and gene expression regulation of U1 snRNA in human. This offers a different perspective on the regulation of U1 snRNA gene expression. We also provided a novel view on cancer and virus-induced diseases, which may lead to diagnostic or therapeutic targets in the ribonuclease III (RNase III) family and its related pathways. Our findings pave the way toward a rediscovery of dsRNA cleavage and RNAi, challenging classical theories.
New technologies accelerate the exploration of non-coding RNAs in horticultural plants.
Non-coding RNAs (ncRNAs), that is, RNAs not translated into proteins, are crucial regulators of a variety of biological processes in plants. While protein-encoding genes, which account for a small portion of the genome space in plants, have been relatively well annotated in sequenced genomes, the universe of plant ncRNAs is rapidly expanding. Recent advances in experimental and computational technologies have generated great momentum for the discovery and functional characterization of ncRNAs. Here we summarize the classification and known biological functions of plant ncRNAs, review the application of next-generation sequencing (NGS) technology and ribosome profiling technology to ncRNA discovery in horticultural plants, and discuss the application of new technologies, especially the genome-editing tool clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems, to the functional characterization of plant ncRNAs.
Non-coding and Coding Transcriptional Profiles Are Significantly Altered in Pediatric Retinoblastoma Tumors.
Retinoblastoma is a rare pediatric tumor of the retina, caused by the homozygous loss of the Retinoblastoma 1 (RB1) tumor suppressor gene. Previous microarray studies have identified changes in the expression profiles of coding genes; however, an understanding of how non-coding genes change in this tumor is lacking. This is an important area of research, as in many adult malignancies non-coding genes, including LNC-RNAs, are used as biomarkers to predict outcome and/or relapse. To establish a complete and in-depth RNA profile of both coding and non-coding genes in Retinoblastoma tumors, we performed RNA-seq on a cohort of tumors and normal retina controls. This analysis identified widespread transcriptional changes in the levels of both coding and non-coding genes. Unexpectedly, we also found rare RNA fusion products resulting from genomic alterations and specific to Retinoblastoma tumor samples. We then determined whether these expression changes, in both coding and non-coding genes, were also found in a completely independent Retinoblastoma cohort. Using our dataset, we then profiled the potential effects of deregulated LNC-RNAs on the expression of neighboring genes, on the entire genome, and on mRNAs that contain a putative area of homology. This analysis showed that most deregulated LNC-RNAs do not act locally to change the transcriptional environment, but potentially function to modulate genes at distant sites. From this analysis, we selected DRAIC, an LNC-RNA strongly down-regulated in Retinoblastoma, and found that restoring DRAIC RNA levels significantly slowed the growth of the Y79 Retinoblastoma cell line. Collectively, our work has generated the first non-coding RNA profile of Retinoblastoma tumors and has found that these tumors show widespread transcriptional deregulation.
Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum.
Background: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7%) genome. Because of high AT-content, sequence-based annotation of genes and functional elements remains challenging. In order to better understand the regulatory network controlling gene expression in the parasite, a more complete genome annotation as well as analysis tools adapted for AT-rich genomes are needed. Recent studies on genome-wide nucleosome positioning in eukaryotes have shown that nucleosome landscapes exhibit regular characteristic patterns at the 5'- and 3'-end of protein and non-protein coding genes. In addition, nucleosome depleted regions can be found near transcription start sites. These unique nucleosome landscape patterns may be exploited for the identification of novel genes. In this paper, we propose a computational approach to discover novel putative genes based exclusively on nucleosome positioning data in the AT-rich genome of P. falciparum.
Results: Using binary classifiers trained on nucleosome landscapes at the gene boundaries from two independent nucleosome positioning data sets, we were able to detect a total of 231 regions containing putative genes in the genome of Plasmodium falciparum, of which 67 highly confident genes were found in both data sets. Eighty-eight of these 231 newly predicted genes exhibited transcription signal in RNA-Seq data, indicative of active transcription. In addition, 20 out of 21 selected gene candidates were further validated by RT-PCR, and 28 out of the 231 genes showed significant matches using BLASTN against an expressed sequence tag (EST) database. Furthermore, 108 (47%) out of the 231 putative novel genes overlapped with previously identified but unannotated long non-coding RNAs. Collectively, these results provide experimental validation for 163 predicted genes (70.6%). Finally, 73 out of 231 genes were found to be potentially translated based on their signal in polysome-associated RNA-Seq representing transcripts that are actively being translated.
Conclusion: Our results clearly indicate that nucleosome positioning data contains sufficient information for novel gene discovery. As distinct nucleosome landscapes around genes are found in many other eukaryotic organisms, this methodology could be used to characterize the transcriptome of any organism, especially when coupled with other DNA-based gene finding and experimental methods (e.g., RNA-Seq).
Genome-wide transcription start site profiling in biofilm-grown Burkholderia cenocepacia J2315
Background: Burkholderia cenocepacia is a soil-dwelling Gram-negative Betaproteobacterium with an important role as an opportunistic pathogen in humans. Infections with B. cenocepacia are very difficult to treat because of the organism's high intrinsic resistance to most antibiotics. Biofilm formation further adds to this antibiotic resistance. B. cenocepacia harbours a large, multi-replicon genome with a high GC-content; the reference genome of strain J2315 includes 7374 annotated genes. This study aims to annotate transcription start sites and identify novel transcripts on a whole genome scale.
Methods: RNA extracted from B. cenocepacia J2315 biofilms was analysed by differential RNA-sequencing and the resulting dataset compared to data derived from conventional, global RNA-sequencing. Transcription start sites were annotated and further analysed according to their position relative to annotated genes.
Results: Four thousand ten transcription start sites were mapped over the whole B. cenocepacia genome, and the primary transcription start sites of 2089 genes expressed in B. cenocepacia biofilms were defined. For 64 genes, a start codon alternative to the annotated one was proposed. Substantial antisense transcription for 105 genes and two novel protein coding sequences were identified. The distribution of internal transcription start sites can be used to identify genomic islands in B. cenocepacia. A potassium pump strongly induced only under biofilm conditions was found, and 15 non-coding small RNAs highly expressed in biofilms were discovered.
Conclusions: Mapping transcription start sites across the B. cenocepacia genome added relevant information to the J2315 annotation. Genes and novel regulatory RNAs putatively involved in B. cenocepacia biofilm formation were identified. These findings will help in understanding regulation of B. cenocepacia biofilm formation.
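The Methods above mention analysing transcription start sites "according to their position relative to annotated genes". A minimal sketch of such a classification follows. The category names, the 300-nt upstream window, the priority order and the `Gene` record are all assumptions made for illustration; the abstract does not give the rules the authors actually used.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"


def classify_tss(pos: int, strand: str, genes: Iterable[Gene],
                 upstream_window: int = 300) -> str:
    """Classify a TSS as 'upstream', 'internal', 'antisense' or 'orphan'.

    'upstream' marks a candidate primary TSS: same strand as a gene and
    within `upstream_window` nt before its start (an assumed cutoff).
    """
    labels = set()
    for gene in genes:
        inside = gene.start <= pos <= gene.end
        if gene.strand == strand:
            if strand == "+":
                upstream = gene.start - upstream_window <= pos < gene.start
            else:
                # On the minus strand, "upstream" lies at higher coordinates.
                upstream = gene.end < pos <= gene.end + upstream_window
            if upstream:
                labels.add("upstream")
            elif inside:
                labels.add("internal")
        elif inside:
            labels.add("antisense")
    # Assumed priority when a TSS qualifies for several categories.
    for label in ("upstream", "internal", "antisense"):
        if label in labels:
            return label
    return "orphan"
```

For example, with a plus-strand gene spanning 1000 to 2000, a plus-strand TSS at 900 is labelled `upstream`, one at 1500 `internal`, and a minus-strand TSS at 1500 `antisense`.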