Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons

Author: Jingyi Jessica Li
Ruiqi Gao
Publication venue: Springer Nature
Publication date: 01/01/2017

Illustration of RNA-seq datasets. Illustration of RNA-seq datasets of fly and worm from modENCODE. (PDF 1020 kb)

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

FigShare

PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets

Author: Deshpande Sumukh
England Matthew
Shuttleworth James
Taramonli Sandy
Yang Jianhua
Publication venue: 'Elsevier BV'
Publication date: 01/02/2019

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome, as most coding potential computation (CPC) tools fail to accurately identify them in transcriptomic data. Well-known CPC tools such as CPC2, lncScore and CPAT are primarily designed for prediction of lncRNAs based on the GENCODE, NONCODE and CANTATAdb databases. The prediction accuracy of these tools often drops when tested on transcriptomic datasets. This leads to more false positive results and inaccuracy in the function annotation process. In this study, we present a novel tool, PLIT, for the identification of lncRNAs in plant RNA-seq datasets. PLIT implements a feature selection method based on L1 regularization and iterative Random Forests (iRF) classification for selection of optimal features. Based on sequence and codon-bias features, it classifies the RNA-seq derived FASTA sequences into coding or long non-coding transcripts. Using L1 regularization, 31 optimal features were obtained based on lncRNA and protein-coding transcripts from 8 plant species. The performance of the tool was evaluated on 7 plant RNA-seq datasets using 10-fold cross-validation. The analysis exhibited superior accuracy when evaluated against currently available state-of-the-art CPC tools.
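
The abstract above mentions sequence and codon-bias features without listing them. As a rough illustration only, the following sketch computes two generic features of that kind (GC content and in-frame codon frequencies) from a transcript sequence. The function names and the choice of features are assumptions made for this example; they are not taken from PLIT, and the 31 features the tool actually selects are not described here.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(seq):
    """Relative frequency of each codon, reading in frame from position 0.

    Hypothetical helper: codons containing ambiguous bases (e.g. N) are skipped,
    and a trailing partial codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def sequence_features(seq):
    """Collect the illustrative features into one flat dictionary."""
    features = {"length": len(seq), "gc_content": gc_content(seq)}
    for codon, freq in codon_frequencies(seq).items():
        features["codon_" + codon] = freq
    return features
```

A feature table built this way could then be passed to any classifier with L1-based feature selection; that step is omitted here because the abstract does not specify how it is configured.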

arXiv.org e-Print Archive

Online Research @ Cardiff

Coventry University Pure Portal

Extracting novel hypotheses and findings from RNA-seq data

Author: Doughty Tyler
Kerkhoven Eduard
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020

Over the past decade, improvements in technology and methods have enabled rapid and relatively inexpensive generation of high-quality RNA-seq datasets. These datasets have been used to characterize gene expression for several yeast species and have provided systems-level insights for basic biology, biotechnology and medicine. Herein, we discuss new techniques that have emerged and existing techniques that enable analysts to extract information from multifactorial yeast RNA-seq datasets. Ultimately, this minireview seeks to inspire readers to query datasets, whether previously published or freshly obtained, with creative and diverse methods to discover and support novel hypotheses

Chalmers Research

Rust expression browser: an open source database for simultaneous analysis of host and pathogen gene expression profiles with expVIP

Author: Adams Thomas M.
Bryant Ruth
Bryson Rosie
Campos Pablo Eduardo
Fenwick Paul
Feuerhelm David
Hayes Charlotte
Henriksson Tina
Hubbard Amelia
Jevtić Radivoje
Judge Christopher
Kerton Matthew
Lage Jacob
Lewis Clare M.
Lilly Christine
Meidan Udi
Novoselović Dario
Olsson Tjelvar S. G.
Patrick Colin
Ramirez-Gonzalez Ricardo H.
Saunders Diane G. O.
Wanyera Ruth
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Background: Transcriptomics is being increasingly applied to generate new insight into the interactions between plants and their pathogens. For the wheat yellow (stripe) rust pathogen (Puccinia striiformis f. sp. tritici, Pst) RNA-based sequencing (RNA-Seq) has proved particularly valuable, overcoming the barriers associated with its obligate biotrophic nature. This includes the application of RNA-Seq approaches to study Pst and wheat gene expression dynamics over time and the Pst population composition through the use of a novel RNA-Seq based surveillance approach called "field pathogenomics". As a dual RNA-Seq approach, the field pathogenomics technique also provides gene expression data from the host, giving new insight into host responses. However, this has created a wealth of data for interrogation. Results: Here, we used the field pathogenomics approach to generate 538 new RNA-Seq datasets from Pst-infected field wheat samples, doubling the amount of transcriptomics data available for this important pathosystem. We then analysed these datasets alongside 66 RNA-Seq datasets from four Pst infection time-courses and 420 Pst-infected plant field and laboratory samples that were publicly available. A database of gene expression values for Pst and wheat was generated for each of these 1024 RNA-Seq datasets and incorporated into the development of the rust expression browser (http://www.rust-expression.com). This enables, for the first time, simultaneous 'point-and-click' access to gene expression profiles for Pst and its wheat host and represents the largest database of processed RNA-Seq datasets available for any of the three Puccinia wheat rust pathogens. We also demonstrated the utility of the browser through investigation of expression of putative Pst virulence genes over time and examined the host plant's response to Pst infection. Conclusions: The rust expression browser offers immense value to the wider community, facilitating data sharing and transparency, and the underlying database can be continually expanded as more datasets become publicly available.

FiVeR

Repositorio Institucional – Biblioteca Digital

Large-scale discovery of male reproductive tract-specific genes through analysis of RNA-seq datasets

Author: Coarfa Cristian
Dean Laura
Fujihara Yoshitaka
Garcia Thomas X.
Grimm Sandra L.
Ikawa Masahito
Kent Katarzyna
Légaré Christine
Mathew Michelle
Matzuk Martin M.
Nozawa Kaori
Robertson Matthew J.
Sullivan Robert
Tharp Nathan
Yu Zhifeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/08/2020

Robertson, M.J., Kent, K., Tharp, N. et al. Large-scale discovery of male reproductive tract-specific genes through analysis of RNA-seq datasets. BMC Biol 18, 103 (2020). https://doi.org/10.1186/s12915-020-00826-

Osaka University Knowledge Archive

SPsimSeq : semi-parametric simulation of bulk and single-cell RNA-sequencing data

Author: Assefa Alemu Takele
Thas Olivier
Vandesompele Jo
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020

SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is reasonably flexible to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects

Ghent University Academic Bibliography

MINTmap: fast and exhaustive profiling of nuclear and mitochondrial tRNA fragments from short RNA-seq data.

Author: Loher Phillipe
Rigoutsos Isidore
Telonis Aristeidis G.
Publication venue: Jefferson Digital Commons
Publication date: 21/02/2017

Transfer RNA fragments (tRFs) are an established class of constitutive regulatory molecules that arise from precursor and mature tRNAs. RNA deep sequencing (RNA-seq) has greatly facilitated the study of tRFs. However, the repeat nature of the tRNA templates and the idiosyncrasies of tRNA sequences necessitate the development and use of methodologies that differ markedly from those used to analyze RNA-seq data when studying microRNAs (miRNAs) or messenger RNAs (mRNAs). Here we present MINTmap (for MItochondrial and Nuclear TRF mapping), a method and a software package that was developed specifically for the quick, deterministic and exhaustive identification of tRFs in short RNA-seq datasets. In addition to identifying them, MINTmap is able to unambiguously calculate and report both raw and normalized abundances for the discovered tRFs. Furthermore, to ensure specificity, MINTmap identifies the subset of discovered tRFs that could be originating outside of tRNA space and flags them as candidate false positives. Our comparative analysis shows that MINTmap exhibits superior sensitivity and specificity to other available methods while also being exceptionally fast. The MINTmap codes are available through https://github.com/TJU-CMC-Org/MINTmap/ under an open source GNU GPL v3.0 license

PubMed Central

Jefferson Digital Commons

A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

Author: Bansal Vikas
Publication venue: eScholarship, University of California
Publication date: 01/03/2017

Background: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. Results: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. Conclusions: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates.
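
The idea of using heterozygous variants to separate PCR duplicates from natural duplicates can be illustrated with a deliberately simplified sketch. It assumes that only pairs of duplicate reads are considered, that two PCR copies of one fragment always carry the same allele (sequencing errors ignored), and that two independent fragments carry different alleles half of the time (no allelic bias, which is a strong assumption for RNA-seq). Under those assumptions the natural-duplicate fraction is about twice the observed fraction of allele-discordant pairs. The function name and input format are hypothetical, and this is not the estimator implemented in the PCRduplicates repository.

```python
def estimate_pcr_duplicate_fraction(allele_pairs):
    """Estimate the share of duplicate read pairs that are PCR duplicates.

    allele_pairs: iterable of (allele_1, allele_2) tuples, one per pair of
    duplicate reads that overlaps a heterozygous site (hypothetical input).
    """
    pairs = list(allele_pairs)
    if not pairs:
        raise ValueError("need at least one duplicate pair covering a heterozygous site")

    # Pairs whose two reads carry different alleles must come from independent fragments.
    discordant = sum(1 for first, second in pairs if first != second)

    # Independent fragments are discordant with probability 0.5 under the stated
    # assumptions, so the natural fraction is roughly twice the discordant fraction.
    natural_fraction = min(1.0, 2.0 * discordant / len(pairs))
    return 1.0 - natural_fraction
```

For example, if 2 of 8 duplicate pairs are discordant, the sketch attributes half of the pairs to natural duplication and half to PCR.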

PubMed Central

eScholarship - University of California

Computation for ChIP-seq and RNA-seq studies

Author: Mortazavi Ali
Pepke Shirley
Wold Barbara
Publication venue: Nature Publishing Group
Publication date: 01/11/2009

Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements comes at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects. Here we describe the multilayered analyses of ChIP-seq and RNA-seq datasets, discuss the software packages currently available to perform tasks at each layer and describe some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery and expression quantification.

Caltech Authors

Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken

Author: A Alexa
A Aziz
A Balic
A Celada
A Chatr-Aryamontri
A Chatr-aryamontri
A Conesa
A Diaz-Perales
A Elefsinioti
A Esteve-Codina
A Joshi
A Kranis
A Psifidi
A So
A Theocharidis
A Van Goor
AC Long
AJ Vilella
Amanda J. MacCallum
Androniki Psifidi
AR Forrest
B Glick
B Strasser
BO Fabriek
BR Johnson
C Furusawa
C Garcia-Morales
C Wasmeier
C Wu
C-F Le
CD Stern
Chunlei Wu
CJ Langouet-Astrie
CP Zeferino
CW Resnyk
Cyrus Afrasiabi
D Brawand
D Günzel
D Han
D Risso
D-D Wu
DA Hume
David A. Hume
DJ Lynn
DR Rhodes
E Arner
E Eising
E González
EL Clark
EL Gautier
EL van Dijk
EM Pritchett
ET Richardson
F Bangs
F He
F Wang
G Frühbeck
G Zhu
GA Pavlopoulos
GD Bader
GD Plowman
H Hermjakob
HA Eckelhoefer
HH Cheng
I Gallego Romero
I Galvan
J Lopes Ricardo
J Merkin
J Smith
J Zhou
Jacqueline Smith
Jenny O’Dell
JF Reiter
JY Han
JY Kim
K Muret
K Pazdrak
K Piórkowska
KD Hansen
KD Pruitt
Kim M. Summers
KM Summers
KR Brown
L Alibardi
L Huminiecki
L Opitz
L Salwinski
L Taylor
L X-d
LM Quinn
Lucy Freem
M Kotlyar
M Lizio
M Stauber
M Sultan
M Takeda
MA Quail
Mark P. Stevens
ME Woodcock
MK Chang
MM Song
NA Mabbott
NA O'Leary
NC Johnson
NL Bray
P Kovarik
P Wu
P-F Roux
PH Sudmant
PJ Balwierz
PJ Seear
QC Zhang
R Andersson
R Deviatiiarov
R Feng
R Jansen
R Kapetanovic
R Kist
R Rodriguez-Manzanet
R Sinha
RI Kuo
RJ Kinsella
RS Holmes
S Chhangawala
S Epelman
S Intarapat
S Li
S Lin
S Oliver
S Roosing
S Tarazona
S Tornow
S van Dongen
S Zhao
S-A Lee
SF Altschul
SJ Bush
SM Carpanini
Stephen J. Bush
T Lu
TC Freeman
TN Doig
TP van Gurp
TS Keshava Prasad
TX Jiang
U Coppola
V Curwen
V Garceau
X Adiconis
X Li
X Shen
X Su
Y Kodama
Y Liu
Y Wang
Y Yin
Z Bar-Joseph
Z Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: The domestic chicken (Gallus gallus) is widely used as a model in developmental biology and is also an important livestock species. We describe a novel approach to data integration to generate an mRNA expression atlas for the chicken spanning major tissue types and developmental stages, using a diverse range of publicly-archived RNA-seq datasets and new data derived from immune cells and tissues. Results: Randomly down-sampling RNA-seq datasets to a common depth and quantifying expression against a reference transcriptome using the mRNA quantitation tool Kallisto ensured that disparate datasets explored comparable transcriptomic space. The network analysis tool Graphia was used to extract clusters of co-expressed genes from the resulting expression atlas, many of which were tissue or cell-type restricted, contained transcription factors that have previously been implicated in their regulation, or were otherwise associated with biological processes, such as the cell cycle. The atlas provides a resource for the functional annotation of genes that currently have only a locus ID. We cross-referenced the RNA-seq atlas to a publicly available embryonic Cap Analysis of Gene Expression (CAGE) dataset to infer the developmental time course of organ systems, and to identify a signature of the expansion of tissue macrophage populations during development. Conclusion: Expression profiles obtained from public RNA-seq datasets - despite being generated by different laboratories using different methodologies - can be made comparable to each other. This meta-analytic approach to RNA-seq can be extended with new datasets from novel tissues, and is applicable to any species
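
The down-sampling step described above can be sketched in a few lines. This is a minimal illustration, assuming each dataset fits in memory as a list of reads and that the common depth defaults to the size of the smallest dataset; the function name, the in-memory representation and the default are choices made for this example, not details taken from the study. Quantification with Kallisto would follow as a separate step and is not shown.

```python
import random


def downsample_to_common_depth(datasets, depth=None, seed=0):
    """Randomly subsample every dataset to the same number of reads.

    datasets: dict mapping a dataset name to a list of reads (hypothetical layout).
    depth: target read count; defaults to the size of the smallest dataset.
    seed: fixed seed so the subsampling is reproducible.
    """
    if not datasets:
        return {}
    rng = random.Random(seed)
    if depth is None:
        depth = min(len(reads) for reads in datasets.values())
    # random.Random.sample draws without replacement and raises ValueError
    # if a dataset holds fewer reads than the requested depth.
    return {name: rng.sample(reads, depth) for name, reads in datasets.items()}
```

Sampling without replacement keeps each retained read unique to its original record, so datasets of very different sizes end up contributing the same number of reads to quantification.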

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

Oxford University Research Archive

University of Queensland eSpace