2,644 research outputs found

    Transcriptome-wide functional characterization reveals novel relationships among differentially expressed transcripts in developing soybean embryos

    Sense and antisense transcripts and primers chosen for validation of RNA-Seq-based expression level changes. Sense and antisense transcripts are shown with the corresponding annotation, primer pairs used for qPCR, time points of differential expression, and notes on the presence of additional melt curve peaks. (PPTX 39 kb)

    Statistical Algorithms and Bioinformatics Tools Development for Computational Analysis of High-throughput Transcriptomic Data

    Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. In order to effectively and efficiently analyze this data, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Even with the substantial efforts devoted to development of these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased due to the inherent difficulties associated with accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty in each annotated gene from alignment results. GeneQC uses feature extraction to identify three levels of information for each gene and implements elastic net regularization and mixture model fitting to provide insight into the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm works to re-align ambiguous reads through the integration of motif prediction from metabolic pathways to establish coregulatory gene modules for re-alignment using a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address the issue of mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. Also presented in this dissertation are two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated Shiny web server that combines numerous analyses to investigate gene expression data generated from RNA-Sequencing data. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results to assist users in comprehensively interpreting those results, which is a non-trivial task. These four presented tools cover a variety of aspects of modern RNA-Seq analyses and aim to address bottlenecks related to algorithmic and computational issues, as well as more efficient and effective implementation methods.

    Common Features in lncRNA Annotation and Classification: A Survey

    Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.

    miRkwood: a tool for the reliable identification of microRNAs in plant genomes

    Background: MicroRNAs (miRNAs) play crucial roles in post-transcriptional regulation of eukaryotic gene expression and are involved in many aspects of plant development. Although several prediction tools are available for metazoan genomes, the number of tools dedicated to plants is relatively limited. Results: Here, we present miRkwood, a user-friendly tool for the identification of miRNAs in plant genomes using small RNA sequencing data. Deep-sequencing data of Argonaute-associated small RNAs showed that miRkwood is able to identify a large diversity of plant miRNAs and limits false positive predictions. Moreover, it outperforms current tools such as ShortStack and, unlike ShortStack, provides a quality score allowing users to rank miRNA predictions. Conclusion: miRkwood is a very efficient tool for the annotation of miRNAs in plant genomes. It is available as a web server, as a standalone version, as a Docker image, and as a Galaxy tool.

    A strategy for the identification of new abiotic stress determinants in Arabidopsis using web-based data mining and reverse genetics

    Since the sequencing of the Arabidopsis thaliana genome in 2000, plant researchers have faced the complex challenge of assigning function to thousands of genes. Functional discovery by in silico prediction or homology search resolved a significant number of genes, but only a minor part has been experimentally validated. Arabidopsis entry into the post-genomic era signified a massive increase in high-throughput approaches to functional discovery, which have since become available through publicly available web-based resources. The present work focuses on an easy and straightforward strategy that couples data mining to reverse genetics principles to allow for the identification of new abiotic stress determinant genes. The strategy explores systematic microarray-based transcriptomics experiments involving Arabidopsis abiotic stress responses. An overview of the most significant resources and databases for functional discovery in Arabidopsis is presented. The successful application of the outlined strategy is illustrated by the identification of a new abiotic stress determinant gene, HRR, which displays a heat stress-related phenotype after a loss-of-function reverse genetics approach. No competing financial interests exist. The present work was supported by Foundation for Science and Technology (POCTI/AGR/45462/2002). H. Azevedo (SFRH/BPD/17198/2004), J. Correia (SFRH/BD/16663/2004), J. Oliveira (SFRH/BD/38379/2007), S. Laranjeira (SFRH/BD/29778/2006), C. Barbeta (SFRH/BD/12081/2003) and V. Amorim-Silva (SFRH/BD/29778/2006) were supported by Foundation for Science and Technology.

    A Factor Graph Approach to Automated GO Annotation

    As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with a main focus on annotation precision, and heuristic alternatives, with a main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.
    Affiliation: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
    Affiliation: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
    Affiliation: Roda, Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
    Affiliation: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin

    Selection for improved energy use efficiency and drought tolerance in canola results in distinct transcriptome and epigenome changes

    To increase both the yield potential and stability of crops, integrated breeding strategies are used that have mostly a direct genetic basis, but the utility of epigenetics to improve complex traits is unclear. A better understanding of the status of the epigenome and its contribution to agronomic performance would help in developing approaches to incorporate the epigenetic component of complex traits into breeding programs. Starting from isogenic canola (Brassica napus) lines, epilines were generated by selecting, repeatedly for three generations, for increased energy use efficiency and drought tolerance. These epilines had an enhanced energy use efficiency, drought tolerance, and nitrogen use efficiency. Transcriptome analysis of the epilines and of a line selected solely for its energy use efficiency revealed common differentially expressed genes related to the onset of stress tolerance-regulating signaling events. Genes related to responses to salt, osmotic, abscisic acid, and drought treatments were specifically differentially expressed in the drought-tolerant epilines. The status of the epigenome, scored as differential trimethylation of lysine-4 of histone 3, further supported the phenotype by targeting drought-responsive genes and facilitating the transcription of the differentially expressed genes. From these results, we conclude that the canola epigenome can be shaped by selection to increase energy use efficiency and stress tolerance. Hence, these findings warrant the further development of strategies to incorporate epigenetics into breeding.

    Plant transcriptional responses to explosives as revealed by Arabidopsis thaliana microarrays and its application in phytoremediation and phytosensing

    This research focused on understanding genetic responses of plants to explosives, which is necessary to produce plants to detect and clean soil and water contaminated with toxic explosive compounds. The first study used microarray technology to reveal transcriptional changes in the model plant Arabidopsis thaliana exposed to the explosive compounds RDX (hexahydro-1,3,5-trinitro-1,3,5-triazine; Royal Demolition Explosive or Research Department Explosive) and TNT (2,4,6-trinitrotoluene). This study yielded a list of genes up- and downregulated by explosive compounds, which can potentially be used for phytoremediation (remediation using plants) or phytosensing (detection using plants) of explosive compounds. The second study presented biotechnology tools to enhance phytosensing that might have application not only in explosives phytosensing but also in sensing of other contaminants or important biological agents. This study addressed the problem of low detectable levels of reporter gene signal from a phytosensor, and the results suggest the potential use of a site-specific recombination system to amplify the reporter gene signal. The final study addressed microarray data analysis and best practices for statistical analysis of microarray data. Standard parametric approaches for microarray analysis can be very conservative, indicating no usable information from expensive microarray experiments. A nonparametric method of analysis on a variety of microarray datasets proved to be effective in providing reliable and useful information when the standard parametric approach used was too conservative.

    Bioinformatics Tools and Genomic Resources Available in Understanding the Structure and Function of Gossypium

    Cotton is an economically and evolutionarily important crop for its fiber. In order to improve fiber quality and yield, and to exploit the natural genetic potential inherent in genotypes, understanding genome structure and function of cultivated cotton is important. In order to achieve this, a functional understanding of bioinformatics resources such as databases, software solutions, and analysis tools is required. But currently, there are very few unified reports on bioinformatics tools and even fewer repositories to access cotton genomic information. Also, resourceful developers and bioinformatics scientists actively addressing complex genomic challenges in cotton genomes are much needed. The primary goal of this chapter is to provide a review of such tools and resources for analyzing the structure and function of the cotton genome with preferential emphasis on this complex and economically important plant species. This discourse begins with a description of concurrent advances in high-throughput genome sequencing and bioinformatics analyses and focuses on four major sections covering bioinformatics tools and resources for analysis of: (1) genomes; (2) transcriptomes; (3) small RNAs; and (4) epigenomes. In each section, recent advances in cotton have been discussed. Cotton genome sequencing and annotation efforts are outlined within these sections. This review discusses the availability of genome information of both diploid and tetraploid species that have impelled cotton genome research into the post-genomics era, opening new avenues for exploring regulatory mechanisms associated with fine-tuning of gene expression of fiber-related genes. Finally, the potential impacts of these rapid advances, especially the challenges in handling and analyzing the large datasets, are discussed.