103 research outputs found

    FAIRness and Usability for Open-access Omics Data Systems

    Omics data sharing is crucial to the biological research community, and the last decade or two has seen a huge rise in collaborative analysis systems, databases, and knowledge bases for omics and other systems biology data. We assessed the FAIRness of NASA's GeneLab Data Systems (GLDS) along with four similar kinds of systems in the research omics data domain, using 14 FAIRness metrics. The range of overall FAIRness scores was 6-12 (out of 14), average 10.1, and standard deviation 2.4. The range of Pass ratings for the metrics was 29-79%, Partial Pass 0-21%, and Fail 7-50%. The systems we evaluated performed the best in the areas of data findability and accessibility, and worst in the area of data interoperability. Reusability of metadata, in particular, was frequently not well supported. We relate our experiences implementing semantic integration of omics data from some of the assessed systems for federated querying and retrieval functions, given their shortcomings in data interoperability. Finally, we propose two new principles that Big Data system developers, in particular, should consider for maximizing data accessibility.

    Large-Scale Identification of Mirtrons in Arabidopsis and Rice

    A new category of microRNA (miRNA) species called mirtrons, which originate from spliced introns of gene transcripts, has recently been discovered in animals. However, only one putative mirtron, osa-MIR1429, has been identified in rice (Oryza sativa). We employed a high-throughput sequencing (HTS) data- and structure-based approach to search genome-wide for mirtron candidates in both Arabidopsis (Arabidopsis thaliana) and rice. Five candidates were discovered in Arabidopsis and eighteen in rice. To investigate their biological roles, the targets of these mirtrons were predicted and validated based on degradome sequencing data. The results indicate that the mirtrons could guide target cleavage to exert post-transcriptional regulatory roles, a conclusion that needs further experimental validation.

    A novel compression tool for efficient storage of genome resequencing data

    With the advent of DNA sequencing technologies, more and more reference genome sequences are available for many organisms. Analyzing sequence variation and understanding its biological importance are becoming a major research aim. However, how to store and process the huge amount of eukaryotic genome data, such as those of the human, mouse and rice, has become a challenge to biologists. Currently available bioinformatics tools used to compress genome sequence data have some limitations, such as the requirement of the reference single nucleotide polymorphisms (SNPs) map and information on deletions and insertions. Here, we present a novel compression tool for storing and analyzing Genome ReSequencing data, named GRS. GRS is able to process the genome sequence data without the use of the reference SNPs and other sequence variation information and automatically rebuild the individual genome sequence data using the reference genome sequence. When its performance was tested on the first Korean personal genome sequence data set, GRS was able to achieve ∼159-fold compression, reducing the size of the data from 2986.8 to 18.8 MB. While being tested against the sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS
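
    As a quick arithmetic check, the fold-compression figures quoted in this abstract can be recomputed from the stated file sizes. The sketch below is illustrative only: the function name fold_compression is hypothetical, and the last calculation assumes 1 MB = 1024 KB, which the abstract does not specify.

```python
def fold_compression(original_mb: float, compressed_mb: float) -> float:
    """Return how many times smaller the compressed data is than the original."""
    return original_mb / compressed_mb


# File sizes as quoted in the abstract, expressed in MB.
korean = fold_compression(2986.8, 18.8)   # Korean personal genome
rice = fold_compression(361.0, 4.4)       # rice genome

# Assumption: 1 MB = 1024 KB (the abstract does not say which convention it uses).
arabidopsis = fold_compression(115.1, 6.5 / 1024)  # A. thaliana genome

print(f"Korean genome: {korean:.1f}-fold")
print(f"Rice genome: {rice:.1f}-fold")
print(f"A. thaliana genome: {arabidopsis:.0f}-fold")
```

    The first ratio rounds to the ∼159-fold figure reported for the Korean genome; the other two are derived here from the quoted sizes and are not stated in the abstract.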

    RNA editing of nuclear transcripts in Arabidopsis thaliana

    BACKGROUND: RNA editing is a transcript-based layer of gene regulation. To date, no systemic study on RNA editing of plant nuclear genes has been reported. Here, a transcriptome-wide search for editing sites in nuclear transcripts of Arabidopsis (Arabidopsis thaliana) was performed. RESULTS: MPSS (massively parallel signature sequencing) and PARE (parallel analysis of RNA ends) data retrieved from public databases were utilized, focusing on one-base-conversion editing. Besides cytidine (C)-to-uridine (U) editing in mitochondrial transcripts, many nuclear transcripts were found to be diversely edited. Interestingly, a sizable portion of these nuclear genes are involved in chloroplast- or mitochondrion-related functions, and many editing events are tissue-specific. Some editing sites, such as adenosine (A)-to-U editing loci, were found to be surrounded by peculiar elements. The editing events of some nuclear transcripts are highly enriched surrounding the borders between coding sequences (CDSs) and 3′ untranslated regions (UTRs), suggesting site-specific editing. Furthermore, RNA editing is potentially implicated in new start or stop codon generation, and may affect alternative splicing of certain protein-coding transcripts. RNA editing in the precursor microRNAs (pre-miRNAs) of ath-miR854 family, resulting in secondary structure transformation, implies its potential role in microRNA (miRNA) maturation. CONCLUSIONS: To our knowledge, the results provide the first global view of RNA editing in plant nuclear transcripts.

    Featured Organism: Arabidopsis thaliana

    Arabidopsis is universally acknowledged as the model for dicotyledonous crop plants. Furthermore, some of the information gleaned from this small plant can be used to aid work on monocotyledonous crops. Here we provide an overview of the current state of knowledge and resources for the study of this important model plant, with comments on future prospects in the field from Professor Pamela Green and Dr Sean May

    Comparative analysis of miRNAs and their targets across four plant species

    BACKGROUND: MicroRNA (miRNA)-mediated regulation of gene expression has been recognized as a major posttranscriptional regulatory mechanism in plants as well. We performed a comparative analysis of miRNAs and their respective gene targets across four plant species: Arabidopsis thaliana (Ath), Medicago truncatula (Mtr), Brassica napus (Bna), and Chlamydomonas reinhardtii (Cre). RESULTS: miRNAs were obtained from miRBase: 218 for Ath, 375 for Mtr, 46 for Bna, and 73 for Cre. miRNA targets were obtained from available database annotations, from bioinformatic predictions using RNAhybrid, and from an analysis of mRNA degradation products (degradome sequencing) aimed at identifying miRNA cleavage products. On average, and considering both experimental and bioinformatic predictions together, every miRNA was associated with about 46 unique gene transcripts, with considerable variation across species. We observed a positive and linear correlation between the number of miRNAs and the total number of transcripts across different plant species, suggesting that the repertoire of miRNAs correlates with the size of the transcriptome of an organism. Conserved miRNA-target pairs were found to be associated with developmental processes and transcriptional regulation, while species-specific (in particular, Ath) pairs are involved in signal transduction and response to stress processes. Conserved miRNAs have more targets and higher expression values than non-conserved miRNAs. We found evidence for a conservation of not only the sequence of miRNAs, but their expression levels as well. CONCLUSIONS: Our results support the notion of a high birth and death rate of miRNAs and that miRNAs serve many species-specific functions, while conserved miRNAs are related mainly to developmental processes and transcriptional regulation, with conservation operating at both the sequence and expression level.

    Testing the recent theories for the origin of the hermaphrodite flower by comparison of the transcriptomes of gymnosperms and angiosperms

    BACKGROUND: Different theories for the origin of the angiosperm hermaphrodite flower make different predictions concerning the overlap between the genes expressed in the male and female cones of gymnosperms and the genes expressed in the hermaphrodite flower of angiosperms. The Mostly Male (MM) theory predicts that, of genes expressed primarily in male versus female gymnosperm cones, an excess of male orthologs will be expressed in flowers, excluding ovules, while Out Of Male (OOM) and Out Of Female (OOF) theories predict no such excess. RESULTS: In this paper, we tested these predictions by comparing the transcriptomes of three gymnosperms (Ginkgo biloba, Welwitschia mirabilis and Zamia fisheri) and two angiosperms (Arabidopsis thaliana and Oryza sativa), using EST data. We found that the proportion of orthologous genes expressed in the reproductive organs of the gymnosperms and in the angiosperm flower is significantly higher than the proportion of orthologous genes expressed in the reproductive organs of the gymnosperms and in the angiosperm vegetative tissues, which shows that the approach is correct. However, we detected no significant differences between the proportion of gymnosperm orthologous genes expressed in the male cone and in the angiosperm flower and the proportion of gymnosperm orthologous genes expressed in the female cone and in the angiosperm flower. CONCLUSIONS: These results do not support the MM theory prediction of an excess of male gymnosperm genes expressed in the hermaphrodite flower of the angiosperms and seem to support the OOM/OOF theories. However, other explanations can be given for the 1:1 ratio that we found. More abundant and more specific (namely carpel and ovule) expression data should be produced in order to further test these theories.