20,686 research outputs found

    In silico Gene Characterization and biological annotation of Aspergillus niger CBS 513.88

    Genome annotation is the process of inferring biological features from genomic data. Its aim is to identify the key features of a genome sequence, particularly the genes and gene products. The characteristics of a gene, its products, and gene prediction programs for Aspergillus niger are discussed. Although the number of genomes in genomic databases is increasing day by day, genome-wide analyses are affected by the quality of the genome annotations. This study illustrates the importance of integrative approaches for the automatic annotation of Aspergillus niger genomes by computational methods. Because the annotation process is more complicated in eukaryotes, we carried out a comparative study of gene prediction using the FgenesH algorithm as offered by various software providers. The final annotation of Aspergillus niger CBS 513.88 was created as a GB file in Artemis, a sequence viewer and annotation tool developed at the Sanger Institute.

    Improving dbNSFP

    IMPROVING dbNSFP. Mingyao Lu, B.S. Advisory Professor: Xiaoming Liu, Ph.D. The analysis and interpretation of DNA variation are very important for whole-exome sequencing (WES) studies. Genome research has focused on single nucleotide variants (SNVs). Because indels are as important as SNVs, and indels in coding regions in particular are often candidate disease-causing variants, it is necessary to expand the focus to include indel mutations. The goal of my project is to provide an automatic annotation pipeline for WES-based disease studies by extending dbNSFP with a tool for automated indel annotation and deleteriousness prediction. Current sequencing results typically include both SNVs and indels. Although many tools are available that integrate functional predictions and annotations for SNV effects, to my knowledge there are no such tools for indels. Therefore, the aim of this thesis was to add deleteriousness prediction scores, including CADD, SIFT, and PROVEAN, to indel annotation based on gene models. All of these scores can be calculated on the fly after the resources are installed locally. A Docker container implementing the indel annotation and deleteriousness prediction has been developed and is ready to be deployed from the cloud.

    Community Outreach through Genomics Education Partnership

    The J. Craig Venter Institute (JCVI) has recently partnered with undergraduate university faculty to expand the scope of its education and outreach program as part of the NIAID’s BRC initiative, by joining forces with faculty members participating in the Genomics Education Partnership (GEP). The goal of the GEP is to provide opportunities for undergraduate students to participate in genomics research and gain hands-on experience. Faculty members trained on annotation methodologies and tools during the Prokaryotic Annotation Workshop conducted at JCVI impart their knowledge in the classroom as part of a semester course. As a pilot project, we are currently collaborating with 3 groups, each led by a faculty member and spread across 3 universities, on the community curation of bacterial genomes. Each participating undergraduate group collectively annotates a specific bacterial genome that was sequenced at JCVI and run through the automatic annotation pipeline. Remote access to genome sequence data, pre-computed gene predictions, search results, automatic annotation and bioinformatics analysis is provided through our web-based manual annotation tool, MANATEE. The students log into JCVI genome databases with user-specific IDs and passwords and learn to annotate single genes and entire metabolic pathways, leading to analysis of a question that may be unique to the genome being analyzed. Users of the genome data receive dedicated support and guidance from our in-house annotation experts on the usage of JCVI’s tools and annotation methodologies. Through this exercise, the undergraduate students are introduced to concepts of genomics and bioinformatics and gain a deeper understanding of the concepts of cellular metabolism and disease pathology, which may lead them to make scientific research their career path. Some groups are focusing on genome-specific pathways and plan to conduct wet lab experiments to understand unique genome features. We are highly encouraged that this model of web-based, remote-access community annotation has been successful, and we propose to leverage the community of annotators to update annotations of pathogen genomes in Pathema-BRC.

    Combining DNA Methylation with Deep Learning Improves Sensitivity and Accuracy of Eukaryotic Genome Annotation

    Thesis (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2020. The genome assembly process has significantly decreased in computational complexity since the advent of third-generation long-read technologies. However, genome annotation still requires significant manual effort from scientists to produce the trustworthy annotations required for most bioinformatic analyses. Current methods for automatic eukaryotic annotation rely on sequence homology, structure, or repeat detection, and each method requires a separate tool, making the workflow for a final product a complex ensemble. Beyond the nucleotide sequence, one important component of genetic architecture is the presence of epigenetic marks, including DNA methylation. However, no automatic annotation tools currently use this valuable information. As methylation data becomes more widely available from nanopore sequencing technology, tools that take advantage of patterns in this data will be in demand. The goal of this dissertation was to improve the annotation process by developing and training a recurrent neural network (RNN) on trusted annotations to recognize multiple classes of elements from both the reference sequence and DNA methylation. We found that our proposed tool, RNNotate, detected fewer coding elements than GlimmerHMM and Augustus, but those predictions were more often correct. When predicting transposable elements, RNNotate was more accurate than both RepeatMasker and RepeatScout. Additionally, we found that RNNotate was significantly less sensitive when trained and run without DNA methylation, validating our hypothesis. To the best of our knowledge, we are not only the first group to use recurrent neural networks for eukaryotic genome annotation, but we have also innovated in the data space by utilizing DNA methylation patterns for prediction.

    AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

    We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license

    TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes

    Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regard to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection and frameshift correction, and includes a functional, comparative and phylogenetic toolbox that makes use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveal the efficiency and unique features of the TRAPID system.
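
    To make the notion of open reading frame (ORF) detection mentioned in the abstract above concrete, here is a minimal Python sketch. It is not TRAPID's method, which the abstract does not describe; it is a naive illustration that assumes the standard start codon ATG and stop codons TAA, TAG and TGA, and scans only the three forward-strand frames. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
# Naive forward-strand ORF scan (illustrative only, NOT TRAPID's algorithm).
# Assumption: ORFs start at ATG and end at the first in-frame stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) tuples; seq[start:end] includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it spans enough codons (stop included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Toy transcript (hypothetical sequence, not real data).
toy_transcript = "CCATGAAATTTTAGGG"
toy_orfs = find_orfs(toy_transcript)
```

    A production tool would also need to scan the reverse complement, handle partial ORFs at transcript ends, and cope with ambiguous bases, none of which is attempted here.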

    Finding the Core-Genes of Chloroplasts

    Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, making it possible to perform large-scale genomic comparisons between sets of closely related species. An interesting question is which functional genes are common to a collection of species or, conversely, what is specific to a given species when compared to others belonging to the same genus, family, etc. Investigating this problem means finding both the core and pan genomes of a collection of species, i.e., the genes common to all the species vs. the set of all genes in all species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task: it involves a large amount of computation and requires a rigorous methodology. Surprisingly, as far as we know, the methodology for finding core and pan genomes has not been deeply investigated. This research work tries to fill this gap by focusing only on chloroplast genomes, whose reasonable sizes allow an in-depth study. To achieve this goal, a collection of 99 chloroplasts is considered in this article. Two methodologies have been investigated, based respectively on sequence similarities and on gene names taken from annotation tools. The obtained results are finally evaluated in terms of biological relevance.
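
    The core and pan genome definitions given in the abstract above (genes common to all species vs. all genes in any species) reduce to a set intersection and a set union once each genome is represented as a set of gene names. The Python sketch below illustrates only that definition, under the simplifying assumption that gene names are already consistent across genomes; it is not the authors' methodology. The function name `core_and_pan` and the toy gene sets are hypothetical.

```python
# Core genome = intersection of all gene sets; pan genome = their union.
# Assumption: gene names have already been normalized across genomes.


def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan) for a mapping of genome name -> set of gene names."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)  # genes present in every genome
    pan = set.union(*gene_sets)          # genes present in at least one genome
    return core, pan


# Toy input (made-up gene content for three hypothetical chloroplast genomes).
toy_genomes = {
    "species_a": {"rbcL", "psbA", "atpB", "ycf1"},
    "species_b": {"rbcL", "psbA", "atpB"},
    "species_c": {"rbcL", "psbA", "ndhF"},
}
core, pan = core_and_pan(toy_genomes)
```

    In practice the hard part is deciding when two genes in different genomes count as the same gene, which is where the similarity-based and name-based approaches mentioned in the abstract come in.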

    The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information

    The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity, based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD: taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given a unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full-length 16S rRNA gene reference data set. For genomic analysis, HOMD provides a comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites.