Developing the MAR databases – Augmenting Genomic Versatility of Sequenced Marine Microbiota
This thesis introduces the MAR databases as marine-specific resources in the genomic landscape. Paper 1 describes the curation effort and development that led to the creation of the MAR databases. The result is the highly valued reference database MarRef, the broader MarDB, and the marine gene catalog MarCat. The paper describes the definition of a marine environment, the curation process, and the Marine Metagenomics Portal as a public web service. The portal helps scientists find marine sequence data for prokaryotes, explore rich contextual information, secondary metabolites, and updated taxonomy, and evaluate genome quality. Many of these database advancements are covered in Paper 2, including new entries and the development of specific databases on marine fungi (MarFun) and salmon-related prokaryotes (SalDB). The inclusion of metagenome-assembled and single amplified genomes leads up to the database quality evaluation discussed in Paper 3. There, the lack of quality control in primary databases is discussed on the basis of estimated completeness and contamination in the genomes of the MAR databases.
Paper 4 explores the microbiota of the skin and gut mucosa of Atlantic salmon. In a database-dependent amplicon analysis, the full-length 16S rRNA gene proved accurate, but not a game-changer in taxonomic classification for this environmental niche. The proportion of dataset sequences lacking clear taxonomic classification suggests a lack of diversity in current-day databases and inadequate phylogenetic resolution. Advancing phylogenetic resolution was the subject of Paper 5. Here, the highly similar species of the genus Aliivibrio were delineated using six genes in a multilocus sequence analysis. Five potentially novel species could be delineated in this way, which coincided with recent genome-wide taxonomy listings. Thus, Papers 4 and 5 parallel the work on the MAR databases by providing insight into the interrelated framework of bioinformatic analysis and marine database sources.
An Informatics Roadmap Toward a FAIR Understanding of Mitochondrial Biology and Rare Mitochondrial Disease
Mitochondrial biology is integral to our fundamental understanding of human health and many diseases. Mitochondria exist in every human cell type except red blood cells and have critical functions in metabolism, oxidative phosphorylation, oxidation-reduction, and as signaling hubs responsible for mediating protective mechanisms. Rare mitochondrial diseases (RMDs) are devastating and complex, affect multiple organ systems, and disproportionately impact young children. Despite copious existing knowledge and increased public interest, this knowledge is fragmented and difficult to access. Clinical case reports (CCRs) on RMDs contain valuable clinical insights, but they are scarce and lack the metadata necessary to facilitate their discovery among the two million CCRs on PubMed. The unstructured text data of CCRs is also ill-suited to computational approaches, limiting our ability to derive the knowledge contained within.
To address these issues, I assembled all available informatics tools and resources with mitochondrial components and used them to contribute to Gene Wiki pages that enable easy access to mitochondrial knowledge for researchers, students, clinicians, and patients. Through these efforts, I made mitochondrial gene, protein, and disease knowledge widely accessible with contributions of over 4 MB of content across 541 Gene Wiki pages. Concurrently, I used Gene Wiki as an educational platform to train over 50 students in the biosciences and pre-medical studies in mitochondrial biology and disease, and to instill effective research and writing methods in biomedicine.
To impose structure on CCRs and render them FAIR (Findable, Accessible, Interoperable, Reusable), I developed and applied a standardized metadata template to RMD CCRs and codified patient symptomology with the International Statistical Classification of Diseases and Related Health Problems (ICD) system.
I created the open-source, cloud-based MitoCases RMD Knowledge Platform (http://mitocases.org/) to house data on 384 RMD CCRs, including 4,561 instances of 952 unique ICD codes. Supplementing CCRs with structured metadata amplifies machine-readable information content and provides a distinct improvement in searching for CCRs compared with indexing by title and abstract. Finally, I employed these resources to conduct a thorough review of Barth syndrome and characterized the diversity of presentations, range of genetic etiologies, and treatment paradigms.
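The search benefit of structured metadata described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CaseReport` record, its field names, the two made-up case reports, and the ICD-style codes are hypothetical and are not taken from the MitoCases schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class CaseReport:
    """Hypothetical case report record (not the MitoCases schema)."""
    report_id: str
    title: str
    abstract: str
    # Symptoms and diagnoses codified as ICD-style codes (illustrative values).
    icd_codes: set[str] = field(default_factory=set)


def search_text(reports, term):
    """Free-text search over title and abstract only."""
    term = term.lower()
    return [r.report_id for r in reports
            if term in r.title.lower() or term in r.abstract.lower()]


def search_icd(reports, code):
    """Search over the structured ICD code metadata."""
    return [r.report_id for r in reports if code in r.icd_codes]


# Two made-up reports on the same disease; only the first names it in its text.
reports = [
    CaseReport("case-1", "Barth syndrome in an infant",
               "Dilated cardiomyopathy and neutropenia.",
               {"E78.71", "I42.0"}),
    CaseReport("case-2", "An unusual cause of heart failure in a toddler",
               "Genetic testing identified a pathogenic variant.",
               {"E78.71", "I50.9"}),
]

text_hits = search_text(reports, "Barth syndrome")
icd_hits = search_icd(reports, "E78.71")
print(text_hits)
print(icd_hits)
```

In this toy data, the free-text search returns only the report that names the disease in its title, while the code-based search returns both reports, which is the kind of difference the abstract attributes to structured metadata.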