188,245 research outputs found

    Bioinformatics Education—Perspectives and Challenges

    Get PDF
    This article discusses the evolution of curriculum, instructional methodologies and initiatives supporting the dissemination of bioinformatics. Building on the early applications of informatics to the field of biology, bioinformatics research entails input from the diverse disciplines of mathematics and statistics, physics and chemistry and medicine and pharmacology. Training in bioinformatics remains the oldest and most important rapid introduction approach to learning bioinformatics skills.2 page(s

    Large-scale event extraction from literature with multi-level gene normalization

    Get PDF
    Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique gene and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons -Attribution - Share Alike (CC BY-SA) license

    Evaluating the Relationship Between Running Times and DNA Sequence Sizes using a Generic-Based Filtering Program.

    Get PDF
    Generic programming depends on the decomposition of programs into simpler components which may be developed separately and combined arbitrarily, subject only to well- defined interfaces. Bioinformatics deals with the application of computational techniques to data present in the Biological sciences. A genetic sequence is a succession of letters which represents the basic structure of a hypothetical DNA molecule, with the capacity to carry information. This research article studied the relationship between the running times of a generic-based filtering program and different samples of genetic sequences in an increasing order of magnitude. A graphical result was obtained to adequately depict this relationship. It was also discovered that the complexity of the generic tree program was O (log2 N). This research article provided one of the systematic approaches of generic programming to Bioinformatics, which could be instrumental in elucidating major discoveries in Bioinformatics, as regards efficient data management and analysis

    Differential Functional Constraints Cause Strain-Level Endemism in Polynucleobacter Populations.

    Get PDF
    The adaptation of bacterial lineages to local environmental conditions creates the potential for broader genotypic diversity within a species, which can enable a species to dominate across ecological gradients because of niche flexibility. The genus Polynucleobacter maintains both free-living and symbiotic ecotypes and maintains an apparently ubiquitous distribution in freshwater ecosystems. Subspecies-level resolution supplemented with metagenome-derived genotype analysis revealed that differential functional constraints, not geographic distance, produce and maintain strain-level genetic conservation in Polynucleobacter populations across three geographically proximal riverine environments. Genes associated with cofactor biosynthesis and one-carbon metabolism showed habitat specificity, and protein-coding genes of unknown function and membrane transport proteins were under positive selection across each habitat. Characterized by different median ratios of nonsynonymous to synonymous evolutionary changes (dN/dS ratios) and a limited but statistically significant negative correlation between the dN/dS ratio and codon usage bias between habitats, the free-living and core genotypes were observed to be evolving under strong purifying selection pressure. Highlighting the potential role of genetic adaptation to the local environment, the two-component system protein-coding genes were highly stable (dN/dS ratio, < 0.03). These results suggest that despite the impact of the habitat on genetic diversity, and hence niche partition, strong environmental selection pressure maintains a conserved core genome for Polynucleobacter populations. IMPORTANCE Understanding the biological factors influencing habitat-wide genetic endemism is important for explaining observed biogeographic patterns. Polynucleobacter is a genus of bacteria that seems to have found a way to colonize myriad freshwater ecosystems and by doing so has become one of the most abundant bacteria in these environments. We sequenced metagenomes from locations across the Chicago River system and assembled Polynucleobacter genomes from different sites and compared how the nucleotide composition, gene codon usage, and the ratio of synonymous (codes for the same amino acid) to nonsynonymous (codes for a different amino acid) mutations varied across these population genomes at each site. The environmental pressures at each site drove purifying selection for functional traits that maintained a streamlined core genome across the Chicago River Polynucleobacter population while allowing for site-specific genomic adaptation. These adaptations enable Polynucleobacter to become dominant across different riverine environmental gradients

    Unusual Metabolism and Hypervariation in the Genome of a Gracilibacterium (BD1-5) from an Oil-Degrading Community.

    Get PDF
    The candidate phyla radiation (CPR) comprises a large monophyletic group of bacterial lineages known almost exclusively based on genomes obtained using cultivation-independent methods. Within the CPR, Gracilibacteria (BD1-5) are particularly poorly understood due to undersampling and the inherent fragmented nature of available genomes. Here, we report the first closed, curated genome of a gracilibacterium from an enrichment experiment inoculated from the Gulf of Mexico and designed to investigate hydrocarbon degradation. The gracilibacterium rose in abundance after the community switched to dominance by Colwellia Notably, we predict that this gracilibacterium completely lacks glycolysis, the pentose phosphate and Entner-Doudoroff pathways. It appears to acquire pyruvate, acetyl coenzyme A (acetyl-CoA), and oxaloacetate via degradation of externally derived citrate, malate, and amino acids and may use compound interconversion and oxidoreductases to generate and recycle reductive power. The initial genome assembly was fragmented in an unusual gene that is hypervariable within a repeat region. Such extreme local variation is rare but characteristic of genes that confer traits under pressure to diversify within a population. Notably, the four major repeated 9-mer nucleotide sequences all generate a proline-threonine-aspartic acid (PTD) repeat. The genome of an abundant Colwellia psychrerythraea population has a large extracellular protein that also contains the repeated PTD motif. Although we do not know the host for the BD1-5 cell, the high relative abundance of the C. psychrerythraea population and the shared surface protein repeat may indicate an association between these bacteria.IMPORTANCE CPR bacteria are generally predicted to be symbionts due to their extensive biosynthetic deficits. Although monophyletic, they are not monolithic in terms of their lifestyles. The organism described here appears to have evolved an unusual metabolic platform not reliant on glucose or pentose sugars. Its biology appears to be centered around bacterial host-derived compounds and/or cell detritus. Amino acids likely provide building blocks for nucleic acids, peptidoglycan, and protein synthesis. We resolved an unusual repeat region that would be invisible without genome curation. The nucleotide sequence is apparently under strong diversifying selection, but the amino acid sequence is under stabilizing selection. The amino acid repeat also occurs in a surface protein of a coexisting bacterium, suggesting colocation and possibly interdependence

    Of bits and bugs

    Get PDF
    Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key for success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins

    BioGUID: resolving, discovering, and minting identifiers for biodiversity informatics

    Get PDF
    Background: Linking together the data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) requires services that can mint, resolve, and discover globally unique identifiers (including, but not limited to, DOIs, HTTP URIs, and LSIDs). Results: BioGUID implements a range of services, the core ones being an OpenURL resolver for bibliographic resources, and a LSID resolver. The LSID resolver supports Linked Data-friendly resolution using HTTP 303 redirects and content negotiation. Additional services include journal ISSN look-up, author name matching, and a tool to monitor the status of biodiversity data providers. Conclusion: BioGUID is available at http://bioguid.info/. Source code is available from http://code.google.com/p/bioguid/
    corecore