452 research outputs found

    Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens

    Background: The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) aims to provide bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions in the scientific literature is critical to research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can significantly aid this process.

    Description: We have trained a state-of-the-art IE system on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an average F-measure above 90% for entities (genes, operons, etc.) and above 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text mining application.

    Conclusion: Our Text Mining application is available online on the ERIC website at http://www.ericbrc.org/portal/eric/articles. The information retrieval interface displays a list of recently published enteropathogen literature abstracts and also provides a search interface for custom queries by keyword, date range, etc. When an abstract is selected, the processed abstract and the entities and relations extracted from it are retrieved from a relational database and marked up to highlight those entities and relations. The displayed abstract also links extracted genes and gene products to the ERIC Annotations database, providing access to comprehensive genomic annotations and adding value to both the text-mining and annotation systems.
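    The F-measure reported above is the harmonic mean of precision and recall. As a minimal sketch, assuming exact-match scoring of hypothetical (start, end, type) annotation spans, the calculation looks like this. The abstract does not say whether its "average" is a micro-average over all items or a macro-average over entity types, so both are shown; the helper names `prf` and `macro_f1` are illustrative only.

```python
from typing import Hashable, Set, Tuple

# A hypothetical annotation: (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def prf(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-measure (F1) for two sets of items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Average the per-type F1 scores (one assumed way to report an 'average' F-measure)."""
    types = sorted({label for _, _, label in predicted | gold})
    if not types:
        return 0.0
    scores = []
    for label in types:
        pred_t = {s for s in predicted if s[2] == label}
        gold_t = {s for s in gold if s[2] == label}
        scores.append(prf(pred_t, gold_t)[2])
    return sum(scores) / len(scores)


# Toy, made-up annotations for illustration.
predicted = {(0, 4, "GENE"), (10, 13, "GENE"), (20, 25, "OPERON")}
gold = {(0, 4, "GENE"), (20, 25, "OPERON"), (30, 34, "GENE"), (40, 44, "GENE")}
print(prf(predicted, gold))       # micro: precision 2/3, recall 1/2, F1 4/7
print(macro_f1(predicted, gold))  # GENE F1 0.4, OPERON F1 1.0, mean about 0.7
```

    Micro-averaging weights every extracted item equally, while macro-averaging gives rare entity types (such as operons) the same weight as common ones, so the two can differ noticeably on skewed corpora.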

    Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata

    Many Microbe Microarrays Database (M3D) is designed to facilitate the analysis and visualization of expression data in compendia compiled from multiple laboratories. M3D contains over a thousand Affymetrix microarrays for Escherichia coli, Saccharomyces cerevisiae and Shewanella oneidensis. The expression data are uniformly normalized to make data generated by different laboratories and researchers more comparable. To facilitate computational analyses, M3D provides downloads of the raw data (CEL files) and the normalized data for each compendium. In addition, web-based construction, visualization and download of custom datasets are provided to facilitate efficient interrogation of the compendium for more focused analyses. The experimental condition metadata in M3D are human-curated, with each chemical and growth attribute stored as a structured and computable set of experimental features with consistent naming conventions and units. All versions of the normalized compendia constructed for each species are maintained and accessible in perpetuity to facilitate the future interpretation and comparison of results based on M3D data. M3D is accessible at http://m3d.bu.edu/
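    The abstract does not name the normalization method M3D uses. Quantile normalization is one widely used way to make arrays from different laboratories comparable: every array is forced onto the same distribution of intensities. A minimal pure-Python sketch on a hypothetical probe-by-array matrix follows; the function name is illustrative, and this is not presented as M3D's pipeline.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same distribution: the rank-wise mean of the sorted arrays.

    `arrays` holds one list of probe intensities per microarray; all lists must
    cover the same probes in the same order. Ties are broken by probe order,
    a simplification compared with averaging tied ranks.
    """
    if not arrays:
        return []
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must measure the same probes")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

    After normalization both toy arrays contain exactly the values 1.5, 3.5 and 5.5, so any remaining differences reflect probe ranks rather than lab-specific intensity scales.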

    Enteropathogen Resource Integration Center (ERIC): bioinformatics support for research on biodefense-relevant enterobacteria

    ERIC, the Enteropathogen Resource Integration Center (www.ericbrc.org), is a new web portal serving as a rich source of information about the enterobacteria on the NIAID-established list of Select Agents related to biodefense: diarrheagenic Escherichia coli, Shigella spp., Salmonella spp., Yersinia enterocolitica and Yersinia pestis. More than 30 genomes of these organisms have been completely sequenced, many more exist in draft form, and additional projects are underway. These organisms are increasingly the focus of studies using high-throughput experimental technologies and computational approaches. This wealth of data provides unprecedented opportunities for understanding the workings of basic biological systems and for discovering novel targets for the development of vaccines, diagnostics and therapeutics. ERIC brings information together from disparate sources and supports data comparison across different organisms, analysis of varying data types, and visualization of analyses in human- and computer-readable formats.

    Passing to the Limit in a Wasserstein Gradient Flow: From Diffusion to Reaction

    We study a singular-limit problem arising in the modelling of chemical reactions. At finite ε > 0, the system is described by a Fokker-Planck convection-diffusion equation with a double-well convection potential. This potential is scaled by 1/ε, and in the limit ε → 0 the solution concentrates onto the two wells, resulting in a limiting system that is a pair of ordinary differential equations for the density at the two wells. This convergence was proved in Peletier, Savaré, and Veneroni, SIAM Journal on Mathematical Analysis, 42(4):1805-1825, 2010, using the linear structure of the equation. In this paper we re-prove the result using solely the Wasserstein gradient-flow structure of the system. In particular, we make no use of the linearity, nor of the fact that it is a second-order system. The first key step in this approach is a reformulation of the equation as the minimization of an action functional that captures, in integrated form, the property of being a curve of maximal slope. The second important step is a rescaling of space. Using only the Wasserstein gradient-flow structure, we prove that the sequence of rescaled solutions is precompact in an appropriate topology. We then prove a Gamma-convergence result for the functional in this topology, and we identify the limiting functional and the differential equation that it represents. A consequence of these results is that solutions of the ε-problem converge to a solution of the limiting problem. Comment: Added two sections, corrected minor typos, updated references
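    For orientation, a generic one-dimensional version of the objects named above can be written as follows. The symbols H (the double-well potential), F_ε and J_ε and the one-dimensional setting are assumptions for illustration; the paper's own notation and scaling (for example any rescaling of time) may differ.

```latex
% Free energy: entropy plus a double-well potential H scaled by 1/epsilon
\mathcal{F}_\epsilon(\rho) = \int_{\mathbb{R}} \rho \log \rho \, dx
  + \frac{1}{\epsilon} \int_{\mathbb{R}} H \rho \, dx

% Its Wasserstein gradient flow is the Fokker--Planck equation
\partial_t \rho = \partial_x\!\Big( \rho \, \partial_x \frac{\delta \mathcal{F}_\epsilon}{\delta \rho} \Big)
  = \partial_x^2 \rho + \frac{1}{\epsilon} \, \partial_x\big( \rho \, H' \big)

% Curve of maximal slope in integrated (energy-dissipation) form
\mathcal{J}_\epsilon(\rho) = \mathcal{F}_\epsilon(\rho_T) - \mathcal{F}_\epsilon(\rho_0)
  + \frac{1}{2} \int_0^T |\dot\rho|^2(t) \, dt
  + \frac{1}{2} \int_0^T |\partial \mathcal{F}_\epsilon|^2(\rho_t) \, dt
```

    Here the first variation is δF_ε/δρ = log ρ + 1 + H/ε, |ρ̇| is the metric derivative in the Wasserstein distance and |∂F_ε| is the local slope. Under the usual chain-rule assumptions J_ε ≥ 0 for every admissible curve, and solutions are exactly the curves with J_ε = 0, which is why minimizing such a functional is equivalent to solving the equation.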

    Mapping twenty years of antimicrobial resistance research trends

    OBJECTIVE: Antimicrobial resistance (AMR) is a global threat to health and healthcare. In response to the growing AMR burden, research funding has also increased. However, a comprehensive overview of the research output, including conceptual, temporal, and geographical trends, is missing. Therefore, this study uses topic modelling, a machine learning approach, to reveal the scientific evolution of AMR research and its trends, and provides an interactive user interface for further analyses. METHODS: Structural topic modelling (STM) was applied to a text corpus resulting from a PubMed query comprising AMR articles (1999-2018). A topic network was established, and topic trends were analysed by frequency, proportion, and importance over time and space. RESULTS: In total, 88 topics were identified in 158,616 articles from 166 countries. AMR publications increased by 450% between 1999 and 2018, emphasizing the vibrancy of the field. Prominent topics in 2018 were Strategies for emerging resistances and diseases, Nanoparticles, and Stewardship. Emerging topics included Water and environment, and Sequencing. Geographical trends showed the prominence of Multidrug-resistant tuberculosis (MDR-TB) in the WHO African Region, corresponding with the MDR-TB burden. China and India have been growing contributors in recent years, behind the United States of America as the overall lead contributor. CONCLUSION: This study provides a comprehensive overview of the AMR research output, thereby revealing the AMR research response to the increased AMR burden. Both the results and the publicly available interactive database serve as a base to inform and optimise future research.
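    STM itself models how topic prevalence depends on document covariates such as publication year, and the abstract does not detail how the trends were computed. As a simplified stand-in, assuming each article already has per-topic proportions from a fitted topic model, a yearly trend in topic proportion can be tabulated by averaging those proportions per year. The function name and the numbers below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One document = (publication year, {topic label: proportion}); proportions
# are assumed to come from a fitted topic model and to sum to 1 per document.
Doc = Tuple[int, Dict[str, float]]


def topic_proportion_by_year(docs: List[Doc]) -> Dict[int, Dict[str, float]]:
    """Mean topic proportion per year; a topic absent from a document counts as 0."""
    totals: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    n_docs: Dict[int, int] = defaultdict(int)
    for year, theta in docs:
        n_docs[year] += 1
        year_totals = totals[year]  # create the year entry even if theta is empty
        for topic, share in theta.items():
            year_totals[topic] += share
    return {
        year: {topic: s / n_docs[year] for topic, s in by_topic.items()}
        for year, by_topic in sorted(totals.items())
    }


# Toy input: topic labels from the abstract, proportions invented.
docs = [
    (2017, {"Nanoparticles": 0.5, "Stewardship": 0.5}),
    (2018, {"Nanoparticles": 0.2, "Stewardship": 0.8}),
    (2018, {"Nanoparticles": 0.6, "Stewardship": 0.4}),
]
print(topic_proportion_by_year(docs))
```

    Dividing by the number of documents per year (rather than by the number of documents mentioning a topic) keeps each year's proportions comparable even when the total publication count grows, as it did for AMR research.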

    Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

    The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of the node, including the node itself. Using a large set of microarray expression data for the bacterium Escherichia coli, we find that gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutations. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets. Our study implies that the TRN consists of several scales of regulatory organization: 1) subnets with more variable gene expression, controlled by both transcription factors and post-transcriptional RNA regulation, and 2) subnets with less variable gene expression, having more feed-forward loops and less post-transcriptional RNA regulation. Comment: 14 pages, 8 figures, to be published in PLoS Computational Biology
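    The subnet definition above is a reachability query on the directed regulator-to-target graph. A minimal sketch follows, using breadth-first search plus an enumerator for feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z directly). The graph uses hypothetical gene names, and the function names are illustrative rather than taken from the study.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Directed TRN: regulator -> list of regulated targets.
Graph = Dict[str, List[str]]


def subnet(graph: Graph, root: str) -> Set[str]:
    """All nodes reachable from `root` along regulatory edges, including `root`."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def feed_forward_loops(graph: Graph) -> List[Tuple[str, str, str]]:
    """Triples (x, y, z) with edges x->y, y->z and x->z over three distinct nodes."""
    edges = {(u, v) for u, targets in graph.items() for v in targets}
    loops = []
    for x, targets in graph.items():
        for y in targets:
            if y == x:  # skip autoregulation
                continue
            for z in graph.get(y, []):
                if z not in (x, y) and (x, z) in edges:
                    loops.append((x, y, z))
    return loops


toy = {"A": ["B", "C"], "B": ["C"], "D": ["E"], "E": ["F"]}
print(sorted(subnet(toy, "A")))  # ['A', 'B', 'C']
print(feed_forward_loops(toy))   # [('A', 'B', 'C')]
```

    Counting, for each subnet, the loops whose three nodes all lie inside it gives one possible per-subnet feed-forward-loop measure; the abstract does not spell out how its "fraction" is normalized.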