
    Extraction of Transcript Diversity from Scientific Literature

    Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/

    Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction

    Background: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions), as defined in ENTREZ-Gene, from a MEDLINE record. The inputs for this task are a gene and a pointer to a MEDLINE reference. The suggested approach merges two independent sentence extraction strategies. The first strategy (LASt) uses argumentative features inspired by discourse-analysis models. The second (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims to reduce the size of the selected segment by filtering out rhetorical phrases that carry no content.
    Results: On the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy alone is already competitive (52.78%). The combined approach clearly improves extraction, achieving a Dice score of over 57% (+10%).
    Conclusions: Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.
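The GeneRiF results above are reported as Dice scores between an extracted sentence and the reference GeneRiF. As a rough illustration only, the Python sketch below computes a plain set-based, word-level Dice coefficient, 2|A ∩ B| / (|A| + |B|). The tokenizer and the helper names `tokenize` and `dice_score` are assumptions; the official TREC-2003 Genomics evaluation may differ in tokenization, stopword handling, and n-gram choice, so this is not the scoring script used in the paper.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def dice_score(candidate, reference):
    """Set-based Dice coefficient 2|A & B| / (|A| + |B|) over word types."""
    a, b = tokenize(candidate), tokenize(reference)
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    reference = "p53 induces apoptosis in tumor cells"
    candidate = "We show that p53 induces apoptosis"
    print(round(dice_score(candidate, reference), 3))
```

Using sets of word types keeps the sketch short; a count-based (multiset) variant would treat repeated words differently.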

    Creating, Modeling, and Visualizing Metabolic Networks

    Metabolic networks combine metabolism and regulation. These complex networks are difficult to understand and to create because of the diverse types of information that must be represented. This chapter describes a suite of interlinked tools for developing, displaying, and modeling metabolic networks. The metabolic network interactions database, MetNetDB, contains information on regulatory and metabolic interactions derived from a combination of web databases and input from biologists in their areas of expertise. PathBinderA mines the biological “literaturome” by searching for new interactions, or for supporting evidence for existing interactions, in metabolic networks. Sentences from abstracts are ranked by the likelihood that they describe an interaction, and this evidence is combined with evidence provided by other sentences. FCModeler, a publicly available software package, enables biologists to visualize and model metabolic and regulatory network maps. FCModeler aids in developing and evaluating hypotheses and provides a modeling framework for assessing the large amounts of data captured by high-throughput gene expression experiments.
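The ranking-and-combination step described for PathBinderA can be illustrated with a small Python sketch. It is not PathBinderA's algorithm: the cue-word list, the weights, and the noisy-OR rule for merging evidence (a sentence, or a molecule pair across sentences, counts as unsupported only if every piece of evidence fails) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction cue words and weights (illustrative only, not PathBinderA's).
INTERACTION_TERMS = {
    "binds": 0.6,
    "phosphorylates": 0.7,
    "activates": 0.5,
    "inhibits": 0.5,
    "interacts": 0.6,
}


def _clean(token):
    """Strip surrounding punctuation from a whitespace-delimited token."""
    return token.strip(".,;:()")


def sentence_score(sentence):
    """Noisy-OR over matched cue words: 1 - prod(1 - weight)."""
    p_none = 1.0
    for token in sentence.split():
        weight = INTERACTION_TERMS.get(_clean(token).lower())
        if weight is not None:
            p_none *= 1.0 - weight
    return 1.0 - p_none


def rank_pairs(sentences, molecules):
    """Combine sentence scores (noisy-OR) for every molecule pair co-mentioned in a sentence."""
    p_unsupported = defaultdict(lambda: 1.0)
    for sentence in sentences:
        tokens = {_clean(t) for t in sentence.split()}
        present = sorted(m for m in molecules if m in tokens)
        score = sentence_score(sentence)
        for pair in combinations(present, 2):
            p_unsupported[pair] *= 1.0 - score
    scores = {pair: 1.0 - p for pair, p in p_unsupported.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    abstracts = [
        "PKA phosphorylates CREB in neurons.",
        "CREB binds CBP.",
        "PKA and CREB were both detected.",
    ]
    for pair, score in rank_pairs(abstracts, {"PKA", "CREB", "CBP"}):
        print(pair, round(score, 2))
```

Noisy-OR is used here only because it lets several weak sentences add up to stronger support without exceeding 1; a learned classifier could replace `sentence_score` without changing `rank_pairs`.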

    An Integrated Web-based System for MEDLINE Analysis: A Case Study of Chronic Kidney Disease

    In the era of big data, medical researchers apply analysis techniques such as machine learning and text mining to large-scale corpora to save labor and time. Consequently, many data analysis platforms, such as PubTator, GeneWays, and BioContext, have been built to support medical professionals. These platforms help with medical entity recognition and relation extraction, but no integrated platform supports researchers’ various needs, and medical projects remain isolated from each other, which makes them hard to share and reuse. We therefore present an integrated system that combines ‘named entity recognition’, ‘document categorization’, and ‘association extraction’. In addition, we introduce the concept of ‘socialization’, which makes projects reusable for further analyses. A case study of chronic kidney disease illustrates the effectiveness of the proposed system.

    PPLook: an automated data mining tool for protein-protein interaction

    Background: Extracting and visualizing protein-protein interactions (PPIs) from the literature is an important topic in protein science, as it helps identify interactions among proteins. However, tools that extract PPIs and then visualize and classify the results are lacking.
    Results: We developed a PPI search system, termed PPLook, which automatically extracts and visualizes PPIs from text. Given a query protein name, PPLook searches a dataset for other proteins that interact with it by using a keyword-dictionary pattern-matching algorithm, and displays topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook lets users view the interaction relationships among the proteins in three-dimensional space, based on the OpenGL graphics interface. PPLook can also select a protein semantic class, count the proteins of that semantic class that interact with the query protein, and count the articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface.
    Conclusions: PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available to non-commercial users at http://meta.usc.edu/softs/PPLook
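A minimal Python sketch of keyword-dictionary pattern matching followed by the three topological parameters named for PPLook (nodes, edges, connected components). The protein and keyword dictionaries, the sentence-level co-occurrence rule, and the function names are hypothetical; the abstract does not describe PPLook's actual dictionaries or matching rules.

```python
import re
from itertools import combinations

# Hypothetical dictionaries; PPLook's real protein and keyword lists are not given.
PROTEINS = {"RAD51", "BRCA2", "BRCA1", "BARD1", "TP53", "MDM2"}
KEYWORDS = {"interacts", "binds", "associates", "complex"}


def extract_interactions(sentences):
    """Return protein pairs co-occurring in a sentence that also contains an interaction keyword."""
    edges = set()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        if not KEYWORDS & {t.lower() for t in tokens}:
            continue  # no interaction keyword: skip this sentence
        found = sorted(PROTEINS & tokens)
        edges.update(combinations(found, 2))
    return edges


def topology(edges):
    """Return (number of nodes, number of edges, number of connected components) via union-find."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)
    components = {find(node) for node in parent}
    return len(parent), len(edges), len(components)


def partners(query, edges):
    """Proteins linked to the query protein by at least one extracted edge."""
    return sorted({b if a == query else a for a, b in edges if query in (a, b)})


if __name__ == "__main__":
    corpus = [
        "BRCA2 binds RAD51 during repair.",
        "BRCA1 interacts with BARD1.",
        "MDM2 binds TP53.",
        "TP53 and BRCA1 were measured.",
        "BRCA1 binds BRCA2.",
    ]
    edges = extract_interactions(corpus)
    print(partners("BRCA1", edges), topology(edges))
```

Union-find keeps the component count close to linear in the number of edges; a breadth-first search over an adjacency list would work equally well.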

    Text Mining for Systems Biology and MetNet

    The rapidly expanding volume of biological and biomedical literature creates demand for more user-friendly access to it. Better automated mining of this literature can help find useful and relevant citations and can extract new knowledge from the massive biological literaturome. When met, the research objectives presented here will provide comprehensive text-mining utilities within the MetNet (Metabolic Network Exchange) platform (Wurtele et al., 2007) to help biologists visualize, explore, and analyze the biological literaturome. The overarching research question is how to automatically extract biomolecular interactions from large numbers of biomedical texts. The specific aims of this work are:
    1. Study the text empirics of interaction-indicating terms to find further clues for improving the algorithm currently used in PathBinder, so that it judges more precisely whether sentences from the biological literature describe biomolecular interactions.
    2. Based on these results, extract interacting biomolecule pairs from the literature and use those pairs to construct a biomolecular interaction database and network.
    3. Integrate the extraction of biomolecular interaction-indicating terms into MetNet's existing metabolomic network database.
    4. Apply all of the above results in the PathBinder software.
    5. Quantitatively evaluate the algorithms developed from the text-empirics results.
    This work is expected to advance systems biology by answering scientific questions about biological text empirics, by contributing to the engineering task of building MetNet and its key constituent subsystems, and by supporting the MetNet project through selected maintenance tasks.