14,592 research outputs found

    A text mining approach for the extraction of kinetic information from literature

    Systems biology has fostered interest in the use of kinetic models to better understand the dynamic behavior of metabolic networks under a wide variety of conditions. Unfortunately, in most cases the data available in existing databases are not sufficient for developing such models, since a significant part of the relevant information is still scattered across the literature. It is therefore essential to develop specific and powerful text mining tools for this purpose. In this context, the main objective of this work is the development of a text mining tool to extract, from the scientific literature, kinetic parameters, their respective values, and their relations with enzymes and metabolites. The proposed pipeline integrates the development of a novel plug-in for the text mining tool @Note2. Overall, the results validate the developed approach.

    KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

    Background: The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured way, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed. Description: Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki, kcat, etc., as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and combine them with other data in the abstract. A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched. The results were stored in a database and are available as "KID the KInetic Database" via the internet. Conclusions: The presented algorithm delivers a considerable amount of information and may therefore help accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is based entirely on text mining and therefore complements manually curated databases. The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and is available on request from the author.

    Sustainability and Food: a Text Analysis of the Scientific Literature

    The paper analyses the evolution of the research debate on sustainability and on the relation between food and sustainability. A number of text analysis techniques were combined to investigate scientific papers. The results show that the discourse on sustainability in the pre-Rio period is mostly associated with agriculture and with a vision in which ecological and environmental aspects are dominant. In the post-Rio phase, the discussion about sustainability, though still strongly linked to environmental issues, takes on a holistic dimension that includes social elements. The themes of energy and the sustainability of urban areas become central, and the scientific debate stresses the importance of indicators within an assessment approach linked to the relevance of planning and intervention aspects. The focus on the role of food within the debate on sustainability highlights a food-security-oriented approach in the pre-Rio phase, with particular attention to agriculture and third world countries. In the post-Rio period, the focus of the analysis moves towards developed countries. Even though food security remains a strongly significant element of the debate, the attention shifts towards consumers and food choices.

    Extracting kinetic information from literature with KineticRE

    To better understand the dynamic behavior of metabolic networks under a wide variety of conditions, the field of Systems Biology has shown increasing interest in the use of kinetic models. The databases currently available do not contain enough data on this topic. Given that a significant part of the information relevant to the development of such models is still spread widely across the literature, it becomes essential to develop specific and powerful text mining tools to collect these data. In this context, the main objective of this work is the development of a text mining tool to extract, from the scientific literature, kinetic parameters, their respective values, and their relations with enzymes and metabolites. The proposed approach integrates the development of a novel plug-in for the text mining framework @Note2. Finally, the developed pipeline was validated with a case study on Kluyveromyces lactis, spanning the analysis and results of 20 full-text documents. The work was funded by National Funds through the FCT (Portuguese Foundation for Science and Technology) within project ref. PTDC/QUI-BIQ/119657/2010 “Finding the naturally evolved design principles of prevalent metabolic circuits”. The authors would like to thank the FCT Strategic Project PEst-OE/EQB/LA0023/2013 and the Projects “BioInd - Biotechnology and Bioengineering for improved Industrial and Agro-Food processes”, REF. NORTE07-0124-FEDER-000028 and “PEM Metabolic Engineering Platform”, project number 23060, both co-funded by the Programa Operacional Regional do Norte (ON.2 O Novo Norte), QREN, FEDER.

    Semantic text mining support for lignocellulose research

    Biofuels produced from biomass are considered to be promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuel production requires the use of enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the identification and characterization of the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties. Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brings measurable improvements to the users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge from information on fungal enzymes that is available in the literature and genome resources. Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines integrated in a Web-based interface to assist them in two main tasks: mining the literature for relevant knowledge, and at the same time providing rich and semantically linked information.

    Intelligent Management and Efficient Operation of Big Data

    This chapter details how Big Data can be used and implemented in networking and computing infrastructures. Specifically, it addresses three main aspects: the timely extraction of relevant knowledge from heterogeneous and very often unstructured large data sources; the enhancement of the performance of the processing and networking (cloud) infrastructures that are the most important foundational pillars of Big Data applications or services; and novel ways to efficiently manage network infrastructures with high-level composed policies that support the transmission of large amounts of data with distinct requirements (video vs. non-video). A case study involving an intelligent management solution to route data traffic with diverse requirements in a wide area Internet Exchange Point is presented, discussed in the context of Big Data, and evaluated. Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201

    Structuring the Unstructured: Unlocking pharmacokinetic data from journals with Natural Language Processing

    The development of a new drug is an increasingly expensive and inefficient process. Many drug candidates are discarded due to pharmacokinetic (PK) complications detected at clinical phases. It is critical to accurately estimate the PK parameters of new drugs before being tested in humans since they will determine their efficacy and safety outcomes. Preclinical predictions of PK parameters are largely based on prior knowledge from other compounds, but much of this potentially valuable data is currently locked in the format of scientific papers. With an ever-increasing amount of scientific literature, automated systems are essential to exploit this resource efficiently. Developing text mining systems that can structure PK literature is critical to improving the drug development pipeline. This thesis studied the development and application of text mining resources to accelerate the curation of PK databases. Specifically, the development of novel corpora and suitable natural language processing architectures in the PK domain were addressed. The work presented focused on machine learning approaches that can model the high diversity of PK studies, parameter mentions, numerical measurements, units, and contextual information reported across the literature. Additionally, architectures and training approaches that could efficiently deal with the scarcity of annotated examples were explored. The chapters of this thesis tackle the development of suitable models and corpora to (1) retrieve PK documents, (2) recognise PK parameter mentions, (3) link PK entities to a knowledge base and (4) extract relations between parameter mentions, estimated measurements, units and other contextual information. Finally, the last chapter of this thesis studied the feasibility of the whole extraction pipeline to accelerate tasks in drug development research. 
The results from this thesis demonstrated the potential of text mining approaches to automatically generate PK databases that can aid researchers in the field and ultimately accelerate the drug development pipeline. Additionally, the thesis presented contributions to biomedical natural language processing by developing suitable architectures and corpora for multiple tasks, tackling novel entities and relations within the PK domain.