
    Semantic text mining support for lignocellulose research

    Biofuels produced from biomass are considered promising sustainable alternatives to fossil fuels. Converting lignocellulose into fermentable sugars for biofuels production requires enzyme cocktails that can hydrolyze lignocellulosic biomass efficiently and economically. As many fungi naturally break down lignocellulose, identifying and characterizing the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties. Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them into a coherent system that brings measurable improvements to its users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge about fungal enzymes from the literature and genome resources. Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines, integrated into a Web-based interface, to assist them in two main tasks: mining the literature for relevant knowledge and, at the same time, providing rich and semantically linked information.

    mycoCLAP, the database for characterized lignocellulose-active proteins of fungal origin: resource and text mining curation support

    Enzymes active on components of lignocellulosic biomass are used for industrial applications ranging from food processing to biofuels production. These include a diverse array of glycoside hydrolases, carbohydrate esterases, polysaccharide lyases and oxidoreductases. Fungi are prolific producers of these enzymes, spurring fungal genome sequencing efforts to identify and catalogue the genes that encode them. To facilitate the functional annotation of these genes, biochemical data on over 800 fungal lignocellulose-degrading enzymes have been collected from the literature and organized into the searchable database, mycoCLAP (http://mycoclap.fungalgenomics.ca). First implemented in 2011, and updated as described here, mycoCLAP is capable of ranking search results according to closest biochemically characterized homologues: this improves the quality of the annotation, and significantly decreases the time required to annotate novel sequences. The database is freely available to the scientific community, as are the open source applications based on natural language processing developed to support the manual curation of mycoCLAP. Database URL: http://mycoclap.fungalgenomics.ca

    On the R&D Landscape Evolution in Catalytic Upgrading of Biomass

    For the last decade, the scientific community and industrial developers have been searching for improved methods to convert biomass into valuable products in response to growing sustainability concerns. Catalysts play an essential role at the core of the many technological routes that convert complex biomass into fuels or chemicals for everyday use. This chapter reports on the evolution of the catalytic conversion of biomass by exploring scientific literature and patent databases (Scopus and DWPI). A trend analysis of more than 14,000 patent and non-patent documents on the catalytic conversion of renewable biological feedstocks was carried out with Intellixir, a statistical tool for analyzing large volumes of data for scientific intelligence. The aim of this chapter is not only to present a comprehensive study of the patent and non-patent literature on the catalytic upgrading (value creation) of biomass, but also to raise awareness of patent literature as a tool for reaching the rich, openly accessible body of knowledge in various technological fields.

    Machine Learning for Biomedical Literature Triage

    This paper presents a machine learning system to support triage, the first task of the manual curation process for biological literature. We compare the performance of various classification models, experimenting with dataset sampling factors and a set of features, as well as three machine learning algorithms (Naive Bayes, Support Vector Machine and Logistic Model Trees). The results show that the model best suited to the imbalanced datasets of the triage classification task is obtained by using domain-relevant features, an under-sampling technique, and the Logistic Model Trees algorithm.
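
The abstract does not state which under-sampling method or ratio was used. As an illustration only, the following minimal Python sketch shows plain random under-sampling for a binary triage task, assuming "relevant"/"irrelevant" labels; the `undersample` helper and its `ratio` parameter are hypothetical, not the paper's code.

```python
import random
from collections import Counter


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class examples (hypothetical helper).

    Keeps every minority-class example and at most ratio * n_minority
    majority-class examples, preserving the original order.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # Sort classes by frequency: the first is the minority, the second the majority.
    (minority, n_min), (majority, _) = sorted(counts.items(), key=lambda kv: kv[1])
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    n_keep = min(len(majority_idx), int(n_min * ratio))
    keep_major = set(rng.sample(majority_idx, n_keep))
    kept = [i for i, y in enumerate(labels) if y == minority or i in keep_major]
    return [examples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    X = list(range(100))  # stand-ins for document feature vectors
    y = ["relevant"] * 10 + ["irrelevant"] * 90
    Xs, ys = undersample(X, y, ratio=1.0)
    print(Counter(ys))
```

The balanced sample would then be passed to whichever classifier is being compared; under-sampling is applied only to training data, so evaluation still reflects the real class imbalance.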

    Integration of and Access to Distributed Data and Tools in Genomics

    Protein and nucleotide sequences are among the most important data sources in bioinformatics: many programs take them as input to analyze them individually or collectively. A large number of protein sequences are scattered over many different databases, which complicates feeding them into existing analysis programs. Mobyle, a program integration portal, makes common programs available to users through a unified interface and also allows the results of one program to be chained into another. The two existing Mobyle programs that fetch sequences for other programs, however, retrieve them only from a limited number of databases statically defined by the Mobyle administrator. In addition, neither tool has access to DAS servers, so a major data source is lost. In this work, a program called DasSeqFetcher was developed and integrated into Mobyle to dynamically fetch sequences from all available sequence databases that provide a DAS reference server. DAS reference and annotation servers were also developed for a database built by our research group that holds experimentally characterized lignocellulose-active proteins. These reference servers can then be added to the DAS registry for use by DAS client tools such as DasSeqFetcher.
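
To make the DAS fetching step concrete, here is a minimal Python sketch of what a client such as DasSeqFetcher might do. It assumes the DAS 1.x `sequence` command form `<server>/das/<source>/sequence?segment=<id>[:start,stop]` and a `DASSEQUENCE` XML response containing one `SEQUENCE` element per segment. The helper names, server URL and source name are hypothetical; this is not DasSeqFetcher's actual code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def das_sequence_url(server, source, ref, start=None, stop=None):
    """Build a DAS 1.x 'sequence' request URL (hypothetical helper)."""
    # A segment is a reference ID, optionally followed by ':start,stop'.
    segment = ref if start is None else f"{ref}:{start},{stop}"
    # Keep ':' and ',' literal; percent-encode any other unsafe characters.
    return (f"{server.rstrip('/')}/das/{quote(source)}/sequence"
            f"?segment={quote(segment, safe=':,')}")


def parse_das_sequences(xml_text):
    """Map each SEQUENCE element's id to its residues, whitespace removed."""
    root = ET.fromstring(xml_text)
    return {
        seq.get("id"): "".join((seq.text or "").split())
        for seq in root.iter("SEQUENCE")
    }


if __name__ == "__main__":
    # Illustrative values only; no real server is contacted here.
    print(das_sequence_url("http://das.example.org/", "mycoclap", "EXAMPLE_ID", 1, 50))
    sample = ('<DASSEQUENCE><SEQUENCE id="EXAMPLE_ID" start="1" stop="8" '
              'version="1.0">MKV\nLLAG\nA</SEQUENCE></DASSEQUENCE>')
    print(parse_das_sequences(sample))
```

In a real client the URL would be retrieved (for example with `urllib.request.urlopen`) and the response body passed to `parse_das_sequences`; which servers to query would come from the DAS registry rather than a static list, which is the limitation the abstract describes.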

    Poster abstract research showcase College of Science and Technology

    Welcome to the College of Science and Technology Research and Innovation Showcase 2014, an event that celebrates the research achievements of our science disciplines. Our research brings together scientists from architecture and the built environment through to computing, engineering, mathematics and physics, and biology, geography and environmental science. We are committed to building on our strengths, and our key vision is to drive research growth and impact by exploiting the synergy between research, innovation and enterprise. This year’s showcase includes over 70 posters illustrating the excellent research being pursued, a Dean's prize recognising the achievements of an early-career researcher, prizes for the best student and best students’ posters and journal papers, and these proceedings of abstracts showing the high quality and range of research in the College of Science and Technology.

    A General Architecture to Enhance Wiki Systems with Natural Language Processing Techniques

    Wikis are web-based software applications that allow users to collaboratively create and edit web page content through a Web browser using a simplified syntax. The ease of use and “open” philosophy of wikis have brought them to the attention of organizations and online communities, leading to widespread adoption as a simple and “quick” way of managing knowledge collaboratively. However, these characteristics of wiki systems can act as a double-edged sword: when wiki content is not properly structured, it can turn into a “tangle of links”, making navigation, organization and content retrieval difficult for end users. Since wiki content is mostly written in unstructured natural language, we believe that state-of-the-art techniques from the Natural Language Processing (NLP) and Semantic Computing domains can help mitigate these common problems and improve users’ experience by introducing new features. The challenge, however, is to find a way to integrate novel semantic analysis algorithms into the multitude of existing wiki systems without modifying their engines. In this research work, we present a general architecture that allows wiki systems to benefit from NLP services made available through the Semantic Assistants framework, a service-oriented architecture for brokering NLP pipelines as web services. The main contributions of this thesis include an analysis of wiki engines, the development of collaboration patterns between wikis and NLP, and the design of a cohesive integration architecture. As a concrete application, we deployed our integration to MediaWiki, the powerful wiki engine behind Wikipedia, to demonstrate its practicability. Finally, we evaluate the usability and efficiency of our integration through a number of user studies performed in real-world projects from various domains, including cultural heritage data management, software requirements engineering, and biomedical literature curation.

    Systems Biology Knowledgebase for a New Era in Biology: A Genomics:GTL Report from the May 2008 Workshop
