30 research outputs found

    Open Science: Tools, approaches, and implications

    The Pacific Symposium on Biocomputing (PSB) is an annual meeting whose topics are determined by proposals submitted by members of the community. This document is the proposal for a session on Open Science, submitted for consideration for the PSB meeting in 2009.

    ISMB 2008 Toronto

    The International Society for Computational Biology (ISCB) presents the Sixteenth International Conference on Intelligent Systems for Molecular Biology (ISMB 2008), to be held in Toronto, Canada, July 19–23, 2008. Now in the final phases of scheduling selected presentations, demonstrations, and posters, the organizers are preparing what will likely be recognized as the premier conference on computational biology in 2008. ISMB 2008 (http://www.iscb.org/ismb2008/) will follow the road paved by ISMB/ECCB 2007 (http://www.iscb.org/ismbeccb2007/) in Vienna in an attempt to encourage increased participation from previously under-represented disciplines of computational biology. This conference will feature the best of the computer and life sciences through a variety of core sessions running in multiple parallel tracks, along with single-tracked Keynote Presentations, posters on display throughout the conference, and an extensive commercial exposition. The first day (July 18) of the meeting is reserved for two-day Special Interest Group (SIG) and Satellite meetings; the second day (July 19) runs SIGs for the first time in parallel with Tutorials and the Student Council Symposium; and, also for the first time, two SIGs are running in parallel with the main ISMB meeting (July 20–23).

    Foreword

    The aim of this Workshop is to focus on building and evaluating resources used to facilitate biomedical text mining, including their design, update, delivery, quality assessment, evaluation and dissemination. Key resources of interest are lexical and knowledge repositories (controlled vocabularies, terminologies, thesauri, ontologies) and annotated corpora, including both task-specific resources and repositories reengineered from biomedical or general language resources. Of particular interest is the process of building annotated resources, including designing guidelines and annotation schemas (aiming at both syntactic and semantic interoperability) and relying on language engineering standards. Challenging aspects are the updating and evolution management of resources, as well as their documentation, dissemination and evaluation.

    EUFORIA : European forest research and innovation area : programme and book of abstracts

    A Dependency Parsing Approach to Biomedical Text Mining

    Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. More specifically, the work described here pursues the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken uses full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can be further generalized to apply to other domains as well. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
    To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unifying diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
    Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

    Access to genetic resources and sharing of benefits arising out of their utilization : a critical analysis of the contribution of the Nagoya Protocol to the existing international regime on access and benefit-sharing.

    Thesis (LL.M.)-University of KwaZulu-Natal, Pietermaritzburg, 2012. Prior to the commencement of the Convention on Biological Diversity (CBD), genetic resources were considered to be the common heritage of mankind; this principle gave developed countries the right to obtain and freely use the genetic material of developing countries. Growing concern over the controversial ‘free access’ system and the monopolization of benefits led to the negotiation of an international treaty, the CBD, to regulate access to genetic resources and the sharing of benefits resulting from the utilisation of such resources. The CBD makes some important innovations. It recognizes that the authority to determine access to genetic resources rests with national governments and is subject to national legislation. Thus, the CBD recognizes state sovereignty over genetic resources and institutes the principles of Prior Informed Consent (PIC), Mutually Agreed Terms and Benefit-Sharing. However, the CBD and other international instruments relating to genetic resources have not had the desired effect of preventing the misappropriation of genetic resources and associated traditional knowledge (TK). Developing countries suffered and continue to suffer from the piracy of their resources. This state of affairs has led to the recent adoption of the ‘Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity’ (2010 Nagoya Protocol). This dissertation will consider the contribution of the Nagoya Protocol to the existing global and regional instruments concerning access to genetic resources and benefit-sharing. After explaining the gaps in the existing instruments, it will explore whether the Protocol is a miracle solution to the recurrent concern over misappropriation of genetic resources from biologically rich countries, or whether there is still much work to do to resolve this problem.

    Automating the gathering of relevant information from biomedical text

    More and more, database curators rely on literature-mining techniques to help them gather and make use of the knowledge encoded in text documents. This thesis investigates how an assisted annotation process can help and explores the hypothesis that it is only with respect to full-text publications that a system can tell relevant and irrelevant facts apart by studying their frequency. A semi-automatic annotation process was developed for a particular database, the Nuclear Protein Database (NPD), based on a set of full-text articles newly annotated with regard to subnuclear protein localisation, along with eight lexicons. The annotation process is carried out online, retrieving relevant documents (abstracts and full-text papers) and highlighting sentences of interest in them. The process also offers a summary table of the facts found, clustered by type of information. Each method involved in each step of the tool is evaluated using cross-validation results on the training data as well as test set results. The performance of the final tool, called the “NPD Curator System Interface”, is estimated empirically in an experiment where the NPD curator updates the database with pieces of information found relevant in 31 publications using the interface. A final experiment complements our main methodology by showing its extensibility to retrieving information on protein function rather than localisation. I argue that the general methods, the results they produced and the discussions they engendered are useful for any subsequent attempt to generate semi-automatic database annotation processes. The annotated corpora, gazetteers, methods and tool are available on request from the author ([email protected]).