45 research outputs found

    Natural language processing in aid of FlyBase curators.

    BACKGROUND: Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. RESULTS: PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the amount of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. CONCLUSION: We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

    The Drosophila phenotype ontology

    BACKGROUND: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes

    BC4GO: a full-text corpus for the BioCreative IV GO task

    Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ~10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need of using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community

    Towards FAIR principles for research software

    The FAIR Guiding Principles, published in 2016, aim to improve the findability, accessibility, interoperability and reusability of digital research objects for both humans and machines. Until now the FAIR principles have been mostly applied to research data. The ideas behind these principles are, however, also directly relevant to research software. Hence there is a distinct need to explore how the FAIR principles can be applied to software. In this work, we aim to summarize the current status of the debate around FAIR and software, as basis for the development of community-agreed principles for FAIR research software in the future. We discuss what makes software different from data with regard to the application of the FAIR principles, and which desired characteristics of research software go beyond FAIR. Then we present an analysis of where the existing principles can directly be applied to software, where they need to be adapted or reinterpreted, and where the definition of additional principles is required. Here interoperability has proven to be the most challenging principle, calling for particular attention in future discussions. Finally, we outline next steps on the way towards definite FAIR principles for research software

    Drosophila Neurotrophins Reveal a Common Mechanism for Nervous System Formation

    Neurotrophic interactions occur in Drosophila, but to date, no neurotrophic factor had been found. Neurotrophins are the main vertebrate secreted signalling molecules that link nervous system structure and function: they regulate neuronal survival, targeting, synaptic plasticity, memory and cognition. We have identified a neurotrophic factor in flies, Drosophila Neurotrophin (DNT1), structurally related to all known neurotrophins and highly conserved in insects. By investigating with genetics the consequences of removing DNT1 or adding it in excess, we show that DNT1 maintains neuronal survival, as more neurons die in DNT1 mutants and expression of DNT1 rescues naturally occurring cell death, and it enables targeting by motor neurons. We show that Spätzle and a further fly neurotrophin superfamily member, DNT2, also have neurotrophic functions in flies. Our findings imply that most likely a neurotrophin was present in the common ancestor of all bilateral organisms, giving rise to invertebrate and vertebrate neurotrophins through gene or whole-genome duplications. This work provides a missing link between aspects of neuronal function in flies and vertebrates, and it opens the opportunity to use Drosophila to investigate further aspects of neurotrophin function and to model related diseases

    Ten simple rules for making training materials FAIR

    Author summary: Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it’s sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They’re often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all

    The Ontogeny and Evolution of Sex-Biased Gene Expression in Drosophila melanogaster

    Sexually dimorphic phenotypes are thought to largely result from sex differences in gene expression, and genes with sex-biased expression have been well characterized in adults of many species. Although most sexual dimorphisms manifest in adults, many result from sex-specific developmental trajectories, implying that juveniles may exhibit significant levels of sex-biased expression. However, it is unclear how much sex-biased expression occurs before reproductive maturity and whether preadult sex-biased genes should exhibit the same evolutionary dynamics observed for adult sex-biased genes. In order to understand the continuity, or lack thereof, and evolutionary dynamics of sex-biased expression throughout the life cycle, we examined sex-biased genes in pre-gonad tissue of two preadult stages and compared them with the adult gonad of Drosophila melanogaster. We found that the majority of the genome is sex-biased at some point in the life cycle, with some genes exhibiting conserved sex-biased expression and others displaying stage-specific sex bias. Our results also reveal a far more complex pattern of evolution for sex-biased genes throughout development. The most rapid evolutionary divergence occurred in genes expressed only in larvae within each sex, compared with continuously expressed genes. In females—but not males—this pattern appeared to be due to relaxed purifying selection in larva-limited genes. Furthermore, genes that retained male bias throughout life evolved more rapidly than stage-specific male-biased genes, due to stronger purifying selection in stage-specific genes. However, female-biased genes that were specific to larvae evolved most rapidly, a pattern that could not be definitively attributed to differences in adaptive evolution or purifying selection, suggesting that pleiotropic constraints on protein-coding sequences can arise when genes are broadly expressed across developmental stages. These results indicate that the signature of sex-specific selection can be detected well before reproductive maturity and is strongest during development

    Overview of the interactive task in BioCreative V

    Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested