46 research outputs found

    Natural language processing in aid of FlyBase curators.

    BACKGROUND: Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. RESULTS: PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the amount of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. CONCLUSION: We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

    FAIR, ethical, and coordinated data sharing for COVID-19 response: a scoping review and cross-sectional survey of COVID-19 data sharing platforms and registries

    Data sharing is central to the rapid translation of research into advances in clinical medicine and public health practice. In the context of COVID-19, there has been a rush to share data marked by an explosion of population-specific and discipline-specific resources for collecting, curating, and disseminating participant-level data. We conducted a scoping review and cross-sectional survey to identify and describe COVID-19-related platforms and registries that harmonise and share participant-level clinical, omics (eg, genomic and metabolomic data), imaging data, and metadata. We assess how these initiatives map to the best practices for the ethical and equitable management of data and the findable, accessible, interoperable, and reusable (FAIR) principles for data resources. We review gaps and redundancies in COVID-19 data-sharing efforts and provide recommendations to build on existing synergies that align with frameworks for effective and equitable data reuse. We identified 44 COVID-19-related registries and 20 platforms from the scoping review. Data-sharing resources were concentrated in high-income countries and siloed by comorbidity, body system, and data type. Resources for harmonising and sharing clinical data were less likely to implement FAIR principles than those sharing omics or imaging data. Our findings are that more data sharing does not equate to better data sharing, and the semantic and technical interoperability of platforms and registries harmonising and sharing COVID-19-related participant-level data needs to improve to facilitate the global collaboration required to address the COVID-19 crisis

    The Drosophila phenotype ontology

    BACKGROUND: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes

    BC4GO: a full-text corpus for the BioCreative IV GO task

    Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼10% of relevant evidence sentences and 30% distinct GO terms, while the Results/Experiment section has nearly 60% relevant sentences and >70% GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need of using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community

    Towards FAIR principles for research software

    The FAIR Guiding Principles, published in 2016, aim to improve the findability, accessibility, interoperability and reusability of digital research objects for both humans and machines. Until now the FAIR principles have been mostly applied to research data. The ideas behind these principles are, however, also directly relevant to research software. Hence there is a distinct need to explore how the FAIR principles can be applied to software. In this work, we aim to summarize the current status of the debate around FAIR and software, as basis for the development of community-agreed principles for FAIR research software in the future. We discuss what makes software different from data with regard to the application of the FAIR principles, and which desired characteristics of research software go beyond FAIR. Then we present an analysis of where the existing principles can directly be applied to software, where they need to be adapted or reinterpreted, and where the definition of additional principles is required. Here interoperability has proven to be the most challenging principle, calling for particular attention in future discussions. Finally, we outline next steps on the way towards definite FAIR principles for research software

    Drosophila Neurotrophins Reveal a Common Mechanism for Nervous System Formation

    Neurotrophic interactions occur in Drosophila, but to date, no neurotrophic factor had been found. Neurotrophins are the main vertebrate secreted signalling molecules that link nervous system structure and function: they regulate neuronal survival, targeting, synaptic plasticity, memory and cognition. We have identified a neurotrophic factor in flies, Drosophila Neurotrophin (DNT1), structurally related to all known neurotrophins and highly conserved in insects. By investigating with genetics the consequences of removing DNT1 or adding it in excess, we show that DNT1 maintains neuronal survival, as more neurons die in DNT1 mutants and expression of DNT1 rescues naturally occurring cell death, and it enables targeting by motor neurons. We show that Spätzle and a further fly neurotrophin superfamily member, DNT2, also have neurotrophic functions in flies. Our findings imply that most likely a neurotrophin was present in the common ancestor of all bilateral organisms, giving rise to invertebrate and vertebrate neurotrophins through gene or whole-genome duplications. This work provides a missing link between aspects of neuronal function in flies and vertebrates, and it opens the opportunity to use Drosophila to investigate further aspects of neurotrophin function and to model related diseases

    Ten simple rules for making training materials FAIR

    Author summary: Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it’s sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They’re often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all

    The Ontogeny and Evolution of Sex-Biased Gene Expression in Drosophila melanogaster

    Sexually dimorphic phenotypes are thought to largely result from sex differences in gene expression, and genes with sex-biased expression have been well characterized in adults of many species. Although most sexual dimorphisms manifest in adults, many result from sex-specific developmental trajectories, implying that juveniles may exhibit significant levels of sex-biased expression. However, it is unclear how much sex-biased expression occurs before reproductive maturity and whether preadult sex-biased genes should exhibit the same evolutionary dynamics observed for adult sex-biased genes. In order to understand the continuity, or lack thereof, and evolutionary dynamics of sex-biased expression throughout the life cycle, we examined sex-biased genes in pre-gonad tissue of two preadult stages and compared them with the adult gonad of Drosophila melanogaster. We found that the majority of the genome is sex-biased at some point in the life cycle, with some genes exhibiting conserved sex-biased expression and others displaying stage-specific sex bias. Our results also reveal a far more complex pattern of evolution for sex-biased genes throughout development. The most rapid evolutionary divergence occurred in genes expressed only in larvae within each sex, compared with continuously expressed genes. In females—but not males—this pattern appeared to be due to relaxed purifying selection in larva-limited genes. Furthermore, genes that retained male bias throughout life evolved more rapidly than stage-specific male-biased genes, due to stronger purifying selection in stage-specific genes. However, female-biased genes that were specific to larvae evolved most rapidly, a pattern that could not be definitively attributed to differences in adaptive evolution or purifying selection, suggesting that pleiotropic constraints on protein-coding sequences can arise when genes are broadly expressed across developmental stages. These results indicate that the signature of sex-specific selection can be detected well before reproductive maturity and is strongest during development