
    Infrastructure for synthetic health data

    editorial reviewed
    Machine learning (ML) methods are becoming ever more prevalent across all domains of the life sciences. However, a key component of effective ML is the availability of large datasets that are diverse and representative. In the context of health systems, with significant heterogeneity of clinical phenotypes and diversity of healthcare systems, there is a need to develop and refine unbiased and fair ML models. Synthetic data are increasingly being used to protect patients' right to privacy and to overcome the paucity of annotated open-access medical data. Here, we present our proof of concept for the generation of synthetic health data and our proposed FAIR implementation of the generated synthetic datasets. The work was developed during and after the one-week BioHackathon Europe by a total of 20 participants (10 new to the project) from different countries (NL, ES, LU, UK, GR, FL, DE, ...).

    Mark2Cure: Learn, Work, Help

    No full text
    At 26 million articles and growing, knowledge extraction from biomedical literature is an important big data problem. Mark2Cure trains citizen scientists to help tackle this problem in order to facilitate research on a rare disease known as NGLY1 deficiency. Learn about biomedical terms, biological processes, fascinating diseases, genes, and drugs from the same sources that scientists use, all while helping organize information relevant to a rare disease that makes children unable to shed tears when they cry. Training is provided via an online tutorial, and there is NO cost to participate. If you can READ, you can HELP.

    Coxsackievirus Persistence in the Neonatal Central Nervous System : Investigating the Interplay between the Host Response and Viral Persistence in Neural Stem and Progenitor Cells

    No full text
    Newborn infants are particularly vulnerable to neurotropic infections of coxsackievirus, which can cause serious central nervous system (CNS) diseases such as meningitis and encephalitis. Coxsackievirus is also capable of persisting in the host CNS for extended periods of time; however, the mechanism by which the virus evades clearance by the host remains unclear. In vivo models of coxsackievirus infection have previously revealed that virus isolated from persistent infection is not infectious, suggesting that the virus evolves over the course of infection. To disaggregate the effects of the adaptive immune response and other complicating factors from the actual infection of the CNS, we aim to develop and use an in vitro model of coxsackievirus infection based on neural progenitor and stem cells (NPSCs) and a recombinant coxsackievirus B3 expressing enhanced GFP (eGFP). With this model, we hope to explore the interaction between the host innate immune response and the virus, and the impact of these interactions on the evolution of the virus and the development of disorders in the infected host CNS.

    Mark2Curator annotation submissions for NCBI disease corpus

    No full text
    An export of annotations submitted via Mark2Cure, a citizen science project aimed at empowering the public to help facilitate biomedical research. This dataset contains annotations of the NCBI disease corpus submitted by citizen scientists, in BioC XML format.
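    For readers unfamiliar with BioC, the sketch below shows one way to read annotations out of a BioC XML file with Python's standard library and check that each annotation's offsets point at its text. It assumes the common BioC layout (collection, document, passage, annotation, location); the sample document, the infon key "type", and the helper name read_annotations are hypothetical illustrations, not taken from the Mark2Cure export.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the general BioC layout; the infon key "type" and the
# annotation contents are illustrative, not taken from the Mark2Cure export.
SAMPLE = """<collection>
  <source>example</source>
  <date>2015</date>
  <key>example.key</key>
  <document>
    <id>1234</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>Familial breast cancer and ovarian cancer risk.</text>
      <annotation id="A1">
        <infon key="type">disease</infon>
        <location offset="9" length="13"/>
        <text>breast cancer</text>
      </annotation>
      <annotation id="A2">
        <infon key="type">disease</infon>
        <location offset="27" length="14"/>
        <text>ovarian cancer</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return one dict per annotation location, with a span-consistency flag."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            # BioC offsets are document-level; subtract the passage offset.
            base = int(passage.findtext("offset", default="0"))
            passage_text = passage.findtext("text", default="")
            for ann in passage.findall("annotation"):
                ann_type = next(
                    (i.text for i in ann.findall("infon") if i.get("key") == "type"),
                    None,
                )
                ann_text = ann.findtext("text")
                for loc in ann.findall("location"):
                    start = int(loc.get("offset")) - base
                    end = start + int(loc.get("length"))
                    rows.append({
                        "doc": doc_id,
                        "type": ann_type,
                        "text": ann_text,
                        # True when the offsets select exactly the annotated text.
                        "span_ok": passage_text[start:end] == ann_text,
                    })
    return rows


rows = read_annotations(SAMPLE)
print(Counter(r["type"] for r in rows))  # Counter({'disease': 2})
```

    Checking offsets against the passage text is a cheap way to catch misaligned submissions before aggregating annotations from many contributors.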

    Citizen Science for Mining the Biomedical Literature

    No full text
    Biomedical literature represents one of the largest and fastest-growing collections of unstructured biomedical knowledge. Finding critical information buried in the literature can be challenging. To extract information from free-flowing text, researchers need to: 1. identify the entities in the text (named entity recognition), 2. apply a standardized vocabulary to these entities (normalization), and 3. identify how entities in the text are related to one another (relationship extraction). Researchers have primarily approached these information extraction tasks through manual expert curation and computational methods. We have previously demonstrated that named entity recognition (NER) tasks can be crowdsourced to a group of non-experts via the paid microtask platform Amazon Mechanical Turk (AMT), which can dramatically reduce the cost and increase the throughput of biocuration efforts. However, given the size of the biomedical literature, even information extraction via paid microtask platforms is not scalable. With our web-based application Mark2Cure (http://mark2cure.org), we demonstrate that NER tasks can also be performed with high accuracy by volunteer citizen scientists. We apply metrics from the Zooniverse Matrices of Citizen Science Success and provide the results here as a basis of comparison for other citizen science projects. Further, we discuss design considerations, issues, and the application of analytics for successfully moving a crowdsourcing workflow from a paid microtask platform to a citizen science platform. To our knowledge, this study is the first application of citizen science to a natural language processing task.
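    To make the three information extraction steps concrete, here is a deliberately naive sketch: dictionary matching stands in for NER, a lexicon lookup for normalization, and co-occurrence for relationship extraction. This is a swapped-in toy technique for illustration only, not the Mark2Cure workflow (which relies on human annotators); the lexicon, its placeholder identifiers, and the function names recognize, normalize and relate are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical lexicon: surface form -> (entity type, normalized identifier).
# The identifiers are placeholders, not real ontology IDs.
LEXICON = {
    "ngly1": ("Gene", "GENE:NGLY1"),
    "n-glycanase 1": ("Gene", "GENE:NGLY1"),
    "alacrima": ("Phenotype", "PHEN:ALACRIMA"),
    "absence of tears": ("Phenotype", "PHEN:ALACRIMA"),
}


def recognize(text):
    """Step 1 (NER): find lexicon terms via case-insensitive whole-word matching."""
    found = []
    for term in LEXICON:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((m.start(), m.end(), m.group(0)))
    return sorted(found)


def normalize(mentions):
    """Step 2 (normalization): attach the entity type and standard identifier."""
    return [(s, e, txt, *LEXICON[txt.lower()]) for s, e, txt in mentions]


def relate(entities):
    """Step 3 (relationship extraction), crudely: pair co-occurring entities of different types."""
    ids = sorted({(etype, eid) for *_, etype, eid in entities})
    return [(a[1], b[1]) for a, b in combinations(ids, 2) if a[0] != b[0]]


SENTENCE = "NGLY1 deficiency is associated with alacrima, an absence of tears."
entities = normalize(recognize(SENTENCE))
print(relate(entities))  # [('GENE:NGLY1', 'PHEN:ALACRIMA')]
```

    Note how normalization collapses two different surface forms ("alacrima" and "absence of tears") onto one identifier, so the relation step sees a single phenotype.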

    MyGene.info web frontend component

    No full text
    MyGene.info: Gene Annotation Query as a Service http://mygene.inf

    MyGene.info data backend component

    No full text
    MyGene.info: Gene Annotation Query as a Service http://mygene.inf

    Les baraquettes / lyrics by Ed. Barneaud; music by J. A. Fruchier

    No full text
    Background: Application Programming Interfaces (APIs) are now widely used to distribute biological data, and many popular biological APIs developed by different research teams have adopted JavaScript Object Notation (JSON) as their primary data format. While a common data format offers significant advantages, it alone is not sufficient for rich integrative queries across APIs. Results: Here, we have implemented JSON for Linking Data (JSON-LD) technology on the BioThings APIs that we have developed: MyGene.info, MyVariant.info and MyChem.info. JSON-LD provides a standard way to add semantic context to the existing JSON data structure in order to enhance interoperability between APIs. We demonstrated several use cases facilitated by semantic annotations using JSON-LD, including simpler and more precise query capabilities as well as API cross-linking. Conclusions: We believe that this pattern offers a generalizable solution for API interoperability in the life sciences.
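    The idea of adding semantic context to existing JSON can be sketched in a few lines: each API keeps its own field names, but an @context maps those names to shared IRIs, so records from different APIs can be linked on a common meaning rather than a common key. The sketch below is a toy stand-in for JSON-LD expansion, not a conformant JSON-LD processor; the example.org IRIs, field names, rsids, and helper functions are hypothetical and are not the contexts actually served by the BioThings APIs.

```python
import json

# Hypothetical contexts: the IRIs below are placeholders on example.org,
# NOT the vocabularies actually used by MyGene.info or MyVariant.info.
GENE_CONTEXT = {
    "@context": {
        "entrezgene": "http://example.org/vocab/ncbigene",
        "symbol": "http://example.org/vocab/geneSymbol",
    }
}
VARIANT_CONTEXT = {
    "@context": {
        "gene_id": "http://example.org/vocab/ncbigene",
        "rsid": "http://example.org/vocab/dbsnp",
    }
}


def add_context(record, context):
    """Return a JSON-LD-style document: the original JSON plus an @context."""
    return {**context, **record}


def expand_keys(doc):
    """Replace top-level keys with their IRIs (a toy subset of JSON-LD expansion).

    Unmapped keys are kept unchanged here; a conformant processor would drop them.
    """
    ctx = doc.get("@context", {})
    return {ctx.get(k, k): v for k, v in doc.items() if k != "@context"}


def cross_link(gene_doc, variant_docs, iri):
    """Return variant docs whose value for `iri` equals the gene doc's value."""
    target = expand_keys(gene_doc).get(iri)
    return [v for v in variant_docs if expand_keys(v).get(iri) == target]


gene = add_context({"entrezgene": 1017, "symbol": "CDK2"}, GENE_CONTEXT)
variants = [
    add_context({"rsid": "rs0001", "gene_id": 1017}, VARIANT_CONTEXT),
    add_context({"rsid": "rs0002", "gene_id": 7157}, VARIANT_CONTEXT),
]
linked = cross_link(gene, variants, "http://example.org/vocab/ncbigene")
print(json.dumps([v["rsid"] for v in linked]))  # ["rs0001"]
```

    The two records never share a field name ("entrezgene" vs "gene_id"); the join works only because both contexts map those fields to the same IRI, which is the interoperability gain the abstract describes.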