
    Incentivising Use of Structured Language in Biological Descriptions: Author-Driven Phenotype Data and Ontology Production

    Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases, or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis, and important questions related to the evolution of organisms, their diseases, or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution for producing computable phenotypic data, for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future; and, most importantly, 3) it does not solve the problem of inter-curator variation (different curators interpret or convert the same phenotype differently). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in the original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors who describe phenotypes are the key to the solution. Given the right tools and appropriate attribution, authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof-of-concept project on this idea was funded by NSF ABI in July 2017.
We seek readers' input on and critique of the proposed approaches, to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production
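The contrast between free-text and computable phenotype descriptions can be sketched concretely. The snippet below is purely illustrative and not the project's actual schema: it shows the same character as free text and as a structured Entity-Quality (EQ) style statement of the kind phenotype ontologies support; the specific term IDs are placeholders.

```python
# Illustrative sketch (not the project's actual schema): the same phenotype
# as free text versus a structured Entity-Quality (EQ) style record.
# The ontology term IDs below are hypothetical placeholders.

free_text = "leaves narrowly lanceolate, 3-7 cm long"

structured = [
    {"entity": "leaf (PO:XXXXXXX)",
     "quality": "lanceolate (PATO:XXXXXXX)"},
    {"entity": "leaf (PO:XXXXXXX)",
     "quality": "length",
     "value": {"min": 3, "max": 7, "unit": "cm"}},
]

def is_computable(record):
    """A record is computable here if every statement names an explicit entity."""
    return all("entity" in statement for statement in record)

print(is_computable(structured))  # True: each character is machine-readable
```

The point of the structured form is that a downstream analysis can filter, compare, and aggregate characters without a human re-reading the original narrative.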

    Incorporating hydrology into climate suitability models changes projections of malaria transmission in Africa

    Continental-scale models of malaria climate suitability typically couple well-established temperature-response models with basic estimates of vector habitat availability, using rainfall as a proxy. Here we show that, across continental Africa, the estimated geographic range of climatic suitability for malaria transmission is more sensitive to the precipitation threshold than to the thermal response curve applied. To address this problem, we use downscaled daily climate predictions from seven GCMs to run a continental-scale hydrological model for a process-based representation of mosquito breeding habitat availability. A more complex pattern of malaria suitability emerges as water is routed through drainage networks and river corridors serve as year-round transmission foci. The estimated hydro-climatically suitable area for stable malaria transmission is smaller than previous models suggest and shows only a very small increase under state-of-the-art future climate scenarios. However, bigger geographical shifts are observed than with most rainfall-threshold models, and the pattern of that shift is very different when a hydrological model is used to estimate surface water availability for vector breeding.
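The difference between the two approaches can be shown with a toy sketch. This is not the authors' model: a rainfall-threshold proxy marks a cell suitable only where local rain exceeds a cutoff, while a crude routed-water view also credits downstream cells that receive flow from upstream (the river-corridor effect described above). All numbers and the threshold are invented.

```python
# Toy sketch (not the authors' model): rainfall-threshold proxy vs. a
# crude water-routing view along one drainage line. Numbers are invented.

RAIN_THRESHOLD_MM = 80  # hypothetical monthly cutoff

# Cells ordered upstream -> downstream.
monthly_rain_mm = [120, 60, 20, 10]

def threshold_suitable(rain):
    """Rainfall proxy: each cell judged on its own rain alone."""
    return [r >= RAIN_THRESHOLD_MM for r in rain]

def routed_suitable(rain, runoff_fraction=0.5):
    """Crude routing: each cell keeps its rain plus a fraction of the
    accumulated flow passed down from the cell above it."""
    flow, suitable = 0.0, []
    for r in rain:
        flow = r + runoff_fraction * flow
        suitable.append(flow >= RAIN_THRESHOLD_MM)
    return suitable

print(threshold_suitable(monthly_rain_mm))  # [True, False, False, False]
print(routed_suitable(monthly_rain_mm))     # [True, True, True, False]
```

Even this crude routing makes dry downstream cells suitable because they sit below a wet headwater, which is qualitatively the shift in spatial pattern the abstract reports.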

    Biomarker-driven phenotyping in Parkinson's disease: A translational missing link in disease-modifying clinical trials

    Past clinical trials of putative neuroprotective therapies have targeted Parkinson's disease (PD) as a single pathogenic disease entity. From an Oslerian clinicopathological perspective, the wide complexity of PD converges into Lewy bodies and justifies a reductionist approach to PD: a single-mechanism therapy can affect most of those sharing the classic pathological hallmark. From a systems-biology perspective, PD is a group of disorders that, while related by sharing the feature of nigral dopamine-neuron degeneration, exhibit unique genetic, biological, and molecular abnormalities, which probably respond differentially to a given therapeutic approach, particularly for strategies aimed at neuroprotection. Under this model, only biomarker-defined, homogeneous subtypes of PD are likely to respond optimally to therapies proven to affect the biological processes within each subtype. Therefore, we suggest that precision medicine applied to PD requires a reevaluation of the biomarker-discovery effort. This effort is currently centered on correlating biological measures to clinical features of PD and on identifying factors that predict whether various prodromal states will convert into the classical movement disorder. We suggest, instead, that subtyping of PD requires the reverse view, where abnormal biological signals (i.e., biomarkers), rather than clinical definitions, are used to define disease phenotypes. Successful development of disease-modifying strategies will depend on how relevant the specific biological processes addressed by an intervention are to the pathogenetic mechanisms in the subgroup of targeted patients. This precision-medicine approach will likely yield smaller, but well-defined, subsets of PD amenable to successful neuroprotection.

    Digital Extended Specimens: Enabling an Extensible Network of Biodiversity Data Records as Integrated Digital Objects on the Internet

    The early twenty-first century has witnessed massive expansions in the availability and accessibility of digital data in virtually all domains of the biodiversity sciences. Led by an array of asynchronous digitization activities spanning ecological, environmental, climatological, and biological collections data, these initiatives have resulted in a plethora of mostly disconnected and siloed data, leaving to researchers the tedious and time-consuming manual task of finding and connecting them in usable ways, integrating them into coherent data sets, and making them interoperable. The focus to date has been on elevating analog and physical records to digital replicas in local databases prior to elevating them to ever-growing aggregations of essentially disconnected discipline-specific information. In the present article, we propose a new interconnected network of digital objects on the Internet—the Digital Extended Specimen (DES) network—that transcends existing aggregator technology, augments the DES with third-party data through machine algorithms, and provides a platform for more efficient research and robust interdisciplinary discovery.

    Chaste: an open source C++ library for computational physiology and biology

    Chaste (Cancer, Heart And Soft Tissue Environment) is an open source C++ library for the computational simulation of mathematical models developed for physiology and biology. Code development has been driven by two initial applications: cardiac electrophysiology and cancer development. A large number of cardiac electrophysiology studies have been enabled and performed, including high performance computational investigations of defibrillation on realistic human cardiac geometries. New models for the initiation and growth of tumours have been developed. In particular, cell-based simulations have provided novel insight into the role of stem cells in the colorectal crypt. Chaste is constantly evolving and is now being applied to a far wider range of problems. The code provides modules for handling common scientific computing components, such as meshes and solvers for ordinary and partial differential equations (ODEs/PDEs). Re-use of these components avoids the need for researchers to "re-invent the wheel" with each new project, accelerating the rate of progress in new applications. Chaste is developed using industrially-derived techniques, in particular test-driven development, to ensure code quality, re-use and reliability. In this article we provide examples that illustrate the types of problems Chaste can be used to solve, which can be run on a desktop computer. We highlight some scientific studies that have used or are using Chaste, and the insights they have provided. The source code, both for specific releases and the development version, is available to download under an open source Berkeley Software Distribution (BSD) licence at http://www.cs.ox.ac.uk/chaste, together with details of a mailing list and links to documentation and tutorials.
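The component-reuse idea described above can be sketched in miniature. The snippet below is not Chaste's actual C++ interface; it is a hedged Python illustration in which one generic ODE stepper serves two unrelated toy models, the way Chaste's shared solver modules serve both its cardiac and cancer applications. All function names and model equations here are invented.

```python
# Illustrative sketch of component re-use (not Chaste's actual API):
# one generic forward-Euler ODE stepper serves two unrelated toy models.

def euler_solve(rhs, y0, t_end, dt):
    """Generic forward-Euler ODE solver: the reusable component."""
    t, y = 0.0, y0
    while t < t_end:
        y = y + dt * rhs(t, y)
        t += dt
    return y

# Toy "membrane voltage" relaxation model: dV/dt = -V, V(0) = 1.
voltage = euler_solve(lambda t, v: -v, 1.0, t_end=1.0, dt=0.001)

# Toy logistic tumour-growth model: dN/dt = N(1 - N), N(0) = 0.1.
tumour = euler_solve(lambda t, n: n * (1.0 - n), 0.1, t_end=5.0, dt=0.001)

print(round(voltage, 3))  # close to the exact value exp(-1)
print(round(tumour, 2))   # approaching the carrying capacity of 1
```

Neither model knows about the other; both reuse the same stepper, which is the "don't re-invent the wheel" pattern the abstract describes (Chaste additionally backs such components with test-driven development).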

    YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

    Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow will also allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
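The annotation style described above can be sketched as follows. YesWorkflow's documented tags include @begin/@end, @in, and @out, embedded in ordinary comments; the script itself, its variable names, and its logic are invented here for illustration. Because the annotations live in comments, the script runs unchanged with or without YesWorkflow.

```python
# A minimal script annotated in the YesWorkflow comment style.
# The @begin/@end, @in, and @out tags mark the implicit modules and
# dataflows; the script's logic itself is invented for illustration.

# @begin clean_and_summarize
# @in raw_values
# @out mean_value

# @begin clean
# @in raw_values
# @out clean_values
raw_values = [4.0, None, 6.0, 8.0]
clean_values = [v for v in raw_values if v is not None]
# @end clean

# @begin summarize
# @in clean_values
# @out mean_value
mean_value = sum(clean_values) / len(clean_values)
# @end summarize

# @end clean_and_summarize

print(mean_value)  # 6.0
```

Running the YesWorkflow tools over such a file recovers the two-step module structure (clean, then summarize) and the dataflow between them, without the script ever being ported to a workflow engine.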