Semantic Annotation of Mutable Data
Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine enough granularity for the results to be used in this way for data quality control. We address these issues by offering simple extensions to an existing annotation ontology and describing how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML Schema.
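The fine-grained scoping the abstract calls for can be illustrated with a minimal sketch. The record structure below is a hypothetical illustration loosely modeled on the W3C Web Annotation data model (target/selector/body); none of the field names are taken from the paper's ontology extensions.

```python
# Minimal sketch of a fine-grained annotation on a mutable data record.
# Field names are hypothetical, loosely following the W3C Web Annotation
# model; they are not the paper's ontology extensions.

def make_annotation(record_id, field, expected, evidence, motivation):
    """Build an annotation that targets one field of one record,
    so quality-control actions can be scoped precisely."""
    return {
        "target": {
            "source": record_id,           # the annotated data record
            "selector": {"field": field},  # fine-grained scope: one field
        },
        "body": {
            "expectation": expected,       # what the annotator asserts
            "evidence": evidence,          # supporting evidence
        },
        "motivation": motivation,          # e.g. data quality control
    }

ann = make_annotation(
    record_id="specimen:12345",
    field="collectingDate",
    expected="1887-06-02",
    evidence="collector's field notebook, p. 14",
    motivation="data-quality-flag",
)
print(ann["target"]["selector"]["field"])
```

Because the annotation targets a single field rather than the whole record, downstream quality-control services can route it only to parties interested in that field, which is the interest-based distribution the abstract describes.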
Incentivising Use of Structured Language in Biological Descriptions: Author-Driven Phenotype Data and Ontology Production
Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis, and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future; and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret or convert the same phenotype differently). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof-of-concept project on this idea was funded by NSF ABI in July 2017.
We seek readers' input and critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production
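The difference between free-text and computable phenotypes can be sketched as follows. The entity-quality style of statement shown here is one common curation format; the phenotype text and the ontology term IDs are illustrative placeholders, not real identifiers or examples from the project.

```python
# Sketch: a free-text phenotype versus a structured entity-quality
# rendering, the kind of computable output the authors advocate.
# Term IDs are placeholders, not real ontology identifiers.

free_text = "leaves ovate, margins serrate"

structured = [
    {"entity": "leaf",        "entity_id": "PO:EXAMPLE-1",
     "quality": "ovate",      "quality_id": "PATO:EXAMPLE-1"},
    {"entity": "leaf margin", "entity_id": "PO:EXAMPLE-2",
     "quality": "serrate",    "quality_id": "PATO:EXAMPLE-2"},
]

# A downstream query over structured data is trivial; the same query
# over free text would require parsing and manual curation.
ovate_entities = [s["entity"] for s in structured if s["quality"] == "ovate"]
print(ovate_entities)
```

If authors emit statements like these at publication time, the inter-curator variation described above never arises, because no after-the-fact interpretation is needed.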
Incorporating hydrology into climate suitability models changes projections of malaria transmission in Africa
Continental-scale models of malaria climate suitability typically couple well-established temperature-response models with basic estimates of vector habitat availability, using rainfall as a proxy. Here we show that, across continental Africa, the estimated geographic range of climatic suitability for malaria transmission is more sensitive to the precipitation threshold than to the thermal-response curve applied. To address this problem, we use downscaled daily climate predictions from seven GCMs to run a continental-scale hydrological model for a process-based representation of mosquito breeding-habitat availability. A more complex pattern of malaria suitability emerges as water is routed through drainage networks, and river corridors serve as year-round transmission foci. The estimated hydro-climatically suitable area for stable malaria transmission is smaller than previous models suggest and shows only a very small increase under state-of-the-art future climate scenarios. However, larger geographical shifts are observed than with most rainfall-threshold models, and the pattern of that shift is very different when a hydrological model is used to estimate surface water availability for vector breeding.
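The conventional rainfall-threshold approach that the study critiques can be sketched in a few lines. The numeric thresholds and the twelve months of climate data below are illustrative assumptions, not the paper's calibrated values.

```python
# Sketch of the rainfall-threshold suitability rule: a month counts as
# suitable when temperature lies inside a transmission window AND
# rainfall exceeds a fixed proxy threshold. All numbers here are
# illustrative, not the paper's calibrated values.

T_MIN, T_MAX = 18.0, 34.0   # deg C, illustrative transmission window
RAIN_MIN = 80.0             # mm/month, illustrative habitat proxy

def month_suitable(temp_c, rain_mm):
    return T_MIN <= temp_c <= T_MAX and rain_mm >= RAIN_MIN

# Twelve months of hypothetical climate for one grid cell:
temps = [22, 24, 26, 27, 26, 24, 22, 21, 23, 25, 26, 24]
rains = [10, 30, 90, 140, 160, 100, 60, 20, 15, 70, 95, 40]

suitable_months = sum(month_suitable(t, r) for t, r in zip(temps, rains))
print(suitable_months)
```

The study's key move is to replace the rainfall condition with surface-water availability from a process-based hydrological model, which is why suitability can persist along river corridors even where local rainfall falls below any fixed threshold.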
Biomarker-driven phenotyping in Parkinson's disease: A translational missing link in disease-modifying clinical trials
Past clinical trials of putative neuroprotective therapies have targeted PD as a single pathogenic disease entity. From an Oslerian clinicopathological perspective, the wide complexity of PD converges into Lewy bodies and justifies a reductionist approach to PD: a single-mechanism therapy can affect most of those sharing the classic pathological hallmark. From a systems-biology perspective, PD is a group of disorders that, while related by sharing the feature of nigral dopamine-neuron degeneration, exhibit unique genetic, biological, and molecular abnormalities, which probably respond differentially to a given therapeutic approach, particularly for strategies aimed at neuroprotection. Under this model, only biomarker-defined, homogeneous subtypes of PD are likely to respond optimally to therapies proven to affect the biological processes within each subtype. Therefore, we suggest that precision medicine applied to PD requires a reevaluation of the biomarker-discovery effort. This effort is currently centered on correlating biological measures to clinical features of PD and on identifying factors that predict whether various prodromal states will convert into the classical movement disorder. We suggest, instead, that subtyping of PD requires the reverse view, where abnormal biological signals (i.e., biomarkers), rather than clinical definitions, are used to define disease phenotypes. Successful development of disease-modifying strategies will depend on how relevant the specific biological processes addressed by an intervention are to the pathogenetic mechanisms in the subgroup of targeted patients. This precision-medicine approach will likely yield smaller, but well-defined, subsets of PD amenable to successful neuroprotection.
Authors: Alberto J. Espay (University of Cincinnati, United States); Michael A. Schwarzschild (Massachusetts General Hospital, United States); Caroline M. Tanner (University of California, United States); Hubert H. Fernandez (Cleveland Clinic, United States); David K. Simon (Harvard Medical School, United States); James B. Leverenz (Cleveland Clinic, United States); Aristide Merola (University of Cincinnati, United States); Alice Chen-Plotkin (University of Pennsylvania, United States); Marcelo Andres Kauffman (Universidad Austral and CONICET, Argentina; Hospital General de Agudos "Ramos Mejía", Buenos Aires, Argentina); Patrik Brundin (Van Andel Research Institute, Center for Neurodegenerative Science, United States); Roberto Erro (Università di Verona, Italy; University College London, United Kingdom); Karl Kieburtz (University of Rochester Medical Center, United States); Daniel Woo (University of Cincinnati, United States); Eric A. Macklin (Massachusetts General Hospital, United States); David G. Standaert (University of Alabama at Birmingham, United States); Anthony E. Lang (University of Toronto, Canada)
Digital Extended Specimens: Enabling an Extensible Network of Biodiversity Data Records as Integrated Digital Objects on the Internet
The early twenty-first century has witnessed massive expansions in availability and accessibility of digital data in virtually all domains of the biodiversity sciences. Led by an array of asynchronous digitization activities spanning ecological, environmental, climatological, and biological collections data, these initiatives have resulted in a plethora of mostly disconnected and siloed data, leaving researchers with the tedious and time-consuming manual task of finding and connecting them in usable ways, integrating them into coherent data sets, and making them interoperable. The focus to date has been on elevating analog and physical records to digital replicas in local databases prior to assembling them into ever-growing aggregations of essentially disconnected discipline-specific information. In the present article, we propose a new interconnected network of digital objects on the Internet, the Digital Extended Specimen (DES) network, that transcends existing aggregator technology, augments the DES with third-party data through machine algorithms, and provides a platform for more efficient research and robust interdisciplinary discovery.
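The idea of a specimen record that accumulates typed links to third-party data can be sketched as follows. The identifiers, link types, and field names are hypothetical illustrations, not a published DES schema.

```python
# Sketch of a Digital Extended Specimen as an integrated digital object:
# a persistently identified core record to which third-party data
# (sequences, images, publications) are attached as typed links.
# All identifiers and relation names are hypothetical.

specimen = {
    "pid": "https://example.org/des/abc123",  # hypothetical persistent ID
    "scientific_name": "Quercus robur",
    "collection": "Example Herbarium",
    "extensions": [],  # links accumulate over time, often via machines
}

def extend(des, relation, target_pid):
    """Attach a third-party record to the specimen by typed link."""
    des["extensions"].append({"relation": relation, "target": target_pid})

extend(specimen, "hasSequence", "https://example.org/genbank/XYZ001")
extend(specimen, "depictedBy", "https://example.org/images/IMG-42")

print(len(specimen["extensions"]))
```

The contrast with today's aggregators is that the links live on the object itself under a persistent identifier, so any consumer resolving the specimen also finds its extensions, rather than re-deriving the connections from siloed databases.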
Chaste: an open source C++ library for computational physiology and biology
Chaste (Cancer, Heart And Soft Tissue Environment) is an open source C++ library for the computational simulation of mathematical models developed for physiology and biology. Code development has been driven by two initial applications: cardiac electrophysiology and cancer development. A large number of cardiac electrophysiology studies have been enabled and performed, including high performance computational investigations of defibrillation on realistic human cardiac geometries. New models for the initiation and growth of tumours have been developed. In particular, cell-based simulations have provided novel insight into the role of stem cells in the colorectal crypt. Chaste is constantly evolving and is now being applied to a far wider range of problems. The code provides modules for handling common scientific computing components, such as meshes and solvers for ordinary and partial differential equations (ODEs/PDEs). Re-use of these components avoids the need for researchers to "re-invent the wheel" with each new project, accelerating the rate of progress in new applications. Chaste is developed using industrially-derived techniques, in particular test-driven development, to ensure code quality, re-use and reliability. In this article we provide examples that illustrate the types of problems Chaste can be used to solve, which can be run on a desktop computer. We highlight some scientific studies that have used or are using Chaste, and the insights they have provided. The source code, both for specific releases and the development version, is available to download under an open source Berkeley Software Distribution (BSD) licence at http://www.cs.ox.ac.uk/chaste, together with details of a mailing list and links to documentation and tutorials.
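The component-reuse idea the abstract describes (a generic ODE solver applied to different models) can be shown with a language-neutral sketch. Chaste itself is C++ and this is not its API; the stepper, driver, and toy growth model below are illustrative assumptions only.

```python
# Sketch of a reusable ODE-solver component, the kind of module Chaste
# provides so projects need not re-implement numerics. NOT Chaste's API;
# a generic classical Runge-Kutta (RK4) stepper in plain Python.

def rk4_step(f, t, y, h):
    """One RK4 step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def solve(f, y0, t0, t1, n):
    """Reusable driver: integrate from t0 to t1 in n fixed steps."""
    h = (t1 - t0) / n
    y, t = y0, t0
    for _ in range(n):
        y = rk4_step(f, t, y, h)
        t += h
    return y

# Toy tumour-growth model: logistic growth dN/dt = r*N*(1 - N/K).
r, K = 0.5, 1000.0
N = solve(lambda t, N: r * N * (1 - N / K), y0=10.0, t0=0.0, t1=20.0, n=200)
print(N)  # approaches the carrying capacity K
```

The same `solve` driver could be reused unchanged for a cardiac cell model, which is the wheel-not-reinvented benefit the abstract attributes to Chaste's modular design.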
YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts
Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow also will allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
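The comment-based annotation approach looks roughly like the sketch below, using YesWorkflow's @begin/@in/@out/@end tags. The module names and the toy processing logic are hypothetical, not from the paper.

```python
# Sketch of a script with YesWorkflow-style annotations. @begin/@end
# delimit a computational module; @in/@out declare its dataflow. The
# tags are ordinary comments, so the script runs unchanged; YesWorkflow
# tools parse them to render a workflow-like view of the script.

# @begin clean_temperatures
# @in raw_readings
# @out cleaned_readings
raw_readings = [21.5, -999.0, 22.1, 23.4, -999.0]   # -999 marks missing
cleaned_readings = [r for r in raw_readings if r != -999.0]
# @end clean_temperatures

# @begin compute_mean
# @in cleaned_readings
# @out mean_temperature
mean_temperature = sum(cleaned_readings) / len(cleaned_readings)
# @end compute_mean

print(round(mean_temperature, 2))
```

Because the annotations live entirely in comments, the script needs no workflow engine to execute, which is exactly the zero-overhead property the abstract emphasizes.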
Effects of Metformin on Spatial and Verbal Memory in Children with ASD and Overweight Associated with Atypical Antipsychotic Use
Objectives: Studies in humans and rodents suggest that metformin, a medicine typically used to treat type 2 diabetes, may have beneficial effects on memory. We sought to determine whether metformin improved spatial or verbal memory in children with autism spectrum disorder (ASD) and overweight associated with atypical antipsychotic use. Methods: We studied the effects of metformin (Riomet®) concentrate on spatial and verbal memory in 51 youth with ASD, ages 6 through 17 years, who were taking atypical antipsychotic medications, had gained significant weight, and were enrolled in a trial of metformin for weight management. Phase 1 was a 16-week, randomized, double-blind, placebo-controlled, parallel-group comparison of metformin (500–850 mg given twice a day) versus placebo. During Phase 2, all participants took open-label metformin from week 17 through week 32. We assessed spatial and verbal memory using the Neuropsychological Assessment 2nd Edition (NEPSY–II) and a modified children's verbal learning task. Results: No measures differed between participants randomized to metformin versus placebo, at either 16 or 32 weeks, after adjustment for multiple comparisons. Sixteen-week change in memory for spatial location on the NEPSY–II was nominally better among participants randomized to placebo. However, patterns of treatment response across all measures revealed no systematic differences in performance, suggesting that metformin had no effect on spatial or verbal memory in these children. Conclusions: Although further study is needed to support these null effects, the overall impression is that metformin does not affect memory in overweight youth with ASD who were taking atypical antipsychotic medications.