52 research outputs found
Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project
Understanding knowledge co-creation in key emerging areas of European research is critical for policymakers wishing to analyze impact and make strategic decisions. However, purely data-driven methods for characterising policy topics have limitations relating to the broad nature of such topics and the differences in language and topic structure between political discourse and scientific and technological outputs. In this paper, we discuss the use of ontologies and semantic technologies as a means to bridge the linguistic and conceptual gap between policy questions and data sources for characterising European knowledge production. Our experience suggests that the integration of advanced techniques for language processing with expert assessment at critical junctures in the process is key to the success of this endeavour.
'Open the Pod Bay Doors, Please, HAL': Here Comes the Semantic Web
Different kinds of knowledge management systems have been adopted by institutions in an attempt to tame the tsunami of unrelated facts, hard data, research and stray morsels of knowledge that abound in any moderately-sized organisation. Yet each of these islands of coherence, however effective, is just that - an island - and, as such, cannot provide a generalised way forward to making data usable. The Semantic Web is an attempt to solve this problem. In the context of the project, 'semantic' simply stands for 'machine-processable'. If information can be made comprehensible to machines such as computers, these machines can then do all the hard work of sorting and sifting and weighing up that is currently done (very imperfectly) by humans, and, because they are computers, they can do it more quickly and on an unimaginably huge scale. In addition, they can learn on the job, so that they can make an even better fist of it the next time around, and better again the time after that.
Open Data Sectors and Communities: Environment
Chapter 7 in the book The State of Open Data: Histories and Horizons
Information Outlook, July/August 2018
Volume 22, Issue 4
The health care and life sciences community profile for dataset descriptions
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
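To make the idea of a machine-readable dataset description concrete, the following is a minimal sketch in the spirit of the guideline, which reuses existing vocabularies such as Dublin Core (`dct:`) and DCAT (`dcat:`). The specific property names, URIs, and values here are illustrative assumptions, not the profile's normative element list.

```python
# A flat property map standing in for a dataset description. The terms
# (dct:title, dct:license, pav:version, ...) come from vocabularies the
# HCLS guideline reuses; the values are invented for illustration.
DATASET = {
    "dct:title": "Example biomedical dataset",
    "dct:description": "Expression measurements for 1,000 samples.",
    "dct:publisher": "<https://example.org/lab>",                      # assumed URI
    "dct:license": "<https://creativecommons.org/licenses/by/4.0/>",
    "pav:version": "2.1",                                              # versioning element
}

def to_turtle(subject: str, props: dict) -> str:
    """Serialise a flat property map as a single Turtle-style description."""
    lines = [f"{subject} a dcat:Dataset ;"]
    items = list(props.items())
    for i, (pred, obj) in enumerate(items):
        # Angle-bracketed values pass through as URIs; others become literals.
        value = obj if obj.startswith("<") else f'"{obj}"'
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f"    {pred} {value}{end}")
    return "\n".join(lines)

print(to_turtle("<https://example.org/dataset/1>", DATASET))
```

A description in this shape is what enables the uniform indexing and querying the abstract calls for: every repository exposes the same predicates, so one query works across all of them.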
Transcript expression-aware annotation improves rare variant interpretation
The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)(1), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project(2) and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. 
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.
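The intuition behind a "proportion expressed across transcripts"-style metric can be sketched as follows: for each tissue, take the expression of the transcripts that contain the variant's exon as a fraction of the gene's total transcript expression, then average across tissues. The transcript IDs and expression values below are invented, and this simplified calculation is an assumption about the metric's shape, not a reproduction of the published method.

```python
# expression[transcript][tissue] -> mean transcript expression (e.g. TPM).
# Values and transcript names are made up for illustration.
expression = {
    "ENST_A": {"brain": 10.0, "liver": 2.0},   # transcript containing the exon
    "ENST_B": {"brain": 0.0,  "liver": 8.0},   # transcript skipping the exon
}
contains_variant = {"ENST_A": True, "ENST_B": False}

def pext(expression, contains_variant):
    """Average, over tissues, of the expression fraction carried by
    transcripts that include the variant's exon."""
    tissues = next(iter(expression.values())).keys()
    ratios = []
    for t in tissues:
        total = sum(expr[t] for expr in expression.values())
        hit = sum(expr[t] for tx, expr in expression.items() if contains_variant[tx])
        if total > 0:                      # skip tissues with no expression
            ratios.append(hit / total)
    return sum(ratios) / len(ratios)

print(pext(expression, contains_variant))  # brain 1.0, liver 0.2 -> 0.6
```

A pLoF variant in `ENST_A` here scores 0.6: highly expressed in brain but largely skipped in liver, which is exactly the kind of exon-level nuance a gene-level annotation would miss.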
Methodology and System for Ontology-Enabled Traceability: Pilot Application to Design and Management of the Washington D.C. Metro System
This report describes a new methodology and system for satisfying requirements, and an architectural framework for linking discipline-specific dependencies through interaction relationships at the meta-model (or ontology) level. In state-of-the-art traceability mechanisms, requirements are connected directly to design objects. Here, in contrast, we ask the question: what design concept (or family of design concepts) should be applied to satisfy this requirement? Solutions to this question establish links between requirements and design concepts. Then, it is the implementation of these concepts that leads to the design itself. These ideas are prototyped through a Washington DC Metro System requirements-to-design model mockup. The proposed methodology offers several benefits not possible with state-of-the-art procedures. First, procedures for design rule checking may be embedded into design concept nodes, thereby creating a pathway for system validation and verification processes that can be executed early in the systems lifecycle, where errors are cheapest and easiest to fix. Second, the proposed model provides a much better big-picture view of relevant design concepts and how they fit together than is possible with linking of domains at the model level. And finally, the proposed procedures are automatically reusable across families of projects where the ontologies are applicable - …
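The requirement-to-concept linking described above can be sketched in a few lines: a requirement is traced to a design concept node rather than to a design object, and the concept node carries an embedded rule check that can run before any detailed design exists. All class, requirement, and rule names here are invented for illustration; the capacity bound is an assumed example, not a value from the report.

```python
class DesignConcept:
    """A concept node in the traceability graph, carrying an embedded
    design rule that can be checked against requirement parameters."""
    def __init__(self, name, rule):
        self.name = name
        self.rule = rule            # predicate over requirement parameters

    def check(self, params):
        return self.rule(params)

# Hypothetical requirement: a station platform must handle a peak load.
requirement = {"id": "REQ-17", "peak_load": 12000}

# Concept-level rule check, executable early in the lifecycle,
# long before a concrete platform design object exists.
platform_concept = DesignConcept(
    "StationPlatform",
    rule=lambda p: p["peak_load"] <= 15000,   # assumed capacity bound
)

# Traceability link: requirement -> design concept (not design object).
trace = {requirement["id"]: platform_concept.name}
print(trace, platform_concept.check(requirement))
```

The payoff of routing traceability through the concept node is visible even in this toy: the rule lives with the concept, so every requirement traced to `StationPlatform` is validated the same way, and the check is reusable on any project where the ontology applies.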