Using MathML to Represent Units of Measurement for Improved Ontology Alignment
Ontologies provide a formal description of concepts and their relationships
in a knowledge domain. The goal of ontology alignment is to identify
semantically matching concepts and relationships across independently developed
ontologies that purport to describe the same knowledge. In order to handle the
widest possible class of ontologies, many alignment algorithms rely on
terminological and structural methods, but the often fuzzy nature of concepts
complicates the matching process. However, one area that should provide clear
matching solutions due to its mathematical nature is units of measurement.
Several ontologies for units of measurement are available, but there has been
no attempt to align them, notwithstanding their obvious importance for technical
interoperability. We propose a general strategy to map these (and
similar) ontologies by introducing MathML to accurately capture the semantic
description of concepts specified therein. We provide mapping results for three
ontologies, and show that our approach improves on lexical comparisons.
Comment: Conferences on Intelligent Computer Mathematics (CICM 2013), Bath, England
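The core idea of matching units by their mathematical semantics rather than by name can be illustrated with a small sketch. This is not the paper's MathML-based method; it is a hypothetical stand-in in which each unit is reduced to a vector of base-dimension exponents, so that concepts from independently developed ontologies align exactly whenever their dimensional meaning is identical. All unit names below are invented examples.

```python
# Illustrative sketch (hypothetical names, not the paper's ontologies):
# represent each unit as exponents over base dimensions (length, mass, time),
# then align concepts whose dimensional semantics are identical.

UNITS_A = {"metrePerSecond": (1, 0, -1), "newton": (1, 1, -2)}
UNITS_B = {"m/s": (1, 0, -1), "kilogramMetrePerSquareSecond": (1, 1, -2)}

def align(a, b):
    """Pair concepts from two ontologies whose dimension vectors match exactly."""
    return {na: nb for na, da in a.items() for nb, db in b.items() if da == db}

print(align(UNITS_A, UNITS_B))
```

A lexical comparison would miss both of these pairs, while the dimensional encoding matches them unambiguously, which is the intuition behind capturing unit semantics in a mathematical representation such as MathML.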
Semantically-Enabled Sensor Plug & Play for the Sensor Web
Environmental sensors have continuously improved by becoming smaller, cheaper, and more intelligent over recent years. As a consequence of these technological advancements, sensors are increasingly deployed to monitor our environment. The large variety of available sensor types with often incompatible protocols complicates the integration of sensors into observing systems. The standardized Web service interfaces and data encodings defined within OGC’s Sensor Web Enablement (SWE) framework make sensors available over the Web and hide the heterogeneous sensor protocols from applications. So far, the SWE framework does not describe how to integrate sensors on-the-fly with minimal human intervention. The driver software which enables access to sensors has to be implemented and the measured sensor data has to be manually mapped to the SWE models. In this article we introduce a Sensor Plug & Play infrastructure for the Sensor Web by combining (1) semantic matchmaking functionality, (2) a publish/subscribe mechanism underlying the Sensor Web, as well as (3) a model for the declarative description of sensor interfaces which serves as a generic driver mechanism. We implement and evaluate our approach by applying it to an oil spill scenario. The matchmaking is realized using existing ontologies and reasoning engines and provides a strong case for the semantic integration capabilities provided by Semantic Web research.
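The semantic matchmaking step can be sketched minimally: a sensor offer matches a subscription when its observed property is the required concept or a subclass of it in an ontology. The toy ontology and property names below are hypothetical; the article's implementation uses existing ontologies and reasoning engines rather than this hand-rolled lookup.

```python
# Toy subclass hierarchy standing in for a real ontology + reasoner.
SUBCLASS_OF = {  # child -> parent
    "SeaSurfaceTemperature": "Temperature",
    "AirTemperature": "Temperature",
    "Temperature": "PhysicalProperty",
}

def is_a(concept, ancestor):
    """Walk the subclass chain upward until the ancestor is found or the chain ends."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

def matches(sensor_property, required_property):
    """A sensor satisfies a subscription if its property is (a subclass of) the required one."""
    return is_a(sensor_property, required_property)

print(matches("SeaSurfaceTemperature", "Temperature"))  # True
print(matches("Salinity", "Temperature"))               # False
```

In the publish/subscribe setting, a newly registered sensor description would be run through such a match against every open subscription, enabling on-the-fly integration without manual mapping.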
Annotation-based storage and retrieval of models and simulation descriptions in computational biology
This work aimed at enhancing the reuse of computational biology models by identifying and formalizing relevant meta-information. One type of meta-information investigated in this thesis is experiment-related meta-information attached to a model, which is necessary to accurately recreate simulations. The main results are: a detailed concept for model annotation, a proposed format for the standardized XML encoding of simulation experiment setups, a storage solution for standardized model representations, and the development of a retrieval concept.
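The retrieval concept can be illustrated with a minimal sketch: models stored with metadata annotations can be found by querying those annotations rather than the model internals. The model identifiers, annotation keys, and values below are hypothetical examples, not the thesis's actual storage solution.

```python
# Hypothetical annotation-based model store: each model carries key/value
# metadata, and retrieval returns models whose annotations satisfy the query.

MODELS = [
    {"id": "MODEL001", "annotations": {"organism": "E. coli", "formalism": "ODE"}},
    {"id": "MODEL002", "annotations": {"organism": "yeast", "formalism": "ODE"}},
]

def retrieve(query):
    """Return ids of models whose annotations contain every queried key/value pair."""
    return [m["id"] for m in MODELS
            if all(m["annotations"].get(k) == v for k, v in query.items())]

print(retrieve({"formalism": "ODE", "organism": "yeast"}))  # ['MODEL002']
```

The point of the formalized meta-information is exactly this: once annotations follow an agreed concept, such queries become possible across independently stored models.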
A mathematics rendering model to support chat-based tutoring
Dr Math is a math tutoring service implemented on the chat application Mxit. The service allows school learners to use their mobile phones to discuss mathematics-related topics with human tutors. Using the broad user-base provided by Mxit, the Dr Math service has grown to consist of tens of thousands of registered school learners. The tutors on the service are all volunteers and the learners far outnumber the available tutors at any given time. School learners on the service use a shorthand language form called microtext to phrase their queries. Microtext is an informal form of language which consists of a variety of misspellings and symbolic representations, which emerge spontaneously as a result of the idiosyncrasies of a learner. The specific form of microtext found on the Dr Math service contains mathematical questions and example equations pertaining to the tutoring process. Deciphering the queries to discover their embedded mathematical content slows down the tutoring process. This wastes time that could have been spent addressing more learner queries. The microtext language thus creates an unnecessary burden on the tutors. This study describes the development of an automated process for the translation of Dr Math microtext queries into mathematical equations. Using the design science research paradigm as a guide, three artefacts are developed. These artefacts take the form of a construct, a model and an instantiation. The construct represents the creation of new knowledge as it provides greater insight into the contents and structure of the language found on a mobile mathematics tutoring service. The construct serves as the basis for the creation of a model for the translation of microtext queries into mathematical equations, formatted for display in an electronic medium. No such technique currently exists and therefore the model contributes new knowledge. To validate the model, an instantiation was created to serve as a proof-of-concept.
The instantiation applies various concepts and techniques, such as those related to natural language processing, to the learner queries on the Dr Math service. These techniques are employed in order to translate an input microtext statement into a mathematical equation, structured by using mark-up language. The creation of the instantiation thus constitutes a knowledge contribution, as most of these techniques have never been applied to the problem of translating microtext into mathematical equations. For the automated process to have utility, it should perform on a level comparable to that of a human performing a similar translation task. To determine how closely the results from the automated process match those of a human, three human participants were asked to perform coding and translation tasks. The results of the human participants served as the baseline values and were compared to the results of the automated process across a variety of metrics, including agreement, correlation, precision, recall and others. Krippendorff’s α was used to determine the level of agreement and Pearson’s correlation coefficient the level of correlation between the results. The agreement between the human participants and the automated process was calculated at a level deemed satisfactory for exploratory research, and the level of correlation was calculated as moderate; these values are consistent with the human baseline. Furthermore, the automated process was able to meet or improve on all of the human baseline metrics. These results validate that the automated process is able to perform the translation at a level comparable to that of a human. The automated process is available for integration into any requesting application by means of a publicly accessible web service.
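The flavour of such a microtext-to-equation translation can be sketched with a tiny rule-based normaliser. The shorthand lexicon and rewrite rule below are hypothetical stand-ins, not rules taken from the Dr Math corpus or the actual instantiation.

```python
import re

# Hypothetical microtext normaliser: expand shorthand tokens, then make
# implicit powers like "x2" explicit as "x^2" for mark-up rendering.

SHORTHAND = {"wat": "what", "plz": "please"}  # toy token-level lexicon

def translate(query):
    """Normalise a microtext query into plain mathematical text."""
    tokens = [SHORTHAND.get(t, t) for t in query.lower().split()]
    text = " ".join(tokens)
    # letter immediately followed by digits is read as a power, e.g. x2 -> x^2
    return re.sub(r"\b([a-z])(\d+)\b", r"\1^\2", text)

print(translate("wat is x2 + 3x"))  # what is x^2 + 3x
```

A real system, as the abstract describes, layers natural language processing techniques on top of this kind of normalisation and emits structured mark-up rather than plain text.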
HOLMES: A Hybrid Ontology-Learning Materials Engineering System
Designing and discovering novel materials is a challenging problem in many domains such as fuel additives, composites, pharmaceuticals, and so on. At the core of all this are models that capture how the different domain-specific data, information, and knowledge regarding the structures and properties of the materials are related to one another. This dissertation explores the difficult task of developing an artificial intelligence-based knowledge modeling environment, called Hybrid Ontology-Learning Materials Engineering System (HOLMES), that can assist humans in populating a materials science and engineering ontology through automatic information extraction from journal article abstracts. While what we propose may be adapted for a generic materials engineering application, our focus in this thesis is on the needs of the pharmaceutical industry. We develop the Columbia Ontology for Pharmaceutical Engineering (COPE), which is a modification of the Purdue Ontology for Pharmaceutical Engineering. COPE serves as the basis for HOLMES.
The HOLMES framework starts with journal articles that are in the Portable Document Format (PDF) and ends with the assignment of the entries in the journal articles into ontologies. While this might seem to be a simple task of information extraction, to fully extract the information such that the ontology is filled as completely and correctly as possible is not easy when considering a fully developed ontology.
In the development of the information extraction tasks, we note that there are new problems that have not arisen in previous information extraction work in the literature. The first is the necessity to extract auxiliary information in the form of concepts such as actions, ideas, problem specifications, properties, etc. The second problem is in the existence of multiple labels for a single token due to the existence of the aforementioned concepts. These two problems are the focus of this dissertation.
In this work, the HOLMES framework is presented as a whole, describing our successful progress as well as unsolved problems, which might help future research on this topic. The ontology is then presented to help in the identification of the relevant information that needs to be retrieved. The annotations are next developed to create the data sets necessary for the machine learning algorithms to perform. Then, the current level of information extraction for these concepts is explored and expanded. This is done through the introduction of entity feature sets that are based on previously extracted entities from the entity recognition task. Finally, the new task of handling multiple labels for a single entity is explored through the use of multiple-label algorithms used primarily in image processing.
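The multiple-label problem described above can be illustrated with a binary-relevance sketch: one independent yes/no decision per label, so a single token may receive several labels at once. The label names, vocabularies, and keyword "classifiers" here are hypothetical stand-ins for the dissertation's trained models.

```python
# Toy binary-relevance tagger: each label has an independent (keyword-based)
# classifier; a token receives every label whose classifier fires.

LABELS = {
    "MATERIAL": {"lactose", "cellulose"},
    "ACTION": {"granulate", "compress"},
    "PROPERTY": {"solubility", "hardness"},
    "CONCEPT": {"solubility", "dissolution"},
}

def tag(token):
    """Return all labels assigned to a token, possibly more than one."""
    return sorted(lbl for lbl, vocab in LABELS.items() if token in vocab)

print(tag("hardness"))    # ['PROPERTY']
print(tag("solubility"))  # ['CONCEPT', 'PROPERTY'] -- one token, two labels
```

The second call shows exactly the situation the dissertation highlights: auxiliary concepts overlap with conventional entity types, so a single token legitimately carries multiple labels.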