Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the Internet, where keyword search is used in many applications such as search engines, has familiarized casual users with using keyword queries to retrieve information. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant, e.g., in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
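Keyword queries over XML need a rule for deciding which element counts as an answer. A widely used semantics in the research literature is the smallest lowest common ancestor (SLCA): return the deepest elements whose subtree contains every keyword. The following is a minimal Python sketch of that idea using only the standard library; it is not taken from the article. The sample document, the word-level tokenization, and matching against element names as well as element text are illustrative assumptions, and attributes and tail text are ignored.

```python
import re
import xml.etree.ElementTree as ET

def _tokens(node):
    """Lower-cased words in an element's name and its own text (attributes and tail text ignored)."""
    return set(re.findall(r"\w+", f"{node.tag} {node.text or ''}".lower()))

def slca(root, keywords):
    """Return the smallest elements whose subtrees contain every keyword (SLCA semantics)."""
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Keywords found in this node, then merged with those found in its subtrees.
        found = _tokens(node) & wanted
        descendant_complete = False
        for child in node:
            child_found = visit(child)
            found |= child_found
            if child_found == wanted:
                descendant_complete = True  # a smaller answer exists below this node
        if found == wanted and not descendant_complete:
            results.append(node)
        return found

    visit(root)
    return results

# Hypothetical sample document.
DOC = ET.fromstring(
    "<bib>"
    "<book><title>XML querying</title><author>Smith</author></book>"
    "<book><title>RDF basics</title><author>Smith</author></book>"
    "</bib>"
)
```

Here `slca(DOC, ["xml", "smith"])` returns only the first `book` element: `bib` also contains both keywords, but it is not the smallest such element.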
NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
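Because NeXML is plain XML, generic XML tooling can already read it. The sketch below parses a small hand-written document in Python with `xml.etree.ElementTree`, mapping tree nodes to taxon labels and collecting branch lengths. The fragment is modeled on NeXML's `otus`/`otu` and `trees`/`tree`/`node`/`edge` elements and the `http://www.nexml.org/2009` namespace, but it has not been checked against the official schema; `ElementTree` performs no schema validation, and the taxa and branch lengths are made up.

```python
import xml.etree.ElementTree as ET

NS = {"nex": "http://www.nexml.org/2009"}  # NeXML namespace URI

# Hand-written, illustrative NeXML-style document (taxa and branch lengths are made up).
SAMPLE = """<nex:nexml xmlns:nex="http://www.nexml.org/2009" xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
    <otu id="t3" label="Gorilla gorilla"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2"/>
      <node id="n3" otu="t1"/>
      <node id="n4" otu="t2"/>
      <node id="n5" otu="t3"/>
      <edge id="e1" source="n1" target="n2" length="0.2"/>
      <edge id="e2" source="n2" target="n3" length="0.1"/>
      <edge id="e3" source="n2" target="n4" length="0.1"/>
      <edge id="e4" source="n1" target="n5" length="0.3"/>
    </tree>
  </trees>
</nex:nexml>"""

def read_tree(xml_text):
    """Return ({node id: taxon label} for OTU-linked nodes, [(source, target, length)])."""
    root = ET.fromstring(xml_text)
    # OTU id -> human-readable taxon label.
    labels = {otu.get("id"): otu.get("label") for otu in root.iterfind("nex:otus/nex:otu", NS)}
    tree = root.find("nex:trees/nex:tree", NS)
    # Only nodes carrying an "otu" reference correspond to taxa.
    taxa = {n.get("id"): labels[n.get("otu")] for n in tree.iterfind("nex:node", NS) if n.get("otu")}
    edges = [(e.get("source"), e.get("target"), float(e.get("length")))
             for e in tree.iterfind("nex:edge", NS)]
    return taxa, edges
```

Real documents would normally be validated against the published schema with a schema-aware tool before being processed like this.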
OQAFMA Querying Agent for the Foundational Model of Anatomy: a Prototype for Providing Flexible and Efficient Access to Large Semantic Networks
The development of large semantic networks, such as the UMLS, which are intended to support a variety of applications, requires a flexible and efficient query interface for the extraction of information. Using one of the source vocabularies of UMLS as a test bed, we have developed such a prototype query interface. We first identify common classes of queries needed by applications that access these semantic networks. Next, we survey STRUQL, an existing query language that we adopted, which supports all of these classes of queries. We then describe the OQAFMA Querying Agent for the Foundational Model of Anatomy (OQAFMA), which provides an efficient implementation of a subset of STRUQL by pre-computing a variety of indices. We describe how OQAFMA leverages database optimization by converting STRUQL queries to SQL. We evaluate the flexibility and efficiency of our implementation using English queries written by anatomists. This evaluation verifies that OQAFMA provides flexible, efficient access to one such large semantic network, the Foundational Model of Anatomy, and suggests that OQAFMA could be an efficient query interface to other large biomedical knowledge bases, such as the Unified Medical Language System.
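A query such as "all parts of the heart, at any depth" follows a relation transitively, which plain SQL can answer only with recursion. One way to read "pre-computing a variety of indices" is to materialize the transitive closure once, so that the query becomes a single indexed lookup. The Python/SQLite sketch below illustrates that idea; the table names, the anatomical pairs, and the closure strategy are assumptions for illustration, not OQAFMA's actual schema or its STRUQL-to-SQL translation.

```python
import sqlite3

def build_index(part_of_pairs):
    """Load direct part_of edges and pre-compute their transitive closure (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE part_of (part TEXT, whole TEXT)")
    con.executemany("INSERT INTO part_of VALUES (?, ?)", part_of_pairs)
    con.execute("CREATE TABLE part_of_closure (part TEXT, whole TEXT, PRIMARY KEY (part, whole))")
    # UNION (not UNION ALL) removes duplicates, so the recursion also terminates on cycles.
    con.execute("""
        WITH RECURSIVE closure(part, whole) AS (
            SELECT part, whole FROM part_of
            UNION
            SELECT c.part, p.whole FROM closure AS c JOIN part_of AS p ON c.whole = p.part
        )
        INSERT INTO part_of_closure SELECT part, whole FROM closure
    """)
    con.execute("CREATE INDEX closure_by_whole ON part_of_closure (whole)")
    return con

def parts_of(con, whole):
    """Answer 'all parts of <whole>, at any depth' with one indexed lookup."""
    rows = con.execute("SELECT part FROM part_of_closure WHERE whole = ? ORDER BY part", (whole,))
    return [part for (part,) in rows]

# Hypothetical, simplified anatomical pairs (part, whole).
PAIRS = [
    ("left ventricle", "heart"),
    ("right ventricle", "heart"),
    ("myocardium of left ventricle", "left ventricle"),
    ("heart", "cardiovascular system"),
]
CON = build_index(PAIRS)
```

The design trades storage and load time for query speed: the closure table grows with the depth of the hierarchy, but each transitive query is answered without recursion at query time.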
Application of Neuroanatomical Ontologies for Neuroimaging Data Annotation
The annotation of functional neuroimaging results for data sharing and re-use is particularly challenging, due to the diversity of terminologies of neuroanatomical structures and cortical parcellation schemes. To address this challenge, we extended the Foundational Model of Anatomy Ontology (FMA) to include cytoarchitectural (Brodmann area) labels and a morphological cortical labeling scheme (e.g., the part of Brodmann area 6 in the left precentral gyrus). This representation was also used to augment the neuroanatomical axis of RadLex, the ontology for clinical imaging. The resulting neuroanatomical ontology contains explicit relationships indicating which brain regions are “part of” which other regions, across cytoarchitectural and morphological labeling schemas. We annotated a large functional neuroimaging dataset with terms from the ontology, applied a reasoning engine to analyze this dataset in conjunction with the ontology, and achieved successful inferences ranging from the most specific level (e.g., how many subjects showed activation in a subpart of the middle frontal gyrus?) to more general ones (e.g., how many activations were found in areas connected via a known white matter tract?). In summary, we have produced a neuroanatomical ontology that harmonizes several different terminologies of neuroanatomical structures and cortical parcellation schemes. This neuroanatomical ontology is publicly available as a view of FMA at the BioPortal website. The ontological encoding of anatomic knowledge can be exploited by computer reasoning engines to make inferences about neuroanatomical relationships described in imaging datasets using different terminologies. This approach could ultimately enable knowledge discovery from large, distributed fMRI studies or medical record mining.
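The roll-up from specific to general questions rests on the transitivity of "part of": an activation annotated with a subregion also counts for every region that contains it, even across labeling schemes. A minimal Python sketch of that inference follows, using a hand-coded closure instead of an ontology reasoner. The region hierarchy (simplified, left hemisphere only) and the subject activations are hypothetical examples, not data from the study.

```python
# Hypothetical, simplified "part of" hierarchy: region -> regions that directly contain it.
# The first entry mirrors the cross-scheme example above: one region lies in two parents.
PART_OF = {
    "BA6 portion of left precentral gyrus": ["Brodmann area 6", "left precentral gyrus"],
    "left precentral gyrus": ["left frontal lobe"],
    "left middle frontal gyrus": ["left frontal lobe"],
}

# Hypothetical annotations: (subject, most specific region showing activation).
ACTIVATIONS = [
    ("s1", "BA6 portion of left precentral gyrus"),
    ("s2", "left middle frontal gyrus"),
    ("s3", "left precentral gyrus"),
    ("s1", "left middle frontal gyrus"),
]

def ancestors(region):
    """The region itself plus every region that transitively contains it."""
    seen = {region}
    stack = [region]
    while stack:
        for parent in PART_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def subjects_with_activation_in(region):
    """Subjects with an activation annotated in `region` or in any of its parts."""
    return {subject for subject, annotated in ACTIVATIONS if region in ancestors(annotated)}
```

For example, `subjects_with_activation_in("left frontal lobe")` counts all three subjects, although none of them was annotated with that term directly.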
Keyword-Based Querying for the Social Semantic Web
Enabling non-experts to publish data on the web is an important
achievement of the social web and one of the primary goals of the social
semantic web. Making the data easily accessible, in turn, has received
little attention, which is problematic from the point of view of
incentives: users are likely to be less motivated to participate in the
creation of content if the use of this content is mostly reserved for
experts.
Querying in semantic wikis, for example, is typically realized in terms of
full text search over the textual content and a web query language such as
SPARQL for the annotations. This approach has two shortcomings that limit
the extent to which data can be leveraged by users: combined queries over
content and annotations are not possible, and users are either restricted
to expressing their query intent using simple but vague keyword queries or
have to learn a complex web query language.
The work presented in this dissertation investigates a more suitable form
of querying for semantic wikis that consolidates two seemingly conflicting
characteristics of query languages, ease of use and expressiveness. This
work was carried out in the context of the semantic wiki KiWi, but the
underlying ideas apply more generally to the social semantic and social
web.
We begin by defining a simple modular conceptual model for the KiWi wiki
that enables rich and expressive knowledge representation. One component of
this model is structured tags, an annotation formalism that is simple yet
flexible and expressive, and aims at bridging the gap between atomic tags
and RDF. The viability of the approach is confirmed by a user study, which
finds that structured tags are suitable for quickly annotating evolving
knowledge and are well received by users.
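Structured tags are described as sitting between atomic tags and RDF. To make that concrete, the Python sketch below shows one plausible mapping from nested tags to RDF-style triples. The `(label, children)` encoding, the predicate name `tag`, and the blank-node naming are assumptions for illustration, not KiWi's actual tag syntax or data model.

```python
from itertools import count

def to_triples(subject, tag, fresh=None):
    """Flatten a tag attached to `subject` into (subject, predicate, object) triples.

    Hypothetical encoding: an atomic tag is a string; a structured tag is a
    (label, [children]) pair whose children are again tags.
    """
    fresh = count() if fresh is None else fresh  # blank-node ids, unique within one call
    if isinstance(tag, str):
        return [(subject, "tag", tag)]           # atomic tag, as in ordinary tagging
    label, children = tag
    if len(children) == 1 and isinstance(children[0], str):
        return [(subject, label, children[0])]   # label:value becomes a direct property
    node = f"_:b{next(fresh)}"                   # grouping node, like an RDF blank node
    triples = [(subject, label, node)]
    for child in children:
        triples += to_triples(node, child, fresh)
    return triples
```

An atomic tag yields a single triple, while a nested tag such as `("author", [("name", ["Mary"]), ("affiliation", ["ACME"])])` yields a blank node that groups its parts, which is roughly where the informal-to-formal spectrum meets RDF.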
The main contribution of this dissertation is the design and
implementation of KWQL, a query language for semantic wikis. KWQL combines
keyword search and web querying to enable querying that scales with user
experience and information need: basic queries are easy to express; as the
search criteria become more complex, more expertise is needed to formulate
the corresponding query. A novel aspect of KWQL is that it combines both
paradigms in a bottom-up fashion. It treats neither of the two as an
extension to the other, but instead integrates both in one framework. The
language allows for rich combined queries of full text, metadata, document
structure, and informal to formal semantic annotations. KWilt, the KWQL
query engine, provides the full expressive power of first-order queries,
but at the same time can evaluate basic queries at almost the speed of the
underlying search engine. KWQL is accompanied by the visual query language
visKWQL, and an editor that displays both the textual and visual form of
the current query and reflects changes to either representation in the
other. A user study shows that participants quickly learn to construct
KWQL and visKWQL queries, even when given only a short introduction.
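The kind of combined query described above, mixing a full-text condition, an annotation condition, and a condition on link structure, can be pictured with the small Python sketch below. It is written in plain Python in the spirit of such queries; it is not KWQL syntax and not how the KWilt engine evaluates queries, and the `ContentItem` fields and sample pages are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    """Hypothetical wiki page: text plus simple annotations and outgoing links."""
    title: str
    text: str
    tags: set = field(default_factory=set)
    links: set = field(default_factory=set)  # titles of linked pages

def matches(item, by_title, keywords=(), tags=(), links_to_tag=None):
    words = set(item.text.lower().split())
    if not all(k.lower() in words for k in keywords):
        return False                          # full-text condition
    if not set(tags) <= item.tags:
        return False                          # annotation condition
    if links_to_tag is not None:              # structural condition on links
        return any(links_to_tag in by_title[t].tags for t in item.links if t in by_title)
    return True

def query(items, **conditions):
    """Titles of all items satisfying every given condition."""
    by_title = {item.title: item for item in items}
    return [item.title for item in items if matches(item, by_title, **conditions)]

ITEMS = [
    ContentItem("Java tips", "notes on java generics", {"howto"}, {"Style guide"}),
    ContentItem("Style guide", "team coding conventions", {"policy"}),
    ContentItem("Java history", "java was released in 1995"),
]
```

A keyword-only call such as `query(ITEMS, keywords=["java"])` behaves like plain search; adding `tags=` or `links_to_tag=` narrows the result using annotations and structure, mirroring the idea that more complex needs require more expressive queries.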
KWQL allows users to sift through the wealth of structure and annotations in an
information system for relevant data. If relevant data constitutes a
substantial fraction of all data, ranking becomes important. To this end,
we propose PEST, a novel ranking method that propagates relevance among
structurally related or similarly annotated data. Extensive experiments,
including a user study on a real-life wiki, show that PEST improves the
quality of the ranking over a range of existing ranking approaches.
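The abstract does not spell out PEST's formulation. A common way to propagate relevance along links is a personalized-PageRank-style power iteration, sketched below in Python as a stand-in for the general idea rather than as PEST itself. Which items count as "structurally related or similarly annotated" (the edge list), the damping factor `alpha`, and the iteration count are assumptions.

```python
def propagate(edges, base, alpha=0.85, iterations=50):
    """Personalized-PageRank-style relevance propagation by power iteration.

    edges: (source, target) pairs between related items (assumed given).
    base:  initial relevance, e.g. keyword-match scores (need not be normalized).
    Each step keeps (1 - alpha) of the base relevance and passes alpha of each
    item's current relevance evenly to the items it links to.
    """
    nodes = set(base) | {n for edge in edges for n in edge}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    total = sum(base.values()) or 1.0
    b = {n: base.get(n, 0.0) / total for n in nodes}
    r = dict(b)
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * b[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = alpha * r[n] / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:
                # Item without links: hand its share back according to the base distribution.
                for m in nodes:
                    nxt[m] += alpha * r[n] * b[m]
        r = nxt
    return r
```

With `base={"A": 1.0}` and edges `A <-> B` plus `C -> B`, item B receives relevance from its keyword-matching neighbour A although it matches no keyword itself, while C, which nothing relevant points to, stays at zero.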
NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata
R.A.V. received support from the CIPRES project (NSF #EF-03314953 to W.P.M.), the FP7 Marie Curie Programme (Call FP7-PEOPLE-IEF-2008, Proposal No. 237046) and, for the NeXML implementation in TreeBASE, the pPOD project (NSF IIS 0629846); P.E.M. and J.S. received support from CIPRES (NSF #EF-0331495, #EF-0715370); M.T.H. was supported by NSF (DEB-ATOL-0732920); X.X. received support from NSERC (Canada) Discovery and RTI grants; W.P.M. received support from an NSERC (Canada) Discovery grant; J.C. received support from a Google Summer of Code 2007 grant; A.P. received support from a Google Summer of Code 2010 grant.
Xcerpt: A Rule-Based Query and Transformation Language for the Web
This thesis investigates querying the Web and the Semantic Web. It proposes a new rule-based query language called Xcerpt. Xcerpt differs from other query languages in that it uses patterns instead of paths for the selection of data, and in that it supports both rule chaining and recursion. Rule chaining serves for structuring large queries, for designing complex query programs (e.g., involving queries to the Semantic Web), and for modelling inference rules. Query patterns may contain special constructs like partial subqueries, optional subqueries, or negated subqueries that account for the particularly flexible structure of data on the Web.
Furthermore, this thesis introduces the syntax of Xcerpt and illustrates it with a large collection of use cases from both the conventional Web and the Semantic Web. In addition, it describes a declarative semantics in the form of a Tarski-style model theory and proposes an algorithm that performs a backward-chaining evaluation of Xcerpt programs. This algorithm has been partly implemented in a prototype runtime system. A salient aspect of this algorithm is the specification of a non-standard unification algorithm, called simulation unification, that supports the new query constructs described above. This unification is symmetric in the sense that variables in both terms can be bound. Unlike standard unification, however, it is asymmetric in the sense that it determines that one term is a subterm of the other.
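To make "patterns instead of paths" and partial subqueries concrete, the Python sketch below matches a query pattern against a data term: labels must agree, every query child must match some data child (order and extra data children are ignored), and variables bind to the data they meet. This is a deliberately simplified, one-sided relative of simulation unification, not Xcerpt's algorithm: variables occur only in the query, two query children may match the same data child, and optional and negated subqueries are not supported. The term encoding and sample data are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A query variable, e.g. Var("T")."""
    name: str

# Hypothetical term encoding: a leaf is a string; an inner term is (label, (child, ...)).

def simulate(query, data, binding=None):
    """Yield every variable binding under which `query` matches `data`.

    Matching is partial and unordered: each query child must match some
    data child, and data children not mentioned in the query are ignored.
    """
    binding = dict(binding or {})
    if isinstance(query, Var):
        if query.name not in binding:
            binding[query.name] = data
            yield binding
        elif binding[query.name] == data:  # a repeated variable acts as a join
            yield binding
        return
    if isinstance(query, str) or isinstance(data, str):
        if query == data:
            yield binding
        return
    (q_label, q_children), (d_label, d_children) = query, data
    if q_label == d_label:
        yield from _match_children(list(q_children), d_children, binding)

def _match_children(q_children, d_children, binding):
    if not q_children:
        yield binding
        return
    first, rest = q_children[0], q_children[1:]
    for d in d_children:
        for extended in simulate(first, d, binding):
            yield from _match_children(rest, d_children, extended)

# Hypothetical sample data and queries.
BIB = ("bib", (
    ("book", (("title", ("T1",)), ("author", ("Ann",)))),
    ("book", (("title", ("T2",)), ("author", ("Bob",)))),
))
Q_TITLES = ("bib", (("book", (("title", (Var("T"),)),)),))
Q_BY_BOB = ("bib", (("book", (("author", ("Bob",)), ("title", (Var("T"),)))),))
```

`Q_TITLES` mentions only `title`, yet it matches both books because the match is partial; `Q_BY_BOB` lists `author` before `title`, the reverse of the data order, and still matches because the match is unordered.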
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is becoming increasingly "data-intensive": large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality as traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current landscape is multifaceted, and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Rethinking the web structure: focusing on events to create better information and experience management
The objective of the following research is to investigate the
problem of information management and conveyed experience on the
World Wide Web (WWW) when multi-modal sensors and media are
available. After studying related areas of work about the web and
heterogeneous media, it became apparent that one of the main
challenges of the area is the semantic unification of
heterogeneous media. This thesis will introduce an
event-based model to semantically unify media. An event is
defined as something of significance that takes place at a given
time and location. Using this definition and the corresponding
model, a system will be designed to illustrate practical use cases
for events.
M.S. Committee Chair: Ramesh Jain; Committee Member: Jim; Committee Member: Linda Will
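The event definition above (something of significance at a given time and location) suggests a simple way to tie heterogeneous media together: attach each item to the events whose time window and area contain it. A minimal Python sketch follows; the field names, the circular area with a fixed radius, and the haversine distance are illustrative assumptions, not the model designed in the thesis.

```python
from dataclasses import dataclass, field
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

@dataclass
class MediaItem:
    kind: str        # e.g. "photo", "video", "sensor reading"
    time: datetime
    lat: float
    lon: float

@dataclass
class Event:
    name: str
    start: datetime
    end: datetime
    lat: float
    lon: float
    radius_km: float
    media: list = field(default_factory=list)  # MediaItem objects attached to this event

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance, assuming a spherical Earth of radius 6371 km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def attach(events, items):
    """Attach each media item to every event whose time window and area contain it."""
    for item in items:
        for ev in events:
            in_time = ev.start <= item.time <= ev.end
            in_area = distance_km(ev.lat, ev.lon, item.lat, item.lon) <= ev.radius_km
            if in_time and in_area:
                ev.media.append(item)
    return events

# Hypothetical sample: one concert and three media items.
CONCERT = Event("concert", datetime(2020, 6, 1, 19), datetime(2020, 6, 1, 23), 33.7756, -84.3963, 1.0)
MEDIA = [
    MediaItem("photo", datetime(2020, 6, 1, 20), 33.7760, -84.3960),          # same place, same time
    MediaItem("video", datetime(2020, 6, 1, 20), 40.0, -75.0),                # same time, far away
    MediaItem("sensor reading", datetime(2020, 6, 2, 9), 33.7756, -84.3963),  # next morning
]
```

Only the photo satisfies both the time and the location condition, so it is the only item grouped under the concert; an item may also belong to several overlapping events, since the loop does not stop at the first match.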