
    Reasoning & Querying – State of the Art

    Various query languages for Web and Semantic Web data, both for practical use and as an area of research, have emerged in recent years. At the same time, the broad adoption of the Web, where keyword search is used in many applications such as search engines, has familiarized casual users with keyword queries as a way of retrieving information. Unlike this easy-to-use form of querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
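    To make the contrast concrete, the following sketch (not from the article; the data, names, and the "keyword" example term are invented) runs a structured SPARQL query and a naive keyword scan over the same tiny RDF graph using the rdflib library. The structured query presupposes knowledge of the vocabulary; the keyword scan does not.

# Illustrative contrast: structured SPARQL query vs. naive keyword scan
# over the same tiny RDF graph (example data is invented).
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.alice, EX.name, Literal("Alice Example")))
g.add((EX.alice, EX.topic, Literal("keyword search")))
g.add((EX.bob, EX.name, Literal("Bob Example")))

# Structured querying: the user must know the vocabulary (ex:topic, ex:name).
structured = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?name WHERE {
        ?p ex:topic ?t ; ex:name ?name .
        FILTER(CONTAINS(LCASE(STR(?t)), "keyword"))
    }
""")
print([str(row[0]) for row in structured])

# Keyword querying: scan every literal, no schema knowledge required.
hits = {s for s, p, o in g if isinstance(o, Literal) and "keyword" in str(o).lower()}
print(hits)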

    NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

    In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
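    As a minimal illustration of the kind of document the standard describes, the sketch below parses a tiny NeXML-style fragment with Python's standard library and lists its taxon labels. The namespace URI and element names follow the public NeXML documentation as I recall it; they are assumptions to be checked against the official schema, not an authoritative excerpt.

# Minimal sketch: list the OTU (taxon) labels in a NeXML-style fragment
# using only the standard library. Namespace and element names are assumed.
import xml.etree.ElementTree as ET

NEXML_NS = {"nex": "http://www.nexml.org/2009"}   # assumed namespace URI

doc = """<nex:nexml xmlns:nex="http://www.nexml.org/2009" version="0.9">
  <nex:otus id="taxa1">
    <nex:otu id="t1" label="Homo sapiens"/>
    <nex:otu id="t2" label="Pan troglodytes"/>
  </nex:otus>
</nex:nexml>"""

root = ET.fromstring(doc)
labels = [otu.get("label") for otu in root.findall(".//nex:otu", NEXML_NS)]
print(labels)   # ['Homo sapiens', 'Pan troglodytes']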

    OQAFMA Querying Agent for the Foundational Model of Anatomy: a Prototype for Providing Flexible and Efficient Access to Large Semantic Networks

    The development of large semantic networks, such as the UMLS, which are intended to support a variety of applications, requires a flexible and efficient query interface for the extraction of information. Using one of the source vocabularies of UMLS as a test bed, we have developed such a prototype query interface. We first identify common classes of queries needed by applications that access these semantic networks. Next, we survey STRUQL, an existing query language that we adopted, which supports all of these classes of queries. We then describe the OQAFMA Querying Agent for the Foundational Model of Anatomy (OQAFMA), which provides an efficient implementation of a subset of STRUQL by pre-computing a variety of indices. We describe how OQAFMA leverages database optimization by converting STRUQL queries to SQL. We evaluate the flexibility and efficiency of our implementation using English queries written by anatomists. This evaluation verifies that OQAFMA provides flexible, efficient access to one such large semantic network, the Foundational Model of Anatomy, and suggests that OQAFMA could be an efficient query interface to other large biomedical knowledge bases, such as the Unified Medical Language System.
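    The central implementation idea, answering path-style queries over a part-of hierarchy by pre-computing indices and pushing the work into SQL, can be sketched as follows. The table and column names (part_of_closure, ancestor, descendant) and the sample anatomy rows are hypothetical placeholders invented for the example; they are not OQAFMA's actual schema.

# Hedged sketch: a "X part_of* Heart" style query answered against a
# pre-computed transitive-closure table, queried with plain SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_of_closure (descendant TEXT, ancestor TEXT)")
conn.executemany(
    "INSERT INTO part_of_closure VALUES (?, ?)",
    [("Left ventricle", "Heart"), ("Mitral valve", "Heart"),
     ("Heart", "Thorax"), ("Left ventricle", "Thorax"), ("Mitral valve", "Thorax")],
)

def parts_of(structure):
    """All structures transitively part of `structure`."""
    rows = conn.execute(
        "SELECT descendant FROM part_of_closure WHERE ancestor = ?", (structure,)
    )
    return [r[0] for r in rows]

print(parts_of("Heart"))   # e.g. ['Left ventricle', 'Mitral valve']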

    Application of Neuroanatomical Ontologies for Neuroimaging Data Annotation

    The annotation of functional neuroimaging results for data sharing and re-use is particularly challenging, due to the diversity of terminologies of neuroanatomical structures and cortical parcellation schemes. To address this challenge, we extended the Foundational Model of Anatomy Ontology (FMA) to include cytoarchitectural (Brodmann area) labels and a morphological cortical labeling scheme (e.g., the part of Brodmann area 6 in the left precentral gyrus). This representation was also used to augment the neuroanatomical axis of RadLex, the ontology for clinical imaging. The resulting neuroanatomical ontology contains explicit relationships indicating which brain regions are “part of” which other regions, across cytoarchitectural and morphological labeling schemes. We annotated a large functional neuroimaging dataset with terms from the ontology and applied a reasoning engine to analyze the dataset in conjunction with the ontology, achieving successful inferences from the most specific level (e.g., how many subjects showed activation in a subpart of the middle frontal gyrus?) to more general levels (e.g., how many activations were found in areas connected via a known white matter tract?). In summary, we have produced a neuroanatomical ontology that harmonizes several different terminologies of neuroanatomical structures and cortical parcellation schemes. The ontology is publicly available as a view of the FMA on the BioPortal website. The ontological encoding of anatomic knowledge can be exploited by computer reasoning engines to make inferences about neuroanatomical relationships described in imaging datasets using different terminologies. This approach could ultimately enable knowledge discovery from large, distributed fMRI studies or medical record mining.
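    The kind of inference described, rolling activations annotated at a specific level up to more general regions through transitive "part of" relations, can be illustrated in a few lines of Python. The miniature part-of table below is an assumption made for the example, not an excerpt from the FMA.

# Simplified illustration: aggregate activation counts from specific regions
# to more general regions via a transitive "part of" relation.
part_of = {
    "BA6 in left precentral gyrus": "Left precentral gyrus",
    "Left precentral gyrus": "Left frontal lobe",
    "Left middle frontal gyrus": "Left frontal lobe",
}

def ancestors(region):
    while region in part_of:
        region = part_of[region]
        yield region

activations = ["BA6 in left precentral gyrus", "Left middle frontal gyrus"]

counts = {}
for region in activations:
    for r in [region, *ancestors(region)]:
        counts[r] = counts.get(r, 0) + 1

print(counts["Left frontal lobe"])   # 2: both activations roll up to the frontal lobe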

    Keyword-Based Querying for the Social Semantic Web

    Enabling non-experts to publish data on the web is an important achievement of the social web and one of the primary goals of the social semantic web. Making the data easily accessible, in turn, has received little attention, which is problematic from the point of view of incentives: users are likely to be less motivated to participate in the creation of content if the use of this content is mostly reserved to experts. Querying in semantic wikis, for example, is typically realized in terms of full text search over the textual content and a web query language such as SPARQL for the annotations. This approach has two shortcomings that limit the extent to which data can be leveraged by users: combined queries over content and annotations are not possible, and users either are restricted to expressing their query intent using simple but vague keyword queries or have to learn a complex web query language. The work presented in this dissertation investigates a more suitable form of querying for semantic wikis that consolidates two seemingly conflicting characteristics of query languages, ease of use and expressiveness. This work was carried out in the context of the semantic wiki KiWi, but the underlying ideas apply more generally to the social semantic and social web. We begin by defining a simple modular conceptual model for the KiWi wiki that enables rich and expressive knowledge representation. One component of this model is structured tags, an annotation formalism that is simple yet flexible and expressive, and aims at bridging the gap between atomic tags and RDF. The viability of the approach is confirmed by a user study, which finds that structured tags are suitable for quickly annotating evolving knowledge and are perceived well by the users. The main contribution of this dissertation is the design and implementation of KWQL, a query language for semantic wikis. KWQL combines keyword search and web querying to enable querying that scales with user experience and information need: basic queries are easy to express; as the search criteria become more complex, more expertise is needed to formulate the corresponding query. A novel aspect of KWQL is that it combines both paradigms in a bottom-up fashion. It treats neither of the two as an extension to the other, but instead integrates both in one framework. The language allows for rich combined queries of full text, metadata, document structure, and informal to formal semantic annotations. KWilt, the KWQL query engine, provides the full expressive power of first-order queries, but at the same time can evaluate basic queries at almost the speed of the underlying search engine. KWQL is accompanied by the visual query language visKWQL, and an editor that displays both the textual and visual form of the current query and reflects changes to either representation in the other. A user study shows that participants quickly learn to construct KWQL and visKWQL queries, even when given only a short introduction. KWQL allows users to sift the wealth of structure and annotations in an information system for relevant data. If relevant data constitutes a substantial fraction of all data, ranking becomes important. To this end, we propose PEST, a novel ranking method that propagates relevance among structurally related or similarly annotated data. Extensive experiments, including a user study on a real-life wiki, show that PEST improves the quality of the ranking over a range of existing ranking approaches.
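    The propagation idea behind PEST can be sketched, in drastically simplified form, as spreading keyword-match scores along structural links between pages. The sketch below is only in the spirit of that idea; the page names, damping factor, and update rule are invented for the example and are not PEST's actual matrix construction.

# Toy relevance propagation: each page starts with a full-text match score,
# and a fraction of that score is repeatedly spread to linked pages.
links = {"PageA": ["PageB"], "PageB": ["PageA", "PageC"], "PageC": []}
score = {"PageA": 1.0, "PageB": 0.0, "PageC": 0.0}   # initial keyword-match scores
alpha = 0.5   # fraction of a page's score passed on to its neighbours

for _ in range(20):
    nxt = {p: (1 - alpha) * s for p, s in score.items()}
    for p, s in score.items():
        for q in links[p]:
            nxt[q] += alpha * s / len(links[p])
    score = nxt

print({p: round(s, 3) for p, s in score.items()})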

    Xcerpt: A Rule-Based Query and Transformation Language for the Web

    This thesis investigates querying the Web and the Semantic Web. It proposes a new rule-based query language called Xcerpt. Xcerpt differs from other query languages in that it uses patterns instead of paths for the selection of data, and in that it supports both rule chaining and recursion. Rule chaining serves for structuring large queries, as well as for designing complex query programs (e.g. involving queries to the Semantic Web), and for modelling inference rules. Query patterns may contain special constructs like partial subqueries, optional subqueries, or negated subqueries that account for the particularly flexible structure of data on the Web. Furthermore, this thesis introduces the syntax of the language Xcerpt, which is illustrated on a large collection of use cases both from the conventional Web and the Semantic Web. In addition, a declarative semantics in the form of a Tarski-style model theory is described, and an algorithm is proposed that performs a backward chaining evaluation of Xcerpt programs. This algorithm has also been implemented (in part) in a prototypical runtime system. A salient aspect of this algorithm is the specification of a non-standard unification algorithm called simulation unification that supports the new query constructs described above. This unification is symmetric in the sense that variables in both terms can be bound. In contrast to standard unification, however, it is asymmetric in the sense that the unification determines that one term is a subterm of the other.
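    The flavour of simulation unification can be conveyed with a toy matcher: a query pattern may mention only some of a data term's children (partial matching) and may bind variables, while the data term is never required to be covered completely. The sketch below is a drastically simplified illustration of this asymmetry, not Xcerpt's actual algorithm; the book example data is invented.

# Toy "pattern is simulated by data" check with variable binding.
def simulate(pattern, data, bindings):
    if isinstance(pattern, str) and pattern.startswith("?"):     # query variable
        bindings[pattern] = data
        return True
    if isinstance(pattern, dict) and isinstance(data, dict):
        # every pattern child must be matched by a data child (asymmetric, partial)
        return all(
            key in data and simulate(sub, data[key], bindings)
            for key, sub in pattern.items()
        )
    return pattern == data

data = {"book": {"title": "Xcerpt", "year": 2004, "author": {"name": "N.N."}}}
pattern = {"book": {"title": "?t", "author": {"name": "?a"}}}   # 'year' omitted

bindings = {}
print(simulate(pattern, data, bindings), bindings)   # True {'?t': 'Xcerpt', '?a': 'N.N.'}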

    Theory and Practice of Data Citation

    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as of directing investments in science. Science is increasingly becoming "data-intensive": large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded, or what value it has. The development of a theory and practice of data citation is fundamental for treating data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted, and an overall view that brings together these diverse aspects is still missing. This paper therefore aims to describe the lay of the land for data citation, from both the theoretical (the why and what) and the practical (the how) angle. Comment: 24 pages, 2 tables, pre-print accepted in the Journal of the Association for Information Science and Technology (JASIST), 201

    Rethinking the web structure: focusing on events to create better information and experience management

    The objective of this research is to investigate the problem of information management and conveyed experience on the World Wide Web (WWW) when multi-modal sensors and media are available. A study of related work on the web and heterogeneous media makes apparent that one of the main challenges in this area is the semantic unification of heterogeneous media. This thesis introduces an event-based model to semantically unify media. An event is defined as something of significance that takes place at a given time and location. Using this definition and the corresponding model, a system is designed to illustrate practical use cases for events. (M.S. thesis; Committee Chair: Ramesh Jain.)
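    The event notion defined above (something of significance at a given time and location, with heterogeneous media attached to it) can be captured in a small data structure. The field names, coordinates, and media URIs below are illustrative assumptions, not the thesis's actual model.

# Hypothetical sketch of an event record that unifies heterogeneous media
# by anchoring them to a time and a location.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    name: str
    time: datetime
    location: tuple[float, float]                     # (latitude, longitude)
    media: list[str] = field(default_factory=list)    # URIs of photos, audio, sensor logs

concert = Event("Campus concert", datetime(2009, 5, 1, 20, 0), (33.7756, -84.3963))
concert.media.append("http://example.org/photos/concert-001.jpg")
concert.media.append("http://example.org/audio/concert-001.wav")
print(concert)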