Non-hierarchical Structures: How to Model and Index Overlaps?
Overlap is a common phenomenon seen when structural components of a digital
object are neither disjoint nor nested inside each other. Overlapping
components resist reduction to a structural hierarchy, and tree-based indexing
and query processing techniques cannot be used for them. Our solution to this
data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a
novel extension of the XML data model for non-hierarchical structures. We
introduce an algorithm for constructing TGSA from annotated documents; the
algorithm can efficiently process non-hierarchical structures and is associated
with formal proofs, ensuring that transformation of the document to the data
model is valid. To enable high performance query analysis in large data
repositories, we further introduce an extension of XML pre-post indexing for
non-hierarchical structures, which can process both reachability and
overlapping relationships. Comment: The paper has been accepted at the Balisage 2014 conference.
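The sketch below shows the baseline that the abstract extends. It assumes an ordinary tree and character-offset spans, and the names `Node`, `assign_pre_post`, `is_ancestor` and `overlaps` are hypothetical rather than taken from the paper. Classic XML pre/post numbering answers ancestor/descendant reachability with two integer comparisons. A separate interval test detects the overlap relation, where two spans share text but neither contains the other, which a tree cannot express. This is not the TGSA construction or its extended index.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)
    pre: int = -1   # rank in preorder (node before its children)
    post: int = -1  # rank in postorder (node after its children)


def assign_pre_post(root: Node) -> None:
    """Assign preorder and postorder ranks in a single depth-first traversal."""
    counters = {"pre": 0, "post": 0}

    def visit(node: Node) -> None:
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children:
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1

    visit(root)


def is_ancestor(u: Node, v: Node) -> bool:
    """u is a proper ancestor of v iff u comes before v in preorder and after v in postorder."""
    return u.pre < v.pre and v.post < u.post


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Half-open spans [start, end) that share text without either one nesting the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


# Hypothetical document: two paragraphs, the first containing a line.
root = Node("doc", [Node("p1", [Node("line1")]), Node("p2")])
assign_pre_post(root)
```

With this numbering, `is_ancestor(root, line1)` holds and `is_ancestor(p1, p2)` does not. A sentence span `(0, 40)` and a page span `(30, 80)` overlap, while nested or disjoint spans do not.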
IVOA Recommendation: VOTable Format Definition Version 1.3
This document describes the structures making up the VOTable standard. The
main part of this document describes the adopted part of the VOTable standard;
it is followed by appendices presenting extensions which have been proposed
and/or discussed, but which are not part of the standard.
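The reader below illustrates the core table structure of a VOTable. A `TABLE` declares its columns with `FIELD` elements and carries its rows in `DATA`. The sketch assumes the plain-XML `TABLEDATA` serialization only. Binary and FITS serializations, `PARAM`, `GROUP` and datatype conversion are ignored. The function names and sample content are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def read_tabledata(xml_text: str) -> list[dict[str, str]]:
    """Return each TABLEDATA row as a dict keyed by FIELD name (values stay strings)."""
    root = ET.fromstring(xml_text)
    rows = []
    for table in root.iter():
        if local(table.tag) != "TABLE":
            continue
        # Column names come from the FIELD children of this TABLE, in order.
        names = [f.get("name") for f in table if local(f.tag) == "FIELD"]
        for tr in table.iter():
            if local(tr.tag) != "TR":
                continue
            cells = [td.text or "" for td in tr if local(td.tag) == "TD"]
            rows.append(dict(zip(names, cells)))
    return rows


# Hypothetical sample document using the VOTable 1.3 namespace.
SAMPLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE name="spectra">
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="z" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>spec-1</TD><TD>0.05</TD></TR>
        <TR><TD>spec-2</TD><TD>0.12</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

rows = read_tabledata(SAMPLE)
```

Cell values are returned as strings. A fuller reader would convert them using each `FIELD`'s `datatype` and `arraysize` attributes.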
Semantic Technologies for Manuscript Descriptions — Concepts and Visions
This contribution relates recent developments in the World Wide Web to
codicological research. In recent years, an informational extension of the
Internet has been discussed and extensively researched: the Semantic Web. It
has already been applied in many areas, including digital information processing of
cultural heritage data. The Semantic Web facilitates the organisation and linking of
data across websites, according to a given semantic structure. Software can then process
this structural and semantic information to extract further knowledge. In the area
of codicological research, many institutions are making efforts to improve the online
availability of handwritten codices. If these resources could also employ Semantic
Web techniques, considerable research potential could be unleashed. However, data
acquisition from less structured data sources will be problematic. In particular, data
stemming from unstructured sources needs to be made accessible to Semantic Web tools
through information extraction techniques. In the area of museum research, the CIDOC
Conceptual Reference Model (CRM) has been widely examined and is being adopted
successfully. The CRM translates well to Semantic Web research, and its concentration
on contextualization of objects could support approaches in codicological research.
Further concepts for the creation and management of bibliographic relationships and
structured vocabularies related to the CRM will be considered in this chapter. Finally, a
user scenario showing all processing steps in context will be worked through.
IVOA Recommendation: Simple Spectral Access Protocol Version 1.1
The Simple Spectral Access (SSA) Protocol (SSAP) defines a uniform interface
to remotely discover and access one-dimensional spectra. SSA is a member of an
integrated family of data access interfaces altogether comprising the Data
Access Layer (DAL) of the IVOA. SSA is based on a more general data model
capable of describing most tabular spectrophotometric data, including time
series and spectral energy distributions (SEDs) as well as 1-D spectra; however
the scope of the SSA interface as specified in this document is limited to
simple 1-D spectra, including simple aggregations of 1-D spectra. The form of
the SSA interface is simple: clients first query the global resource registry
to find services of interest and then issue a data discovery query to selected
services to determine what relevant data is available from each service; the
candidate datasets available are described uniformly in a VOTable format
document which is returned in response to the query. Finally, the client may
retrieve selected datasets for analysis. Spectrum datasets returned by an SSA
spectrum service may be either precomputed, archival datasets, or they may be
virtual data which is computed on the fly to respond to a client request.
Spectrum datasets may conform to a standard data model defined by SSA, or may
be native spectra with custom project-defined content. Spectra may be returned
in any of a number of standard data formats. Spectral data is generally stored
externally to the VO in a format specific to each spectral data collection;
currently there is no standard way to represent astronomical spectra, and
virtually every project does it differently. Hence spectra may be actively
mediated to the standard SSA-defined data model at access time by the service,
so that client analysis programs do not have to be familiar with the
idiosyncratic details of each data collection to be accessed.
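The data discovery step can be sketched as building an HTTP GET query against a service chosen from the registry. The sketch assumes the SSA `REQUEST=queryData`, `POS` (RA,Dec in decimal degrees) and `SIZE` (search diameter in decimal degrees) parameters. The base URL is a placeholder, and `build_ssa_query` is a hypothetical helper. A real client should check the full parameter set against the specification.

```python
from urllib.parse import urlencode


def build_ssa_query(base_url: str, ra_deg: float, dec_deg: float, size_deg: float) -> str:
    """Build an SSA queryData URL for a circular region around (ra_deg, dec_deg)."""
    params = {
        "REQUEST": "queryData",
        "POS": f"{ra_deg},{dec_deg}",  # the comma is percent-encoded by urlencode
        "SIZE": str(size_deg),
    }
    # Service base URLs may already carry a query string or end with '?' or '&'.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode(params)


# Placeholder endpoint; a real one would come from the registry query.
url = build_ssa_query("https://example.org/ssa", 10.68, 41.27, 0.1)
```

The service answers such a query with a VOTable document describing the candidate datasets. The client then retrieves the datasets it selects from that table.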
A Web-Based Tool for Analysing Normative Documents in English
Our goal is to use formal methods to analyse normative documents written in
English, such as privacy policies and service-level agreements. This requires
the combination of a number of different elements, including information
extraction from natural language, formal languages for model representation,
and an interface for property specification and verification. We have worked on
a collection of components for this task: a natural language extraction tool, a
suitable formalism for representing such documents, an interface for building
models in this formalism, and methods for answering queries asked of a given
model. In this work, each of these concerns is brought together in a web-based
tool, providing a single interface for analysing normative texts in English.
Through the use of a running example, we describe each component and
demonstrate the workflow established by our tool.
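The abstract does not say which formalism the tool uses. As a purely hypothetical illustration of "answering queries asked of a given model", the sketch below encodes clauses as (agent, modality, action) triples. It then reports a classic deontic conflict: the same agent being both obliged or permitted to perform an action and forbidden to perform it. `Clause`, `Modality`, `conflicts` and the sample policy are all invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    OBLIGATION = "O"
    PERMISSION = "P"
    PROHIBITION = "F"


@dataclass(frozen=True)
class Clause:
    agent: str
    modality: Modality
    action: str


def conflicts(clauses: list[Clause]) -> list[tuple[Clause, Clause]]:
    """Pairs of clauses forbidding an action that another clause obliges or permits for the same agent."""
    found = []
    for i, a in enumerate(clauses):
        for b in clauses[i + 1:]:
            if (a.agent, a.action) != (b.agent, b.action):
                continue
            kinds = {a.modality, b.modality}
            if Modality.PROHIBITION in kinds and kinds & {Modality.OBLIGATION, Modality.PERMISSION}:
                found.append((a, b))
    return found


# Hypothetical privacy-policy fragment with one contradictory pair.
policy = [
    Clause("provider", Modality.OBLIGATION, "delete_data_on_request"),
    Clause("provider", Modality.PROHIBITION, "delete_data_on_request"),
    Clause("user", Modality.PERMISSION, "export_data"),
]
```

Here `conflicts(policy)` returns the single obligation/prohibition pair for the provider. Richer formalisms also handle conditions, deadlines and reparations, which this toy model omits.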
Expanding sensor networks to automate knowledge acquisition
The availability of accurate, low-cost sensors to scientists has resulted in widespread deployment in a variety of sporting and health environments. The sensor data output is often in a raw, proprietary or unstructured format. As a result, it is often difficult to query multiple sensors for complex properties or actions. In our research, we deploy a heterogeneous sensor network to detect various biological and physiological properties of athletes during training activities. The goal for exercise physiologists is to quickly identify key intervals in exercise, such as moments of stress or fatigue. This is not currently possible because the sensors deliver only low-level data and query language support is lacking. Our motivation is therefore to expand the sensor network with a contextual layer that enriches raw sensor data so that it can be exploited by a high-level query language. To achieve this, the domain expert specifies events in a traditional event-condition-action format that delivers the required contextual enrichment.
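The event-condition-action idea can be sketched as rules with three parts. Each rule fires on a reading of a given sensor type (event), tests a predicate on it (condition), and emits a contextual annotation (action). The sensor names, the `Reading`/`Rule`/`enrich` helpers and the 180 bpm threshold are hypothetical. They are illustrative values, not values from the study and not physiological guidance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Reading:
    sensor: str       # sensor type, e.g. "heart_rate"
    timestamp: float  # seconds since the start of the session
    value: float


@dataclass
class Rule:
    event: str                            # sensor type that triggers the rule
    condition: Callable[[Reading], bool]  # predicate on the raw reading
    action: Callable[[Reading], dict]     # builds a contextual annotation


def enrich(readings: list[Reading], rules: list[Rule]) -> list[dict]:
    """Apply every matching rule to every reading and collect the annotations."""
    annotations = []
    for r in readings:
        for rule in rules:
            if r.sensor == rule.event and rule.condition(r):
                annotations.append(rule.action(r))
    return annotations


# Hypothetical rule: label heart-rate readings above an illustrative threshold.
HR_THRESHOLD = 180.0
stress_rule = Rule(
    event="heart_rate",
    condition=lambda r: r.value > HR_THRESHOLD,
    action=lambda r: {"t": r.timestamp, "label": "possible_stress"},
)

readings = [
    Reading("heart_rate", 0.0, 150.0),
    Reading("heart_rate", 1.0, 185.0),
    Reading("accelerometer", 1.0, 3.2),
]
annotations = enrich(readings, [stress_rule])
```

The resulting annotations form the kind of contextual layer a higher-level query language could then filter by label and time interval.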
Knowledge Rich Natural Language Queries over Structured Biological Databases
Increasingly, keyword, natural language and NoSQL queries are being used for
information retrieval from traditional as well as non-traditional databases
such as web, document, image, GIS, legal, and health databases. While their
popularity is undeniable, their engineering is far from
simple. For the most part, a semantics- and intent-preserving mapping of a
well-understood natural language query expressed over a structured database
schema to a structured query language is still a difficult task, and research
to tame this complexity is intense. In this paper, we propose a multi-level
knowledge-based middleware that facilitates such mappings by separating the
conceptual level from the physical level. We augment these multi-level
abstractions with a concept reasoner and a query strategy engine to dynamically
link arbitrary natural language querying to well-defined structured queries. We
demonstrate the feasibility of our approach by presenting a Datalog-based
prototype system, called BioSmart, that can compute responses to arbitrary
natural language queries over arbitrary databases once a syntactic
classification of the natural language query is made.
MonetDB/XQuery: a fast XQuery processor powered by a relational engine
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state of the art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirms that the goal of purely relational XQuery processing, namely speed and scalability, was met.
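The sketch below shows a range-based encoding and descendant-axis evaluation over it. It assumes a pre/size/level layout (preorder rank, number of descendants, depth), under which node v is a descendant of context node c exactly when pre(c) < pre(v) <= pre(c) + size(c). The `descendants` function borrows the pruning idea of staircase join: context nodes lying inside an earlier context node's subtree are skipped, so each region is scanned once and results stay in document order without duplicates. This is a simplified illustration with hypothetical names, not MonetDB/XQuery's loop-lifted staircase join.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    pre: int    # preorder rank
    size: int   # number of descendants
    level: int  # depth, root = 0
    tag: str


def encode(root: ET.Element) -> list[Row]:
    """Shred an element tree into pre/size/level rows, indexed by pre rank."""
    rows: list = []

    def visit(el: ET.Element, level: int) -> None:
        idx = len(rows)
        rows.append(None)  # placeholder: size is known only after the children
        for child in el:
            visit(child, level + 1)
        rows[idx] = Row(idx, len(rows) - idx - 1, level, el.tag)

    visit(root, 0)
    return rows


def descendants(doc: list[Row], context: list[int]) -> list[int]:
    """Descendant axis for a set of context pre ranks, with staircase-style pruning."""
    result: list[int] = []
    covered_until = -1  # last pre rank inside an already scanned subtree
    for c in sorted(set(context)):
        if c <= covered_until:
            continue  # c is inside an earlier context subtree: its descendants are already in result
        end = c + doc[c].size
        result.extend(range(c + 1, end + 1))
        covered_until = end
    return result


doc = encode(ET.fromstring("<a><b><c/><d/></b><e/></a>"))
print([doc[p].tag for p in descendants(doc, [1, 4])])  # ['c', 'd']
```

Because subtrees in the pre/size encoding are contiguous pre ranges, the pruned scan touches each document row at most once per axis step.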