9,881 research outputs found
Report on the XBase Project
This project addressed the conceptual fundamentals of data storage,
investigating techniques for provision of highly generic storage facilities
that can be tailored to produce various individually customised storage
infrastructures, compliant to the needs of particular applications. This
requires the separation of mechanism and policy wherever possible. Aspirations
include: actors, whether users or individual processes, should be able to bind
to, update and manipulate data and programs transparently with respect to their
respective locations; programs should be expressed independently of the storage
and network technology involved in their execution; storage facilities should
be structure-neutral so that actors can impose multiple interpretations over
information, simultaneously and safely; information should not be discarded so
that arbitrary historical views are supported; raw stored information should be
open to all; where security restrictions on its use are required this should be
achieved using cryptographic techniques. The key advances of the research were:
1) the identification of a candidate set of minimal storage system building
blocks, which are sufficiently simple to avoid encapsulating policy where it
cannot be customised by applications, and composable to build highly flexible
storage architectures 2) insight into the nature of append-only storage
components, and the issues arising from their application to common storage
use-cases
DAS: a data management system for instrument tests and operations
The Data Access System (DAS) is a metadata and data management software
system, providing a reusable solution for the storage of data acquired both
from telescopes and auxiliary data sources during the instrument development
phases and operations. It is part of the Customizable Instrument WorkStation
system (CIWS-FW), a framework for the storage, processing and quick-look at the
data acquired from scientific instruments. The DAS provides a data access layer
mainly targeted to software applications: quick-look displays, pre-processing
pipelines and scientific workflows. It is logically organized in three main
components: an intuitive and compact Data Definition Language (DAS DDL) in XML
format, aimed for user-defined data types; an Application Programming Interface
(DAS API), automatically adding classes and methods supporting the DDL data
types, and providing an object-oriented query language; a data management
component, which maps the metadata of the DDL data types in a relational Data
Base Management System (DBMS), and stores the data in a shared (network) file
system. With the DAS DDL, developers define the data model for a particular
project, specifying for each data type the metadata attributes, the data format
and layout (if applicable), and named references to related or aggregated data
types. Together with the DDL user-defined data types, the DAS API acts as the
only interface to store, query and retrieve the metadata and data in the DAS
system, providing both an abstract interface and a data model specific one in
C, C++ and Python. The mapping of metadata in the back-end database is
automatic and supports several relational DBMSs, including MySQL, Oracle and
PostgreSQL.Comment: Accepted for pubblication on ADASS Conference Serie
Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML
OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a
database of all EC FP7 and H2020 funded research projects, including metadata
of their results (publications and datasets). These data are stored in an HBase
NoSQL database, post-processed, and exposed as HTML for human consumption, and
as XML through a web service interface. As an intermediate format to facilitate
statistical computations, CSV is generated internally. To interlink the
OpenAIRE data with related data on the Web, we aim at exporting them as Linked
Open Data (LOD). The LOD export is required to integrate into the overall data
processing workflow, where derived data are regenerated from the base data
every day. We thus faced the challenge of identifying the best-performing
conversion approach.We evaluated the performances of creating LOD by a
MapReduce job on top of HBase, by mapping the intermediate CSV files, and by
mapping the XML output.Comment: Accepted in 0th Metadata and Semantics Research Conferenc
A Query Integrator and Manager for the Query Web
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions
The Semantic Automated Discovery and Integration (SADI) Web service Design-Pattern, API and Reference Implementation
Background. 
The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community.

Description. 
SADI – Semantic Automated Discovery and Integration – is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services “stack”, SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers.

Conclusions.
SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behavior we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies
A Generic Storage API
We present a generic API suitable for provision of highly generic storage
facilities that can be tailored to produce various individually customised
storage infrastructures. The paper identifies a candidate set of minimal
storage system building blocks, which are sufficiently simple to avoid
encapsulating policy where it cannot be customised by applications, and
composable to build highly flexible storage architectures. Four main generic
components are defined: the store, the namer, the caster and the interpreter.
It is hypothesised that these are sufficiently general that they could act as
building blocks for any information storage and retrieval system. The essential
characteristics of each are defined by an interface, which may be implemented
by multiple implementing classes.Comment: Submitted to ACSC 200
- …