3,607 research outputs found
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Calcite is a foundational software framework that provides query
processing, optimization, and query language support to many popular
open-source data processing systems such as Apache Hive, Apache Storm, Apache
Flink, Druid, and MapD. Calcite's architecture consists of a modular and
extensible query optimizer with hundreds of built-in optimization rules, a
query processor capable of processing a variety of query languages, an adapter
architecture designed for extensibility, and support for heterogeneous data
models and stores (relational, semi-structured, streaming, and geospatial).
This flexible, embeddable, and extensible architecture is what makes Calcite an
attractive choice for adoption in big-data frameworks. It is an active project
that continues to introduce support for the new types of data sources, query
languages, and approaches to query processing and optimization.Comment: SIGMOD'1
Data integration through service-based mediation for web-enabled information systems
The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern complex Web-enabled software and information systems. The difficulty
arises from the high degree of complexity of data structures, for example in business and technology applications, and from the constant change of data and its
representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, additionally the problem of heterogeneity
arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology
framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process
Mediated data integration and transformation for web service-based software architectures
Service-oriented architecture using XML-based web services has been widely accepted by many organisations as the standard infrastructure to integrate heterogeneous and autonomous data sources. As a result, many Web service providers are built up on top of the data sources to share the data by supporting provided and required interfaces and methods of data access in a unified manner. In the context of data integration, problems arise when Web services are assembled to deliver an integrated view of data, adaptable to the specific needs of individual clients and providers. Traditional approaches of data integration and transformation are not suitable to automate the construction of connectors dedicated to connect selected Web services to render integrated and tailored views of data. We propose a declarative approach that addresses the oftenneglected data integration and adaptivity aspects of serviceoriented
architecture
Context-Aware Information Retrieval for Enhanced Situation Awareness
In the coalition forces, users are increasingly challenged with the issues of information overload and correlation of information from heterogeneous sources. Users might need different pieces of information, ranging from information about a single building, to the resolution strategy of a global conflict. Sometimes, the time, location and past history of information access can also shape the information needs of users. Information systems need to help users pull together data from disparate sources according to their expressed needs (as represented by system queries), as well as less specific criteria. Information consumers have varying roles, tasks/missions, goals and agendas, knowledge and background, and personal preferences. These factors can be used to shape both the execution of user queries and the form in which retrieved information is packaged. However, full automation of this daunting information aggregation and customization task is not possible with existing approaches. In this paper we present an infrastructure for context-aware information retrieval to enhance situation awareness. The infrastructure provides each user with a customized, mission-oriented system that gives access to the right information from heterogeneous sources in the context of a particular task, plan and/or mission. The approach lays on five intertwined fundamental concepts, namely Workflow, Context, Ontology, Profile and Information Aggregation. The exploitation of this knowledge, using appropriate domain ontologies, will make it feasible to provide contextual assistance in various ways to the work performed according to a userâs taskrelevant information requirements. This paper formalizes these concepts and their interrelationships
Improving lifecycle query in integrated toolchains using linked data and MQTT-based data warehousing
The development of increasingly complex IoT systems requires large
engineering environments. These environments generally consist of tools from
different vendors and are not necessarily integrated well with each other. In
order to automate various analyses, queries across resources from multiple
tools have to be executed in parallel to the engineering activities. In this
paper, we identify the necessary requirements on such a query capability and
evaluate different architectures according to these requirements. We propose an
improved lifecycle query architecture, which builds upon the existing Tracked
Resource Set (TRS) protocol, and complements it with the MQTT messaging
protocol in order to allow the data in the warehouse to be kept updated in
real-time. As part of the case study focusing on the development of an IoT
automated warehouse, this architecture was implemented for a toolchain
integrated using RESTful microservices and linked data.Comment: 12 pages, worksho
Semantic Integration of Coastal Buoys Data using SPARQL
Currently, the data provided by the heterogeneous buoy sensors/networks (e.g. National Data Buoy center (NDBC), Gulf Of Maine Ocean Observing System (GoMoos) etc. is not amenable to the development of integrated systems due to conflicts in the data representation at syntactic and structural levels. With the rapid increase in the amount of information, the integration of heterogeneous resources is an important issue and requires integrative technologies such as semantic web. In distributed data dissemination system, normally querying on single database will not provide relevant information and requires querying across interrelated data sources to retrieve holistic information. In this thesis we develop system for integrating two different Resource Description Framework (RDF) data sources through intelligent querying using Simple Protocol and RDF Query Language (SPARQL). We use Semantic Web application framework from AllegroGraph that provides functionality for developing triple store for the ontological representations, forming federated stores and querying it through SPARQL
Security oriented e-infrastructures supporting neurological research and clinical trials
The neurological and wider clinical domains stand to gain greatly from the vision of the grid in providing seamless yet secure access to distributed, heterogeneous computational resources and data sets. Whilst a wealth of clinical data exists within local, regional and national healthcare boundaries, access to and usage of these data sets demands that fine grained security is supported and subsequently enforced. This paper explores the security challenges of the e-health domain, focusing in particular on authorization. The context of these explorations is the MRC funded VOTES (Virtual Organisations for Trials and Epidemiological Studies) and the JISC funded GLASS (Glasgow early adoption of Shibboleth project) which are developing Grid infrastructures for clinical trials with case studies in the brain trauma domain
- âŠ