3,607 research outputs found

    Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

    Get PDF
    Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for the new types of data sources, query languages, and approaches to query processing and optimization.Comment: SIGMOD'1

    Data integration through service-based mediation for web-enabled information systems

    Get PDF
    The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern complex Web-enabled software and information systems. The difficulty arises from the high degree of complexity of data structures, for example in business and technology applications, and from the constant change of data and its representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, additionally the problem of heterogeneity arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process

    Mediated data integration and transformation for web service-based software architectures

    Get PDF
    Service-oriented architecture using XML-based web services has been widely accepted by many organisations as the standard infrastructure to integrate heterogeneous and autonomous data sources. As a result, many Web service providers are built up on top of the data sources to share the data by supporting provided and required interfaces and methods of data access in a unified manner. In the context of data integration, problems arise when Web services are assembled to deliver an integrated view of data, adaptable to the specific needs of individual clients and providers. Traditional approaches of data integration and transformation are not suitable to automate the construction of connectors dedicated to connect selected Web services to render integrated and tailored views of data. We propose a declarative approach that addresses the oftenneglected data integration and adaptivity aspects of serviceoriented architecture

    Context-Aware Information Retrieval for Enhanced Situation Awareness

    No full text
    In the coalition forces, users are increasingly challenged with the issues of information overload and correlation of information from heterogeneous sources. Users might need different pieces of information, ranging from information about a single building, to the resolution strategy of a global conflict. Sometimes, the time, location and past history of information access can also shape the information needs of users. Information systems need to help users pull together data from disparate sources according to their expressed needs (as represented by system queries), as well as less specific criteria. Information consumers have varying roles, tasks/missions, goals and agendas, knowledge and background, and personal preferences. These factors can be used to shape both the execution of user queries and the form in which retrieved information is packaged. However, full automation of this daunting information aggregation and customization task is not possible with existing approaches. In this paper we present an infrastructure for context-aware information retrieval to enhance situation awareness. The infrastructure provides each user with a customized, mission-oriented system that gives access to the right information from heterogeneous sources in the context of a particular task, plan and/or mission. The approach lays on five intertwined fundamental concepts, namely Workflow, Context, Ontology, Profile and Information Aggregation. The exploitation of this knowledge, using appropriate domain ontologies, will make it feasible to provide contextual assistance in various ways to the work performed according to a user’s taskrelevant information requirements. This paper formalizes these concepts and their interrelationships

    Improving lifecycle query in integrated toolchains using linked data and MQTT-based data warehousing

    Full text link
    The development of increasingly complex IoT systems requires large engineering environments. These environments generally consist of tools from different vendors and are not necessarily integrated well with each other. In order to automate various analyses, queries across resources from multiple tools have to be executed in parallel to the engineering activities. In this paper, we identify the necessary requirements on such a query capability and evaluate different architectures according to these requirements. We propose an improved lifecycle query architecture, which builds upon the existing Tracked Resource Set (TRS) protocol, and complements it with the MQTT messaging protocol in order to allow the data in the warehouse to be kept updated in real-time. As part of the case study focusing on the development of an IoT automated warehouse, this architecture was implemented for a toolchain integrated using RESTful microservices and linked data.Comment: 12 pages, worksho

    Semantic Integration of Coastal Buoys Data using SPARQL

    Get PDF
    Currently, the data provided by the heterogeneous buoy sensors/networks (e.g. National Data Buoy center (NDBC), Gulf Of Maine Ocean Observing System (GoMoos) etc. is not amenable to the development of integrated systems due to conflicts in the data representation at syntactic and structural levels. With the rapid increase in the amount of information, the integration of heterogeneous resources is an important issue and requires integrative technologies such as semantic web. In distributed data dissemination system, normally querying on single database will not provide relevant information and requires querying across interrelated data sources to retrieve holistic information. In this thesis we develop system for integrating two different Resource Description Framework (RDF) data sources through intelligent querying using Simple Protocol and RDF Query Language (SPARQL). We use Semantic Web application framework from AllegroGraph that provides functionality for developing triple store for the ontological representations, forming federated stores and querying it through SPARQL

    Security oriented e-infrastructures supporting neurological research and clinical trials

    Get PDF
    The neurological and wider clinical domains stand to gain greatly from the vision of the grid in providing seamless yet secure access to distributed, heterogeneous computational resources and data sets. Whilst a wealth of clinical data exists within local, regional and national healthcare boundaries, access to and usage of these data sets demands that fine grained security is supported and subsequently enforced. This paper explores the security challenges of the e-health domain, focusing in particular on authorization. The context of these explorations is the MRC funded VOTES (Virtual Organisations for Trials and Epidemiological Studies) and the JISC funded GLASS (Glasgow early adoption of Shibboleth project) which are developing Grid infrastructures for clinical trials with case studies in the brain trauma domain
    • 

    corecore