
    Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

    Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
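    To make the embeddable query-planning flow above concrete, the following is a minimal Java sketch of how an application might drive Calcite's parse/validate/optimize pipeline through its Frameworks and Planner utilities. It is an illustrative sketch only, not taken from the paper; it uses a constant query so that no adapter or table needs to be registered.

```java
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelRoot;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class CalcitePlannerSketch {
  public static void main(String[] args) throws Exception {
    // Root schema: in a real embedding, an adapter (JDBC, CSV, Druid, ...)
    // would register its tables here so the optimizer can plan against them.
    SchemaPlus rootSchema = Frameworks.createRootSchema(true);

    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)
        .build();
    Planner planner = Frameworks.getPlanner(config);

    // Parse -> validate -> convert to a relational algebra tree, the form on
    // which Calcite's rule-based and cost-based optimizations operate.
    SqlNode parsed = planner.parse("SELECT 1 + 1 AS two");
    SqlNode validated = planner.validate(parsed);
    RelRoot root = planner.rel(validated);

    // Print the logical plan.
    System.out.println(RelOptUtil.toString(root.project()));
  }
}
```

    In a real adopter, the interesting work happens after this point: the optimizer's rule set rewrites the relational tree, and adapter-specific rules push work down to the underlying data source.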

    A grid-based infrastructure for distributed retrieval

    In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to ‘lift’ the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the field of Earth Science

    A Development Environment for Visual Physics Analysis

    The Visual Physics Analysis (VISPA) project integrates different aspects of physics analyses into a graphical development environment. It addresses the typical development cycle of (re-)designing, executing and verifying an analysis. The project provides an extendable plug-in mechanism and includes plug-ins for designing the analysis flow, for running the analysis on batch systems, and for browsing the data content. The corresponding plug-ins are based on an object-oriented toolkit for modular data analysis. We introduce the main concepts of the project, describe the technical realization and demonstrate the functionality in example applications
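    As a rough, language-agnostic illustration of the plug-in mechanism described above, the following Java sketch shows a registry in which plug-ins (an analysis designer, a batch runner, a data browser, and so on) implement one common interface and are activated by name. The interface and names here are hypothetical and do not correspond to VISPA's actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PluginRegistrySketch {

  /** Minimal contract that every plug-in fulfils (hypothetical). */
  public interface Plugin {
    String name();
    void activate();   // e.g. open the plug-in's view in the environment
  }

  private final Map<String, Plugin> plugins = new LinkedHashMap<>();

  public void register(Plugin plugin) {
    plugins.put(plugin.name(), plugin);
  }

  public void activate(String name) {
    Plugin p = plugins.get(name);
    if (p == null) {
      throw new IllegalArgumentException("Unknown plug-in: " + name);
    }
    p.activate();
  }

  public static void main(String[] args) {
    PluginRegistrySketch registry = new PluginRegistrySketch();
    registry.register(new Plugin() {
      public String name() { return "data-browser"; }
      public void activate() { System.out.println("Browsing data content..."); }
    });
    registry.activate("data-browser");
  }
}
```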

    Addressing decision making for remanufacturing operations and design-for-remanufacture

    Remanufacturing is the process of returning a used product to at least the original equipment manufacturer's original performance specification, from the customer's perspective, and giving the resultant product a warranty that is at least equal to that of a newly manufactured equivalent. This paper explains the need to combine ecological concerns with economic growth and the significance of remanufacturing in this. Using the experience of an international aero-engine manufacturer, it discusses the impact of the need for sustainable manufacturing on organisational business models. It explains some key decision-making issues that hinder remanufacturing and suggests effective solutions. It presents a peer-validated, high-level design guideline to assist decision-making in design in order to support remanufacturing. The design guide was developed in the UK through the analysis of selected products during case studies and workshops involving remanufacturing and conventional manufacturing practitioners as well as academics. It is one of the initial stages in the development of a robust design-for-remanufacture guideline

    Compressing High-Dimensional Data Spaces Using Non-Differential Augmented Vector Quantization

    query processing times and space requirements. Database compression has been found to alleviate the I/O bottleneck, reduce disk space, improve disk access speed, speed up queries, reduce overall retrieval time, and increase the effective I/O bandwidth. However, random access to individual tuples in a compressed database is very difficult to achieve with most available compression techniques. We propose a lossless compression technique called non-differential augmented vector quantization, a close variant of the novel augmented vector quantization. The technique is applicable to a collection of tuples and is especially effective for tuples with many low- to medium-cardinality fields. In addition, the technique supports standard database operations and permits very fast random access and atomic decompression of tuples in large collections. The technique maps a database relation into a static bitmap index cached access structure. Consequently, we were able to achieve substantial savings in space by storing each database tuple as a bit value in the computer memory. Important distinguishing characteristics of our technique are that (a) individual tuples can be compressed and decompressed, rather than a full page or an entire relation at a time, and (b) the information needed for tuple compression and decompression can reside in memory or, at worst, in a single page. Promising application domains include decision support systems, statistical databases and life databases with low cardinality fields and possibly no text field
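    As a simplified illustration of the underlying idea (per-field codebook encoding of low-cardinality values, with random access to individual tuples), the following Java sketch packs each tuple's field codes into a single machine word and decompresses any tuple independently of its neighbours. It is a sketch of the general principle only, not the paper's non-differential augmented vector quantization algorithm or its bitmap index structure; all names and the fixed-width packing scheme are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TupleCodebookSketch {
  // Per-field codebooks: value -> code for compression, code -> value for decompression.
  private final List<Map<String, Integer>> encode = new ArrayList<>();
  private final List<List<String>> decode = new ArrayList<>();
  private final int bitsPerField;   // assumed fixed code width per field

  // In this sketch, fields * bitsPerField must fit into the 64 bits of a long.
  public TupleCodebookSketch(int fields, int bitsPerField) {
    this.bitsPerField = bitsPerField;
    for (int i = 0; i < fields; i++) {
      encode.add(new HashMap<>());
      decode.add(new ArrayList<>());
    }
  }

  /** Compress one tuple into a single long (field codes packed side by side). */
  public long compress(String[] tuple) {
    long packed = 0;
    for (int f = 0; f < tuple.length; f++) {
      Map<String, Integer> dict = encode.get(f);
      Integer code = dict.get(tuple[f]);
      if (code == null) {                // first occurrence: extend the codebook
        code = dict.size();
        dict.put(tuple[f], code);
        decode.get(f).add(tuple[f]);
      }
      packed |= ((long) code) << (f * bitsPerField);
    }
    return packed;
  }

  /** Decompress one tuple without touching any other tuple (random access). */
  public String[] decompress(long packed, int fields) {
    String[] tuple = new String[fields];
    long mask = (1L << bitsPerField) - 1;
    for (int f = 0; f < fields; f++) {
      int code = (int) ((packed >>> (f * bitsPerField)) & mask);
      tuple[f] = decode.get(f).get(code);
    }
    return tuple;
  }
}
```

    Because each compressed tuple occupies a fixed number of bits, the i-th tuple can be located and decoded directly, which is the property the abstract highlights for query processing over compressed relations.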