Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Calcite is a foundational software framework that provides query
processing, optimization, and query language support to many popular
open-source data processing systems such as Apache Hive, Apache Storm, Apache
Flink, Druid, and MapD. Calcite's architecture consists of a modular and
extensible query optimizer with hundreds of built-in optimization rules, a
query processor capable of processing a variety of query languages, an adapter
architecture designed for extensibility, and support for heterogeneous data
models and stores (relational, semi-structured, streaming, and geospatial).
This flexible, embeddable, and extensible architecture is what makes Calcite an
attractive choice for adoption in big-data frameworks. It is an active project
that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
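As a rough illustration of this embeddable, adapter-based design, here is a minimal Java sketch that plugs an in-memory schema into Calcite through its standard JDBC entry point and queries it with SQL. ReflectiveSchema and CalciteConnection are real Calcite classes; the HrSchema and Employee classes are hypothetical stand-ins for application data.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;

public class CalciteHello {
  // Hypothetical in-memory "tables": ReflectiveSchema exposes the public
  // array fields of a Java object as relational tables.
  public static class HrSchema {
    public final Employee[] emps = {
        new Employee(100, "Alice", 10),
        new Employee(110, "Bob", 20)
    };
  }
  public static class Employee {
    public final int empid;
    public final String name;
    public final int deptno;
    Employee(int empid, String name, int deptno) {
      this.empid = empid; this.name = name; this.deptno = deptno;
    }
  }

  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    info.setProperty("lex", "JAVA");  // Java-style, case-sensitive identifiers
    try (Connection conn =
             DriverManager.getConnection("jdbc:calcite:", info)) {
      CalciteConnection calcite = conn.unwrap(CalciteConnection.class);
      SchemaPlus root = calcite.getRootSchema();
      // Register the in-memory data as an adapter-backed schema.
      root.add("hr", new ReflectiveSchema(new HrSchema()));
      try (Statement st = conn.createStatement();
           ResultSet rs = st.executeQuery(
               "select name from hr.emps where deptno = 10")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }
    }
  }
}
```

Once registered this way, queries over the in-memory tables are planned by Calcite's optimizer like queries over any other adapter-backed source.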
A grid-based infrastructure for distributed retrieval
In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to “lift” the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the field of Earth Science.
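Service-based distributed retrieval of this kind typically fans a query out to remote index services and merges their ranked partial results. The Java sketch below illustrates that scatter-gather pattern under assumed names (IndexService, Hit); it is not drawn from the DILIGENT codebase.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ScatterGatherSearch {
  record Hit(String docId, double score) {}

  // Hypothetical stand-in for a remote retrieval node.
  interface IndexService {
    List<Hit> search(String query, int k);
  }

  // Query every node concurrently, tolerate failures, merge by score.
  static List<Hit> search(List<IndexService> nodes, String query, int k) {
    List<CompletableFuture<List<Hit>>> futures = new ArrayList<>();
    for (IndexService node : nodes) {
      futures.add(CompletableFuture
          .supplyAsync(() -> node.search(query, k))
          .exceptionally(ex -> List.of()));  // a failed node contributes nothing
    }
    List<Hit> merged = new ArrayList<>();
    for (CompletableFuture<List<Hit>> f : futures) {
      merged.addAll(f.join());
    }
    merged.sort(Comparator.comparingDouble(Hit::score).reversed());
    return merged.subList(0, Math.min(k, merged.size()));
  }
}
```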
A Development Environment for Visual Physics Analysis
The Visual Physics Analysis (VISPA) project integrates different aspects of
physics analyses into a graphical development environment. It addresses the
typical development cycle of (re-)designing, executing and verifying an
analysis. The project provides an extendable plug-in mechanism and includes
plug-ins for designing the analysis flow, for running the analysis on batch
systems, and for browsing the data content. The corresponding plug-ins are
based on an object-oriented toolkit for modular data analysis. We introduce the
main concepts of the project, describe the technical realization and
demonstrate the functionality in example applications.
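The extendable plug-in mechanism can be pictured as a host that registers independent plug-ins for each stage of the design-execute-verify cycle. The following Java sketch uses hypothetical names (PluginHost, AnalysisPlugin) rather than VISPA's actual interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PluginHost {
  // One plug-in per concern: designing the analysis flow, batch
  // submission, data browsing, and so on.
  interface AnalysisPlugin {
    String name();
    void run(Map<String, Object> context);
  }

  private final Map<String, AnalysisPlugin> plugins = new LinkedHashMap<>();

  public void register(AnalysisPlugin p) {
    plugins.put(p.name(), p);
  }

  // The host drives the cycle by invoking plug-ins in registration order.
  public void runAll(Map<String, Object> context) {
    for (AnalysisPlugin p : plugins.values()) {
      p.run(context);
    }
  }
}
```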
Addressing decision making for remanufacturing operations and design-for-remanufacture
Remanufacturing is a process of returning a used product to at least the original equipment manufacturer's performance specification, from the customer's perspective, and giving the resultant product a warranty at least equal to that of a newly manufactured equivalent. This paper explains the need to combine ecological concerns with economic growth and the significance of remanufacturing in this. Drawing on the experience of an international aero-engine manufacturer, it discusses the impact of the need for sustainable manufacturing on organisational business models. It explains some key decision-making issues that hinder remanufacturing and suggests effective solutions. It presents a peer-validated, high-level design guideline to assist decision-making in design in order to support remanufacturing. The design guide was developed in the UK through the analysis of selected products during case studies and workshops involving remanufacturing and conventional manufacturing practitioners as well as academics. It is one of the initial stages in the development of a robust design-for-remanufacture guideline.
Compressing High-Dimensional Data Spaces Using Non-Differential Augmented Vector Quantization
Compressing high-dimensional data spaces can reduce query processing times and space requirements. Database compression has been shown to alleviate the I/O bottleneck, reduce disk space, improve disk access speed, speed up queries, reduce overall retrieval time, and increase the effective I/O bandwidth. However, random access to individual tuples in a compressed database is very difficult to achieve with most available compression techniques.
We propose a lossless compression technique called non-differential augmented vector
quantization, a close variant of the novel augmented vector quantization. The technique is
applicable to a collection of tuples and especially effective for tuples with many low- to medium-cardinality fields. In addition, the technique supports standard database operations and permits very fast random access and atomic decompression of tuples in large collections. The technique maps a database relation into a static, cached bitmap-index access structure; consequently, we were able to achieve substantial space savings by storing each database tuple as a bit value in memory.
Important distinguishing characteristics of our technique are that (a) individual tuples can be compressed and decompressed, rather than a full page or an entire relation at a time, and (b) the information needed for tuple compression and decompression can reside in memory or, at worst, in a single page. Promising application domains include decision support systems, statistical databases, and life databases with low-cardinality fields and possibly no text fields.
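As an illustration of the general idea, not the paper's exact algorithm, the Java sketch below dictionary-encodes low-cardinality fields and packs each tuple into a single machine word, so that individual tuples can be fetched and decompressed at random without touching neighbouring data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PackedTupleStore {
  private final int[] bits;  // code width per field, from its cardinality
  private final List<Map<String, Integer>> encode = new ArrayList<>();
  private final List<List<String>> decode = new ArrayList<>();
  private final List<Long> tuples = new ArrayList<>();

  // cardinalities[i] = number of distinct values expected in field i
  public PackedTupleStore(int[] cardinalities) {
    bits = new int[cardinalities.length];
    int total = 0;
    for (int i = 0; i < cardinalities.length; i++) {
      bits[i] = 64 - Long.numberOfLeadingZeros(cardinalities[i] - 1L);
      if (bits[i] == 0) bits[i] = 1;
      total += bits[i];
      encode.add(new HashMap<>());
      decode.add(new ArrayList<>());
    }
    if (total > 64) throw new IllegalArgumentException("tuple exceeds one word");
  }

  // Compress one tuple: assign each value a dictionary code and pack
  // the codes into a single long.
  public void insert(String[] tuple) {
    long packed = 0;
    for (int i = 0; i < tuple.length; i++) {
      final int f = i;
      int code = encode.get(i).computeIfAbsent(tuple[i], v -> {
        decode.get(f).add(v);
        return decode.get(f).size() - 1;
      });
      packed = (packed << bits[i]) | code;
    }
    tuples.add(packed);
  }

  // Atomic random-access decompression of a single tuple by row number.
  public String[] get(int row) {
    long packed = tuples.get(row);
    String[] out = new String[bits.length];
    for (int i = bits.length - 1; i >= 0; i--) {
      int code = (int) (packed & ((1L << bits[i]) - 1));
      out[i] = decode.get(i).get(code);
      packed >>>= bits[i];
    }
    return out;
  }
}
```

With, say, three fields of cardinality 200, 2, and 8, each tuple occupies 12 bits of one word rather than its raw string width, and get(row) decompresses exactly one tuple.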