158 research outputs found
Region-based indexing in an image database
Image retrieval systems based on the image-query-by-example paradigm locate their answer set by measuring the similarity between the query image and all images stored in the database. Although this approach generally works for quick re-location of 'identical' or partly occluded images, it does not support the more interesting query type aimed at finding images that contain a particular image fragment. In this paper we introduce a region-based indexing scheme to support retrieval of images on the basis of both global and local image features.
Navigating through a forest of quad trees to spot images in a database
This paper describes how we maintain color and spatial index information on more than 1,000,000 images and how we allow users to browse the spatial color feature space. We break down all our images into color-based quad trees and store all quad trees in our main-memory database. Users can browse the quad trees directly, or they can pre-select images through our color bit vector, which acts as an index accelerator. A Java-based GUI is used to navigate through our image indexes.
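The decomposition described in this abstract can be illustrated with a minimal sketch. It assumes a square image whose side is a power of two and whose pixels are small integer color codes; the names `QuadNode`, `build`, `count_leaves` and `color_bits` are hypothetical and are not taken from the paper, which does not give its splitting rule or bit-vector layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuadNode:
    x: int
    y: int
    size: int
    color: Optional[int]  # set only for leaves (uniform regions)
    children: List["QuadNode"] = field(default_factory=list)


def build(img, x=0, y=0, size=None):
    """Recursively split a square region until it has a single color."""
    if size is None:
        size = len(img)
    first = img[y][x]
    uniform = all(
        img[r][c] == first
        for r in range(y, y + size)
        for c in range(x, x + size)
    )
    if uniform or size == 1:
        return QuadNode(x, y, size, first)
    half = size // 2
    kids = [
        build(img, x + dx, y + dy, half)
        for dy in (0, half)
        for dx in (0, half)
    ]
    return QuadNode(x, y, size, None, kids)


def count_leaves(node):
    """Number of uniform regions in the tree."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


def color_bits(img):
    """Assumed bit-vector layout: bit i is set if color code i occurs."""
    bits = 0
    for row in img:
        for color in row:
            bits |= 1 << color
    return bits
```

Under these assumptions, a pre-selection step could compare `color_bits` values with a bitwise AND before any quad tree is visited.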
Self-organizing strategies for a column-store database
Column-store database systems open new vistas for improved maintenance through self-organization. Individual columns are the focal point, which simplifies balancing conflicting requirements. This work presents two workload-driven self-organizing techniques in a column-store, i.e. adaptive segmentation and adaptive replication. Adaptive segmentation splits a column into non-overlapping segments based on the actual query load. Likewise, adaptive replication creates segment replicas. The strategies can support different application requirements by trading off the reorganization overhead for storage cost. Both techniques can significantly improve system performance, as demonstrated in an evaluation of different scenarios.
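As a rough illustration of adaptive segmentation, the sketch below assumes that segments are half-open value ranges and that each range query splits any segment it partially overlaps at the query bounds. The function `split_segments` is a hypothetical name, and the splitting policy is an assumption; the abstract does not state when the system decides to split.

```python
def split_segments(segments, lo, hi):
    """Split value-range segments at the bounds of a query [lo, hi).

    `segments` is a sorted list of non-overlapping half-open ranges
    (start, end). Assumes lo < hi. A segment is cut only where a query
    bound falls strictly inside it.
    """
    out = []
    for start, end in segments:
        cuts = [start] + [b for b in (lo, hi) if start < b < end] + [end]
        out.extend(zip(cuts, cuts[1:]))
    return out
```

Applied repeatedly, the segment boundaries come to follow the ranges that queries actually ask for, so later queries touch fewer irrelevant values.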
DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing
Comparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query execution engine, while tuples flow through a processing pipeline. We analyze the performance in the context of query engines using so-called "block-oriented" processing, a recently popularized technique that can strongly improve the CPU efficiency. With this high efficiency, the performance trade-offs between NSM and DSM can have a decisive impact on the query execution performance, as we demonstrate using both microbenchmarks and TPC-H query 1. This means that NSM-based database systems can sometimes benefit from converting tuples into DSM on-the-fly, and vice versa.
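The two tuple layouts and the on-the-fly conversion between them can be sketched for a single block as follows. This is only an illustration of the layouts, not of the paper's engine or its performance: the function names and the list and dictionary representation are assumptions.

```python
def nsm_to_dsm(rows, names):
    """Convert a block of row tuples (NSM) to one list per column (DSM)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def dsm_to_nsm(cols, names):
    """Convert per-column lists (DSM) back to a block of row tuples (NSM)."""
    return list(zip(*(cols[name] for name in names)))


def sum_column_nsm(rows, index):
    """Aggregate one attribute by walking whole tuples."""
    return sum(row[index] for row in rows)


def sum_column_dsm(cols, name):
    """Aggregate one attribute by scanning a single column."""
    return sum(cols[name])
```

In the DSM form an operator that needs one attribute reads only that column, while the NSM form keeps all attributes of a tuple together.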
Positional Delta Trees to reconcile updates with read-optimized data storage
We investigate techniques that marry the high read-only analytical query performance of compressed, replicated column storage (“read-optimized” databases) with the ability to handle a high-throughput update workload. Today’s large RAM sizes and the growing gap between sequential and random disk IO throughput bring this once elusive goal within reach, as it has become possible to buffer enough updates in memory to allow background migration of these updates to disk, where efficient sequential IO is amortized among many updates. Our key goal is that read-only queries always see the latest database state, yet are not (significantly) slowed down by the update processing. To this end, we propose the Positional Delta Tree (PDT), which is designed to minimize the overhead of on-the-fly merging of differential updates into (index) scans on stale disk-based data. We describe the PDT data structure and its basic operations (lookup, insert, delete, modify) and provide an in-detail study of their performance. Further, we propose a storage architecture called Replicated Mirrors, which replicates tables in multiple orders, storing each table copy mirrored in both column- and row-wise data formats, and uses PDTs to handle updates. Experiments in the MonetDB/X100 system show that this integrated architecture is able to achieve our main goals.
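The idea of merging positional updates into a scan of stale data can be sketched in a simplified form. The code below uses a flat list of deltas sorted by stable position instead of a tree, so it is not the PDT itself; the delta encoding `(position, kind, value)` and the name `merge_scan` are assumptions made for illustration.

```python
def merge_scan(stable, deltas):
    """Yield the current rows by merging positional deltas into a scan.

    `stable` is the stale on-disk sequence of rows. `deltas` is a list of
    (sid, kind, value) sorted by sid, where sid is a position in `stable`:
      ("ins", v) inserts v before stable row sid (sid == len(stable) appends),
      ("del", None) removes stable row sid,
      ("mod", v) replaces stable row sid with v.
    """
    i = 0
    for sid, row in enumerate(stable):
        deleted = False
        while i < len(deltas) and deltas[i][0] == sid:
            _, kind, value = deltas[i]
            if kind == "ins":
                yield value
            elif kind == "del":
                deleted = True
            elif kind == "mod":
                row = value
            i += 1
        if not deleted:
            yield row
    # Remaining deltas are inserts positioned after the last stable row.
    for _, kind, value in deltas[i:]:
        if kind == "ins":
            yield value
```

Because the deltas are addressed by position rather than by key, the merge needs no key comparisons during the scan, which is the property the abstract highlights.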
MonetDB/X100 - A DBMS in the CPU cache
X100 is a new execution engine for the MonetDB system, that improves execution speed and overcomes its main memory limitation. It introduces t
SciQL, Bridging the Gap between Science and Relational DBMS
Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we propose SciQL (pronounced as ‘cycle’), the first SQL-based query language for scientific applications with both tables and arrays as first-class citizens. It provides a seamless symbiosis of array, set and sequence interpretations. A key innovation is the extension of value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between element positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates it using time-series concepts.
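Structural grouping on a time series can be illustrated outside the database. The sketch below is plain Python, not SciQL syntax: it forms one group per element from a fixed set of position offsets and aggregates each group, which gives a moving average. The function names and the clipping of groups at the array boundaries are assumptions.

```python
def structural_groups(series, offsets):
    """For each position i, collect the elements at positions i + offset."""
    n = len(series)
    return [
        [series[i + o] for o in offsets if 0 <= i + o < n]
        for i in range(n)
    ]


def moving_average(series, offsets=(-1, 0, 1)):
    """Average each positional group; groups are clipped at the edges."""
    return [sum(g) / len(g) for g in structural_groups(series, offsets)]
```

Here the groups are defined by where elements sit relative to each other rather than by shared values, in contrast to a value-based GROUP BY.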
A case for image querying through image spots
We present an image spot query technique as an alternative for content-based image retrieval based on similarity over feature vectors. Image spots are selective parts of a query image designated by users as highly relevant for the desired answer set. Compared to traditional approaches, our technique allows users to search image databases for local (spatial, color and color transition) characteristics rather than global features.
When a user query is presented to our search engine, the engine does not impose any (similarity, ranking, cutoff) policy of its own on the answer set; it performs an exact match of the query terms against the database. Higher-level semantic concerns, such as weighing the relevance of query terms, are left to users as they refine their query to reach the desired answer set. Given the hundreds of feature terms involved in query spots, refinement algorithms are to be encapsulated in separate applications, which act as an intermediary between our search engine and the users.
Efficient k-NN search on vertically decomposed data
Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet, nearest neighbor search is a difficult problem in high-dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive solution for repositories with millions of objects. This paper approaches the problem from a different angle. A solution is sought in an unconventional storage scheme that opens up a new range of techniques for processing k-NN queries, especially suited for high-dimensional spaces. The suggested (physical) database design accommodates well a novel variant of branch-and-bound search, t