17,030 research outputs found
Multidimensional Range Queries on Modern Hardware
Range queries over multidimensional data are an important part of database
workloads in many applications. Their execution may be accelerated by using
multidimensional index structures (MDIS), such as kd-trees or R-trees. As for
most index structures, the usefulness of this approach depends on the
selectivity of the queries, and common wisdom told that a simple scan beats
MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom
is largely based on evaluations that are almost two decades old, performed on
data being held on disks, applying IO-optimized data structures, and using
single-core systems. The question is whether this rule of thumb still holds
when multidimensional range queries (MDRQ) are performed on modern
architectures with large main memories holding all data, multi-core CPUs and
data-parallel instruction sets. In this paper, we study the question whether
and how much modern hardware influences the performance ratio between index
structures and scans for MDRQ. To this end, we conservatively adapted three
popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit
features of modern servers and compared their performance to different flavors
of parallel scans using multiple (synthetic and real-world) analytical
workloads over multiple (synthetic and real-world) datasets of varying size,
dimensionality, and skew. We find that all approaches benefit considerably from
using main memory and parallelization, yet to varying degrees. Our evaluation
indicates that, on current machines, scanning should be favored over parallel
versions of classical MDIS even for very selective queries
Generating collaborative systems for digital libraries: A model-driven approach
This is an open access article shared under a Creative Commons Attribution 3.0 Licence (http://creativecommons.org/licenses/by/3.0/). Copyright @ 2010 The Authors.The design and development of a digital library involves different stakeholders, such as: information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domainspecific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services, defined with the CRADLE visual language, and of the graphical user interfaces providing access to them for the final user. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, while the CRADLE environment has been evaluated by using the cognitive dimensions framework
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.Comment: 44 page
VisIVO - Integrated Tools and Services for Large-Scale Astrophysical Visualization
VisIVO is an integrated suite of tools and services specifically designed for
the Virtual Observatory. This suite constitutes a software framework for
effective visual discovery in currently available (and next-generation) very
large-scale astrophysical datasets. VisIVO consists of VisiVO Desktop - a stand
alone application for interactive visualization on standard PCs, VisIVO Server
- a grid-enabled platform for high performance visualization and VisIVO Web - a
custom designed web portal supporting services based on the VisIVO Server
functionality. The main characteristic of VisIVO is support for
high-performance, multidimensional visualization of very large-scale
astrophysical datasets. Users can obtain meaningful visualizations rapidly
while preserving full and intuitive control of the relevant visualization
parameters. This paper focuses on newly developed integrated tools in VisIVO
Server allowing intuitive visual discovery with 3D views being created from
data tables. VisIVO Server can be installed easily on any web server with a
database repository. We discuss briefly aspects of our implementation of VisiVO
Server on a computational grid and also outline the functionality of the
services offered by VisIVO Web. Finally we conclude with a summary of our work
and pointers to future developments
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN
- âŠ