54,646 research outputs found
Clustered Integer 3SUM via Additive Combinatorics
We present a collection of new results on problems related to 3SUM,
including:
1. The first truly subquadratic algorithm for
1a. computing the (min,+) convolution for monotone increasing
sequences with integer values bounded by ,
1b. solving 3SUM for monotone sets in 2D with integer coordinates
bounded by , and
1c. preprocessing a binary string for histogram indexing (also
called jumbled indexing).
The running time is:
with
randomization, or deterministically. This greatly improves the
previous time bound obtained from Williams'
recent result on all-pairs shortest paths [STOC'14], and answers an open
question raised by several researchers studying the histogram indexing problem.
2. The first algorithm for histogram indexing for any constant alphabet size
that achieves truly subquadratic preprocessing time and truly sublinear query
time.
3. A truly subquadratic algorithm for integer 3SUM in the case when the given
set can be partitioned into clusters each covered by an interval
of length , for any constant .
4. An algorithm to preprocess any set of integers so that subsequently
3SUM on any given subset can be solved in
time.
All these results are obtained by a surprising new technique, based on the
Balog--Szemer\'edi--Gowers Theorem from additive combinatorics
Efficient Motion Retrieval in Large Motion Databases
There has been a recent paradigm shift in the computer animation industry with an increasing use of pre-recorded motion for animating virtual characters. A fundamental requirement to using motion capture data is an efficient method for indexing and retrieving motions. In this paper, we propose a flexible, efficient method for searching arbitrarily complex motions in large motion databases. Motions are encoded using keys which represent a wide array of structural, geometric and, dynamic features of human motion. Keys provide a representative search space for indexing motions and users can specify sequences of key values as well as multiple combination of key sequences to search for complex motions. We use a trie-based data structure to provide an efficient mapping from key sequences to motions. The search times (even on a single CPU) are very fast, opening the possibility of using large motion data sets in real-time applications
bdbms -- A Database Management System for Biological Data
Biologists are increasingly using databases for storing and managing their
data. Biological databases typically consist of a mixture of raw data,
metadata, sequences, annotations, and related data obtained from various
sources. Current database technology lacks several functionalities that are
needed by biological databases. In this paper, we introduce bdbms, an
extensible prototype database management system for supporting biological data.
bdbms extends the functionalities of current DBMSs to include: (1) Annotation
and provenance management including storage, indexing, manipulation, and
querying of annotation and provenance as first class objects in bdbms, (2)
Local dependency tracking to track the dependencies and derivations among data
items, (3) Update authorization to support data curation via content-based
authorization, in contrast to identity-based authorization, and (4) New access
methods and their supporting operators that support pattern matching on various
types of compressed biological data types. This paper presents the design of
bdbms along with the techniques proposed to support these functionalities
including an extension to SQL. We also outline some open issues in building
bdbms.Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but, you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
710, 2007, Asilomar, California, US
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
Indexing of fictional video content for event detection and summarisation
This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, that we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise whilst accounting for over 90% of the content of most movies. To detect events we leverage traditional filmmaking principles and map these to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, are then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described and the results indicate the usefulness of the proposed approach
Compressed Text Indexes:From Theory to Practice!
A compressed full-text self-index represents a text in a compressed form and
still answers queries efficiently. This technology represents a breakthrough
over the text indexing techniques of the previous decade, whose indexes
required several times the size of the text. Although it is relatively new,
this technology has matured up to a point where theoretical research is giving
way to practical developments. Nonetheless this requires significant
programming skills, a deep engineering effort, and a strong algorithmic
background to dig into the research results. To date only isolated
implementations and focused comparisons of compressed indexes have been
reported, and they missed a common API, which prevented their re-use or
deployment within other applications.
The goal of this paper is to fill this gap. First, we present the existing
implementations of compressed indexes from a practitioner's point of view.
Second, we introduce the Pizza&Chili site, which offers tuned implementations
and a standardized API for the most successful compressed full-text
self-indexes, together with effective testbeds and scripts for their automatic
validation and test. Third, we show the results of our extensive experiments on
these codes with the aim of demonstrating the practical relevance of this novel
and exciting technology
- …