Multiple Media Correlation: Theory and Applications
This thesis introduces multiple media correlation, a new technology for the automatic alignment of multiple media objects such as text, audio, and video. This research began with the question: what can be learned when multiple multimedia components are analyzed simultaneously? Most ongoing research in computational multimedia has focused on queries, indexing, and retrieval within a single media type. Video is compressed and searched independently of audio; text is indexed without regard to the temporal relationships it may have to other media data. Multiple media correlation provides a framework for locating and exploiting correlations between multiple, potentially heterogeneous, media streams. The goal is computed synchronization: the determination of temporal and spatial alignments that optimize a correlation function and thereby indicate commonality and synchronization between media objects. The model also provides a basis for comparing media in unrelated domains. There are many real-world applications for this technology, including speaker localization, musical score alignment, and degraded media realignment. Two applications, text-to-speech alignment and parallel text alignment, are described in detail with experimental validation. Text-to-speech alignment computes the alignment between a textual transcript and speech-based audio. The presented solutions are effective for a wide variety of content and are useful not only for content retrieval but also for automatic captioning of movies and video. Parallel text alignment provides a tool for comparing alternative translations of the same document; it is particularly useful to the classics scholar interested in comparing translation techniques or styles.
The results presented in this thesis include (a) new media models that are more useful for analysis applications, (b) a theoretical model for multiple media correlation, (c) two practical application solutions with widespread applicability, and (d) Xtrieve, a multimedia database retrieval system that demonstrates this new technology and its application to information retrieval. This thesis demonstrates that computed alignment of media objects is practical and can provide immediate solutions to many information retrieval and content presentation problems. It also introduces a new area for research in media data analysis.
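Computed synchronization is described as choosing the alignment that optimizes a correlation function between two media streams. As a rough illustration only, the sketch below uses classic dynamic time warping, a standard alignment technique that is not necessarily the method used in the thesis, to find the minimum-cost monotone alignment between two sequences. The function name `dtw_align` and the pluggable `cost` function are hypothetical; in a text-to-speech setting the two sequences might be transcript units and audio frames, with `cost` measuring how poorly they match.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def dtw_align(
    xs: Sequence[A],
    ys: Sequence[B],
    cost: Callable[[A, B], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, warping path) of the cheapest monotone alignment.

    Hypothetical helper: the path starts at (0, 0), ends at
    (len(xs) - 1, len(ys) - 1) and advances in xs, in ys, or in both per step.
    """
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    dist = [[math.inf] * m for _ in range(n)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = cost(xs[i], ys[j])
            if i == 0 and j == 0:
                dist[0][0] = c
                continue
            best, arg = math.inf, None
            # Predecessors: diagonal step, step in xs only, step in ys only.
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and dist[pi][pj] < best:
                    best, arg = dist[pi][pj], (pi, pj)
            dist[i][j] = best + c
            back[i][j] = arg
    # Follow back-pointers from the last cell to recover the alignment.
    path: List[Tuple[int, int]] = []
    cell: Optional[Tuple[int, int]] = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return dist[n - 1][m - 1], path
```

For example, `dtw_align([1, 2, 3], [1, 1, 2, 3, 3], lambda a, b: abs(a - b))` aligns the short sequence with its time-stretched copy at total cost 0, matching the repeated values to the same source element. The quadratic table keeps the sketch simple; long media streams would typically need a constrained search band.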
Rank, select and access in grammar-compressed strings
Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mbox{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$); $\mbox{select}_c(S,i)$ (return the position of the $i$th occurrence of $c$ in $S$); and $\mbox{access}(S,i,j)$ (return substring $S[i,j]$). For rank and select we describe data structures of size $O(n\sigma\log N)$ bits that support the two operations in $O(\log N)$ time. We propose another structure that uses $O(n\sigma\log(N/n)(\log N)^{1+\epsilon})$ bits and that supports the two queries in $O(\log N/\log\log N)$ time, where $\epsilon>0$ is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires $O(n\log N)$ bits of space and $O(\log N + m/\log_\sigma N)$ time to extract $m = j-i+1$ consecutive symbols from $S$. Alternatively, we can achieve $O(\log_\tau N + m/\log_\sigma N)$ query time using $O(n\tau\log_\tau(N/n)\log N)$ bits of space. This matches a lower bound stated by Verbin and Yu for strings where $N$ is polynomially related to $n$.
Comment: 16 pages
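To make the three operations concrete, here is a minimal sketch under our own assumptions, not the authors' data structures. The grammar is taken to be a straight-line program in which every rule is either a single terminal character or a pair of symbols, and each nonterminal stores its expansion length plus one occurrence count per alphabet symbol. The class `GrammarString` and its method names are hypothetical. Every query descends from the start symbol, so it runs in time proportional to the grammar's height rather than within the bounds stated in the abstract, which require additional machinery not shown here. Positions are 0-based, `rank(c, i)` counts occurrences in `S[0:i]`, and `select(c, k)` takes a 1-based occurrence number.

```python
from typing import Dict, List, Tuple, Union

# A rule body is either a terminal character or a pair (Y, Z) meaning X -> Y Z.
Rule = Union[str, Tuple[str, str]]


class GrammarString:
    """Hypothetical rank/select/access over an acyclic straight-line program."""

    def __init__(self, rules: Dict[str, Rule], start: str) -> None:
        self.rules = rules
        self.start = start
        self.length: Dict[str, int] = {}             # |exp(X)|
        self.counts: Dict[str, Dict[str, int]] = {}  # symbol counts in exp(X)
        for x in rules:
            self._annotate(x)

    def _annotate(self, x: str) -> None:
        # Memoized bottom-up computation of length and per-symbol counts.
        if x in self.length:
            return
        rhs = self.rules[x]
        if isinstance(rhs, str):
            self.length[x] = 1
            self.counts[x] = {rhs: 1}
            return
        y, z = rhs
        self._annotate(y)
        self._annotate(z)
        self.length[x] = self.length[y] + self.length[z]
        merged = dict(self.counts[y])
        for c, k in self.counts[z].items():
            merged[c] = merged.get(c, 0) + k
        self.counts[x] = merged

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in S[0:i], for 0 <= i <= N."""
        x, result = self.start, 0
        while i > 0:
            rhs = self.rules[x]
            if isinstance(rhs, str):
                # At a leaf, i == 1 and the range covers this one character.
                return result + (1 if rhs == c else 0)
            y, z = rhs
            if i <= self.length[y]:
                x = y
            else:
                # The whole left child lies before position i.
                result += self.counts[y].get(c, 0)
                i -= self.length[y]
                x = z
        return result

    def select(self, c: str, k: int) -> int:
        """0-based position of the k-th occurrence (k >= 1) of c in S."""
        if not 1 <= k <= self.counts[self.start].get(c, 0):
            raise ValueError("S has fewer than k occurrences of c")
        x, pos = self.start, 0
        while True:
            rhs = self.rules[x]
            if isinstance(rhs, str):
                return pos
            y, z = rhs
            in_left = self.counts[y].get(c, 0)
            if k <= in_left:
                x = y
            else:
                k -= in_left
                pos += self.length[y]
                x = z

    def access(self, i: int, j: int) -> str:
        """Substring S[i..j], 0-based and inclusive, for 0 <= i <= j < N."""
        out: List[str] = []
        self._extract(self.start, i, j + 1, out)
        return "".join(out)

    def _extract(self, x: str, lo: int, hi: int, out: List[str]) -> None:
        # Append exp(x)[lo:hi], visiting only children that overlap the range.
        if lo >= hi:
            return
        rhs = self.rules[x]
        if isinstance(rhs, str):
            out.append(rhs)
            return
        y, z = rhs
        ly = self.length[y]
        if lo < ly:
            self._extract(y, lo, min(hi, ly), out)
        if hi > ly:
            self._extract(z, max(lo - ly, 0), hi - ly, out)
```

With rules `A -> a`, `B -> b`, `X -> A B`, `Y -> X X`, `Z -> Y A` and start symbol `Z`, the grammar derives `ababa`; `rank("a", 5)` is 3, `select("b", 2)` is 3, and `access(1, 3)` returns `bab`. Storing one count per symbol per rule is what makes rank and select a single root-to-leaf walk.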
Compressed indexing data structures for biological sequences
Ph.D. (Doctor of Philosophy)
Indexes and Computation over Compressed Structured Data (Dagstuhl Seminar 13232)
This report documents the program and the outcomes of Dagstuhl Seminar 13232 "Indexes and Computation over Compressed Structured Data".
Parallel Construction of Wavelet Trees on Multicore Architectures
The wavelet tree has become a very useful data structure to efficiently represent and query large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of wavelet tree construction by taking advantage of today's ubiquitous multicore machines.
Our first algorithm constructs all the levels of the wavelet tree in parallel in $O(n)$ time and $O(n\lg\sigma + \sigma\lg n)$ bits of working space, where $n$ is the size of the input sequence and $\sigma$ is the size of the alphabet. Our second algorithm constructs the wavelet tree in a domain-decomposition fashion, using our first algorithm in each segment, reaching $O(\lg n)$ time and $O(n\lg\sigma + p\sigma\lg n/\lg\sigma)$ bits of extra space, where $p$ is the number of available cores. Both algorithms are practical and report good speedup for large real datasets.
Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
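One way to see why the levels of a wavelet tree can be built in parallel: the bitmap of level $\ell$ (all nodes of that level, concatenated left to right) depends only on the input, because stably bucketing the symbols by their top $\ell$ bits reproduces the order in which they reach that level's nodes. The sketch below is written under that assumption and is not the authors' implementation. The functions `level_bitmap` and `build_levels_parallel` are hypothetical, symbols are assumed to be integers in $[0, \sigma)$, and the tree is the balanced one over $\lceil\lg\sigma\rceil$-bit codes. It stores plain Python lists, so it makes no attempt at the compact working space discussed in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def level_bitmap(seq: Sequence[int], level: int, bits: int) -> List[int]:
    """Bitmap of one wavelet-tree level, computed without the other levels."""
    shift = bits - level
    # Stable bucket sort by the top `level` bits: bucket b is the b-th node
    # of this level from the left, and symbols keep their input order.
    buckets: List[List[int]] = [[] for _ in range(1 << level)]
    for s in seq:
        buckets[s >> shift].append(s)
    bit = bits - 1 - level  # the bit that routes a symbol left (0) or right (1)
    return [(s >> bit) & 1 for bucket in buckets for s in bucket]


def build_levels_parallel(seq: Sequence[int], sigma: int, workers: int = 4) -> List[List[int]]:
    """Build every level concurrently; symbols must lie in range(sigma)."""
    bits = max(1, (sigma - 1).bit_length())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in level order regardless of completion order.
        return list(pool.map(lambda lv: level_bitmap(seq, lv, bits), range(bits)))
```

For `[3, 1, 2, 0, 3]` with $\sigma = 4$, both levels come out as `[1, 0, 1, 0, 1]`: the root splits on the high bit, and the second level lists the low bits of `1, 0` (left child) followed by `3, 2, 3` (right child). Threads are used only to show that the levels are independent; in CPython the global interpreter lock prevents real speedup for this pure-Python code, so a practical version would use native threads or processes.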