81,286 research outputs found
DRSP : Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and join of time series data streams has gained a lot of
relevance in today's world that has large streaming data. This process finds
wide scale application in the areas of location tracking, sensor networks,
object positioning and monitoring to name a few. However, as the size of the
data stream increases, the cost involved to retain all the data in order to aid
the process of similarity matching also increases. We develop a novel framework
to addresses the following objectives. Firstly, Dimension reduction is
performed in the preprocessing stage, where large stream data is segmented and
reduced into a compact representation such that it retains all the crucial
information by a technique called Multi-level Segment Means (MSM). This reduces
the space complexity associated with the storage of large time-series data
streams. Secondly, it incorporates effective Similarity Matching technique to
analyze if the new data objects are symmetric to the existing data stream. And
finally, the Pruning Technique that filters out the pseudo data object pairs
and join only the relevant pairs. The computational cost for MSM is O(l*ni) and
the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction
Factor. We have performed exhaustive experimental trials to show that the
proposed framework is both efficient and competent in comparison with earlier
works.Comment: 20 pages,8 figures, 6 Table
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
A quick search method for audio signals based on a piecewise linear representation of feature trajectories
This paper presents a new method for a quick similarity-based search through
long unlabeled audio streams to detect and locate audio clips provided by
users. The method involves feature-dimension reduction based on a piecewise
linear representation of a sequential feature trajectory extracted from a long
audio stream. Two techniques enable us to obtain a piecewise linear
representation: the dynamic segmentation of feature trajectories and the
segment-based Karhunen-L\'{o}eve (KL) transform. The proposed search method
guarantees the same search results as the search method without the proposed
feature-dimension reduction method in principle. Experiment results indicate
significant improvements in search speed. For example the proposed method
reduced the total search time to approximately 1/12 that of previous methods
and detected queries in approximately 0.3 seconds from a 200-hour audio
database.Comment: 20 pages, to appear in IEEE Transactions on Audio, Speech and
Language Processin
Thermo-visual feature fusion for object tracking using multiple spatiogram trackers
In this paper, we propose a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. This is achieved without the exponential increase in storage and processing that other multimodal tracking approaches suffer from. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features. We derive a mean-shift type algorithm for the framework that allows efficient object tracking with very low computational overhead. We especially target the fusion of thermal infrared and visible spectrum features as the most useful features for automated surveillance applications. Results are shown on multimodal video sequences clearly illustrating the benefits of combining multiple features using our framework
Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models
Many complex multi-target prediction problems that concern large target
spaces are characterised by a need for efficient prediction strategies that
avoid the computation of predictions for all targets explicitly. Examples of
such problems emerge in several subfields of machine learning, such as
collaborative filtering, multi-label classification, dyadic prediction and
biological network inference. In this article we analyse efficient and exact
algorithms for computing the top- predictions in the above problem settings,
using a general class of models that we refer to as separable linear relational
models. We show how to use those inference algorithms, which are modifications
of well-known information retrieval methods, in a variety of machine learning
settings. Furthermore, we study the possibility of scoring items incompletely,
while still retaining an exact top-K retrieval. Experimental results in several
application domains reveal that the so-called threshold algorithm is very
scalable, performing often many orders of magnitude more efficiently than the
naive approach
Unsupervised Spoken Term Detection with Spoken Queries by Multi-level Acoustic Patterns with Varying Model Granularity
This paper presents a new approach for unsupervised Spoken Term Detection
with spoken queries using multiple sets of acoustic patterns automatically
discovered from the target corpus. The different pattern HMM
configurations(number of states per model, number of distinct models, number of
Gaussians per state)form a three-dimensional model granularity space. Different
sets of acoustic patterns automatically discovered on different points properly
distributed over this three-dimensional space are complementary to one another,
thus can jointly capture the characteristics of the spoken terms. By
representing the spoken content and spoken query as sequences of acoustic
patterns, a series of approaches for matching the pattern index sequences while
considering the signal variations are developed. In this way, not only the
on-line computation load can be reduced, but the signal distributions caused by
different speakers and acoustic conditions can be reasonably taken care of. The
results indicate that this approach significantly outperformed the unsupervised
feature-based DTW baseline by 16.16\% in mean average precision on the TIMIT
corpus.Comment: Accepted by ICASSP 201
- …