3,850 research outputs found
Efficient estimation of AUC in a sliding window
In many applications, monitoring area under the ROC curve (AUC) in a sliding
window over a data stream is a natural way of detecting changes in the system.
The drawback is that computing AUC in a sliding window is expensive, especially
if the window size is large and the data flow is significant.
In this paper we propose a scheme for maintaining an approximate AUC in a
sliding window of length . More specifically, we propose an algorithm that,
given , estimates AUC within , and can maintain this
estimate in time, per update, as the window slides.
This provides a speed-up over the exact computation of AUC, which requires
time, per update. The speed-up becomes more significant as the size of
the window increases. Our estimate is based on grouping the data points
together, and using these groups to calculate AUC. The grouping is designed
carefully such that () the groups are small enough, so that the error stays
small, () the number of groups is small, so that enumerating them is not
expensive, and () the definition is flexible enough so that we can
maintain the groups efficiently.
Our experimental evaluation demonstrates that the average approximation error
in practice is much smaller than the approximation guarantee ,
and that we can achieve significant speed-ups with only a modest sacrifice in
accuracy
Temporally-aware algorithms for the classification of anuran sounds
Several authors have shown that the sounds of anurans can be used as an indicator of
climate change. Hence, the recording, storage and further processing of a huge
number of anuran sounds, distributed over time and space, are required in order to
obtain this indicator. Furthermore, it is desirable to have algorithms and tools for
the automatic classification of the different classes of sounds. In this paper, six
classification methods are proposed, all based on the data-mining domain, which
strive to take advantage of the temporal character of the sounds. The definition and
comparison of these classification methods is undertaken using several approaches.
The main conclusions of this paper are that: (i) the sliding window method attained
the best results in the experiments presented, and even outperformed the hidden
Markov models usually employed in similar applications; (ii) noteworthy overall
classification performance has been obtained, which is an especially striking result
considering that the sounds analysed were affected by a highly noisy background;
(iii) the instance selection for the determination of the sounds in the training dataset
offers better results than cross-validation techniques; and (iv) the temporally-aware
classifiers have revealed that they can obtain better performance than their nontemporally-aware
counterparts.Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain): excellence eSAPIENS number TIC 570
Object Proposals for Text Extraction in the Wild
Object Proposals is a recent computer vision technique receiving increasing
interest from the research community. Its main objective is to generate a
relatively small set of bounding box proposals that are most likely to contain
objects of interest. The use of Object Proposals techniques in the scene text
understanding field is innovative. Motivated by the success of powerful while
expensive techniques to recognize words in a holistic way, Object Proposals
techniques emerge as an alternative to the traditional text detectors.
In this paper we study to what extent the existing generic Object Proposals
methods may be useful for scene text understanding. Also, we propose a new
Object Proposals algorithm that is specifically designed for text and compare
it with other generic methods in the state of the art. Experiments show that
our proposal is superior in its ability of producing good quality word
proposals in an efficient way. The source code of our method is made publicly
available.Comment: 13th International Conference on Document Analysis and Recognition
(ICDAR 2015
DeepBox: Learning Objectness with Convolutional Networks
Existing object proposal approaches use primarily bottom-up cues to rank
proposals, while we believe that objectness is in fact a high level construct.
We argue for a data-driven, semantic approach for ranking object proposals. Our
framework, which we call DeepBox, uses convolutional neural networks (CNNs) to
rerank proposals from a bottom-up method. We use a novel four-layer CNN
architecture that is as good as much larger networks on the task of evaluating
objectness while being much faster. We show that DeepBox significantly improves
over the bottom-up ranking, achieving the same recall with 500 proposals as
achieved by bottom-up methods with 2000. This improvement generalizes to
categories the CNN has never seen before and leads to a 4.5-point gain in
detection mAP. Our implementation achieves this performance while running at
260 ms per image.Comment: ICCV 2015 Camera-ready versio
Learning to track for spatio-temporal action localization
We propose an effective approach for spatio-temporal action localization in
realistic videos. The approach first detects proposals at the frame-level and
scores them with a combination of static and motion CNN features. It then
tracks high-scoring proposals throughout the video using a
tracking-by-detection approach. Our tracker relies simultaneously on
instance-level and class-level detectors. The tracks are scored using a
spatio-temporal motion histogram, a descriptor at the track level, in
combination with the CNN features. Finally, we perform temporal localization of
the action using a sliding-window approach at the track level. We present
experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB
and UCF-101 action localization datasets, where our approach outperforms the
state of the art with a margin of 15%, 7% and 12% respectively in mAP
Spectral Sequence Motif Discovery
Sequence discovery tools play a central role in several fields of
computational biology. In the framework of Transcription Factor binding
studies, motif finding algorithms of increasingly high performance are required
to process the big datasets produced by new high-throughput sequencing
technologies. Most existing algorithms are computationally demanding and often
cannot support the large size of new experimental data. We present a new motif
discovery algorithm that is built on a recent machine learning technique,
referred to as Method of Moments. Based on spectral decompositions, this method
is robust under model misspecification and is not prone to locally optimal
solutions. We obtain an algorithm that is extremely fast and designed for the
analysis of big sequencing data. In a few minutes, we can process datasets of
hundreds of thousand sequences and extract motif profiles that match those
computed by various state-of-the-art algorithms.Comment: 20 pages, 3 figures, 1 tabl
- …