Balancing clusters to reduce response time variability in large scale image search
Many algorithms for approximate nearest neighbor search in high-dimensional
spaces partition the data into clusters. At query time, in order to avoid
exhaustive search, an index selects the few (or a single) clusters nearest to
the query point. Clusters are often produced by the well-known k-means
approach since it has several desirable properties. On the downside, it tends
to produce clusters having quite different cardinalities. Imbalanced clusters
negatively impact both the variance and the expectation of query response
times. This paper proposes to modify k-means centroids to produce clusters
with more comparable sizes without sacrificing the desirable properties.
Experiments with a large scale collection of image descriptors show that our
algorithm significantly reduces the variance of response times without
seriously impacting the search quality.
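The effect of imbalance on response time can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a query lands in a cluster with probability proportional to that cluster's size and that exactly one cluster is scanned exhaustively. The function name `scan_cost_stats` is hypothetical.

```python
def scan_cost_stats(sizes):
    """Return (mean, variance) of the number of vectors scanned per query.

    Assumption (not from the paper): a query falls into cluster i with
    probability n_i / N, and only that single cluster is scanned.
    """
    total = sum(sizes)
    if total == 0:
        raise ValueError("clusters must contain at least one vector")
    # E[cost] = sum_i (n_i / N) * n_i
    mean = sum(n * n for n in sizes) / total
    # E[cost^2] = sum_i (n_i / N) * n_i^2
    second_moment = sum(n ** 3 for n in sizes) / total
    return mean, second_moment - mean * mean
```

Under these assumptions, 100 vectors split into four equal clusters give a mean of 25 scanned vectors with zero variance, whereas sizes 70, 10, 10, 10 give a mean of 52 and a nonzero variance.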
Classification of MODIS Time Series with Dense Bag-of-Temporal-SIFT-Words: Application to Cropland Mapping in the Brazilian Amazon
Mapping croplands is a challenging problem in the context of climate change and evolving agricultural calendars. Classification based on MODIS vegetation index time series is performed in order to map crop types in the Brazilian state of Mato Grosso. We used the recently developed Dense Bag-of-Temporal-SIFT-Words algorithm, which is able to capture temporal locality of the data. It allows the accurate detection of around 70% of the agricultural areas. It leads to better classification rates than a baseline algorithm, discriminating more accurately between classes with similar profiles.
Searching in one billion vectors: re-rank with source coding
Recent indexing techniques inspired by source coding have been shown
successful to index billions of high-dimensional vectors in memory. In this
paper, we propose an approach that re-ranks the neighbor hypotheses obtained by
these compressed-domain indexing methods. In contrast to the usual
post-verification scheme, which performs exact distance calculation on the
short-list of hypotheses, the estimated distances are refined based on short
quantization codes, to avoid reading the full vectors from disk. We have
released a new public dataset of one billion 128-dimensional vectors and
proposed an experimental setup to evaluate high dimensional indexing algorithms
on a realistic scale. Experiments show that our method accurately and
efficiently re-ranks the neighbor hypotheses using little memory compared to
the full vectors representation. Comment: International Conference on Acoustics, Speech and Signal Processing,
Prague, Czech Republic (2011)
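The idea of refining estimated distances with short codes, instead of reading full vectors, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: it uses two small plain codebooks (a coarse one and one for residuals) where a real system would use compressed-domain quantizers, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def encode(v, coarse, refine):
    """Encode v as (coarse index, index of the quantized residual)."""
    i = nearest(coarse, v)
    residual = [x - c for x, c in zip(v, coarse[i])]
    return i, nearest(refine, residual)

def decode(code, coarse, refine):
    """Refined reconstruction: coarse centroid plus residual centroid."""
    i, j = code
    return [c + r for c, r in zip(coarse[i], refine[j])]

def rerank(query, shortlist, codes, coarse, refine):
    """Sort shortlisted ids by distance to their refined reconstruction.

    Only the short codes are read; the original vectors are never accessed.
    """
    return sorted(
        shortlist,
        key=lambda idx: sq_dist(query, decode(codes[idx], coarse, refine)),
    )
```

In this sketch, two database vectors that share the same coarse centroid are indistinguishable before refinement; the residual code is what lets the re-ranking step order them.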
Alarm-Based Prescriptive Process Monitoring
Predictive process monitoring is concerned with the analysis of events
produced during the execution of a process in order to predict the future state
of ongoing cases thereof. Existing techniques in this field are able to
predict, at each step of a case, the likelihood that the case will end up in an
undesired outcome. These techniques, however, do not take into account what
process workers may do with the generated predictions in order to decrease the
likelihood of undesired outcomes. This paper proposes a framework for
prescriptive process monitoring, which extends predictive process monitoring
approaches with the concepts of alarms, interventions, compensations, and
mitigation effects. The framework incorporates a parameterized cost model to
assess the cost-benefit tradeoffs of applying prescriptive process monitoring
in a given setting. The paper also outlines an approach to optimize the
generation of alarms given a dataset and a set of cost model parameters. The
proposed approach is empirically evaluated using a range of real-life event
logs.
Match-And-Deform: Time Series Domain Adaptation through Optimal Transport and Temporal Alignment
While large volumes of unlabeled data are usually available, associated
labels are often scarce. The unsupervised domain adaptation problem aims at
exploiting labels from a source domain to classify data from a related, yet
different, target domain. When time series are at stake, new difficulties arise
as temporal shifts may appear in addition to the standard feature distribution
shift. In this paper, we introduce the Match-And-Deform (MAD) approach that
aims at finding correspondences between the source and target time series while
allowing temporal distortions. The associated optimization problem
simultaneously aligns the series thanks to an optimal transport loss and the
time stamps through dynamic time warping. When embedded into a deep neural
network, MAD helps learning new representations of time series that both align
the domains and maximize the discriminative power of the network. Empirical
studies on benchmark datasets and remote sensing data demonstrate that MAD
makes meaningful sample-to-sample pairing and time shift estimation, reaching
similar or better classification performance than state-of-the-art deep time
series domain adaptation strategies.
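For readers unfamiliar with the temporal alignment component, the following is a minimal sketch of classic dynamic time warping between two univariate series. It is not the MAD method itself, which couples the alignment with an optimal transport loss; the function name `dtw_cost` and the squared-difference local cost are assumptions made for illustration.

```python
def dtw_cost(a, b):
    """Cumulative cost of the best DTW alignment of two non-empty series.

    Assumption: the local cost is the squared difference of the values.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # acc[i][j] holds the cost of the best alignment of a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessors.
            acc[i][j] = step + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m]
```

A series and a time-stretched copy of it, such as `[0, 1, 2]` and `[0, 1, 1, 2]`, align at zero cost, which is the kind of temporal distortion the abstract refers to.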
Time-Sensitive Topic Models for Action Recognition in Videos
In this paper, we postulate that temporal information is important for action recognition in videos. Keeping temporal information, videos are represented as word×time documents. We propose to use time-sensitive probabilistic topic models and we extend them for the context of supervised learning. Our time-sensitive approach is compared to both PLSA and Bag-of-Words. Our approach is shown to both capture semantics from data and yield classification performance comparable to other methods, outperforming them when the amount of training data is low.
T-Patterns Revisited: Mining for Temporal Patterns in Sensor Data
The trend to use large numbers of simple sensors as opposed to a few complex sensors to monitor places and systems creates a need for temporal pattern mining algorithms to work on such data. The methods that try to discover re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem, and extend the T-Pattern algorithm, which was previously applied for detection of sequential patterns in behavioural sciences. The temporal complexity of the T-pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm to find patterns in temporal data. We test our algorithm on a recent database collected with passive infrared sensors with millions of events.
A hybrid approach to classification with shapelets
Shapelets are phase-independent subseries that can be used to discriminate between time series. Shapelets have proved to be very effective primitives for time series classification. The two most prominent shapelet-based classification algorithms are the shapelet transform (ST) and learned shapelets (LS). One significant difference between these approaches is that ST is data driven, whereas LS searches the entire shapelet space through stochastic gradient descent. The weakness of the former is that full enumeration of possible shapelets is very time consuming. The problem with the latter is that it is very dependent on the initialisation of the shapelets. We propose hybridising the two approaches through a pipeline that includes a time-constrained, data-driven shapelet search, the result of which is then passed to a neural network architecture of learned shapelets for tuning. The tuned shapelets are extracted and formed into a transform, which is then classified with a rotation forest. We show that this hybrid approach is significantly better than either approach in isolation, and that the resulting classifier is not significantly worse than a full shapelet search.
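The phase independence mentioned above comes from how a shapelet is compared to a series: the distance is the minimum over all sliding windows. The following is a minimal sketch, assuming plain Euclidean distance without the normalisation that practical implementations often apply; the function name `shapelet_distance` is hypothetical.

```python
import math

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window.

    Assumption: raw values are compared; no z-normalisation is applied.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    # Slide the shapelet over every window of length m and keep the best fit.
    best = min(
        sum((series[start + k] - shapelet[k]) ** 2 for k in range(m))
        for start in range(len(series) - m + 1)
    )
    return math.sqrt(best)
```

A shapelet transform in this sense would represent each series by its vector of distances to a set of shapelets, which a standard classifier can then consume.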