Knowledge Extraction in Video Through the Interaction Analysis of Activities
Video is a massive source of data containing complex interactions between moving objects. Extracting knowledge from this type of information requires video analytics systems that uncover statistical relationships between activities and learn the correspondence between content and labels. However, these remain open research problems, and their complexity grows when multiple actors perform activities simultaneously, when videos contain noise, and when streaming scenarios are considered. The techniques introduced in this dissertation provide a basis for analyzing video. The primary contributions of this research are new algorithms for the efficient search of activities in video, scene understanding based on interactions between activities, and the prediction of labels for new scenes.
Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Time is a critical element for understanding natural processes (e.g., earthquakes and weather) and human-made artifacts (e.g., stock markets and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. Extracting non-trivial features (e.g., patterns, correlations, and trends) from time series is a critical step in devising effective time-series mining methods for real-world problems and has been the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL departs from classic approaches in the time-series literature, where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that, once we construct the data-to-data similarity matrix, most time-series mining tasks can be solved trivially. To overcome the scalability issues of approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations that reduce the data-to-data matrix to a data-to-landmark-points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods that facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform state-of-the-art methods in both efficiency and accuracy on major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL emerges as a new primitive for highly accurate, yet scalable, time-series analysis.
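The central idea behind GRAIL is to replace the n x n data-to-data similarity matrix with an n x k data-to-landmark matrix. The Python sketch below illustrates that idea under stated assumptions. Similarity is measured as the maximum normalized cross-correlation over all shifts, which is the quantity behind shape-based distance (SBD). Landmarks are passed in directly instead of being found by clustering. Any further step GRAIL applies to turn these similarities into its final learned representations is left out. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def z_normalize(x: Sequence[float]) -> List[float]:
    """Zero mean, unit variance; a constant series maps to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0 else [(v - mu) / sd for v in x]


def max_ncc(x: Sequence[float], y: Sequence[float]) -> float:
    """Max coefficient-normalized cross-correlation over all shifts of y.

    Assumes len(x) == len(y). The result lies in [-1, 1], and 1 - max_ncc is
    the shape-based distance (SBD). Naive O(n^2) loop for clarity.
    """
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 0.0
    best = -1.0
    for s in range(-(n - 1), n):
        # Pair x[i] with y[i - s] wherever both indices are valid.
        cc = sum(x[i] * y[i - s] for i in range(max(0, s), min(n, n + s)))
        best = max(best, cc / norm)
    return best


def landmark_representation(series, landmarks):
    """Return an n x k matrix of similarities to k landmarks instead of n x n."""
    zl = [z_normalize(l) for l in landmarks]
    return [[max_ncc(z_normalize(s), l) for l in zl] for s in series]


if __name__ == "__main__":
    data = [[0, 1, 3, 1, 0, 0], [0, 0, 1, 3, 1, 0], [3, 2, 1, 0, 1, 2]]
    # Hypothetical landmark choice; GRAIL derives its landmarks via clustering.
    landmarks = [data[0], data[2]]
    for row in landmark_representation(data, landmarks):
        print([round(v, 3) for v in row])
```

Because the second series is a shifted copy of the first landmark, its similarity to that landmark stays high (about 0.9) even though the two series are misaligned point by point. This is the shift invariance the representation is meant to preserve.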
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline, where one operation provides input, in the form of feature vectors, to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for the early detection of devastating diseases. In both applications, combining features with time-series-derived features had a greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the domain-specific findings our work revealed, will motivate new studies across different scientific and industrial settings.
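The dissertation says it tracks how each feature evolves over time periods and turns that history into inputs for forecasting, but it does not give the exact features. The Python sketch below shows one plausible way to summarize a feature's per-period history (for example, a concept's yearly mention count) as a few time-series features: the last value, the mean, the least-squares trend over all periods and over a recent window, and the latest change. The feature set and the `window` parameter are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def trend_slope(values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against time steps 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def temporal_features(history: Sequence[float], window: int = 3) -> Dict[str, float]:
    """Summarize one feature's per-period history (hypothetical feature set).

    `window` restricts the second trend estimate to the most recent periods.
    """
    recent = list(history[-window:])
    return {
        "last": float(history[-1]),
        "mean": float(mean(history)),
        "slope_all": trend_slope(history),
        "slope_recent": trend_slope(recent),
        "delta_last": float(history[-1] - history[-2]) if len(history) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical yearly counts for one feature of one entity.
    print(temporal_features([2, 3, 5, 8, 13]))
```

In a pipeline, these values would be concatenated with an entity's other (non-temporal) features before the forecasting step. This mirrors the combination the abstract reports as most informative.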
Efficient and Scalable Techniques for Multivariate Time Series Analysis and Search
Innovation and advances in technology have led to time series data growing at a phenomenal rate in many applications. Query processing and analysis of time series data have been studied, and numerous solutions have been proposed. In this research, we focus on multivariate time series (MTS) and devise techniques for high-dimensional, voluminous MTS data.
The success of such techniques relies on effective dimensionality reduction in a preprocessing step. Feature selection is often used for dimensionality reduction: it identifies a subset of features that captures most of the characteristics of the data. We propose a more effective feature subset selection technique, termed Weighted Scores (WS), based on statistics drawn from the Principal Component Analysis (PCA) of the input MTS data matrix. The technique reduces the dimensionality of the data while retaining and ranking its most influential features. We then consider feature grouping and develop a technique termed FRG (Feature Ranking and Grouping) to improve the effectiveness of our technique in sparse vector frameworks. We also develop a PCA-based MTS representation technique, M2U (Multivariate to Univariate transformation), which transforms an MTS with a large number of variables into a univariate signal before downstream pattern recognition tasks, such as seeking correlations within the set.
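The abstract does not give the exact WS and M2U formulas, so the following is only a sketch under assumed forms. The WS-style score of a variable is the sum, over the top-k principal components, of eigenvalue × |loading|. The M2U-style transform projects each time step onto the first principal component. PCA is computed in plain Python with power iteration and deflation so the block has no dependencies. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def center(X: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each variable's (column's) mean from a T x m MTS."""
    m = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def covariance(Xc: Matrix) -> Matrix:
    """m x m sample covariance of a centered T x m matrix."""
    T, m = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (T - 1) for j in range(m)] for i in range(m)]


def top_components(C: Matrix, k: int, iters: int = 500, seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Top-k (eigenvalue, unit eigenvector) pairs of a symmetric PSD matrix,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    m = len(C)
    A = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(m)) for i in range(m))
        pairs.append((lam, v))
        # Deflation: remove the found component so the next pass finds the next one.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs


def weighted_scores(X: Sequence[Sequence[float]], k: int) -> Tuple[List[int], List[float]]:
    """WS-style ranking (assumed form): sum over top-k PCs of eigenvalue * |loading|."""
    pairs = top_components(covariance(center(X)), k)
    m = len(X[0])
    scores = [sum(lam * abs(v[j]) for lam, v in pairs) for j in range(m)]
    order = sorted(range(m), key=lambda j: scores[j], reverse=True)
    return order, scores


def to_univariate(X: Sequence[Sequence[float]]) -> List[float]:
    """M2U-style transform (assumed form): project each time step onto the first PC.

    The sign of a principal component is arbitrary, so the output may be flipped.
    """
    Xc = center(X)
    _, v = top_components(covariance(Xc), 1)[0]
    return [sum(r[j] * v[j] for j in range(len(v))) for r in Xc]


if __name__ == "__main__":
    # Hypothetical 3-variable MTS: a strong trend, a tiny oscillation, a noisy trend.
    X = [[3.0 * t, 0.1 * (t % 2), t + (t % 3)] for t in range(10)]
    print(weighted_scores(X, k=2)[0])
    print([round(u, 2) for u in to_univariate(X)])
```

On this toy input, the high-variance trend variable ranks first and the near-constant oscillation ranks last. Weighting loadings by eigenvalues means a variable can only score highly if it loads on components that actually explain variance.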
In related research, we study the similarity search problem for MTS and develop a novel correlation-based method for standard MTS, ESTMSS (Efficient and Scalable Technique for MTS Similarity Search). ESTMSS uses randomized dimensionality reduction and a threshold-based correlation computation. The results of our numerous experiments on real benchmark data indicate the effectiveness of our methods.
This technique improves computation time by at least an order of magnitude compared with other techniques and substantially reduces memory requirements, while providing comparable accuracy and precision in large-scale frameworks.
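The abstract names two ingredients of ESTMSS: randomized dimensionality reduction and a threshold-based correlation computation. The Python sketch below combines them in a generic way and is not the dissertation's algorithm. Each T x m series is reduced to T x d with a Gaussian random projection (Johnson-Lindenstrauss style) and flattened. Only candidates whose Pearson correlation with the query reaches `threshold` are kept. The projection scheme, flattening, and all names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def random_projection(m: int, d: int, seed: int = 0) -> Matrix:
    """m x d Gaussian matrix scaled by 1/sqrt(d)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(m)]


def project(mts: Sequence[Sequence[float]], R: Matrix) -> List[float]:
    """Map a T x m series to T x d via R, then flatten row by row."""
    d = len(R[0])
    return [sum(row[i] * R[i][j] for i in range(len(row))) for row in mts for j in range(d)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def similarity_search(query, database, d: int, threshold: float, seed: int = 0) -> List[Tuple[int, float]]:
    """(index, correlation) for entries whose correlation with the query,
    measured in the reduced space, is at least `threshold`; best first."""
    R = random_projection(len(query[0]), d, seed)  # one shared projection
    q = project(query, R)
    hits = []
    for idx, mts in enumerate(database):
        r = pearson(q, project(mts, R))
        if r >= threshold:
            hits.append((idx, r))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

The same projection matrix must be used for the query and every database entry; otherwise the reduced vectors are not comparable.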
A Review on Outlier/Anomaly Detection in Time Series Data
Recent advances in technology have brought major breakthroughs in data collection, enabling large amounts of data to be gathered over time and thus generating time series. Mining these data has become an important task for researchers and practitioners in recent years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive state-of-the-art overview of outlier detection techniques in the context of time series. To this end, it presents a taxonomy based on the main aspects that characterize an outlier detection technique.
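The review's taxonomy itself is not reproduced here. As one concrete example of the kind of technique it classifies, the Python sketch below detects point outliers in a univariate series. Each value is compared with the median of a centered window and flagged when its deviation exceeds `k` robust standard deviations, estimated as 1.4826 × MAD. The window size, `k`, and the function name are illustrative assumptions, not taken from the review.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(x: Sequence[float], window: int = 5, k: float = 3.5) -> List[int]:
    """Indices of point outliers in a univariate series.

    Each point is compared with the median of a centered window around it
    (truncated at the edges) and flagged when |x_i - median| exceeds
    k * 1.4826 * MAD of that window.
    """
    half = window // 2
    flagged = []
    for i, v in enumerate(x):
        w = x[max(0, i - half): i + half + 1]
        med = median(w)
        mad = median(abs(u - med) for u in w)
        scale = 1.4826 * mad
        if scale > 0 and abs(v - med) > k * scale:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    series = [1.0, 1.2, 0.9, 1.1, 10.0, 1.0, 0.95, 1.05, 1.1, 0.9]
    print(mad_outliers(series))
```

Medians and the MAD are used instead of the mean and standard deviation because a single large spike barely moves them, so the spike cannot hide itself by inflating the threshold.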
Data Abstraction for Visualizing Large Time Series
Numeric time series are a class of data consisting of chronologically ordered observations represented by numeric values. Much of the data in domains such as finance, medicine, and science take the form of time series. To cope with increasing dataset sizes, numerous approaches for abstracting large temporal data have been developed in the area of data mining, and many of them have proved useful for time series visualization. However, despite the existence of numerous surveys on time series mining and visualization, there is no comprehensive classification of existing methods based on the needs of visualization designers. We propose a classification framework that defines essential criteria for selecting an abstraction method with an eye to subsequent visualization and support for users' analysis tasks. We show that approaches developed in the data mining field can create representations that are useful for visualizing time series data. We evaluate these methods against the defined criteria and provide a summary table for selecting suitable abstraction methods depending on data properties, the desired form of representation, the behavioral features to be studied, the required accuracy and level of detail, and the need for efficient search and querying. We also indicate directions for possible extensions of the proposed classification framework.
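Piecewise Aggregate Approximation (PAA) is a classic example of the data-mining abstractions discussed above. It replaces a long series with the means of a fixed number of consecutive frames, which can be plotted at a fraction of the original resolution. The Python sketch below is a minimal version that assumes integer frame boundaries when the length is not divisible by the number of segments. It is not necessarily the variant evaluated in the paper.

```python
from typing import List, Sequence


def paa(x: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of `segments` frames.

    Frame i covers x[i*n//segments : (i+1)*n//segments], so frames differ in
    length by at most one element when n is not divisible by `segments`.
    """
    n = len(x)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(x)")
    out = []
    for i in range(segments):
        frame = x[i * n // segments: (i + 1) * n // segments]
        out.append(sum(frame) / len(frame))
    return out


if __name__ == "__main__":
    print(paa([1, 2, 3, 4, 5, 6], 3))
    print(paa(list(range(10)), 3))
```

Because PAA keeps only frame means, it smooths away spikes. A visualization that must preserve extremes would instead keep the minimum and maximum of each frame, which is the kind of trade-off the proposed classification criteria address.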
Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping
The proliferation and ubiquity of temporal data across many disciplines have sparked interest in similarity, classification, and clustering methods specifically designed to handle time series data. A core issue when dealing with time series is determining their pairwise similarity, i.e., the degree to which a given time series resembles another. Traditional distance measures such as the Euclidean distance are not well suited to this task because of the time-dependent nature of the data. Elastic metrics such as dynamic time warping (DTW) offer a promising approach but are limited by their computational complexity, non-differentiability, and sensitivity to noise and outliers. This thesis proposes novel elastic alignment methods that use parametric and diffeomorphic warping transformations to overcome the shortcomings of DTW-based metrics. The proposed method is differentiable and invertible, well suited to deep learning architectures, robust to noise and outliers, computationally efficient, and expressive and flexible enough to capture complex patterns. Furthermore, a closed-form solution was developed for the gradient of these diffeomorphic transformations, which allows an efficient search in the parameter space and leads to better solutions at convergence. Leveraging the benefits of these closed-form diffeomorphic transformations, this thesis proposes a suite of advancements that include: (a) an enhanced temporal transformer network for time series alignment and averaging, (b) a deep-learning-based time series classification model that simultaneously aligns and classifies signals with high accuracy, (c) an incremental time series clustering algorithm that is warping-invariant, scalable, and able to operate under limited computational and time resources, and finally, (d) a normalizing flow model that enhances the flexibility of affine transformations in coupling and autoregressive layers.
Comment: PhD Thesis, defended at the University of Navarra on July 17, 2023. 277 pages, 8 chapters, 1 appendix.
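Two ideas from this abstract are shown in the Python sketch below. The first is classic DTW, whose dynamic program takes O(n·m) time and involves a non-differentiable `min`. The second is a toy parametric diffeomorphic warp of [0, 1], ψ_θ(t) = (e^{θt} − 1)/(e^θ − 1), which is smooth, strictly increasing, invertible, fixes both endpoints, and is differentiable in θ. This one-parameter family is chosen only for illustration and is not the transformation family used in the thesis. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic DTW distance with squared-difference local cost.

    Fills an (n+1) x (m+1) table, so time is O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # The min over three predecessors is what makes DTW non-differentiable.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


def warp_grid(theta: float, n: int) -> List[float]:
    """Evaluate psi_theta(t) = (e^(theta t) - 1) / (e^theta - 1) on n uniform points.

    theta = 0 is the identity (the limit of the formula). Requires n >= 2.
    """
    ts = [i / (n - 1) for i in range(n)]
    if abs(theta) < 1e-12:
        return ts
    return [(math.exp(theta * t) - 1) / (math.exp(theta) - 1) for t in ts]


def apply_warp(x: Sequence[float], theta: float) -> List[float]:
    """Resample x at the warped times psi_theta(t) using linear interpolation."""
    n = len(x)
    out = []
    for p in warp_grid(theta, n):
        pos = p * (n - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out


if __name__ == "__main__":
    s = [0.0, 1.0, 4.0, 2.0, 0.5, 3.0, 1.0, 0.0]
    w = apply_warp(s, 1.5)
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, w)))
    print(round(dtw(s, w), 3), round(euclid, 3))
```

The diagonal path is one of the paths DTW minimizes over, so DTW never exceeds the Euclidean distance between equal-length series. A parametric warp like ψ_θ can instead be fit by gradient descent on θ, which is the property the thesis exploits with its closed-form gradients.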
DeepVATS: Deep Visual Analytics for time series
The field of Deep Visual Analytics (DVA) has recently arisen from the idea of developing Visual Interactive Systems supported by deep learning, in order to provide them with large-scale data processing capabilities and to unify their implementation across different data and domains. In this paper, we present DeepVATS, an open-source tool that brings the field of DVA into time series data. DeepVATS trains, in a self-supervised way, a masked time series autoencoder that reconstructs patches of a time series, and projects the knowledge contained in the embeddings of that model onto an interactive plot, from which time series patterns and anomalies emerge and can be easily spotted. The tool includes a back-end for the data-processing pipeline and model training, as well as a front-end with an interactive user interface. We report results that validate the utility of DeepVATS, based on experiments on both synthetic and real datasets. The code is publicly available at https://github.com/vrodriguezf/deepvats
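The self-supervised task described above can be made concrete with the input side of a masked time-series autoencoder. The Python sketch below cuts a series into fixed-length patches, hides a random fraction of them, and computes the reconstruction error only on the hidden patches. It does not use DeepVATS code and makes no claim about its API. The patch length, stride, mask ratio, using `None` as a stand-in for a learned mask token, and all function names are assumptions. The model itself and the 2D projection of its embeddings are out of scope.

```python
import random
from typing import List, Optional, Sequence, Tuple


def make_patches(x: Sequence[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split a series into fixed-length (possibly overlapping) patches; drops a ragged tail."""
    return [list(x[s:s + patch_len]) for s in range(0, len(x) - patch_len + 1, stride)]


def mask_patches(patches: List[List[float]], mask_ratio: float,
                 seed: int = 0) -> Tuple[List[Optional[List[float]]], List[int]]:
    """Hide a random subset of patches (replaced by None here).

    Returns the masked model input and the sorted indices of hidden patches.
    """
    rng = random.Random(seed)
    n_mask = round(mask_ratio * len(patches))
    masked = sorted(rng.sample(range(len(patches)), n_mask))
    hidden = set(masked)
    inputs = [None if i in hidden else p for i, p in enumerate(patches)]
    return inputs, masked


def masked_mse(pred: List[List[float]], target: List[List[float]], masked: List[int]) -> float:
    """Mean squared reconstruction error computed only over the hidden patches."""
    errs = [(a - b) ** 2 for i in masked for a, b in zip(pred[i], target[i])]
    return sum(errs) / len(errs) if errs else 0.0
```

Scoring only the hidden patches means the model cannot lower its loss by simply copying visible input, so its embeddings must capture the structure of the series. Those embeddings are what a tool like DeepVATS then projects for interactive exploration.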
Contributions to time series data mining towards the detection of outliers/anomalies
148 p. Recent technological advances have brought great progress in data collection, making it possible to gather large amounts of data over time. These data commonly take the form of time series, in which observations are recorded chronologically and are correlated in time. These temporal dependencies often contain significant and useful information, so in recent years there has been great interest in extracting it. The research area that focuses on this task is called time series data mining. The research community in this area has worked on different tasks, such as classification, forecasting, clustering, and outlier/anomaly detection. Outliers or anomalies are observations that do not follow the expected behavior of a time series. They usually represent unwanted measurements or events of interest, so detecting them is often relevant: they can degrade data quality or reflect phenomena of interest to the analyst. This thesis presents several contributions to the field of time series data mining, more specifically to the detection of outliers or anomalies. These contributions can be divided into two parts. On the one hand, the thesis presents contributions to outlier/anomaly detection in time series. To this end, it provides a review of the techniques in the literature and presents a new anomaly detection technique for univariate time series, based on self-supervised learning, for detecting water leaks.
On the other hand, the thesis also introduces contributions related to the treatment of time series with missing values and demonstrates their applicability to anomaly detection.
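The abstract does not say how the thesis handles missing values. As a simple, common baseline for filling gaps before running an anomaly detector, the Python sketch below uses linear interpolation between the nearest observed neighbors. Leading and trailing gaps are filled with the nearest observed value. This baseline and the function name are assumptions, not the thesis's method.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def interpolate_missing(x: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between observed neighbours.

    Gaps before the first or after the last observation take the nearest
    observed value. Raises ValueError if nothing is observed.
    """
    known = [i for i, v in enumerate(x) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out: List[float] = []
    for i, v in enumerate(x):
        if v is not None:
            out.append(float(v))
        elif i < known[0]:
            out.append(float(x[known[0]]))
        elif i > known[-1]:
            out.append(float(x[known[-1]]))
        else:
            # Nearest observed indices on each side of the gap.
            j = bisect_left(known, i)
            left, right = known[j - 1], known[j]
            frac = (i - left) / (right - left)
            out.append(x[left] + frac * (x[right] - x[left]))
    return out


if __name__ == "__main__":
    print(interpolate_missing([1.0, None, 3.0, None, None, 6.0, None]))
```

Imputation choices matter for anomaly detection: interpolation produces smooth values that can hide a genuine anomaly inside a gap, which is one reason missing values deserve dedicated treatment.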