54 research outputs found
Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm
A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of “warped” time series, which then allows us to create super-efficient nearest “centroid” classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors
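The nearest "centroid" idea in this abstract can be illustrated with a minimal sketch. It assumes one representative series per class is already available; the averaging method of Petitjean et al. that would produce such centroids is not reproduced here. The names `dtw_distance`, `nearest_centroid`, and the toy centroids are hypothetical and chosen only for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with squared local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


def nearest_centroid(query, centroids):
    """Return the label whose centroid is closest to `query` under DTW."""
    return min(centroids, key=lambda label: dtw_distance(query, centroids[label]))


# Hypothetical per-class centroids (assumed given, not computed here).
centroids = {
    "bump": [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
    "flat": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}
# A time-shifted bump: warping lets it align with the "bump" centroid.
shifted_bump = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
predicted = nearest_centroid(shifted_bump, centroids)
```

With one centroid per class, classifying a query costs one DTW computation per class rather than one per training instance, which is the efficiency argument the abstract makes.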
Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping
The proliferation and ubiquity of temporal data across many disciplines have
sparked interest in similarity, classification and clustering methods
specifically designed to handle time series data. A core issue when dealing
with time series is determining their pairwise similarity, i.e., the degree to
which a given time series resembles another. Traditional distance measures such
as the Euclidean are not well-suited due to the time-dependent nature of the
data. Elastic metrics such as dynamic time warping (DTW) offer a promising
approach, but are limited by their computational complexity,
non-differentiability and sensitivity to noise and outliers. This thesis
proposes novel elastic alignment methods that use parametric and diffeomorphic
warping transformations as a means of overcoming the shortcomings of DTW-based
metrics. The proposed method is differentiable and invertible, well-suited for
deep learning architectures, robust to noise and outliers, computationally
efficient, and is expressive and flexible enough to capture complex patterns.
Furthermore, a closed-form solution was developed for the gradient of these
diffeomorphic transformations, which allows an efficient search in the
parameter space, leading to better solutions at convergence. Leveraging the
benefits of these closed-form diffeomorphic transformations, this thesis
proposes a suite of advancements that include: (a) an enhanced temporal
transformer network for time series alignment and averaging, (b) a
deep-learning based time series classification model to simultaneously align
and classify signals with high accuracy, (c) an incremental time series
clustering algorithm that is warping-invariant, scalable and can operate under
limited computational and time resources, and finally, (d) a normalizing flow
model that enhances the flexibility of affine transformations in coupling and
autoregressive layers.
Comment: PhD Thesis, defended at the University of Navarra on July 17, 2023.
277 pages, 8 chapters, 1 appendix
Statistical learning in complex and temporal data: distances, two-sample testing, clustering, classification and Big Data
Official Doctoral Programme in Statistics and Operations Research. 555V01
[Abstract]
This thesis deals with the problem of statistical learning in complex objects, with
emphasis on time series data. The problem is approached by facilitating the introduction
of domain knowledge of the underlying phenomena by means of distances and features.
A distance-based two sample test is proposed, and its performance is studied under
a wide range of scenarios. Distances for time series classification and clustering are
also shown to increase statistical power when applied to two-sample testing. Our
test compares favorably to other methods regarding its flexibility against different
alternatives. A new distance for time series is defined by considering an innovative
way of comparing lagged distributions of the series. This distance inherits the good
empirical performance of existing methods while removing some of their limitations.
A forecast method based on time series features is proposed. The method works
by combining individual standard forecasting algorithms using a weighted average.
These weights come from a learning model fitted on a large training set. A distributed
classification algorithm is proposed, based on comparing, using a distance, the empirical
distribution functions between the dataset that each computing node receives and the
test set
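The forecast combination step described in this abstract can be sketched as a weighted average of individual forecasts. In the thesis the weights come from a model fitted on a large training set; that model is not reproduced here, so the weights below are fixed, hypothetical values, and `combine_forecasts` is an illustrative name.

```python
def combine_forecasts(forecasts, weights):
    """Weighted average of several forecasts over the same horizon.

    `forecasts` is a list of equal-length lists, one per base method.
    `weights` has one non-negative weight per method and is normalized here.
    """
    if len(forecasts) != len(weights):
        raise ValueError("need exactly one weight per forecast")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts)) / total
        for t in range(horizon)
    ]


# Two hypothetical base forecasts over a three-step horizon.
naive = [10.0, 10.0, 10.0]
drift = [11.0, 12.0, 13.0]
# Hypothetical weights; a learned model would output these per series.
weights = [0.25, 0.75]
combined = combine_forecasts([naive, drift], weights)
```

Normalizing inside the function keeps the result a convex combination even if the supplied weights do not sum exactly to one.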
Template estimation for samples of curves and functional calibration estimation via the method of maximum entropy on the mean
One of the main difficulties in functional data analysis is the extraction of a meaningful common pattern that summarizes the information conveyed by all functions in the sample. The problem of finding a meaningful template function that represents this pattern is considered in Chapter 2, assuming that the functional data lie on an intrinsically low-dimensional smooth manifold with an unknown underlying geometric structure embedded in a high-dimensional space. Under this setting, an approximation of the geodesic distance is developed based on a robust version of the Isomap algorithm. This approximation is used to compute the corresponding empirical Fréchet median function, which provides a robust intrinsic estimator of the template. Chapter 3 investigates the asymptotic properties of the quantile normalization method by Bolstad et al. (2003), which is one of the most popular methods to align density curves in microarray data analysis. The properties are proved by considering the method as a particular case of the structural mean curve alignment procedure by Dupuy, Loubes and Maza (2011). However, the method fails in some cases of mixtures, and a new methodology to cope with this issue is proposed via the algorithm developed in Chapter 2. Finally, the problem of calibration estimation for the finite population mean of a survey variable under a functional data framework is studied in Chapter 4. The functional calibration sampling weights of the estimator are obtained by matching the calibration estimation problem with the maximum entropy on the mean (MEM) principle. In particular, the calibration estimation is viewed as an infinite-dimensional linear inverse problem following the structure of the MEM approach. 
A precise theoretical setting is given and the estimation of functional calibration weights assuming, as prior measures, the centered Gaussian and compound Poisson random measures is carried out
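For readers unfamiliar with the quantile normalization method this abstract analyzes, the basic procedure can be sketched as follows. This is a simplified illustration, not the thesis's formulation: it assumes equal-length samples and breaks ties by position rather than averaging them. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(samples):
    """Give every sample the same empirical distribution.

    The shared reference distribution is the mean, across samples, of the
    k-th smallest values. Each value is then replaced by the reference
    value at its own rank within its sample.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]
    normalized = []
    for s in samples:
        # order[k] is the index of the k-th smallest value in s
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(samples)
```

After normalization every sample has the same sorted values, while the rank order within each sample is preserved.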
Deep learning for time series classification
Time series analysis is a field of data science which is interested in
analyzing sequences of numerical values ordered in time. Time series are
particularly interesting because they allow us to visualize and understand the
evolution of a process over time. Their analysis can reveal trends,
relationships and similarities across the data. There exist numerous fields
containing data in the form of time series: health care (electrocardiogram,
blood sugar, etc.), activity recognition, remote sensing, finance (stock market
price), industry (sensors), etc. Time series classification consists of
constructing algorithms dedicated to automatically label time series data. The
sequential aspect of time series data requires the development of algorithms
that are able to harness this temporal property, thus making the existing
off-the-shelf machine learning models for traditional tabular data suboptimal
for solving the underlying task. In this context, deep learning has emerged in
recent years as one of the most effective methods for tackling the supervised
classification task, particularly in the field of computer vision. The main
objective of this thesis was to study and develop deep neural networks
specifically constructed for the classification of time series data. We thus
carried out the first large scale experimental study allowing us to compare the
existing deep methods and to position them compared to other non-deep learning
based state-of-the-art methods. Subsequently, we made numerous contributions in
this area, notably in the context of transfer learning, data augmentation,
ensembling and adversarial attacks. Finally, we have also proposed a novel
architecture, based on the famous Inception network (Google), which ranks among
the most efficient to date.
Comment: PhD thesis
Klasifikasi Penentuan Pengajuan Kartu Kredit Menggunakan K-Nearest Neighbor
A credit card is a payment instrument issued by a bank, made of plastic, that allows the cardholder, or the person named on the card, to pay on credit when purchasing goods or services. The problem banks face when issuing credit cards to customers who have applied is that it is difficult to determine which credit card category suits each customer. This research is expected to help the bank or its analysts assign the right credit card category to each customer. The study applies the K-Nearest Neighbor method to classify prospective customers by credit card category, using customer data from Bank BNI Syariah Surabaya. K-Nearest Neighbor is used to find patterns in the customer data, with the following variables as supporting factors: gender, housing status, status, number of dependants (children), profession, and annual income. The results show an average precision of 92%, recall of 83%, and accuracy of 93%. Thus, the application is effective in helping credit card analysts classify customers so that they receive credit cards that match the criteria.
Fast, Scalable, and Accurate Algorithms for Time-Series Analysis
Time is a critical element for the understanding of natural processes (e.g., earthquakes and weather) or human-made artifacts (e.g., stock market and speech signals). The analysis of time series, the result of sequentially collecting observations of such processes and artifacts, is becoming increasingly prevalent across scientific and industrial applications. The extraction of non-trivial features (e.g., patterns, correlations, and trends) in time series is a critical step for devising effective time-series mining methods for real-world problems and the subject of active research for decades. In this dissertation, we address this fundamental problem by studying and presenting computational methods for efficient unsupervised learning of robust feature representations from time series. Our objective is to (i) simplify and unify the design of scalable and accurate time-series mining algorithms; and (ii) provide a set of readily available tools for effective time-series analysis. We focus on applications operating solely over time-series collections and on applications where the analysis of time series complements the analysis of other types of data, such as text and graphs.
For applications operating solely over time-series collections, we propose a generic computational framework, GRAIL, to learn low-dimensional representations that natively preserve the invariances offered by a given time-series comparison method. GRAIL represents a departure from classic approaches in the time-series literature where representation methods are agnostic to the similarity function used in subsequent learning processes. GRAIL relies on the attractive idea that once we construct the data-to-data similarity matrix most time-series mining tasks can be trivially solved. To overcome scalability issues associated with approaches relying on such matrices, GRAIL exploits time-series clustering to construct a small set of landmark time series and learns representations to reduce the data-to-data matrix to a data-to-landmark points matrix. To demonstrate the effectiveness of GRAIL, we first present domain-independent, highly accurate, and scalable time-series clustering methods to facilitate exploration and summarization of time-series collections. Then, we show that GRAIL representations, when combined with suitable methods, significantly outperform, in terms of efficiency and accuracy, state-of-the-art methods in major time-series mining tasks, such as querying, clustering, classification, sampling, and visualization. Overall, GRAIL rises as a new primitive for highly accurate, yet scalable, time-series analysis.
For applications where the analysis of time series complements the analysis of other types of data, such as text and graphs, we propose generic, simple, and lightweight methodologies to learn features from time-varying measurements. Such applications often organize operations over different types of data in a pipeline such that one operation provides input, in the form of feature vectors, to subsequent operations. To reason about the temporal patterns and trends in the underlying features, we need to (i) track the evolution of features over different time periods; and (ii) transform these time-varying features into actionable knowledge (e.g., forecasting an outcome). To address this challenging problem, we propose principled approaches to model time-varying features and study two large-scale, real-world applications. Specifically, we first study the problem of predicting the impact of scientific concepts through temporal analysis of characteristics extracted from the metadata and full text of scientific articles. Then, we explore the promise of harnessing temporal patterns in behavioral signals extracted from web search engine logs for early detection of devastating diseases. In both applications, combinations of features with time-series relevant features yielded greater impact than any other indicator considered in our analysis. We believe that our simple methodology, along with the interesting domain-specific findings that our work revealed, will motivate new studies across different scientific and industrial settings
Neural radiance fields for heads: towards accurate digital avatars
Digitalizing humans in 3D environments has been a subject of study in computer vision and computer graphics for decades, but it remains an open problem. No current technology can digitalize humans with excellent quality and dynamism in a form that can be used in 3D engines, such as in a virtual reality headset or a mobile phone, at real-time speeds.
In this thesis, we aim to contribute to this problem by exploring how to combine the two most commonly used approaches in recent years: neural radiance fields and parametric 3D meshes. We attempt to design a model capable of creating digital, animatable avatars of human faces at reasonable speeds. Our work focuses mostly on creating a machine learning model capable of generating a facial avatar from a set of images and camera poses, but we also build a pipeline to integrate all steps of obtaining such data, allowing us to demonstrate our method on real-world data. Additionally, we implement a framework to generate synthetic data, in order to control the errors that arise when obtaining real data, such as problems with camera calibration, and to facilitate the development of other human-related projects.
The 8th International Conference on Time Series and Forecasting
The aim of ITISE 2022 is to create a friendly environment that could lead to the establishment or strengthening of scientific collaborations and exchanges among attendees. Therefore, ITISE 2022 is soliciting high-quality original research papers (including significant works-in-progress) on any aspect of time series analysis and forecasting, in order to motivate the generation and use of new knowledge, computational techniques and methods on forecasting in a wide range of fields
An automated algorithmic approach for activity recognition and step detection in the presence of functional compromise
Wearable technology is a potential stepping stone towards personalised healthcare. It provides the opportunity to collect objective physical activity data from the users and could enable clinicians to make more informed decisions and hence provide better treatments. Current physical activity monitors generally work well in healthy populations but can be problematic when used in some patient groups with severely abnormal function.
We studied healthy volunteers to assess how different algorithms might perform for those with normal and simulated-pathological conditions. Participants (n=30) were recruited from the University of Leeds to perform nine predefined activities under normal and simulated-pathological conditions using two MOX accelerometers on wrist and ankle (Maastricht Instruments, NL). Condition classification was performed using a Support Vector Machine algorithm. Activity classification was performed with five different Machine Learning algorithms: Support Vector Machine, k-Nearest Neighbour, Random Forest, Multilayer Perceptron, and Naive Bayes. A step count algorithm was developed based on a pattern recognition approach, using two main techniques, Dynamic Time Warping and Dynamic Time Warping-Barycentre Averaging. Finally, a synthetic acceleration signal representing walking activities was generated, both because access to patient data was limited and to refine synthetic data generation in this field. Three dynamic coupled equations were used to represent the morphology of the desired signal.
Wrist and ankle locations performed similarly and the wrist location was used for further analysis. Both condition and activity classification algorithms achieved good performance metrics, i.e. the volunteer was correctly classified into the right condition and the activities performed were correctly recognised. Additionally, the novel step count algorithm achieved more accurate results for both conditions in comparison to existing algorithms from the literature. Finally, the signal generation approach seems promising since the normal condition synthetic signals closely matched their associated original signals.
Algorithms developed for a specific group or even person with functional pathology, using techniques such as Dynamic Time Warping-Barycentre Averaging, produce better results than traditional algorithms trained on data from a different group