54 research outputs found

    Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm

    A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of “warped” time series, which then allows us to create super-efficient nearest “centroid” classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors.
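    A minimal sketch of the ingredients described above, assuming univariate series stored as numpy arrays: a plain dynamic-programming DTW with path backtracking, a single barycenter-style averaging pass in the spirit of Petitjean et al.'s DBA, and a nearest "centroid" rule that assigns the label of the DTW-closest class template. Function names and the single-iteration simplification are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dtw(a, b):
    # dynamic-programming DTW; returns the distance and the warping path
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return np.sqrt(cost[n, m]), path[::-1]

def dba_update(average, series_list):
    # one barycenter-averaging pass: align every series to the current average,
    # then replace each point of the average by the mean of the points aligned to it
    buckets = [[] for _ in average]
    for s in series_list:
        _, path = dtw(average, s)
        for i, j in path:
            buckets[i].append(s[j])
    return np.array([np.mean(b) if b else average[k] for k, b in enumerate(buckets)])

def nearest_centroid_predict(x, centroids):
    # label a query series by its DTW-nearest class template
    return min(centroids, key=lambda label: dtw(x, centroids[label])[0])
```

    In practice each class template would be initialised (for example with the class medoid) and refined by several `dba_update` passes offline, so that classification at test time costs a single DTW comparison per class.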

    Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping

    The proliferation and ubiquity of temporal data across many disciplines have sparked interest in similarity, classification and clustering methods specifically designed to handle time series data. A core issue when dealing with time series is determining their pairwise similarity, i.e., the degree to which a given time series resembles another. Traditional distance measures such as the Euclidean distance are not well-suited due to the time-dependent nature of the data. Elastic metrics such as dynamic time warping (DTW) offer a promising approach, but are limited by their computational complexity, non-differentiability and sensitivity to noise and outliers. This thesis proposes novel elastic alignment methods that use parametric and diffeomorphic warping transformations as a means of overcoming the shortcomings of DTW-based metrics. The proposed method is differentiable and invertible, well-suited for deep learning architectures, robust to noise and outliers, computationally efficient, and expressive and flexible enough to capture complex patterns. Furthermore, a closed-form solution was developed for the gradient of these diffeomorphic transformations, which allows an efficient search in the parameter space, leading to better solutions at convergence. Leveraging the benefits of these closed-form diffeomorphic transformations, this thesis proposes a suite of advancements that include: (a) an enhanced temporal transformer network for time series alignment and averaging, (b) a deep-learning based time series classification model to simultaneously align and classify signals with high accuracy, (c) an incremental time series clustering algorithm that is warping-invariant, scalable and can operate under limited computational and time resources, and finally, (d) a normalizing flow model that enhances the flexibility of affine transformations in coupling and autoregressive layers. Comment: PhD thesis, defended at the University of Navarra on July 17, 2023; 277 pages, 8 chapters, 1 appendix.
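    The thesis works with closed-form diffeomorphic warps; the numpy sketch below only illustrates the general idea of a parametric, invertible time warp applied by resampling. It uses a piecewise-linear monotone map built from softmax weights, which is invertible but not smooth, so it is a stand-in for the diffeomorphic transformations described above; all names and the parameterisation are illustrative assumptions.

```python
import numpy as np

def monotone_warp(t, theta):
    # parametric, invertible warp of [0, 1]: the cumulative sum of softmax
    # weights defines a strictly increasing piecewise-linear map gamma_theta
    w = np.exp(theta) / np.exp(theta).sum()
    knots = np.concatenate([[0.0], np.cumsum(w)])
    grid = np.linspace(0.0, 1.0, len(knots))
    return np.interp(t, grid, knots)

def warp_series(x, theta):
    # resample the series along the warped time axis, i.e. x(gamma_theta(t))
    t = np.linspace(0.0, 1.0, len(x))
    return np.interp(monotone_warp(t, theta), t, x)

# with theta = 0 the warp is the identity and the series is unchanged
x = np.sin(np.linspace(0.0, 6.28, 100))
assert np.allclose(warp_series(x, np.zeros(8)), x)
```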

    Statistical learning in complex and temporal data: distances, two-sample testing, clustering, classification and Big Data

    Official doctoral programme in Statistics and Operations Research (555V01). This thesis deals with the problem of statistical learning in complex objects, with emphasis on time series data. The problem is approached by facilitating the introduction of domain knowledge of the underlying phenomena by means of distances and features. A distance-based two-sample test is proposed, and its performance is studied under a wide range of scenarios. Distances for time series classification and clustering are also shown to increase statistical power when applied to two-sample testing. Our test compares favorably to other methods thanks to its flexibility against different alternatives. A new distance for time series is defined by considering an innovative way of comparing the lagged distributions of the series. This distance inherits the good empirical performance of existing methods while removing some of their limitations. A forecasting method based on time series features is proposed. The method combines individual standard forecasting algorithms using a weighted average, whose weights come from a learning model fitted on a large training set. Finally, a distributed classification algorithm is proposed, based on comparing, via a distance, the empirical distribution functions of the common test set and of the data that each computing node receives.
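    As an illustration of the distance-based two-sample testing theme above, here is a generic permutation test built around an energy-style statistic; the statistic, the helper names and the permutation scheme are illustrative assumptions rather than the exact test proposed in the thesis, and any time series distance (DTW, a lagged-distribution distance, etc.) can be supplied as `dist`.

```python
import numpy as np

def mean_cross_distance(a, b, dist):
    # average pairwise distance between the elements of two samples
    return np.mean([[dist(x, y) for y in b] for x in a])

def energy_statistic(a, b, dist):
    # energy-style statistic: large when the two samples are distributed differently
    return (2.0 * mean_cross_distance(a, b, dist)
            - mean_cross_distance(a, a, dist)
            - mean_cross_distance(b, b, dist))

def permutation_test(a, b, dist, n_perm=500, seed=0):
    # p-value from the permutation null distribution of the statistic
    rng = np.random.default_rng(seed)
    observed = energy_statistic(a, b, dist)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        pa = [pooled[i] for i in perm[:len(a)]]
        pb = [pooled[i] for i in perm[len(a):]]
        if energy_statistic(pa, pb, dist) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```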

    Template estimation for samples of curves and functional calibration estimation via the method of maximum entropy on the mean

    One of the main difficulties in functional data analysis is the extraction of a meaningful common pattern that summarizes the information conveyed by all functions in the sample. The problem of finding a template function that represents this pattern is considered in Chapter 2, assuming that the functional data lie on, or sufficiently close to, an intrinsically low-dimensional smooth manifold with an unknown underlying geometric structure embedded in a high-dimensional space. Under this setting, an approximation of the geodesic distance is developed based on a robust version of the Isomap algorithm. This approximation is used to compute the corresponding empirical Fréchet median function, which provides a robust intrinsic estimator of the template. Chapter 3 investigates the asymptotic properties of the quantile normalization method of Bolstad et al. (2003), one of the most popular methods for aligning density curves in microarray data analysis. The properties are proved by considering the method as a particular case of the structural mean curve alignment procedure of Dupuy, Loubes and Maza (2011). However, the method fails in some cases of mixtures, and a new methodology to cope with this issue is proposed via the algorithm developed in Chapter 2. Finally, the problem of calibration estimation for the finite population mean of a survey variable under a functional data framework is studied in Chapter 4. The functional calibration sampling weights of the estimator are obtained by matching the calibration estimation problem with the maximum entropy on the mean (MEM) principle. In particular, calibration estimation is viewed as an infinite-dimensional linear inverse problem following the structure of the MEM approach. A precise theoretical setting is given, and the estimation of functional calibration weights assuming, as prior measures, the centered Gaussian and compound Poisson random measures is carried out.
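    The template estimation idea in Chapter 2 combines an Isomap-style geodesic approximation with an empirical Fréchet median. A rough sketch of that pipeline, assuming curves discretised on a common grid and using the in-sample minimiser (a medoid) as the empirical median, might look like this; the plain k-nearest-neighbour graph below stands in for the robust Isomap variant developed in the thesis, and all names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

def isomap_geodesics(curves, k=5):
    # Isomap-style approximation: keep each curve's k nearest neighbours
    # (Euclidean distance between discretised curves) and take shortest
    # paths on the resulting graph (zero entries mark non-edges for scipy)
    d = squareform(pdist(curves))
    graph = np.zeros_like(d)
    for i in range(len(d)):
        nn = np.argsort(d[i])[1:k + 1]
        graph[i, nn] = d[i, nn]
    graph = np.maximum(graph, graph.T)  # symmetrise the neighbourhood graph
    return shortest_path(graph, method="D", directed=False)

def frechet_median_curve(curves, k=5):
    # empirical (in-sample) Frechet median: the curve with the smallest total
    # geodesic distance to the others, used as a robust template estimate
    curves = np.asarray(curves)
    geo = isomap_geodesics(curves, k)
    return curves[int(np.argmin(geo.sum(axis=1)))]
```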

    Deep learning for time series classification

    Time series analysis is a field of data science concerned with analyzing sequences of numerical values ordered in time. Time series are particularly interesting because they allow us to visualize and understand the evolution of a process over time. Their analysis can reveal trends, relationships and similarities across the data. Numerous fields contain data in the form of time series: health care (electrocardiograms, blood sugar, etc.), activity recognition, remote sensing, finance (stock market prices), industry (sensors), etc. Time series classification consists of constructing algorithms dedicated to automatically labeling time series data. The sequential aspect of time series data requires the development of algorithms that are able to harness this temporal property, thus making the existing off-the-shelf machine learning models for traditional tabular data suboptimal for solving the underlying task. In this context, deep learning has emerged in recent years as one of the most effective methods for tackling the supervised classification task, particularly in the field of computer vision. The main objective of this thesis was to study and develop deep neural networks specifically constructed for the classification of time series data. We thus carried out the first large-scale experimental study, allowing us to compare the existing deep methods and to position them against other, non-deep-learning state-of-the-art methods. Subsequently, we made numerous contributions in this area, notably in the context of transfer learning, data augmentation, ensembling and adversarial attacks. Finally, we also proposed a novel architecture, based on the famous Inception network (Google), which ranks among the most efficient to date. Comment: PhD thesis.
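    For intuition, an Inception-style block for time series, of the general kind on which the proposed architecture builds, might look like the PyTorch sketch below: parallel 1-D convolutions with different kernel sizes plus a pooling branch, concatenated along the channel dimension. Kernel sizes, filter counts and class names are assumptions for illustration and do not reproduce the thesis's architecture.

```python
import torch
import torch.nn as nn

class InceptionModule1d(nn.Module):
    # Inception-style block for time series: parallel 1-D convolutions with
    # different receptive fields plus a max-pool branch, concatenated channel-wise
    def __init__(self, in_channels, n_filters=32, kernel_sizes=(9, 19, 39)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, n_filters, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        ])
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(in_channels, n_filters, kernel_size=1, bias=False),
        )
        self.bn = nn.BatchNorm1d(n_filters * (len(kernel_sizes) + 1))
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        outs = [branch(x) for branch in self.branches] + [self.pool_branch(x)]
        return self.act(self.bn(torch.cat(outs, dim=1)))

# a classifier is typically a stack of such blocks followed by global average
# pooling over time and a linear layer mapping to the class logits
features = InceptionModule1d(in_channels=1)(torch.randn(8, 1, 128))  # -> (8, 128, 128)
```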

    Klasifikasi Penentuan Pengajuan Kartu Kredit Menggunakan K-Nearest Neighbor

    A credit card is a payment instrument issued by a bank, made of plastic, that allows the cardholder to pay for goods or services on credit. A difficulty banks face when issuing credit cards is determining which card category is appropriate for each customer who applies. This research aims to make it easier for the bank, or its credit analysts, to assign the right credit card category to each customer. The study applies the K-Nearest Neighbor method to classify prospective customers applying for a credit card into the appropriate category, using customer data from Bank BNI Syariah Surabaya. The K-Nearest Neighbor method is used to find patterns in the customer data, with supporting variables such as gender, home-ownership status, marital status, number of dependants (children), occupation, and annual income. The results show an average precision of 92%, recall of 83%, and accuracy of 93%. The application is therefore effective in helping credit card analysts classify customers so that they receive credit cards matching the appropriate criteria.
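    A minimal sketch of the K-Nearest Neighbor step described above, assuming the supporting variables (gender, home-ownership status, marital status, number of dependants, occupation, annual income) have already been encoded as numeric feature vectors; the function name and the choice of k are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    # classify one applicant by majority vote among the k most similar
    # training customers (Euclidean distance on the encoded features)
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```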

    Neural radiance fields for heads: towards accurate digital avatars

    Digitalizing humans in 3D environments has been a subject of study in computer vision and computer graphics for decades, but it remains an open problem. No current technology can digitalize humans with excellent quality and dynamism in a form that can be used in 3D engines, such as a virtual reality headset or a mobile phone, at real-time speeds. In this thesis, we aim to contribute to this problem by exploring how to combine the two most commonly used approaches of recent years: neural radiance fields and parametric 3D meshes. We attempt to design a model capable of creating digital, animatable avatars of human faces at reasonable speeds. Our work focuses mostly on creating a machine learning model capable of generating a facial avatar from a set of images and camera poses, but we also build a pipeline that integrates all the steps of obtaining such data, allowing us to demonstrate our method on real-world data. Additionally, we implement a framework for generating synthetic data, in order to control the errors that can arise when capturing real data, such as problems with camera calibration, and to facilitate the development of other human-related projects.

    The 8th International Conference on Time Series and Forecasting

    The aim of ITISE 2022 is to create a friendly environment that could lead to the establishment or strengthening of scientific collaborations and exchanges among attendees. Therefore, ITISE 2022 is soliciting high-quality original research papers (including significant works-in-progress) on any aspect of time series analysis and forecasting, in order to motivate the generation and use of new knowledge, computational techniques and methods for forecasting in a wide range of fields.

    An automated algorithmic approach for activity recognition and step detection in the presence of functional compromise

    Wearable technology is a potential stepping stone towards personalised healthcare. It provides the opportunity to collect objective physical activity data from users and could enable clinicians to make more informed decisions and hence provide better treatments. Current physical activity monitors generally work well in healthy populations but can be problematic when used in some patient groups with severely abnormal function. We studied healthy volunteers to assess how different algorithms might perform for those with normal and simulated-pathological conditions. Participants (n=30) were recruited from the University of Leeds to perform nine predefined activities under normal and simulated-pathological conditions using two MOX accelerometers on the wrist and ankle (Maastricht Instruments, NL). Condition classification was performed using a Support Vector Machine algorithm. Activity classification was performed with five different Machine Learning algorithms: Support Vector Machine, k-Nearest Neighbour, Random Forest, Multilayer Perceptron, and Naive Bayes. A step count algorithm was developed based on a pattern-recognition approach, using two main techniques: Dynamic Time Warping and Dynamic Time Warping-Barycentre Averaging. Finally, synthetic acceleration signals representing walking activities were generated, both because access to patient data was limited and to refine synthetic data generation in this field. Three coupled dynamic equations were used to represent the morphology of the desired signal. The wrist and ankle locations performed similarly, and the wrist location was used for further analysis. Both the condition and activity classification algorithms achieved good performance metrics, i.e., volunteers were correctly assigned to the right condition and the activities performed were correctly recognised. Additionally, the novel step count algorithm achieved more accurate results for both conditions in comparison to existing algorithms from the literature. Finally, the signal generation approach seems promising, since the normal-condition synthetic signals matched their associated original signals closely. Algorithms developed for a specific group, or even an individual, with functional pathology, using techniques such as Dynamic Time Warping-Barycentre Averaging, produce better results than traditional algorithms trained on data from a different group.
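    A simplified sketch of the template-matching step counter described above: windows of the acceleration signal are compared by DTW against a step template (obtained in this work with Dynamic Time Warping-Barycentre Averaging), and windows whose distance falls below a threshold are counted as steps. The window stride, the thresholding and the names are illustrative assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    # standard dynamic-programming DTW between two 1-D signals
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return np.sqrt(cost[n, m])

def count_steps(signal, template, threshold, stride=None):
    # slide the (DBA-averaged) step template over the signal and count windows
    # that are DTW-close enough to the template to be accepted as steps
    w = len(template)
    stride = stride or max(1, w // 2)
    steps = 0
    for start in range(0, len(signal) - w + 1, stride):
        if dtw_distance(signal[start:start + w], template) < threshold:
            steps += 1
    return steps
```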