
    Sintel: A Machine Learning Framework to Extract Insights from Signals

    The detection of anomalies in time series data is a critical task with many monitoring applications. Existing systems often fail to encompass an end-to-end detection process, to facilitate comparative analysis of various anomaly detection methods, or to incorporate human knowledge to refine output. This precludes current methods from being used in real-world settings by practitioners who are not ML experts. In this paper, we introduce Sintel, a machine learning framework for end-to-end time series tasks such as anomaly detection. The framework uses state-of-the-art approaches to support all steps of the anomaly detection process. Sintel logs the entire anomaly detection journey, providing detailed documentation of anomalies over time. It enables users to analyze signals, compare methods, and investigate anomalies through an interactive visualization tool, where they can annotate, modify, create, and remove events. Using these annotations, the framework leverages human knowledge to improve the anomaly detection pipeline. We demonstrate the usability, efficiency, and effectiveness of Sintel through a series of experiments on three public time series datasets, as well as one real-world use case involving spacecraft experts performing anomaly analysis tasks. Sintel's framework, code, and datasets are open-sourced at https://github.com/sintel-dev/. Comment: This work is accepted by ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD 2022)

    In-Network Outlier Detection in Wireless Sensor Networks

    To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy usage, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance using simulation with real sensor data streams. Our results demonstrate that our approach is accurate and imposes a reasonable communication load and level of power consumption. Comment: Extended version of a paper appearing in the Int'l Conference on Distributed Computing Systems 200

    Towards Learning Discrete Representations via Self-Supervision for Wearables-Based Human Activity Recognition

    Human activity recognition (HAR) in wearable computing is typically based on direct processing of sensor data. Sensor readings are translated into representations, either derived through dedicated preprocessing or integrated into end-to-end learning. Independent of their origin, for the vast majority of contemporary HAR, those representations are typically continuous in nature. That has not always been the case. In the early days of HAR, discretization approaches were explored - primarily motivated by the desire to minimize computational requirements, but also with a view to applications beyond mere recognition, such as activity discovery, fingerprinting, or large-scale search. Those traditional discretization approaches, however, suffer from substantial loss in precision and resolution in the resulting representations, with detrimental effects on downstream tasks. Times have changed, and in this paper we propose a return to discretized representations. We adopt and apply recent advancements in Vector Quantization (VQ) to wearables applications, which enables us to directly learn a mapping between short spans of sensor data and a codebook of vectors, resulting in recognition performance that is generally on par with that of contemporary, continuous counterparts - sometimes surpassing them. Therefore, this work presents a proof of concept demonstrating how effective discrete representations can be derived, enabling applications beyond mere activity classification and also opening up the field to advanced tools for the analysis of symbolic sequences, as they are known, for example, from domains such as natural language processing. Based on an extensive experimental evaluation on a suite of wearables-based benchmark HAR tasks, we demonstrate the potential of our learned discretization scheme and discuss how discretized sensor data analysis can lead to substantial changes in HAR.
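The mapping between short spans of sensor data and a codebook of vectors can be illustrated with a minimal sketch. It only shows the nearest-codebook lookup step. The codebook here is a fixed, hypothetical toy example, whereas the abstract describes a codebook that is learned, and the function names `sliding_windows` and `quantize` are assumptions made for illustration.

```python
from math import dist


def sliding_windows(signal, size, step):
    """Cut a 1-D signal into fixed-length spans (as tuples)."""
    return [tuple(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, step)]


def quantize(windows, codebook):
    """Replace each span by the index of its nearest codebook vector."""
    codes = []
    for window in windows:
        nearest = min(range(len(codebook)),
                      key=lambda k: dist(window, codebook[k]))
        codes.append(nearest)
    return codes


# Hypothetical toy codebook (assumption): 0 = flat low, 1 = rising, 2 = flat high.
codebook = [(0.0, 0.0, 0.0), (0.0, 0.5, 1.0), (1.0, 1.0, 1.0)]
signal = [0.0, 0.1, 0.0, 0.4, 1.0, 1.0, 0.9]

windows = sliding_windows(signal, size=3, step=2)
codes = quantize(windows, codebook)
print(codes)
```

The resulting list of integer codes is a symbolic sequence, which is what makes tools for symbolic sequence analysis applicable to the sensor data.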

    Discovering human activities from binary data in smart homes

    With the rapid development of sensing technology, data mining, and machine learning for human health monitoring, it has become possible to monitor personal motion and vital signs in a manner that minimizes the disruption of an individual's daily routine and to assist individuals who have difficulty living independently at home. A primary difficulty that researchers confront is acquiring an adequate amount of labeled data for model training and validation purposes. Activity discovery therefore handles the problem that activity labels are not available, using approaches based on sequence mining and clustering. In this paper, we introduce an unsupervised method for discovering activities from a network of motion detectors in a smart home setting. First, we present an intra-day clustering algorithm to find frequent sequential patterns within a day. As a second step, we present an inter-day clustering algorithm to find the common frequent patterns between days. Furthermore, we refine the patterns to have more compressed and defined cluster characterizations. Finally, we track the occurrences of various regular routines to monitor the functional health in an individual's patterns and lifestyle. We evaluate our methods on two public data sets captured in real-life settings from two apartments during seven-month and three-month periods.

    Patterns in Motion - From the Detection of Primitives to Steering Animations

    In recent decades, the world of technology has developed rapidly. Illustrative of this trend is the growing number of affordable methods for recording new and bigger data sets. The resulting masses of multivariate and high-dimensional data represent a new challenge for research and industry. This thesis is dedicated to the development of novel methods for processing multivariate time series data, thus meeting this data-science-related challenge. This is done by introducing a range of different methods designed to deal with time series data. The variety of methods reflects the different requirements and the typical stages of data processing, ranging from pre-processing to post-processing and data recycling. Many of the techniques introduced work in a general setting. However, various types of motion recordings of human and animal subjects were chosen as representatives of multivariate time series. The different data modalities include Motion Capture data, accelerations, gyroscopes, electromyography, depth data (Kinect) and animated 3D meshes. It is the goal of this thesis to provide a deeper understanding of working with multivariate time series by taking the example of multivariate motion data. In order to maintain an overview of the matter, the thesis follows a basic general pipeline. This pipeline was developed as a guideline for time series processing and is the first contribution of this work. Each part of the thesis represents one important stage of this pipeline; the stages can be summarized under the topics segmentation, analysis and synthesis. Specific examples of different data modalities, processing requirements and methods to meet those are discussed in the chapters of the respective parts. One important contribution of this thesis is a novel method for temporal segmentation of motion data. It is based on the idea of self-similarities within motion data and is capable of unsupervised segmentation of a range of motion data into distinct activities and motion primitives. The examples concerned with the analysis of multivariate time series reflect the role of data analysis in different interdisciplinary contexts and also the variety of requirements that comes with collaboration with other sciences. These requirements are directly connected to current challenges in data science. Finally, the problem of synthesis of multivariate time series is discussed using a graph-based example and examples related to rigging or steering of meshes. Synthesis is an important stage in data processing because it creates new data from existing data in a controlled way. This makes it possible to exploit existing data sets and to access more condensed data, thus providing feasible alternatives to otherwise time-consuming manual processing.

    Acquisition and distribution of synergistic reactive control skills

    Learning from demonstration is an efficient way to attain a new skill. In the context of autonomous robots, using a demonstration to teach a robot accelerates the robot learning process significantly. It helps to identify feasible solutions as starting points for future exploration or to avoid actions that lead to failure. But the acquisition of pertinent observations is predicated on first segmenting the data into meaningful sequences. These segments form the basis for learning models capable of recognising future actions and reconstructing the motion to control a robot. Furthermore, learning algorithms for generative models are generally not tuned to produce stable trajectories and suffer from parameter redundancy for high-degree-of-freedom robots. This thesis addresses these issues by first investigating algorithms, based on dynamic programming and mixture models, for segmentation sensitivity and recognition accuracy on human motion capture data sets of repetitive and categorical motion classes. A stability analysis of the non-linear dynamical systems derived from the resultant mixture model representations aims to ensure that any trajectories converge to the intended target motion as observed in the demonstrations. Finally, these concepts are extended to humanoid robots by deploying a factor analyser for each mixture model component and coordinating the structure into a low-dimensional representation of the demonstrated trajectories. This representation can be constructed as a correspondence map is learned between the demonstrator and robot for joint space actions. Applying these algorithms for demonstrating movement skills to robots is a further step towards autonomous incremental robot learning.

    SSTS: A syntactic tool for pattern search on time series

    We would like to acknowledge the financial support obtained from North Portugal Regional Operational Programme (NORTE 2020), Portugal 2020 and the European Regional Development Fund (ERDF) from European Union through the project Symbiotic technology for societal efficiency gains: Deus ex Machina (DEM), NORTE-01-0145-FEDER-000026. We would like to acknowledge as well the projects AHA CMUP-ERI/HCI/0046 and INSIDE CMUP-ERI/HCI/051/2013, both financed by Fundação para a Ciência e a Tecnologia (FCT). Nowadays, data scientists are capable of manipulating and extracting complex information from time series data, given the current diversity of tools at their disposal. However, with the plethora of tools that target data exploration and pattern search, it may still take an extensive amount of time to develop methods that correspond to the data scientist's reasoning in order to solve their queries. The development of new methods, tightly related to the reasoning and visual analysis of time series data, is of great relevance to reducing the complexity and improving the productivity of pattern and query search tasks. In this work, we propose a novel tool capable of exploring time series data for pattern and query search tasks in a set of 3 symbolic steps: Pre-Processing, Symbolic Connotation and Search. The framework is called SSTS (Symbolic Search in Time Series) and uses regular expression queries to search for the desired patterns in a symbolic representation of the signal. By adopting a set of symbolic methods, this approach aims to increase the expressiveness available for solving standard pattern and query tasks, enabling the creation of queries more closely related to the reasoning and visual analysis of the signal. We demonstrate the tool's effectiveness by presenting 9 examples with several types of queries on time series. The SSTS queries were compared with standard code developed in Python, in terms of cognitive effort, vocabulary required, code length, volume, interpretation and difficulty metrics based on the Halstead complexity measures. The results demonstrate that this methodology is a valid approach and delivers a new abstraction layer on data analysis of time series.
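The idea of searching a symbolic representation of a signal with regular expressions can be illustrated with a minimal sketch. This is not the SSTS tool or its query syntax. The three-letter alphabet (`r` rising, `f` falling, `z` flat), the tolerance `tol`, and the function names are assumptions chosen for illustration only.

```python
import re


def to_symbols(signal, tol=0.05):
    """Encode each step between consecutive samples as one character:
    'r' (rising), 'f' (falling) or 'z' (flat within +/- tol)."""
    symbols = []
    for prev, cur in zip(signal, signal[1:]):
        delta = cur - prev
        if delta > tol:
            symbols.append("r")
        elif delta < -tol:
            symbols.append("f")
        else:
            symbols.append("z")
    return "".join(symbols)


def find_peaks(signal, tol=0.05):
    """Return (start, end) step indices of each rise-then-fall pattern."""
    text = to_symbols(signal, tol)
    # Query: one or more rising steps followed by one or more falling steps.
    return [(m.start(), m.end()) for m in re.finditer(r"r+f+", text)]


signal = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.6, 0.1]
print(to_symbols(signal))   # zrrffzrf
print(find_peaks(signal))   # [(1, 5), (6, 8)]
```

Symbol `i` describes the step from sample `i` to sample `i + 1`, so a match spanning `(1, 5)` covers samples 1 through 5 of the original signal.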