
    Unsupervised online activity discovery using temporal behaviour assumption

    We present a novel unsupervised approach, UnADevs, for discovering activity clusters corresponding to periodic and stationary activities in streaming sensor data. Such activities usually last for some time, which our method exploits; it includes mechanisms to regulate sensitivity to brief outliers and can discover multiple clusters overlapping in time to better deal with deviations from nominal behaviour. The method was evaluated on two activity datasets containing a large number of activities (14 and 33 respectively) against online agglomerative clustering and DBSCAN. In a multi-criteria evaluation, our approach achieved significantly better performance on the majority of the measures, with the advantages that: (i) it does not require the number of clusters to be specified beforehand (it is open ended); (ii) it is online and can find clusters in real time; (iii) it has constant time complexity; and (iv) it is memory efficient, as it does not keep the data samples in memory. Overall, it discovered 616 of the total 717 activities. Because it discovers clusters of activities in real time, it is well suited to working alongside an active learning system.

    A Review of Subsequence Time Series Clustering

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and background related to subsequence time series clustering. The reviewed literature is divided into three groups: the preproof, interproof, and postproof periods. Moreover, various state-of-the-art approaches to subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.

    Discord Monitoring for Streaming Time-Series

    Many applications generate time-series and analyze them. One of the most important time-series analysis tools is anomaly detection, and discord discovery aims at finding an anomalous subsequence in a time-series. Time-series are essentially dynamic, so monitoring the discord of a streaming time-series is an important problem. This paper addresses this problem and proposes SDM (Streaming Discord Monitoring), an algorithm that efficiently updates the discord of a streaming time-series over a sliding window. We show that SDM is approximation-friendly, i.e., computation can be accelerated by monitoring an approximate discord with a theoretical bound. Our experiments on real datasets demonstrate the efficiency of SDM and its approximate version.
    This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-030-27615-7_6. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms.
    Kato S., Amagata D., Nishio S., et al. Discord Monitoring for Streaming Time-Series. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11706 LNCS, 79 (2019)
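    To make the notion of a discord concrete, the sketch below uses the commonly cited definition: the subsequence whose distance to its nearest non-overlapping neighbour is largest. It is a naive baseline that recomputes the discord from scratch for every sliding window, not the SDM algorithm from the abstract. The function names, the use of plain Euclidean distance without normalization, and the parameters `w` (window size) and `m` (subsequence length) are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(window, m):
    """Return (start_index, nn_distance) of the length-m discord in window.

    The discord is taken to be the subsequence whose nearest
    non-overlapping neighbour is farthest away (assumed definition).
    Returns (-1, -1.0) if no subsequence has a non-overlapping neighbour.
    """
    n = len(window) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nn = math.inf
        for j in range(n):
            if abs(i - j) >= m:  # skip overlapping (trivial) matches
                nn = min(nn, euclidean(window[i:i + m], window[j:j + m]))
        if nn != math.inf and nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist


def monitor(stream, w, m):
    """Yield (time, discord) for each full sliding window of size w.

    Naive recomputation per window; an efficient monitor would update
    the previous result incrementally instead.
    """
    buf = []
    for t, x in enumerate(stream):
        buf.append(x)
        if len(buf) > w:
            buf.pop(0)
        if len(buf) == w:
            yield t, brute_force_discord(buf, m)


# Hypothetical data: a regular 0/1 pattern with one spike at position 10.
series = [0, 1] * 10
series[10] = 5
discord_index, discord_distance = brute_force_discord(series, 4)
window_results = list(monitor(series, 12, 4))
```

    Each window costs time quadratic in its length here, which is the cost that an incremental or approximate monitor would aim to avoid.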

    Complexity and Criticality in financial markets: systemic risk across frequencies and cross sections

    Extreme market events and systemic collapses attract most of the popular attention paid to finance and financial markets. Extreme phenomena and the dynamics of connected/interacting systems have been the subject of financial modeling since early derivatives modeling, exposure risk modeling and portfolio construction. In the present work we discuss how traditional methods have for the most part failed to properly model the interconnected global financial and economic system. This led to systemic risk events and simplistic regulation which does not properly account for its implications. Analogously, we discuss how, from as early as Mandelbrot’s works on financial prices and fat tails, academics, practitioners and regulators alike were warned of fat tails in financial modeling and in particular market making and derivatives pricing. The improper modeling or dismissal of these lies at the centre of financial downturns ranging from LTCM’s collapse to the quant downturn of August 2007. The solution I promote in this thesis is that of complexity and criticality. In line with this we propose two lines of work. The former analyses markets as complex networks and their structure through to practical takeaways including a proof of concept for portfolio construction. The latter instead focuses on extreme events in high frequency markets with results for both tail modeling and systemic events and practical insights from those. Recent events have shown how retail investors and their savings are now heavily involved in financial markets. We hope that our contribution of methods of practical use for proper risk modeling will encourage their adoption by practitioners and regulators with the outcome of a more stable and efficient financial system.

    Lesson Learned from Collecting Quantified Self Information via Mobile and Wearable Devices

    The ubiquity and affordability of mobile and wearable devices have enabled us to continually and digitally record our daily life activities. Consequently, we are seeing the growth of data collection experiments in several scientific disciplines. Although these have yielded promising results, mobile and wearable data collection experiments are often restricted to a specific configuration that has been designed for a unique study goal. These approaches do not address all the real-world challenges of “continuous data collection” systems. As a result, there have been few discussions or reports about the issues faced when “implementing these platforms” in a practical situation. To address this, we have summarized our technical and user-centric findings from three lifelogging and Quantified Self data collection studies, which we conducted in real-world settings, for both smartphones and smartwatches. In addition to (i) privacy and (ii) battery related issues, based on our findings we recommend that further work consider (iii) implementing multivariate reflection of the data; (iv) resolving uncertainty and data loss; and (v) minimizing the manual intervention required from users. These findings provide insights that can be used as a guideline for further Quantified Self or lifelogging studies.

    Harnessing rare category trinity for complex data

    In the era of big data, we are inundated with the sheer volume of data being collected from various domains. In contrast, it is often the rare occurrences that are crucially important to many high-impact domains with diverse data types. For example, in online transaction platforms, the percentage of fraudulent transactions might be small, but the resultant financial loss could be significant; in social networks, a novel topic is often neglected by the majority of users at the initial stage, but it could burst into an emerging trend afterward; in the Sloan Digital Sky Survey, the vast majority of sky images (e.g., known stars, comets, nebulae, etc.) are of no interest to the astronomers, while only 0.001% of the sky images lead to novel scientific discoveries; in worldwide pandemics (e.g., SARS, MERS, COVID-19, etc.), the primary cases might be limited, but the consequences could be catastrophic (e.g., mass mortality and economic recession). Therefore, studying such complex rare categories has profound significance and longstanding impact in many aspects of modern society, from preventing financial fraud to uncovering hot topics and trends, from supporting scientific research to forecasting pandemics and natural disasters. In this thesis, we propose a generic learning mechanism with trinity modules for complex rare category analysis: (M1) Rare Category Characterization - characterizing the rare patterns with a compact representation; (M2) Rare Category Explanation - interpreting the prediction results and providing relevant clues for the end-users; (M3) Rare Category Generation - producing synthetic rare category examples that resemble the real ones. The key philosophy of our mechanism lies in "all for one and one for all" - each module makes unique contributions to the whole mechanism and thus receives support from its companions.
    In particular, M1 serves as the de-novo step to discover rare category patterns on complex data; M2 provides a proper lens for end-users to examine the outputs and understand the learning process; and M3 synthesizes real rare category examples for data augmentation to further improve M1 and M2. To enrich the learning mechanism, we develop principled theorems and solutions to characterize, understand, and synthesize rare categories in complex scenarios, ranging from static rare categories to time-evolving rare categories, from attributed data to graph-structured data, from homogeneous data to heterogeneous data, from low-order connectivity patterns to high-order connectivity patterns, etc. It is worth mentioning that we have also launched one of the first visual analytic systems for dynamic rare category analysis, which integrates our developed techniques and enables users to investigate complex rare categories in practice.