160 research outputs found

    DESQ: Frequent Sequence Mining with Subsequence Constraints

    Full text link
    Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this paper, we show that many subsequence constraints---including and beyond those considered in the literature---can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive "pattern expressions" to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to compressed finite state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms---although more general---are competitive to existing state-of-the-art algorithms.Comment: Long version of the paper accepted at the IEEE ICDM 2016 conferenc

    Mining for Frequent Events in Time Series

    Get PDF
    While much work has been done in mining nominal sequential data much less has been done on mining numeric time series data. This stems primarily from the problems of relating numeric data, which likely contains error or other variations which make directly relating values difficult. To handle this problem, many algorithms first convert data into a sequence of events. In some cases these events are known a priori, but in others they are not. Our work evaluates a set of time series data instances in order to determine likely candidates for unknown underlying events. We use the concept of bounding envelopes to represent the area around a numeric time series in which the unknown noise-free points could exist. We then use an algorithm similar to Apriori to build up sets of envelope intersections. The areas created by these intersections represent common patterns found throughout the data

    多様なポストゲノムデータのためのアラインメントフリーなアルゴリズムの構造

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学教授 今井 浩, 東京大学教授 小林 直樹, 東京大学教授 五十嵐 健夫, 東京大学教授 杉山 将, 東京大学講師 笠原 雅弘University of Tokyo(東京大学

    OPR-Miner: Order-preserving rule mining for time series

    Full text link
    Discovering frequent trends in time series is a critical task in data mining. Recently, order-preserving matching was proposed to find all occurrences of a pattern in a time series, where the pattern is a relative order (regarded as a trend) and an occurrence is a sub-time series whose relative order coincides with the pattern. Inspired by the order-preserving matching, the existing order-preserving pattern (OPP) mining algorithm employs order-preserving matching to calculate the support, which leads to low efficiency. To address this deficiency, this paper proposes an algorithm called efficient frequent OPP miner (EFO-Miner) to find all frequent OPPs. EFO-Miner is composed of four parts: a pattern fusion strategy to generate candidate patterns, a matching process for the results of sub-patterns to calculate the support of super-patterns, a screening strategy to dynamically reduce the size of prefix and suffix arrays, and a pruning strategy to further dynamically prune candidate patterns. Moreover, this paper explores the order-preserving rule (OPR) mining and proposes an algorithm called OPR-Miner to discover strong rules from all frequent OPPs using EFO-Miner. Experimental results verify that OPR-Miner gives better performance than other competitive algorithms. More importantly, clustering and classification experiments further validate that OPR-Miner achieves good performance

    Scalable frequent sequence mining with flexible subsequence constraints

    Get PDF
    We study scalable algorithms for frequent sequence mining under flexible subsequence constraints. Such constraints enable applications to specify concisely which patterns are of interest and which are not. We focus on the bulk synchronous parallel model with one round of communication; this model is suitable for platforms such as MapReduce or Spark. We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. The algorithms differ in what data are communicated and how computation is split up among workers. To the best of our knowledge, D-SEQ and D-CAND are the first scalable algorithms for frequent sequence mining with flexible constraints. We conducted an experimental study on multiple real-world datasets that suggests that our algorithms scale nearly linearly, outperform common baselines, and offer acceptable generalization overhead over existing, less general mining algorithms

    Analyzing very large time series using suffix arrays

    Get PDF

    A schema framework for graph event data

    Get PDF

    IoT-MQTT based denial of service attack modelling and detection

    Get PDF
    Internet of Things (IoT) is poised to transform the quality of life and provide new business opportunities with its wide range of applications. However, the bene_ts of this emerging paradigm are coupled with serious cyber security issues. The lack of strong cyber security measures in protecting IoT systems can result in cyber attacks targeting all the layers of IoT architecture which includes the IoT devices, the IoT communication protocols and the services accessing the IoT data. Various IoT malware such as Mirai, BASHLITE and BrickBot show an already rising IoT device based attacks as well as the usage of infected IoT devices to launch other cyber attacks. However, as sustained IoT deployment and functionality are heavily reliant on the use of e_ective data communication protocols, the attacks on other layers of IoT architecture are anticipated to increase. In the IoT landscape, the publish/- subscribe based Message Queuing Telemetry Transport (MQTT) protocol is widely popular. Hence, cyber security threats against the MQTT protocol are projected to rise at par with its increasing use by IoT manufacturers. In particular, the Internet exposed MQTT brokers are vulnerable to protocolbased Application Layer Denial of Service (DoS) attacks, which have been known to cause wide spread service disruptions in legacy systems. In this thesis, we propose Application Layer based DoS attacks that target the authentication and authorisation mechanism of the the MQTT protocol. In addition, we also propose an MQTT protocol attack detection framework based on machine learning. Through extensive experiments, we demonstrate the impact of authentication and authorisation DoS attacks on three opensource MQTT brokers. Based on the proposed DoS attack scenarios, an IoT-MQTT attack dataset was generated to evaluate the e_ectiveness of the proposed framework to detect these malicious attacks. The DoS attack evaluation results obtained indicate that such attacks can overwhelm the MQTT brokers resources even when legitimate access to it was denied and resources were restricted. The evaluations also indicate that the proposed DoS attack scenarios can signi_cantly increase the MQTT message delay, especially in QoS2 messages causing heavy tail latencies. In addition, the proposed MQTT features showed high attack detection accuracy compared to simply using TCP based features to detect MQTT based attacks. It was also observed that the protocol _eld size and length based features drastically reduced the false positive rates and hence, are suitable for detecting IoT based attacks
    corecore