Discovering unbounded episodes in sequential data
One basic goal in the analysis of time-series data is
to find frequent interesting episodes, i.e., collections
of events occurring frequently together in the input sequence.
Most widely known work decides the interestingness of an episode from a
fixed user-specified window width or interval, which bounds the
subsequent sequential association rules.
In this paper, we present a more intuitive definition that
allows, in turn, interesting episodes to grow during the mining without any
user-specified help. A convenient algorithm to
efficiently discover the proposed unbounded episodes is also implemented.
Experimental results confirm that our approach is useful
and advantageous.
Postprint (published version)
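For context, the fixed-window notion of frequency that this abstract contrasts against can be sketched as follows. This is a minimal illustration of the conventional bounded approach, not the unbounded algorithm proposed in the paper, whose definition the abstract does not give. The names `occurs_in_order` and `window_frequency`, the integer timestamps, and the assumption that events are sorted by time are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (integer timestamp, event type); assumed sorted by time


def occurs_in_order(window: Sequence[str], episode: Sequence[str]) -> bool:
    """Return True if `episode` appears in `window` as an ordered subsequence."""
    it = iter(window)
    # `symbol in it` consumes the iterator, so the order of events is enforced.
    return all(symbol in it for symbol in episode)


def window_frequency(events: List[Event], episode: Sequence[str], width: int) -> float:
    """Fraction of sliding windows of `width` time units that contain the episode."""
    if not events:
        return 0.0
    first, last = events[0][0], events[-1][0]
    # Every window [start, start + width) that overlaps the sequence.
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        window = [e for t, e in events if start <= t < start + width]
        if occurs_in_order(window, episode):
            hits += 1
    return hits / len(starts)


events = [(1, "A"), (2, "B"), (5, "A"), (9, "B")]
narrow = window_frequency(events, ("A", "B"), width=3)
wide = window_frequency(events, ("A", "B"), width=5)
```

The measured frequency of the same episode changes with `width`, which is the dependence on a user-specified parameter that the abstract sets out to avoid.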
Using patterns position distribution for software failure detection
Pattern-based software failure detection has been an important research topic in recent years. In this method, a set of patterns is extracted from program execution traces and represented as features, while their occurrence frequencies are treated as the corresponding feature values. However, this conventional method is limited because it ignores the pattern’s position information, which is important for the classification of program traces. Patterns occurring at different positions of a trace are likely to carry different meanings. In this paper, we present a novel approach that uses the pattern’s position distribution as features to detect software failure. Comparative experiments on both artificial and real datasets show the effectiveness of this method.
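The difference between frequency features and position-distribution features can be illustrated with a small sketch. It assumes, purely for the example, that a pattern is a contiguous run of events and that the position distribution is summarized as a fixed-bin histogram of relative start positions; the abstract does not specify either choice, and the names `pattern_positions` and `position_histogram` are hypothetical.

```python
from typing import List, Sequence


def pattern_positions(trace: Sequence[str], pattern: Sequence[str]) -> List[float]:
    """Relative start positions (0.0 to 1.0) of contiguous occurrences of `pattern`."""
    n, m = len(trace), len(pattern)
    if m == 0 or n < m:
        return []
    last_start = n - m
    positions = []
    for i in range(last_start + 1):
        if list(trace[i:i + m]) == list(pattern):
            # Normalize so that traces of different lengths are comparable.
            positions.append(i / last_start if last_start else 0.0)
    return positions


def position_histogram(trace: Sequence[str], pattern: Sequence[str], bins: int = 4) -> List[int]:
    """Histogram of relative positions; the vector serves as the pattern's features."""
    hist = [0] * bins
    for p in pattern_positions(trace, pattern):
        idx = min(int(p * bins), bins - 1)  # p == 1.0 falls into the last bin
        hist[idx] += 1
    return hist


early = ["a", "b", "c", "c", "c", "c", "c", "c"]
late = ["c", "c", "c", "c", "c", "c", "a", "b"]
early_features = position_histogram(early, ["a", "b"])
late_features = position_histogram(late, ["a", "b"])
```

Both traces contain the pattern exactly once, so a frequency feature cannot tell them apart, while the two histograms differ.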
HoloDetect: Few-Shot Learning for Error Detection
We introduce a few-shot learning framework for error detection. We show that
data augmentation (a form of weak supervision) is key to training high-quality,
ML-based error detection models that require minimal human involvement. Our
framework consists of two parts: (1) an expressive model to learn rich
representations that capture the inherent syntactic and semantic heterogeneity
of errors; and (2) a data augmentation model that, given a small seed of clean
records, uses dataset-specific transformations to automatically generate
additional training data. Our key insight is to learn data augmentation
policies from the noisy input dataset in a weakly supervised manner. We show
that our framework detects errors with an average precision of ~94% and an
average recall of ~93% across a diverse array of datasets that exhibit
different types and amounts of errors. We compare our approach to a
comprehensive collection of error detection methods, ranging from traditional
rule-based methods to ensemble-based and active learning approaches. We show
that data augmentation yields an average improvement of 20 F1 points while it
requires access to 3x fewer labeled examples compared to other ML approaches.
Comment: 18 pages
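The idea of generating additional training data from a small seed of clean records can be sketched as below. This is a toy illustration under stated assumptions, not HoloDetect's implementation: the transformations (`drop_last_char`, `swap_first_two`, `append_x`), the hand-written policy weights, and the `augment` function are all hypothetical, whereas the framework described above learns its augmentation policy from the noisy input dataset.

```python
import random
from typing import Callable, List, Tuple

Transform = Callable[[str], str]


# Hypothetical string transformations standing in for learned ones.
def drop_last_char(s: str) -> str:
    return s[:-1]


def swap_first_two(s: str) -> str:
    return s[1] + s[0] + s[2:] if len(s) > 1 else s


def append_x(s: str) -> str:
    return s + "x"


def augment(
    clean_values: List[str],
    policy: List[Tuple[Transform, float]],
    n_attempts: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Return (value, label) pairs: label 0 for clean, 1 for synthetic errors."""
    rng = random.Random(seed)
    transforms = [t for t, _ in policy]
    weights = [w for _, w in policy]
    examples = [(v, 0) for v in clean_values]
    for _ in range(n_attempts):
        source = rng.choice(clean_values)
        transform = rng.choices(transforms, weights=weights, k=1)[0]
        corrupted = transform(source)
        # Skip no-op transformations so every label-1 example differs from its source.
        if corrupted != source:
            examples.append((corrupted, 1))
    return examples


seed_records = ["Chicago", "Boston", "Madison"]
# Assumed weights; a learned policy would estimate these from the noisy data.
policy = [(drop_last_char, 0.5), (swap_first_two, 0.3), (append_x, 0.2)]
training_data = augment(seed_records, policy, n_attempts=10)
```

The resulting labeled pairs could then be fed to any classifier; the choice of model is outside the scope of this sketch.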
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare
In recent years, time-series mining has become a challenging issue for
researchers. An important application lies in monitoring, which
requires analyzing large sets of time-series to learn usual patterns. Any
deviation from this learned profile is then considered as an unexpected
situation. Moreover, complex applications may involve the temporal study of
several heterogeneous parameters. In this paper, we propose a method for mining
heterogeneous multivariate time-series for learning meaningful patterns. The
proposed approach allows for mixed time-series -- containing both pattern and
non-pattern data -- as well as for imprecise matches, outliers, and stretching
and global translation of pattern instances in time. We present the early results
of our approach in the context of monitoring the health status of a person at
home. The purpose is to build a behavioral profile of a person by analyzing the
time variations of several quantitative or qualitative parameters recorded
through a set of sensors installed in the home.