The Extended Edit Distance Metric
Similarity search is an important problem in information retrieval, and
similarity is measured by a distance. Symbolic representation of time series
has attracted many researchers recently because it reduces the dimensionality
of these high-dimensional data objects. We propose a new distance metric for
symbolic data objects and test it on time series databases in a classification
task. We compare it with other distances that are well known in the literature
for symbolic data objects. We also prove mathematically that our distance is a
metric.
Comment: Technical report
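The abstract does not define the proposed distance, so the sketch below uses the classic Levenshtein edit distance instead. This is one of the well-known distances for symbolic sequences that such work is compared against, not the proposed metric. It shows how a distance on symbol strings (for example, SAX words) plugs into a 1-nearest-neighbour classifier, and how the metric axioms can be spot-checked on a finite sample. A spot check is not a proof. The function names and the toy training strings and labels are hypothetical.

```python
from itertools import product

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (NOT the proposed extended metric):
    minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def nn_classify(query, train, dist=edit_distance):
    """1-NN: return the label of the training string closest to query."""
    _, label = min(train, key=lambda item: dist(query, item[0]))
    return label

def check_metric_axioms(strings, dist=edit_distance) -> bool:
    """Spot-check (not prove) non-negativity, identity, symmetry and the
    triangle inequality on a finite sample of strings."""
    for x, y in product(strings, repeat=2):
        d = dist(x, y)
        if d < 0 or (d == 0) != (x == y) or d != dist(y, x):
            return False
    for x, y, z in product(strings, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z):
            return False
    return True

# Hypothetical training set of (symbolic word, class label) pairs.
train = [("aabbc", "up"), ("abbcc", "up"), ("ccbba", "down"), ("cbbaa", "down")]
print(nn_classify("abbbc", train))  # up
```

Any candidate distance, including a proposed one, can be passed as `dist` to compare classification accuracy on the same data.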
Dynamic PRA: an Overview of New Algorithms to Generate, Analyze and Visualize Data
State-of-the-art probabilistic risk assessment (PRA) methods, namely
Dynamic PRA (DPRA) methodologies, largely employ system
simulator codes to accurately model system dynamics.
Typically, these system simulator codes (e.g., RELAP5)
are coupled with other codes (e.g., ADAPT,
RAVEN) that monitor and control the simulation. The
latter codes, in particular, introduce both deterministic
(e.g., system control logic, operating procedures) and
stochastic (e.g., component failures, variable uncertainties)
elements into the simulation. A typical DPRA analysis is
performed by:
1. Sampling values of a set of parameters from the
uncertainty space of interest
2. Simulating the system behavior for that specific set of
parameter values
3. Analyzing the set of simulation runs
4. Visualizing the correlations between parameter values
and simulation outcomes
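The four steps above can be sketched as a loop. This is a minimal illustration that assumes a toy simulator standing in for a system code such as RELAP5. The `toy_simulator` model, the parameter names, their distributions and the temperature limit are all hypothetical, and step 4 computes correlation coefficients where a real analysis would produce plots.

```python
import random
import statistics

def toy_simulator(failure_time: float, heat_load: float) -> float:
    """Hypothetical stand-in for a system code: the peak temperature rises
    when cooling is lost early and when the heat load is high. Not physical."""
    mission_time = 24.0  # hours, assumed
    uncooled_hours = max(0.0, mission_time - failure_time)
    return 300.0 + heat_load * uncooled_hours

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def run_dpra(n_samples=1000, limit=500.0, seed=0):
    rng = random.Random(seed)
    runs = []
    for _ in range(n_samples):
        # Step 1: sample parameters from assumed uncertainty distributions.
        params = {"failure_time": rng.expovariate(1 / 48.0),  # mean 48 h
                  "heat_load": rng.uniform(5.0, 15.0)}
        # Step 2: simulate the system for this parameter set.
        runs.append((params, toy_simulator(**params)))
    # Step 3: analyze the ensemble, here as the fraction of runs over the limit.
    p_fail = sum(peak > limit for _, peak in runs) / n_samples
    # Step 4: quantify parameter/outcome correlations (a plot would go here).
    peaks = [peak for _, peak in runs]
    corr = {name: pearson([p[name] for p, _ in runs], peaks)
            for name in runs[0][0]}
    return p_fail, corr

p_fail, corr = run_dpra()
print(f"P(peak > limit) ~ {p_fail:.3f}", corr)
```

In this toy model, later pump failures should correlate negatively with peak temperature and higher heat loads positively.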
Step 1 is typically performed either by randomly sampling
from a given distribution (i.e., Monte Carlo) or by taking
such parameter values as user inputs (i.e.,
Dynamic Event Tree…
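The two ways of choosing parameter values in step 1 can be contrasted in a short sketch. It makes a simplifying assumption about the second approach: the user lists discrete values with branch probabilities, every combination is run, and each combination carries the product of its branch probabilities. A real Dynamic Event Tree branches at run time when events occur, which is not modelled here. The function names and example values are hypothetical.

```python
import itertools
import random

def monte_carlo_samples(dists, n, seed=0):
    """Random sampling: each of the n draws picks every parameter from its
    distribution. `dists` maps parameter name -> callable taking an RNG."""
    rng = random.Random(seed)
    return [{name: draw(rng) for name, draw in dists.items()} for _ in range(n)]

def enumerated_samples(choices):
    """Simplified stand-in for user-specified values: `choices` maps
    parameter name -> list of (value, probability). Every combination is
    returned with the product of its branch probabilities."""
    names = list(choices)
    samples = []
    for combo in itertools.product(*(choices[name] for name in names)):
        params = {name: value for name, (value, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        samples.append((params, prob))
    return samples

mc = monte_carlo_samples(
    {"heat_load": lambda rng: rng.uniform(5.0, 15.0),
     "failure_time": lambda rng: rng.expovariate(1 / 48.0)},
    n=5)
det = enumerated_samples(
    {"heat_load": [(5.0, 0.25), (10.0, 0.5), (15.0, 0.25)],
     "failure_time": [(6.0, 0.1), (48.0, 0.9)]})
print(len(mc), len(det))  # 5 6
```

Monte Carlo cost is set by the chosen number of draws. Enumeration grows with the product of the number of values per parameter, but it covers low-probability combinations explicitly.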
One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
Over the last few decades, optimization has developed rapidly.
Bio-inspired optimization algorithms are metaheuristics inspired by nature.
These algorithms have been applied to problems in engineering, economics, and
other domains, as well as in branches of information technology such as
networking and software engineering. Time series data mining is a field of
information technology that has its share of these applications too. In
previous work we showed how bio-inspired algorithms such as genetic algorithms
and differential evolution can be used to find the locations of the breakpoints
used in the symbolic aggregate approximation (SAX) representation of time
series, and in another work we showed how particle swarm optimization, one of
the best-known bio-inspired algorithms, can be used to assign weights to the
different segments of the SAX representation. In this paper we present a new
meta-optimization process, in two different approaches, that produces optimal
locations of the breakpoints in addition to optimal weights of the segments.
Our time series classification experiments show an interesting example of how
overfitting, a frequently encountered problem in data mining that occurs when a
model fits the training set too closely, can interfere with the optimization
process and hide the superior performance of an optimization algorithm.
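To make the two optimization targets concrete, here is a minimal SAX sketch in which the breakpoints and the per-segment weights are explicit parameters an optimizer could search over. The default breakpoints are the standard-normal quartiles commonly used for a 4-letter alphabet. The weighted distance is a hypothetical simplification that scales the absolute gap between symbol indices; it is not the lookup-table distance normally used with SAX, and not the authors' formulation. All function names are illustrative.

```python
import bisect
import statistics

# Standard-normal quartiles: the usual SAX breakpoints for a 4-letter alphabet.
GAUSSIAN_BREAKPOINTS_4 = [-0.6745, 0.0, 0.6745]

def znormalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sd = statistics.fmean(series), statistics.pstdev(series)
    return [(x - mu) / sd for x in series] if sd > 0 else [0.0] * len(series)

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each (near-)equal segment.
    Assumes len(series) >= n_segments so no segment is empty."""
    n = len(series)
    return [statistics.fmean(series[i * n // n_segments:(i + 1) * n // n_segments])
            for i in range(n_segments)]

def sax(series, n_segments, breakpoints=GAUSSIAN_BREAKPOINTS_4):
    """Map each PAA value to a symbol index (0 = lowest region). The sorted
    breakpoints are a parameter, so an optimizer (e.g. a GA or differential
    evolution) could search their locations instead of using Gaussian quantiles."""
    return [bisect.bisect_right(breakpoints, v)
            for v in paa(znormalize(series), n_segments)]

def weighted_symbol_distance(word_a, word_b, weights):
    """Hypothetical weighted distance between two SAX words: each segment's
    symbol-index gap is scaled by a per-segment weight (e.g. tuned by PSO)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, word_a, word_b))

print(sax(list(range(1, 9)), 4))  # [0, 1, 2, 3]
```

Because both the breakpoints and the weights are free parameters, tuning them against training-set accuracy alone invites the overfitting the abstract describes. Scoring candidates on held-out data is the usual safeguard.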