One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
For the last few decades, optimization has been developing at a fast rate.
Bio-inspired optimization algorithms are metaheuristics inspired by nature.
These algorithms have been applied to solve different problems in engineering,
economics, and other domains. Bio-inspired algorithms have also been applied in
different branches of information technology such as networking and software
engineering. Time series data mining is a field of information technology that
has its share of these applications too. In previous works we showed how
bio-inspired algorithms such as genetic algorithms and differential
evolution can be used to find the locations of the breakpoints used in the
symbolic aggregate approximation of time series representation, and in another
work we showed how particle swarm optimization, a well-known bio-inspired
algorithm, can be used to assign weights to the different segments in the
symbolic aggregate approximation representation. In this paper we present, in
two different approaches, a new meta optimization process that produces optimal
locations of the breakpoints in addition to optimal weights of the segments.
The time series classification experiments that we conducted show an
interesting example of how the overfitting phenomenon, a frequently encountered
problem in data mining which happens when the model overfits the training set,
can interfere with the optimization process and hide the superior performance of
an optimization algorithm.
Self-organising symbolic aggregate approximation for real-time fault detection and diagnosis in transient dynamic systems
The development of accurate fault detection and diagnosis (FDD) techniques is an important aspect of monitoring system health, whether it be an industrial machine or human system. In FDD systems where real-time or mobile monitoring is required there is a need to minimise computational overhead whilst maintaining detection and diagnosis accuracy. Symbolic Aggregate Approximation (SAX) is one such method, whereby reduced representations of signals are used to create symbolic representations for similarity search. Data reduction is achieved through application of the Piecewise Aggregate Approximation (PAA) algorithm. However, this can often lead to the loss of key information characteristics, resulting in misclassification of signal types and a high risk of false alarms. This paper proposes a novel methodology based on SAX for generating more accurate symbolic representations, called Self-Organising Symbolic Aggregate Approximation (SOSAX). Data reduction is achieved through the application of an optimised PAA algorithm, Self-Organising Piecewise Aggregate Approximation (SOPAA). The approach is validated through the classification of electrocardiogram (ECG) signals, where it is shown to outperform standard SAX in terms of inter-class separation and intra-class distance of signal types.
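To make the standard SAX pipeline mentioned in this abstract concrete, the following is a minimal sketch of PAA followed by symbol assignment. It is not the SOSAX/SOPAA method from the paper. The function names `paa` and `sax`, the four-symbol alphabet, and the requirement that the series length be divisible by the number of segments are all assumptions made for illustration. The breakpoints are the quartiles of the standard normal distribution, as is conventional for SAX.

```python
import numpy as np

# Quartiles of N(0, 1): they split the real line into 4 equiprobable regions.
BREAKPOINTS_4 = np.array([-0.6745, 0.0, 0.6745])


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    if len(series) % n_segments != 0:
        # Simplifying assumption of this sketch, not a limitation of PAA itself.
        raise ValueError("len(series) must be divisible by n_segments")
    return series.reshape(n_segments, -1).mean(axis=1)


def sax(series, n_segments, breakpoints=BREAKPOINTS_4, alphabet="abcd"):
    """Z-normalise, reduce with PAA, then map each segment mean to a symbol."""
    series = np.asarray(series, dtype=float)
    centred = series - series.mean()
    std = series.std()
    z = centred / std if std > 0 else centred  # constant series stay at zero
    segment_means = paa(z, n_segments)
    # searchsorted returns, for each mean, the index of the region it falls in.
    symbol_indices = np.searchsorted(breakpoints, segment_means)
    return "".join(alphabet[i] for i in symbol_indices)
```

For example, `sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2)` reduces eight samples to two symbols. The averaging inside each segment is where detail can be lost, which is the weakness the abstract attributes to plain PAA.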
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because poor run-time
performance is not such a problem these days, given the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification, focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
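The nearest-neighbour idea described in this abstract can be sketched in a few lines. This is an illustrative brute-force version, not the code from the paper's appendix. The names `euclidean` and `knn_classify`, the default `k=3`, and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, examples, labels, k=3, distance=euclidean):
    """Return the majority label among the k training examples nearest to query."""
    # Brute force: rank every training example by its distance to the query.
    ranked = sorted(zip(examples, labels), key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Passing a different `distance` function is where a time-series similarity measure could be plugged in. The full sort over all examples is the run-time cost that retrieval speed-up techniques aim to avoid.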
Contributions to time series data mining towards the detection of outliers/anomalies
148 p. Recent technological advances have brought great progress in data collection, making it possible to gather large amounts of data over time. These data commonly take the form of time series, in which observations are recorded chronologically and are correlated in time. These temporal dependencies often contain meaningful and useful information, and so in recent years there has been great interest in extracting it. The research area that focuses on this task is called time series data mining. The research community in this area has worked on tasks such as classification, forecasting, clustering, and outlier/anomaly detection. Outliers or anomalies are observations that do not follow the expected behaviour of a time series. They usually represent unwanted measurements or events of interest, so detecting them is often relevant because they can degrade data quality or reflect phenomena of interest to the analyst. This thesis presents several contributions to the field of time series data mining, specifically on outlier or anomaly detection. These contributions can be divided into two parts. On the one hand, the thesis presents contributions to outlier or anomaly detection in time series: it offers a review of the techniques in the literature and presents a new technique, based on self-supervised learning, for anomaly detection in univariate time series, aimed at water leak detection.
On the other hand, the thesis also introduces contributions related to the handling of time series with missing values and demonstrates their applicability to anomaly detection.
Mining typical load profiles in buildings to support energy management in the smart city context
Mining typical load profiles in buildings to drive energy management strategies is a fundamental task to be addressed in a smart city environment. In this work, a general framework on load profiles characterisation in buildings based on the recent scientific literature is proposed. The process relies on the combination of different pattern recognition and classification algorithms in order to provide a robust insight of the energy usage patterns at different levels and at different scales (from single building to stock of buildings). Several implications related to energy profiling in buildings, including tariff design, demand side management and advanced energy diagnosis are discussed. Moreover, a robust methodology to mine typical energy patterns to support advanced energy diagnosis in buildings is introduced by analysing the monitored energy consumption of a cooling/heating mechanical room - …