54 research outputs found

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    Currently, the amount of data produced by information systems is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackled these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. However, in this Thesis, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackled the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. Buoys very commonly stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of the buoys is analysed and exploited to reconstruct the whole missing time series. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists in dividing the time series into different subsequences to achieve different purposes. This segmentation can be done by trying to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions.
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement in the body of the GA to evaluate the procedure automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using a hybrid algorithm (HA), in which a local search (LS) process was included by using a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled with the aim of evaluating the influence of different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed. However, it remains an open challenge given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven coral reef optimisation (CRO) algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves on the state of the art with respect to accuracy and robustness. Also, this problem has been tackled using an improvement of the bare-bones particle swarm optimisation (BBPSO) algorithm, which includes a dynamical update of the cognitive and social components in the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. Also, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, which is tackled in this Thesis.
For that, a multi-objective evolutionary algorithm (MOEA) for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. Prediction in time series is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems, i.e. the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is lower than the number of standard values. Consequently, the prediction of these values cannot be done using standard algorithms without taking into account the imbalance ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring a special treatment too. Preprocessed data coming from sensors situated in different parts of the Valladolid airport are used to build a simple artificial neural network (ANN) model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used for fixing the threshold of peak-over-threshold (POT) approaches. Also, the determination of the best-fitting distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
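
As a rough illustration of segmentation aimed at reducing the number of points, the sketch below scores a candidate set of retained points by the error of the piecewise-linear reconstruction they produce. This is a minimal sketch under stated assumptions: the function name `interpolation_rmse`, the use of linear interpolation and the RMSE criterion are illustrative choices, not the fitness function used in the thesis.

```python
import math


def interpolation_rmse(series, kept_indices):
    """RMSE between a series and its piecewise-linear reconstruction.

    kept_indices: sorted indices of the retained points. It must start at 0
    and end at len(series) - 1 so that every point lies inside a segment.
    (Hypothetical helper for illustration only.)
    """
    if kept_indices[0] != 0 or kept_indices[-1] != len(series) - 1:
        raise ValueError("kept_indices must include the first and last index")

    squared_error = 0.0
    for left, right in zip(kept_indices, kept_indices[1:]):
        span = right - left
        # Retained points are reproduced exactly, so only the points
        # strictly between two cut points contribute a nonzero error.
        for i in range(left, right):
            t = (i - left) / span
            estimate = series[left] + t * (series[right] - series[left])
            squared_error += (series[i] - estimate) ** 2
    return math.sqrt(squared_error / len(series))
```

An optimiser of any kind could then search over `kept_indices` of a fixed size to minimise this value; which optimiser and which error measure are appropriate depends on the application.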

    On the use of evolutionary time series analysis for segmenting paleoclimate data

    Recent studies propose that different dynamical systems, such as climate, ecological and financial systems, among others, present critical transition points referred to as tipping points (TPs). Climate TPs can severely affect millions of lives on Earth, so an active scientific community is working on finding early warning signals. This paper deals with the development of a time series segmentation algorithm for paleoclimate data in order to find segments sharing common statistical patterns. The proposed algorithm uses a clustering-based approach for evaluating the solutions and six statistical features, most of which have been previously considered in the detection of early warning signals in paleoclimate TPs. Due to the limitations of classical statistical methods, we propose the use of a genetic algorithm to automatically segment the series, together with a method to compare the segmentations. The final segments provided by the algorithm are used to construct a prediction model, whose promising results show the importance of segmentation for improving the understanding of a time series.

    Detection of early warning signals in paleoclimate data using a genetic time series segmentation algorithm

    This paper proposes a time series segmentation algorithm combining a clustering technique and a genetic algorithm to automatically find segments sharing common statistical characteristics in paleoclimate time series. The segments are transformed into a six-dimensional space composed of six statistical measures, most of which have been previously considered in the detection of warning signals of critical transitions. Experimental results show that the proposed approach applied to paleoclimate data could effectively analyse Dansgaard–Oeschger (DO) events and uncover commonalities and differences in their statistical and possibly their dynamical characterisation. In particular, warning signals were robustly detected in the GISP2 and NGRIP δ18O ice core data for several DO events (e.g. DO 1, 4, 8 and 12) in the form of an order of magnitude increase in variance, autocorrelation and mean square distance from a linear approximation (i.e. the mean square error). The increase in mean square error, suggesting nonlinear behaviour, has been found to correspond with an increase in variance prior to several DO events for ∼90 % of the algorithm runs for the GISP2 δ18O dataset and for ∼100 % of the algorithm runs for the NGRIP δ18O dataset. The proposed approach applied to well-known dynamical systems and paleoclimate datasets provides a novel visualisation tool in the field of climate time series analysis.
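
To make the segment-to-feature mapping more concrete, the following sketch computes three of the measures named in the abstract (variance, autocorrelation and mean square error from a linear approximation) for each segment of a series. It is only an illustration under assumptions: the abstract does not list all six measures or their exact definitions, so the lag-1 autocorrelation, the least-squares line fitted against the sample index, and the function names are choices made here, not the paper's implementation.

```python
from statistics import fmean, pvariance


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation; returns 0.0 for a constant sequence."""
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    return num / denom


def linear_fit_mse(values):
    """Mean squared error of the least-squares line fitted against the index.

    Requires at least two points so that the slope is defined.
    """
    xs = range(len(values))
    x_mean = fmean(xs)
    y_mean = fmean(values)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return fmean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, values))


def segment_features(series, cut_points):
    """Return (variance, lag-1 autocorrelation, linear-fit MSE) per segment.

    cut_points: sorted interior indices at which a new segment starts.
    """
    bounds = [0, *cut_points, len(series)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        seg = series[start:end]
        features.append(
            (pvariance(seg), lag1_autocorrelation(seg), linear_fit_mse(seg))
        )
    return features
```

Each segment becomes one point in a small feature space, which is the kind of representation a clustering step could then group; a sharp rise in the variance or linear-fit error of successive segments is the sort of pattern the abstract describes as a warning signal.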

    Segmentación de series temporales mediante un algoritmo multiobjetivo evolutivo

    Extraordinary Master's Thesis Award, academic year 2015-2016. Computer Engineering.

    9th International Conference, HAIS 2014, Salamanca, Spain, June 11-13, 2014. Proceedings

    This volume constitutes the proceedings of the 9th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2014, held in Salamanca, Spain, in June 2014. The 61 papers published in this volume were carefully reviewed and selected from 199 submissions. They are organized in topical sessions on HAIS applications; data mining and knowledge discovery; video and image analysis; bio-inspired models and evolutionary computation; learning algorithms; hybrid intelligent systems for data mining and applications; and classification and cluster analysis.

    A mixed distribution to fix the threshold for Peak-Over-Threshold wave height estimation

    Modelling extreme value distributions, such as wave height time series where the higher waves are much less frequent than the lower ones, has been tackled from the point of view of Peak-Over-Threshold (POT) methodologies, where modelling is based on those values higher than a threshold. This threshold is usually predefined by the user, while the rest of the values are ignored. In this paper, we propose a new method to estimate the distribution of the complete time series, including both extreme and regular values. This methodology assumes that extreme value time series can be modelled by a combination of a normal distribution and a uniform one. The resulting theoretical distribution is then used to fix the threshold for the POT methodology. The methodology is tested on nine real-world time series collected in the Gulf of Alaska, Puerto Rico and Gibraltar (Spain), which are provided by the National Data Buoy Center (USA) and Puertos del Estado (Spain). By using the Kolmogorov-Smirnov statistical test, the results confirm that the time series can be modelled with this type of mixed distribution. Based on this, the return values and the confidence intervals for wave height in different periods of time are also calculated.
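
The ingredients mentioned in this abstract can be sketched in a few lines: a distribution combining a normal and a uniform component, a Kolmogorov-Smirnov statistic to check a sample against it, and the selection of the values above a threshold. This is a hedged sketch, not the paper's method: treating the combination as a weighted mixture, the parameter names (`weight`, `mu`, `sigma`, `low`, `high`) and the function names are assumptions, and the way the paper derives the threshold from the fitted distribution is not reproduced here.

```python
from statistics import NormalDist


def mixed_cdf(x, weight, mu, sigma, low, high):
    """CDF of weight * Normal(mu, sigma) + (1 - weight) * Uniform(low, high).

    Assumes the 'combination' is a two-component mixture (an assumption).
    """
    normal_part = NormalDist(mu, sigma).cdf(x)
    uniform_part = min(1.0, max(0.0, (x - low) / (high - low)))
    return weight * normal_part + (1.0 - weight) * uniform_part


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: sup |F_n(x) - F(x)|."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # Compare the theoretical CDF with the empirical CDF just after
        # and just before the jump at x.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def exceedances(sample, threshold):
    """Values strictly above the threshold, the input to POT modelling."""
    return [x for x in sample if x > threshold]
```

For example, `ks_statistic(heights, lambda x: mixed_cdf(x, 0.8, 2.0, 0.5, 0.0, 10.0))` would measure how far a hypothetical list `heights` departs from one such mixture; the numeric parameters there are placeholders, not fitted values.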

    Classification and detection of Critical Transitions: from theory to data

    From population collapses to cell-fate decisions, critical phenomena are abundant in complex real-world systems. Among the modelling theories that address them, the critical transitions framework has gained traction for its aim of determining classes of critical mechanisms and identifying generic indicators to detect them and raise alerts (“early warning signals”). This thesis contributes to this research field by elucidating its relevance within the systems biology landscape, by providing a systematic classification of leading mechanisms for critical transitions, and by assessing the theoretical and empirical performance of early warning signals. The thesis thus bridges general results concerning the critical transitions field – possibly applicable to multidisciplinary contexts – and specific applications in biology and epidemiology, towards the development of sound risk monitoring systems.