192 research outputs found

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    The amount of data produced by information systems is currently increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackle these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. In this Thesis, however, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for later tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackle the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. Buoys very commonly stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of the buoys is analysed and exploited to reconstruct the whole missing time series. In this context, evolutionary artificial neural networks (EANNs) with PUs are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists of dividing the time series into different subsequences to achieve different purposes. This segmentation can be done by trying to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of TPs, whose detection was supported by expert opinions. 
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement in the body of the GA to evaluate the procedure automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using an HA, where an LS process was included by using a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled with the aim of evaluating the influence of different markets in Europe. Different techniques have been proposed for segmenting time series with the aim of reducing the number of points. However, this remains an open challenge given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven CRO algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves the state-of-the-art with respect to accuracy and robustness. This problem has also been tackled using an improvement of the BBPSO algorithm, which includes a dynamical update of the cognitive and social components in the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. In addition, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, which is tackled in this Thesis. 
For that, an MOEA for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. Time series prediction is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems, namely the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is much smaller than the number of standard values. Consequently, these values cannot be predicted using standard algorithms without taking into account the imbalance ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring a special treatment too. Preprocessed data from sensors located in different parts of Valladolid airport are used to build a simple ANN model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used for fixing the threshold of POT approaches. Also, the determination of the best-fitting distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
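
As an illustration of the "approximation quality" objective mentioned in the abstract above, the following minimal sketch scores a candidate segmentation by the squared error made when each segment is replaced by the straight line joining its end points. The function name `segment_error` and the choice of linear interpolation are assumptions made for this example; the abstract does not state which error measure the thesis actually uses.

```python
def segment_error(series, cut_points):
    """Sum of squared errors of a piecewise-linear approximation.

    `cut_points` are indices of the points kept by the segmentation.
    The first and last index are always kept (an assumption of this sketch).
    """
    cuts = sorted(set([0, len(series) - 1] + list(cut_points)))
    error = 0.0
    for start, end in zip(cuts, cuts[1:]):
        y0, y1 = series[start], series[end]
        for i in range(start, end + 1):
            # Value of the straight line joining the two retained points
            approx = y0 + (y1 - y0) * (i - start) / (end - start)
            error += (series[i] - approx) ** 2
    return error


if __name__ == "__main__":
    toy = [0, 1, 2, 1, 0]
    print(segment_error(toy, [2]))  # keeping the peak reproduces the series
    print(segment_error(toy, []))   # keeping only the end points loses the peak
```

An evolutionary or swarm-based search such as those named in the abstract would use a score of this kind as (part of) the fitness of each candidate set of cut points.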

    Biorefinery of agricultural residues by fractionation of their components through hydrothermal and organosolv processes

    The combined production of the most abundant agricultural residues in Spain (viz. cereal straw, sunflower stalks, vine shoots, cotton stalks, olive, orange and peach tree prunings, and horticultural and related residues) amounts to over 50 million tons per year. Agricultural residues can be valorized by converting their components jointly (combustion, pyrolysis, gasification, liquefaction) or separately (fractionation). The most useful method for exploiting such components separately involves isolating cellulose fibres for papermaking purposes. In recent times, this valorization method has led to the development of the biorefining concept. Biorefining involves the fractionation or separation of the different lignocellulosic components of agricultural residues with a view to their integral exploitation rather than the mere use of cellulose fibre to obtain paper products. Biorefining replaces the classical pulping methods based on Kraft, sulphite and soda reagents with a hydrothermal treatment followed by organosolv pulping. The hydrothermal treatment provides a liquid phase containing hemicellulose decomposition products [both oligomers and monomers (glucose, xylose, and arabinose)] and a solid phase rich in cellulose and lignin. By contrast, the organosolv process gives a solid fraction (pulp) and a residual liquid fraction containing lignin and other useful substances for various purposes.

    Tumores óseos metafiso-diafisarios: reconstrucción con aloinjertos intercalares

    From June 1987 to June 1991, the Department of Orthopaedic Surgery and Traumatology of the Clínica Universitaria de Navarra treated 20 patients with malignant bone tumors in the metaphyseal-diaphyseal region of long bones by radical resection and reconstruction with a cryopreserved intercalary bone allograft. Functional results, graded with Mankin's criteria, were excellent or good in 13 patients. Radiologically, allograft incorporation at the metaphysis was excellent in all cases. Results at the diaphysis were worse; they were evaluated following the ISOLS protocol (fusion, resorption, fracture, graft shortening and fixation). The main complications were deep infection (10%) and delayed union or non-union (30%).

    A mixed distribution to fix the threshold for Peak-Over-Threshold wave height estimation

    Modelling extreme value distributions, such as wave height time series where the higher waves are much less frequent than the lower ones, has been tackled from the point of view of Peak-Over-Threshold (POT) methodologies, where modelling is based on those values higher than a threshold. This threshold is usually predefined by the user, while the rest of the values are ignored. In this paper, we propose a new method to estimate the distribution of the complete time series, including both extreme and regular values. This methodology assumes that extreme value time series can be modelled by a normal distribution in combination with a uniform one. The resulting theoretical distribution is then used to fix the threshold for the POT methodology. The methodology is tested on nine real-world time series collected in the Gulf of Alaska, Puerto Rico and Gibraltar (Spain), which are provided by the National Data Buoy Center (USA) and Puertos del Estado (Spain). By using the Kolmogorov-Smirnov statistical test, the results confirm that the time series can be modelled with this type of mixed distribution. Based on this, the return values and the confidence intervals for wave height in different periods of time are also calculated.
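
To make the basic Peak-Over-Threshold step concrete, the sketch below keeps only the observations above a given threshold and reports their exceedances and mean excess. It is a generic illustration, not the paper's method: the threshold is simply passed in as a number (the paper derives it from the fitted mixed distribution, whose details are not given in the abstract), no declustering of neighbouring peaks is done, and the function names and toy wave heights are hypothetical.

```python
def peaks_over_threshold(values, threshold):
    """Exceedances (value minus threshold) of all observations above `threshold`."""
    return [v - threshold for v in values if v > threshold]


def mean_excess(values, threshold):
    """Average exceedance over `threshold`; 0.0 if nothing exceeds it."""
    excesses = peaks_over_threshold(values, threshold)
    return sum(excesses) / len(excesses) if excesses else 0.0


if __name__ == "__main__":
    # Toy wave heights in metres (made-up numbers, for illustration only)
    heights = [1.0, 2.5, 0.5, 4.0, 3.0]
    print(peaks_over_threshold(heights, 2.0))
    print(mean_excess(heights, 2.0))
```

In a full POT analysis the exceedances would then be fitted with an extreme-value model to obtain return values; that step is left out here.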

    An Evolutionary Artificial Neural Network approach for spatio-temporal wave height time series reconstruction

    This paper proposes a novel methodology for recovering missing time series data, a crucial task for subsequent Machine Learning (ML) analyses. The methodology is specifically applied to Significant Wave Height (SWH) time series in the field of marine engineering. The proposed approach involves two phases. Firstly, the SWH time series for each buoy is independently reconstructed using three transfer function models: regression-based, correlation-based, and distance-based. The distance-based transfer function exhibits the best overall performance. Secondly, Evolutionary Artificial Neural Networks (EANNs) are utilised for the final recovery of each time series, using as inputs highly correlated buoys that have been intermediately recovered. The EANNs are evolved considering two metrics, the novel squared error relevance area, which balances the importance of extreme and around-mean values, and the well-known mean squared error. The study considers SWH time series data from 15 buoys in two coastal zones in the United States. The results demonstrate that the distance-based transfer function is generally the best transfer function, and that EANNs outperform a range of state-of-the-art ML techniques in 12 out of the 15 buoys, with a number of connections comparable to linear models. Furthermore, the proposed methodology outperforms the two most popular approaches for time series reconstruction, BRITS and SAITS, for all buoys except one. Therefore, the proposed methodology provides a promising approach, which may be applied to time series from other fields, such as wind or solar energy farms in the field of green energy
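
The abstract above does not define its distance-based transfer function, so the following is only a sketch of one common distance-based idea, inverse-distance weighting, together with the mean squared error it mentions. The names `idw_fill` and `mean_squared_error`, the `power` parameter and the use of `None` for a missing reading are assumptions made for this example.

```python
def idw_fill(neighbour_values, distances, power=2.0):
    """Estimate a missing reading from neighbouring buoys.

    Each neighbour is weighted by 1 / distance**power (inverse-distance
    weighting, an assumption of this sketch). Neighbours whose own reading
    is missing (None) are skipped. Distances must be positive.
    """
    numerator = 0.0
    denominator = 0.0
    for value, dist in zip(neighbour_values, distances):
        if value is None:
            continue
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator if denominator > 0 else None


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Two neighbouring buoys at equal distance: the estimate is their average
    print(idw_fill([2.0, 4.0], [1.0, 1.0]))
    print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
```

In the two-phase scheme the abstract describes, an intermediate estimate of this kind would serve as input to the second-phase neural network rather than as the final reconstruction.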

    Técnicas estadísticas de análisis multivariante aplicadas a la interpretación de variables del cambio climático

    Multivariate data analysis methods are very useful tools for data series with a large number of variables, which often have no direct correlation but which need to be interpreted and estimated. An example is all the data that may be related to climate change. Countries measure many factors that may be causes or consequences of it. This produces very large databases, which are difficult to interpret. Analysis methods such as Principal Component Analysis or Factor Analysis help to interpret and group large numbers of parameters into simpler series. For this study, data from the World Bank were used, specifically for Latin American countries. Data were selected on agricultural land, forest area, protected land areas, population growth, total population, urban population growth and urban population. All of these seem to have some correlation, but it is not obvious, especially when the measurements are in different units. However, with the Principal Component method, we found groups that could be related to the need for food, the need for land for housing and the loss of ecosystems. In the case of Factor Analysis, the groups in the factors found reflect concepts such as land use, total populations and population growth. Both analyses show the usefulness of these methods for interpreting large groups of data.
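
The point about measurements in different units can be illustrated with a small principal component sketch: each variable is first standardised, and the leading component of the resulting correlation matrix is found by power iteration. This is a minimal standard-library illustration with made-up numbers, not the study's actual procedure; the function names are hypothetical, and it assumes that no variable is constant and that the starting vector is not orthogonal to the leading component.

```python
import math


def standardise(columns):
    """Scale every variable to zero mean and unit variance, removing units."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled.append([(x - mean) / std for x in col])
    return scaled


def correlation_matrix(columns):
    """Correlation matrix of the variables (one list of values per variable)."""
    z = standardise(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    k = len(matrix)
    vec = [1.0 / (i + 1) for i in range(k)]  # deliberately non-uniform start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return vec, eigenvalue


if __name__ == "__main__":
    # Two toy variables on very different scales (made-up numbers)
    area_km2 = [1000.0, 2000.0, 3000.0, 4000.0]
    growth_pct = [0.5, 1.0, 1.5, 2.0]
    loadings, variance = first_component(correlation_matrix([area_km2, growth_pct]))
    print(loadings, variance)
```

Because the two toy variables are perfectly correlated, the first component carries all of the standardised variance and loads equally on both, regardless of their original units.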

    Hybridization of neural network models for the prediction of Extreme Significant Wave Height segments

    This work proposes a hybrid methodology for the detection and prediction of Extreme Significant Wave Height (ESWH) periods in oceans. In a first step, the wave height time series is approximated by a labeled sequence of segments, which is obtained using a genetic algorithm in combination with a likelihood-based segmentation (GA+LS). Then, an artificial neural network classifier with hybrid basis functions is trained with a multiobjective evolutionary algorithm (MOEA) in order to predict the occurrence of future ESWH segments based on past values. The methodology is applied to a buoy in the Gulf of Alaska and another one in Puerto Rico. The results show that the GA+LS is able to segment and group the ESWH values, and that the neural network models obtained by the MOEA make good predictions, maintaining a balance between global accuracy and minimum sensitivity for the detection of ESWH events. Moreover, hybrid neural networks are shown to lead to better results than pure models.
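
The balance between global accuracy and minimum sensitivity mentioned above can be made concrete with a short sketch. Minimum sensitivity is taken here as the lowest per-class recall, which is the usual meaning of the term; the function name and the toy labels are hypothetical and not taken from the paper.

```python
def accuracy_and_min_sensitivity(y_true, y_pred):
    """Return (global accuracy, lowest per-class recall)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    sensitivities = []
    for label in sorted(set(y_true)):
        total = sum(t == label for t in y_true)
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        sensitivities.append(hits / total)
    return accuracy, min(sensitivities)


if __name__ == "__main__":
    # 8 ordinary segments (0) and 2 extreme segments (1): an imbalanced toy set
    truth = [0] * 8 + [1] * 2
    always_ordinary = [0] * 10
    balanced = [0] * 7 + [1] + [1, 0]
    print(accuracy_and_min_sensitivity(truth, always_ordinary))
    print(accuracy_and_min_sensitivity(truth, balanced))
```

Both toy classifiers reach the same accuracy, but the one that never predicts the rare class has a minimum sensitivity of zero, which is why accuracy alone is a poor guide on imbalanced data of this kind.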

    Hemangioendotelioma epiteloide óseo multicentrico : a propósito de un caso

    Epithelioid hemangioendothelioma of bone is an infrequent vascular tumor with an epithelioid appearance which, if not recognised, can be mistaken for a metastatic carcinoma. We report a case that, owing to the clinical characteristics of the patient, could be mistaken for metastatic lesions of a carcinoma of unknown origin. To date, few cases of epithelioid hemangioendothelioma of bone have been described. The tumor grows slowly and, although its behavior is benign, cases with metastatic spread have been reported. The tumor is often multicentric, with a special affinity for the bones of one extremity. This feature makes radical surgery the treatment of choice. In our case the treatment, supracondylar amputation, was aggressive but effective, since the patient returned to his daily activities after fitting of the orthosis.