192 research outputs found
Time series data mining: preprocessing, analysis, segmentation and prediction. Applications
Currently, the amount of data produced by information systems is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackle these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. However, in this Thesis, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackle the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. Buoys very commonly stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of each buoy is analysed and exploited to reconstruct the whole missing time series. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists in dividing the time series into different subsequences to achieve different purposes. This segmentation can be done to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions.
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement of the GA so that the procedure is evaluated automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using an HA, where an LS process was included by using a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled with the aim of evaluating the influence of different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed. However, it remains an open challenge given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven coral reef optimisation (CRO) algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves the state-of-the-art with respect to accuracy and robustness. Also, this problem has been tackled using an improvement of the BBPSO algorithm, which includes a dynamical update of the cognitive and social components in the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. Also, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge that is tackled in this Thesis. For that, a multiobjective evolutionary algorithm (MOEA) for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. Time series prediction is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems: the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is smaller than the number of standard values. Hence, the prediction of these values cannot be done using standard algorithms without taking into account the imbalance ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring special treatment too. Preprocessed data coming from sensors situated in different parts of the Valladolid airport are used to build a simple ANN model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used for fixing the threshold of peaks-over-threshold (POT) approaches. Also, the determination of the best-fitting distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification.
The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
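As a minimal illustration of the point-reduction view of segmentation mentioned in this abstract, the sketch below scores a candidate set of cut points by the error of a piecewise-linear interpolation through them. The function name `approximation_error`, the use of linear interpolation and the choice of root-mean-square error are assumptions made for illustration only; the abstract does not state the fitness function actually used in the thesis.

```python
import math

def approximation_error(series, cut_points):
    """RMSE between a series and its piecewise-linear interpolation
    through the retained cut points (hypothetical fitness function)."""
    # The first and last points are always kept as cut points.
    cuts = sorted(set(cut_points) | {0, len(series) - 1})
    squared = 0.0
    for left, right in zip(cuts, cuts[1:]):
        y0, y1 = series[left], series[right]
        for t in range(left, right):
            # Value at time t of the straight line joining the two cut points.
            approx = y0 + (y1 - y0) * (t - left) / (right - left)
            squared += (series[t] - approx) ** 2
    # The final point is itself a cut point, so it contributes zero error.
    return math.sqrt(squared / len(series))

series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]   # made-up triangular series
coarse = approximation_error(series, [])       # only the two end points kept
better = approximation_error(series, [3])      # the peak is kept: error is 0.0
```

An evolutionary search of the kind the abstract describes would explore many candidate `cut_points` sets and prefer those with a low error for a given number of retained points.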
Biorefinery of agricultural residues by fractionation of their components through hydrothermal and organosolv processes
The combined production of the most abundant agricultural residues in Spain (viz. cereal straw, sunflower stalks, vine shoots, cotton stalks, olive, orange and peach tree prunings, and horticultural and related residues) amounts to over 50 million tons per year. Agricultural residues can be valorized by converting their components jointly (combustion, pyrolysis, gasification, liquefaction) or separately (fractionation). The most useful method for exploiting such components separately involves isolating cellulose fibres for papermaking purposes. In recent times, this valorization method has led to the development of the biorefining concept. Biorefining involves the fractionation or separation of the different lignocellulosic components of agricultural residues with a view to their integral exploitation rather than the mere use of cellulose fibre to obtain paper products. Biorefining replaces the classical pulping methods based on Kraft, sulphite and soda reagents with a hydrothermal treatment followed by organosolv pulping. The hydrothermal treatment provides a liquid phase containing hemicellulose decomposition products [both oligomers and monomers (glucose, xylose, and arabinose)] and a solid phase rich in cellulose and lignin. By contrast, the organosolv process gives a solid fraction (pulp) and a residual liquid fraction containing lignin and other useful substances for various purposes.
Metaphyseal-diaphyseal bone tumors: reconstruction with intercalary allografts (Tumores óseos metafiso-diafisarios: reconstrucción con aloinjertos intercalares)
From June 1987 to June 1991, the Department of Orthopaedic Surgery and Traumatology of the Clínica Universitaria de Navarra treated 20 patients with malignant bone tumors located in the metaphyseal-diaphyseal region by radical resection and reconstruction with a cryopreserved intercalary bone allograft. According to Mankin's criteria, 74% had excellent or good functional results. Incorporation of the graft at the metaphyseal level posed no problems. At the diaphyseal level the results were worse; following the ISOLS protocol, we assessed fusion, resorption, fracture, shortening and fixation. Among the complications, infection (10%) and loosening of the osteosynthesis hardware (30%) were the most important.
From June 1987 to June 1991, 20 patients with malignant bone tumors at the diaphysis or metaphysis of a long bone were treated by radical resection and intercalary allograft replacement. The functional results were excellent or good in 13 patients. Radiologic evaluation of the metaphysis was excellent in all cases as regards allograft incorporation. In the diaphysis the results were worse; we evaluated them following the ISOLS protocol (fusion, resorption, fracture, graft shortening and fixation). The main complications were deep infection (10%) and delayed union or non-union (30%).
A mixed distribution to fix the threshold for Peak-Over-Threshold wave height estimation
Modelling extreme value distributions, such as wave height time series where the higher waves are
much less frequent than the lower ones, has been tackled from the point of view of the Peak-Over-Threshold (POT) methodologies, where modelling is based on those values higher than a threshold.
This threshold is usually predefined by the user, while the rest of the values are ignored. In this paper,
we propose a new method to estimate the distribution of the complete time series, including both
extreme and regular values. This methodology assumes that extreme value time series can be
modelled by a normal distribution in combination with a uniform one. The resulting theoretical
distribution is then used to fix the threshold for the POT methodology. The methodology is tested
on nine real-world time series collected in the Gulf of Alaska, Puerto Rico and Gibraltar (Spain), which
are provided by the National Data Buoy Center (USA) and Puertos del Estado (Spain). By using the
Kolmogorov-Smirnov statistical test, the results confirm that the time series can be modelled with
this type of mixed distribution. Based on this, the return values and the confidence intervals for wave
height in different periods of time are also calculated.
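The ingredients named in this abstract (a normal-plus-uniform mixture, a Kolmogorov-Smirnov comparison, and exceedances over a threshold) can be sketched as follows. This is only an illustration under stated assumptions: the mixture parameters are made up rather than fitted, only the KS distance is computed (no p-value), and taking the threshold as a high quantile of the mixture is a hypothetical rule, since the abstract does not say how the paper derives the threshold from the fitted distribution.

```python
import math

def mixture_cdf(x, weight, mu, sigma, low, high):
    """CDF of weight * Normal(mu, sigma) + (1 - weight) * Uniform(low, high)."""
    normal = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    uniform = min(1.0, max(0.0, (x - low) / (high - low)))
    return weight * normal + (1.0 - weight) * uniform

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, value in enumerate(ordered):
        model = cdf(value)
        # The empirical CDF jumps from i/n to (i+1)/n at each sorted value.
        distance = max(distance, abs(model - i / n), abs((i + 1) / n - model))
    return distance

def quantile(p, cdf, low, high, tol=1e-9):
    """Invert a continuous, non-decreasing CDF on [low, high] by bisection."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if cdf(mid) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

def peaks_over_threshold(series, threshold):
    """Exceedances (value - threshold) of the values above the threshold."""
    return [value - threshold for value in series if value > threshold]

# Made-up wave heights in metres and hypothetical, unfitted mixture parameters.
heights = [1.2, 0.8, 1.5, 2.1, 0.9, 4.8, 1.1, 1.7, 5.3, 1.4]

def model_cdf(x):
    return mixture_cdf(x, weight=0.8, mu=1.3, sigma=0.4, low=0.0, high=6.0)

threshold = quantile(0.95, model_cdf, 0.0, 10.0)   # assumed rule: 95% quantile
exceedances = peaks_over_threshold(heights, threshold)
distance = ks_statistic(heights, model_cdf)
```

In a real analysis the mixture parameters would be estimated from the data before the comparison, and the exceedances would then feed an extreme value model to obtain return values.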
An Evolutionary Artificial Neural Network approach for spatio-temporal wave height time series reconstruction
This paper proposes a novel methodology for recovering missing time series data, a crucial task for
subsequent Machine Learning (ML) analyses. The methodology is specifically applied to Significant
Wave Height (SWH) time series in the field of marine engineering. The proposed approach involves two
phases. Firstly, the SWH time series for each buoy is independently reconstructed using three transfer
function models: regression-based, correlation-based, and distance-based. The distance-based transfer
function exhibits the best overall performance. Secondly, Evolutionary Artificial Neural Networks
(EANNs) are utilised for the final recovery of each time series, using as inputs highly correlated buoys
that have been intermediately recovered. The EANNs are evolved considering two metrics, the novel
squared error relevance area, which balances the importance of extreme and around-mean values, and
the well-known mean squared error. The study considers SWH time series data from 15 buoys in two
coastal zones in the United States. The results demonstrate that the distance-based transfer function
is generally the best transfer function, and that EANNs outperform a range of state-of-the-art ML
techniques in 12 out of the 15 buoys, with a number of connections comparable to linear models.
Furthermore, the proposed methodology outperforms the two most popular approaches for time
series reconstruction, BRITS and SAITS, for all buoys except one. Therefore, the proposed methodology
provides a promising approach, which may be applied to time series from other fields, such as wind
or solar energy farms in the field of green energy
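To make the idea of a first-phase transfer function concrete, the sketch below fills the gaps of one buoy from a fully observed neighbouring buoy using simple least-squares regression. This is one plausible reading of "regression-based" and is an assumption: the abstract does not define the three transfer functions, the names `fit_line` and `reconstruct` are hypothetical, and the second phase with EANNs is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def reconstruct(target, neighbour):
    """Fill the None gaps in `target` from a fully observed `neighbour` series."""
    # Fit only on the time steps where the target buoy was observed.
    pairs = [(nb, tg) for nb, tg in zip(neighbour, target) if tg is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    # Keep observed values; predict the missing ones from the neighbour.
    return [tg if tg is not None else slope * nb + intercept
            for nb, tg in zip(neighbour, target)]

neighbour = [1.0, 2.0, 3.0, 4.0, 5.0]    # made-up SWH values of a nearby buoy
target = [2.1, None, 6.1, None, 10.1]    # None marks a missing measurement
filled = reconstruct(target, neighbour)
```

The intermediate series produced this way would then serve as inputs to a second, more flexible model, which is the role the abstract assigns to the EANNs.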
Multivariate statistical analysis techniques applied to the interpretation of climate change variables (Técnicas estadísticas de análisis multivariante aplicadas a la interpretación de variables del cambio climático)
Multivariate data analysis is a very useful tool for data series with a large number of variables, which often have no direct correlation but still need to be interpreted and estimated. An example is the data that may be related to climate change. Countries measure many factors that can be a cause or a consequence of it. This yields very large databases, which are difficult to interpret. Analysis methods such as Principal Component Analysis or Factor Analysis help to interpret and group a large number of parameters into simpler series. For this study, data from the World Bank were used, specifically for Latin American countries. Data were selected on agricultural land, forest area, protected land areas, population growth, total population, urban population growth and urban population. All of these seem to be somewhat correlated, but the correlation is not obvious, especially when the measurements are in different units. However, with the Principal Component method, we found groups that could be related to the need for food, the need for land for housing and the loss of ecosystems. In the case of Factor Analysis, the groups in the factors found reflect concepts such as land use, total populations and population growth. Both analyses demonstrate the usefulness of these methods for interpreting large groups of data.
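A minimal sketch of the principal component step mentioned in this abstract, written with the Python standard library only. It standardises each indicator first, because the variables are measured in different units, and then extracts the leading component of the correlation matrix by power iteration. The data values are made up (they are not World Bank figures), the function names are hypothetical, and a real study would use a statistics package and inspect more than one component.

```python
import math

def standardise(rows):
    """Scale each column to zero mean and unit variance so units do not matter."""
    scaled = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / (len(column) - 1))
        scaled.append([(v - mean) / std for v in column])
    return [list(row) for row in zip(*scaled)]

def correlation_matrix(rows):
    """Correlation matrix of the columns (covariance of the standardised data)."""
    z = standardise(rows)
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(matrix, iterations=500):
    """Leading eigenvector (loadings) and eigenvalue by power iteration."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(v * sum(a * w for a, w in zip(row, vector))
                     for v, row in zip(vector, matrix))
    return vector, eigenvalue

# Made-up rows: total population, urban population, forest area (% of land).
rows = [[10.0, 7.0, 60.0], [20.0, 15.0, 50.0], [30.0, 24.0, 45.0],
        [40.0, 30.0, 30.0], [50.0, 41.0, 20.0]]
corr = correlation_matrix(rows)
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)   # share of total variance on this component
```

In this toy data the two population indicators load with the same sign and forest area with the opposite sign, which is the kind of grouping the abstract interprets in terms of land needs and ecosystem loss.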
Hybridization of neural network models for the prediction of Extreme Significant Wave Height segments
This work proposes a hybrid methodology for the
detection and prediction of Extreme Significant Wave Height
(ESWH) periods in oceans. In a first step, the wave height time
series is approximated by a labeled sequence of segments, which
is obtained using a genetic algorithm in combination with
a likelihood-based segmentation (GA+LS). Then, an artificial
neural network classifier with hybrid basis functions is trained
with a multiobjective evolutionary algorithm (MOEA) in order
to predict the occurrence of future ESWH segments based on
past values. The methodology is applied to a buoy in the Gulf of
Alaska and another one in Puerto Rico. The results show that
the GA+LS is able to segment and group the ESWH values, and
the neural network models, obtained by the MOEA, make good
predictions maintaining a balance between global accuracy and
minimum sensitivity for the detection of ESWH events. Moreover,
hybrid neural networks are shown to lead to better results than
pure models
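The two quantities balanced in this abstract can be computed as in the sketch below: global accuracy is the fraction of correct predictions, and minimum sensitivity is the lowest per-class recall. The function name and the labels are made up for illustration; how the MOEA in the paper combines the two objectives is not shown.

```python
def accuracy_and_min_sensitivity(y_true, y_pred):
    """Global accuracy and the lowest per-class recall (minimum sensitivity)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    sensitivities = []
    for label in sorted(set(y_true)):
        total = sum(t == label for t in y_true)
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        sensitivities.append(hits / total)
    return accuracy, min(sensitivities)

# Made-up labels: 1 marks an extreme (ESWH) segment, 0 a regular one.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_regular = [0] * 10
balanced = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

trivial = accuracy_and_min_sensitivity(y_true, always_regular)   # (0.8, 0.0)
useful = accuracy_and_min_sensitivity(y_true, balanced)          # (0.8, 0.75)
```

Both classifiers reach the same accuracy, but the one that never predicts an extreme segment has a minimum sensitivity of zero, which is why accuracy alone is a poor guide on imbalanced data of this kind.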
Multicentric epithelioid hemangioendothelioma of bone: a case report (Hemangioendotelioma epiteloide óseo multicentrico : a propósito de un caso)
Epithelioid hemangioendothelioma of bone is an infrequent vascular tumor with an epithelioid appearance that, if unrecognized, can be mistaken for a metastatic carcinoma. We present a case that, because of the patient's clinical characteristics, could have been mistaken for metastatic lesions of a carcinoma of unknown origin. Few cases have been described. Clinically, the tumor grows slowly and, although its behavior is benign, metastases at various sites have been described. It is usually multicentric, with a special predilection for the bones of a single extremity. This characteristic makes radical surgery necessary in these patients. In our case the treatment performed, although aggressive, was effective, since the patient returned to normal activity once the orthosis had been fitted.
Epithelioid hemangioendothelioma of bone is an infrequent vascular tumor which can often be mistaken for a metastatic carcinoma. We report a case mistaken for a metastatic carcinoma of unknown origin due to the clinical characteristics of the patient. To date, few cases of epithelioid hemangioendothelioma of bone have been described. The tumor shows a low growth rate. Although the tumor has a benign behavior, cases with metastatic spread have been reported. The tumor is often multicentric, with special affinity for the bones of the extremities. This fact makes radical surgery the best treatment choice. In our case the treatment, supracondylar amputation, was aggressive but effective, since the patient returned to his daily activities after application of the orthosis.
- …