ISBIS 2016: Meeting on Statistics in Business and Industry
This book includes the abstracts of the talks presented at the 2016 International Symposium on Business and Industrial Statistics, held in Barcelona, June 8-10, 2016, and hosted at the Universitat Politècnica de Catalunya - Barcelona TECH by the Department of Statistics and Operations Research. The meeting took place in the ETSEIB building (Escola Tècnica Superior d'Enginyeria Industrial) at Avda Diagonal 647.
The meeting organizers celebrated the continued success of the ISBIS and ENBIS societies, and the meeting drew together the international community of statisticians, both academics and industry professionals, who share the goal of making statistics the foundation for decision making in business and related applications. The Scientific Program Committee consisted of:
David Banks, Duke University
Amílcar Oliveira, DCeT - Universidade Aberta and CEAUL
Teresa A. Oliveira, DCeT - Universidade Aberta and CEAUL
Nalini Ravishankar, University of Connecticut
Xavier Tort Martorell, Universitat Politècnica de Catalunya, Barcelona TECH
Martina Vandebroek, KU Leuven
Vincenzo Esposito Vinzi, ESSEC Business School
Time series data mining: preprocessing, analysis, segmentation and prediction. Applications
Currently, the amount of data produced by information systems is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackle these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. In this Thesis, however, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackled the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. Buoys very commonly stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of the buoys is analysed and exploited to reconstruct the whole missing time series. In this context, EANNs with PUs are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists in dividing the time series into different subsequences to achieve different purposes. This segmentation can be done trying to find useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of TPs, whose detection was supported by expert opinions.
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement in the body of the GA so that the evaluation is performed automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using an HA, where an LS process was included by using a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled with the aim of evaluating the influence of different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed. However, it remains an open challenge given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven CRO algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves the state-of-the-art with respect to accuracy and robustness. Also, this problem has been tackled using an improvement of the BBPSO algorithm, which includes a dynamical update of the cognitive and social components in the evolution, combined with mathematical tricks to obtain the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. Also, the optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, which is tackled in this Thesis. For that, an MOEA for time series segmentation is developed, improving the clustering quality of the solutions and their approximation. The prediction in time series is the estimation of future values by observing and studying the previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems, i.e. the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is small compared with the number of standard values. Therefore, the prediction of these values cannot be done using standard algorithms without taking into account the imbalanced ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring a special treatment too. Preprocessed data coming from sensors situated in different parts of the Valladolid airport are used to build a simple ANN model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used for fixing the threshold of POT approaches. Also, the determination of the fittest distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification.
The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
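The segmentation task that reduces the number of points can be illustrated with a small fitness function. The following is only a minimal sketch, assuming that a candidate solution is a set of retained indices (cut points) and that the series is approximated by linear interpolation between consecutive retained points; the function name `interpolation_error` and the choice of root mean squared error are illustrative assumptions, not the measures used in the Thesis.

```python
def interpolation_error(series, cut_points):
    """RMSE of a piecewise-linear approximation that keeps only the cut points.

    Hypothetical fitness function: the first and last indices are always
    retained, and every other value is estimated by linear interpolation
    between the two nearest retained points.
    """
    cuts = sorted(set(cut_points) | {0, len(series) - 1})
    squared_error = 0.0
    for left, right in zip(cuts, cuts[1:]):
        span = right - left
        for i in range(left, right):
            # Linear interpolation between the two retained end points.
            estimate = series[left] + (series[right] - series[left]) * (i - left) / span
            squared_error += (series[i] - estimate) ** 2
    return (squared_error / len(series)) ** 0.5


# Keeping the peak at index 1 reproduces this toy series exactly.
error_with_peak = interpolation_error([0, 2, 0], [1])
error_without_peak = interpolation_error([0, 2, 0], [])
```

An evolutionary or swarm-based search would then look for the set of cut points of a given size that minimises such an error.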
Simulation of sea-state sequences
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The present PhD study, in its first part, uses artificial neural networks (ANNs), an optimization technique called simulated annealing, and statistics to simulate the significant wave height (Hs) and mean zero-up-crossing period of 3-hourly sea-states of a location in the North East Pacific, using a proposed distribution called the hepta-parameter spline distribution for the conditional distribution of Hs or the mean period given some inputs. Two different seven-network sets of ANNs for the simulation and prediction of Hs and the mean period were trained using 20 years of observed values of both quantities. The preceding values of Hs and the mean period were the most important inputs given to the networks, but the starting day of the simulated period was also necessary. However, the code replaced the day with the corresponding time and the season. The networks were trained by a simulated annealing algorithm, and the outputs of the two sets of networks were used for calculating the parameters of the probability density function (pdf) of the proposed hepta-parameter distribution. After the calculation of the seven parameters of the pdf from the network outputs, the Hs and mean period of the future sea-state are predicted by generating random numbers from the corresponding pdf.
In another part of the thesis, vertical piles have been studied with the goal of identifying the range of sea-states suitable for safe pile driving operations. The pile configuration, including the non-linear foundation and the gap between the pile and the pile sleeve shims, was modeled using the finite element analysis facilities within ABAQUS. Dynamic analyses of the system were performed for a sea-state characterized by Hs and the mean zero-up-crossing period and modeled as a combination of several wave components. A table of safe and unsafe sea-states was generated by repeating the analysis for various sea-states. If the prediction for a particular sea-state is repeated N times, of which n times prove to be safe, then it could be said that the predicted sea-state is safe with a probability of 100(n/N) percent.
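The 100(n/N) estimate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `simulate_sea_state` and `is_safe` are hypothetical placeholders standing in for the thesis's sea-state simulator and its table of safe and unsafe sea-states, neither of which is specified in the abstract.

```python
import random


def safe_percentage(simulate_sea_state, is_safe, n_runs, seed=0):
    """Return 100 * n / N, where n of the N simulated sea-states are safe.

    Both callables are hypothetical: `simulate_sea_state(rng)` draws one
    predicted sea-state and `is_safe(state)` looks it up as safe or unsafe.
    """
    rng = random.Random(seed)
    n_safe = sum(1 for _ in range(n_runs) if is_safe(simulate_sea_state(rng)))
    return 100.0 * n_safe / n_runs


# Toy stand-ins: a uniformly drawn wave height and an arbitrary 2 m limit.
toy_result = safe_percentage(
    simulate_sea_state=lambda rng: rng.uniform(0.0, 4.0),
    is_safe=lambda hs: hs < 2.0,
    n_runs=1000,
)
```

The toy wave-height range and the 2 m limit are arbitrary values chosen only to make the sketch runnable.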
The last part of the thesis deals with the Hs return values. The return value is a widely used measure of wave extremes and has an important role in determining the design wave used in the design of maritime structures. In this part, the Hs return value was calculated, demonstrating another application of the above simulation of future 3-hourly Hs values. The maxima method for calculating return values was applied in a way that avoids the conventional need for unrealistic assumptions. The significant wave height return value has also been calculated using the convolution concept from a model presented by Anderson et al. (2001).
Agglomerative Clustering with Threshold Optimization via Extreme Value Theory
Clustering is a critical part of many tasks and, in most applications, the number of clusters in the data is unknown and must be estimated. This paper presents an Extreme Value Theory-based approach to threshold selection for clustering, proving that the “correct” linkage distances must follow a Weibull distribution for smooth feature spaces. Deep networks and their associated deep features have transformed many aspects of learning, and this paper shows they are consistent with our extreme-linkage theory and provide Unreasonable Clusterability. We show how our novel threshold selection can be applied to both classic agglomerative clustering and the more recent FINCH (First Integer Neighbor Clustering Hierarchy) algorithm. Our evaluation utilizes over a dozen different large-scale vision datasets/subsets, including multiple face-clustering datasets and ImageNet for both in-domain and, more importantly, out-of-domain object clustering. Across multiple deep feature clustering tasks with very different characteristics, our novel automated threshold selection performs well, often outperforming state-of-the-art clustering techniques even when they select parameters on the test set.
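To show where a linkage-distance threshold enters agglomerative clustering, here is a minimal sketch of single-linkage clustering cut at a fixed threshold. It does not implement the paper's Extreme Value Theory-based selection: the `threshold` argument is simply assumed to be given, and the function name `threshold_clusters` is hypothetical.

```python
from itertools import combinations
from math import dist


def threshold_clusters(points, threshold):
    """Single-linkage clusters: merge any two points closer than `threshold`.

    Cutting a single-linkage dendrogram at a distance threshold is the same
    as taking connected components of the graph whose edges join points at
    distance <= threshold, which is done here with a union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    roots = [find(i) for i in range(len(points))]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [labels[root] for root in roots]


toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
toy_labels = threshold_clusters(toy_points, threshold=2.0)
```

The number of clusters is not specified in advance; it follows from the threshold, which is why the choice of threshold matters.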
Dynamic non-linear system modelling using wavelet-based soft computing techniques
The enormous number of complex systems results in the necessity of high-level and cost-efficient
modelling structures for the operators and system designers. Model-based approaches offer a very
challenging way to integrate a priori knowledge into the procedure. Soft computing based models,
in particular, can successfully be applied in cases of highly nonlinear problems. A further reason
for dealing with so-called soft computational model based techniques is that in real-world cases,
many times only partial, uncertain and/or inaccurate data is available.
Wavelet-based soft computing techniques are considered one of the latest trends in system
identification/modelling. This thesis provides a comprehensive synopsis of the main wavelet-based
approaches to modelling non-linear dynamical systems in real-world problems, in conjunction with
possible twists and novelties aiming for more accurate and less complex modelling structures.
Initially, an on-line structure and parameter design has been considered in an adaptive Neuro-
Fuzzy (NF) scheme. The problem of redundant membership functions and consequently fuzzy
rules is circumvented by applying an adaptive structure. The growth of a special type of fungus
(Monascus ruber van Tieghem) is used as a case study, and the results are compared with several
other approaches for further justification of the proposed methodology.
By extending the line of research, two Morlet Wavelet Neural Network (WNN) structures have
been introduced. Increasing the accuracy and decreasing the computational cost are the
primary targets of the proposed novelties. Modifying the synaptic weights by replacing them with
Linear Combination Weights (LCW) and imposing a Hybrid Learning Algorithm (HLA)
comprising Gradient Descent (GD) and Recursive Least Squares (RLS) are the tools utilised for
the above challenges. These two models differ in structure while they share
the same HLA scheme. The second approach contains an additional multiplication layer, and its
hidden layer contains several sub-WNNs for each input dimension. The practical superiority of
these extensions is demonstrated by simulation and experimental results on a real non-linear
dynamic system, Listeria monocytogenes survival curves in Ultra-High Temperature (UHT)
whole milk, and consolidated with a comprehensive comparison with other suggested schemes.
At the next stage, the extended clustering-based fuzzy version of the proposed WNN schemes is
presented as the ultimate structure in this thesis. The proposed Fuzzy Wavelet Neural Network
(FWNN) benefits from the clustering feature of Gaussian Mixture Models (GMMs), updated by a
modified Expectation-Maximization (EM) algorithm. One of the main aims of this thesis is to
illustrate how the GMM-EM scheme could be used not only for detecting useful knowledge from
the data by building accurate regression, but also for the identification of complex systems.
The structure of the FWNN is based on fuzzy rules including wavelet functions in the
consequent parts of the rules. In order to improve the function approximation accuracy and general
capability of the FWNN system, an efficient hybrid learning approach is used to adjust the
parameters of dilation, translation, weights, and membership. An Extended Kalman Filter (EKF) is
employed for adjusting the wavelet parameters, together with Weighted Least Squares (WLS), which
is dedicated to fine-tuning the Linear Combination Weights. The results of a real-world
application of Short Time Load Forecasting (STLF) further reinforced the plausibility of the
above technique.
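The role of dilation, translation and linear combination weights in a Morlet wavelet network can be sketched with a single-input forward pass. This is only an illustration under assumptions: it uses the common real-valued Morlet form cos(5t)·exp(-t²/2), and it takes each linear combination weight to be a + b·x; the exact architecture in the thesis is not given in the abstract, and the names `morlet` and `wnn_output` are hypothetical.

```python
import math


def morlet(t):
    """Real-valued Morlet mother wavelet, cos(5t) * exp(-t^2 / 2)."""
    return math.cos(5.0 * t) * math.exp(-0.5 * t * t)


def wnn_output(x, translations, dilations, weights):
    """Forward pass of a single-input wavelet network (illustrative only).

    Each hidden unit shifts the input by its translation, scales it by its
    dilation, and applies the Morlet wavelet. Each output weight is assumed
    to be a linear function a + b * x of the input rather than a constant.
    """
    return sum(
        (a + b * x) * morlet((x - t) / d)
        for t, d, (a, b) in zip(translations, dilations, weights)
    )


# Two hidden units with arbitrary, untrained parameters.
toy_output = wnn_output(
    x=0.3,
    translations=[0.0, 1.0],
    dilations=[1.0, 2.0],
    weights=[(0.5, 0.1), (-0.2, 0.4)],
)
```

A hybrid learning scheme of the kind described would update the translations and dilations with one method and the weight coefficients (a, b) with another; no training step is shown here.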
Support vector machine based classification in condition monitoring of induction motors
Continuous and trouble-free operation of induction motors is an essential part of modern power and production plants. Faults and failures of electrical machinery may cause considerable economic losses as well as highly dangerous situations. In addition to analytical and knowledge-based models, the application of data-based models has established a firm position in induction motor fault diagnostics during the last decade. For example, pattern recognition with Neural Networks (NN) is widely studied.
Support Vector Machine (SVM) is a novel machine learning method introduced in the early 1990s. It is based on the statistical learning theory presented by V.N. Vapnik, and it has been successfully applied to numerous classification and pattern recognition problems such as text categorization, image recognition and bioinformatics. An SVM based classifier is built to minimize the structural misclassification risk, whereas conventional classification techniques often apply minimization of the empirical risk. Therefore, SVM is claimed to lead to enhanced generalisation properties. Further, the application of SVM results in the global solution for a classification problem. Thirdly, SVM based classification is attractive because its efficiency does not directly depend on the dimension of the classified entities. This property is very useful in fault diagnostics, because the number of fault classification features does not have to be drastically limited. However, SVM has not yet been widely studied in the area of fault diagnostics. Specifically, in the condition monitoring of induction motors, it does not seem to have been considered before this research.
In this thesis, an SVM based classification scheme is designed for different tasks in induction motor fault diagnostics and for partial discharge analysis in insulation condition monitoring. Several variables are compared as fault indicators, and forces on the rotor are found to be important in fault detection, rather than the motor current, which is currently widely studied. The measurement of forces is difficult, but easily measurable vibrations are directly related to the forces. Hence, vibration monitoring is considered in more detail as the medium for motor fault diagnostics.
SVM classifiers are essentially 2-class classifiers. In addition to induction motor fault diagnostics, the results of this thesis cover various methods for coupling SVMs to solve a multi-class classification problem.
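One common way of coupling 2-class classifiers into a multi-class decision is one-vs-one majority voting. The sketch below illustrates that scheme only; it is not necessarily among the coupling methods studied in the thesis, and the toy classifiers are nearest-centre rules standing in for trained SVMs. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(sample, classes, binary_classifiers):
    """Majority vote over all pairwise 2-class classifiers.

    `binary_classifiers[(a, b)]` is a callable that returns either `a` or
    `b` for a given sample; in practice each would be a trained SVM.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_classifiers[(a, b)](sample)] += 1
    return votes.most_common(1)[0][0]


def make_toy_classifiers(centres):
    """Stand-in for trained SVMs: pick the class whose centre is nearer."""
    return {
        (a, b): (lambda x, a=a, b=b:
                 a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
        for a, b in combinations(sorted(centres), 2)
    }


toy_centres = {0: 0.0, 1: 1.0, 2: 2.0}
toy_classifiers = make_toy_classifiers(toy_centres)
toy_prediction = one_vs_one_predict(1.9, sorted(toy_centres), toy_classifiers)
```

With k classes this scheme needs k(k-1)/2 binary classifiers, one for each pair of classes.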