
    Resampling Methods for Unsupervised Learning from Sample Data


    Generalized rule antecedent structure for TSK type of dynamic models: Structure identification and parameter estimation

    Scope and Method of Study: A novel rule antecedent structure is proposed to generalize TSK-type dynamic fuzzy models and to address the curse of dimensionality in conventional TSK fuzzy models. The proposed antecedent structure uses only nonlinear variables, which directly reduces the number of possible rules by reducing the antecedent dimension. Additionally, one more degree of freedom is introduced in the antecedent design to cover the antecedent space more efficiently, which further reduces the number of rules. The resultant GTSK model is identified in two stages. A novel recursive estimation based on spatially rearranged data is used to determine the consequent and antecedent variables. Model parameter values are obtained from the partitioned antecedent space, which is the result of solving a series of splitting and regression problems.

    Findings and Conclusions: The proposed rule antecedent structure substantially reduces the complexity of a TSK-type dynamic model. The proposed dynamic order determination and nonlinear component detection methods are tested and shown to identify model structures and to be less sensitive to noise than other methods. Instead of directly estimating model parameters, the proposed approach solves a series of splitting and regression problems to partition the antecedent space and to compute the antecedent and consequent parameters. The resultant antecedent partition is meaningful: the boundaries divide the antecedent space into regions within which a linear relation is valid. The resultant GTSK model is tested on several nonlinear dynamic processes and shown to be more interpretable and informative than other modeling methods without loss of accuracy.
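    For orientation, the following is a minimal, illustrative sketch of standard TSK inference (product firing strengths over Gaussian antecedent sets, weighted average of local linear consequents), included only to show the rule structure that the proposed GTSK antecedent generalizes; the membership functions, rule count, and parameter values are assumptions, not the identification procedure described above.

```python
# Minimal sketch of standard TSK (Takagi-Sugeno-Kang) fuzzy inference.
# Membership functions, rules, and parameters below are illustrative assumptions.
import numpy as np

def gaussian_mf(x, center, width):
    """Gaussian membership of antecedent input x in a fuzzy set."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def tsk_predict(x_antecedent, x_consequent, rules):
    """Each rule pairs antecedent fuzzy sets with a local linear consequent.

    rules: list of (centers, widths, theta), where the consequent is
    y_r = theta[0] + theta[1:] @ x_consequent.
    """
    firing, outputs = [], []
    for centers, widths, theta in rules:
        w = np.prod(gaussian_mf(x_antecedent, centers, widths))  # rule firing strength
        y_r = theta[0] + theta[1:] @ x_consequent                # local linear model
        firing.append(w)
        outputs.append(y_r)
    firing = np.asarray(firing)
    return np.dot(firing, outputs) / firing.sum()                # weighted average

# Example: two rules over a 2-D antecedent and a 2-D consequent input.
rules = [
    (np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([0.1, 0.5, -0.2])),
    (np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.3, -0.4, 0.7])),
]
print(tsk_predict(np.array([0.2, 0.8]), np.array([0.2, 0.8]), rules))
```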

    Investigating Abstract Algebra Students' Representational Fluency and Example-Based Intuitions

    The quotient group concept is difficult for many students getting started in abstract algebra (Dubinsky et al., 1994; Melhuish, Lew, Hicks, and Kandasamy, 2020). The first study in this thesis explores the representational fluency of an undergraduate, a first-year graduate student, and a second-year graduate student as they work on a "collapsing structure" (quotient) task across multiple registers: Cayley tables, group presentations, Cayley digraphs to Schreier coset digraphs, and formal-symbolic mappings. The second study characterizes the (partial) make-up of two graduate learners' example-based intuitions related to orbit-stabilizer relationships induced by group actions. The (partial) make-up of a learner's intuition, treated as a quantifiable object, was defined in this thesis as a point in R^17: 12 variable values collected with a new prototype instrument, The Non-Creative versus Creative Forms of Intuition Survey (NCCFIS), 2 values for confidence in truth value, and 3 additional variables: error to non-error type, unique versus common, and network thinking. The revised Fuzzy C-Means Clustering Algorithm (FCM) by Bezdek et al. (1981) was used to classify the (partial) make-up of learners' reported intuitions into fuzzy sets based on attribute similarity.
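    As a reference point for the clustering step, the following is a minimal sketch of the standard fuzzy c-means updates (the thesis uses a revised variant); the data, number of clusters, and fuzzifier m here are assumed values, not those of the study.

```python
# Minimal sketch of standard fuzzy c-means (FCM), for illustration only.
import numpy as np

def fcm(X, c=3, m=2.0, n_iter=100, seed=0):
    """X: (n_samples, n_features). Returns membership matrix U and centers V."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.dirichlet(np.ones(c), size=n)           # fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]    # membership-weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    return U, V

# Example on random 2-D points (assumed data).
U, V = fcm(np.random.default_rng(1).normal(size=(60, 2)))
print(U.shape, V.shape)
```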

    CONTRIBUTIONS IN CLASSIFICATION: VISUAL PRUNING FOR DECISION TREES, P-SPLINE BASED CLUSTERING OF CORRELATED SERIES, BOOSTED-ORIENTED PROBABILISTIC CLUSTERING OF SERIES.

    This work consists of three papers written during my Ph.D. period. The thesis consists of five chapters. In Chapter 2 the basic building blocks of our work are introduced; in particular, we briefly recall the concepts of classification (supervised and unsupervised) and of penalized splines.

    In Chapter 3 we present a paper whose idea was presented at the Cladag 2013 Symposium. Within the framework of recursive partitioning algorithms by tree-based methods, this paper contributes both a visual representation of the data partition in a geometrical space and a method for selecting the decision tree. In our visual approach, the identification of both the best tree and the weakest links can be assessed immediately from the graphical analysis of the tree structure, without considering the pruning sequence. The results in terms of error rate are very similar to those returned by the Classification And Regression Trees procedure, showing that this new way of selecting the best tree is a valid alternative to the well-known cost-complexity pruning.

    In Chapter 4 we present a paper on parsimonious clustering of correlated series. Clustering of time series has become an important topic, motivated by the increased interest in this type of data. Most existing procedures do not facilitate the removal of noise from the data, have difficulty handling time series of unequal length, and require a preprocessing step, e.g. modeling each series with an appropriate time series model. In this work we propose a new way of clustering (time) series data that can be considered as belonging to both the model-based and the feature-based approaches. Our method models each series with a penalized spline (P-spline) smoother and performs clustering directly on the spline coefficients. Using the P-spline smoothers, the signal of each series is separated from the noise, capturing the different shapes of the series. The P-spline coefficients are close to the fitted curve and represent the skeleton of the fit. Thus, summarizing each series by its coefficients reduces the dimensionality of the problem, significantly improving computation time without reducing the performance of the clustering procedure. To select the smoothing parameter we adopt a V-curve procedure; this criterion does not require the computation of the effective model dimension and is insensitive to serial correlation in the noise around the trend. Using the P-spline smoothers, moments of the original data are conserved, which implies that the mean and variance of the estimated series are equal to those of the raw series. This allows a similar approach to be used for series of different lengths. The performance is evaluated on a simulated data set, also considering series of different lengths, and an application of our proposal to financial time series is presented.

    In Chapter 5 we present a paper that proposes a fuzzy clustering algorithm that is independent of the choice of the fuzzifier. It combines two approaches, theoretically motivated for the unsupervised and the supervised classification case respectively. The first is the Probabilistic Distance (PD) clustering procedure; the second is the well-known Boosting philosophy. From the PD approach we take the idea of determining the probability of each series belonging to each of the k clusters; as this probability is unequivocally related to the distance of each series from the cluster centers, there are no degrees of freedom in determining the membership matrix. From the Boosting approach we take the idea of weighting each series according to some measure of badness of fit, in order to define an unsupervised learning process based on a weighted re-sampling procedure. Our idea is to adapt the boosting philosophy to unsupervised learning problems, especially to non-hierarchical cluster analysis. In such a case there is no target variable, but since the goal is to assign each instance (i.e. a series) of a data set to a cluster, we have a target instance: the representative instance of a given cluster (i.e. the cluster center) can be taken as the target instance, a loss function to be minimized can be taken as a synthetic index of the global performance, and the probability of each series belonging to a given cluster can be taken as the individual contribution of that instance to the overall solution. In contrast to the boosting approach, the higher the probability of a given series being a member of a given cluster, the higher the weight of that instance in the re-sampling process. As a learner we use a P-spline smoother, and to define the probability of each series belonging to a given cluster we use the PD clustering approach. This allows us to define a suitable loss function and, at the same time, to propose a fuzzy clustering procedure that does not depend on the definition of a fuzzifier parameter. The global performance of the proposed method is investigated in three experiments (one on simulated data and two on data sets known in the literature), evaluated using a fuzzy variant of the Rand Index. Chapter 6 concludes the thesis.
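    The following is a minimal sketch of the PD membership rule described above, under which the product of membership probability and distance, p_ik * d_ik, is constant across clusters for each series i; the P-spline fitting and the boosted, weighted re-sampling loop are omitted, and the coefficient vectors and centers shown are assumed inputs.

```python
# Minimal sketch of the Probabilistic Distance (PD) clustering membership rule:
# p_ik * d_ik constant over k  =>  p_ik = (1/d_ik) / sum_m (1/d_im).
# Inputs here are assumed spline-coefficient vectors and cluster centers.
import numpy as np

def pd_memberships(coefs, centers, eps=1e-12):
    """coefs: (n_series, n_coef) spline coefficients; centers: (k, n_coef)."""
    d = np.linalg.norm(coefs[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = 1.0 / d
    return inv / inv.sum(axis=1, keepdims=True)   # each row sums to 1

rng = np.random.default_rng(0)
coefs = rng.normal(size=(10, 8))    # assumed P-spline coefficients for 10 series
centers = rng.normal(size=(3, 8))   # assumed centers of 3 clusters
P = pd_memberships(coefs, centers)
print(P.sum(axis=1))                # all ones: valid cluster probabilities
```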

    Data Clustering and Partial Supervision with Some Parallel Developments

    Data Clustering and Partial Supervision with Some Parallel Developments, by Sameh A. Salem. Clustering is an important and irreplaceable step in the search for structure in data. Many different clustering algorithms have been proposed, yet the sources of variability in most clustering algorithms affect the reliability of their results. Moreover, the majority rely on knowledge of the number of clusters as one of the input parameters; unfortunately, there are many scenarios where this knowledge is not available. In addition, clustering algorithms are very computationally intensive, which poses a major challenge in scaling up to large datasets. This thesis gives possible solutions to such problems. First, new measures, called clustering performance measures (CPMs), for assessing the reliability of a clustering algorithm are introduced. These CPMs can be used to evaluate: 1) clustering algorithms that have a structural bias towards certain types of data distribution as well as those that have no such biases, and 2) clustering algorithms that have initialisation dependency as well as clustering algorithms that have a unique solution for a given set of parameter values with no initialisation dependency. Then, a novel clustering algorithm, a RAdius based Clustering ALgorithm (RACAL), is proposed. RACAL uses a distance-based principle to map the distributions of the data, assuming that clusters are determined by a distance parameter, without having to specify the number of clusters. Furthermore, RACAL is enhanced by a validity index to choose the best clustering result, i.e. the result with compact clusters and wide cluster separations, for a given input parameter. Comparisons with other clustering algorithms indicate the applicability and reliability of the proposed clustering algorithm. Additionally, an adaptive partial supervision strategy is proposed for use in conjunction with RACAL to make it act as a classifier. Results from RACAL with partial supervision, RACAL-PS, indicate its robustness in classification. Additionally, a parallel version of RACAL (P-RACAL) is proposed. The parallel evaluations of P-RACAL indicate that it is scalable in terms of speedup and scaleup, which gives it the ability to handle large, high-dimensional datasets in a reasonable time. Next, a novel clustering algorithm, which achieves clustering without any control of cluster sizes, is introduced. This algorithm, called the Nearest Neighbour Clustering Algorithm (NNCA), uses the same concept as the K-Nearest Neighbour (KNN) classifier, with the advantage that it needs no training set and is completely unsupervised. Additionally, NNCA is augmented with a partial supervision strategy, NNCA-PS, to act as a classifier. Comparisons with other methods indicate the robustness of the proposed method in classification. Additionally, experiments in a parallel environment indicate the suitability and scalability of the parallel NNCA, P-NNCA, in handling large datasets. Further investigations on more challenging data are carried out. In this context, microarray data is considered: in such data the number of clusters is not clearly defined, which points directly towards clustering algorithms that do not require knowledge of the number of clusters. Therefore, the efficacy of one of these algorithms is examined.
Finally, a novel integrated clustering performance measure (ICPM) is proposed to be used as a guideline for choosing the proper clustering algorithm, i.e. the one able to extract useful biological information from a particular dataset.
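    To illustrate the distance-parameter principle attributed to RACAL (clusters determined by a distance threshold rather than a prespecified number of clusters), the following is a generic single-pass, radius-based clustering sketch; it is not the RACAL algorithm itself, and the radius value and centre-update rule are assumptions.

```python
# Generic radius-based clustering sketch (illustrative only; not RACAL).
import numpy as np

def radius_clustering(X, radius):
    """Assign each point to the nearest existing cluster centre within `radius`,
    otherwise start a new cluster; no number of clusters is specified."""
    centres, counts, labels = [], [], []
    for x in X:
        if centres:
            d = np.linalg.norm(np.asarray(centres) - x, axis=1)
            k = int(np.argmin(d))
            if d[k] <= radius:
                counts[k] += 1
                centres[k] = centres[k] + (x - centres[k]) / counts[k]  # running mean
                labels.append(k)
                continue
        centres.append(x.astype(float))   # start a new cluster at this point
        counts.append(1)
        labels.append(len(centres) - 1)
    return np.array(labels), np.array(centres)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels, centres = radius_clustering(X, radius=1.0)
print(len(centres), "clusters found")
```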