
    Radial basis function classifier construction using particle swarm optimisation aided orthogonal forward regression

    We develop a particle swarm optimisation (PSO) aided orthogonal forward regression (OFR) approach for constructing radial basis function (RBF) classifiers with tunable nodes. At each stage of the OFR construction process, the centre vector and diagonal covariance matrix of one RBF node are determined efficiently by minimising the leave-one-out (LOO) misclassification rate (MR) using a PSO algorithm. Compared with the state-of-the-art regularisation assisted orthogonal least squares algorithm based on the LOO MR for selecting fixed-node RBF classifiers, the proposed PSO aided OFR algorithm for constructing tunable-node RBF classifiers offers better generalisation performance and a smaller model size, and it imposes lower computational complexity in the classifier construction process. Moreover, the proposed algorithm does not have any hyperparameter that requires costly tuning based on cross validation.

    Elastic net prefiltering for two class classification

    A two-stage linear-in-the-parameter model construction algorithm is proposed for noisy two-class classification problems. The first stage produces a prefiltered signal that is used as the desired output for the second stage, which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing a model’s generalization capability: at the lower level, a new elastic-net model identification algorithm using singular value decomposition is employed, and at the upper level, two regularization parameters are optimized using a particle-swarm-optimization algorithm by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be analytically computed without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations on noisy data sets illustrate the competitiveness of this approach for noisy classification problems.
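
The claim that the LOO misclassification rate can be computed without splitting the data can be illustrated on a simpler model than the one in the abstract. The sketch below assumes plain ridge-regularised least squares (not the elastic-net algorithm of the paper) with labels coded as +1/-1, and uses the standard identity that the LOO residual equals the ordinary residual divided by 1 - h_ii, where h_ii is a diagonal entry of the hat matrix. All function names are hypothetical.

```python
import numpy as np

def loo_residuals_analytic(X, y, lam):
    """LOO residuals of ridge regression from a single fit: e_i / (1 - h_ii)."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    H = X @ np.linalg.solve(A, X.T)          # hat matrix, y_hat = H @ y
    return (y - H @ y) / (1.0 - np.diag(H))

def loo_residuals_brute(X, y, lam):
    """Reference version: refit n times, each time leaving one sample out."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        w = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        out[i] = y[i] - X[i] @ w
    return out

def loo_misclassification_rate(y, loo_resid):
    """Fraction of samples whose LOO prediction has the wrong sign (labels are +1/-1)."""
    loo_pred = y - loo_resid
    return float(np.mean(y * loo_pred < 0))

# Toy data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=30) > 0, 1.0, -1.0)

fast = loo_residuals_analytic(X, y, lam=0.1)
slow = loo_residuals_brute(X, y, lam=0.1)
loo_mr = loo_misclassification_rate(y, fast)
```

Both routines return the same residuals, but the analytic one needs only one fit, which is what makes the LOO misclassification rate cheap enough to serve as an objective inside an optimiser.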

    Sparse multinomial kernel discriminant analysis (sMKDA)

    Dimensionality reduction via canonical variate analysis (CVA) is important for pattern recognition and has been extended variously to permit more flexibility, e.g. by "kernelizing" the formulation. This can lead to over-fitting, usually ameliorated by regularization. Here, a method for sparse, multinomial kernel discriminant analysis (sMKDA) is proposed, using a sparse basis to control complexity. It is based on the connection between CVA and least-squares, and uses forward selection via orthogonal least-squares to approximate a basis, generalizing a similar approach for binomial problems. Classification can be performed directly via minimum Mahalanobis distance in the canonical variates. sMKDA achieves state-of-the-art performance in terms of accuracy and sparseness on 11 benchmark datasets

    Data driven process monitoring based on neural networks and classification trees

    Process monitoring in the chemical and other process industries has been of great practical importance. Early detection of faults is critical in avoiding product quality deterioration, equipment damage, and personal injury. The goal of this dissertation is to develop process monitoring schemes that can be applied to complex process systems. Neural networks have been a popular tool for modeling and pattern classification for monitoring of process systems. However, due to the prohibitive computational cost caused by high dimensionality and frequently changing operating conditions in batch processes, their applications have been difficult. The first part of this work tackles this problem by employing a polynomial-based data preprocessing step that greatly reduces the dimensionality of the neural network process model. The process measurements and manipulated variables go through a polynomial regression step and the polynomial coefficients, which are usually of far lower dimensionality than the original data, are used to build a neural network model to produce residuals for fault classification. Case studies show a significant reduction in neural model construction time and sometimes better classification results as well. The second part of this research investigates classification trees as a promising approach to fault detection and classification. It is found that the underlying principles of classification trees often result in complicated trees even for rather simple problems, and construction time can be excessive for high-dimensional problems. Fisher Discriminant Analysis (FDA), which features an optimal linear discrimination between different faults and projects original data onto perpendicular scores, is used as a dimensionality reduction tool. Classification trees use the scores to separate observations into different fault classes. A procedure identifies the order of FDA scores that results in a minimum tree cost as the optimal order. Comparisons to other popular multivariate statistical analysis based methods indicate that the new scheme exhibits better performance on a benchmarking problem.
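
The polynomial preprocessing step described in this abstract can be sketched as follows. This is a minimal illustration, assuming each measured variable in a batch is a trajectory sampled at common time points and that a low-degree polynomial is fitted per variable. The function name, the degree, and the toy trajectories are assumptions, not details from the dissertation.

```python
import numpy as np

def polynomial_features(trajectories, times, degree=3):
    """Replace each variable's trajectory by its fitted polynomial coefficients.

    trajectories: array of shape (n_vars, n_times)
    returns: flat feature vector of length n_vars * (degree + 1)
    """
    # np.polyfit fits every column of a 2-D y at once;
    # the result has shape (degree + 1, n_vars), highest power first.
    coeffs = np.polyfit(times, trajectories.T, degree)
    return coeffs.T.ravel()

# Toy batch (hypothetical): 5 variables, 200 time samples each.
times = np.linspace(0.0, 1.0, 200)
batch = np.vstack([
    2.0 * times**3 - times + 0.5,      # exactly cubic, so the fit recovers it
    np.sin(3.0 * times),
    np.exp(-times),
    times**2,
    np.cos(times),
])
features = polynomial_features(batch, times, degree=3)
```

Here 1000 raw values are reduced to 20 coefficients, which is the kind of input-size reduction the abstract credits for the shorter neural network construction time.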

    Random projection ensemble classification

    We introduce a very general method for high-dimensional classification, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. In one special case that we study in detail, the random projections are divided into disjoint groups, and within each group we select the projection yielding the smallest estimate of the test error. Our random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. Our theoretical results elucidate the effect on performance of increasing the number of projections. Moreover, under a boundary condition implied by the sufficient dimension reduction assumption, we show that the test excess risk of the random projection ensemble classifier can be controlled by terms that do not depend on the original data dimension and a term that becomes negligible as the number of projections increases. The classifier is also compared empirically with several other popular high-dimensional classifiers via an extensive simulation study, which reveals its excellent finite-sample performance. Both authors are supported by an Engineering and Physical Sciences Research Council Fellowship EP/J017213/1; the second author is also supported by a Philip Leverhulme prize.
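
The group-select-vote procedure in this abstract can be sketched in a few lines. The version below is a simplified illustration under several assumptions that are not taken from the paper: the base classifier is a nearest-centroid rule, the test error of each projection is estimated by the training (resubstitution) error, projections are Gaussian matrices orthonormalised by QR, and the voting threshold is chosen by a grid search on training error. All names are hypothetical.

```python
import numpy as np

def centroid_fit(Z, y):
    """Nearest-centroid base classifier: store one mean per class (labels 0/1)."""
    return Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)

def centroid_predict(model, Z):
    c0, c1 = model
    d0 = ((Z - c0) ** 2).sum(axis=1)
    d1 = ((Z - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def vote_fraction(chosen, X):
    """Fraction of selected projections that vote for class 1, per sample."""
    preds = [centroid_predict(model, X @ A) for A, model in chosen]
    return np.mean(preds, axis=0)

def rp_ensemble_fit(X, y, d=2, n_groups=30, group_size=10, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    chosen = []
    for _ in range(n_groups):
        best_err, best = np.inf, None
        for _ in range(group_size):
            # Random p x d matrix with orthonormal columns.
            A = np.linalg.qr(rng.normal(size=(p, d)))[0]
            model = centroid_fit(X @ A, y)
            err = np.mean(centroid_predict(model, X @ A) != y)
            if err < best_err:          # keep the best projection in this group
                best_err, best = err, (A, model)
        chosen.append(best)
    # Data-driven voting threshold: minimise training error over a grid.
    votes = vote_fraction(chosen, X)
    grid = np.linspace(0.0, 1.0, 101)
    errs = [np.mean((votes >= a).astype(int) != y) for a in grid]
    alpha = float(grid[int(np.argmin(errs))])
    return chosen, alpha

def rp_ensemble_predict(chosen, alpha, X):
    return (vote_fraction(chosen, X) >= alpha).astype(int)

# Toy problem (an assumption): two Gaussian classes in 20 dimensions.
rng = np.random.default_rng(1)
y_train = np.repeat([0, 1], 100)
X_train = rng.normal(size=(200, 20)) + 1.5 * y_train[:, None]
y_test = np.repeat([0, 1], 100)
X_test = rng.normal(size=(200, 20)) + 1.5 * y_test[:, None]

chosen, alpha = rp_ensemble_fit(X_train, y_train)
accuracy = float(np.mean(rp_ensemble_predict(chosen, alpha, X_test) == y_test))
```

Any classifier with fit and predict steps could replace the nearest-centroid rule, since the abstract describes the method as working with an arbitrary base classifier.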

    Active Wavelength Selection for Chemical Identification Using Tunable Spectroscopy

    Spectrometers are the cornerstone of analytical chemistry. Recent advances in microoptics manufacturing provide lightweight and portable alternatives to traditional spectrometers. In this dissertation, we developed a spectrometer based on Fabry-Perot interferometers (FPIs). An FPI is a tunable optical filter (it can only scan one wavelength at a time). However, compared to traditional counterparts such as FTIR (Fourier transform infrared spectroscopy), FPIs provide lower resolution and a lower signal-to-noise ratio (SNR). Wavelength selection can help alleviate these drawbacks. Eliminating uninformative wavelengths not only speeds up the sensing process but also helps improve accuracy by avoiding nonlinearity and noise. Traditional wavelength selection algorithms follow a training-validation process, and thus they are only optimal for the target analyte. However, for chemical identification, the identities are unknown. To address this issue, this dissertation proposes active sensing algorithms that select wavelengths online while sensing. These algorithms are able to generate analyte-dependent wavelengths. We envision this algorithm deployed on a portable chemical gas platform that has low-cost sensors and limited computation resources. We develop three algorithms focusing on three different aspects of the chemical identification problem. First, we consider the problem of single chemical identification. We formulate the problem as a typical classification problem where each chemical is considered as a distinct class. We use Bayesian risk as the utility function for wavelength selection, which calculates the misclassification cost between classes (chemicals), and we select the wavelength with the maximum reduction in the risk. We evaluate this approach on both synthesized and experimental data. The results suggest that active sensing outperforms the passive method, especially in a noisy environment.
Second, we consider the problem of chemical mixture identification. Since the number of potential chemical mixtures grows exponentially as the number of components increases, it is intractable to formulate all potential mixtures as classes. To circumvent combinatorial explosion, we developed a multi-modal non-negative least squares (MM-NNLS) method that searches multiple near-optimal solutions as an approximation of all the solutions. We project the solutions onto spectral space, calculate the variance of the projected spectra at each wavelength, and select the next wavelength using the variance as the guidance. We validate this approach on synthesized and experimental data. The results suggest that active approaches are superior to their passive counterparts, especially when the condition number of the mixture grows larger (the analytes consist of more components, or the constituent spectra are very similar to each other). Third, we consider improving the computational speed for chemical mixture identification. MM-NNLS scales poorly as the chemical mixture becomes more complex. Therefore, we develop a wavelength selection method based on Gaussian process regression (GPR). GPR aims to reconstruct the spectrum rather than solve the mixture problem, so its computational cost is a function of the number of wavelengths. We evaluate the approach on both synthesized and experimental data. The results again demonstrate more accurate and robust performance in contrast to passive algorithms.
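
The variance-guided selection step for mixtures can be illustrated in isolation. The sketch below assumes that a set of candidate concentration vectors is already available (in the dissertation these come from the multi-modal NNLS search, which is not reproduced here) and that the unmeasured wavelength with the largest variance across the projected spectra is measured next. The function name and the toy spectral library are hypothetical.

```python
import numpy as np

def next_wavelength(library, candidates, measured):
    """Pick the unmeasured wavelength where the candidate solutions disagree most.

    library:    (n_wavelengths, n_chemicals) reference spectra, one column per chemical
    candidates: (n_solutions, n_chemicals) candidate concentration vectors
    measured:   set of wavelength indices that were already measured
    """
    projected = candidates @ library.T       # (n_solutions, n_wavelengths)
    variance = projected.var(axis=0)         # disagreement at each wavelength
    for i in measured:
        variance[i] = -np.inf                # never re-select a measured wavelength
    return int(np.argmax(variance))

# Toy library (an assumption): 4 wavelengths, 2 chemicals.
library = np.array([
    [1.0, 1.0],    # both chemicals absorb equally: uninformative
    [1.0, 0.0],    # only chemical 0 absorbs
    [0.0, 1.0],    # only chemical 1 absorbs
    [0.5, 0.5],    # both absorb equally: uninformative
])
# Two candidate explanations of the data: pure chemical 0 or pure chemical 1.
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])

first = next_wavelength(library, candidates, measured=set())
second = next_wavelength(library, candidates, measured={first})
```

In this toy case the two candidates predict identical absorbance at wavelengths 0 and 3, so only wavelengths 1 and 2 can tell them apart, and those are the ones selected.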

    Machine Learning

    Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience.

    Large Scale Kernel Methods for Fun and Profit

    Kernel methods are among the most flexible classes of machine learning models with strong theoretical guarantees. Wide classes of functions can be approximated arbitrarily well with kernels, while fast convergence and learning rates have been formally shown to hold. Exact kernel methods are known to scale poorly with increasing dataset size, and we believe that one of the factors limiting their usage in modern machine learning is the lack of scalable and easy-to-use algorithms and software. The main goal of this thesis is to study kernel methods from the point of view of efficient learning, with particular emphasis on large-scale data, but also on low-latency training and user efficiency. We improve the state of the art for scaling kernel solvers to datasets with billions of points using the Falkon algorithm, which combines random projections with fast optimization. Running it on GPUs, we show how to fully utilize available computing power for training kernel machines. To boost the ease of use of approximate kernel solvers, we propose an algorithm for automated hyperparameter tuning. By minimizing a penalized loss function, a model can be learned together with its hyperparameters, reducing the time needed for user-driven experimentation. In the setting of multi-class learning, we show that – under stringent but realistic assumptions on the separation between classes – a wide set of algorithms needs far fewer data points than in the more general setting (without assumptions on class separation) to reach the same accuracy. The first part of the thesis develops a framework for efficient and scalable kernel machines. This raises the question of whether our approaches can be used successfully in real-world applications, especially compared to alternatives based on deep learning, which are often deemed hard to beat. The second part aims to investigate this question on two main applications, chosen because of the paramount importance of having an efficient algorithm.
First, we consider the problem of instance segmentation of images taken from the iCub robot. Here Falkon is used as part of a larger pipeline, but the efficiency afforded by our solver is essential to ensure smooth human-robot interactions. In the second instance, we consider time-series forecasting of wind speed, analysing the relevance of different physical variables on the predictions themselves. We investigate different schemes to adapt i.i.d. learning to the time-series setting. Overall, this work aims to demonstrate, through novel algorithms and examples, that kernel methods are up to computationally demanding tasks, and that there are concrete applications in which their use is warranted and more efficient than that of other, more complex, and less theoretically grounded models
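
The idea of scaling a kernel solver by restricting it to a random subset of the data can be sketched as follows. This is a minimal Nyström-style kernel ridge regression in NumPy, assuming a Gaussian kernel and uniformly sampled centres. It does not use the Falkon library and solves the reduced linear system directly rather than with the fast iterative optimization the abstract mentions. All names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def nystrom_krr_fit(X, y, m, lam, sigma, seed=0):
    """Kernel ridge regression restricted to m randomly chosen centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = gaussian_kernel(X, centres, sigma)        # n x m instead of n x n
    K_mm = gaussian_kernel(centres, centres, sigma)
    lhs = K_nm.T @ K_nm + lam * len(X) * K_mm        # only an m x m system
    # lstsq tolerates the ill-conditioning typical of Gaussian kernel matrices.
    alpha = np.linalg.lstsq(lhs, K_nm.T @ y, rcond=None)[0]
    return centres, alpha

def nystrom_krr_predict(centres, alpha, X_new, sigma):
    return gaussian_kernel(X_new, centres, sigma) @ alpha

# Toy regression problem (an assumption): a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=2000)

centres, alpha = nystrom_krr_fit(X, y, m=50, lam=1e-6, sigma=0.5)
X_test = np.linspace(-2.5, 2.5, 200)[:, None]
pred = nystrom_krr_predict(centres, alpha, X_test, sigma=0.5)
mse = float(np.mean((pred - np.sin(2.0 * X_test[:, 0])) ** 2))
```

With 2000 points and 50 centres the largest matrix stored is 2000 by 50, whereas an exact solver would need the full 2000 by 2000 kernel matrix. That gap is what grows unmanageable at the dataset sizes the abstract targets.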