220 research outputs found

    Ensemble of heterogeneous flexible neural trees using multiobjective genetic programming

    Machine learning algorithms are inherently multiobjective in nature: minimizing approximation error and reducing model complexity are two conflicting objectives. We proposed a multiobjective genetic programming (MOGP) approach for creating a heterogeneous flexible neural tree (HFNT), a tree-like flexible feedforward neural network model. Functional heterogeneity in the neural tree nodes was introduced to capture better insight into the data during learning, because each input in a dataset possesses different features. MOGP guided an initial HFNT population towards Pareto-optimal solutions, and the final population was used to build an ensemble system. A diversity index measure, along with approximation error and complexity, was introduced to maintain diversity among the candidates in the population. Hence, the ensemble was created from accurate, structurally simple, and diverse candidates in the final MOGP population. A differential evolution algorithm was applied to fine-tune the underlying parameters of the selected candidates. A comprehensive test over classification, regression, and time-series datasets proved the efficiency of the proposed algorithm over other available prediction methods. Moreover, the heterogeneous creation of HFNT proved to be efficient in making an ensemble system from the final population.
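
    The Pareto-optimality idea in this abstract can be illustrated with a minimal sketch. It is not the paper's MOGP: it only filters a hypothetical population down to its non-dominated members, uses two objectives (error and complexity) rather than the three the abstract mentions, and the candidate names and numbers are invented for illustration.

```python
from typing import List, Tuple

# (name, approximation error, complexity). All values are hypothetical.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


population: List[Candidate] = [
    ("tree_a", 0.10, 12),
    ("tree_b", 0.08, 20),
    ("tree_c", 0.12, 15),  # worse than tree_a on both objectives
    ("tree_d", 0.20, 5),
]

front = pareto_front(population)
print([name for name, _, _ in front])
```

    Here tree_c is dropped because tree_a has both lower error and lower complexity; the remaining three each win on at least one objective, so none of them dominates another.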

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.
    Comment: 24 pages + 11 pages of references, 8 figures
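
    To make the OCC setting concrete, here is a minimal sketch of one very simple boundary-from-positives approach: fit a centroid to the positive samples and accept anything within a distance threshold. This is an illustrative baseline, not a method taken from the review; the function names, the `quantile` parameter and the sample points are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def fit_one_class(positives: Sequence[Sequence[float]],
                  quantile: float = 0.95) -> Tuple[List[float], float]:
    """Learn a centroid and radius from positive (target-class) samples only."""
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    # Distances of the training points to the centroid, sorted ascending.
    dists = sorted(math.dist(p, centroid) for p in positives)
    # Radius = distance at the chosen quantile of the training distances.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def is_target(x: Sequence[float], centroid: Sequence[float], radius: float) -> bool:
    """Accept x as the target class if it lies inside the learned ball."""
    return math.dist(x, centroid) <= radius


# Hypothetical positive samples; no negative examples are ever seen.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centroid, radius = fit_one_class(train)

print(is_target((0.5, 0.6), centroid, radius))  # near the training data
print(is_target((5.0, 5.0), centroid, radius))  # far from the training data
```

    The `quantile` knob trades false rejections of genuine positives against acceptance of outliers, which is the kind of trade-off that arises whenever the boundary is drawn from one class alone.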

    Nonlinear Classifier Stacking on Riemannian and Grassmann Manifolds with Application to Video Analysis

    This research is devoted to the problem of overfitting in Machine Learning and Pattern Recognition. It aims to improve generalisation ability and boost accuracy for small and/or difficult classification datasets. These two problems are addressed in two different ways: by splitting the entire dataset into functional groups according to classification difficulty using a consensus of classifiers, and by embedding the data obtained during classifier stacking into nonlinear spaces, i.e. Riemannian and Grassmann manifolds. These two techniques are the main contributions of the thesis. The insight behind the first approach is that the classifiers are trained not on the entire training subset but on a part of it, in order to approximate the true geometry and properties of the classes. In terms of Data Science, this process can also be understood as Data Cleaning. Under the first approach, instances with high positive (easy) and negative (misclassified) margins are excluded from training, because they do not improve (or even worsen) the estimate of the true geometry of the classes. The main goal of using Riemannian geometry is to embed the classes in nonlinear spaces where their geometry makes classification easier. Before embedding the classes on Riemannian and Grassmann manifolds, several Data Transformations are applied using different variants of Classifier Stacking. Riemannian manifolds of Symmetric Positive Definite matrices are created from the classifier interactions, while Grassmann manifolds are built from Decision Profiles. The purpose of both approaches is Data Complexity reduction. There is a consensus among researchers that Data Complexity reduction should reduce overfitting and improve classification accuracy. We carried out our experiments on various datasets from the UCI Machine Learning repository.
    We also tested our approaches on two datasets related to the Video Analysis problem. The first is a Phase Gesture Segmentation dataset taken from the UCI Machine Learning repository. The second is the Deep Fake detection Challenge dataset. To apply our approach to the second problem, some image processing was carried out. Numerous experiments on general datasets and on those related to Video Analysis problems show the consistency and efficiency of the proposed techniques. We also compared our techniques with state-of-the-art techniques. The results show the superiority of our approaches in most cases. The significance of the research and its results lies in a better representation and evaluation of the geometry of classes, which may overlap in feature space only because of improper measurements, errors, noise, or the selection of features that do not represent the classes well. The research is pioneering in terms of Data Cleaning and Classifier Ensemble Learning in Riemannian geometry.

    Software Quality Assessment using Ensemble Models


    Meta-learning to improve unsupervised intrusion detection in cyber-physical systems


    Machine Learning for Hand Gesture Classification from Surface Electromyography Signals

    Classifying hand gestures from Surface Electromyography (sEMG) is a process which has applications in human-machine interaction, rehabilitation and prosthetic control. Reduction in the cost and increase in the availability of the necessary hardware over recent years have made sEMG a more viable solution for hand gesture classification. The research challenge is the development of processes to robustly and accurately predict the current gesture based on incoming sEMG data. This thesis presents a set of methods, techniques and designs that improve both the evaluation of, and the performance on, the classification problem as a whole. These are brought together to set a new baseline for the potential classification. Evaluation is improved by careful choice of metrics and design of cross-validation techniques that account for data bias caused by common experimental techniques. A landmark study is re-evaluated with these improved techniques, and it is shown that data augmentation can be used to significantly improve upon the performance of conventional classification methods. A novel neural network architecture and supporting improvements are presented that further improve performance, and the architecture is refined so that the network achieves similar performance with many fewer parameters than competing designs. Supporting techniques such as subject adaptation and smoothing algorithms are then explored to improve overall performance and to provide more nuanced trade-offs between aspects of performance such as incurred latency and prediction smoothness. A new study is presented which compares the performance potential of medical-grade electrodes and a low-cost commercial alternative, showing that for a modest-sized gesture set they can compete. The data is also used to explore data labelling in experimental design and to evaluate the numerous aspects of performance that must be traded off.
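
    The latency versus smoothness trade-off mentioned in this abstract can be sketched with a sliding-window majority vote over per-frame predictions. This is a generic smoothing scheme chosen for illustration, not necessarily the algorithm used in the thesis; the gesture labels and the window size are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List


def smooth_predictions(labels: List[str], window: int = 3) -> List[str]:
    """Replace each raw prediction with the majority label of the last `window` frames."""
    recent: Deque[str] = deque(maxlen=window)  # oldest frame drops out automatically
    smoothed: List[str] = []
    for label in labels:
        recent.append(label)
        # most_common(1) returns [(label, count)] for the most frequent label.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Hypothetical per-frame classifier output: one spurious "open" at index 2,
# then a genuine change from "fist" to "open" starting at index 5.
raw = ["fist", "fist", "open", "fist", "fist", "open", "open", "open"]
print(smooth_predictions(raw, window=3))
```

    With a window of 3 the isolated glitch is suppressed, but the genuine change is reported one frame late. A larger window gives smoother output at the cost of more latency.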

    Longitudinal tracking of physiological state with electromyographic signals.

    Electrophysiological measurements have been used in recent history to classify instantaneous physiological configurations, e.g., hand gestures. This work investigates the feasibility of working with changes in physiological configurations over time (i.e., longitudinally) using a variety of algorithms from the machine learning domain. We demonstrate a high degree of classification accuracy for a binary classification problem derived from electromyography measurements before and after a 35-day bedrest. The problem difficulty is increased with a more dynamic experiment testing for changes in astronaut sensorimotor performance by taking electromyography and force plate measurements before, during, and after a jump from a small platform. A LASSO regularization is performed to observe changes in the relationship between electromyography features and force plate outcomes. SVM classifiers are employed to correctly identify the times at which these experiments are performed, which is important as these indicate a trajectory of adaptation.

    Process-Oriented Stream Classification Pipeline: A Literature Review

    Featured Application: Nowadays, many applications and disciplines work on the basis of stream data. Common examples are the IoT sector (e.g., sensor data analysis), or video, image, and text analysis applications (e.g., in social media analytics or astronomy). With our work, we gather different approaches and terminology, and give a broad overview of the topic. Our main target groups are practitioners and newcomers to the field of data stream classification. Due to the rise of continuous data-generating applications, analyzing data streams has gained increasing attention over the past decades. A core research area in stream data is stream classification, which categorizes or detects data points within an evolving stream of observations. Areas of stream classification are diverse, ranging, e.g., from monitoring sensor data to analyzing a wide range of (social) media applications. Research in stream classification is related to developing methods that adapt to the changing and potentially volatile data stream. It focuses on individual aspects of the stream classification pipeline, e.g., designing suitable algorithm architectures, an efficient train and test procedure, or detecting so-called concept drifts. As a result of the many different research questions and strands, the field is challenging to grasp, especially for beginners. This survey explores, summarizes, and categorizes work within the domain of stream classification and identifies core research threads over the past few years. It is structured based on the stream classification process to facilitate coordination within this complex topic, including common application scenarios and benchmarking data sets. Thus, both newcomers to the field and experts who want to widen their scope can gain (additional) insight into this research area and find starting points and pointers to more in-depth literature on specific issues and research directions in the field.
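
    The "train and test procedure" for streams mentioned in this abstract is commonly realised as test-then-train (prequential) evaluation: each arriving example is first used to test the model and only then to update it. The sketch below assumes a deliberately trivial learner that predicts the majority label of a recent window; the class name, window size and synthetic stream are all hypothetical, and the abrupt label change stands in for a concept drift.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Optional, Tuple


class WindowedMajority:
    """Toy stream learner: predicts the most frequent label in a recent window."""

    def __init__(self, window: int = 5) -> None:
        self.recent: Deque[str] = deque(maxlen=window)

    def predict(self, x: float) -> Optional[str]:
        if not self.recent:
            return None  # nothing learned yet
        return Counter(self.recent).most_common(1)[0][0]

    def learn(self, x: float, y: str) -> None:
        self.recent.append(y)  # old labels fall out, so the model can adapt


def prequential_accuracy(stream: Iterable[Tuple[float, str]],
                         model: WindowedMajority) -> float:
    """Test-then-train: score each example before the model is updated with it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


# Synthetic stream: the label switches from "a" to "b" halfway through.
stream = [(float(i), "a") for i in range(10)] + [(float(i), "b") for i in range(10, 20)]
accuracy = prequential_accuracy(stream, WindowedMajority(window=5))
print(f"prequential accuracy: {accuracy:.2f}")
```

    Because every example is scored before it is learned, the errors made right after the label switch are counted. A shorter window recovers faster after the drift but is more sensitive to noise.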