
    Benchmarking the Semi-Supervised Naïve Bayes Classifier

    Semi-supervised learning involves constructing predictive models with both labelled and unlabelled training data. The need for semi-supervised learning is driven by the fact that unlabelled data are often easy and cheap to obtain, whereas labelling data requires costly and time-consuming human intervention and expertise. Semi-supervised methods commonly use self-training, which involves using the labelled data to predict the unlabelled data, then iteratively reconstructing classifiers using the predicted labels. Our aim is to determine whether self-training actually improves classifier performance. Expectation maximization is a commonly used self-training scheme. We investigate whether an expectation-maximization scheme improves a naïve Bayes classifier through experimentation with 30 discrete and 20 continuous real-world benchmark UCI datasets. Rather surprisingly, we find that in practice self-training actually makes the classifier worse. The cause of this detrimental effect on performance could lie either in the self-training scheme itself, or in how self-training interacts with the classifier. Our hypothesis is the latter: the violation of the naïve Bayes assumption that attributes are independent means predictive errors propagate through the self-training scheme. To test whether this is the case, we generate simulated data with the same attribute distributions as the UCI data, but where the attributes are independent. Experiments with these data demonstrate that semi-supervised learning does improve performance, leading to significantly more accurate classifiers. These results demonstrate that semi-supervised learning cannot be applied blindly without considering the nature of the classifier, because the assumptions implicit in the classifier may result in a degradation in performance.
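    As a minimal illustration of the self-training scheme described above, the sketch below implements hard-label self-training with a naïve Bayes classifier: fit on the labelled data, predict the unlabelled data, then iteratively refit on both. It uses scikit-learn's GaussianNB for the continuous case; the variable names and the fixed iteration count are illustrative choices, not the paper's protocol.

```python
# Hedged sketch of EM-style self-training with naive Bayes.
# X_lab, y_lab, X_unlab and n_iter are illustrative placeholders.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def self_train_nb(X_lab, y_lab, X_unlab, n_iter=10):
    model = GaussianNB().fit(X_lab, y_lab)          # initial supervised fit
    for _ in range(n_iter):
        y_pseudo = model.predict(X_unlab)           # E-step: pseudo-label the unlabelled data
        X_all = np.vstack([X_lab, X_unlab])         # M-step: refit on labelled
        y_all = np.concatenate([y_lab, y_pseudo])   # plus pseudo-labelled data
        model = GaussianNB().fit(X_all, y_all)
    return model
```

    The paper's finding suggests benchmarking such a loop against the purely supervised GaussianNB().fit(X_lab, y_lab): when the attribute-independence assumption is violated, pseudo-label errors can compound across iterations.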

    A Physics-Based Approach to Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems

    Given that observational and numerical climate data are being produced at ever more prodigious rates, increasingly sophisticated and automated analysis techniques have become essential. Deep learning is quickly becoming a standard approach for such analyses and, while great progress is being made, major challenges remain. Unlike commercial applications in which deep learning has led to surprising successes, scientific data is highly complex and typically unlabeled. Moreover, interpretability and detecting new mechanisms are key to scientific discovery. To enhance discovery we present a complementary physics-based, data-driven approach that exploits the causal nature of spatiotemporal data sets generated by local dynamics (e.g. hydrodynamic flows). We illustrate how novel patterns and coherent structures can be discovered in cellular automata and outline the path from them to climate data.

    Comment: 4 pages, 1 figure; http://csc.ucdavis.edu/~cmg/compmech/pubs/ci2017_Rupe_et_al.ht
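    The paper demonstrates its approach on cellular automata. The sketch below only generates such a spatiotemporal field as input data; the discovery method itself (identifying coherent structures from the local dynamics) is beyond a few lines. The rule number, grid size, and random initial condition are illustrative choices, not the paper's setup.

```python
# Generate an elementary cellular automaton as a spatiotemporal
# data set of the kind analyzed above. Rule and sizes are
# illustrative, not taken from the paper.
import numpy as np

def elementary_ca(rule=110, width=200, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    field = np.zeros((steps, width), dtype=np.uint8)
    field[0] = rng.integers(0, 2, width)           # random initial row
    for t in range(1, steps):
        left = np.roll(field[t - 1], 1)            # periodic boundaries
        right = np.roll(field[t - 1], -1)
        idx = 4 * left + 2 * field[t - 1] + right  # encode 3-cell neighbourhood
        field[t] = table[idx]                      # apply the local update rule
    return field                                   # rows = time, columns = space
```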

    Machine Learning for Fluid Mechanics

    The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from field measurements, experiments, and large-scale simulations at multiple spatiotemporal scales. Machine learning offers a wealth of techniques to extract information from data that can be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of the history, current developments, and emerging opportunities of machine learning for fluid mechanics. It outlines fundamental machine learning methodologies and discusses their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experimentation, and simulation. Machine learning provides a powerful information-processing framework that can enrich, and possibly even transform, current lines of fluid mechanics research and industrial applications.

    Comment: To appear in the Annual Reviews of Fluid Mechanics, 202
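    As one concrete example of the kind of data-driven method such an overview covers, the sketch below computes a proper orthogonal decomposition (POD) of flow snapshots via the singular value decomposition, a standard dimensionality-reduction step in fluid mechanics. The random snapshot matrix is an illustrative stand-in for real velocity-field data.

```python
# Hedged sketch: POD of flow snapshots via the SVD. The random
# matrix stands in for real velocity-field snapshot data.
import numpy as np

rng = np.random.default_rng(0)
n_points, n_snapshots = 1000, 50
snapshots = rng.standard_normal((n_points, n_snapshots))   # columns = snapshots

fluctuations = snapshots - snapshots.mean(axis=1, keepdims=True)  # remove mean flow
modes, sing_vals, _ = np.linalg.svd(fluctuations, full_matrices=False)
energy = sing_vals**2 / np.sum(sing_vals**2)               # energy fraction per mode
r = int(np.searchsorted(np.cumsum(energy), 0.9)) + 1       # modes for 90% energy
print(f"{r} POD modes capture 90% of the fluctuation energy")
```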

    Combining Kernel Functions in Supervised Learning Models.

    The research activity has mainly dealt with supervised Machine Learning algorithms, specifically within the context of kernel methods. A kernel function is a positive definite function mapping data from the original input space into a higher dimensional Hilbert space. Unlike classical linear methods, where problems are solved by seeking a linear function separating points in the input space, kernel methods share the same basic idea: original input data are mapped into a higher dimensional feature space where the new coordinates are never computed explicitly, only the inner products of input points. In this way, kernel methods make it possible to deal with non-linearly separable sets of data using linear models in the feature space, that is, Machine Learning methods that use a linear function to determine the best fit for a given set of data. Instead of employing one single kernel function, Multiple Kernel Learning algorithms tackle the problem of selecting kernel functions by using a combination of preset base kernels. Infinite Kernel Learning further extends this idea by exploiting a combination of possibly infinitely many base kernels. The core idea of the research activity is to utilize novel, complex combinations of kernel functions in existing or modified supervised Machine Learning frameworks. Specifically, we considered two frameworks: the Extreme Learning Machine, which has the structure of a classical feedforward Neural Network but whose hidden-node parameters are randomly assigned at the start of the algorithm; and the Support Vector Machine, a class of linear algorithms based on the idea of separating data with a hyperplane having as wide a margin as possible. The first proposed model extends the classical Extreme Learning Machine formulation using a combination of possibly infinitely many base kernels, presented as a two-step algorithm. The second result uses a preexisting multi-task kernel function in a novel Support Vector Machine framework. Multi-task learning is the Machine Learning problem of solving more than one task at the same time, with the main goal of exploiting the existing relationships between tasks. To be able to use the existing multi-task kernel function, we constructed a new framework based on the classical Support Vector Machine, taking care of every multi-task correlation factor.
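    A minimal sketch of the base-kernel combination idea appears below: a fixed convex combination of an RBF and a polynomial kernel, passed to a Support Vector Machine as a precomputed Gram matrix. The weights, kernel choices, and toy dataset are illustrative; the Multiple/Infinite Kernel Learning methods discussed above learn the combination rather than fixing it.

```python
# Hedged sketch: SVM with a fixed convex combination of two base
# kernels via a precomputed Gram matrix. Weights and kernels are
# illustrative; MKL/IKL would learn the combination instead.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

def combined_kernel(A, B, w=0.7):
    # Convex combinations of positive definite kernels stay positive definite.
    return w * rbf_kernel(A, B, gamma=1.0) + (1 - w) * polynomial_kernel(A, B, degree=2)

X_train, y_train = make_moons(n_samples=200, noise=0.2, random_state=0)
X_test, y_test = make_moons(n_samples=100, noise=0.2, random_state=1)

clf = SVC(kernel="precomputed").fit(combined_kernel(X_train, X_train), y_train)
accuracy = clf.score(combined_kernel(X_test, X_train), y_test)  # rows: test, cols: train
print(f"test accuracy: {accuracy:.2f}")
```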

    Laplacian one class extreme learning machines for human action recognition


    A robust approach to model-based classification based on trimming and constraints

    In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Therefore, unreliable labelled observations, namely outliers and data with incorrect labels, can strongly undermine classifier performance, especially if the training size is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalues of the group scatter matrices. The proposed method effectively handles noise in both the response and the explanatory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
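    A minimal sketch of the impartial-trimming idea follows: fit class-conditional Gaussians, discard the trimming fraction alpha of training points that are least plausible under their own class model, and refit on the retained points. The eigenvalue-ratio constraint on the scatter matrices is omitted for brevity; alpha, the Gaussian model, and the function names are illustrative, not the paper's estimator.

```python
# Hedged sketch of trimmed model-based classification: drop the
# alpha fraction of least-plausible labelled points, then refit.
# The paper's eigenvalue-ratio constraint is omitted here.
import numpy as np
from scipy.stats import multivariate_normal

def fit_class_gaussians(X, y):
    return {c: (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
            for c in np.unique(y)}

def trimmed_fit(X, y, alpha=0.1):
    params = fit_class_gaussians(X, y)              # initial, possibly contaminated fit
    loglik = np.array([multivariate_normal.logpdf(x, *params[c])
                       for x, c in zip(X, y)])      # plausibility under own class
    keep = loglik >= np.quantile(loglik, alpha)     # impartially trim the worst alpha
    return fit_class_gaussians(X[keep], y[keep])    # refit on retained points
```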