6,955 research outputs found

    Random Prism: An Alternative to Random Forests.

    Get PDF
    Ensemble learning techniques generate multiple classifiers, so called base classifiers, whose combined classification results are used in order to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. However, an alternative approach for the induction of rule based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a comparable and sometimes higher classification accuracy compared with decision tree classifiers, if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce the overfitting, however there exists no ensemble learner for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms in order to enhance Prism’s classification accuracy by reducing overfitting

    Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles

    Get PDF
    We examine a network of learners which address the same classification task but must learn from different data sets. The learners cannot share data but instead share their models. Models are shared only one time so as to preserve the network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach allowing to aggregate the predictions of the classifiers trained by each learner. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regressor ensembles demonstrate competing accuracy and increased robustness in case of dependent classifiers. A companion python implementation can be downloaded at https://github.com/john-klein/DELC

    Efficient exploration of unknown indoor environments using a team of mobile robots

    Get PDF
    Whenever multiple robots have to solve a common task, they need to coordinate their actions to carry out the task efficiently and to avoid interferences between individual robots. This is especially the case when considering the problem of exploring an unknown environment with a team of mobile robots. To achieve efficient terrain coverage with the sensors of the robots, one first needs to identify unknown areas in the environment. Second, one has to assign target locations to the individual robots so that they gather new and relevant information about the environment with their sensors. This assignment should lead to a distribution of the robots over the environment in a way that they avoid redundant work and do not interfere with each other by, for example, blocking their paths. In this paper, we address the problem of efficiently coordinating a large team of mobile robots. To better distribute the robots over the environment and to avoid redundant work, we take into account the type of place a potential target is located in (e.g., a corridor or a room). This knowledge allows us to improve the distribution of robots over the environment compared to approaches lacking this capability. To autonomously determine the type of a place, we apply a classifier learned using the AdaBoost algorithm. The resulting classifier takes laser range data as input and is able to classify the current location with high accuracy. We additionally use a hidden Markov model to consider the spatial dependencies between nearby locations. Our approach to incorporate the information about the type of places in the assignment process has been implemented and tested in different environments. The experiments illustrate that our system effectively distributes the robots over the environment and allows them to accomplish their mission faster compared to approaches that ignore the place labels

    Classifier selection with permutation tests

    Get PDF
    This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of what classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted an extensive experimentation including 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.Peer ReviewedPostprint (author's final draft

    Ensemble candidate classification for the LOTAAS pulsar survey

    Get PDF
    One of the biggest challenges arising from modern large-scale pulsar surveys is the number of candidates generated. Here, we implemented several improvements to the machine learning (ML) classifier previously used by the LOFAR Tied-Array All-Sky Survey (LOTAAS) to look for new pulsars via filtering the candidates obtained during periodicity searches. To assist the ML algorithm, we have introduced new features which capture the frequency and time evolution of the signal and improved the signal-to-noise calculation accounting for broad profiles. We enhanced the ML classifier by including a third class characterizing RFI instances, allowing candidates arising from RFI to be isolated, reducing the false positive return rate. We also introduced a new training data set used by the ML algorithm that includes a large sample of pulsars misclassified by the previous classifier. Lastly, we developed an ensemble classifier comprised of five different Decision Trees. Taken together these updates improve the pulsar recall rate by 2.5 per cent, while also improving the ability to identify pulsars with wide pulse profiles, often misclassified by the previous classifier. The new ensemble classifier is also able to reduce the percentage of false positive candidates identified from each LOTAAS pointing from 2.5 per cent (∼500 candidates) to 1.1 per cent (∼220 candidates)

    Randomized Reference Classifier with Gaussian Distribution and Soft Confusion Matrix Applied to the Improving Weak Classifiers

    Full text link
    In this paper, an issue of building the RRC model using probability distributions other than beta distribution is addressed. More precisely, in this paper, we propose to build the RRR model using the truncated normal distribution. Heuristic procedures for expected value and the variance of the truncated-normal distribution are also proposed. The proposed approach is tested using SCM-based model for testing the consequences of applying the truncated normal distribution in the RRC model. The experimental evaluation is performed using four different base classifiers and seven quality measures. The results showed that the proposed approach is comparable to the RRC model built using beta distribution. What is more, for some base classifiers, the truncated-normal-based SCM algorithm turned out to be better at discovering objects coming from minority classes.Comment: arXiv admin note: text overlap with arXiv:1901.0882
    corecore