19,435 research outputs found

    A Hybrid Deep Learning Approach for Texture Analysis

    Full text link
    Texture classification is a problem that has various applications such as remote sensing and forest species recognition. Solutions tend to be custom fit to the dataset used but fails to generalize. The Convolutional Neural Network (CNN) in combination with Support Vector Machine (SVM) form a robust selection between powerful invariant feature extractor and accurate classifier. The fusion of experts provides stability in classification rates among different datasets

    Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets

    Get PDF
    Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignore that the two classes of regular and non-regular customers are highly imbalanced, that NTL proportions may change and mostly consider small data sets, often not allowing to deploy the results in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real world data sets.Comment: Proceedings of the Seventh IEEE Conference on Innovative Smart Grid Technologies (ISGT 2016

    A Framework for Evaluating Land Use and Land Cover Classification Using Convolutional Neural Networks

    Get PDF
    Analyzing land use and land cover (LULC) using remote sensing (RS) imagery is essential for many environmental and social applications. The increase in availability of RS data has led to the development of new techniques for digital pattern classification. Very recently, deep learning (DL) models have emerged as a powerful solution to approach many machine learning (ML) problems. In particular, convolutional neural networks (CNNs) are currently the state of the art for many image classification tasks. While there exist several promising proposals on the application of CNNs to LULC classification, the validation framework proposed for the comparison of different methods could be improved with the use of a standard validation procedure for ML based on cross-validation and its subsequent statistical analysis. In this paper, we propose a general CNN, with a fixed architecture and parametrization, to achieve high accuracy on LULC classification over RS data from different sources such as radar and hyperspectral. We also present a methodology to perform a rigorous experimental comparison between our proposed DL method and other ML algorithms such as support vector machines, random forests, and k-nearest-neighbors. The analysis carried out demonstrates that the CNN outperforms the rest of techniques, achieving a high level of performance for all the datasets studied, regardless of their different characteristics.Ministerio de Economía y Competitividad TIN2014-55894-C2-1-RMinisterio de Economía y Competitividad TIN2017-88209-C2-2-

    Automatic Environmental Sound Recognition: Performance versus Computational Cost

    Get PDF
    In the context of the Internet of Things (IoT), sound sensing applications are required to run on embedded platforms where notions of product pricing and form factor impose hard constraints on the available computing power. Whereas Automatic Environmental Sound Recognition (AESR) algorithms are most often developed with limited consideration for computational cost, this article seeks which AESR algorithm can make the most of a limited amount of computing power by comparing the sound classification performance em as a function of its computational cost. Results suggest that Deep Neural Networks yield the best ratio of sound classification accuracy across a range of computational costs, while Gaussian Mixture Models offer a reasonable accuracy at a consistently small cost, and Support Vector Machines stand between both in terms of compromise between accuracy and computational cost

    CausaLM: Causal Model Explanation Through Counterfactual Language Models

    Full text link
    Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. As all ML-based methods, they are as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high level language concepts. A key problem of estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.Comment: Our code and data are available at: https://amirfeder.github.io/CausaLM/ Under review for the Computational Linguistics journa

    Eliminating Latent Discrimination: Train Then Mask

    Get PDF
    How can we control for latent discrimination in predictive models? How can we provably remove it? Such questions are at the heart of algorithmic fairness and its impacts on society. In this paper, we define a new operational fairness criteria, inspired by the well-understood notion of omitted variable-bias in statistics and econometrics. Our notion of fairness effectively controls for sensitive features and provides diagnostics for deviations from fair decision making. We then establish analytical and algorithmic results about the existence of a fair classifier in the context of supervised learning. Our results readily imply a simple, but rather counter-intuitive, strategy for eliminating latent discrimination. In order to prevent other features proxying for sensitive features, we need to include sensitive features in the training phase, but exclude them in the test/evaluation phase while controlling for their effects. We evaluate the performance of our algorithm on several real-world datasets and show how fairness for these datasets can be improved with a very small loss in accuracy
    corecore