15,638 research outputs found

    A no-regret generalization of hierarchical softmax to extreme multi-label classification

    Full text link
    Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be efficiently handled by organizing labels as a tree, like in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs) that have been recently devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as a model evaluation metric. Critically, we prove that pick-one-label heuristic - a reduction technique from multi-label to multi-class that is routinely used along with HSM - is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive to many state-of-the-art approaches in terms of statistical performance, model size and prediction time which makes it amenable to deploy in an online system.Comment: Accepted at NIPS 201

    Streaming Label Learning for Modeling Labels on the Fly

    Full text link
    It is challenging to handle a large volume of labels in multi-label learning. However, existing approaches explicitly or implicitly assume that all the labels in the learning process are given, which could be easily violated in changing environments. In this paper, we define and study streaming label learning (SLL), i.e., labels are arrived on the fly, to model newly arrived labels with the help of the knowledge learned from past labels. The core of SLL is to explore and exploit the relationships between new labels and past labels and then inherit the relationship into hypotheses of labels to boost the performance of new classifiers. In specific, we use the label self-representation to model the label relationship, and SLL will be divided into two steps: a regression problem and a empirical risk minimization (ERM) problem. Both problems are simple and can be efficiently solved. We further show that SLL can generate a tighter generalization error bound for new labels than the general ERM framework with trace norm or Frobenius norm regularization. Finally, we implement extensive experiments on various benchmark datasets to validate the new setting. And results show that SLL can effectively handle the constantly emerging new labels and provides excellent classification performance

    A Hierarchical Spectral Method for Extreme Classification

    Full text link
    Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable. One strategy for dealing with the computational burden is via a tree decomposition of the output space. While this typically leads to training and inference that scales sublinearly with the number of outputs, it also results in reduced statistical performance. In this work, we identify two shortcomings of tree decomposition methods, and describe two heuristic mitigations. We compose these with an eigenvalue technique for constructing the tree. The end result is a computationally efficient algorithm that provides good statistical performance on several extreme data sets.Comment: Reference implementation available at https://github.com/pmineiro/xls

    Hybrid Generative/Discriminative Learning for Automatic Image Annotation

    Full text link
    Automatic image annotation (AIA) raises tremendous challenges to machine learning as it requires modeling of data that are both ambiguous in input and output, e.g., images containing multiple objects and labeled with multiple semantic tags. Even more challenging is that the number of candidate tags is usually huge (as large as the vocabulary size) yet each image is only related to a few of them. This paper presents a hybrid generative-discriminative classifier to simultaneously address the extreme data-ambiguity and overfitting-vulnerability issues in tasks such as AIA. Particularly: (1) an Exponential-Multinomial Mixture (EMM) model is established to capture both the input and output ambiguity and in the meanwhile to encourage prediction sparsity; and (2) the prediction ability of the EMM model is explicitly maximized through discriminative learning that integrates variational inference of graphical models and the pairwise formulation of ordinal regression. Experiments show that our approach achieves both superior annotation performance and better tag scalability.Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010

    Accelerating Extreme Classification via Adaptive Feature Agglomeration

    Full text link
    Extreme classification seeks to assign each data point, the most relevant labels from a universe of a million or more labels. This task is faced with the dual challenge of high precision and scalability, with millisecond level prediction times being a benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Despite past works on feature clustering and selection, DEFRAG distinguishes itself in being able to scale to millions of features, and is especially beneficial when feature sets are sparse, which is typical of recommendation and multi-label datasets. The method comes with provable performance guarantees and performs efficient task-driven agglomeration to reduce feature dimensionalities by an order of magnitude or more. Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as 40%, but also be used for feature reconstruction to address the problem of missing features, as well as offer superior coverage on rare labels.Comment: A version of this paper without the appendices will appear at the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). Code for this paper is available at https://github.com/purushottamkar/defrag

    The Extreme Value Machine

    Full text link
    It is often desirable to be able to recognize when inputs to a recognition function learned in a supervised manner correspond to classes unseen at training time. With this ability, new class labels could be assigned to these inputs by a human operator, allowing them to be incorporated into the recognition function --- ideally under an efficient incremental update mechanism. While good algorithms that assume inputs from a fixed set of classes exist, e.g., artificial neural networks and kernel machines, it is not immediately obvious how to extend them to perform incremental learning in the presence of unknown query classes. Existing algorithms take little to no distributional information into account when learning recognition functions and lack a strong theoretical foundation. We address this gap by formulating a novel, theoretically sound classifier --- the Extreme Value Machine (EVM). The EVM has a well-grounded interpretation derived from statistical Extreme Value Theory (EVT), and is the first classifier to be able to perform nonlinear kernel-free variable bandwidth incremental learning. Compared to other classifiers in the same deep network derived feature space, the EVM is accurate and efficient on an established benchmark partition of the ImageNet dataset.Comment: Pre-print of a manuscript accepted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) journa

    Cortex Neural Network: learning with Neural Network groups

    Full text link
    Neural Network has been successfully applied to many real-world problems, such as image recognition and machine translation. However, for the current architecture of neural networks, it is hard to perform complex cognitive tasks, for example, to process the image and audio inputs together. Cortex, as an important architecture in the brain, is important for animals to perform the complex cognitive task. We view the architecture of Cortex in the brain as a missing part in the design of the current artificial neural network. In this paper, we purpose Cortex Neural Network (CrtxNN). The Cortex Neural Network is an upper architecture of neural networks which motivated from cerebral cortex in the brain to handle different tasks in the same learning system. It is able to identify different tasks and solve them with different methods. In our implementation, the Cortex Neural Network is able to process different cognitive tasks and perform reflection to get a higher accuracy. We provide a series of experiments to examine the capability of the cortex architecture on traditional neural networks. Our experiments proved its ability on the Cortex Neural Network can reach accuracy by 98.32% on MNIST and 62% on CIFAR10 at the same time, which can promisingly reduce the loss by 40%

    How to Learn a Model Checker

    Full text link
    We show how machine-learning techniques, particularly neural networks, offer a very effective and highly efficient solution to the approximate model-checking problem for continuous and hybrid systems, a solution where the general-purpose model checker is replaced by a model-specific classifier trained by sampling model trajectories. To the best of our knowledge, we are the first to establish this link from machine learning to model checking. Our method comprises a pipeline of analysis techniques for estimating and obtaining statistical guarantees on the classifier's prediction performance, as well as tuning techniques to improve such performance. Our experimental evaluation considers the time-bounded reachability problem for three well-established benchmarks in the hybrid systems community. On these examples, we achieve an accuracy of 99.82% to 100% and a false-negative rate (incorrectly predicting that unsafe states are not reachable from a given state) of 0.0007 to 0. We believe that this level of accuracy is acceptable in many practical applications and we show how the approximate model checker can be made more conservative by tuning the classifier through further training and selection of the classification threshold.Comment: 16 pages, 13 figures, short version submitted to HSCC201

    Predicting Outcome of Indian Premier League (IPL) Matches Using Machine Learning

    Full text link
    Cricket, especially the Twenty20 format, has maximum uncertainty, where a single over can completely change the momentum of the game. With millions of people following the Indian Premier League (IPL), developing a model for predicting the outcome of its matches is a real-world problem. A cricket match depends upon various factors, and in this work, the factors which significantly influence the outcome of a Twenty20 cricket match are identified. Each player's performance in the field is considered to find out the overall weight (relative strength) of the teams. A multivariate regression based solution is proposed to calculate points for each player in the league and the overall weight of a team is computed based on the past performance of the players who have appeared most for the team. Finally, a dataset is modeled based on the identified seven factors which influence the outcome of an IPL match. Six machine learning models were trained and used for predicting the outcome of each 2018 IPL match, 15 minutes before the gameplay, immediately after the toss. Three of the trained models were seen to be correctly predicting more than 40 matches, with Multilayer Perceptron outperforming all other models with an impressive accuracy of 71.66%

    Security Matters: A Survey on Adversarial Machine Learning

    Full text link
    Adversarial machine learning is a fast growing research area, which considers the scenarios when machine learning systems may face potential adversarial attackers, who intentionally synthesize input data to make a well-trained model to make mistake. It always involves a defending side, usually a classifier, and an attacking side that aims to cause incorrect output. The earliest studies on the adversarial examples for machine learning algorithms start from the information security area, which considers a much wider varieties of attacking methods. But recent research focus that popularized by the deep learning community places strong emphasis on how the "imperceivable" perturbations on the normal inputs may cause dramatic mistakes by the deep learning with supposed super-human accuracy. This paper serves to give a comprehensive introduction to a range of aspects of the adversarial deep learning topic, including its foundations, typical attacking and defending strategies, and some extended studies
    • …
    corecore