3,202 research outputs found

    Bayes-optimal Hierarchical Classification over Asymmetric Tree-Distance Loss

    Full text link
    Hierarchical classification is supervised multi-class classification problem over the set of class labels organized according to a hierarchy. In this report, we study the work by Ramaswamy et. al. on hierarchical classification over symmetric tree distance loss. We extend the consistency of hierarchical classification algorithm over asymmetric tree distance loss. We design a O(nklogn)\mathcal{O}(nk\log{}n) algorithm to find Bayes optimal classification for a k-ary tree as a hierarchy. We show that under reasonable assumptions over asymmetric loss function, the Bayes optimal classification over this asymmetric loss can be found in O(klogn)\mathcal{O}(k\log{}n). We exploit this insight and attempt to extend the Ova-Cascade algorithm \citet{ramaswamy2015convex} for hierarchical classification over the asymmetric loss.Comment: CS 396 Undergraduate Project Report, 17 Pages 3 Figure

    Efficient Set-Valued Prediction in Multi-Class Classification

    Full text link
    In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between the correctness (the true class is among the candidates) and the precision (the candidates are not too many) of its prediction. We formalize this problem within a general decision-theoretic framework that unifies most of the existing work in this area. In this framework, uncertainty is quantified in terms of conditional class probabilities, and the quality of a predicted set is measured in terms of a utility function. We then address the problem of finding the Bayes-optimal prediction, i.e., the subset of class labels with highest expected utility. For this problem, which is computationally challenging as there are exponentially (in the number of classes) many predictions to choose from, we propose efficient algorithms that can be applied to a broad family of utility functions. Our theoretical results are complemented by experimental studies, in which we analyze the proposed algorithms in terms of predictive accuracy and runtime efficiency

    Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling

    Full text link
    We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent topics. It is governed by a simple generative model, where the likelihood to generate the training documents via topics is optimized. The algorithm starts with one-word topics and joins two topics at every step. It therefore generates a solution for every desired number of topics ranging between the size of the training vocabulary and one. The process represents an agglomerative clustering that corresponds to a binary tree of topics. A resulting tree may act as a containment hierarchy, typically with more general topics towards the root of tree and more specific topics towards the leaves. Topic Grouper is not governed by a background distribution such as the Dirichlet and avoids hyper parameter optimizations. We show that Topic Grouper has reasonable predictive power and also a reasonable theoretical and practical complexity. Topic Grouper can deal well with stop words and function words and tends to push them into their own topics. Also, it can handle topic distributions, where some topics are more frequent than others. We present typical examples of computed topics from evaluation datasets, where topics appear conclusive and coherent. In this context, the fact that each word belongs to exactly one topic is not a major limitation; in some scenarios this can even be a genuine advantage, e.g.~a related shopping basket analysis may aid in optimizing groupings of articles in sales catalogs

    Every Untrue Label is Untrue in its Own Way: Controlling Error Type with the Log Bilinear Loss

    Full text link
    Deep learning has become the method of choice in many application domains of machine learning in recent years, especially for multi-class classification tasks. The most common loss function used in this context is the cross-entropy loss, which reduces to the log loss in the typical case when there is a single correct response label. While this loss is insensitive to the identity of the assigned class in the case of misclassification, in practice it is often the case that some errors may be more detrimental than others. Here we present the bilinear-loss (and related log-bilinear-loss) which differentially penalizes the different wrong assignments of the model. We thoroughly test this method using standard models and benchmark image datasets. As one application, we show the ability of this method to better contain error within the correct super-class, in the hierarchically labeled CIFAR100 dataset, without affecting the overall performance of the classifier

    Developing Bayesian Information Entropy-based Techniques for Spatially Explicit Model Assessment

    Full text link
    The aim of this paper is to explore and develop advanced spatial Bayesian assessment methods and techniques for land use modeling. The paper provides a comprehensive guide for assessing additional informational entropy value of model predictions at the spatially explicit domain of knowledge, and proposes a few alternative metrics and indicators for extracting higher-order information dynamics from simulation tournaments. A seven-county study area in South-Eastern Wisconsin (SEWI) has been used to simulate and assess the accuracy of historical land use changes (1963-1990) using artificial neural network simulations of the Land Transformation Model (LTM). The use of the analysis and the performance of the metrics helps: (a) understand and learn how well the model runs fits to different combinations of presence and absence of transitions in a landscape, not simply how well the model fits our given data; (b) derive (estimate) a theoretical accuracy that we would expect a model to assess under the presence of incomplete information and measurement; (c) understand the spatially explicit role and patterns of uncertainty in simulations and model estimations, by comparing results across simulation runs; (d) compare the significance or estimation contribution of transitional presence and absence (change versus no change) to model performance, and the contribution of the spatial drivers and variables to the explanatory value of our model; and (e) compare measurements of informational uncertainty at different scales of spatial resolution.Comment: 13 pages, 10 figures, 3 tables, 25 equations Submitted to IEEE Trans Inf Theor

    Cost-Sensitive Label Embedding for Multi-Label Classification

    Full text link
    Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from bad performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances of the embedded vectors by using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding within the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions

    Taskonomy: Disentangling Task Transfer Learning

    Full text link
    Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity. We proposes a fully computational approach for modeling the structure of space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.Comment: CVPR 2018 (Oral). See project website and live demos at http://taskonomy.vision

    A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications

    Full text link
    Graph is an important data representation which appears in a wide diversity of real-world scenarios. Effective graph analytics provides users a deeper understanding of what is behind the data, and thus can benefit a lot of useful applications such as node classification, node recommendation, link prediction, etc. However, most graph analytics methods suffer the high computation and space cost. Graph embedding is an effective yet efficient way to solve the graph analytics problem. It converts the graph data into a low dimensional space in which the graph structural information and graph properties are maximally preserved. In this survey, we conduct a comprehensive review of the literature in graph embedding. We first introduce the formal definition of graph embedding as well as the related concepts. After that, we propose two taxonomies of graph embedding which correspond to what challenges exist in different graph embedding problem settings and how the existing work address these challenges in their solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computation efficiency, problem settings, techniques and application scenarios.Comment: A 20-page comprehensive survey of graph/network embedding for over 150+ papers till year 2018. It provides systematic categorization of problems, techniques and applications. Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE). Comments and suggestions are welcomed for continuously improving this surve

    Hierarchical Density Order Embeddings

    Full text link
    By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty. The uncertainty information can be particularly meaningful in capturing entailment relationships -- whereby general words such as "entity" correspond to broad distributions that encompass more specific words such as "animal" or "instrument". We introduce density order embeddings, which learn hierarchical representations through encapsulation of probability densities. In particular, we propose simple yet effective loss functions and distance metrics, as well as graph-based schemes to select negative samples to better learn hierarchical density representations. Our approach provides state-of-the-art performance on the WordNet hypernym relationship prediction task and the challenging HyperLex lexical entailment dataset -- while retaining a rich and interpretable density representation.Comment: Published at ICLR 201

    Online Machine Learning in Big Data Streams

    Full text link
    The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: In the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing are hugely dominant in current research and development with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail
    corecore