
    Learning Label Structures with Neural Networks for Multi-label Classification

    Multi-label classification (MLC) is the task of predicting a set of labels for a given input instance. A key challenge in MLC is how to capture underlying structures in label spaces. Due to the computational cost of learning from all possible label combinations, it is crucial to take into account scalability as well as predictive performance when we deal with large-scale MLC problems. Another problem that arises when building MLC systems is which evaluation measures need to be used for performance comparison. Unlike traditional multi-class classification, several evaluation measures are often used together in MLC because each measure prefers a different MLC system. In other words, we need to understand the properties of MLC evaluation measures and build a system which performs well in terms of those evaluation measures in which we are particularly interested. In this thesis, we develop neural network architectures that efficiently and effectively utilize underlying label structures in large-scale MLC problems. In the literature, neural networks (NNs) that learn from pairwise relationships between labels have been used, but they do not scale well on large-scale label spaces. Thus, we propose a comparably simple NN architecture that uses a loss function which ignores label dependencies. We demonstrate that simpler NNs using cross-entropy per label work better than more complex NNs, particularly in terms of rank loss, an evaluation measure that takes into account the number of incorrectly ranked label pairs. Another commonly considered evaluation measure is subset 0/1 loss. Classifier chains (CCs) have shown state-of-the-art performance in terms of that measure because the joint probability of labels is optimized explicitly. CCs essentially convert the problem of learning the joint probability into a sequential prediction problem; the task is then to predict a sequence of binary values for labels. In contrast to the aforementioned NN architecture, which ignores label structures, we study recurrent neural networks (RNNs) so as to make use of sequential structures on label chains. The proposed RNNs are advantageous over CC approaches when dealing with a large number of labels due to parameter sharing in RNNs and their ability to learn from long sequences. Our experimental results also confirm their superior performance on very large label spaces. In addition to NNs that learn from label sequences, we present two novel NN-based methods that learn a joint space of instances and labels efficiently while exploiting label structures. The proposed joint space learning methods project both instances and labels into a lower-dimensional space in a way that minimizes the distance between an instance and its relevant labels in that space. While the goal of both joint space learning methods is the same, they use different additional information on label spaces during training: one approach makes use of hierarchical structures of labels and can be useful when such label structures are given by human experts; the other uses latent label spaces learned from textual label descriptions, so that it can be applied to more general MLC problems where no explicit label structures are available. Notwithstanding this difference, both approaches allow us to make predictions with respect to labels that have not been seen during training.
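
    To make the simplest idea in this abstract concrete, the sketch below shows a feed-forward network trained with an independent cross-entropy term per label, i.e. a loss that ignores label dependencies. The layer sizes, optimizer, and dimensions are illustrative assumptions, not the thesis' exact architecture.

```python
# A minimal sketch, assuming PyTorch, of a multi-label NN trained with an
# independent cross-entropy term per label (label dependencies are ignored).
# Layer sizes and the optimizer are illustrative, not the thesis' exact setup.
import torch
import torch.nn as nn

class SimpleMLC(nn.Module):
    def __init__(self, num_features: int, num_labels: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels),  # one logit per label, no pairwise terms
        )

    def forward(self, x):
        return self.net(x)  # raw logits; the sigmoid lives inside the loss

model = SimpleMLC(num_features=10000, num_labels=5000)
criterion = nn.BCEWithLogitsLoss()           # per-label binary cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 10000)                   # dummy batch of instances
y = torch.randint(0, 2, (32, 5000)).float()  # dummy binary label matrix
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```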

    A Quasi-Wasserstein Loss for Learning Graph Neural Networks

    When learning graph neural networks (GNNs) in node-level prediction tasks, most existing loss functions are applied for each node independently, even if node embeddings and their labels are non-i.i.d. because of their graph structures. To eliminate such inconsistency, in this study we propose a novel Quasi-Wasserstein (QW) loss with the help of the optimal transport defined on graphs, leading to new learning and prediction paradigms of GNNs. In particular, we design a "Quasi-Wasserstein" distance between the observed multi-dimensional node labels and their estimations, optimizing the label transport defined on graph edges. The estimations are parameterized by a GNN in which the optimal label transport may determine the graph edge weights optionally. By reformulating the strict constraint of the label transport to a Bregman divergence-based regularizer, we obtain the proposed Quasi-Wasserstein loss associated with two efficient solvers learning the GNN together with optimal label transport. When predicting node labels, our model combines the output of the GNN with the residual component provided by the optimal label transport, leading to a new transductive prediction paradigm. Experiments show that the proposed QW loss applies to various GNNs and helps to improve their performance in node-level classification and regression tasks.
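
    The prediction paradigm described above, combining a GNN's node-level output with a residual given by label transport along edges, can be sketched as follows. The tiny path graph, the signed incidence-matrix formulation, and the quadratic soft constraint are illustrative assumptions; the paper's Bregman-regularized QW solvers are not reproduced here.

```python
# A hedged sketch: the node-level estimate is the GNN output plus a residual
# obtained from label transport along graph edges. The path graph, incidence
# matrix, and soft penalty are assumptions, not the paper's exact solvers.
import torch

num_nodes, num_edges, label_dim = 4, 3, 2
# Signed incidence matrix B (nodes x edges) of a path graph 0-1-2-3:
# each edge moves label mass from its source node to its target node.
B = torch.tensor([[ 1.,  0.,  0.],
                  [-1.,  1.,  0.],
                  [ 0., -1.,  1.],
                  [ 0.,  0., -1.]])

gnn_output = torch.randn(num_nodes, label_dim, requires_grad=True)  # stand-in for a GNN
flow = torch.zeros(num_edges, label_dim, requires_grad=True)        # label transport on edges
labels = torch.randn(num_nodes, label_dim)                          # observed node labels

optimizer = torch.optim.Adam([gnn_output, flow], lr=0.1)
for _ in range(200):
    optimizer.zero_grad()
    prediction = gnn_output + B @ flow      # transductive prediction = GNN + residual
    loss = (prediction - labels).abs().sum() + 1e-2 * flow.pow(2).sum()
    loss.backward()
    optimizer.step()
```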

    Hyperbolic Interaction Model For Hierarchical Multi-Label Classification

    Different from traditional classification tasks, which assume mutual exclusion of labels, hierarchical multi-label classification (HMLC) aims to assign multiple labels to every instance, with the labels organized under hierarchical relations. Besides the labels, since linguistic ontologies are intrinsic hierarchies, the conceptual relations between words can also form hierarchical structures. Thus it can be a challenge to learn mappings from word hierarchies to label hierarchies. We propose to model the word and label hierarchies by embedding them jointly in the hyperbolic space. The main reason is that the tree-likeness of the hyperbolic space matches the complexity of symbolic data with hierarchical structures. A new Hyperbolic Interaction Model (HyperIM) is designed to learn label-aware document representations and make predictions for HMLC. Extensive experiments are conducted on three benchmark datasets. The results demonstrate that the new model can realistically capture the complex data structures and further improve the performance for HMLC compared with the state-of-the-art methods. To facilitate future research, our code is publicly available.
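
    As a rough illustration of scoring in hyperbolic space, the sketch below embeds words and labels in a Poincaré ball and turns word-label geodesic distances into per-label scores. The dimensions, random embeddings, and mean-distance aggregation are assumptions for illustration, not HyperIM's exact model.

```python
# A rough sketch of scoring a document against labels by Poincare-ball distance,
# assuming PyTorch. Embeddings and the aggregation rule are illustrative only.
import torch

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between points strictly inside the unit Poincare ball."""
    sq = ((u - v) ** 2).sum(-1)
    denom = (1 - (u ** 2).sum(-1)).clamp_min(eps) * (1 - (v ** 2).sum(-1)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq / denom)

torch.manual_seed(0)
words = torch.randn(20, 10)   # word embeddings of one document
labels = torch.randn(7, 10)   # label embeddings in the same space
words = 0.5 * words / words.norm(dim=-1, keepdim=True)     # project inside the ball
labels = 0.5 * labels / labels.norm(dim=-1, keepdim=True)

# Label-aware interaction: aggregate each label's distance to every word and
# turn the negated aggregate into a per-label prediction score.
dist = poincare_distance(words.unsqueeze(1), labels.unsqueeze(0))  # shape (20, 7)
scores = torch.sigmoid(-dist.mean(dim=0))                          # one score per label
print(scores)
```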

    Multi-Entity Dependence Learning with Rich Context via Conditional Variational Auto-encoder

    Multi-Entity Dependence Learning (MEDL) explores conditional correlations among multiple entities. The availability of rich contextual information requires a nimble learning scheme that tightly integrates with deep neural networks and has the ability to capture correlation structures among exponentially many outcomes. We propose MEDL_CVAE, which encodes a conditional multivariate distribution as a generating process. As a result, the variational lower bound of the joint likelihood can be optimized via a conditional variational auto-encoder and trained end-to-end on GPUs. Our MEDL_CVAE was motivated by two real-world applications in computational sustainability: one studies the spatial correlation among multiple bird species using the eBird data, and the other models multi-dimensional landscape composition and human footprint in the Amazon rainforest with satellite images. We show that MEDL_CVAE captures rich dependency structures, scales better than previous methods, and further improves on the joint likelihood by taking advantage of very large datasets that are beyond the capacity of previous methods. Comment: The first two authors contributed equally.
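
    A minimal conditional-VAE sketch of the scheme described above: an encoder q(z | x, y) and a decoder p(y | x, z) are trained by minimizing the negative variational lower bound on the joint label likelihood. The layer sizes and the Bernoulli output head are illustrative assumptions, not the exact MEDL_CVAE architecture.

```python
# A minimal conditional-VAE sketch, assuming PyTorch. Sizes and the Bernoulli
# output are illustrative assumptions, not the exact MEDL_CVAE architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, x_dim=64, y_dim=10, z_dim=8, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.decoder = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))

    def forward(self, x, y):
        h = self.encoder(torch.cat([x, y], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        logits = self.decoder(torch.cat([x, z], dim=-1))
        recon = F.binary_cross_entropy_with_logits(logits, y, reduction='sum')
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        return recon + kl                                      # negative ELBO

model = CVAE()
x = torch.randn(32, 64)                      # rich context features
y = torch.randint(0, 2, (32, 10)).float()    # multi-entity binary outcomes
loss = model(x, y)                           # minimize to tighten the lower bound
loss.backward()
```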

    A simple yet effective baseline for non-attributed graph classification

    Graphs are complex objects that do not lend themselves easily to typical learning tasks. Recently, a range of approaches based on graph kernels or graph neural networks have been developed for graph classification and for representation learning on graphs in general. As the developed methodologies become more sophisticated, it is important to understand which components of the increasingly complex methods are necessary or most effective. As a first step, we develop a simple yet meaningful graph representation and explore its effectiveness in graph classification. We test our baseline representation for the graph classification task on a range of graph datasets. Interestingly, this simple representation achieves similar performance as the state-of-the-art graph kernels and graph neural networks for non-attributed graph classification. Its performance on classifying attributed graphs is slightly weaker as it does not incorporate attributes. However, given its simplicity and efficiency, we believe that it still serves as an effective baseline for attributed graph classification. Our graph representation is efficient (linear-time) to compute. We also provide a simple connection with graph neural networks. Note that these observations are only for the task of graph classification, while existing methods are often designed for a broader scope including node embedding and link prediction. The results are also likely biased due to the limited amount of benchmark datasets available. Nevertheless, the good performance of our simple baseline calls for the development of new, more comprehensive benchmark datasets so as to better evaluate and analyze different graph learning methods. Furthermore, given the computational efficiency of our graph summary, we believe that it is a good candidate as a baseline method for future graph classification (or even other graph learning) studies. Comment: 13 pages. A shorter version appears at the 2019 ICLR Workshop: Representation Learning on Graphs and Manifolds. arXiv admin note: text overlap with arXiv:1810.00826 by other authors.
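
    The abstract does not spell out the representation itself, so the sketch below shows one plausible linear-time, attribute-free graph summary built from degree statistics; this particular construction is an assumption for illustration and is not necessarily the authors' exact method.

```python
# A hedged sketch of one simple, linear-time, attribute-free graph summary:
# histograms of node degrees and of mean neighbor degrees. This construction is
# an assumption for illustration, not necessarily the authors' exact method.
import numpy as np
import networkx as nx

def degree_summary(graph: nx.Graph, bins: int = 16) -> np.ndarray:
    """Fixed-length vector: histograms of node degrees and mean neighbor degrees."""
    degrees = np.array([d for _, d in graph.degree()], dtype=float)
    neigh_mean = np.array([
        np.mean([graph.degree(u) for u in graph.neighbors(v)]) if graph.degree(v) > 0 else 0.0
        for v in graph.nodes()
    ])
    feats = []
    for values in (degrees, neigh_mean):
        hist, _ = np.histogram(values, bins=bins, range=(0.0, values.max() + 1.0))
        feats.append(hist / max(len(values), 1))  # normalize by the number of nodes
    return np.concatenate(feats)

# Each graph becomes one vector that any off-the-shelf classifier (e.g. an SVM
# or gradient-boosted trees) can consume for graph classification.
g = nx.erdos_renyi_graph(30, 0.1, seed=0)
print(degree_summary(g).shape)  # (32,)
```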