
    Semi-Supervised Dimension Reduction for Multi-Label Classification

    A significant challenge in making learning techniques more suitable for general-purpose use in AI is to move beyond i) complete supervision, ii) low-dimensional data and iii) a single label per instance. Solving this challenge would allow making predictions for high-dimensional, large datasets with multiple (but possibly incomplete) labelings. While other work has addressed each of these problems separately, in this paper we show how to address them together, namely the problem of semi-supervised dimension reduction for multi-label classification, SSDR-MC. To our knowledge this is the first paper that attempts to address all of these challenges together. In this work, we study a novel joint learning framework which performs optimization for dimension reduction and multi-label inference in a semi-supervised setting. The experimental results validate the performance of our approach and demonstrate the effectiveness of connecting dimension reduction and learning.
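
    The joint optimization framework itself is not reproduced here. As a hedged illustration of the problem setting only, the sketch below uses a plain two-stage baseline, unsupervised PCA followed by one-vs-rest classification on the small labeled subset, with hypothetical arrays X, Y and a labeled mask; the paper's contribution is precisely to replace such a decoupled pipeline with joint dimension reduction and multi-label inference.

        # Hedged sketch: a decoupled baseline for the SSDR-MC problem setting,
        # NOT the joint framework proposed in the paper.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LogisticRegression
        from sklearn.multiclass import OneVsRestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 1000))                 # hypothetical high-dimensional data
        Y = (rng.random((500, 5)) < 0.2).astype(int)     # hypothetical multi-label indicator matrix
        labeled = np.zeros(500, dtype=bool)
        labeled[:100] = True                             # only a small subset is labeled

        Z = PCA(n_components=20).fit_transform(X)        # dimension reduction (unsupervised here)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(Z[labeled], Y[labeled])                  # multi-label classifier on the labeled part
        Y_pred = clf.predict(Z[~labeled])                # predicted label sets for unlabeled instances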

    Improving document clustering using automated machine translation

    With the development of statistical machine translation, we have ready-to-use tools that can translate documents in one language into many different languages. These translations provide different yet correlated views of the same set of documents. This gives rise to a natural question: can we use the extra information to achieve a better clustering of the documents? Some recent work on multiview clustering provided positive answers to this question. In this work, we propose an alternative approach to this problem using the constrained clustering framework. Unlike traditional Must-Link and Cannot-Link constraints, the constraints generated by machine translation are dense yet noisy. We show how to incorporate this type of constraint by presenting two algorithms, one parametric and one non-parametric. Our algorithms are easy to implement, efficient, and can consistently improve the clustering of real-world data, namely the Reuters RCV1/RCV2 Multilingual Dataset. In contrast to existing multiview clustering techniques, our technique does not rely on the compatibility and conditional independence assumptions, nor does it involve subtle parameter tuning.
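
    The two proposed algorithms are not shown here. As a hedged sketch of the general idea, assuming a toy setup with hypothetical matrices, one can blend the affinity computed on the original documents with a dense, noisy affinity computed on their machine translations and feed the combination to an off-the-shelf spectral clustering routine; the plain convex combination used below is a stand-in, not the parametric or non-parametric method from the paper.

        # Hedged sketch: mixing a noisy "translated view" affinity into spectral
        # clustering; a generic baseline, not the paper's two algorithms.
        import numpy as np
        from sklearn.metrics.pairwise import cosine_similarity
        from sklearn.cluster import SpectralClustering

        rng = np.random.default_rng(0)
        X_orig = rng.random((200, 300))        # hypothetical TF-IDF matrix, original language
        X_trans = rng.random((200, 300))       # hypothetical TF-IDF matrix, machine-translated view

        A_orig = cosine_similarity(X_orig)     # affinity from the original documents
        A_trans = cosine_similarity(X_trans)   # dense, noisy pairwise information from translation
        alpha = 0.5                            # mixing weight (a tuning choice)
        A = alpha * A_orig + (1 - alpha) * A_trans
        np.fill_diagonal(A, 1.0)

        labels = SpectralClustering(n_clusters=5, affinity="precomputed",
                                    random_state=0).fit_predict(A)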

    Labels vs. Pairwise Constraints: A Unified View of Label Propagation and Constrained Spectral Clustering

    In many real-world applications we can model the data as a graph, with each node being an instance and the edges indicating a degree of similarity. Side information is often available in the form of labels for a small subset of instances, which gives rise to two problem settings and two types of algorithms. In label propagation style algorithms, the known labels are propagated to the unlabeled nodes. In constrained clustering style algorithms, known labels are first converted to pairwise constraints (Must-Link and Cannot-Link), and then a constrained cut is computed as a tradeoff between minimizing the cut cost and maximizing the constraint satisfaction. Both techniques are evaluated by their ability to recover the ground-truth labeling, i.e., by a 0/1 loss function either directly on the labels or on the pairwise relations derived from the labels. These two fields have developed separately, but in this paper we show that they are indeed related. This insight allows us to propose a novel way to generate constraints from the propagated labels, which our empirical study shows outperforms, and is more stable than, the state-of-the-art label propagation and constrained spectral clustering algorithms. Keywords: label propagation; constrained spectral clustering; semi-supervised learning.
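
    The paper's specific constraint-generation rule is not reproduced here. The sketch below only illustrates the two generic ingredients the abstract refers to, on a toy graph with hypothetical names: the standard label-propagation iteration F <- alpha*S*F + (1 - alpha)*Y on a symmetrically normalized affinity, followed by deriving Must-Link/Cannot-Link pairs from the propagated hard labels.

        # Hedged sketch: generic label propagation, then constraints from its output.
        # This shows the two building blocks, not the paper's exact method.
        import numpy as np

        rng = np.random.default_rng(0)
        n, k = 30, 2
        W = rng.random((n, n))
        W = (W + W.T) / 2                        # toy symmetric affinity matrix
        np.fill_diagonal(W, 0)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
        S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetrically normalized affinity

        Y = np.zeros((n, k))
        Y[0, 0], Y[1, 1] = 1, 1                  # two labeled seed nodes
        F, alpha = Y.copy(), 0.9
        for _ in range(100):                     # standard label-propagation iteration
            F = alpha * S @ F + (1 - alpha) * Y

        hard = F.argmax(axis=1)                  # propagated hard labels
        must_link = [(i, j) for i in range(n) for j in range(i + 1, n) if hard[i] == hard[j]]
        cannot_link = [(i, j) for i in range(n) for j in range(i + 1, n) if hard[i] != hard[j]]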

    Propagating Ranking Functions on a Graph: Algorithms and Applications

    Learning to rank is an emerging learning task that opens up a diverse set of applications. However, most existing work focuses on learning a single ranking function, whilst in many real-world applications there can be many ranking functions to fulfill various retrieval tasks on the same data set. Training many ranking functions is challenging due to the limited availability of training data, a problem that is further compounded when plentiful training data is available for only a small subset of the ranking functions. This is particularly true in settings such as personalized ranking/retrieval, where each person requires a unique ranking function according to their preference, but only the functions of the persons who provide sufficient ratings (of objects such as movies and music) can be well trained. To address this, we propose to construct a graph where each node corresponds to a retrieval task, and then propagate ranking functions on the graph. We illustrate the usefulness of the idea of propagating ranking functions, and of our method, by exploring two real-world applications.
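
    As a hedged sketch only, not the authors' algorithm, the idea of propagating ranking functions on a task graph can be illustrated by assuming each retrieval task has a linear scoring vector and repeatedly smoothing the vectors of data-poor tasks toward the average of their neighbours, so that they borrow strength from related, well-trained tasks. All names below are hypothetical.

        # Hedged sketch: smoothing per-task linear ranking functions over a task graph.
        # Illustrative only; the paper's propagation scheme may differ.
        import numpy as np

        rng = np.random.default_rng(0)
        n_tasks, d = 6, 10
        A = rng.integers(0, 2, size=(n_tasks, n_tasks))
        A = np.triu(A, 1)
        A = A + A.T                                        # toy task-similarity graph, no self-loops

        W = rng.normal(size=(n_tasks, d))                  # one linear ranking function per task
        well_trained = np.array([True, True, False, False, False, False])  # tasks with enough ratings

        for _ in range(50):                                # propagate to data-poor tasks
            deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
            neighbour_avg = (A @ W) / deg
            W = np.where(well_trained[:, None], W, neighbour_avg)  # keep well-trained functions fixed

        X_items = rng.normal(size=(20, d))                 # hypothetical item features
        ranking = np.argsort(-(X_items @ W[2]))            # ranked item indices for task 2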

    Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding

    Image captioning and visual language grounding are two important tasks for image understanding, but they are seldom considered together. In this paper, we propose a Progressive Attention-Guided Network (PAGNet), which simultaneously generates image captions and predicts bounding boxes for caption words. PAGNet has two distinctive properties: i) It progressively refines the predictive results of image captioning by updating the attention map with the predicted bounding boxes. ii) It learns bounding boxes of the words using a weakly supervised strategy, which combines the frameworks of Multiple Instance Learning (MIL) and Markov Decision Process (MDP). By using the attention map generated in the captioning process, PAGNet significantly reduces the search space of the MDP. We conduct experiments on benchmark datasets to demonstrate the effectiveness of PAGNet, and the results show that it achieves the best performance.
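
    PAGNet itself is a full captioning and grounding network and is not sketched here. The toy snippet below only illustrates property i) from the abstract, under the assumption that a predicted box can be turned into a soft mask that re-weights and renormalizes a spatial attention map; all names and values are hypothetical.

        # Hedged toy illustration of refining an attention map with a predicted box;
        # not PAGNet, only the re-weighting idea described in the abstract.
        import numpy as np

        H, W = 7, 7
        attention = np.random.default_rng(0).random((H, W))
        attention /= attention.sum()             # spatial attention map, sums to 1

        box = (2, 2, 5, 6)                       # hypothetical predicted box (y0, x0, y1, x1)
        mask = np.full((H, W), 0.2)              # down-weight regions outside the box
        mask[box[0]:box[2], box[1]:box[3]] = 1.0 # keep full weight inside the box

        refined = attention * mask
        refined /= refined.sum()                 # renormalized, box-focused attention map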