
    FAME: Face Association through Model Evolution

    We attack the problem of learning face models for public faces from weakly-labelled images collected from the web by querying a name. The data is very noisy even after face detection, with several irrelevant faces corresponding to other people. We propose a novel method, Face Association through Model Evolution (FAME), that prunes the data iteratively so that the face models associated with a name can evolve. The idea is based on capturing the discriminativeness and representativeness of each instance and eliminating the outliers. The final models are used to classify faces on novel datasets with possibly different characteristics. On benchmark datasets, our results are comparable to or better than state-of-the-art studies for the task of face identification. Comment: Draft version of the stud

    Learning Mixtures of Bernoulli Templates by Two-Round EM with Performance Guarantee

    Dasgupta and Shulman showed that a two-round variant of the EM algorithm can learn a mixture of Gaussian distributions with near-optimal precision with high probability if the Gaussian distributions are well separated and if the dimension is sufficiently high. In this paper, we generalize their theory to learning a mixture of high-dimensional Bernoulli templates. Each template is a binary vector, and a template generates examples by randomly switching its binary components independently with a certain probability. In computer vision applications, a binary vector is a feature map of an image, where each binary component indicates whether a local feature or structure is present or absent within a certain cell of the image domain. A Bernoulli template can be considered as a statistical model for images of objects (or parts of objects) from the same category. We show that the two-round EM algorithm can learn a mixture of Bernoulli templates with near-optimal precision with high probability, if the Bernoulli templates are sufficiently different and if the number of features is sufficiently high. We illustrate the theoretical results by synthetic and real examples. Comment: 27 pages, 8 figure
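
The generative model described in this abstract can be illustrated with a minimal sketch. It is not the paper's two-round EM algorithm: it only samples examples by independently flipping template bits, then assigns each example to the nearest known template by Hamming distance. The function names, the dimension of 200, the three templates, and the flip probability of 0.1 are all assumptions chosen for illustration.

```python
import random


def sample_from_template(template, flip_prob, rng):
    """Generate one example by flipping each binary component independently."""
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in template]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_template(example, templates):
    """Index of the template closest to the example in Hamming distance."""
    return min(range(len(templates)), key=lambda k: hamming(example, templates[k]))


rng = random.Random(0)   # fixed seed so the sketch is reproducible
dim = 200                # assumed number of binary features
flip_prob = 0.1          # assumed switching probability

# Three random templates (hypothetical; real templates would be feature maps).
templates = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(3)]

# Draw 50 noisy examples from each template, remembering the true source.
examples = [
    (k, sample_from_template(templates[k], flip_prob, rng))
    for k in range(3)
    for _ in range(50)
]

# Fraction of examples whose nearest template is the one that generated them.
accuracy = sum(nearest_template(x, templates) == k for k, x in examples) / len(examples)
print(f"nearest-template accuracy: {accuracy:.2f}")
```

With templates that differ in many components and a high enough dimension, the noisy examples stay much closer to their own template than to the others, which is the intuition behind the separation and dimension conditions stated in the abstract.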

    An investigation into the performance and representation of a stochastic evolutionary neural tree

    Copyright Springer. The Stochastic Competitive Evolutionary Neural Tree (SCENT) is a new unsupervised neural net that dynamically evolves a representational structure in response to its training data. Uniquely, SCENT requires no initial parameter setting, as it autonomously creates appropriate parameterisation at runtime. Pruning and convergence are stochastically controlled using locally calculated heuristics. A thorough investigation into the performance of SCENT is presented. The network is compared to other dynamic tree-based models and to a high-quality flat clusterer over a variety of data sets and runs

    Interpretable Clustering using Unsupervised Binary Trees

    We herein introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure, the first stage of which entails a series of recursive binary splits to reduce the heterogeneity of the data within the new subsamples. During the second stage (pruning), consideration is given to whether adjacent nodes can be aggregated. Finally, during the third stage (joining), similar clusters are joined together, even if they do not descend from the same node originally. Consistency results are obtained, and the procedure is used on simulated and real data sets. Comment: 25 pages, 6 figure
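
A single split of the first stage can be sketched as follows. The abstract does not state how heterogeneity is measured, so this sketch assumes the sum of squared deviations from the mean within each child node; the function names and the toy data are hypothetical, and the pruning and joining stages are not covered.

```python
def sum_sq_dev(points):
    """Assumed heterogeneity measure: total squared deviation from the mean."""
    if not points:
        return 0.0
    total = 0.0
    for j in range(len(points[0])):
        mean = sum(p[j] for p in points) / len(points)
        total += sum((p[j] - mean) ** 2 for p in points)
    return total


def best_binary_split(points):
    """Search every coordinate and midpoint threshold for the split that
    minimises the summed heterogeneity of the two children."""
    best = None
    for j in range(len(points[0])):
        values = sorted({p[j] for p in points})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [p for p in points if p[j] <= threshold]
            right = [p for p in points if p[j] > threshold]
            cost = sum_sq_dev(left) + sum_sq_dev(right)
            if best is None or cost < best[0]:
                best = (cost, j, threshold)
    return best


# Toy data: two groups separated along the first coordinate.
points = [(0.0, 5.0), (0.2, 1.0), (0.1, 3.0), (4.0, 2.0), (4.2, 4.0), (3.9, 0.0)]
cost, feature, threshold = best_binary_split(points)
print(f"split on coordinate {feature} at {threshold:.2f} (cost {cost:.3f})")
```

Because each split is a threshold on one variable, the resulting clusters can be read off as simple rules, which is what makes a tree of this kind interpretable. Applying the function recursively to each child would grow the tree.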

    Survey of data mining approaches to user modeling for adaptive hypermedia

    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the applicatio

    Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging

    We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages