FAME: Face Association through Model Evolution
We address the problem of learning face models for public figures from
weakly labelled images collected from the web by querying a name. The data is
very noisy even after face detection, with several irrelevant faces
corresponding to other people. We propose a novel method, Face Association
through Model Evolution (FAME), that prunes the data iteratively so that
the face models associated with a name evolve. The idea is based on
capturing the discriminativeness and representativeness of each instance and
eliminating the outliers. The final models are used to classify faces on novel
datasets with possibly different characteristics. On benchmark datasets, our
results are comparable to or better than state-of-the-art studies for the task
of face identification. Comment: Draft version of the stud
Learning Mixtures of Bernoulli Templates by Two-Round EM with Performance Guarantee
Dasgupta and Shulman showed that a two-round variant of the EM algorithm can
learn a mixture of Gaussian distributions with near-optimal precision with high
probability if the Gaussian distributions are well separated and if the
dimension is sufficiently high. In this paper, we generalize their theory to
learning a mixture of high-dimensional Bernoulli templates. Each template is a
binary vector, and a template generates examples by randomly switching its
binary components independently with a certain probability. In computer vision
applications, a binary vector is a feature map of an image, where each binary
component indicates whether a local feature or structure is present or absent
within a certain cell of the image domain. A Bernoulli template can be
considered as a statistical model for images of objects (or parts of objects)
from the same category. We show that the two-round EM algorithm can learn
a mixture of Bernoulli templates with near-optimal precision with high
probability, if the Bernoulli templates are sufficiently different and if the
number of features is sufficiently high. We illustrate the theoretical results
by synthetic and real examples. Comment: 27 pages, 8 figure
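The generative model described in this abstract can be sketched in a few lines. The following is only an illustration under stated assumptions, not the paper's two-round EM algorithm: the function name `sample_from_template`, the dimension (200), the flip probability (0.1), and the sample count (500) are all hypothetical choices made for the example. It draws examples from a single Bernoulli template by flipping each binary component independently, then recovers the template by a per-component majority vote.

```python
import numpy as np

def sample_from_template(template, flip_prob, n, rng):
    """Draw n examples by flipping each bit of `template`
    independently with probability `flip_prob` (hypothetical helper)."""
    template = np.asarray(template, dtype=np.uint8)
    # Boolean mask: True where a component is switched.
    flips = rng.random((n, template.size)) < flip_prob
    return np.where(flips, 1 - template, template).astype(np.uint8)

rng = np.random.default_rng(0)

# Assumed toy setting: one random 200-dimensional binary template.
template = rng.integers(0, 2, size=200, dtype=np.uint8)
examples = sample_from_template(template, flip_prob=0.1, n=500, rng=rng)

# With a single template and flip_prob < 0.5, a per-component
# majority vote over the examples estimates the template.
recovered = (examples.mean(axis=0) > 0.5).astype(np.uint8)
```

With several templates the component labels are unknown, which is where a mixture-learning procedure such as EM would come in; that step is deliberately left out here.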
An investigation into the performance and representation of a stochastic evolutionary neural tree
Copyright Springer. The Stochastic Competitive Evolutionary Neural Tree (SCENT) is a new unsupervised neural net that dynamically evolves a representational structure in response to its training data. Uniquely, SCENT requires no initial parameter setting, as it autonomously creates an appropriate parameterisation at runtime. Pruning and convergence are stochastically controlled using locally calculated heuristics. A thorough investigation into the performance of SCENT is presented. The network is compared to other dynamic tree-based models and to a high-quality flat clusterer over a variety of data sets and runs
Interpretable Clustering using Unsupervised Binary Trees
We herein introduce a new method of interpretable clustering that uses
unsupervised binary trees. It is a three-stage procedure, the first stage of
which entails a series of recursive binary splits to reduce the heterogeneity
of the data within the new subsamples. During the second stage (pruning),
consideration is given to whether adjacent nodes can be aggregated. Finally,
during the third stage (joining), similar clusters are joined together, even if
they do not descend from the same node originally. Consistency results are
obtained, and the procedure is used on simulated and real data sets. Comment: 25 pages, 6 figure
Survey of data mining approaches to user modeling for adaptive hypermedia
The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the applicatio
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages