20,167 research outputs found
Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression
Although fully generative models have been successfully used to model the
contents of text documents, they are often awkward to apply to combinations of
text data and document metadata. In this paper we propose a
Dirichlet-multinomial regression (DMR) topic model that includes a log-linear
prior on document-topic distributions that is a function of observed features
of the document, such as author, publication venue, references, and dates. We
show that by selecting appropriate features, DMR topic models can meet or
exceed the performance of several previously published topic models designed
for specific data.Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty
in Artificial Intelligence (UAI2008
Simulated evaluation of faceted browsing based on feature selection
In this paper we explore the limitations of facet based browsing which uses sub-needs of an information need for querying and organising the search process in video retrieval. The underlying assumption of this approach is that the search effectiveness will be enhanced if such an approach is employed for interactive video retrieval using textual and visual features. We explore the performance bounds of a faceted system by carrying out a simulated user evaluation on TRECVid data sets, and also on the logs of a prior user experiment with the system. We first present a methodology to reduce the dimensionality of features by selecting the most important ones. Then, we discuss the simulated evaluation strategies employed in our evaluation and the effect on the use of both textual and visual features. Facets created by users are simulated by clustering video shots using textual and visual features. The experimental results of our study demonstrate that the faceted browser can potentially improve the search effectiveness
A framework for improving the performance of verification algorithms with a low false positive rate requirement and limited training data
In this paper we address the problem of matching patterns in the so-called
verification setting in which a novel, query pattern is verified against a
single training pattern: the decision sought is whether the two match (i.e.
belong to the same class) or not. Unlike previous work which has universally
focused on the development of more discriminative distance functions between
patterns, here we consider the equally important and pervasive task of
selecting a distance threshold which fits a particular operational requirement
- specifically, the target false positive rate (FPR). First, we argue on
theoretical grounds that a data-driven approach is inherently ill-conditioned
when the desired FPR is low, because by the very nature of the challenge only a
small portion of training data affects or is affected by the desired threshold.
This leads us to propose a general, statistical model-based method instead. Our
approach is based on the interpretation of an inter-pattern distance as
implicitly defining a pattern embedding which approximately distributes
patterns according to an isotropic multi-variate normal distribution in some
space. This interpretation is then used to show that the distribution of
training inter-pattern distances is the non-central chi2 distribution,
differently parameterized for each class. Thus, to make the class-specific
threshold choice we propose a novel analysis-by-synthesis iterative algorithm
which estimates the three free parameters of the model (for each class) using
task-specific constraints. The validity of the premises of our work and the
effectiveness of the proposed method are demonstrated by applying the method to
the task of set-based face verification on a large database of pseudo-random
head motion videos.Comment: IEEE/IAPR International Joint Conference on Biometrics, 201
Recommended from our members
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Image Reconstruction from Bag-of-Visual-Words
The objective of this work is to reconstruct an original image from
Bag-of-Visual-Words (BoVW). Image reconstruction from features can be a means
of identifying the characteristics of features. Additionally, it enables us to
generate novel images via features. Although BoVW is the de facto standard
feature for image recognition and retrieval, successful image reconstruction
from BoVW has not been reported yet. What complicates this task is that BoVW
lacks the spatial information for including visual words. As described in this
paper, to estimate an original arrangement, we propose an evaluation function
that incorporates the naturalness of local adjacency and the global position,
with a method to obtain related parameters using an external image database. To
evaluate the performance of our method, we reconstruct images of objects of 101
kinds. Additionally, we apply our method to analyze object classifiers and to
generate novel images via BoVW
- …