Search CORE

20,167 research outputs found

Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression

Author: McCallum Andrew
Mimno David
Publication venue
Publication date: 13/06/2012
Field of study

Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

Simulated evaluation of faceted browsing based on feature selection

Author: Bernejo Lopez P.
Hopfgartner F.
Jose J.M.
Urruty T.
Villa R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

In this paper we explore the limitations of facet based browsing which uses sub-needs of an information need for querying and organising the search process in video retrieval. The underlying assumption of this approach is that the search effectiveness will be enhanced if such an approach is employed for interactive video retrieval using textual and visual features. We explore the performance bounds of a faceted system by carrying out a simulated user evaluation on TRECVid data sets, and also on the logs of a prior user experiment with the system. We first present a methodology to reduce the dimensionality of features by selecting the most important ones. Then, we discuss the simulated evaluation strategies employed in our evaluation and the effect on the use of both textual and visual features. Facets created by users are simulated by clustering video shots using textual and visual features. The experimental results of our study demonstrate that the faceted browser can potentially improve the search effectiveness

Enlighten

A framework for improving the performance of verification algorithms with a low false positive rate requirement and limited training data

Author: Arandjelovic Ognjen
Publication venue
Publication date: 01/01/2014
Field of study

In this paper we address the problem of matching patterns in the so-called verification setting in which a novel, query pattern is verified against a single training pattern: the decision sought is whether the two match (i.e. belong to the same class) or not. Unlike previous work which has universally focused on the development of more discriminative distance functions between patterns, here we consider the equally important and pervasive task of selecting a distance threshold which fits a particular operational requirement - specifically, the target false positive rate (FPR). First, we argue on theoretical grounds that a data-driven approach is inherently ill-conditioned when the desired FPR is low, because by the very nature of the challenge only a small portion of training data affects or is affected by the desired threshold. This leads us to propose a general, statistical model-based method instead. Our approach is based on the interpretation of an inter-pattern distance as implicitly defining a pattern embedding which approximately distributes patterns according to an isotropic multi-variate normal distribution in some space. This interpretation is then used to show that the distribution of training inter-pattern distances is the non-central chi2 distribution, differently parameterized for each class. Thus, to make the class-specific threshold choice we propose a novel analysis-by-synthesis iterative algorithm which estimates the three free parameters of the model (for each class) using task-specific constraints. The validity of the premises of our work and the effectiveness of the proposed method are demonstrated by applying the method to the task of set-based face verification on a large database of pseudo-random head motion videos.Comment: IEEE/IAPR International Joint Conference on Biometrics, 201

arXiv.org e-Print Archive

Deakin Research Online

Crossref

University of St. Andrews - Pure

Recommended from our members

A niching memetic algorithm for simultaneous clustering and feature selection

Author: Fairhurst M
Liu X
Sheng W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2008
Field of study

Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data

Brunel University Research Archive

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013
Field of study

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

arXiv.org e-Print Archive

Crossref

Access to Research at National University of Ireland, Galway

Image Reconstruction from Bag-of-Visual-Words

Author: Harada Tatsuya
Kato Hiroharu
Publication venue
Publication date: 19/05/2015
Field of study

The objective of this work is to reconstruct an original image from Bag-of-Visual-Words (BoVW). Image reconstruction from features can be a means of identifying the characteristics of features. Additionally, it enables us to generate novel images via features. Although BoVW is the de facto standard feature for image recognition and retrieval, successful image reconstruction from BoVW has not been reported yet. What complicates this task is that BoVW lacks the spatial information for including visual words. As described in this paper, to estimate an original arrangement, we propose an evaluation function that incorporates the naturalness of local adjacency and the global position, with a method to obtain related parameters using an external image database. To evaluate the performance of our method, we reconstruct images of objects of 101 kinds. Additionally, we apply our method to analyze object classifiers and to generate novel images via BoVW

arXiv.org e-Print Archive

Crossref