Constructing Category Hierarchies for Visual Recognition
Class hierarchies are commonly used to reduce the complexity of the classification problem. This is crucial in situations where one has to deal with many categories. In this work, we evaluate the suitability of class hierarchies currently constructed for visual recognition. We show that top-down as well as bottom-up approaches, which are commonly used to construct hierarchies automatically, incorporate assumptions about the separability of classes that cannot be fulfilled in the case of visual recognition of a large number of object categories. We propose a modification which is appropriate for most top-down approaches. It allows the construction of better class hierarchies that postpone decisions in the presence of uncertainty and thus provide higher recognition accuracy. We also compare our method to the flat one-against-all approach and show how to control the speed-for-accuracy trade-off by using our method. For the experimental evaluation, we use the Caltech-256 visual object classes dataset and compare to the state-of-the-art.
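One way to picture "postponing decisions in the presence of uncertainty" is a relaxed top-down split in which a class that the node classifier cannot separate cleanly is kept in both children, so the decision about it is deferred to a deeper node. The sketch below is only an illustration under that assumption; the function name `relaxed_split`, the `margin` parameter, and the toy scores are hypothetical and are not taken from the paper.

```python
def relaxed_split(class_scores, margin):
    """Split classes between two children of a hierarchy node.

    class_scores maps a class name to the fraction of its samples that a
    (hypothetical) binary node classifier sends to the right child.
    A class whose fraction lies within `margin` of 0.5 is ambiguous and is
    placed in BOTH children, which postpones the decision about it.
    """
    left, right = set(), set()
    for name, frac_right in class_scores.items():
        if frac_right <= 0.5 + margin:
            left.add(name)   # mostly left, or too uncertain to exclude
        if frac_right >= 0.5 - margin:
            right.add(name)  # mostly right, or too uncertain to exclude
    return left, right


# Toy example: "dog" is split almost evenly by the node classifier.
scores = {"cat": 0.05, "dog": 0.45, "car": 0.95}
left, right = relaxed_split(scores, margin=0.1)
hard_left, hard_right = relaxed_split(scores, margin=0.0)
```

With `margin=0.0` every class is forced into exactly one child (a hard split); a larger margin keeps more classes on both sides, which trades evaluation speed for accuracy.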
Unsupervised Spoken Term Detection with Spoken Queries by Multi-level Acoustic Patterns with Varying Model Granularity
This paper presents a new approach for unsupervised Spoken Term Detection
with spoken queries using multiple sets of acoustic patterns automatically
discovered from the target corpus. The different pattern HMM
configurations (number of states per model, number of distinct models, number of
Gaussians per state) form a three-dimensional model granularity space. Different
sets of acoustic patterns automatically discovered on different points properly
distributed over this three-dimensional space are complementary to one another,
and thus can jointly capture the characteristics of the spoken terms. By
representing the spoken content and spoken query as sequences of acoustic
patterns, a series of approaches for matching the pattern index sequences while
considering the signal variations are developed. In this way, not only the
on-line computation load can be reduced, but the signal distributions caused by
different speakers and acoustic conditions can be reasonably taken care of. The
results indicate that this approach significantly outperformed the unsupervised
feature-based DTW baseline by 16.16% in mean average precision on the TIMIT
corpus.
Comment: Accepted by ICASSP 201
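For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of generic dynamic time warping (DTW) between two sequences of feature vectors. It is the textbook recurrence with Euclidean frame distance, not the specific baseline configuration used in the paper; the function name `dtw_distance` and the toy sequences are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each element of `a` and `b` is a list of floats of the same length.
    Uses the standard recurrence over insertion, deletion, and match steps.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i of a and frame j of b.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


query = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same shape, spoken slower
shifted = [[1.0], [2.0], [3.0]]                  # different content

same_cost = dtw_distance(query, stretched)
diff_cost = dtw_distance(query, shifted)
```

The time-stretched copy aligns at zero cost while the shifted sequence does not, which is the property that makes DTW usable for matching a spoken query against spoken content of varying speed.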
Indexing ensembles of exemplar-SVMs with rejecting taxonomies
Ensembles of Exemplar-SVMs have been used for a wide variety of tasks, such as object detection, segmentation, label transfer and mid-level feature learning. To make this technique effective, however, a large collection of classifiers is needed, which often makes the evaluation phase prohibitively expensive. To overcome this issue we exploit the joint distribution of exemplar classifier scores to build a taxonomy capable of indexing each Exemplar-SVM and enabling a fast evaluation of the whole ensemble. We experiment with the Pascal 2007 benchmark on the task of object detection and on a simple segmentation task, in order to verify the robustness of our indexing data structure with reference to the standard ensemble. We also introduce a rejection strategy that discards irrelevant image patches for more efficient access to the data.
Hierarchical Cascade of Classifiers for Efficient Poselet Evaluation
Poselets have been used in a variety of computer vision tasks, such as detection, segmentation, action classification, pose estimation and action recognition, often achieving state-of-the-art performance. Poselet evaluation, however, is computationally intensive as it involves running thousands of scanning window classifiers. We present an algorithm for training a hierarchical cascade of part-based detectors and apply it to speed up poselet evaluation. Our cascade hierarchy leverages common components shared across poselets. We generate a family of cascade hierarchies, including trees that grow logarithmically in the number of poselet classifiers. Our algorithm, under some reasonable assumptions, finds the optimal tree structure that maximizes speed for a given target detection rate. We test our system on the PASCAL dataset and show an order of magnitude speedup at less than 1% loss in AP.
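The speedup described in this abstract comes from sharing early tests across many classifiers. A toy sketch of that general idea follows: classifiers sit at the leaves of a tree, internal nodes are cheap shared tests, and a rejection at any node prunes its whole subtree. The node layout, feature names, and thresholds are all hypothetical and only illustrate tree-structured early rejection, not the training algorithm from the paper.

```python
def make_node(name, feature, threshold, children=()):
    """Build a cascade node that thresholds one (hypothetical) window feature."""
    return {"name": name, "feature": feature,
            "threshold": threshold, "children": list(children)}


def evaluate(node, window, hits, evaluated):
    """Depth-first evaluation; a rejected node prunes its entire subtree."""
    evaluated.append(node["name"])
    if window[node["feature"]] < node["threshold"]:
        return  # early rejection: no descendant is ever evaluated
    if not node["children"]:
        hits.append(node["name"])  # a leaf classifier fired
    for child in node["children"]:
        evaluate(child, window, hits, evaluated)


tree = make_node("root", "person", 0.2, [
    make_node("torso_group", "torso", 0.5, [
        make_node("poselet_A", "a", 0.5),
        make_node("poselet_B", "b", 0.5),
    ]),
    make_node("leg_group", "legs", 0.5, [
        make_node("poselet_C", "c", 0.5),
        make_node("poselet_D", "d", 0.5),
    ]),
])

# One scanning window, described by toy feature responses.
window = {"person": 0.9, "torso": 0.8, "a": 0.7, "b": 0.1,
          "legs": 0.2, "c": 0.9, "d": 0.9}
hits, evaluated = [], []
evaluate(tree, window, hits, evaluated)
```

In this toy run the `leg_group` test rejects the window, so `poselet_C` and `poselet_D` are never evaluated even though `poselet_C` would have fired. That is the kind of small accuracy loss traded for speed, and it is why the thresholds have to be chosen against a target detection rate.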