30,673 research outputs found
Interactive retrieval of video using pre-computed shot-shot similarities
A probabilistic framework for content-based interactive video retrieval is described. The developed indexing of video fragments originates from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of the probabilities are obtained from low-level feature representation. Only statistically significant estimates are picked out, the rest are replaced by an appropriate constant allowing efficient access at search time without loss of search quality and leading to improvement in most experiments. With time, these probability estimates are updated from the relevance judgment of users performing searches, resulting in further substantial increases in mean average precision
Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach
Feature selection is playing an increasingly significant role with respect to
many computer vision applications spanning from object recognition to visual
object tracking. However, most of the recent solutions in feature selection are
not robust across different and heterogeneous set of data. In this paper, we
address this issue proposing a robust probabilistic latent graph-based feature
selection algorithm that performs the ranking step while considering all the
possible subsets of features, as paths on a graph, bypassing the combinatorial
problem analytically. An appealing characteristic of the approach is that it
aims to discover an abstraction behind low-level sensory data, that is,
relevancy. Relevancy is modelled as a latent variable in a PLSA-inspired
generative process that allows the investigation of the importance of a feature
when injected into an arbitrary set of cues. The proposed method has been
tested on ten diverse benchmarks, and compared against eleven state of the art
feature selection methods. Results show that the proposed approach attains the
highest performance levels across many different scenarios and difficulties,
thereby confirming its strong robustness while setting a new state of the art
in feature selection domain.Comment: Accepted at the IEEE International Conference on Computer Vision
(ICCV), 2017, Venice. Preprint cop
Probabilistic Search for Object Segmentation and Recognition
The problem of searching for a model-based scene interpretation is analyzed within a probabilistic framework. Object models are formulated as generative models for range data of the scene. A new statistical criterion, the truncated object probability, is introduced to infer an optimal sequence of object hypotheses to be evaluated for their match to the data. The truncated probability is partly determined by prior knowledge of the objects and partly learned from data. Some experiments on sequence quality and object segmentation and recognition from stereo data are presented. The article recovers classic concepts from object recognition (grouping, geometric hashing, alignment) from the probabilistic perspective and adds insight into the optimal ordering of object hypotheses for evaluation. Moreover, it introduces point-relation densities, a key component of the truncated probability, as statistical models of local surface shape
Probabilistic Search for Object Segmentation and Recognition
The problem of searching for a model-based scene interpretation is analyzed
within a probabilistic framework. Object models are formulated as generative
models for range data of the scene. A new statistical criterion, the truncated
object probability, is introduced to infer an optimal sequence of object
hypotheses to be evaluated for their match to the data. The truncated
probability is partly determined by prior knowledge of the objects and partly
learned from data. Some experiments on sequence quality and object segmentation
and recognition from stereo data are presented. The article recovers classic
concepts from object recognition (grouping, geometric hashing, alignment) from
the probabilistic perspective and adds insight into the optimal ordering of
object hypotheses for evaluation. Moreover, it introduces point-relation
densities, a key component of the truncated probability, as statistical models
of local surface shape.Comment: 18 pages, 5 figure
Measuring concept similarities in multimedia ontologies: analysis and evaluations
The recent development of large-scale multimedia concept ontologies has provided a new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been extensively studied, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving the visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many different application areas in semantic multimedia processing
Using Variable Dwell Time to Accelerate Gaze-Based Web Browsing with Two-Step Selection
In order to avoid the "Midas Touch" problem, gaze-based interfaces for
selection often introduce a dwell time: a fixed amount of time the user must
fixate upon an object before it is selected. Past interfaces have used a
uniform dwell time across all objects. Here, we propose a gaze-based browser
using a two-step selection policy with variable dwell time. In the first step,
a command, e.g. "back" or "select", is chosen from a menu using a dwell time
that is constant across the different commands. In the second step, if the
"select" command is chosen, the user selects a hyperlink using a dwell time
that varies between different hyperlinks. We assign shorter dwell times to more
likely hyperlinks and longer dwell times to less likely hyperlinks. In order to
infer the likelihood each hyperlink will be selected, we have developed a
probabilistic model of natural gaze behavior while surfing the web. We have
evaluated a number of heuristic and probabilistic methods for varying the dwell
times using both simulation and experiment. Our results demonstrate that
varying dwell time improves the user experience in comparison with fixed dwell
time, resulting in fewer errors and increased speed. While all of the methods
for varying dwell time resulted in improved performance, the probabilistic
models yielded much greater gains than the simple heuristics. The best
performing model reduces error rate by 50% compared to 100ms uniform dwell time
while maintaining a similar response time. It reduces response time by 60%
compared to 300ms uniform dwell time while maintaining a similar error rate.Comment: This is an Accepted Manuscript of an article published by Taylor &
Francis in the International Journal of Human-Computer Interaction on 30
March, 2018, available online:
http://www.tandfonline.com/10.1080/10447318.2018.1452351 . For an eprint of
the final published article, please access:
https://www.tandfonline.com/eprint/T9d4cNwwRUqXPPiZYm8Z/ful
Similarity-Based Models of Word Cooccurrence Probabilities
In many applications of natural language processing (NLP) it is necessary to
determine the likelihood of a given word combination. For example, a speech
recognizer may need to determine which of the two word combinations ``eat a
peach'' and ``eat a beach'' is more likely. Statistical NLP methods determine
the likelihood of a word combination from its frequency in a training corpus.
However, the nature of language is such that many word combinations are
infrequent and do not occur in any given corpus. In this work we propose a
method for estimating the probability of such previously unseen word
combinations using available information on ``most similar'' words.
We describe probabilistic word association models based on distributional
word similarity, and apply them to two tasks, language modeling and pseudo-word
disambiguation. In the language modeling task, a similarity-based model is used
to improve probability estimates for unseen bigrams in a back-off language
model. The similarity-based method yields a 20% perplexity improvement in the
prediction of unseen bigrams and statistically significant reductions in
speech-recognition error.
We also compare four similarity-based estimation methods against back-off and
maximum-likelihood estimation methods on a pseudo-word sense disambiguation
task in which we controlled for both unigram and bigram frequency to avoid
giving too much weight to easy-to-disambiguate high-frequency configurations.
The similarity-based methods perform up to 40% better on this particular task.Comment: 26 pages, 5 figure
Understanding of Object Manipulation Actions Using Human Multi-Modal Sensory Data
Object manipulation actions represent an important share of the Activities of
Daily Living (ADLs). In this work, we study how to enable service robots to use
human multi-modal data to understand object manipulation actions, and how they
can recognize such actions when humans perform them during human-robot
collaboration tasks. The multi-modal data in this study consists of videos,
hand motion data, applied forces as represented by the pressure patterns on the
hand, and measurements of the bending of the fingers, collected as human
subjects performed manipulation actions. We investigate two different
approaches. In the first one, we show that multi-modal signal (motion, finger
bending and hand pressure) generated by the action can be decomposed into a set
of primitives that can be seen as its building blocks. These primitives are
used to define 24 multi-modal primitive features. The primitive features can in
turn be used as an abstract representation of the multi-modal signal and
employed for action recognition. In the latter approach, the visual features
are extracted from the data using a pre-trained image classification deep
convolutional neural network. The visual features are subsequently used to
train the classifier. We also investigate whether adding data from other
modalities produces a statistically significant improvement in the classifier
performance. We show that both approaches produce a comparable performance.
This implies that image-based methods can successfully recognize human actions
during human-robot collaboration. On the other hand, in order to provide
training data for the robot so it can learn how to perform object manipulation
actions, multi-modal data provides a better alternative
Local wavelet features for statistical object classification and localisation
This article presents a system for texture-based
probabilistic classification and localisation of 3D objects in 2D digital images and discusses selected applications. The objects are described by local feature vectors computed using the wavelet transform. In the training phase, object features are statistically modelled as normal density functions. In the recognition phase, a maximisation algorithm compares the learned density functions
with the feature vectors extracted from a real scene and yields the classes and poses of objects found in it. Experiments carried out on a real dataset of over 40000 images demonstrate the robustness of the system in terms of classification and localisation accuracy. Finally, two important application scenarios are discussed, namely classification of museum artefacts and classification of
metallography images
- …