22,181 research outputs found
Linguistically-driven framework for computationally efficient and scalable sign recognition
We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL)
Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories
Attribute-based recognition models, due to their impressive performance and
their ability to generalize well on novel categories, have been widely adopted
for many computer vision applications. However, usually both the attribute
vocabulary and the class-attribute associations have to be provided manually by
domain experts or large number of annotators. This is very costly and not
necessarily optimal regarding recognition performance, and most importantly, it
limits the applicability of attribute-based models to large scale data sets. To
tackle this problem, we propose an end-to-end unsupervised attribute learning
approach. We utilize online text corpora to automatically discover a salient
and discriminative vocabulary that correlates well with the human concept of
semantic attributes. Moreover, we propose a deep convolutional model to
optimize class-attribute associations with a linguistic prior that accounts for
noise and missing data in text. In a thorough evaluation on ImageNet, we
demonstrate that our model is able to efficiently discover and learn semantic
attributes at a large scale. Furthermore, we demonstrate that our model
outperforms the state-of-the-art in zero-shot learning on three data sets:
ImageNet, Animals with Attributes and aPascal/aYahoo. Finally, we enable
attribute-based learning on ImageNet and will share the attributes and
associations for future research.Comment: Accepted as a conference paper at CVPR 201
A Machine learning approach to POS tagging
We have applied inductive learning of statistical decision trees
and relaxation labelling to the Natural Language Processing (NLP)
task of morphosyntactic disambiguation (Part Of Speech Tagging).
The learning process is supervised and obtains a language
model oriented to resolve POS ambiguities. This model consists
of a set of statistical decision trees expressing distribution of
tags and words in some relevant contexts.
The acquired language models are complete enough to be directly
used as sets of POS disambiguation rules, and include more complex
contextual information than simple collections of n-grams usually
used in statistical taggers.
We have implemented a quite simple and fast tagger that has been
tested and evaluated on the Wall Street Journal (WSJ) corpus with
a remarkable accuracy.
However, better results can be obtained by translating the trees
into rules to feed a flexible relaxation labelling based tagger.
In this direction we describe a tagger which is able to use
information of any kind (n-grams, automatically acquired constraints,
linguistically motivated manually written constraints, etc.), and in
particular to incorporate the machine learned decision trees.
Simultaneously, we address the problem of tagging when only
small training material is available, which is crucial in any process
of constructing, from scratch, an annotated corpus. We show that quite
high accuracy can be achieved with our system in this situation.Postprint (published version
- …