5,879 research outputs found
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR) i.e. to localize a target
object in a visual scene coming with a language description. Humans perceive
the world more as continued video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing works for OR mostly focus on static images only, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30, 000 objects over 5, 000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previousOR methods. For dataset and code, please refer
https://people.ee.ethz.ch/~arunv/ORGaze.html.Comment: Accepted to CVPR 2018, 10 pages, 6 figure
Analysis of prepositions: near and away from Frames of reference.
XXII Jornades de Foment de la Investigació de la Facultat de Ciències Humanes i Socials (Any 2017)Traditional strategies and procedures to learn a foreign language
include the study of rules of grammar and doing exercises such as
filling the gaps, repetition of words, drills, memorization of irregular
verbs and sentences which may express usual expressions of
everyday life. Even if the array of exercises is adequate, polysemy in
prepositions causes difficulties in choosing the proper preposition
conveying the meaning required by different contexts.
Two prepositions of the horizontal axis (near and away from) are
taken into consideration in this paper. Approaching the problem
from the theory of polysemy and understanding, the use of these
prepositions is explored along the dimensions of function, topology –
which is the study of physical space–, and force dynamics –
introduced in studies such as Navarro (1998)–, as well as the notion
of frame of reference (Levinson, 2004). Then, the different senses
and uses of these prepositions of the horizontal axis are
systematized, explained and examples are used to illustrate the
difficulties in learning a language and the doubts which students may
have in some situations
Facial Feature Tracking and Occlusion Recovery in American Sign Language
Facial features play an important role in expressing grammatical information in signed languages, including American Sign Language(ASL). Gestures such as raising or furrowing the eyebrows are key indicators of constructions such as yes-no questions. Periodic head movements (nods and shakes) are also an essential part of the expression of syntactic information, such as negation (associated with a side-to-side headshake). Therefore, identification of these facial gestures is essential to sign language recognition. One problem with detection of such grammatical indicators is occlusion recovery. If the signer's hand blocks his/her eyebrows during production of a sign, it becomes difficult to track the eyebrows. We have developed a system to detect such grammatical markers in ASL that recovers promptly from occlusion. Our system detects and tracks evolving templates of facial features, which are based on an anthropometric face model, and interprets the geometric relationships of these templates to identify grammatical markers. It was tested on a variety of ASL sentences signed by various Deaf native signers and detected facial gestures used to express grammatical information, such as raised and furrowed eyebrows as well as headshakes.National Science Foundation (IIS-0329009, IIS-0093367, IIS-9912573, EIA-0202067, EIA-9809340
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.Comment: E. Antonakos and P. Snape contributed equally and have joint second
authorshi
- …