Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze
Unsupervised segmentation of action segments in egocentric videos is a
desirable feature in tasks such as activity recognition and content-based video
retrieval. Reducing the search space into a finite set of action segments
facilitates faster and less noisy matching. However, there exists a
substantial gap in machine understanding of natural temporal cuts during a
continuous human activity. This work reports on a novel gaze-based approach for
segmenting action segments in videos captured using an egocentric camera. Gaze
is used to locate the region-of-interest inside a frame. By tracking two simple
motion-based parameters inside successive regions-of-interest, we discover a
finite set of temporal cuts. We present several results using combinations of
the two parameters on the BRISGAZE-ACTIONS dataset, which contains egocentric
videos depicting several daily-living activities. The quality of the temporal
cuts is further improved by implementing two entropy measures.
Comment: To appear in 2017 IEEE International Conference on Signal and Image
Processing Applications
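The abstract's two motion-based parameters tracked inside successive gaze regions-of-interest can be illustrated with a minimal sketch. The parameter names, the z-scored frame-to-frame difference, and the threshold are all illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def detect_temporal_cuts(param_a, param_b, threshold=1.5):
    """Flag frames where either motion parameter measured inside the
    gaze region-of-interest changes sharply between successive frames.

    param_a, param_b: per-frame motion statistics (hypothetical, e.g.
    mean optical-flow magnitude and flow-direction spread in the ROI).
    """
    a = np.asarray(param_a, dtype=float)
    b = np.asarray(param_b, dtype=float)
    # Absolute frame-to-frame differences, z-scored per parameter.
    da, db = np.abs(np.diff(a)), np.abs(np.diff(b))
    za = (da - da.mean()) / (da.std() + 1e-9)
    zb = (db - db.mean()) / (db.std() + 1e-9)
    # A temporal cut is declared when either parameter jumps abruptly.
    cuts = np.where((za > threshold) | (zb > threshold))[0] + 1
    return cuts.tolist()
```

On a signal with a single abrupt change, the detector returns the frame index where the jump occurs; the entropy-based refinement mentioned in the abstract is not modelled here.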
An investigation of visual cues used to create and support frames of reference and visual search tasks in desktop virtual environments
Visual depth cues are combined to produce the essential depth and dimensionality of Desktop Virtual Environments (DVEs). This study discusses DVEs in terms of the visual depth cues that create and support perception of frames of reference and the accomplishment of visual search tasks. This paper presents the results of an investigation that identifies the effects of the experimental stimuli positions and of the visual depth cues luminance, texture, relative height and motion parallax on precise depth judgements made within a DVE. Results indicate that the experimental stimuli positions significantly affect precise depth judgements, that texture is significantly effective only under certain conditions, and that motion parallax, in line with previous results, is inconclusive for determining depth-judgement accuracy in egocentrically viewed DVEs. Results also show that exocentric views, incorporating relative height and motion parallax cues, are effective for precise depth judgements made in DVEs. The results help us to understand the effects of certain visual depth cues in supporting the perception of frames of reference and precise depth judgements, suggesting that the visual depth cues employed to create frames of reference in DVEs may influence how effectively precise depth judgements are undertaken.
Referential precedents in spoken language comprehension: a review and meta-analysis
Listeners’ interpretations of referring expressions are influenced by referential
precedents—temporary conventions established in a discourse that associate linguistic
expressions with referents. A number of psycholinguistic studies have investigated how
much precedent effects depend on beliefs about the speaker’s perspective versus more
egocentric, domain-general processes. We review and provide a meta-analysis of
visual-world eyetracking studies of precedent use, focusing on three principal effects: (1) a
same speaker advantage for maintained precedents; (2) a different speaker advantage for
broken precedents; and (3) an overall main effect of precedents. Despite inconsistent claims
in the literature, our combined analysis reveals surprisingly consistent evidence supporting
the existence of all three effects, but with different temporal profiles. These findings carry
important implications for existing theoretical explanations of precedent use, and challenge
explanations based solely on the use of information about speakers’ perspectives.
EGO-TOPO: Environment Affordances from Egocentric Video
First-person video naturally brings the use of a physical environment to the
forefront, since it shows the camera wearer interacting fluidly in a space
based on their intentions. However, current methods largely separate the observed
actions from the persistent space itself. We introduce a model for environment
affordances that is learned directly from egocentric video. The main idea is to
gain a human-centric model of a physical space (such as a kitchen) that
captures (1) the primary spatial zones of interaction and (2) the likely
activities they support. Our approach decomposes a space into a topological map
derived from first-person activity, organizing an ego-video into a series of
visits to the different zones. Further, we show how to link zones across
multiple related environments (e.g., from videos of multiple kitchens) to
obtain a consolidated representation of environment functionality. On
EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene
affordances and anticipating future actions in long-form video.
Comment: Published in CVPR 2020, project page:
http://vision.cs.utexas.edu/projects/ego-topo
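The idea of decomposing a space into zones and organising video into visits can be caricatured in a few lines. The cosine-similarity zone assignment, the running-mean zone descriptor, and the threshold below are illustrative assumptions, not EGO-TOPO's actual method:

```python
import numpy as np

def build_topological_map(frame_feats, sim_thresh=0.8):
    """Toy sketch: assign each frame's feature vector to an existing
    zone if it is similar enough (cosine similarity), else open a new
    zone; link consecutively visited zones with a graph edge."""
    zones, counts = [], []   # running mean feature and frame count per zone
    visits, edges = [], set()
    for f in frame_feats:
        f = np.asarray(f, dtype=float)
        f = f / (np.linalg.norm(f) + 1e-9)
        best, best_sim = None, sim_thresh
        for i, z in enumerate(zones):
            s = float(f @ (z / (np.linalg.norm(z) + 1e-9)))
            if s > best_sim:
                best, best_sim = i, s
        if best is None:                       # no zone is similar: new zone
            zones.append(f.copy()); counts.append(1)
            best = len(zones) - 1
        else:                                  # update the matched zone's mean
            counts[best] += 1
            zones[best] += (f - zones[best]) / counts[best]
        if visits and visits[-1] != best:      # transition between zones
            edges.add(tuple(sorted((visits[-1], best))))
        visits.append(best)
    return visits, sorted(edges)
```

A sequence that leaves a zone and later returns to it yields a repeated zone id in the visit list and a single edge in the map, which is the "series of visits" structure the abstract describes.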
Challenges for identifying the neural mechanisms that support spatial navigation: the impact of spatial scale.
Spatial navigation is a fascinating behavior that is essential for our everyday lives. It involves nearly all sensory systems, it requires numerous parallel computations, and it engages multiple memory systems. One of the key problems in this field pertains to the question of reference frames: spatial information such as direction or distance can be coded egocentrically (relative to an observer) or allocentrically (in a reference frame independent of the observer). While many studies have associated striatal and parietal circuits with egocentric coding and entorhinal/hippocampal circuits with allocentric coding, this strict dissociation is not in line with a growing body of experimental data. In this review, we discuss some of the problems that can arise when studying the neural mechanisms that are presumed to support different spatial reference frames. We argue that the scale of space in which a navigation task takes place plays a crucial role in determining the processes that are being recruited. This has important implications, particularly for the inferences that can be made from animal studies in small-scale space about the neural mechanisms supporting human spatial navigation in large (environmental) spaces. Furthermore, we argue that many of the commonly used tasks to study spatial navigation and the underlying neuronal mechanisms involve different types of reference frames, which can complicate the interpretation of neurophysiological data.
Graph learning in robotics: a survey
Deep neural networks for graphs have emerged as a powerful tool for learning
on complex non-Euclidean data, which is becoming increasingly common in a
variety of applications. Yet, although their potential has been widely
recognised in the machine learning community, graph learning remains largely
unexplored for downstream tasks such as robotics applications. Hence, to fully
unlock their potential, we propose a review of graph neural architectures from
a robotics perspective. The paper covers the fundamentals of graph-based
models, including their architecture, training procedures, and applications. It
also discusses recent advancements and challenges that arise in applied
settings, related for example to the integration of perception,
decision-making, and control. Finally, the paper provides an extensive review
of various robotic applications that benefit from learning on graph structures,
such as bodies and contacts modelling, robotic manipulation, action
recognition, fleet motion planning, and many more. This survey aims to provide
readers with a thorough understanding of the capabilities and limitations of
graph neural architectures in robotics, and to highlight potential avenues for
future research.
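The fundamental operation the survey's subject matter shares is neighbourhood aggregation. A minimal sketch of one generic message-passing round (mean aggregation with self-loops, then a linear map and ReLU; a generic GNN building block, not any specific architecture from the survey) looks like this:

```python
import numpy as np

def message_passing_layer(adj, feats, weight):
    """One round of message passing: each node averages its neighbours'
    features (including its own via a self-loop), applies a learned
    linear map `weight`, then a ReLU nonlinearity."""
    adj = np.asarray(adj, dtype=float)
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)        # per-node degree
    h = (a_hat / deg) @ np.asarray(feats, dtype=float)  # mean aggregation
    return np.maximum(h @ weight, 0.0)            # linear map + ReLU
```

Stacking such layers lets information propagate over multi-hop graph structure (bodies, contacts, fleets), which is what makes these models attractive for the robotics applications the survey catalogues.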