2,809 research outputs found

    Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze

    Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments facilitates faster and less noisy matching. However, there exists a substantial gap in machine understanding of natural temporal cuts during a continuous human activity. This work reports a novel gaze-based approach for segmenting action segments in videos captured with an egocentric camera. Gaze is used to locate the region-of-interest inside a frame. By tracking two simple motion-based parameters inside successive regions-of-interest, we discover a finite set of temporal cuts. We present results for several combinations of the two parameters on the BRISGAZE-ACTIONS dataset, which contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures.

    Comment: To appear in 2017 IEEE International Conference on Signal and Image Processing Applications
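
    The abstract leaves the two motion-based parameters unspecified, so the Python sketch below is purely illustrative: it assumes one parameter is the mean optical-flow magnitude inside the gaze region-of-interest and the other is the frame-to-frame displacement of the gaze point, and it flags a candidate temporal cut whenever either exceeds a threshold. All names and thresholds are hypothetical.

```python
import numpy as np
import cv2

def temporal_cuts(frames, gaze, half=64, flow_thresh=2.0, shift_thresh=30.0):
    """frames: list of grayscale uint8 images; gaze: one (x, y) per frame.
    Returns frame indices that look like temporal cuts."""
    cuts, prev_patch, prev_center = [], None, None
    for t, (img, (x, y)) in enumerate(zip(frames, gaze)):
        # Crop a square region-of-interest around the gaze point.
        x0, y0 = max(0, int(x) - half), max(0, int(y) - half)
        patch = img[y0:y0 + 2 * half, x0:x0 + 2 * half]
        center = np.array([x, y], dtype=float)
        if prev_patch is not None and prev_patch.shape == patch.shape:
            # Parameter 1 (assumed): mean optical-flow magnitude in the ROI.
            flow = cv2.calcOpticalFlowFarneback(
                prev_patch, patch, None, 0.5, 3, 15, 3, 5, 1.2, 0)
            flow_mag = np.linalg.norm(flow, axis=2).mean()
            # Parameter 2 (assumed): displacement of the gaze point.
            shift = np.linalg.norm(center - prev_center)
            if flow_mag > flow_thresh or shift > shift_thresh:
                cuts.append(t)  # candidate temporal cut
        prev_patch, prev_center = patch, center
    return cuts
```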

    Referential precedents in spoken language comprehension: a review and meta-analysis

    Listeners’ interpretations of referring expressions are influenced by referential precedents: temporary conventions, established in a discourse, that associate linguistic expressions with referents. A number of psycholinguistic studies have investigated the extent to which precedent effects depend on beliefs about the speaker’s perspective versus more egocentric, domain-general processes. We review and provide a meta-analysis of visual-world eyetracking studies of precedent use, focusing on three principal effects: (1) a same-speaker advantage for maintained precedents; (2) a different-speaker advantage for broken precedents; and (3) an overall main effect of precedents. Despite inconsistent claims in the literature, our combined analysis reveals surprisingly consistent evidence for all three effects, though with different temporal profiles. These findings carry important implications for existing theoretical accounts of precedent use, and challenge explanations based solely on the use of information about speakers’ perspectives.
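
    As a minimal illustration of how per-study estimates of, say, the same-speaker advantage can be combined into one overall effect, the following sketch performs a generic inverse-variance (fixed-effect) pooling; the review's actual meta-analysis is more elaborate, and the numbers in the example are invented.

```python
import numpy as np
from scipy import stats

def pool_effects(effects, ses):
    """effects: per-study effect sizes; ses: their standard errors."""
    w = 1.0 / np.asarray(ses) ** 2            # inverse-variance weights
    pooled = np.sum(w * np.asarray(effects)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    z = pooled / se
    p = 2 * stats.norm.sf(abs(z))             # two-sided p-value
    return pooled, se, p

# Hypothetical same-speaker advantage estimates from three studies:
print(pool_effects([0.30, 0.22, 0.41], [0.10, 0.12, 0.15]))
```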

    EGO-TOPO: Environment Affordances from Egocentric Video

    First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on their intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video.

    Comment: Published in CVPR 2020, project page: http://vision.cs.utexas.edu/projects/ego-topo
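
    In the spirit of the approach (not the authors' actual model), the sketch below clusters per-frame visual features into zones and links zones that are visited consecutively, yielding a topological map of the environment; the feature extractor and the k-means zone criterion are stand-in assumptions.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans

def build_topo_map(frame_features, n_zones=8):
    """frame_features: (T, D) array of per-frame visual features.
    Returns a zone graph and the per-frame zone assignment."""
    zones = KMeans(n_clusters=n_zones, n_init=10).fit_predict(frame_features)
    G = nx.Graph()
    G.add_nodes_from(range(n_zones))
    for t in range(1, len(zones)):
        if zones[t] != zones[t - 1]:
            # An edge records that the wearer moved between two zones.
            G.add_edge(int(zones[t - 1]), int(zones[t]))
    return G, zones
```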

    Challenges for identifying the neural mechanisms that support spatial navigation: the impact of spatial scale.

    Spatial navigation is a fascinating behavior that is essential for our everyday lives. It involves nearly all sensory systems, requires numerous parallel computations, and engages multiple memory systems. One of the key problems in this field concerns reference frames: spatial information such as direction or distance can be coded egocentrically (relative to the observer) or allocentrically (in a reference frame independent of the observer). While many studies have associated striatal and parietal circuits with egocentric coding and entorhinal/hippocampal circuits with allocentric coding, this strict dissociation is not in line with a growing body of experimental data. In this review, we discuss some of the problems that can arise when studying the neural mechanisms presumed to support different spatial reference frames. We argue that the scale of the space in which a navigation task takes place plays a crucial role in determining which processes are recruited. This has important implications, particularly for the inferences that can be drawn from animal studies in small-scale space about the neural mechanisms supporting human spatial navigation in large (environmental) spaces. Furthermore, we argue that many of the tasks commonly used to study spatial navigation and its underlying neuronal mechanisms involve different types of reference frames, which can complicate the interpretation of neurophysiological data.
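
    To make the egocentric/allocentric distinction concrete, here is a small worked example: an observation given in observer-centred terms (bearing and distance) is mapped into world coordinates using the observer's position and heading. This is a geometric illustration only, not a model of the neural computations discussed above.

```python
import numpy as np

def ego_to_allo(obs_pos, obs_heading, bearing, distance):
    """obs_pos: (x, y) of the observer; angles in radians.
    bearing is the direction to the target relative to the heading."""
    world_angle = obs_heading + bearing
    offset = distance * np.array([np.cos(world_angle), np.sin(world_angle)])
    return np.asarray(obs_pos) + offset

# A landmark 2 m away, 90 degrees to the left of an observer at (1, 1)
# facing east (heading 0), lies at (1, 3) in allocentric coordinates.
print(ego_to_allo((1.0, 1.0), 0.0, np.pi / 2, 2.0))
```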

    Graph learning in robotics: a survey

    Deep neural networks for graphs have emerged as a powerful tool for learning on complex non-Euclidean data, which is becoming increasingly common across a variety of applications. Yet, although their potential has been widely recognised in the machine learning community, graph learning remains largely unexplored in downstream settings such as robotics. Hence, to fully unlock their potential, we present a review of graph neural architectures from a robotics perspective. The paper covers the fundamentals of graph-based models, including their architecture, training procedures, and applications. It also discusses recent advancements and challenges that arise in applied settings, related for example to the integration of perception, decision-making, and control. Finally, the paper provides an extensive review of robotic applications that benefit from learning on graph structures, such as body and contact modelling, robotic manipulation, action recognition, fleet motion planning, and many more. This survey aims to provide readers with a thorough understanding of the capabilities and limitations of graph neural architectures in robotics, and to highlight potential avenues for future research.
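
    As a reminder of the fundamentals such surveys cover, the sketch below runs one round of neighbourhood message passing over a toy graph in plain NumPy; mean aggregation with a ReLU update is one common choice among many, and all shapes and names are illustrative.

```python
import numpy as np

def gnn_layer(H, A, W_self, W_neigh):
    """H: (N, D) node features; A: (N, N) binary adjacency matrix.
    Returns updated node features after one message-passing round."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)    # avoid divide-by-zero
    msg = (A @ H) / deg                               # mean over neighbours
    return np.maximum(H @ W_self + msg @ W_neigh, 0)  # ReLU update

# Toy path graph with 3 nodes and 4-dimensional features.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(gnn_layer(H, A, rng.normal(size=(4, 4)), rng.normal(size=(4, 4))).shape)
```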