The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, and user-machine interaction. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interactio
Unsupervised routine discovery in egocentric photo-streams
The routine of a person is defined by the occurrence of activities throughout
different days, and can directly affect the person's health. In this work, we
address the recognition of routine-related days. To do so, we rely on
egocentric images, which are recorded by a wearable camera and allow monitoring
the life of the user from a first-person perspective. We propose an
unsupervised model that identifies routine-related days, following an outlier
detection approach. We test the proposed framework over a total of 72 days in
the form of photo-streams covering around 2 weeks of the life of 5 different
camera wearers. Our model achieves an average of 76% accuracy and 68% weighted
F-score for all the users. Thus, we show that our framework is able to
recognise routine-related days and opens the door to the understanding of the
behaviour of people.
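The outlier-detection formulation described above can be sketched in a few lines. This is an illustrative sketch only: the daily feature descriptor, the synthetic data, and the choice of `IsolationForest` as the detector are assumptions, not the authors' actual model.

```python
# Illustrative sketch: flag non-routine days as outliers among daily
# activity descriptors. Feature design and detector choice are assumed
# for illustration, not taken from the paper.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 14 days x 8-dim daily descriptor (e.g. time spent per activity class)
routine_days = rng.normal(0.5, 0.05, size=(12, 8))
unusual_days = rng.normal(0.9, 0.05, size=(2, 8))
days = np.vstack([routine_days, unusual_days])

detector = IsolationForest(contamination=2 / 14, random_state=0)
labels = detector.fit_predict(days)  # +1 = routine-like day, -1 = outlier day
print(labels)
```

Because the model is unsupervised, no day-level routine labels are needed at training time, which matches the setting the abstract describes.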
Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems
Predicting the future location of vehicles is essential for safety-critical
applications such as advanced driver assistance systems (ADAS) and autonomous
driving. This paper introduces a novel approach to simultaneously predict both
the location and scale of target vehicles in the first-person (egocentric) view
of an ego-vehicle. We present a multi-stream recurrent neural network (RNN)
encoder-decoder model that separately captures both object location and scale
and pixel-level observations for future vehicle localization. We show that
incorporating dense optical flow improves prediction results significantly
since it captures information about motion as well as appearance change. We
also find that explicitly modeling future motion of the ego-vehicle improves
the prediction accuracy, which could be especially beneficial in intelligent
and automated vehicles that have motion planning capability. To evaluate the
performance of our approach, we present a new dataset of first-person videos
collected from a variety of scenarios at road intersections, which are
particularly challenging moments for prediction because vehicle trajectories
are diverse and dynamic.
Comment: To appear at ICRA 201
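The encoder-decoder idea above can be sketched minimally: encode the observed bounding-box trajectory into a hidden state, then roll a decoder forward to predict future boxes. The plain tanh cell, the dimensions, and the single stream are simplifying assumptions; the paper uses multi-stream RNNs over location/scale, optical-flow, and ego-motion inputs.

```python
# Minimal single-stream sketch of an RNN encoder-decoder for future
# bounding-box prediction. All weights are random: this shows the data
# flow, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_H, T_PAST, T_FUT = 4, 16, 10, 5  # box = (cx, cy, w, h)

W_in = rng.normal(0, 0.1, (D_H, D_IN))
W_h = rng.normal(0, 0.1, (D_H, D_H))
W_out = rng.normal(0, 0.1, (D_IN, D_H))

def rnn_step(h, x):
    # Simple tanh recurrence; the paper's cells (e.g. GRUs) are richer.
    return np.tanh(W_in @ x + W_h @ h)

past_boxes = rng.normal(0.5, 0.1, (T_PAST, D_IN))

# Encoder: compress the observed trajectory into a hidden state.
h = np.zeros(D_H)
for x in past_boxes:
    h = rnn_step(h, x)

# Decoder: autoregressively predict future box locations and scales.
pred, x = [], past_boxes[-1]
for _ in range(T_FUT):
    h = rnn_step(h, x)
    x = W_out @ h  # predicted future box
    pred.append(x)
pred = np.array(pred)
print(pred.shape)  # one predicted box per future step
```

In the multi-stream version described in the abstract, separate encoders for optical flow and ego-motion would each produce a hidden state, and the decoder would condition on their concatenation.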
An Outlook into the Future of Egocentric Vision
What will the future be? We wonder! In this survey, we explore the gap
between current research in egocentric vision and the ever-anticipated future,
where wearable computing, with outward facing cameras and digital overlays, is
expected to be integrated in our every day lives. To understand this gap, the
article starts by envisaging the future through character-based stories,
showcasing through examples the limitations of current technology. We then
provide a mapping between this future and previously defined research tasks.
For each task, we survey its seminal works, current state-of-the-art
methodologies and available datasets, then reflect on shortcomings that limit
its applicability to future research. Note that this survey focuses on software
models for egocentric vision, independent of any specific hardware. The paper
concludes with recommendations for areas of immediate explorations so as to
unlock our path to the future always-on, personalised and life-enhancing
egocentric vision.
Comment: We invite comments, suggestions and corrections here:
https://openreview.net/forum?id=V3974SUk1
Analysis of the hands in egocentric vision: A survey
Egocentric vision (a.k.a. first-person vision - FPV) applications have
thrived over the past few years, thanks to the availability of affordable
wearable cameras and large annotated datasets. The position of the wearable
camera (usually mounted on the head) allows recording exactly what the camera
wearers have in front of them, in particular hands and manipulated objects.
This intrinsic advantage enables the study of the hands from multiple
perspectives: localizing hands and their parts within the images; understanding
what actions and activities the hands are involved in; and developing
human-computer interfaces that rely on hand gestures. In this survey, we review
the literature that focuses on the hands using egocentric vision, categorizing
the existing approaches into: localization (where are the hands or parts of
them?); interpretation (what are the hands doing?); and application (e.g.,
systems that use egocentric hand cues to solve a specific problem).
Moreover, a list of the most prominent datasets with hand-based annotations is
provided.
Multitask Learning to Improve Egocentric Action Recognition
In this work we employ multitask learning to capitalize on the structure that
exists in related supervised tasks to train complex neural networks. It allows
training a network for multiple objectives in parallel, in order to improve
performance on at least one of them by capitalizing on a shared representation
that is developed to accommodate more information than it otherwise would for a
single task. We employ this idea to tackle action recognition in egocentric
videos by introducing additional supervised tasks. We consider learning the
verbs and nouns of which action labels consist, and predicting coordinates
that capture the hand locations and the gaze-based visual saliency for all the
frames of the input video segments. This forces the network to explicitly focus
on cues from secondary tasks that it might otherwise have missed, resulting in
improved inference. Our experiments on EPIC-Kitchens and EGTEA Gaze+ show
consistent improvements when training with multiple tasks over the single-task
baseline. Furthermore, in EGTEA Gaze+ we outperform the state-of-the-art in
action recognition by 3.84%. Apart from actions, our method produces accurate
hand and gaze estimations as side tasks, without requiring any additional input
at test time other than the RGB video clips.
Comment: 10 pages, 3 figures, accepted at the 5th Egocentric Perception,
Interaction and Computing (EPIC) workshop at ICCV 2019, code repository:
https://github.com/georkap/hand_track_classificatio
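The multitask objective the abstract describes can be sketched as a shared representation feeding separate heads whose losses are summed. The head sizes, the linear heads, and the equal loss weighting are assumptions made for illustration; the paper's network and loss balancing differ.

```python
# Sketch of a multitask objective: one shared feature vector feeds a
# verb-classification head, a noun-classification head, and a
# hand-coordinate regression head; the per-task losses are summed.
import numpy as np

rng = np.random.default_rng(0)
D, N_VERBS, N_NOUNS = 32, 10, 20

shared = rng.normal(size=D)                 # shared video representation
W_verb = rng.normal(0, 0.1, (N_VERBS, D))   # verb head
W_noun = rng.normal(0, 0.1, (N_NOUNS, D))   # noun head
W_hand = rng.normal(0, 0.1, (4, D))         # (x, y) for left and right hand

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, target):
    return -np.log(softmax(logits)[target])

verb_t, noun_t = 3, 7                       # ground-truth class indices
hand_t = rng.uniform(0, 1, 4)               # ground-truth hand coordinates

loss = (cross_entropy(W_verb @ shared, verb_t)       # verb classification
        + cross_entropy(W_noun @ shared, noun_t)     # noun classification
        + np.mean((W_hand @ shared - hand_t) ** 2))  # hand regression (MSE)
print(loss)
```

At test time only the action heads are needed, which is consistent with the abstract's note that hand and gaze estimates come for free without extra inputs.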
Understanding First-Person and Third-Person Videos in Computer Vision
Due to advancements in technology and social media, a large amount of visual information is created. There is a lot of interesting research in Computer Vision that considers visual information generated by either first-person (egocentric) or third-person (exocentric) cameras. Video data generated by YouTubers, surveillance cameras, and drones is referred to as third-person or exocentric video data, whereas first-person or egocentric data is generated by devices such as GoPro cameras and Google Glass. The exocentric view captures wide, global views, whereas the egocentric view captures the activities an actor is involved in with respect to objects. These two perspectives seem independent yet related. In Computer Vision, they have been studied independently across various domains such as Activity Recognition, Object Detection, Action Recognition, and Summarization, but their relationship and comparison are less discussed in the literature. This paper tries to bridge this gap by presenting a systematic study of first-person and third-person videos. Further, we implemented an algorithm to classify videos as first-person/third-person with a validation accuracy of 88.4% and an F1-score of 86.10% on the Charades dataset.
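The binary first-person vs. third-person classification task can be sketched with a toy linear model over per-clip motion statistics. The features (mean and variance of global flow magnitude), the synthetic data, and the logistic-regression classifier are assumptions for illustration; the cited study trains on real Charades clips and reports 88.4% validation accuracy.

```python
# Toy sketch of first-person vs third-person video classification
# using synthetic per-clip motion statistics. Egocentric clips tend to
# show strong global motion from head movement; exocentric clips are
# often shot from a mostly static camera.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# columns: (mean flow magnitude, flow-magnitude variance), assumed features
ego = np.column_stack([rng.normal(3.0, 0.5, 50), rng.normal(2.0, 0.4, 50)])
exo = np.column_stack([rng.normal(0.8, 0.3, 50), rng.normal(0.5, 0.2, 50)])
X = np.vstack([ego, exo])
y = np.array([1] * 50 + [0] * 50)  # 1 = first-person, 0 = third-person

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```

On these well-separated synthetic clusters the linear model separates the classes easily; real footage is far harder, which is why the study uses learned features rather than hand-picked motion statistics.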