The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting the attention and
investment of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Given this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used
features, methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
MoSculp: Interactive Visualization of Shape and Time
We present a system that allows users to visualize complex human motion via
3D motion sculptures---a representation that conveys the 3D structure swept by
a human body as it moves through space. Given an input video, our system
computes the motion sculptures and provides a user interface for rendering them
in different styles, including the options to insert the sculpture back into
the original video, render it in a synthetic scene or physically print it.
To provide this end-to-end workflow, we introduce an algorithm that estimates
the human's 3D geometry over time from a set of 2D images, and develop a
3D-aware image-based rendering approach that embeds the sculpture back into the
scene. By automating the process, our system takes motion sculpture creation
out of the realm of professional artists, and makes it applicable to a wide
range of existing video material.
By providing viewers with 3D information, motion sculptures reveal space-time
motion information that is difficult to perceive with the naked eye, and allow
viewers to interpret how different parts of the object interact over time. We
validate the effectiveness of this approach with user studies, finding that our
motion sculpture visualizations are significantly more informative about motion
than existing stroboscopic and space-time visualization methods.
Comment: UIST 2018. Project page: http://mosculp.csail.mit.edu
Deep Learning for Semantic Video Understanding
The field of computer vision has long strived to extract understanding from images and video sequences. The recent flood of video data, along with massive increases in computing power, has provided the perfect environment for advanced research on extracting intelligence from video data. Video data is ubiquitous, occurring in numerous everyday activities such as surveillance, traffic, movies, sports, etc. This massive amount of video needs to be analyzed and processed efficiently to extract semantic features towards video understanding. Such capabilities could benefit surveillance, video analytics and visually challenged people. While watching a long video, humans have the uncanny ability to bypass unnecessary information and concentrate on the important events. These key events can be used as a higher-level description or summary of a long video. Inspired by the human visual cortex, this research affords such abilities in computers using neural networks. Useful or interesting events are first extracted from a video, and deep learning methodologies are then used to generate natural language summaries for each video sequence. Previous approaches to video description have either been domain-specific or used a template-based approach, filling slots with detected objects, verbs, or actions to constitute a grammatically correct sentence. This work exploits temporal contextual information for sentence generation while working on wide-domain datasets. Current state-of-the-art video description methodologies are well suited to small video clips, whereas this research can also be applied to long video sequences.
This work proposes methods to generate visual summaries of long videos, and in addition proposes techniques to annotate and generate textual summaries of the videos using recurrent networks. End-to-end video summarization depends heavily on abstractive summarization of video descriptions. State-of-the-art joint neural language and attention models have been used to generate the textual summaries. Interesting segments of a long video are extracted based on image quality as well as cinematographic and consumer preferences. This novel approach is a stepping stone for a variety of innovative applications such as video retrieval, automatic summarization for visually impaired persons, automatic movie review generation, and video question-answering systems.
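The segment-selection step described above picks "interesting" segments partly by image quality. A minimal sketch of such a quality heuristic, using the variance of a discrete Laplacian as a sharpness proxy (a hypothetical stand-in; the actual work combines quality with cinematographic and consumer-preference cues), might look like:

```python
import numpy as np

def sharpness(frame: np.ndarray) -> float:
    """Variance of a discrete Laplacian: a common proxy for image sharpness."""
    lap = (-4.0 * frame
           + np.roll(frame, 1, 0) + np.roll(frame, -1, 0)
           + np.roll(frame, 1, 1) + np.roll(frame, -1, 1))
    return float(lap.var())

def select_segments(frames, top_k=2):
    """Rank equally sized frames by sharpness and keep the top_k indices."""
    scores = [sharpness(f) for f in frames]
    return sorted(np.argsort(scores)[-top_k:].tolist())

# Synthetic frames: low-amplitude noise stands in for blurry content,
# full-amplitude noise for sharp content.
rng = np.random.default_rng(1)
blurry = [rng.normal(size=(32, 32)) * 0.1 for _ in range(3)]
sharp = [rng.normal(size=(32, 32)) for _ in range(2)]
frames = blurry[:2] + sharp[:1] + blurry[2:] + sharp[1:]
print(select_segments(frames, top_k=2))  # → [2, 4], the two sharpest frames
```

In a real pipeline the per-frame score would be blended with the other cues before ranking, but the select-top-k structure stays the same.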
SoccerDB: A Large-Scale Database for Comprehensive Video Understanding
Soccer videos can serve as a perfect research object for video understanding
because soccer games are played under well-defined rules while complex and
intriguing enough for researchers to study. In this paper, we propose a new
soccer video database named SoccerDB, comprising 171,191 video segments from
346 high-quality soccer games. The database contains 702,096 bounding boxes,
37,709 essential event labels with time boundary and 17,115 highlight
annotations for object detection, action recognition, temporal action
localization, and highlight detection tasks. To our knowledge, it is the
largest database for comprehensive sports video understanding on various
aspects. We further survey a collection of strong baselines on SoccerDB, which
have demonstrated state-of-the-art performance on the individual tasks. Our
evaluation suggests significant benefits from jointly modeling the
correlations among these tasks. We believe the release of SoccerDB will
greatly advance research on comprehensive video understanding.
Our dataset and code are published at https://github.com/newsdata/SoccerDB.
Comment: accepted by the MM2020 sports workshop
A survey of video based action recognition in sports
Sport performance analysis, which is crucial in sports practice, is used to improve the performance of athletes during games. Many studies have investigated the detection of different player movements for notational analysis, using either sensor-based or video-based modalities. Recently, the vision-based modality has become a research focus due to the vast growth of online video transmission. Numerous experimental studies have used the vision-based modality in sport, but only a few review studies have been published. Hence, we provide a review of video-based techniques for recognizing sport actions, toward establishing automated notational analysis systems. The paper is organized into four parts. First, we provide an overview of the existing technologies of video-based sports intelligence systems. Second, we review the framework of action recognition in all fields, before we further discuss the implementation of deep learning in the vision-based modality for sport actions. Finally, the paper summarizes future trends and research directions in action recognition for sports using the video approach. We believe this review will be very beneficial in providing a complete overview of video-based action recognition in sports.
Audiovisual processing for sports-video summarisation technology
In this thesis a novel audiovisual feature-based scheme is proposed for the automatic summarisation of sports-video content. The scope of operability of the scheme is designed to encompass the wide variety of sports genres that come under the description "field-sports". Given the assumption that, in terms of conveying the narrative of a field-sports video, score-update events constitute the most significant moments, it is proposed that their detection should thus yield a favourable summarisation solution. To this end, a generic methodology is proposed for the automatic identification of score-update events in field-sports-video content. The scheme is based on the development of robust extractors for a set of critical features, which are shown to reliably indicate their locations. The evidence gathered by the feature extractors is combined and analysed using a Support Vector Machine (SVM), which performs the event detection process. An SVM is chosen on the basis that its underlying technology represents an implementation of the latest generation of machine learning algorithms, based on recent advances in statistical learning. Effectively, an SVM offers a solution to optimising the classification performance of a decision hypothesis inferred from a given set of training data. Via a learning phase that utilises a 90-hour field-sports-video training corpus, the SVM infers a score-update event model by observing patterns in the extracted feature evidence. Using a similar but distinct 90-hour evaluation corpus, the effectiveness of this model is then tested generically across multiple genres of field-sports video including soccer, rugby, field hockey, hurling, and Gaelic football. The results suggest that, in terms of the summarisation task, both high event retrieval and content rejection statistics are achievable.
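The fuse-feature-evidence-then-classify pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn; the feature vectors and labels are synthetic stand-ins, not the thesis's actual audiovisual extractors or score-update annotations:

```python
# Sketch: fusing per-segment feature evidence and classifying
# score-update events with an SVM. The three synthetic features play
# the role of extractor outputs (e.g. crowd-audio energy, scoreboard
# activity, close-up ratio); labels are generated from a toy rule.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))                       # fused feature evidence
y = (X[:, 0] + 0.8 * X[:, 1] > 1.0).astype(int)   # toy "score-update" label

# Train on the first 300 segments, evaluate on the held-out 100,
# mirroring the thesis's separate training and evaluation corpora.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:300], y[:300])
accuracy = clf.score(X[300:], y[300:])
print(f"held-out accuracy: {accuracy:.2f}")
```

The scaling step matters in practice: heterogeneous audiovisual features arrive on very different numeric ranges, and an RBF kernel is sensitive to that.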
An Outlook into the Future of Egocentric Vision
What will the future be? We wonder! In this survey, we explore the gap
between current research in egocentric vision and the ever-anticipated future,
where wearable computing, with outward facing cameras and digital overlays, is
expected to be integrated into our everyday lives. To understand this gap, the
article starts by envisaging the future through character-based stories,
showcasing through examples the limitations of current technology. We then
provide a mapping between this future and previously defined research tasks.
For each task, we survey its seminal works, current state-of-the-art
methodologies and available datasets, then reflect on shortcomings that limit
its applicability to future research. Note that this survey focuses on software
models for egocentric vision, independent of any specific hardware. The paper
concludes with recommendations for areas of immediate explorations so as to
unlock our path to the future always-on, personalised and life-enhancing
egocentric vision.
Comment: We invite comments, suggestions and corrections here:
https://openreview.net/forum?id=V3974SUk1
Towards a unified framework for hand-based methods in First Person Vision
First Person Vision (egocentric) video analysis now stands as one of the emerging fields in computer vision. The availability of wearable devices recording exactly what the user is looking at is ineluctable, and the opportunities and challenges carried by this kind of device are broad. In particular, for the first time a device is intimate enough with the user to record the movements of his hands, making hand-based applications one of the most explored areas in First Person Vision. This paper explores the most popular processing steps for developing hand-based applications, and proposes a hierarchical structure that optimally switches between levels to reduce the computational cost of the system and improve its performance.
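The level-switching idea, running cheap processing levels first and invoking costlier ones only when the earlier levels fire, can be sketched abstractly. The level names, thresholds, and dictionary-based frames below are hypothetical illustrations, not the paper's actual structure:

```python
from typing import Optional

# A hypothetical hand-based processing cascade: each level is costlier
# than the last, so the pipeline bails out as early as possible to
# reduce overall computation, in the spirit of the hierarchical
# switching described above.

def hand_likely(frame) -> bool:        # cheap cue, e.g. skin-colour ratio
    return frame.get("skin_ratio", 0.0) > 0.2

def hand_detected(frame) -> bool:      # mid-cost, e.g. a detector score
    return frame.get("detector_score", 0.0) > 0.5

def recognise_gesture(frame) -> str:   # expensive full recognition
    return frame.get("gesture", "unknown")

def process(frame) -> Optional[str]:
    """Run the cascade, skipping later levels when earlier ones fail."""
    if not hand_likely(frame):
        return None                    # cheapest exit: no hand plausible
    if not hand_detected(frame):
        return None                    # detector vetoed; skip recognition
    return recognise_gesture(frame)

print(process({"skin_ratio": 0.5, "detector_score": 0.9, "gesture": "point"}))  # → point
print(process({"skin_ratio": 0.05}))  # → None, cheap level rejected the frame
```

On typical video, most frames exit at the first level, which is where the computational savings of a hierarchy of this kind come from.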