Fast Fight Detection
Action recognition has become a hot topic within computer vision. However, the action recognition community has focused mainly on relatively simple actions like clapping, walking, jogging, etc. The detection of specific events with direct practical use, such as fights or aggressive behavior in general, has been comparatively less studied. Such capability may be extremely useful in some video surveillance scenarios like prisons, psychiatric centers, or even embedded in camera phones. As a consequence, there is growing interest in developing violence detection algorithms. Recent work considered the well-known Bag-of-Words framework for the specific problem of fight detection. Under this framework, spatio-temporal features are extracted from the video sequences and used for classification. Despite encouraging results, in which high accuracy rates were achieved, the computational cost of extracting such features is prohibitive for practical applications. This work proposes a novel method to detect violent sequences. Features extracted from motion blobs are used to discriminate fight and non-fight sequences. Although the method is outperformed in accuracy by the state of the art, its computation time is significantly lower, making it amenable to real-time applications.
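The motion-blob idea above can be illustrated with a minimal sketch: threshold the difference between consecutive frames and summarize the resulting motion region with a few cheap statistics, which a binary classifier could then use to separate fight from non-fight clips. This is an illustrative stand-in, not the paper's actual feature set; the threshold and the choice of statistics are assumptions.

```python
import numpy as np

def motion_blob_features(prev_frame, frame, thresh=25):
    """Cheap motion features from the thresholded frame difference.

    A rough stand-in for motion-blob extraction: the threshold value
    and the three statistics below are illustrative choices, not the
    features used in the paper.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > thresh                      # pixels with significant motion
    area = mask.sum()                         # total moving area in pixels
    if area == 0:
        return np.zeros(3)
    ys, xs = np.nonzero(mask)
    # Spatial spread of the motion region: fast, erratic motion (as in
    # fights) tends to produce large, scattered blobs.
    spread = np.sqrt(ys.var() + xs.var())
    density = area / mask.size                # fraction of the frame moving
    return np.array([area, spread, density])
```

In a real pipeline, one such feature vector per frame pair would be aggregated over a clip and fed to a standard classifier (e.g., an SVM); the appeal is that frame differencing is far cheaper than dense spatio-temporal descriptors.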
A Dataset for Movie Description
Descriptive video service (DVS) provides linguistic descriptions of movies
and allows visually impaired people to follow a movie along with their peers.
Such descriptions are by design mainly visual and thus naturally form an
interesting data source for computer vision and computational linguistics. In
this work we propose a novel dataset which contains transcribed DVS, which is
temporally aligned to full length HD movies. In addition we also collected the
aligned movie scripts which have been used in prior work and compare the two
different sources of descriptions. In total the Movie Description dataset
contains a parallel corpus of over 54,000 sentences and video snippets from 72
HD movies. We characterize the dataset by benchmarking different approaches for
generating video descriptions. Comparing DVS to scripts, we find that DVS is
far more visual and describes precisely what is shown rather than what should
happen according to the scripts created prior to movie production.
Intimate interfaces in action: assessing the usability and subtlety of emg-based motionless gestures
Mobile communication devices, such as mobile phones and networked personal digital assistants (PDAs), allow users to be constantly connected and communicate anywhere and at any time, often resulting in personal and private communication taking place in public spaces. This private--public contrast can be problematic. As a remedy, we promote intimate interfaces: interfaces that allow subtle and minimal mobile interaction, without disruption of the surrounding environment. In particular, motionless gestures sensed through the electromyographic (EMG) signal have been proposed as a solution to allow subtle input in a mobile context. In this paper we present an expansion of the work on EMG-based motionless gestures, including (1) a novel study of their usability in a mobile context for controlling a realistic, multimodal interface and (2) a formal assessment of how noticeable they are to informed observers. Experimental results confirm that subtle gestures can be profitably used within a multimodal interface and that it is difficult for observers to guess when someone is performing a gesture, confirming the hypothesis of subtlety.
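Detecting a motionless gesture from EMG typically reduces to spotting windows where muscle activity rises well above the resting level. A minimal sketch of that idea, using windowed RMS energy against a baseline threshold; the sampling rate, window length, and threshold factor here are illustrative assumptions, not the parameters from the study:

```python
import numpy as np

def detect_gesture(emg, fs=1000, win_ms=100, k=3.0):
    """Flag windows whose RMS exceeds k times the resting baseline.

    An illustrative onset detector: window length and threshold factor
    are assumed values, not those used in the paper.
    """
    win = int(fs * win_ms / 1000)             # samples per window
    n = len(emg) // win
    rms = np.sqrt(np.mean(emg[: n * win].reshape(n, win) ** 2, axis=1))
    baseline = np.median(rms)                 # robust resting-level estimate
    return rms > k * baseline                 # True where a gesture is likely
```

Using the median as the baseline keeps brief activity bursts from inflating the resting estimate, which matters when gestures are short relative to the recording.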
Multimodal Visual Concept Learning with Weakly Supervised Techniques
Despite the availability of a huge amount of video data accompanied by
descriptive texts, it is not always easy to exploit the information contained
in natural language in order to automatically recognize video concepts. Towards
this goal, in this paper we use textual cues as means of supervision,
introducing two weakly supervised techniques that extend the Multiple Instance
Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and
the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes
the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets,
while the latter models different interpretations of each description's
semantics with Probabilistic Labels, both formulated through a convex
optimization algorithm. In addition, we provide a novel technique to extract
weak labels in the presence of complex semantics, that consists of semantic
similarity computations. We evaluate our methods on two distinct problems,
namely face and action recognition, in the challenging and realistic setting of
movies accompanied by their screenplays, contained in the COGNIMUSE database.
We show that, on both tasks, our method considerably outperforms a
state-of-the-art weakly supervised approach, as well as other baselines.
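The Multiple Instance Learning setting that FSMIL and PLMIL extend can be sketched in a few lines: training labels attach to bags of instances (e.g., all face tracks in a scene), and under the standard MIL assumption a bag is positive iff at least one instance is, so a bag's score is the max over its instance scores. The linear scorer below is a hypothetical illustration of that assumption only, not the paper's convex FSMIL/PLMIL formulations:

```python
import numpy as np

def bag_scores(bags, w, b=0.0):
    """Score each bag as the max of its instance scores (standard MIL
    assumption: a bag is positive iff some instance is positive).

    `w` and `b` define an illustrative linear instance scorer; the
    paper's FSMIL/PLMIL formulations are convex programs, not this.
    """
    return np.array([np.max(X @ w + b) for X in bags])

def bag_labels(bags, w, b=0.0, thresh=0.0):
    """Binarize bag scores into positive/negative bag predictions."""
    return bag_scores(bags, w, b) > thresh
```

The weak-supervision extensions in the paper refine what "bag" and "label" mean: FSMIL softens the bag boundaries in space and time, while PLMIL attaches probabilities to the labels themselves.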