
8 research outputs found

Single Shot Temporal Action Detection

Author: Abadi M.
Escorcia V.
Gemert J.
Glorot X.
He K.
Kingma D.
Kuehne H.
Liu W.
Oneata D.
Qiu Z.
Simonyan K.
Szegedy C.
Wang R.
Yuan J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/10/2017

Temporal action detection is an important yet challenging problem, since videos in real applications are usually long, untrimmed, and contain multiple action instances. This problem requires not only recognizing action categories but also detecting the start time and end time of each action instance. Many state-of-the-art methods adopt the "detection by classification" framework: first generate proposals, then classify them. The main drawback of this framework is that the boundaries of action instance proposals are fixed during the classification step. To address this issue, we propose a novel Single Shot Action Detector (SSAD) network based on 1D temporal convolutional layers, which skips the proposal generation step by directly detecting action instances in untrimmed video. In pursuit of an SSAD network that works effectively for temporal action detection, we empirically search for the best network architecture, since no existing model can be directly adopted. Moreover, we investigate input feature types and fusion strategies to further improve detection accuracy. We conduct extensive experiments on two challenging datasets: THUMOS 2014 and MEXaction2. With the Intersection-over-Union threshold set to 0.5 during evaluation, SSAD significantly outperforms other state-of-the-art systems, increasing mAP from 19.0% to 24.6% on THUMOS 2014 and from 7.4% to 11.0% on MEXaction2.

Comment: ACM Multimedia 2017
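The Intersection-over-Union criterion mentioned in the abstract above can be illustrated for 1D time segments. The following is a minimal sketch, not the authors' evaluation code: the `(start, end)` tuple representation, the function names `temporal_iou` and `is_match`, and the choice to treat an IoU equal to the threshold as a match are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical representation: a segment is (start_time, end_time) in seconds.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Return the Intersection-over-Union of two 1D time segments."""
    # Overlap length is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Segment, truth: Segment, threshold: float = 0.5) -> bool:
    """Assumed matching rule: a detection counts if its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold
```

Under these assumptions, a predicted segment of 0 to 10 seconds against a ground-truth segment of 5 to 15 seconds overlaps for 5 seconds out of a 15-second union, so it would not count as a match at a threshold of 0.5.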

arXiv.org e-Print Archive

Crossref

DAPs: Deep Action Proposals for Action Understanding

Author: A Gaidon
B Hariharan
CL Zitnick
D Oneata
J Hosang
JRR Uijlings
M Everingham
O Russakovsky
S Gupta
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Spatio-Temporal Attention Models for Grounded Video Captioning

Author: A Rohrbach
D Oneata
EH Taralova
O Russakovsky
S Hochreiter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/10/2016

Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A comprehensive system would ultimately localize and track the objects, actions and interactions present in a video and generate a description that relies on temporal localization in order to ground the visual concepts. However, most existing automatic video captioning systems map from raw video data to a high-level textual description, bypassing localization and recognition, thus discarding potentially valuable information for content localization and generalization. In this work we present an automatic video captioning model that combines spatio-temporal attention and image classification by means of deep neural network structures based on long short-term memory. The resulting system is demonstrated to produce state-of-the-art results on the standard YouTube captioning benchmark while also offering the advantage of localizing the visual concepts (subjects, verbs, objects), with no grounding supervision, over space and time.

arXiv.org e-Print Archive

Crossref

Lund University Publications

LEAR @ TrecVid MED 2012

Author: Douze M.
Harchaoui Z.
Oneata D.
Potapov D.
Revaud J.
Schmid C.
Schwenninger J.
Verbeek J.
Wang H.

Fraunhofer-ePrints

Neubildung deutschen Bauerntums. Innere Kolonisation im Dritten Reich. Fallstudien in Schleswig-Holstein

Author: A Bearman
C Vondrick
D Oneata
E Adeli Mosabbeb
J Sánchez
Khurram Soomro
L Wang
O Russakovsky
P Siva
PH Tseng
S Bianco
Publication date: 01/01/1983

Dissertation at the Katholieke Univ. Nijmegen (Netherlands), 1983. Bibliothek Weltwirtschaft Kiel A151,236 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. SIGLE. DE. German.

arXiv.org e-Print Archive

Crossref

OpenGrey Repository

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Towards Segmenting Consumer Stereo Videos: Benchmark, Baselines and Ensembles

Author: A Geiger
C Zach
D Oneata
D Scharstein
DJ Butler
EH Taralova
J Shi
M Bleyer
P Arbelaez
P Ochs
Q Zhang
R Achanta
T Basha
T Brox
T Kanade
U Luxburg von
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Are we ready to segment consumer stereo videos? The amount of this type of data is rapidly increasing, and it encompasses rich appearance, motion and depth cues. However, the segmentation of such data is still largely unexplored. First, we therefore propose a new benchmark: videos, annotations and metrics to measure progress on this emerging challenge. Second, we evaluate several state-of-the-art segmentation methods and propose a novel ensemble method based on recent spectral theory. This combines existing image and video segmentation techniques in an efficient scheme. Finally, we propose and integrate into this model a novel regressor, learnt to optimize the stereo segmentation performance directly via a differentiable proxy. The regressor makes our segmentation ensemble adaptive to each stereo video and outperforms the segmentations of the ensemble as well as a recent RGB-D segmentation technique.

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

Archivio della ricerca- Università di Roma La Sapienza

MPG.PuRe

A survey of research by UK NGOs

Author: A Rohrbach
A Zellner
C Zhang
D Oneata
H Bilen
I Yildirim
J Zhu
K Cho
K Ganchev
MJ Wainwright
P Bojanowski
S Ali
SJ Gershman
T Cour
TL Griffiths
V Ramanathan
Z Ghahramani
Z Shi
Publication date: 01/01/1994

NGOs - Non-Governmental Organisations. Available from British Library Document Supply Centre-DSC:6224.922(INTRAC-OP--3) / BLDSC - British Library Document Supply Centre. SIGLE. GB. United Kingdom.

Crossref

OpenGrey Repository

A compact representation of human actions by sliding coordinate coding

Author: Beaudry C
Blank M
Bregonzio M
Brendel W
Chen CY
Ciptadi A
Dollár P
Escorcia V
Gilbert A
Glaser T
Han D
Harada T
Klaser A
Kovashka A
Laptev I
Lazebnik S
Liu H
Liu J
Liu M
Lv F
Marszalek M
Messing R
Morioka N
Oneata D
Penatti O
Reddy KK
Ryoo MS
Sadanand S
Savarese S
Schuldt C
Scovanner P
Sun J
Vedaldi A
Wang H
Wang Y
Yan P
Publication venue: 'SAGE Publications'

Crossref