Interaction between high-level and low-level image analysis for semantic video object extraction

Authors: Cavallaro A; Ebrahimi T
Publication date: 01/01/2004

Authors of articles published in EURASIP Journal on Advances in Signal Processing are the copyright holders of their articles and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate the article, according to the SpringerOpen copyright and license agreement (http://www.springeropen.com/authors/license)

Springer - Publisher Connector

Directory of Open Access Journals

Queen Mary Research Online

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

Authors: Cao Wenfei; Cichocki Andrzej; Meng Deyu; Sun Jian; Wang Yao; Xu Zongben; Yang Can
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Background subtraction has been a fundamental and widely studied task in video analysis, with a wide range of applications in video surveillance, teleconferencing and 3D modeling. Recently, motivated by compressive imaging, background subtraction from compressive measurements (BSCM) has become an active research task in video surveillance. In this paper, we propose a novel tensor-based robust PCA (TenRPCA) approach for BSCM that decomposes video frames into backgrounds with spatio-temporal correlations and foregrounds with spatio-temporal continuity in a tensor framework. In this approach, we use 3D total variation (TV) to enhance the spatio-temporal continuity of foregrounds, and Tucker decomposition to model the spatio-temporal correlations of the video background. Based on this idea, we design a basic tensor RPCA model over the video frames, dubbed the holistic TenRPCA model (H-TenRPCA). To characterize the correlations among the groups of similar 3D patches of video background, we further design a patch-group-based tensor RPCA model (PG-TenRPCA) that models the video background by joint Tucker decompositions of 3D patch groups. Efficient algorithms using the alternating direction method of multipliers (ADMM) are developed to solve the proposed models. Extensive experiments on simulated and real-world videos demonstrate the superiority of the proposed approaches over the existing state-of-the-art approaches.
Comment: To appear in IEEE TI
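As a rough illustration of the 3D total variation term mentioned in the abstract above, the sketch below computes an anisotropic TV of a video tensor: the sum of absolute forward differences along height, width, and time. The function name `tv3d_anisotropic`, the (height, width, time) layout, and the choice of the anisotropic variant are assumptions made here for illustration; the paper may define its TV regularizer differently, and this is not the authors' implementation.

```python
import numpy as np


def tv3d_anisotropic(video):
    """Anisotropic total variation of an array: the sum of absolute
    forward differences along every axis (hypothetical helper)."""
    video = np.asarray(video, dtype=float)
    # np.diff(video, axis=ax) gives v[i+1] - v[i] along that axis.
    return sum(np.abs(np.diff(video, axis=ax)).sum() for ax in range(video.ndim))


# A constant clip has no variation at all.
flat = np.ones((3, 3, 3))
print(tv3d_anisotropic(flat))  # 0.0

# A single isolated "foreground" voxel contributes two unit jumps per axis.
spike = np.zeros((3, 3, 3))
spike[1, 1, 1] = 1.0
print(tv3d_anisotropic(spike))  # 6.0
```

A small TV value favors foregrounds that are smooth in space and persistent in time, which is the intuition behind using it as a continuity prior.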

arXiv.org e-Print Archive

Institutional Repository of Institute of Automation, CAS

Shenyang Institute of Automation,Chinese Academy Of Sciences

Unsupervised Discovery of Parts, Structure, and Dynamics

Authors: Freeman William T.; Liu Zhijian; Murphy Kevin; Sun Chen; Tenenbaum Joshua B.; Wu Jiajun; Xu Zhenjia
Publication date: 12/03/2019

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.
Comment: ICLR 2019. The first two authors contributed equally to this wor

arXiv.org e-Print Archive

DSpace@MIT

Ambient Sound Provides Supervision for Visual Learning

Authors: Freeman William T.; McDermott Josh H.; Owens Andrew; Torralba Antonio; Wu Jiajun
Publication date: 01/09/2016

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
Comment: ECCV 201

arXiv.org e-Print Archive

DSpace@MIT

Crossref