1,088 research outputs found

3D scene modeling and understanding from image sequences

Author: Tang Hao
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2014

A new method for 3D modeling is proposed, which generates a content-based 3D mosaic (CB3M) representation for long video sequences of dynamic 3D urban scenes captured by a camera on a mobile platform. In the first phase, a set of parallel-perspective (pushbroom) mosaics with varying viewing directions is generated to capture both the 3D and dynamic aspects of the scene under the camera coverage. In the second phase, a unified patch-based stereo matching algorithm is applied to extract parametric representations of the color, structure and motion of the dynamic and/or 3D objects in urban scenes, where many planar surfaces exist. Multiple pairs of stereo mosaics are used to facilitate reliable stereo matching, occlusion handling, accurate 3D reconstruction and robust moving target detection. The outcome of this phase is a CB3M representation, a highly compressed visual representation of a dynamic 3D scene whose object contents carry both 3D and motion information. In the third phase, a multi-layer scene understanding algorithm is proposed, resulting in a planar surface model for higher-level object representations. Experimental results are given for both simulated and several real video sequences of large-scale 3D scenes to show the accuracy and effectiveness of the representation. We also show that the patch-based stereo matching algorithm and the CB3M representation can be generalized to 3D modeling with perspective views, using either a single camera or a stereovision head on a ground mobile platform or carried by a pedestrian. Applications of the proposed method include airborne or ground video surveillance, 3D urban scene modeling, traffic surveys, transportation planning and visual aids for the perception and navigation of blind people.

Directory of Open Access Journals

Revistes Catalanes amb Accés Obert

Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)

Diposit Digital de Documents de la UAB

A Survey on Video-based Graphics and Video Visualization

Author: Xianghua Xie
Publication venue: EUROGRAPHICS
Publication date: 01/01/2011

Cronfa at Swansea University

State of the Art Report on Video-based Graphics and Video Visualizations

Publication venue: 'Wiley'
Publication date: 01/01/2012

Crossref

Cronfa at Swansea University

Dynamic 3D Urban Scene Modeling Using Multiple Pushbroom Mosaics

Author: George Wolberg
Hao Tang
Zhigang Zhu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

In this paper, a unified, segmentation-based approach is proposed to deal with both stereo reconstruction and moving object detection using multiple stereo mosaics. Each set of parallel-perspective (pushbroom) stereo mosaics is generated from a video sequence captured by a single video camera. First, a color segmentation approach is used to extract the so-called natural matching primitives from a reference view of a pair of stereo mosaics, to facilitate 3D reconstruction of both textureless urban scenes and man-made moving targets (e.g. vehicles). Multiple pairs of stereo mosaics are used to improve the accuracy and robustness of 3D recovery and occlusion handling. Moving targets are detected by inspecting their 3D anomalies: they either violate the epipolar geometry of the pushbroom stereo or exhibit abnormal 3D structure. Experimental results on both simulated and real video sequences are provided to show the effectiveness of our approach.

CiteSeerX

Crossref

TR-2008013: Content-Based 3D Mosaics for Large-Scale Dynamic Urban Scenes

Author: Tang Hao
Zhu Zhigang
Publication venue: CUNY Academic Works
Publication date: 01/01/2008

City University of New York

Image-Based Rendering Of Real Environments For Virtual Reality

Author: Bertel Tobias
Publication date: 14/02/2022

OPUS

Deep Neural Network Architectures and Learning Methodologies for Classification and Application in 3D Reconstruction

Author: Forbes Timothy
Publication date: 01/12/2018

In this work we explore two different scenarios of 3D reconstruction. The first, urban scenes, is approached using a deep learning network trained to identify structurally important classes within aerial imagery of cities. The network was trained using data taken from the ISPRS benchmark dataset of the city of Vaihingen. Using the segmented maps generated by the network, we can proceed to reconstruct the scenes more accurately through clustering followed by class-specific model generation. The second scenario is that of underwater scenes. We use two separate networks to first identify caustics and then remove them from a scene. Data was generated synthetically, as real-world datasets for this subject are extremely hard to produce. Using the generated caustic-free image, we can then reconstruct the scene with more precision and accuracy through structure from motion. We investigate different deep learning architectures and parameters for both scenarios. Our results are evaluated to be efficient and effective by comparing them with online benchmarks and alternative reconstruction attempts. We conclude by discussing the limitations of problem-specific datasets and our potential research into the generation of datasets through the use of Generative Adversarial Networks.

Concordia University Research Repository

Video Processing with Additional Information

Author: Ramachandran Mahesh
Publication date: 01/01/2010

Cameras are frequently deployed along with many additional sensors in aerial and ground-based platforms. Many video datasets have metadata containing measurements from inertial sensors, GPS units, etc. Hence the development of better video processing algorithms using additional information attains special significance. We first describe an intensity-based algorithm for stabilizing low-resolution and low-quality aerial videos. The primary contribution is the idea of minimizing the discrepancy in the intensity of selected pixels between two images. This is an application of inverse compositional alignment for registering images of low resolution and low quality, for which minimizing the intensity difference over salient pixels with high gradients results in faster and better convergence than when using all the pixels. Secondly, we describe a feature-based method for stabilization of aerial videos and segmentation of small moving objects. We use the coherency of background motion to jointly track features through the sequence. This enables accurate tracking of large numbers of features in the presence of repetitive texture, lack of well-conditioned feature windows, etc. We incorporate the segmentation problem within the joint feature tracking framework and propose the first combined joint-tracking and segmentation algorithm. The proposed approach enables highly accurate tracking, and a segmentation of feature tracks that is used in a MAP-MRF framework for obtaining dense pixelwise labeling of the scene. We demonstrate competitive moving object detection in challenging video sequences of the VIVID dataset containing moving vehicles and humans that are small enough to cause background subtraction approaches to fail. Structure from Motion (SfM) has matured to a stage where the emphasis is on developing fast, scalable and robust algorithms for large reconstruction problems.
The availability of additional sensors such as inertial units and GPS along with video cameras motivates the development of SfM algorithms that leverage these additional measurements. In the third part, we study the benefits, for 3D urban modeling from a monocular video sequence, of a specific form of additional information: the vertical direction (gravity) and the height of the camera, both of which can be conveniently measured using inertial sensors. We show that in the presence of this information, the SfM equations can be rewritten in a bilinear form. This allows us to derive a fast, robust, and scalable SfM algorithm for large-scale applications. The proposed SfM algorithm is experimentally demonstrated to have favorable properties compared to the sparse bundle adjustment algorithm. We provide experimental evidence indicating that the proposed algorithm converges in many cases to solutions with lower error than state-of-the-art implementations of bundle adjustment. We also demonstrate that for large reconstruction problems, the proposed algorithm takes less time to reach its solution than bundle adjustment. We also present SfM results using our algorithm on the Google StreetView research dataset and several other datasets.

Digital Repository at the University of Maryland

Change blindness: eradication of gestalt strategies

Author: Goddard Paul
Wilson Steve
Publication venue: 'Pion Ltd'
Publication date: 01/08/2011

Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task, which lends further weight to that argument.
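
The reported statistic can be sanity-checked by hand. The sketch below is not from the abstract: it assumes the reported p is the ordinary upper-tail probability of an F(1, 4) distribution, and the function name `f_1_4_p_value` is hypothetical. It relies on the identity that an F(1, v) variable is the square of a Student's t variable with v degrees of freedom, together with the closed-form t CDF for v = 4, so only the standard library is needed.

```python
import math


def f_1_4_p_value(f_stat: float) -> float:
    """Upper-tail p-value for an F statistic with (1, 4) degrees of freedom.

    Assumption: F(1, 4) = t(4)^2, so P(F > f) equals the two-tailed
    probability of a Student's t with 4 degrees of freedom at t = sqrt(f).
    """
    if f_stat < 0:
        raise ValueError("an F statistic cannot be negative")
    t = math.sqrt(f_stat)
    # Closed-form CDF of Student's t with 4 degrees of freedom:
    #   CDF(t) = 1/2 + (3/8) * u * (1 - u^2 / 12),  u = t / sqrt(1 + t^2 / 4)
    u = t / math.sqrt(1.0 + t * t / 4.0)
    cdf = 0.5 + 0.375 * u * (1.0 - u * u / 12.0)
    return 2.0 * (1.0 - cdf)


if __name__ == "__main__":
    # Value reported in the abstract: F(1,4) = 2.565
    print(round(f_1_4_p_value(2.565), 3))  # about 0.185
```

Under these assumptions the computed value agrees with the reported p=0.185 to three decimal places, which is consistent with the abstract's reading that the shift manipulation produced no significant difference.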

University of Lincoln Institutional Repository

Detection and Generalization of Spatio-temporal Trajectories for Motion Imagery

Author: Partsinevelos Panayotis
Publication venue: DigitalCommons@UMaine
Publication date: 01/01/2002

In today's world of vast information availability, users often confront large, unorganized amounts of data with limited tools for managing them. Motion imagery (MI) datasets have become an increasingly popular means of exposing and disseminating information. Commonly, moving objects are of primary interest in modeling such datasets. Users may require different levels of detail, mainly for visualization and further processing purposes, according to the application at hand. In this thesis we exploit the geometric attributes of objects for dataset summarization by using a series of image processing and neural network tools. In order to form data summaries we select representative time instances through the segmentation of an object's spatio-temporal trajectory lines. High movement variation instances are selected through a new hybrid self-organizing map (SOM) technique to describe a single spatio-temporal trajectory. Multiple objects move in diverse yet classifiable patterns. In order to group corresponding trajectories we utilize an abstraction mechanism that investigates a vague moving relevance between the data in space and time. Thus, we introduce the spatio-temporal neighborhood unit as a variable generalization surface. By altering the unit's dimensions, scaled generalization is accomplished. Common complications in tracking applications, including occlusion, noise, information gaps and unconnected segments of data sequences, are addressed through the hybrid-SOM analysis. Nevertheless, entangled data sequences, where there is no information on which data entry belongs to which trajectory, are frequently evident. A multidimensional classification technique that combines geometric and backpropagation neural network implementation is used to distinguish between trajectory data. Furthermore, modeling and summarization of two-dimensional phenomena evolving in time brings forward the novel concept of spatio-temporal helixes as compact event representations.
The phenomena models are composed of SOM movement nodes (spines) and cardinality shape-change descriptors (prongs). While we focus on the analysis of MI datasets, the framework can be generalized to function with other types of spatio-temporal datasets. Multiple-scale generalization is allowed on a dynamic, significance-based scale rather than a constant one. The constructed summaries are not just a visualization product; they also support further processing for metadata creation, indexing, and querying. Experimentation, comparisons and error estimations for each technique support the analyses discussed.

CiteSeerX

University of Maine