Bio-mimetic Models for Moving Object Detection and Tracking
Ph.D. dissertation -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, Feb. 2014. Advisor: Jin Young Choi.
In this thesis, we propose bio-mimetic models for motion detection and visual tracking to overcome the limitations of existing methods in actual environments. The models are inspired by the theory that human visual perception represents a scene with four different forms of visual memory: visible persistence, informational persistence, visual short-term memory (VSTM), and visual long-term memory (VLTM). We view our problem as one of modeling and representing an observed scene with temporary short-term models (TSTM) and conservative long-term models (CLTM). We study how to build efficient and effective models for TSTM and CLTM, and how to use them together to obtain robust detection and tracking results under the occlusions, clumsy initializations, background clutter, drifting, and non-rigid deformations encountered in actual environments.
First, we propose an efficient representation of TSTM for moving-object detection on non-stationary cameras, which runs within 5.8 milliseconds (ms) on a PC and in real time on mobile devices. To achieve real-time capability with robust performance, our method models the background with the proposed dual-mode kernel model (DMKM) and compensates for camera motion by mixing neighboring models. Modeling through DMKM prevents the background model from being contaminated by foreground pixels while still allowing the model to adapt to changes in the background. Mixing neighboring models reduces the errors arising from motion compensation, and their influence is further reduced by keeping track of the age of each model. Also, to decrease the computational load, the proposed method applies one DMKM to multiple pixels without performance degradation. Experimental results show the computational lightness and real-time capability of our method on a smartphone, with robust detection performance.
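The dual-mode idea above — an apparent background mode plus a candidate mode, so that foreground pixels cannot contaminate the background model — can be sketched as follows. This is a minimal illustrative version for scalar grayscale pixels; the mode layout, matching threshold, and age-weighted update are assumptions, not the thesis's exact formulation.

```python
import numpy as np

class DualModeKernel:
    """Two competing Gaussian modes per model: an 'apparent' background
    mode and a 'candidate' mode that replaces it once it grows older."""

    def __init__(self, mean, var_init=400.0, match_thresh=3.0):
        self.modes = [  # each mode: [mean, variance, age]
            [mean, var_init, 1.0],   # apparent background
            [mean, var_init, 0.0],   # candidate
        ]
        self.var_init = var_init
        self.match_thresh = match_thresh

    def update(self, x):
        """Update with observation x; return True if x is foreground."""
        app, cand = self.modes
        for mode in (app, cand):
            mean, var, age = mode
            if (x - mean) ** 2 < self.match_thresh ** 2 * var:
                # age-weighted running update of the matched mode only
                alpha = 1.0 / (age + 1.0)
                mode[0] = (1 - alpha) * mean + alpha * x
                mode[1] = (1 - alpha) * var + alpha * (x - mode[0]) ** 2
                mode[2] = age + 1.0
                # promote the candidate once it is more stable (older)
                if mode is cand and cand[2] > app[2]:
                    self.modes = [cand, app]
                return mode is cand  # matching only the candidate => foreground
        # matched neither mode: restart the candidate, flag foreground
        self.modes[1] = [x, self.var_init, 1.0]
        return True
```

Because only the matched mode is updated, a passing foreground object trains the candidate mode while the apparent background mode stays clean — the property the abstract attributes to the DMKM.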
Second, using concepts from both TSTM and CLTM, we propose a new visual tracking method based on a novel tri-model. The proposed method aims to solve the problems of occlusions, background clutter, and drifting simultaneously. The tri-model is composed of three models, which learn the target object, the background, and other non-target moving objects online. The proposed scheme performs tracking by finding the best explanation of the scene in terms of the three learned models. By utilizing the information in the background and foreground models as well as the target object model, our method obtains robust results under occlusions and background clutter. Also, the target object model is updated conservatively to prevent drifting. Furthermore, our method is not restricted to bounding boxes when representing the target object and is able to give pixel-wise tracking results.
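The "best explanation of the scene" step can be illustrated with per-pixel competition among the three models: each model scores every pixel, and each pixel takes the label of the model that explains it best. The Gaussian scoring and toy images below are stand-ins for the thesis's learned models, chosen only to make the labeling step concrete.

```python
import numpy as np

def gaussian_likelihood(frame, mean, var):
    """Per-pixel Gaussian likelihood of the frame under one model."""
    return np.exp(-(frame - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def label_pixels(frame, models):
    """models: dict name -> (mean image, variance). Each pixel is labeled
    with the index of the model giving it the highest likelihood."""
    names = list(models)
    scores = np.stack(
        [gaussian_likelihood(frame, m, v) for m, v in models.values()]
    )
    return scores.argmax(axis=0), names

# toy scene: a bright 2x2 target on a dark background
frame = np.full((4, 4), 20.0)
frame[1:3, 1:3] = 200.0
models = {
    "target":     (np.full((4, 4), 205.0), 100.0),
    "background": (np.full((4, 4), 25.0), 100.0),
    "foreground": (np.full((4, 4), 120.0), 100.0),  # other moving objects
}
labels, names = label_pixels(frame, models)
target_mask = labels == names.index("target")  # pixel-wise tracking result
```

This also shows why the tri-model yields pixel-wise results for free: the target mask is simply the set of pixels best explained by the target model, with no bounding box involved.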
Third, we go beyond pixel-wise modeling and propose a local-feature-based tracking model using both TSTM and CLTM to track objects under uncertain initializations and severe occlusions. To track objects accurately in such situations, the proposed scheme uses the "motion saliency" and "descriptor saliency" of local features and performs tracking based on the generalized Hough transform (GHT). The proposed motion saliency of a local feature uses the instantaneous velocities of features to form the TSTM and emphasizes features with distinctive motions, compared to the motions of local features that do not belong to the object. The descriptor saliency models local features as the CLTM and emphasizes features that are likely to belong to the object in terms of their feature descriptors. Through these saliencies, the proposed method tries to "learn and find" the target object rather than looking for what was given at initialization, making it robust to initialization problems. Also, our tracking result is obtained by combining the results of each local feature of the target and the surroundings, making it robust against severe occlusions as well. The proposed method is compared against eight other methods on nine image sequences with one hundred random initializations. The experimental results show that our method outperforms all the compared methods.
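The GHT voting at the core of this scheme can be sketched as follows: each local feature stores an offset to the object center and casts a vote weighted by its saliency, and the predicted center is the strongest vote. The offsets and weights below are made-up illustrations; in the thesis the weights come from the motion and descriptor saliencies.

```python
import numpy as np

def ght_vote(features, shape):
    """features: list of (x, y, dx, dy, weight), where (dx, dy) is the
    stored offset from the feature to the object center. Accumulates
    saliency-weighted votes and returns the winning center (cx, cy)."""
    acc = np.zeros(shape)
    for x, y, dx, dy, w in features:
        cx, cy = x + dx, y + dy
        if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
            acc[cy, cx] += w  # weighted vote for this center hypothesis
    cy, cx = np.unravel_index(acc.argmax(), acc.shape)
    return int(cx), int(cy)

features = [
    (10, 10, 5, 5, 0.9),      # salient object feature, votes for (15, 15)
    (12, 14, 3, 1, 0.8),      # agrees: also votes for (15, 15)
    (30, 30, -5, -8, 0.2),    # background feature, low saliency, outvoted
]
center = ght_vote(features, (40, 40))
```

Because occluded features simply fail to vote while the remaining ones still agree on the center, this voting scheme degrades gracefully under partial occlusion — consistent with the robustness claimed above.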
Fourth and finally, we focus on building a robust CLTM with local patches and their neighboring structures. The proposed method is based on sequential Bayesian inference and addresses both tracking under partial occlusions and non-rigid object tracking in real time on desktop personal computers (PCs). The proposed scheme is mainly composed of two parts: (1) modeling the target object with an elastic structure of local patches for robust performance, and (2) an efficient hierarchical diffusion method that performs the tracking process in real time. The elastic structure of local patches allows the proposed scheme to handle partial occlusions and non-rigid deformations through the relationships among neighboring patches. The proposed hierarchical diffusion generates samples from the region where the posterior is concentrated, reducing computation time. The method is extensively tested on a number of challenging image sequences with occlusion and non-rigid deformation. The experimental results show the real-time capability and robustness of the proposed scheme under various situations.
1 Introduction
1.1 Background and Research Issues
1.1.1 Issues in Motion Detection
1.1.2 Issues in Object Tracking
1.2 The Human Visual Memory
1.2.1 Sensory Memory
1.2.2 Visual Short-Term Memory
1.2.3 Visual Long-Term Memory
1.3 Bio-mimetic Framework for Detection and Tracking
1.4 Contents of the Research
2 Detection by Pixel-wise Dual-Mode Kernel Model
2.1 Proposed Method
2.1.1 Approximated Gaussian Kernel Model
2.1.2 Dual-Mode Kernel Model (DMKM)
2.1.3 Motion Compensation by Mixing Models
2.1.4 Detection of Foreground Pixels
2.2 Experimental Results
2.2.1 Runtime Comparison
2.2.2 Qualitative Comparison
2.2.3 Quantitative Comparison
2.2.4 Effects of Dual-Mode Kernel Model
2.2.5 Effects of Motion Compensation
2.2.6 Mobile Results
2.3 Remarks and Discussion
3 Tracking by Pixel-wise Tri-Model Representation
3.1 Tri-Model Framework
3.1.1 Overall Scheme
3.1.2 Advantages
3.1.3 Practical Approximation
3.2 Tracking with the Tri-Model
3.2.1 Likelihood of the Tri-Model
3.2.2 Likelihood Maximization
3.2.3 Estimating Pixel-Wise Labels
3.3 Learning the Tri-Model
3.3.1 Target Model
3.3.2 Background Model
3.3.3 Foreground Model
3.4 Experimental Results
3.4.1 Experimental Settings
3.4.2 Tracking Accuracy: Bounding Box
3.4.3 Tracking Accuracy: Pixel-Wise
3.5 Remarks and Discussion
4 Tracking by Feature-point-wise Saliency Model
4.1 Proposed Method
4.1.1 Tracking based on GHT
4.1.2 Descriptor Saliency and Feature DB Update
4.1.3 Motion Saliency
4.2 Experimental Results
4.2.1 Tracking with Inaccurate Initializations
4.2.2 Tracking Under Occlusions
4.3 Remarks and Discussion
5 Tracking by Patch-wise Elastic Structure Model
5.1 Tracking with Elastic Structure of Local Patches
5.1.1 Sequential Bayesian Inference Framework
5.1.2 Elastic Structure of Local Patches
5.1.3 Modeling a Single Patch
5.1.4 Modeling the Relationship between Patches
5.1.5 Model Update
5.1.6 Hierarchical Diffusion
5.1.7 Summary of the Proposed Method
5.2 Experiments
5.2.1 Parameter Effects
5.2.2 Performance Evaluation
5.2.3 Discussion on Translation, Rotation, Illumination Changes
5.2.4 Discussion on Partial Occlusions
5.2.5 Discussion on Non-Rigid Deformations
5.2.6 Discussion on Additional Cases
5.2.7 Summary of Tracking Results
5.2.8 Effectiveness of Hierarchical Diffusion
5.2.9 Limitations
5.3 Remarks and Discussion
6 Concluding Remarks and Future Works
Bibliography
Abstract in Korean
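The hierarchical diffusion of Chapter 5, summarized in the abstract above, generates samples from where the posterior is concentrated by shrinking the search region level by level. A minimal one-dimensional sketch, with illustrative radii, sample counts, and a Gaussian toy posterior (none of which are the thesis's settings):

```python
import numpy as np

def hierarchical_diffusion(likelihood, center, radii=(8.0, 2.0, 0.5),
                           n_samples=32, rng=None):
    """Coarse-to-fine sampling: at each level, draw samples around the
    current best state within a shrinking radius and keep the sample
    with the highest likelihood."""
    rng = rng or np.random.default_rng(0)
    best = center
    for radius in radii:              # each level narrows the search region
        samples = best + rng.uniform(-radius, radius, n_samples)
        scores = likelihood(samples)  # evaluate the posterior on all samples
        best = samples[scores.argmax()]
    return best

# toy posterior peaked at x = 3; diffusion starts far away at x = 0
posterior = lambda x: np.exp(-(x - 3.0) ** 2)
estimate = hierarchical_diffusion(posterior, center=0.0)
```

Compared with drawing all samples at the finest scale, concentrating later levels around earlier peaks spends the sample budget where the posterior mass is — the computational saving the abstract attributes to hierarchical diffusion.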
Computer vision models in surveillance robotics
2009/2010
In this thesis, we developed algorithms that use visual information to perform real-time detection, recognition, and classification of moving objects, independently of environmental conditions and with the best possible accuracy.
To this end, we developed several computer-vision components, namely the identification of objects of interest across the whole visual scene (monocular or stereo) and their classification.
In the course of the research, several approaches were tried, including the detection of candidate objects via image segmentation with weak classifiers and centroids, image-segmentation algorithms reinforced with stereo information and noise reduction, and the combination of popular features, such as scale-invariant features (SIFT), with distance information.
We developed two broad categories of solutions, depending on the type of system used. With a moving camera, we favored the detection of known objects by scanning the image; with a fixed camera, we also used algorithms for detecting foreground and moving objects (foreground detection).
In the case of foreground detection, the detection and classification rates increase when the quality of the extracted objects is high. We propose methods to reduce the effects of shadows, illumination, and the repetitive motions produced by moving objects.
An important aspect we studied is the possibility of using algorithms to detect moving objects from a moving camera.
Efficient solutions are becoming ever more complex, but the computing resources for running the algorithms are also growing more powerful, and in recent years graphics-card (GPU) architectures have offered great potential. We proposed a GPU-based design for background-image management in order to improve detection performance.
In this thesis we studied the detection and tracking of people for applications such as preventing risky situations (street crossing) and counting people for traffic analysis. We studied these problems and explored various aspects of detecting individuals, groups, and people in crowded scenes.
However, in a generic environment it is impossible to predict the configuration of objects that the camera will capture. In such cases, one must "abstract the concept" of an object. With this requirement in mind, we explored the properties of stochastic methods and show that good classification rates can be obtained provided the training set is large enough.
A flexible framework must be able to detect moving regions and recognize the objects of interest. We developed a framework that handles both the detection and the classification problems.
Compared with other methods, the proposed methods offer a flexible framework for object detection and classification that can be used efficiently in a variety of indoor and outdoor environments.
XXII Ciclo