3 research outputs found

    ģ›€ģ§ģ“ėŠ” ė¬¼ģ²“ ź²€ģ¶œ ė° ģ¶”ģ ģ„ ģœ„ķ•œ ģƒģ²“ ėŖØė°© ėŖØėø

    Doctoral thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2014. 2. 최진영. In this thesis, we propose bio-mimetic models for motion detection and visual tracking to overcome the limitations of existing methods in actual environments. The models are inspired by the theory that human visual perception uses four different forms of visual memory when representing a scene: visible persistence, informational persistence, visual short-term memory (VSTM), and visual long-term memory (VLTM). We view our problem as one of modeling and representing an observed scene with temporary short-term models (TSTM) and conservative long-term models (CLTM). We study how to build efficient and effective models for TSTM and CLTM, and how to use them together to obtain robust detection and tracking results under the occlusions, inaccurate initializations, background clutter, drifting, and non-rigid deformations encountered in actual environments. First, we propose an efficient representation of TSTM for moving object detection with non-stationary cameras, which runs within 5.8 milliseconds (ms) on a PC and in real time on mobile devices. To achieve real-time capability with robust performance, our method models the background through the proposed dual-mode kernel model (DMKM) and compensates for the motion of the camera by mixing neighboring models. Modeling through DMKM prevents the background model from being contaminated by foreground pixels, while still allowing the model to adapt to changes in the background. Mixing neighboring models reduces the errors arising from motion compensation, and their influence is further reduced by keeping track of the age of the model. Also, to decrease the computational load, the proposed method applies one DMKM to multiple pixels without performance degradation. Experimental results show the computational lightness and the real-time capability of our method on a smartphone, with robust detection performance. 
Second, by using concepts from both TSTM and CLTM, a new visual tracking method based on a novel tri-model is proposed. The proposed method aims to solve the problems of occlusions, background clutter, and drifting simultaneously. The tri-model is composed of three models, which learn the target object, the background, and other non-target moving objects online, respectively. The proposed scheme performs tracking by finding the best explanation of the scene with the three learned models. By utilizing the information in the background and foreground models as well as the target object model, our method obtains robust results under occlusions and background clutter. Also, the target object model is updated conservatively to prevent drifting. Furthermore, our method is not restricted to bounding boxes when representing the target object, and is able to give pixel-wise tracking results. Third, we go beyond pixel-wise modeling and propose a local-feature-based tracking model using both TSTM and CLTM to track objects in cases of uncertain initializations and severe occlusions. To track objects accurately in such situations, the proposed scheme uses the "motion saliency" and "descriptor saliency" of local features and performs tracking based on the generalized Hough transform (GHT). The proposed motion saliency of a local feature uses the instantaneous velocity of features to form TSTM and emphasizes features whose motion is distinctive compared with the motion of local features that do not belong to the object. The descriptor saliency models local features as CLTM and emphasizes features that are likely to belong to the object in terms of their feature descriptors. Through these saliencies, the proposed method tries to "learn and find" the target object rather than looking for what was given at initialization, which makes it robust to initialization problems. 
Also, our tracking result is obtained by combining the results of the individual local features of the target and its surroundings, which makes it robust against severe occlusions as well. The proposed method is compared against eight other methods, with nine image sequences and one hundred random initializations. The experimental results show that our method outperforms all other compared methods. Fourth and last, we focus on building robust CLTM with local patches and their neighboring structures. The proposed method is based on sequential Bayesian inference and focuses on solving both the problem of tracking under partial occlusions and the problem of non-rigid object tracking in real time on desktop personal computers (PCs). The proposed scheme is mainly composed of two parts: (1) modeling the target object using an elastic structure of local patches for robust performance, and (2) an efficient hierarchical diffusion method to perform the tracking process in real time. The elastic structure of local patches allows the proposed scheme to handle partial occlusions and non-rigid deformations through the relationships among neighboring patches. The proposed hierarchical diffusion generates samples from the region where the posterior is concentrated to reduce computation time. The method is extensively tested on a number of challenging image sequences with occlusion and non-rigid deformation. 
The experimental results show the real-time capability and the robustness of the proposed scheme under various situations.
Table of contents:
1 Introduction 1.1 Background and Research Issues 1.1.1 Issues in Motion Detection 1.1.2 Issues in Object Tracking 1.2 The Human Visual Memory 1.2.1 Sensory Memory 1.2.2 Visual Short-Term Memory 1.2.3 Visual Long-Term Memory 1.3 Bio-mimetic Framework for Detection and Tracking 1.4 Contents of the Research
2 Detection by Pixel-wise Dual-Mode Kernel Model 2.1 Proposed Method 2.1.1 Approximated Gaussian Kernel Model 2.1.2 Dual-Mode Kernel Model (DMKM) 2.1.3 Motion Compensation by Mixing Models 2.1.4 Detection of Foreground Pixels 2.2 Experimental Results 2.2.1 Runtime Comparison 2.2.2 Qualitative Comparison 2.2.3 Quantitative Comparison 2.2.4 Effects of Dual-Mode Kernel Model 2.2.5 Effects of Motion Compensation 2.2.6 Mobile Results 2.3 Remarks and Discussion
3 Tracking by Pixel-wise Tri-Model Representation 3.1 Tri-Model Framework 3.1.1 Overall Scheme 3.1.2 Advantages 3.1.3 Practical Approximation 3.2 Tracking with the Tri-Model 3.2.1 Likelihood of the Tri-Model 3.2.2 Likelihood Maximization 3.2.3 Estimating Pixel-Wise Labels 3.3 Learning the Tri-Model 3.3.1 Target Model 3.3.2 Background Model 3.3.3 Foreground Model 3.4 Experimental Results 3.4.1 Experimental Settings 3.4.2 Tracking Accuracy: Bounding Box 3.4.3 Tracking Accuracy: Pixel-Wise 3.5 Remarks and Discussion
4 Tracking by Feature-point-wise Saliency Model 4.1 Proposed Method 4.1.1 Tracking based on GHT 4.1.2 Descriptor Saliency and Feature DB Update 4.1.3 Motion Saliency 4.2 Experimental Results 4.2.1 Tracking with Inaccurate Initializations 4.2.2 Tracking Under Occlusions 4.3 Remarks and Discussion
5 Tracking by Patch-wise Elastic Structure Model 5.1 Tracking with Elastic Structure of Local Patches 5.1.1 Sequential Bayesian Inference Framework 5.1.2 Elastic Structure of Local Patches 5.1.3 Modeling a Single Patch 5.1.4 Modeling the Relationship between Patches 5.1.5 Model Update 5.1.6 Hierarchical Diffusion 5.1.7 Summary of the Proposed Method 5.2 Experiments 5.2.1 Parameter Effects 5.2.2 Performance Evaluation 5.2.3 Discussion on Translation, Rotation, Illumination Changes 5.2.4 Discussion on Partial Occlusions 5.2.5 Discussion on Non-Rigid Deformations 5.2.6 Discussion on Additional Cases 5.2.7 Summary of Tracking Results 5.2.8 Effectiveness of Hierarchical Diffusion 5.2.9 Limitations 5.3 Remarks and Discussion
6 Concluding Remarks and Future Works
Bibliography
Abstract in Korean
Docto
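
The abstract's point that a dual-mode model keeps foreground pixels from contaminating the background, while still adapting to lasting background changes, can be illustrated with a minimal sketch. This is not the thesis's DMKM. It assumes one grayscale intensity per model, a Gaussian mean and variance per mode, an age-based learning rate of 1/(age + 1), and a swap rule that promotes the candidate mode once it is older than the background mode. The names `Mode`, `DualModeModel`, and `observe`, and all constants, are hypothetical, and camera-motion compensation by mixing neighboring models is left out.

```python
INIT_VAR = 400.0   # assumed initial variance of a freshly created mode
MIN_VAR = 25.0     # assumed variance floor so the match test never collapses
MATCH_K = 4.0      # assumed match threshold: (x - mean)^2 < MATCH_K * var


class Mode:
    """One Gaussian mode (mean, variance) plus the number of updates it has seen."""

    def __init__(self):
        self.mean = 0.0
        self.var = INIT_VAR
        self.age = 0

    def matches(self, x):
        # An empty mode (age 0) matches nothing.
        return self.age > 0 and (x - self.mean) ** 2 < MATCH_K * self.var

    def update(self, x):
        if self.age == 0:
            # First observation initializes the mode.
            self.mean, self.var, self.age = x, INIT_VAR, 1
            return
        rate = 1.0 / (self.age + 1)          # older modes change more slowly
        diff = x - self.mean
        self.mean += rate * diff
        self.var = max(MIN_VAR, (1.0 - rate) * self.var + rate * diff * diff)
        self.age += 1


class DualModeModel:
    """Background mode plus a candidate mode that absorbs non-matching samples."""

    def __init__(self):
        self.background = Mode()
        self.candidate = Mode()

    def observe(self, x):
        """Update the model with intensity x; return True if x looks like foreground."""
        if self.background.age == 0 or self.background.matches(x):
            self.background.update(x)
        elif self.candidate.matches(x):
            self.candidate.update(x)
        else:
            # Neither mode explains x: restart the candidate from this sample.
            self.candidate = Mode()
            self.candidate.update(x)
        # A candidate that has persisted longer than the background replaces it.
        if self.candidate.age > self.background.age:
            self.background, self.candidate = self.candidate, Mode()
        return not self.background.matches(x)
```

In this sketch, a short-lived object only ever updates the candidate mode, so the background mean is left untouched, whereas a change that persists for longer than the background's age eventually takes over as the new background.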

    Computer vision models in surveillance robotics

    2009/2010. In this thesis, we developed algorithms that use visual information to perform, in real time, the detection, recognition, and classification of moving objects, independently of environmental conditions and with the best possible accuracy. To this end, we developed several computer vision concepts, namely the identification of objects of interest across the whole visual scene (monocular or stereo) and their classification. In the course of the research, several approaches were tried, including the detection of possible candidates through image segmentation with weak classifiers and centroids, image segmentation algorithms strengthened by stereo information and noise reduction, and the combination of popular features, such as scale-invariant features (SIFT), with distance information. We developed two broad categories of solutions, associated with the type of system used. With a moving camera, we favored the detection of known objects by scanning the image; with a fixed camera, we also used algorithms for detecting moving objects in the foreground (foreground detection). In the case of foreground detection, the detection and classification rate increases if the quality of the extracted objects is high. We propose methods to reduce the effects of shadows, illumination, and repetitive movements produced by moving objects. An important aspect studied is the possibility of using algorithms for detecting moving objects with a moving camera. Efficient solutions are becoming increasingly complex, but the computing tools for running the algorithms are also more powerful, and in recent years graphics card (GPU) architectures have offered great potential. 
We proposed a solution for GPU architectures that manages the background images in order to increase detection performance. In this thesis we studied the detection and tracking of people for applications such as the prevention of risk situations (road crossing) and counting for traffic analysis. We studied these problems and explored various aspects of detecting people and groups, and of detection in crowded scenes. However, in a generic environment it is impossible to predict the configuration of objects that will be captured by the camera. In these cases, it is necessary to "abstract the concept" of objects. With this requirement in mind, we explored the properties of stochastic methods and show that good classification rates can be obtained provided that the training set is large enough. A flexible framework must be able to detect moving regions and recognize the objects of interest. We developed a framework for handling detection and classification problems. Compared with other methods, the proposed methods offer a flexible framework for object detection and classification that can be used efficiently in different indoor and outdoor environments. XXII Cicl