
    Motion Detection in Low Resolution Grayscale Videos Using Fast Normalized Cross Correlation on GP-GPU

    Motion estimation (ME) has been widely used in many computer vision applications, such as object tracking, object detection, pattern recognition and video compression. The most popular block-based similarity measures are the sum of absolute differences (SAD), the sum of squared differences (SSD) and normalized cross correlation (NCC). NCC is more robust to illumination changes than SAD and SSD. However, NCC is computationally expensive, and applying it with a full (exhaustive) search further increases the computation time. A relatively efficient way of calculating NCC, referred to as fast NCC (FCC), pre-computes sum tables to perform the normalization. In this paper we propose a real-time implementation of the full-search FCC algorithm for grayscale videos using NVIDIA's Compute Unified Device Architecture (CUDA). We present fine-grained optimization techniques for fully exploiting the computational capacity of CUDA. Novel parallelization strategies that extract data parallelism substantially reduce the computation time of exhaustive FCC. We show that by efficiently using the global, shared and texture memories available in CUDA, we obtain a speedup of the order of 10x compared with the sequential implementation of FCC.
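    The sum-table idea behind FCC can be illustrated with a small sequential sketch. It is a plain-Python illustration of full-search NCC in which the per-window sums needed for normalization come from summed-area tables (four lookups per window), not the authors' CUDA implementation; the function names are hypothetical, and the numerator is still computed directly for each window.

```python
import math

def integral_tables(img):
    """Summed-area tables of values and squared values, padded with a zero
    row/column so that S[y][x] is the sum of img over rows < y, cols < x."""
    h, w = len(img), len(img[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    S2 = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = row_sum2 = 0.0
        for x in range(w):
            v = img[y][x]
            row_sum += v
            row_sum2 += v * v
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
            S2[y + 1][x + 1] = S2[y][x + 1] + row_sum2
    return S, S2

def box_sum(T, y, x, bh, bw):
    # Sum over the bh x bw window with top-left corner (y, x): four lookups.
    return T[y + bh][x + bw] - T[y][x + bw] - T[y + bh][x] + T[y][x]

def fast_ncc_search(frame, block):
    """Exhaustive search of `block` in `frame`.
    Returns ((y, x) of the best window's top-left corner, best NCC score)."""
    bh, bw = len(block), len(block[0])
    n = bh * bw
    b_mean = sum(map(sum, block)) / n
    # Zero-mean template and its norm are computed once, outside the search.
    t = [[v - b_mean for v in row] for row in block]
    t_norm = math.sqrt(sum(v * v for row in t for v in row))
    S, S2 = integral_tables(frame)
    best, best_pos = -2.0, None
    for y in range(len(frame) - bh + 1):
        for x in range(len(frame[0]) - bw + 1):
            # The template sums to zero, so the window mean cancels out of
            # the numerator and raw pixel values can be used directly.
            num = sum(frame[y + i][x + j] * t[i][j]
                      for i in range(bh) for j in range(bw))
            s = box_sum(S, y, x, bh, bw)
            s2 = box_sum(S2, y, x, bh, bw)
            var = s2 - s * s / n  # sum of squared deviations in the window
            if var <= 0 or t_norm == 0:
                continue  # flat window or flat template: NCC undefined
            score = num / (math.sqrt(var) * t_norm)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best
```

    Because NCC is invariant to affine intensity changes, a flat region carries no information and is skipped rather than scored. On a GPU, each candidate position would naturally map to one thread, which is the kind of data parallelism the abstract refers to.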

    Selected topics in video coding and computer vision

    Video applications ranging from multimedia communication to computer vision have been studied extensively in past decades. However, new applications continue to raise questions that existing techniques answer only partially. This thesis studies three selected topics related to video: intra prediction in block-based video coding, pedestrian detection and tracking in infrared imagery, and multi-view video alignment.
    In the state-of-the-art video coding standard H.264/AVC, intra prediction is defined on a hierarchical quad-tree block partitioning structure, which fails to exploit the geometric constraints of edges. We propose a geometry-adaptive block partitioning structure and a new intra prediction algorithm named geometry-adaptive intra prediction (GAIP). We also develop a new texture prediction algorithm, geometry-adaptive intra displacement prediction (GAIDP), by extending the original intra displacement prediction (IDP) algorithm with geometry-adaptive block partitions. Simulations on various test sequences demonstrate that the intra coding performance of H.264/AVC can be significantly improved by incorporating the proposed geometry-adaptive algorithms.
    In recent years, owing to the decreasing cost of thermal sensors, pedestrian detection and tracking in infrared imagery has become a topic of interest for night-vision and all-weather surveillance applications. We propose a novel approach for detecting and tracking pedestrians in infrared imagery based on a layered representation of infrared images. Pedestrians are detected in the foreground layer by a Principal Component Analysis (PCA) based scheme using the appearance cue. To facilitate pedestrian tracking, we formulate the problem of shot segmentation and present a graph-matching-based tracking algorithm. Simulations on both the OSU Infrared Image Database and the WVU Infrared Video Database demonstrate the accuracy and robustness of our algorithms.
    Multi-view video alignment facilitates the fusion of non-synchronized multi-view video sequences for applications including automatic video-based surveillance and video metrology. In this thesis, we propose an accurate multi-view video alignment algorithm that iteratively aligns two sequences in space and time. To achieve accurate sub-frame temporal alignment, we generalize the existing phase-correlation algorithm to the 3-D case. We also present a novel method for obtaining the ground truth of the temporal alignment using supplementary audio signals sampled at a much higher rate. The accuracy of our algorithm is verified by simulations on real-world sequences.
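    The phase-correlation principle that the thesis generalizes to 3-D can be shown in one dimension. The sketch below estimates an integer circular shift between two signals by normalizing the cross-power spectrum and locating the peak of its inverse transform. It uses a naive O(n^2) DFT to stay dependency-free and does not attempt the thesis's sub-frame refinement or 3-D extension; the function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out

def phase_correlation_shift(a, b, eps=1e-12):
    """Estimate the integer shift d such that b[i] == a[(i - d) % n]."""
    n = len(a)
    A, B = dft(a), dft(b)
    R = []
    for fa, fb in zip(A, B):
        cross = fb * fa.conjugate()
        mag = abs(cross)
        # Keep only the phase; bins with no energy contribute nothing.
        R.append(cross / mag if mag > eps else 0)
    r = dft(R, inverse=True)  # ideally a unit impulse located at d
    peak = max(range(n), key=lambda i: r[i].real)
    # Map the circular index to a signed shift in (-n/2, n/2].
    return peak - n if peak > n // 2 else peak
```

    Discarding the spectral magnitude is what makes the peak sharp and largely insensitive to global intensity differences between the two sequences. Sub-sample accuracy would require interpolating around the peak, which this sketch does not do.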

    Complexity adaptation in video encoders for power limited platforms

    With the emergence of video services on power-limited platforms, it is necessary to consider both performance-centric and constraint-centric signal processing techniques. Traditionally, video applications are constrained by bandwidth, computational resources, or both. The recent H.264/AVC video compression standard offers significantly improved efficiency and flexibility compared with previous standards, which reduces the emphasis on bandwidth. However, its high computational complexity is a problem for codecs running on power-limited platforms. Therefore, a technique that addresses both complexity and bandwidth in a single framework should be considered. In this thesis we investigate complexity adaptation of a video coder, which focuses on managing computational complexity and provides significant complexity savings when applied to recent standards. It consists of three sub-functions designed specifically to reduce complexity, namely Variable Block Size (VBS) partitioning, fast motion estimation and skip macroblock detection, together with a complexity adaptation framework that uses them. First, a VBS partitioning algorithm based on the Walsh-Hadamard Transform (WHT) is presented. The key idea is to segment regions of an image into edges or flat regions, based on the fact that prediction errors are mainly affected by edges. Second, a fast motion estimation algorithm called Fast Walsh Boundary Search (FWBS) is presented for the VBS-partitioned images; it outperforms other commonly used fast algorithms. Third, a skip macroblock detection algorithm is proposed for use prior to motion estimation, which estimates the Discrete Cosine Transform (DCT) coefficients after quantisation. A new orthogonal transform called the S-transform is presented for predicting Integer DCT coefficients from Walsh-Hadamard Transform coefficients. Complexity savings are achieved by deciding which macroblocks need to be processed and which can be skipped. Simulation results show that the proposed algorithm achieves significant complexity savings with a negligible loss in rate-distortion performance. Finally, a complexity adaptation framework that combines all three techniques is proposed for maximizing the perceptual quality of coded video on a complexity-constrained platform.
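    To make the WHT-based edge/flat distinction concrete, the sketch below applies an unnormalized 4x4 Walsh-Hadamard transform (Y = H X H^T, with H in natural Sylvester order) and labels a block by its AC energy. The AC-energy rule and the threshold value are hypothetical stand-ins chosen for illustration; the thesis's actual segmentation criterion is not given in the abstract.

```python
# 4x4 Hadamard matrix in natural (Sylvester) order; entries are +1/-1.
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def wht4x4(block):
    """Unnormalized 2-D WHT: Y = H X H^T (additions and subtractions only)."""
    return matmul(matmul(H4, block), transpose(H4))

def classify_block(block, threshold=1000):
    """Hypothetical rule: a block whose AC energy exceeds `threshold`
    is treated as containing an edge; otherwise it is flat."""
    coeffs = wht4x4(block)
    total = sum(c * c for row in coeffs for c in row)
    ac_energy = total - coeffs[0][0] ** 2  # remove the DC term
    return "edge" if ac_energy > threshold else "flat"
```

    The WHT needs only additions and subtractions, which is why it suits complexity-constrained encoders. Because the transform is unnormalized, any threshold must scale with block size and pixel range.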

    Detection and Recognition of Traffic Signs Inside the Attentional Visual Field of Drivers

    Traffic sign detection and recognition systems are essential components of Advanced Driver Assistance Systems and self-driving vehicles. In this contribution we present a vision-based framework that detects and recognizes traffic signs inside the attentional visual field of drivers. The technique takes advantage of the driver's 3D absolute gaze point, obtained through the combined use of a front-view stereo imaging system and a non-contact 3D gaze tracker. For detection, we use Histogram of Oriented Gradients (HOG) features with a linear Support Vector Machine (SVM) classifier. Recognition is performed using Scale-Invariant Feature Transform (SIFT) features and color information. Our technique detects and recognizes signs that are in the driver's field of view and also indicates when the driver has missed one or more signs.
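    One way to decide whether a detected sign lies inside the driver's attentional field is to compare the gaze direction with the direction to the sign. The sketch below treats the field as a cone around the eye-to-gaze-point ray and flags signs that never fall inside it as missed. The cone model, the fixed eye position and the 10-degree half-angle are all assumptions for illustration, not the method described above.

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp rounding error
    return math.degrees(math.acos(cos))

def in_attentional_field(eye, gaze_point, sign_pos, half_angle_deg=10.0):
    """Hypothetical cone test around the eye -> 3D gaze point ray."""
    gaze_dir = [g - e for g, e in zip(gaze_point, eye)]
    sign_dir = [s - e for s, e in zip(sign_pos, eye)]
    return angle_deg(gaze_dir, sign_dir) <= half_angle_deg

def missed_signs(eye, gaze_points, signs, half_angle_deg=10.0):
    """Ids of signs that were never inside the field for any gaze sample."""
    return [sid for sid, pos in signs.items()
            if not any(in_attentional_field(eye, g, pos, half_angle_deg)
                       for g in gaze_points)]
```

    Usage: with the eye at the origin and a single gaze point 20 m straight ahead, a sign 1 m to the side at the same depth (about 2.9 degrees off-axis) counts as attended, while one at 45 degrees is reported as missed.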

    Underwater Localization in Complex Environments

    The ability of an autonomous underwater vehicle (AUV) to localize itself in a complex environment, and to detect relevant environmental features, is crucial for successful navigation. However, this is particularly challenging underwater: signals from global positioning systems and other radio-frequency signals are rapidly attenuated, and scattering and reflection make a filtering process necessary. A complex environment is defined here as a scenario with objects detached from the walls; for example, an object may vary in orientation, so its position is not always known. Example scenarios include a harbour, a tank or even a dam reservoir, where an AUV operating within the walls may need to localize itself relative to the other vehicles in the area and position itself relative to one of them to observe, analyse or scan it. Autonomous vehicles employ many different types of sensors to localize themselves and perceive their environments, and they depend on on-board computers to perform autonomous driving tasks.
    This dissertation addresses a concrete problem: locating a cable suspended in the water column in a known region of the sea and navigating according to it. Although the cable's position in the world is well known, its dynamics do not allow its exact location to be known. Therefore, for the vehicle to localize itself relative to the cable so that the cable can be inspected, localization must be based on optical and acoustic sensors. This study explores the processing and analysis of optical and acoustic images, acquired by a camera and by a mechanically scanned imaging sonar (MSIS) respectively, in order to extract relevant environmental features that allow the vehicle's location to be estimated. The points of interest extracted from each sensor feed a position estimator implemented as an Extended Kalman Filter (EKF), which estimates the position of the cable; feedback from the filter is then used to improve the point-of-interest extraction processes.
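    The EKF step can be illustrated with a deliberately small model. In this sketch the state is a single 2-D cable point, the motion model is a random walk whose process noise q stands in for the cable dynamics, and the measurement is a sonar range from a known vehicle position. All of these modelling choices are assumptions for illustration; the dissertation's state vector, sensor models and noise values are not given in the abstract.

```python
import math

def ekf_predict(x, P, q):
    """Random-walk prediction: the cable point is expected to stay put,
    but its uncertainty grows by q per step (hypothetical model)."""
    return list(x), [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]

def ekf_update_range(x, P, vehicle, z, r_var):
    """EKF update with a scalar range measurement z from `vehicle`.
    x: [x, y] cable-point estimate, P: 2x2 covariance, r_var: range variance."""
    dx, dy = x[0] - vehicle[0], x[1] - vehicle[1]
    pred = math.hypot(dx, dy)  # predicted range h(x)
    H = [dx / pred, dy / pred]  # Jacobian of h with respect to (x, y)
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var  # innovation variance
    K = [PHt[0] / S, PHt[1] / S]  # Kalman gain
    innov = z - pred
    x_new = [x[0] + K[0] * innov, x[1] + K[1] * innov]
    # P_new = (I - K H) P, where (H P)_j = H0 * P[0][j] + H1 * P[1][j]
    P_new = [[P[i][j] - K[i] * (H[0] * P[0][j] + H[1] * P[1][j])
              for j in range(2)] for i in range(2)]
    return x_new, P_new
```

    A single range only constrains the distance to the vehicle, so the update shrinks the covariance along the line of sight. Fusing ranges from several vehicle positions, or adding camera bearings, is what makes the full position observable.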

    Homogeneous and Heterogeneous Face Recognition: Enhancing, Encoding and Matching for Practical Applications

    Face recognition is the automatic processing of face images in order to identify individuals. The recognition task becomes especially challenging in surveillance applications, where images are acquired at long range in difficult environments. Short Wave Infrared (SWIR) is an emerging imaging modality that can produce clear long-range images in difficult environments or at night. Despite the benefits of SWIR technology, matching SWIR images against a gallery of visible images is challenging, since the photometric properties of images in the two spectral bands are highly distinct.
    In this dissertation, we describe a cross-spectral matching method that encodes the magnitude and phase of multi-spectral face images filtered with a bank of Gabor filters. The magnitude of the filtered images is encoded with the Simplified Weber Local Descriptor (SWLD) and Local Binary Pattern (LBP) operators. The phase is encoded with the Generalized Local Binary Pattern (GLBP) operator. The encoded multi-spectral images are mapped into a histogram representation and cross-matched using the symmetric Kullback-Leibler distance. The performance of the algorithm is demonstrated on the TINDERS database, which contains long-range SWIR and color images acquired at distances of 2, 50 and 106 meters.
    Apart from long acquisition range, multi-spectral face images may exhibit other variations and distortions, such as pose variation, motion and out-of-focus blur, and uneven illumination. These distortions can greatly affect the recognition performance of a face matcher. It is therefore important to ensure that matching is performed on high-quality images; poor-quality images must be either enhanced or discarded. This dissertation addresses the problem of selecting good-quality samples.
    The last chapters of the dissertation propose a number of modifications to the cross-spectral matching algorithm for matching low-resolution color images in near-real time. We show that the method that encodes the magnitude of Gabor-filtered images with the SWLD operator achieves high recognition rates. The modified method (Gabor-SWLD) is adopted in a camera network setup in which cameras acquire several views of the same individual. The designed algorithm and software are fully automated and optimized to perform recognition in near-real time. We evaluate the recognition performance and processing time of the method on a small dataset collected at WVU.
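    The encode-then-match pipeline can be sketched in miniature: compute basic 8-neighbour LBP codes, turn them into a normalized histogram, and compare two histograms with the symmetric Kullback-Leibler distance D(p||q) + D(q||p). This sketch omits the Gabor filter bank and the SWLD and GLBP operators. The neighbour ordering and the small epsilon added to empty bins are assumptions, and the function names are hypothetical.

```python
import math

# Neighbour order (clockwise from top-left); bit k is set when neighbour k
# is >= the centre pixel. This ordering is an illustrative choice.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior pixel."""
    codes = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            codes.append(code)
    return codes

def histogram(codes, bins=256):
    """Normalized histogram of LBP codes (assumes at least one code)."""
    h = [0] * bins
    for c in codes:
        h[c] += 1
    return [v / len(codes) for v in h]

def symmetric_kl(p, q, eps=1e-10):
    """D(p||q) + D(q||p) = sum (p_i - q_i) * log(p_i / q_i).
    eps keeps empty bins from producing log(0) (an assumption)."""
    d = 0.0
    for a, b in zip(p, q):
        a, b = a + eps, b + eps
        d += (a - b) * math.log(a / b)
    return d
```

    Each term (p_i - q_i) log(p_i / q_i) is non-negative, so the distance is zero only for identical histograms and, unlike plain KL divergence, does not depend on which image is treated as the probe.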

    Connected Attribute Filtering Based on Contour Smoothness
