1,615 research outputs found

    Target detection, tracking, and localization using multi-spectral image fusion and RF Doppler differentials

    Get PDF
    It is critical for defense and security applications to have a high probability of detection and low false alarm rate while operating over a wide variety of conditions. Sensor fusion, which is the the process of combining data from two or more sensors, has been utilized to improve the performance of a system by exploiting the strengths of each sensor. This dissertation presents algorithms to fuse multi-sensor data that improves system performance by increasing detection rates, lowering false alarms, and improving track performance. Furthermore, this dissertation presents a framework for comparing algorithm error for image registration which is a critical pre-processing step for multi-spectral image fusion. First, I present an algorithm to improve detection and tracking performance for moving targets in a cluttered urban environment by fusing foreground maps from multi-spectral imagery. Most research in image fusion consider visible and long-wave infrared bands; I examine these bands along with near infrared and mid-wave infrared. To localize and track a particular target of interest, I present an algorithm to fuse output from the multi-spectral image tracker with a constellation of RF sensors measuring a specific cellular emanation. The fusion algorithm matches the Doppler differential from the RF sensors with the theoretical Doppler Differential of the video tracker output by selecting the sensor pair that minimizes the absolute difference or root-mean-square difference. Finally, a framework to quantify shift-estimation error for both area- and feature-based algorithms is presented. By exploiting synthetically generated visible and long-wave infrared imagery, error metrics are computed and compared for a number of area- and feature-based shift estimation algorithms. A number of key results are presented in this dissertation. The multi-spectral image tracker improves the location accuracy of the algorithm while improving the detection rate and lowering false alarms for most spectral bands. All 12 moving targets were tracked through the video sequence with only one lost track that was later recovered. Targets from the multi-spectral tracking algorithm were correctly associated with their corresponding cellular emanation for all targets at lower measurement uncertainty using the root-mean-square difference while also having a high confidence ratio for selecting the true target from background targets. For the area-based algorithms and the synthetic air-field image pair, the DFT and ECC algorithms produces sub-pixel shift-estimation error in regions such as shadows and high contrast painted line regions. The edge orientation feature descriptors increase the number of sub-field estimates while improving the shift-estimation error compared to the Lowe descriptor

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    Foreground Removal in a Multi-Camera System

    Get PDF
    Traditionally, whiteboards have been used to brainstorm, teach, and convey ideas with others. However distributing whiteboard content remotely can be challenging. To solve this problem, A multi-camera system was developed which can be scaled to broadcast an arbitrarily large writing surface while removing objects not related to the whiteboard content. Related research has been performed previously to combine multiple images together, identify and remove unrelated objects, also referred to as foreground, in a single image and correct for warping differences in camera frames. However, this is the first time anyone has attempted to solve this problem using a multi-camera system. The main components of this problem include stitching the input images together, identifying foreground material, and replacing the foreground information with the most recent background (desired) information. This problem can be subdivided into two main components: fusing multiple images into one cohesive frame, and detecting/removing foreground objects. for the first component, homographic transformations are used to create a mathematical mapping from the input image to the desired reference frame. Blending techniques are then applied to remove artifacts that remain after the perspective transform. For the second, statistical tests and modeling in conjunction with additional classification algorithms were used

    Um estudo comparativo das abordagens de detecção e reconhecimento de texto para cenários de computação restrita

    Get PDF
    Orientadores: Ricardo da Silva Torres, Allan da Silva PintoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Textos são elementos fundamentais para uma efetiva comunicação em nosso cotidiano. A mobilidade de pessoas e veículos em ambientes urbanos e a busca por um produto de interesse em uma prateleira de supermercado são exemplos de atividades em que o entendimento dos elementos textuais presentes no ambiente são essenciais para a execução da tarefa. Recentemente, diversos avanços na área de visão computacional têm sido reportados na literatura, com o desenvolvimento de algoritmos e métodos que objetivam reconhecer objetos e textos em cenas. Entretanto, a detecção e reconhecimento de textos são problemas considerados em aberto devido a diversos fatores que atuam como fontes de variabilidades durante a geração e captura de textos em cenas, o que podem impactar as taxas de detecção e reconhecimento de maneira significativa. Exemplo destes fatores incluem diferentes formas dos elementos textuais (e.g., circular ou em linha curva), estilos e tamanhos da fonte, textura, cor, variação de brilho e contraste, entre outros. Além disso, os recentes métodos considerados estado-da-arte, baseados em aprendizagem profunda, demandam altos custos de processamento computacional, o que dificulta a utilização de tais métodos em cenários de computação restritiva. Esta dissertação apresenta um estudo comparativo de técnicas de detecção e reconhecimento de texto, considerando tanto os métodos baseados em aprendizado profundo quanto os métodos que utilizam algoritmos clássicos de aprendizado de máquina. Esta dissertação também apresenta um método de fusão de caixas delimitadoras, baseado em programação genética (GP), desenvolvido para atuar tanto como uma etapa de pós-processamento, posterior a etapa de detecção, quanto para explorar a complementariedade dos algoritmos de detecção de texto investigados nesta dissertação. De acordo com o estudo comparativo apresentado neste trabalho, os métodos baseados em aprendizagem profunda são mais eficazes e menos eficientes, em comparação com os métodos clássicos da literatura e considerando as métricas adotadas. Além disso, o algoritmo de fusão proposto foi capaz de aprender informações complementares entre os métodos investigados nesta dissertação, o que resultou em uma melhora das taxas de precisão e revocação. Os experimentos foram conduzidos considerando os problemas de detecção de textos horizontais, verticais e de orientação arbitráriaAbstract: Texts are fundamental elements for effective communication in our daily lives. The mobility of people and vehicles in urban environments and the search for a product of interest on a supermarket shelf are examples of activities in which the understanding of the textual elements present in the environment is essential to succeed in such tasks. Recently, several advances in computer vision have been reported in the literature, with the development of algorithms and methods that aim to recognize objects and texts in scenes. However, text detection and recognition are still open problems due to several factors that act as sources of variability during scene text generation and capture, which can significantly impact detection and recognition rates of current algorithms. Examples of these factors include different shapes of textual elements (e.g., circular or curved), font styles and sizes, texture, color, brightness and contrast variation, among others. Besides, recent state-of-the-art methods based on deep learning demand high computational processing costs, which difficult their use in restricted computing scenarios. This dissertation presents a comparative study of text detection and recognition techniques, considering methods based on deep learning and methods that use classical machine learning algorithms. This dissertation also presents an algorithm for fusing bounding boxes, based on genetic programming (GP), developed to act as a post-processing step for a single text detector and to explore the complementarity of text detection algorithms investigated in this dissertation. According to the comparative study presented in this work, the methods based on deep learning are more effective and less efficient, in comparison to classic methods for text detection investigated in this work, considering the adopted metrics. Furthermore, the proposed GP-based fusion algorithm was able to learn complementary information from the methods investigated in this dissertation, which resulted in an improvement of precision and recall rates. The experiments were conducted considering text detection problems involving horizontal, vertical and arbitrary orientationsMestradoCiência da ComputaçãoMestre em Ciência da ComputaçãoCAPE

    Advances in Object and Activity Detection in Remote Sensing Imagery

    Get PDF
    The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection tries to find objects of target classes with precise localisation in an image and assign each object instance a corresponding class label. At the same time, activity recognition aims to determine the actions or activities of an agent or group of agents based on sensor or video observation data. It is a very important and challenging problem to detect, identify, track, and understand the behaviour of objects through images and videos taken by various cameras. Together, objects and their activity recognition in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in the field of object and activity recognition. In particular, many researchers have proposed application domains to identify objects and their specific behaviours from air and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms

    DOCUMENT TEXT DETECTION IN VIDEO FRAMES ACQUIRED BY A SMARTPHONE BASED ON LINE SEGMENT DETECTOR AND DBSCAN CLUSTERING

    Get PDF
    Automatic document text detection in video is an important task and a prerequisite for video retrieval, annotation, recognition, indexing and content analysis. In this paper, we present an effective and efficient model for detecting the page outlines within frames of video clip acquired by a Smartphone. The model consists of four stages: In the first stage, all line segments of each video frame are detected by LSD method. In the second stage, the line segments are grouped into clusters using the DBSCAN clustering algorithm, and then a prior knowledge is used in order to discover the cluster of page document from the background. In the third and fourth stages, a length and an angle filtering processes are performed respectively on the cluster of line segments. Finally a sorting operation is applied in order to detect the quadrilateral coordinates of the document page in the input video frame. The proposed model is evaluated on the ICDAR 2015 Smartphone Capture OCR dataset. Experimental results and comparative study show that our model can achieve encouraging and useful results and works efficiently even under different classes of documents

    Deep Learning based 3D Segmentation: A Survey

    Full text link
    3D object segmentation is a fundamental and challenging problem in computer vision with applications in autonomous driving, robotics, augmented reality and medical image analysis. It has received significant attention from the computer vision, graphics and machine learning communities. Traditionally, 3D segmentation was performed with hand-crafted features and engineered methods which failed to achieve acceptable accuracy and could not generalize to large-scale data. Driven by their great success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks as well. This has led to an influx of a large number of methods in the literature that have been evaluated on different benchmark datasets. This paper provides a comprehensive survey of recent progress in deep learning based 3D segmentation covering over 150 papers. It summarizes the most commonly used pipelines, discusses their highlights and shortcomings, and analyzes the competitive results of these segmentation methods. Based on the analysis, it also provides promising research directions for the future.Comment: Under review of ACM Computing Surveys, 36 pages, 10 tables, 9 figure
    corecore