
    Vehicle make and model recognition for intelligent transportation monitoring and surveillance.

    Vehicle Make and Model Recognition (VMMR) has become a significant subject of study because of its importance in numerous Intelligent Transportation Systems (ITS) applications, such as autonomous navigation, traffic analysis, traffic surveillance and security systems. A highly accurate, real-time VMMR system significantly reduces the resource overhead that would otherwise be required. VMMR is a multi-class classification task with a particular set of challenges, such as multiplicity and inter- and intra-make ambiguity among vehicle makes and models, which must be addressed efficiently and reliably to achieve a robust VMMR system. Motivated by the growing importance of vehicle make and model recognition, this dissertation presents a VMMR system that achieves very high accuracy and is robust to several of these challenges. We demonstrate that the VMMR problem can be addressed by locating the discriminative parts where the most significant appearance variations occur in each category and by learning expressive appearance descriptors. Based on these insights, we consider two data-driven frameworks: a Multiple-Instance Learning (MIL) based system using hand-crafted features, and an extension of deep neural networks using MIL. Our approach requires only image-level class labels, and the discriminative parts of each target class are selected in a fully unsupervised manner, without any part annotations or segmentation masks, which may be costly to obtain. This makes our system more intelligent, more scalable, and applicable to other fine-grained recognition tasks. We constructed a dataset of 291,752 images representing 9,170 different vehicles to validate and evaluate our approach. Experimental results demonstrate that localizing parts and estimating their discriminative power for categorization improves fine-grained categorization performance.
Extensive experiments show that our approaches yield superior results on images from our real-world VMMR dataset that are occluded, taken under low illumination, or captured from partial or even non-frontal camera views. The approaches presented here provide a highly accurate VMMR system for real-time applications in realistic environments. We also validate our system with a significant application of VMMR to ITS: automated vehicular surveillance. We show that this application can provide law enforcement agencies with efficient tools to search for a specific vehicle type, make, or model, and to track the path of a given vehicle using the positions of multiple cameras.
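The abstract does not give the exact MIL formulation. A common multiple-instance assumption, used in this minimal sketch, treats each image as a bag of patch descriptors (instances) that carries only an image-level label, and scores the bag by its best-scoring patch, so the winning patch plays the role of an automatically selected discriminative part. The linear per-class models, the 2-D descriptors and all function names are hypothetical placeholders, and training (for example an MI-SVM-style alternation between witness selection and refitting) is omitted.

```python
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def dot(w: Descriptor, x: Descriptor) -> float:
    """Inner product of a linear class model and a patch descriptor."""
    return sum(wi * xi for wi, xi in zip(w, x))


def bag_score(weights: Descriptor, bag: List[Descriptor]) -> float:
    """MIL max-pooling: an image (bag) scores as high as its best patch (instance)."""
    return max(dot(weights, patch) for patch in bag)


def witness_patch(weights: Descriptor, bag: List[Descriptor]) -> int:
    """Index of the patch that sets the bag score, i.e. the selected discriminative part."""
    return max(range(len(bag)), key=lambda i: dot(weights, bag[i]))


def classify_image(models: Dict[str, Descriptor], bag: List[Descriptor]) -> str:
    """Predict the label (e.g. a make/model) whose best-matching patch scores highest."""
    return max(models, key=lambda label: bag_score(models[label], bag))


if __name__ == "__main__":
    models = {"make_A": [1.0, 0.0], "make_B": [0.0, 1.0]}  # hypothetical linear models
    image = [[0.2, 0.1], [0.9, 0.3], [0.1, 0.4]]           # three patch descriptors
    print(classify_image(models, image))                    # make_A
    print(witness_patch(models["make_A"], image))           # 1
```

Max-pooling is only one bag-scoring choice; mean or softmax pooling over patches are common alternatives, and the dissertation's own choice may differ.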

    Design of support tools for logo detection in video sequences

    This Final Degree Project (Trabajo Fin de Grado) uses computer vision tools and concepts to develop analytical methods that process a video sequence and extract different types of parameters or data which, independently or chained together, can lead to the detection of logos (preloaded or not) in the frames of the sequence. Rather than focusing on a single concept within computer vision and image processing, the work tries to cover as many of the tools and concepts usable for logo detection as possible, whether based on colour or shape. The method starts by defining three pre-processing steps that, motivated by design heuristics, determine the areas where a logo is most likely to be located. Specifically, these are structural, saliency-based and colour-based strategies that progressively narrow down the regions in which the detection tasks are run. In addition, detecting static regions in the sequence prevents detections in those areas. Logo detection is achieved through a series of steps: the first and most innovative is the pre-processing, followed by image segmentation and interest-point matching to recognise a logo; the recognition is then checked by several techniques, including a perspective module that detects whether the match is consistent with the overall perspective of the shot. Logos are detected by measuring the degree of similarity between the transformed template and the candidate area. Experimental results on a set of selected sequences partially validate the design and the method for football broadcasts. However, the results also show the limitations and problems of the method when analysing sequences of other sports. Preliminary experiments on using the method to generate statistics for advertising analysis are also included, with promising results. Overall, the results suggest that pre-processing techniques can help in the task of automatic logo detection.
This work describes an automatic method for the detection of brand logos in sports sequences. The work starts by studying existing state-of-the-art solutions on the topic. From this study a set of conclusions is derived, and these are used to define the design of the proposed method. The method starts by defining three pre-processing methods which, motivated by design heuristics, determine the spatial areas where a logo is likely to be placed. Specifically, the methods use colour, structural and saliency-based strategies to constrain the areas in which the logo detection process takes place. On the candidate areas (those likely to contain a logo), a classical point-of-interest matching strategy is used to relate the candidate instances to a preloaded logo template. From these matches, an affine correction of the template is derived. Logos are detected by measuring the similarity between the transformed template and the candidate areas. Experimental results on a set of candidate sequences partially validate the design and development of the method for soccer sequences. However, the results also illustrate the method's drawbacks and limitations when analysing sequences of other sports. Furthermore, preliminary experiments on the use of the method for generating publicity statistics are also included, with promising results. Overall, the results suggest that pre-processing techniques may help in the task of automatic logo detection.
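The detection step described above (match interest points, derive an affine correction of the template, then measure similarity) can be sketched in pure Python. This is an illustration under stated assumptions, not the thesis's implementation: the least-squares affine fit assumes the point matches are already given and free of outliers (a real pipeline would typically wrap it in RANSAC), and zero-mean normalised cross-correlation stands in for the unspecified similarity measure. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate matches (e.g. collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back-substitution
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares affine map (a, b, c, d, e, f): u = a*x + b*y + c, v = d*x + e*y + f."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matched point pairs")
    ata = [[0.0] * 3 for _ in range(3)]  # normal equations: A^T A p = A^T u (and A^T v)
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = _solve3(ata, atu)
    d, e, f = _solve3(ata, atv)
    return (a, b, c, d, e, f)


def apply_affine(t: Affine, p: Point) -> Point:
    """Map a template point into frame coordinates."""
    a, b, c, d, e, f = t
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)


def ncc(p: Sequence[float], q: Sequence[float]) -> float:
    """Zero-mean normalised cross-correlation of two equal-size flattened patches, in [-1, 1]."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    num = sum((s - mp) * (t - mq) for s, t in zip(p, q))
    den = math.sqrt(sum((s - mp) ** 2 for s in p) * sum((t - mq) ** 2 for t in q))
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    template_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    true_t = (2.0, 0.5, 3.0, -1.0, 1.0, 1.0)
    frame_pts = [apply_affine(true_t, p) for p in template_pts]
    print(fit_affine(template_pts, frame_pts))       # recovers true_t up to rounding
    print(ncc([10, 20, 30, 40], [12, 22, 29, 41]))   # close to 1 for similar patches
```

A candidate area would then be accepted as a logo when the similarity between the warped template and the area exceeds a chosen threshold; the warping itself and the threshold value are left out here.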

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval, as complex, heterogeneous collections become accessible to virtually everyone via the web. This increasing diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, processing documents with unconstrained layouts and complex formatting often requires effectively leveraging broad contextual knowledge. In this dissertation, we first present a novel approach to document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant local shape feature that is generic enough to be detected repeatably and is segmentation-free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. The idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine-printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to OCR of printed text. We propose a novel multi-scale framework that jointly detects and segments signatures in document images, based on structural saliency under a signature production model. We formulate signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents.
In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results because linguistic context is absent. Our approach enables collective learning of inference rules based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other aspects of the web search experience. The context of web user activities reveals users' preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.
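The abstract does not spell out the intentional surfer model. For orientation only, this sketch shows the classic random-surfer PageRank power iteration with one assumption of ours: rank flows along hypothetical click counts aggregated from user sessions instead of being spread uniformly over a page's outlinks. It should not be read as the dissertation's algorithm; the function name, damping factor and data are illustrative.

```python
from typing import Dict


def weighted_pagerank(clicks: Dict[str, Dict[str, float]], damping: float = 0.85,
                      tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration where a page's rank flows to its targets in proportion to clicks.

    clicks[p][q] is the (hypothetical) number of observed transitions from page p to q.
    """
    pages = sorted(set(clicks) | {q for out in clicks.values() for q in out})
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump (teleport) share
        for p in pages:
            out = clicks.get(p, {})
            total = sum(out.values())
            if total <= 0:
                # Dangling page with no observed clicks: spread its rank uniformly.
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q, count in out.items():
                    new[q] += damping * rank[p] * count / total
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    sessions = {"a": {"b": 3, "c": 1}, "b": {"c": 1}, "c": {"a": 1}}
    ranks = weighted_pagerank(sessions)
    print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

The ranks always sum to one, because the teleport share and the damped flow together redistribute exactly the previous total.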

    Adaptive detection and tracking using multimodal information

    This thesis describes work on fusing data from multiple information sources and focuses on two main areas: adaptive object detection and adaptive object tracking in automated vision scenarios. The work on adaptive object detection explores a new paradigm in dynamic parameter selection: thresholds for object detection are selected so as to maximise agreement between pairs of sources. Object tracking, a technique complementary to object detection, is also explored in a multi-source context, and an efficient framework for robust tracking, termed the Spatiogram Bank tracker, is proposed to overcome the difficulties of traditional histogram tracking. As well as a theoretical analysis of the proposed methods, specific example applications are given for both the detection and the tracking aspects, using thermal infrared and visible-spectrum video data as well as other multimodal information sources.
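A minimal sketch of threshold selection by cross-source agreement, assuming two co-registered per-pixel score maps (for example thermal and visible). Two choices here are ours, not the thesis's: agreement is measured with Cohen's kappa, which corrects for chance so that trivially setting both detectors to "everything" or "nothing" does not score highly, and the thresholds are found by exhaustive grid search over hypothetical candidate lists.

```python
from itertools import product
from typing import Sequence, Tuple


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Chance-corrected agreement between two equal-length, non-empty binary masks."""
    n = len(a)
    if n == 0:
        raise ValueError("masks must be non-empty")
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected >= 1.0:
        # Both masks constant and identical (all on or all off): kappa is undefined,
        # so this trivial solution is treated as zero agreement.
        return 0.0
    return (observed - expected) / (1 - expected)


def select_thresholds(scores_a: Sequence[float], scores_b: Sequence[float],
                      candidates_a: Sequence[float],
                      candidates_b: Sequence[float]) -> Tuple[float, float, float]:
    """Grid-search one threshold per source to maximise agreement of their detections.

    Returns (kappa, threshold_a, threshold_b); ties keep the first pair in grid order.
    """
    scored = (
        (cohen_kappa([s >= ta for s in scores_a], [s >= tb for s in scores_b]), ta, tb)
        for ta, tb in product(candidates_a, candidates_b)
    )
    return max(scored, key=lambda t: t[0])


if __name__ == "__main__":
    thermal = [0.9, 0.8, 0.1, 0.2, 0.85, 0.15]   # hypothetical per-pixel scores
    visible = [0.6, 0.55, 0.3, 0.2, 0.7, 0.25]
    print(select_thresholds(thermal, visible, [0.0, 0.5, 0.95], [0.0, 0.4, 0.8]))
    # (1.0, 0.5, 0.4)
```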

    Visual complexity in human-machine interaction = Visuelle Komplexität in der Mensch-Maschine Interaktion

    Visual complexity is often defined as the degree of detail or intricacy in an image (Snodgrass & Vanderwart, 1980). It affects many areas of human life, including those that involve interaction with technology. For example, effects of visual complexity have been demonstrated in road traffic (Edquist et al., 2012; Mace & Pollack, 1983) and in interaction with software (Alemerien & Magel, 2014) or websites (Deng & Poole, 2010; Tuch et al., 2011). Although research on visual complexity goes back to the Gestalt psychologists, who anchored the importance of simplicity and complexity in the perceptual process with the Gestalt principle of Prägnanz (Koffka, 1935; Wertheimer, 1923), neither the factors influencing visual complexity nor its relationships with eye movements or mental workload have yet been conclusively investigated. The present work addresses these points through four empirical studies. Study 1 examines the importance of the construct in human-machine interaction, based on the complexity of videos in control rooms and its effects on subjective, physiological and performance measures of mental workload. Study 2 takes a closer look at the dimensional structure of visual complexity and the importance of different influencing factors, using a variety of stimulus material. Study 3 uses an experimental approach to investigate how factors influencing visual complexity affect subjective ratings and a selection of ocular parameters, with simple black-and-white shape patterns as stimuli. In addition, various computational and ocular parameters are used to predict these complexity ratings. Study 4 transfers this approach to screenshots of websites to examine its predictive power in an applied setting.
Together with previous research, the relationships found with mental workload in particular suggest that visual complexity is a relevant construct in human-machine interaction. Quantitative and structural aspects in particular, and potentially other aspects as well, influence both the assessment of visual complexity and the viewers' gaze behaviour. The results also allow conclusions about the relationships with computational measures, which, in combination with ocular parameters, are well suited for predicting complexity ratings. The findings of the studies are discussed in the context of previous research, and an integrative research model of visual complexity in human-machine interaction is derived from them.

    Signal processing algorithms for enhanced image fusion performance and assessment

    The dissertation presents several signal processing algorithms for image fusion under noisy multimodal conditions. It introduces a novel image fusion method that performs well for image sets heavily corrupted by noise. Unlike current image fusion schemes, the method requires no a priori knowledge of the noise component. The image is decomposed using Chebyshev polynomials (CP) as basis functions to perform fusion at the feature level. The properties of CP, namely fast convergence and smooth approximation, make them ideal for heuristic and indiscriminate denoising fusion tasks. Quantitative evaluation using objective fusion assessment methods shows favourable performance of the proposed scheme compared with previous image fusion methods, notably for heavily corrupted images. The approach is further improved by combining the advantages of CP with a state-of-the-art fusion technique, independent component analysis (ICA), for joint fusion processing based on region saliency. Whilst CP fusion is robust under severe noise conditions, it tends to eliminate high-frequency information from the images involved, thereby limiting image sharpness. Fusion using ICA, on the other hand, performs well in transferring edges and other salient features of the input images into the composite output. The combination of both methods, coupled with several mathematical morphological operations in an algorithm fusion framework, is considered a viable solution. Again, according to the quantitative metrics, the results of the proposed approach are very encouraging for joint fusion and denoising. Another focus of this dissertation is a novel texture-based metric for image fusion evaluation. The conservation of background textural details is considered important in many fusion applications because such details help define image depth and structure, which may prove crucial in many surveillance and remote sensing applications.
Our work aims to evaluate the performance of image fusion algorithms based on their ability to retain textural details through the fusion process. This is done by using the gray-level co-occurrence matrix (GLCM) model to extract second-order statistical features, from which an image textural measure is derived; this measure then replaces the edge-based calculations in an objective fusion metric. Performance evaluation on established fusion methods verifies that the proposed metric is viable, especially for multimodal scenarios.
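The GLCM step can be sketched as follows. The grey-level quantisation, the single horizontal offset, the symmetric counting and the three Haralick-style features are assumptions; the abstract does not say which GLCM statistics form the textural measure or how that measure is inserted into the fusion metric. Function names are illustrative.

```python
from typing import Dict, List


def glcm(img: List[List[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> List[List[float]]:
    """Normalised grey-level co-occurrence matrix for pixel offset (dx, dy).

    img holds integer grey levels in [0, levels).
    """
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                i, j = img[r][c], img[r2][c2]
                m[i][j] += 1
                if symmetric:
                    m[j][i] += 1  # count the pair in both directions
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m] if total else m


def texture_features(p: List[List[float]]) -> Dict[str, float]:
    """Second-order (Haralick-style) statistics of a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment (energy squared)
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


if __name__ == "__main__":
    stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]
    print(texture_features(glcm(stripes, levels=2)))
```

A texture-preservation score could then compare such features between each input and the fused image, though how the thesis aggregates them is not stated here.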

    Visual attention and swarm cognition for off-road robots

    PhD thesis, Informatics (Computer Engineering), Universidade de Lisboa, Faculdade de Ciências, 2011. This thesis addresses the problem of modelling visual attention in the context of autonomous off-road robots. The goal of using visual attention mechanisms is to focus perception on the aspects of the environment most relevant to the robot's task. This thesis shows that, in obstacle and trail detection, this capability promotes robustness and computational parsimony, which are key characteristics for the speed and efficiency of off-road robots. One of the major challenges in modelling visual attention stems from the need to manage the speed-accuracy trade-off in the presence of context or task variations. This thesis shows that this trade-off is resolved if the visual attention process is modelled as a self-organising process whose operation is modulated by the action selection module responsible for controlling the robot. By closing the loop between action selection and perception, the latter is able to operate only where needed, anticipating the robot's actions. To endow visual attention with self-organising properties, this work draws inspiration from Nature. Specifically, the mechanisms that allow army ants to forage in a self-organised way are used as a metaphor for searching, also in a self-organised way, for obstacles and trails in the robot's visual field. The solution proposed in this thesis is to have several covert attention foci operate as a swarm through pheromone-based interactions. This work represents the first embodied realisation of swarm cognition, a new field of research that seeks to uncover the basic principles of cognition by inspecting the self-organised properties of the collective intelligence exhibited by social insects.
Therefore, this thesis contributes to robotics both as an engineering discipline and as a modelling discipline capable of supporting the study of adaptive behaviour.
Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/27305/2006); Laboratory of Agent Modelling (LabMag)
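A toy sketch of pheromone-mediated (stigmergic) interaction among several attention foci on a saliency map. It is deliberately simplified and is not the thesis's model: movement is deterministic and greedy rather than stochastic, there is no modulation by action selection, and the saliency map, deposit and evaporation rates and function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]


def swarm_step(saliency: Grid, pheromone: Grid, foci: List[Cell],
               deposit: float = 1.0, evaporation: float = 0.1) -> Tuple[Grid, List[Cell]]:
    """Advance every attention focus one cell and update the shared pheromone field.

    Assumes a grid with at least two cells so every focus has a neighbour.
    """
    h, w = len(saliency), len(saliency[0])
    moved = []
    for r, c in foci:
        # Each focus must move to a 4-neighbour, preferring salient, pheromone-rich cells.
        neighbours = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= r + dr < h and 0 <= c + dc < w]
        moved.append(max(neighbours,
                         key=lambda p: saliency[p[0]][p[1]] + pheromone[p[0]][p[1]]))
    # Evaporate the old trail, then let each focus mark its new cell in proportion to saliency.
    new_ph = [[v * (1.0 - evaporation) for v in row] for row in pheromone]
    for r, c in moved:
        new_ph[r][c] += deposit * saliency[r][c]
    return new_ph, moved


if __name__ == "__main__":
    n = 5
    saliency = [[1.0 / (1 + abs(r - 2) + abs(c - 2)) for c in range(n)] for r in range(n)]
    pheromone = [[0.0] * n for _ in range(n)]
    foci = [(0, 0), (4, 4)]
    for _ in range(10):
        pheromone, foci = swarm_step(saliency, pheromone, foci)
    print(foci)  # [(2, 2), (2, 2)]
```

Forcing each focus to leave its current cell keeps it from freezing on its own fresh trail; with this rule both foci in the example climb to the saliency peak and then hover around it, reinforcing the trail there.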