
    Crowd-sourced data and its applications for new algorithms in photographic imaging

    This thesis comprises two main themes. The first is concerned primarily with the validity and utility of data acquired from web-based psychophysical experiments. In recent years, web-based experiments, and the crowd-sourced data they can deliver, have been rising in popularity among the research community for several key reasons, primarily ease of administration and easy access to a large and diverse population of participants. However, the tight control under which traditional experiments are performed, contrasted with the severe lack of control we have over web-based alternatives, may lead us to believe that these benefits come at the cost of reliable data. Indeed, the results reported early in this thesis support this assumption. However, we proceed to show that it is entirely possible to crowd-source data that is comparable with lab-based results. The second theme of the thesis explores the possibilities presented by crowd-sourced data, taking a popular colour naming experiment as an example. After using the crowd-sourced data to construct a model for computational colour naming, we consider the value of colour names as image descriptors, with particular relevance to illuminant estimation and object indexing. We find that colour names represent a particularly useful quantisation of colour space, allowing us to construct compact image descriptors for object indexing. We show that these descriptors are somewhat tolerant to errors in illuminant estimation and that their perceptual relevance offers further utility. Finally, we develop a novel algorithm that delivers perceptually relevant, illumination-invariant image descriptors based on colour names.
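    The idea of colour names as a compact quantisation of colour space can be illustrated with a small sketch: each pixel is mapped to one of the 11 basic colour terms and the image is summarised as an 11-bin histogram, compared by histogram intersection. This is not the thesis's crowd-sourced colour naming model; the `PROTOTYPES` table, nearest-prototype assignment in RGB, and the helper names `colour_name`, `descriptor` and `intersection` are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[float, float, float]

# Hypothetical prototype per basic colour term (sRGB, 0-255).
# Illustrative placeholders only, NOT the model learned from crowd-sourced data.
PROTOTYPES: Dict[str, RGB] = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "grey": (128, 128, 128),
    "red": (220, 30, 30),
    "green": (40, 160, 60),
    "blue": (40, 60, 200),
    "yellow": (240, 220, 40),
    "orange": (245, 140, 30),
    "pink": (245, 160, 190),
    "purple": (130, 50, 160),
    "brown": (120, 70, 30),
}
NAMES: List[str] = list(PROTOTYPES)


def colour_name(pixel: RGB) -> str:
    """Assign a pixel to its nearest prototype (Euclidean distance in RGB)."""
    return min(NAMES, key=lambda n: math.dist(pixel, PROTOTYPES[n]))


def descriptor(pixels: Sequence[RGB]) -> List[float]:
    """Normalised 11-bin histogram of colour names: a compact image descriptor."""
    counts = [0] * len(NAMES)
    for p in pixels:
        counts[NAMES.index(colour_name(p))] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection similarity in [0, 1] for normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    img_a = [(250, 10, 10)] * 3 + [(10, 10, 10)]      # mostly red, some black
    img_b = [(230, 40, 20)] * 2 + [(20, 20, 20)] * 2  # half red, half black
    print(intersection(descriptor(img_a), descriptor(img_b)))  # 0.75
```

    Because nearby colours collapse to the same name, small shifts (for example from an imperfect illuminant estimate) often leave the descriptor unchanged, which is one intuition for the tolerance the abstract reports.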

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? To answer these questions properly and efficiently, it is essential to establish a bidirectional coupling between external stimuli and internal representations. This coupling links the physical world with the internal abstraction models through sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling.

    A Fast Alpha-tree Algorithm for Extreme Dynamic Range Pixel Dissimilarities

    The α-tree algorithm is a useful hierarchical representation technique which facilitates comprehension of images such as remote sensing and medical images. Most α-tree algorithms use priority queues to process image edges in the correct order, but because traditional priority queues are inefficient in α-tree algorithms using extreme-dynamic-range pixel dissimilarities, they run slower than other related algorithms such as the component tree. In this paper, we propose a novel hierarchical heap priority queue algorithm that can process α-tree edges much more efficiently than other state-of-the-art priority queues. Experimental results using 48-bit Sentinel-2A remotely sensed images and randomly generated images show that replacing the heap priority queue with the proposed hierarchical heap priority queue speeds up the flooding α-tree algorithm by 1.68 times in 4-N and 2.41 times in 8-N on Sentinel-2A images, and by 2.56 times and 4.43 times, respectively, on randomly generated images.
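    To make the role of the priority queue concrete, here is a minimal sketch of how α-connected components arise: pixel edges are popped in increasing dissimilarity order and merged with a union-find structure, so each merge records the α level at which two components join. It uses Python's binary-heap `heapq` (the kind of baseline queue the paper improves on), not the proposed hierarchical heap queue, and assumes a grayscale image with 4-N connectivity and absolute intensity difference as the dissimilarity. The function names are hypothetical.

```python
import heapq
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (dissimilarity, pixel p, pixel q)


def edges_4n(img: List[List[int]]) -> List[Edge]:
    """4-connected edges; dissimilarity is |I(p) - I(q)| (an assumption)."""
    h, w = len(img), len(img[0])
    out: List[Edge] = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w:
                out.append((abs(img[y][x] - img[y][x + 1]), p, p + 1))
            if y + 1 < h:
                out.append((abs(img[y][x] - img[y + 1][x]), p, p + w))
    return out


def find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def alpha_merges(img: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Pop edges from a binary heap in increasing dissimilarity and record
    each merge (alpha, root_a, root_b); the sequence encodes the hierarchy."""
    heap = edges_4n(img)
    heapq.heapify(heap)
    parent = list(range(len(img) * len(img[0])))
    merges = []
    while heap:
        alpha, p, q = heapq.heappop(heap)
        rp, rq = find(parent, p), find(parent, q)
        if rp != rq:
            parent[rq] = rp
            merges.append((alpha, rp, rq))
    return merges


def alpha_labels(img: List[List[int]], alpha: int) -> List[int]:
    """Label alpha-connected components: join only edges with dissimilarity <= alpha."""
    parent = list(range(len(img) * len(img[0])))
    for d, p, q in edges_4n(img):
        if d <= alpha:
            rp, rq = find(parent, p), find(parent, q)
            if rp != rq:
                parent[rq] = rp
    return [find(parent, i) for i in range(len(parent))]


if __name__ == "__main__":
    img = [[0, 0, 9],
           [0, 1, 9]]
    print([m[0] for m in alpha_merges(img)])  # [0, 0, 0, 1, 8]
    print(len(set(alpha_labels(img, 1))))     # 2 components at alpha = 1
```

    With high-dynamic-range dissimilarities (for example 48-bit data) the number of distinct keys is huge, which is why the choice of queue dominates the running time.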

    BoR: Bag-of-Relations for Symbol Retrieval

    In this paper, we propose a new scheme for symbol retrieval based on bags-of-relations (BoRs), which are computed between extracted visual primitives (e.g., circles and corners). Our features consist of pairwise spatial relations from all possible combinations of individual visual primitives. The key characteristic of the overall process is to index topological relation information in bags-of-relations and to use it for recognition. As a consequence, directional relation matching takes place only with those candidates having similar topological configurations. A comprehensive study is carried out on several well-known datasets, such as GREC, FRESH and SESYD, and includes a comparison with state-of-the-art descriptors. Experiments provide interesting results on symbol spotting and other user-friendly symbol retrieval applications.
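    A bag-of-relations can be sketched as a count of (primitive-type pair, topological relation) over all pairs of primitives. The sketch below assumes primitives are represented by axis-aligned bounding boxes and uses a simplified three-class relation set (`disjoint`, `overlap`, `inside`, with touching boxes counted as `overlap`); the paper's actual relation model, and its later directional-matching stage, are not reproduced. `Primitive`, `topo_relation`, `bag_of_relations` and `similarity` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import List, NamedTuple


class Primitive(NamedTuple):
    kind: str   # e.g. "circle" or "corner"
    x0: float   # bounding box (assumed representation)
    y0: float
    x1: float
    y1: float


def topo_relation(a: Primitive, b: Primitive) -> str:
    """Coarse, symmetric topological relation between two bounding boxes."""
    if a.x1 < b.x0 or b.x1 < a.x0 or a.y1 < b.y0 or b.y1 < a.y0:
        return "disjoint"
    a_in_b = b.x0 <= a.x0 and a.x1 <= b.x1 and b.y0 <= a.y0 and a.y1 <= b.y1
    b_in_a = a.x0 <= b.x0 and b.x1 <= a.x1 and a.y0 <= b.y0 and b.y1 <= a.y1
    return "inside" if a_in_b or b_in_a else "overlap"


def bag_of_relations(prims: List[Primitive]) -> Counter:
    """Count (sorted kind pair, relation) over all pairs of primitives."""
    bag: Counter = Counter()
    for a, b in combinations(prims, 2):
        kinds = tuple(sorted((a.kind, b.kind)))
        bag[(kinds, topo_relation(a, b))] += 1
    return bag


def similarity(b1: Counter, b2: Counter) -> float:
    """Normalised bag intersection; 1.0 means identical relation counts."""
    inter = sum((b1 & b2).values())
    return inter / max(sum(b1.values()), sum(b2.values()), 1)


if __name__ == "__main__":
    symbol = [Primitive("circle", 0, 0, 10, 10),
              Primitive("corner", 2, 2, 4, 4),
              Primitive("corner", 20, 20, 22, 22)]
    shifted = [Primitive(p.kind, p.x0 + 100, p.y0 + 100, p.x1 + 100, p.y1 + 100)
               for p in symbol]
    print(similarity(bag_of_relations(symbol), bag_of_relations(shifted)))  # 1.0
```

    Because the bag only counts relations, it is unaffected by translation, which makes it a cheap filter before any finer directional comparison.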

    Object Detection in Video Signal

    This diploma thesis deals with object detection in a video signal, implemented for the Raspberry Pi platform in the C programming language. Object detection is based on the Histogram of Oriented Gradients (HOG) algorithm and Support Vector Machine (SVM) classification. The introductory part describes the Raspberry Pi platform and the HOG and SVM algorithms. The following chapter documents the code design and implementation and describes how the desired detection is achieved.
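    The two stages named above can be sketched in a few lines: a HOG cell histogram (gradients from a [-1, 0, 1] filter, unsigned orientation in [0°, 180°), 9 bins, magnitude-weighted votes) followed by a linear SVM decision function. This is a simplified illustration in Python, not the thesis's C implementation: it uses nearest-bin voting instead of interpolated voting, normalises a single cell rather than overlapping blocks, and the SVM weights are hypothetical placeholders rather than trained values.

```python
import math
from typing import List, Sequence


def cell_histogram(patch: List[List[float]], bins: int = 9) -> List[float]:
    """Unsigned-orientation gradient histogram for one cell (interior pixels only)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # [-1, 0, 1] horizontal filter
            gy = patch[y + 1][x] - patch[y - 1][x]   # [-1, 0, 1] vertical filter
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += mag      # nearest-bin vote
    return hist


def l2_normalise(v: Sequence[float], eps: float = 1e-6) -> List[float]:
    """L2 normalisation (simplified: one cell instead of an overlapping block)."""
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]


def svm_score(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Linear SVM decision function w.x + b; a positive score means 'object'."""
    return sum(wi * fi for wi, fi in zip(weights, features)) + bias


if __name__ == "__main__":
    vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
    feat = l2_normalise(cell_histogram(vertical_edge))
    weights = [1.0] + [0.0] * 8  # hypothetical: favours horizontal gradients
    print(svm_score(weights, -0.5, feat) > 0)  # True
```

    In a full detector, the histograms of all cells in a sliding window are concatenated into one feature vector before the SVM is applied.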

    A Framework of Indexation and Document Video Retrieval Based on the Conceptual Graphs

    Most video indexing and retrieval systems suffer from the lack of a comprehensive video model capturing image semantic richness, the conveyed signal information and the spatial relations between visual entities. To remedy these shortcomings, we present in this paper a video model integrating visual semantics, spatial and signal characterizations. It relies on an expressive representation formalism that handles high-level video descriptions, together with a full-text query framework, in an attempt to take video indexing and retrieval beyond trivial low-level processes and semantic keyword-based annotation and retrieval frameworks.

    Scene understanding for interactive applications

    In order to interact with the environment, it is necessary to understand what is happening in the scene where the action occurs. Decades of research in computer vision have contributed towards achieving this scene understanding automatically from visual information. Scene understanding is a very broad area of research within computer vision; it tries to replicate the human capability of extracting plenty of information from visual data. For example, we would like to understand how people perceive the world in three dimensions, or how they can quickly recognize places or objects despite substantial variation in appearance. One of the basic tasks in scene understanding from visual data is to assign a semantic meaning to every element of the image, i.e., to assign a concept or object label to every pixel. This can be formulated as a dense image labeling problem, which assigns specific values (labels) to each pixel or region of the image. Depending on the application, the labels can represent very different concepts, from a physical magnitude, such as depth, to high-level semantic information, such as an object category. The general goal of this thesis is to investigate and develop new ways to automatically incorporate human feedback or prior knowledge into intelligent systems that require scene understanding capabilities. In particular, it explores two common sources of prior information from users: human interaction and human labeling of sample data.
    The first part of this thesis focuses on learning complex scene information from interactive human input. Interactive solutions impose performance constraints, since feedback to the user must be delivered at interactive rates. This thesis presents an efficient interaction paradigm that approximates any per-pixel magnitude from a few user strokes by propagating the sparse user input to every pixel of the image. We demonstrate the suitability of the proposed paradigm through three interactive image editing applications that require per-pixel knowledge of a certain magnitude: simulating depth of field, dehazing and HDR tone mapping.
    Another common strategy for learning from user prior knowledge is to design supervised machine-learning approaches. In recent years, Convolutional Neural Networks (CNNs) have pushed the state of the art on a broad variety of visual recognition problems. However, for new tasks, enough training data is not always available, so training from scratch is not always feasible. The second part of this thesis investigates how to improve systems that learn dense semantic labeling of images from user-labeled examples. In particular, we present and validate strategies for semantic segmentation based on the two main transfer learning approaches for deep models. The goal of these strategies is to learn new specific classes when the labeled training data is insufficient in quantity or accuracy to train from scratch. We evaluate these strategies in very different realistic environments, such as autonomous driving scenes, aerial images and underwater images.
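    The abstract does not specify how sparse strokes are propagated, so the following is only a generic illustration of the idea, not the thesis's paradigm: an edge-aware diffusion in which stroke pixels keep their user-given value and every other pixel is repeatedly replaced by a weighted average of its 4 neighbours (a Jacobi iteration of a weighted Laplace equation). The Gaussian weight on guide-image intensity differences, the parameters `sigma` and `iters`, and the function name `propagate` are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def propagate(guide: List[List[float]],
              strokes: Dict[Tuple[int, int], float],
              sigma: float = 10.0,
              iters: int = 500) -> List[List[float]]:
    """Spread sparse stroke values to every pixel, diffusing less across strong edges."""
    h, w = len(guide), len(guide[0])
    val = [[0.0] * w for _ in range(h)]
    for (y, x), v in strokes.items():
        val[y][x] = v
    for _ in range(iters):
        new = [row[:] for row in val]
        for y in range(h):
            for x in range(w):
                if (y, x) in strokes:
                    continue  # user-specified values stay fixed
                num = den = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        diff = guide[y][x] - guide[ny][nx]
                        wgt = math.exp(-(diff * diff) / (2.0 * sigma * sigma))
                        num += wgt * val[ny][nx]
                        den += wgt
                if den > 0.0:
                    new[y][x] = num / den
        val = new
    return val


if __name__ == "__main__":
    guide = [[0, 0, 0, 100, 100, 100]]         # one strong edge in the middle
    strokes = {(0, 0): 1.0, (0, 5): 0.0}       # two user "strokes"
    out = propagate(guide, strokes)
    print([round(v, 2) for v in out[0]])
```

    The edge-aware weights keep the propagated values from bleeding across the intensity edge, so each side of the edge takes the value of the stroke on that side; with uniform weights the result would instead be a smooth ramp between the strokes.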