48 research outputs found

    Few-parameter learning for a hierarchical perceptual grouping system

    Perceptual grouping along well-established Gestalt laws is a traditional approach that exposes only a small set of meaningful parameters to be adjusted for each application field. More complex and challenging tasks require a hierarchical setting, in which the results aggregated by a first grouping process are subjected to further processing on a larger scale and with more abstract objects; this hierarchy can be several steps deep. An example from the domain of forestry illustrates the search for parameter settings that give the machine-vision module sufficient performance to be of practical use within a larger robotic control system in this application domain. This stands in stark contrast to state-of-the-art deep-learning networks, where many millions of opaque parameters must be tuned before performance suffices. In the author's opinion, the enormous freedom of such a high-dimensional, inscrutable parameter space poses an unnecessary risk. Moreover, few-parameter learning requires far less training material. Whereas state-of-the-art networks need millions of expert-labelled images, a single image already gives good insight into the parameter domain of the Gestalt laws, and a domain expert labelling just a handful of salient contours in that image yields a usable goal function, so that a well-working sweet spot in the parameter domain can be found in a few steps. Compared with state-of-the-art neural networks, this amounts to a reduction of six orders of magnitude in the number of parameters. Almost parameter-free statistical test methods can reduce the number of trainable parameters by a further order of magnitude, but they are less flexible and currently lack the advantages of hierarchical feature processing.
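    To make the contrast concrete, the following sketch shows what few-parameter learning can look like in practice: an exhaustive search over three interpretable Gestalt parameters, scored against a handful of expert-labelled contours in a single image. The grouping routine, the parameter names and the overlap-based score are hypothetical stand-ins, not the system described above.

```python
import itertools

# Hypothetical few-parameter search for a Gestalt grouping module.
# group_contours() and labelled_contours stand in for the actual grouping
# routine and the expert-labelled salient contours of one training image.

def f1_against_labels(predicted, labelled):
    """Overlap-based F1 between predicted and labelled contour pixel sets."""
    tp_pred = sum(1 for c in predicted if any(c & l for l in labelled))
    tp_lab = sum(1 for l in labelled if any(l & c for c in predicted))
    precision = tp_pred / max(len(predicted), 1)
    recall = tp_lab / max(len(labelled), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)

def few_parameter_search(image, labelled_contours, group_contours):
    # Three interpretable parameters instead of millions of weights
    # (names and ranges are illustrative only).
    proximity = [5, 10, 20, 40]      # maximum pixel gap for grouping
    continuity = [10, 20, 30]        # maximum angular deviation (degrees)
    min_support = [3, 5, 8]          # minimum primitives per group
    best_params, best_score = None, -1.0
    for p, c, m in itertools.product(proximity, continuity, min_support):
        predicted = group_contours(image, proximity=p,
                                   continuity=c, min_support=m)
        score = f1_against_labels(predicted, labelled_contours)
        if score > best_score:
            best_params, best_score = (p, c, m), score
    return best_params, best_score
```

    With only 36 candidate settings, the whole sweet-spot search can be driven by a single labelled image, which is the point made above.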

    Development of artificial neural network-based object detection algorithms for low-cost hardware devices

    The human brain is the most complex, powerful and versatile learning machine ever known. Consequently, many scientists from various disciplines are fascinated by its structures and information-processing methods. Owing to the quality and quantity of the information extracted from the sense of sight, images are one of the main information channels used by humans. However, the massive amount of video footage generated nowadays makes it difficult to process those data quickly enough by hand. Computer vision systems therefore represent a fundamental tool for extracting information from digital images, as well as a major challenge for scientists and engineers. This thesis's primary objective is automatic foreground object detection and classification through digital image analysis, using artificial neural network-based techniques specifically designed and optimised for deployment on low-cost hardware devices. This objective is complemented by the development of methods for estimating individuals' movement, using unsupervised learning and artificial neural network-based models. These objectives have been addressed through research presented in the four publications supporting this thesis. The first was published in the “ICAE” journal in 2018 and consists of a neural network-based movement detection system for Pan-Tilt-Zoom (PTZ) cameras deployed on a Raspberry Pi board. The second was published at the “WCCI” conference in 2018 and consists of a deep learning-based automatic video surveillance system for PTZ cameras deployed on low-cost hardware. The third was published in the “ICAE” journal in 2020 and consists of an anomalous foreground object detection and classification system for panoramic cameras, based on deep learning and supported by low-cost hardware. Finally, the fourth was published at the “WCCI” conference in 2020 and consists of an algorithm for estimating individuals' positions in environments with forbidden regions, based on a novel neural network model named “Forbidden Regions Growing Neural Gas”.
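    As a point of reference for the kind of deployment described above (and not the thesis's own models, which are not reproduced here), a common way to run object detection on a Raspberry Pi-class device is OpenCV's DNN module with a lightweight MobileNet-SSD network; the model file names below are assumptions.

```python
import cv2

# Generic low-cost-hardware detection loop using OpenCV's DNN module with a
# MobileNet-SSD Caffe model; the two model files are assumed to be present.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

cap = cv2.VideoCapture(0)           # USB or PTZ camera exposed as device 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=0.007843,
                                 size=(300, 300), mean=127.5)
    net.setInput(blob)
    detections = net.forward()      # SSD output, shape (1, 1, N, 7)
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence > 0.5:
            class_id = int(detections[0, 0, i, 1])
            print(f"detected class {class_id} with confidence {confidence:.2f}")
cap.release()
```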

    Machine Learning

    Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and practical tools for building systems that exhibit some human-like intelligent behavior. More specifically, machine learning addresses the ability of such systems to improve automatically through experience.
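    A minimal illustration of "improving through experience" (not taken from the text above) is a classifier whose test accuracy rises as it is trained on more labelled examples, for instance with scikit-learn:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A classifier that improves with experience: test accuracy grows as the
# number of training examples it has seen increases.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 200, 800, len(X_train)):
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])
    print(f"{n:5d} training examples -> test accuracy "
          f"{model.score(X_test, y_test):.3f}")
```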

    Computer vision-based structural assessment exploiting large volumes of images

    Visual assessment is the process of understanding the state of a structure from visual information. Recent advances in computer vision, new sensors and sensing platforms, and high-performance computing have shed light on the potential of vision-based assessment of civil engineering structures. Low-cost, high-resolution visual sensors, used in conjunction with mobile and aerial platforms, can overcome the spatial and temporal limitations typically associated with other forms of sensing in civil structures. GPU-accelerated and parallel computing likewise offer unprecedented speed and performance, accelerating the processing of the collected visual data. However, despite considerable past research into such technologies, many practical challenges remain before these techniques can be applied successfully in real-world situations. A major challenge lies in dealing with a large volume of unordered and complex visual data, collected under uncontrolled circumstances (e.g. lighting, cluttered regions, and variations in environmental conditions), of which only a tiny fraction is useful for the actual assessment. This difficulty induces an undesirably high rate of false-positive and false-negative errors, reducing the trustworthiness and efficiency of such systems. To overcome these inherent challenges, high-level computer vision algorithms must be integrated with relevant prior knowledge and guidance, with the aim of approaching the performance of humans conducting visual assessment. Moreover, the techniques must be developed and validated in the realistic context of a large volume of real-world images, which is likely to contain numerous practical challenges. In this dissertation, the novel use of computer vision algorithms is explored for two promising applications of vision-based assessment in civil engineering: visual inspection, and visual data analysis for post-disaster evaluation. For both applications, techniques are developed to enable reliable and efficient visual assessment of civil structures, and they are demonstrated on a large volume of real-world images collected from actual structures. State-of-the-art computer vision techniques, such as structure-from-motion and convolutional neural networks, facilitate these tasks. The core techniques derived from this study are scalable and extensible to many other applications of vision-based assessment, and will serve to close the existing gaps between past research efforts and real-world implementations.
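    The triage step described above (keeping the tiny useful fraction of a large image collection) can be sketched as a binary CNN classifier applied to every collected image; the checkpoint name, folder layout and threshold below are assumptions, not the dissertation's actual pipeline.

```python
import torch
from torchvision import models, transforms
from PIL import Image
from pathlib import Path

# Score every collected image with a binary CNN (relevant vs. irrelevant for
# inspection) and keep only the high-confidence fraction for assessment.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

model = models.resnet18()
model.fc = torch.nn.Linear(model.fc.in_features, 2)        # relevant / irrelevant
model.load_state_dict(torch.load("inspection_triage.pt"))  # assumed fine-tuned weights
model.eval()

relevant = []
with torch.no_grad():
    for path in Path("collected_images").glob("*.jpg"):    # assumed folder
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        prob = torch.softmax(model(x), dim=1)[0, 1].item()
        if prob > 0.9:            # strict threshold to limit false positives
            relevant.append((path, prob))
print(f"kept {len(relevant)} images for detailed assessment")
```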

    A theory of information processing for machine visual perception: inspiration from psychology, formal analysis and applications

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 20-09-201

    Estudio de métodos de construcción de ensembles de clasificadores y aplicaciones (Study of methods for constructing classifier ensembles, and applications)

    Artificial intelligence is devoted to the creation of computer systems with intelligent behaviour. Within this area, machine learning studies the creation of systems that learn by themselves. One type of machine learning is supervised learning, in which the system is given both the inputs and the expected output and learns from these data; such a system is called a classifier. It sometimes happens that, in the set of examples the system uses to learn, the number of examples of one class is much larger than the number of examples of another class; when this occurs, we speak of imbalanced datasets. The combination of several classifiers is called an "ensemble", and it often gives better results than any of its individual members. One of the keys to the good performance of ensembles is diversity. This thesis focuses on the development of new ensemble-construction algorithms, centred on techniques for increasing diversity and on imbalanced problems. Additionally, these techniques are applied to the solution of several industrial problems. This work was supported by the Ministerio de Economía y Competitividad, project TIN-2011-2404.
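    As a generic illustration of the idea (not one of the algorithms developed in the thesis), the sketch below builds a bagging ensemble of decision trees, where diversity comes from bootstrap resampling, and compares it with a single tree on a deliberately imbalanced problem:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Imbalanced toy problem: roughly 5% positive examples.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

single = DecisionTreeClassifier(class_weight="balanced", random_state=0)
# 'estimator=' is the scikit-learn >= 1.2 keyword (older versions use base_estimator=).
ensemble = BaggingClassifier(estimator=single, n_estimators=50, random_state=0)

for name, clf in [("single tree", single), ("bagging ensemble", ensemble)]:
    score = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy").mean()
    print(f"{name:16s} balanced accuracy: {score:.3f}")
```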

    Recent Advances in Signal Processing

    Signal processing is a critical issue in the majority of new technological inventions and challenges, across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy; these constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily at students and researchers who want exposure to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be grouped into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages. The book has the advantage of providing a collection of applications that are completely independent and self-contained, so the interested reader can choose any chapter and skip to another without losing continuity.

    Herramientas para la indexación de vídeo: extracción de imágenes relevantes y análisis de imágenes de agencia (Tools for video indexing: relevant frame extraction and analysis of news-agency images)

    Firstly, this report introduces a relevant-frame extraction system for video sequences, which makes it possible to characterise the content of a media sequence with a small number of images and thus eases the task of automatic or semi-automatic indexing and subsequent retrieval of audiovisual content. The frames that best represent the sequence are chosen according to three criteria: the presence of faces, the presence of text, and image sharpness. In addition, as the main tool, a scene-change detector is developed, based on Swain & Ballard's publication Color Indexing [1]. Secondly, a recognition system for news-agency covers is presented; these covers often contain information about the content of the video that follows them. By extracting this text, the tasks of content indexing and retrieval from databases are also made easier.
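    A minimal sketch of the main tool described above, assuming OpenCV and an input file name chosen for illustration: consecutive frames are compared with colour-histogram intersection in the spirit of Swain & Ballard [1], and a scene change is flagged when similarity drops below a threshold (the bin count and threshold are illustrative, not the values used in the project).

```python
import cv2

def colour_hist(frame, bins=8):
    """L1-normalised 3-D colour histogram of a BGR frame."""
    hist = cv2.calcHist([frame], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.normalize(hist, hist, norm_type=cv2.NORM_L1).flatten()

cap = cv2.VideoCapture("news_sequence.mp4")   # hypothetical input file
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read the input video")

prev_hist, frame_idx, cuts = colour_hist(prev), 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    hist = colour_hist(frame)
    # Histogram intersection: 1.0 for identical distributions, near 0 for disjoint.
    similarity = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_INTERSECT)
    if similarity < 0.5:                      # illustrative cut threshold
        cuts.append(frame_idx)
    prev_hist = hist
cap.release()
print("scene changes detected at frames:", cuts)
```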