94 research outputs found

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    A Robotic System for Learning Visually-Driven Grasp Planning (Dissertation Proposal)

    Get PDF
    We use findings in machine learning, developmental psychology, and neurophysiology to guide a robotic learning system\u27s level of representation both for actions and for percepts. Visually-driven grasping is chosen as the experimental task since it has general applicability and it has been extensively researched from several perspectives. An implementation of a robotic system with a gripper, compliant instrumented wrist, arm and vision is used to test these ideas. Several sensorimotor primitives (vision segmentation and manipulatory reflexes) are implemented in this system and may be thought of as the innate perceptual and motor abilities of the system. Applying empirical learning techniques to real situations brings up such important issues as observation sparsity in high-dimensional spaces, arbitrary underlying functional forms of the reinforcement distribution and robustness to noise in exemplars. The well-established technique of non-parametric projection pursuit regression (PPR) is used to accomplish reinforcement learning by searching for projections of high-dimensional data sets that capture task invariants. We also pursue the following problem: how can we use human expertise and insight into grasping to train a system to select both appropriate hand preshapes and approaches for a wide variety of objects, and then have it verify and refine its skills through trial and error. To accomplish this learning we propose a new class of Density Adaptive reinforcement learning algorithms. These algorithms use statistical tests to identify possibly interesting regions of the attribute space in which the dynamics of the task change. They automatically concentrate the building of high resolution descriptions of the reinforcement in those areas, and build low resolution representations in regions that are either not populated in the given task or are highly uniform in outcome. Additionally, the use of any learning process generally implies failures along the way. Therefore, the mechanics of the untrained robotic system must be able to tolerate mistakes during learning and not damage itself. We address this by the use of an instrumented, compliant robot wrist that controls impact forces

    Pattern Recognition

    Get PDF
    A wealth of advanced pattern recognition algorithms are emerging from the interdiscipline between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through the rapid developments in appropriate sensor equipments, novel filter designs, and viable information processing architectures. While the understanding of human-brain cognition process broadens the way in which the computer can perform pattern recognition tasks. The present book is intended to collect representative researches around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters coved in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition

    Vol. 16, No. 1 (Full Issue)

    Get PDF

    Spectral LADAR: Active Range-Resolved Imaging Spectroscopy

    Get PDF
    Imaging spectroscopy using ambient or thermally generated optical sources is a well developed technique for capturing two dimensional images with high per-pixel spectral resolution. The per-pixel spectral data is often a sufficient sampling of a material's backscatter spectrum to infer chemical properties of the constituent material to aid in substance identification. Separately, conventional LADAR sensors use quasi-monochromatic laser radiation to create three dimensional images of objects at high angular resolution, compared to RADAR. Advances in dispersion engineered photonic crystal fibers in recent years have made high spectral radiance optical supercontinuum sources practical, enabling this study of Spectral LADAR, a continuous polychromatic spectrum augmentation of conventional LADAR. This imaging concept, which combines multi-spectral and 3D sensing at a physical level, is demonstrated with 25 independent and parallel LADAR channels and generates point cloud images with three spatial dimensions and one spectral dimension. The independence of spectral bands is a key characteristic of Spectral LADAR. Each spectral band maintains a separate time waveform record, from which target parameters are estimated. Accordingly, the spectrum computed for each backscatter reflection is independently and unambiguously range unmixed from multiple target reflections that may arise from transmission of a single panchromatic pulse. This dissertation presents the theoretical background of Spectral LADAR, a shortwave infrared laboratory demonstrator system constructed as a proof-of-concept prototype, and the experimental results obtained by the prototype when imaging scenes at stand off ranges of 45 meters. The resultant point cloud voxels are spectrally classified into a number of material categories which enhances object and feature recognition. Experimental results demonstrate the physical level combination of active backscatter spectroscopy and range resolved sensing to produce images with a level of complexity, detail, and accuracy that is not obtainable with data-level registration and fusion of conventional imaging spectroscopy and LADAR. The capabilities of Spectral LADAR are expected to be useful in a range of applications, such as biomedical imaging and agriculture, but particularly when applied as a sensor in unmanned ground vehicle navigation. Applications to autonomous mobile robotics are the principal motivators of this study, and are specifically addressed

    Computer vision beyond the visible : image understanding through language

    Get PDF
    In the past decade, deep neural networks have revolutionized computer vision. High performing deep neural architectures trained for visual recognition tasks have pushed the field towards methods relying on learned image representations instead of hand-crafted ones, in the seek of designing end-to-end learning methods to solve challenging tasks, ranging from long-lasting ones such as image classification to newly emerging tasks like image captioning. As this thesis is framed in the context of the rapid evolution of computer vision, we present contributions that are aligned with three major changes in paradigm that the field has recently experienced, namely 1) the power of re-utilizing deep features from pre-trained neural networks for different tasks, 2) the advantage of formulating problems with end-to-end solutions given enough training data, and 3) the growing interest of describing visual data with natural language rather than pre-defined categorical label spaces, which can in turn enable visual understanding beyond scene recognition. The first part of the thesis is dedicated to the problem of visual instance search, where we particularly focus on obtaining meaningful and discriminative image representations which allow efficient and effective retrieval of similar images given a visual query. Contributions in this part of the thesis involve the construction of sparse Bag-of-Words image representations from convolutional features from a pre-trained image classification neural network, and an analysis of the advantages of fine-tuning a pre-trained object detection network using query images as training data. The second part of the thesis presents contributions to the problem of image-to-set prediction, understood as the task of predicting a variable-sized collection of unordered elements for an input image. We conduct a thorough analysis of current methods for multi-label image classification, which are able to solve the task in an end-to-end manner by simultaneously estimating both the label distribution and the set cardinality. Further, we extend the analysis of set prediction methods to semantic instance segmentation, and present an end-to-end recurrent model that is able to predict sets of objects (binary masks and categorical labels) in a sequential manner. Finally, the third part of the dissertation takes insights learned in the previous two parts in order to present deep learning solutions to connect images with natural language in the context of cooking recipes and food images. First, we propose a retrieval-based solution in which the written recipe and the image are encoded into compact representations that allow the retrieval of one given the other. Second, as an alternative to the retrieval approach, we propose a generative model to predict recipes directly from food images, which first predicts ingredients as sets and subsequently generates the rest of the recipe one word at a time by conditioning both on the image and the predicted ingredients.En l'última dècada, les xarxes neuronals profundes han revolucionat el camp de la visió per computador. Els resultats favorables obtinguts amb arquitectures neuronals profundes entrenades per resoldre tasques de reconeixement visual han causat un canvi de paradigma cap al disseny de mètodes basats en representacions d'imatges apreses de manera automàtica, deixant enrere les tècniques tradicionals basades en l'enginyeria de representacions. Aquest canvi ha permès l'aparició de tècniques basades en l'aprenentatge d'extrem a extrem (end-to-end), capaces de resoldre de manera efectiva molts dels problemes tradicionals de la visió per computador (e.g. classificació d'imatges o detecció d'objectes), així com nous problemes emergents com la descripció textual d'imatges (image captioning). Donat el context de la ràpida evolució de la visió per computador en el qual aquesta tesi s'emmarca, presentem contribucions alineades amb tres dels canvis més importants que la visió per computador ha experimentat recentment: 1) la reutilització de representacions extretes de models neuronals pre-entrenades per a tasques auxiliars, 2) els avantatges de formular els problemes amb solucions end-to-end entrenades amb grans bases de dades, i 3) el creixent interès en utilitzar llenguatge natural en lloc de conjunts d'etiquetes categòriques pre-definits per descriure el contingut visual de les imatges, facilitant així l'extracció d'informació visual més enllà del reconeixement de l'escena i els elements que la composen La primera part de la tesi està dedicada al problema de la cerca d'imatges (image retrieval), centrada especialment en l'obtenció de representacions visuals significatives i discriminatòries que permetin la recuperació eficient i efectiva d'imatges donada una consulta formulada amb una imatge d'exemple. Les contribucions en aquesta part de la tesi inclouen la construcció de representacions Bag-of-Words a partir de descriptors locals obtinguts d'una xarxa neuronal entrenada per classificació, així com un estudi dels avantatges d'utilitzar xarxes neuronals per a detecció d'objectes entrenades utilitzant les imatges d'exemple, amb l'objectiu de millorar les capacitats discriminatòries de les representacions obtingudes. La segona part de la tesi presenta contribucions al problema de predicció de conjunts a partir d'imatges (image to set prediction), entès com la tasca de predir una col·lecció no ordenada d'elements de longitud variable donada una imatge d'entrada. En aquest context, presentem una anàlisi exhaustiva dels mètodes actuals per a la classificació multi-etiqueta d'imatges, que són capaços de resoldre la tasca de manera integral calculant simultàniament la distribució probabilística sobre etiquetes i la cardinalitat del conjunt. Seguidament, estenem l'anàlisi dels mètodes de predicció de conjunts a la segmentació d'instàncies semàntiques, presentant un model recurrent capaç de predir conjunts d'objectes (representats per màscares binàries i etiquetes categòriques) de manera seqüencial. Finalment, la tercera part de la tesi estén els coneixements apresos en les dues parts anteriors per presentar solucions d'aprenentatge profund per connectar imatges amb llenguatge natural en el context de receptes de cuina i imatges de plats cuinats. En primer lloc, proposem una solució basada en algoritmes de cerca, on la recepta escrita i la imatge es codifiquen amb representacions compactes que permeten la recuperació d'una donada l'altra. En segon lloc, com a alternativa a la solució basada en algoritmes de cerca, proposem un model generatiu capaç de predir receptes (compostes pels seus ingredients, predits com a conjunts, i instruccions) directament a partir d'imatges de menjar.Postprint (published version

    Texture and Colour in Image Analysis

    Get PDF
    Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews

    Intelligent Circuits and Systems

    Get PDF
    ICICS-2020 is the third conference initiated by the School of Electronics and Electrical Engineering at Lovely Professional University that explored recent innovations of researchers working for the development of smart and green technologies in the fields of Energy, Electronics, Communications, Computers, and Control. ICICS provides innovators to identify new opportunities for the social and economic benefits of society.  This conference bridges the gap between academics and R&D institutions, social visionaries, and experts from all strata of society to present their ongoing research activities and foster research relations between them. It provides opportunities for the exchange of new ideas, applications, and experiences in the field of smart technologies and finding global partners for future collaboration. The ICICS-2020 was conducted in two broad categories, Intelligent Circuits & Intelligent Systems and Emerging Technologies in Electrical Engineering
    • …
    corecore