    Prioritizing Content of Interest in Multimedia Data Compression

    Image and video compression make data transmission and storage feasible in digital multimedia systems with limited storage and bandwidth. Generic techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their success, these standard techniques are not the best solution for specialized multimedia systems such as microscopy video and low-power wireless broadcast systems. In such application-specific systems, where the content of interest in the data is known and well defined, the design of the compression pipeline should be rethought. We hypothesize that identifying and prioritizing the content of interest enables new compression methods that are far more effective than standard techniques. This dissertation proposes a set of new compression methods based on this idea for three different kinds of multimedia systems, and shows that prioritizing the content of interest is the key to efficient compression in all three cases. What counts as content of interest depends on the application. First, I show that for microscopy videos the content of interest is the set of spatial regions in a video frame whose pixels contain more than noise. Keeping the data in those regions at high quality and discarding the rest yields a novel microscopy video compression technique. Second, I show that for a Bluetooth Low Energy beacon-based system, practical multimedia data storage and transmission is possible by prioritizing content of interest. I designed custom image compression techniques that preserve the edges in a binary image, or the foreground regions of a color image of indoor or outdoor objects. 
Last, I present a new indoor Bluetooth Low Energy beacon-based augmented reality system that integrates a 3D moving-object compression method that prioritizes the content of interest.

    Towards Highly-Integrated Stereovideoscopy for in vivo Surgical Robots

    When compared to traditional surgery, laparoscopic procedures result in better patient outcomes: shorter recovery, reduced post-operative pain, and less trauma to incised tissue. Unfortunately, laparoscopic procedures require specialized training for surgeons, as these minimally-invasive procedures provide an operating environment that has limited dexterity and limited vision. Advanced surgical robotics platforms can make minimally-invasive techniques safer and easier for the surgeon to complete successfully. The most common type of surgical robotics platform -- the laparoscopic robot -- accomplishes this with multi-degree-of-freedom manipulators that are capable of a more diverse set of movements than traditional laparoscopic instruments. These laparoscopic robots also support advanced kinematic translation techniques that let the surgeon focus on the surgical site while the robot calculates the best possible joint positions to complete any surgical motion. An important component of these systems is the endoscopic system used to transmit a live view of the surgical environment to the surgeon. Coupled with 3D high-definition endoscopic cameras, the platform as a whole in effect eliminates the peculiarities associated with laparoscopic procedures, which allows less-skilled surgeons to complete minimally-invasive surgical procedures quickly and accurately. A much newer approach to minimally-invasive surgery is the use of in-vivo surgical robots -- small robots that are inserted directly into the patient through a single, small incision; once inside, an in-vivo robot can perform surgery at arbitrary positions, with a much wider range of motion. While laparoscopic robots can harness traditional endoscopic video solutions, these in-vivo robots require a fundamentally different video solution that is as flexible as possible and free of bulky cables or fiber optics. 
This requires a miniaturized videoscopy system that incorporates an image sensor with a transceiver; because of severe size constraints, this system should be deeply embedded into the robotics platform. Here, early results are presented from the integration of a miniature stereoscopic camera into an in-vivo surgical robotics platform. A 26 mm × 24 mm stereo camera was designed and manufactured. The proposed device features USB connectivity and 1280 × 720 resolution at 30 fps. Resolution testing indicates the device performs much better than similarly-priced analog cameras. Suitability of the platform for 3D computer vision tasks -- including stereo reconstruction -- is examined. The platform was also tested in a living porcine model at the University of Nebraska Medical Center. Results from this experiment suggest that while the platform performs well in controlled, static environments, further work is required to obtain usable results in true surgeries. In conclusion, several ideas for improvement are presented, along with a discussion of core challenges associated with the platform. Adviser: Lance C. Pérez

    Realtime Color Stereovision Processing

    Recent developments in aviation have made micro air vehicles (MAVs) a reality. These featherweight palm-sized radio-controlled flying saucers embody the future of air-to-ground combat. No one has ever successfully implemented an autonomous control system for MAVs. Because MAVs are physically small with limited energy supplies, video signals are better suited than radar for navigational applications. This research takes a step forward in real-time machine vision processing. It investigates techniques for implementing a real-time stereovision processing system using two miniature color cameras. The effects of poor-quality optics are overcome by a robust algorithm, which operates in real time and achieves frame rates up to 10 fps in ideal conditions. The vision system implements innovative work in the following five areas of vision processing: fast image registration preprocessing, object detection, feature correspondence, distortion-compensated ranging, and multi-scale nominal frequency-based object recognition. Results indicate that the system can provide adequate obstacle avoidance feedback for autonomous vehicle control. However, typical relative position errors are about 10%, too high for surveillance applications. The range of operation is also limited to between 6 and 30 m. The root of this limitation is imprecise feature correspondence: with perfect feature correspondence the range would extend to between 0.5 and 30 m. Stereo camera separation limits the near range, while optical resolution limits the far range. Image frame sizes are 160x120 pixels. Increasing this size will improve far-range characteristics but will also decrease frame rate. Image preprocessing proved to be less appropriate than precision camera alignment in this application. A proof of concept for object recognition shows promise for applications with more precise object detection. Future recommendations are offered in all five areas of vision processing.
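
    The link between camera separation, pixel resolution, and usable range can be illustrated with the standard pinhole stereo relation Z = f * B / d. The sketch below is not the thesis's algorithm; the focal length (200 px), baseline (0.15 m), and disparity limits are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo range: Z = f * B / d (f in pixels, B in metres, d in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error_per_pixel(focal_px, baseline_m, depth_m):
    """First-order depth change for a one-pixel disparity error: Z^2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)


# Hypothetical camera parameters, NOT taken from the abstract.
FOCAL_PX = 200.0
BASELINE_M = 0.15

# The smallest resolvable disparity (1 px here) bounds the far range;
# the largest disparity the matcher can search (60 px here) bounds the near range.
far_range_m = depth_from_disparity(FOCAL_PX, BASELINE_M, 1.0)
near_range_m = depth_from_disparity(FOCAL_PX, BASELINE_M, 60.0)
```

    Because the depth error grows with the square of the range, a one-pixel correspondence error matters far more at long range than close up, which is one way to read the abstract's remark that imprecise feature correspondence restricts the operating range.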

    Layered Scene Models from Single Hazy Images

    Robust watermarking methods for the protection of 3D digital imagery

    The explosion in stereoscopic video distribution increases concerns over its copyright protection. Watermarking can be considered the most flexible property-right protection technology. 
In practice, the challenge for a watermarking approach is to reach a trade-off between transparency, robustness, data payload and computational cost. While the capture and display of 3D content are based solely on the two left/right views, alternative representations such as disparity maps should also be considered during transmission and storage. A specific study of the optimal insertion domain (with respect to the above-mentioned properties) is therefore required. The present thesis tackles these challenges. First, a new disparity map (3D video-New Three Step Search, 3DV-NTSS) is designed. The performance of 3DV-NTSS was evaluated in terms of visual quality of the reconstructed image and computational cost. Compared with state-of-the-art methods (NTSS and FS-MPEG), average gains of 2 dB in PSNR and 0.1 in SSIM are obtained. The computational cost is reduced by average factors between 1.3 and 13. Second, a comparative study of the main classes of watermarking methods inherited from 2D and of their related optimal insertion domains is carried out. Four insertion methods are considered; they belong to the SS, SI and hybrid (Fast-IProtect) families. The experiments showed that Fast-IProtect performed in the new disparity map domain (3DV-NTSS) would be generic enough to serve a large variety of applications. The statistical relevance of the results is given by the 95% confidence limits and their underlying relative errors, which are lower than er < 0.1.
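
    For readers unfamiliar with the quality metric quoted above, the following is a minimal sketch of the textbook PSNR definition for 8-bit images stored as flat lists of pixel values. It is not the thesis's evaluation code, and the function name and the peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the two images.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# A gain of g dB corresponds to dividing the mean squared error by 10 ** (g / 10).
mse_ratio_for_2db = 10 ** (2.0 / 10.0)
```

    Under this definition, a 2 dB gain means the mean squared reconstruction error is lower by a factor of roughly 1.6.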

    Perception based on stereoscopic vision, trajectory planning and navigation strategies for autonomous robotic exploration

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería del Software e Inteligencia Artificial, defended on 13-05-2015. This thesis addresses the development of an autonomous navigation strategy based on computer vision for autonomous robotic exploration of planetary surfaces. A series of subsystems, modules and software were developed specifically for this research, since most existing tools for this domain are owned by national space agencies and are not accessible to the scientific community. A modular, multi-layer software architecture with several hierarchical levels was designed to host the algorithms that implement the autonomous navigation strategy and to guarantee software portability, reuse and hardware independence. The design of a working environment that supports the development of the navigation strategies is also included. It is based partly on open-source tools available to any researcher or institution, with the necessary adaptations and extensions, and includes 3D simulation capabilities and models of robotic vehicles, sensors and operational environments emulating planetary surfaces such as Mars, for functional-level analysis and validation of the navigation strategies developed. This environment also offers debugging and monitoring capabilities. The thesis consists of two main parts. The first addresses the design and development of the high-level autonomy capabilities of a rover, focusing on autonomous navigation, supported by the simulation and monitoring capabilities of the working environment described above. 
A set of field experiments was carried out with a real robot and real hardware, reporting results, algorithm processing times, and the behavior and performance of the system as a whole. As a result, the perception system was identified as a crucial component of the navigation strategy and therefore the main focus of potential optimizations and improvements. Consequently, the second part of this work tackles the problem of stereo image correspondence and 3D reconstruction of unstructured natural environments. A series of correspondence algorithms, image processes and filters were analyzed. It is generally assumed that corresponding points in the two images of a stereo pair have the same intensity. However, this assumption was found to be often false, even though both images are acquired with a vision system composed of two identical cameras. An expert system is therefore proposed for automatic intensity correction in stereo image pairs and 3D reconstruction of the environment, based on image processes not previously applied in the field of stereo vision. These are homomorphic filtering and histogram matching, which were designed to correct intensities in a coordinated way, adjusting one image as a function of the other. The results were further optimized through a clustering process based on the principle of spatial continuity, which eliminates false positives and erroneous correspondences. The effects of applying these filters, in stages before and after the correspondence process, were studied, and their efficiency was verified favorably. 
Applying them yielded a larger number of valid correspondences than were obtained without them, achieving significant improvements in the disparity maps and, therefore, in the overall perception and 3D reconstruction processes.
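
    The idea of adjusting one image of a stereo pair as a function of the other can be illustrated with textbook histogram matching. The sketch below is a generic version for 8-bit grayscale images stored as flat lists; it is not the expert system proposed in the thesis, and the function names are hypothetical.

```python
def cumulative_distribution(pixels, levels=256):
    """Normalized cumulative histogram of integer intensities in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))
    return cdf


def match_histogram(source, reference, levels=256):
    """Remap source intensities so their distribution follows that of reference."""
    src_cdf = cumulative_distribution(source, levels)
    ref_cdf = cumulative_distribution(reference, levels)
    lookup, j = [], 0
    for i in range(levels):
        # Both CDFs are non-decreasing, so j only ever moves forward.
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        lookup.append(j)
    return [lookup[p] for p in source]


# Toy example: the left image is uniformly darker than the right one.
left = [10, 10, 20, 30]
right = [110, 110, 120, 130]
left_corrected = match_histogram(left, right)
```

    In this toy case the corrected left image takes on the intensity levels of the right image, so a matcher that compares raw intensities would no longer be penalized by the brightness offset between the two cameras.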

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    3D Object Detection Via 2D LiDAR Corrected Pseudo LiDAR Point Clouds

    The age of automation has led to significant research in the fields of Machine Learning and Computer Vision. Computer Vision tasks fundamentally rely on information from digital images, videos, texts and sensors to build intelligent systems. In recent years, deep neural networks combined with computer vision algorithms have produced 2D object detection methods with the potential to be applied in real-time systems. However, performing fast and accurate 3D object detection is still a challenging problem. The automotive industry is shifting towards electric, connected and sustainable vehicles and is expected to have high growth potential in the coming years. 3D object detection is a critical task for autonomous vehicles and robots, as it helps moving objects in the scene plan their motion effectively around other objects. 3D object detection methods leverage image data from cameras and/or 3D point clouds obtained from expensive 3D LiDAR sensors to achieve high detection accuracy. The 3D LiDAR sensor provides the accurate depth information required to estimate the third dimension of the objects in the scene. Typically, a 64-beam LiDAR sensor mounted on a self-driving car costs around $75,000. In this thesis, we propose a cost-effective approach to 3D object detection using a low-cost 2D LiDAR sensor. We use the single-beam point cloud data from the 2D LiDAR for depth correction in pseudo-LiDAR. The proposed methods are tested on the KITTI 3D object detection dataset.
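
    As background on pseudo-LiDAR, the sketch below back-projects an estimated depth map into a 3D point cloud with the pinhole camera model and then rescales it using a few sparse range measurements. The abstract does not describe how the thesis performs its depth correction, so the median-ratio scaling here is only an illustrative stand-in, and all function names and parameters are hypothetical.

```python
from statistics import median


def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (rows of metres) into (x, y, z) points in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # skip pixels with no depth estimate
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def scale_from_sparse_ranges(depth, hits):
    """Median ratio of measured to estimated depth at pixels (u, v) hit by the beam.

    Illustrative stand-in only: this is NOT the correction method of the thesis.
    """
    ratios = [measured / depth[v][u] for u, v, measured in hits if depth[v][u] > 0]
    return median(ratios) if ratios else 1.0


def corrected_point_cloud(depth, hits, fx, fy, cx, cy):
    """Apply the global scale to the depth map, then back-project it."""
    scale = scale_from_sparse_ranges(depth, hits)
    scaled = [[z * scale for z in row] for row in depth]
    return backproject(scaled, fx, fy, cx, cy)
```

    A single global scale is the simplest possible correction; it is shown only to make concrete how sparse but accurate range readings can adjust a dense but biased depth estimate before the points are passed to a LiDAR-style detector.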
