27 research outputs found

    Smart CMOS image sensor for 3D measurement

    3D measurements are concerned with extracting visual information from the geometry of visible surfaces and interpreting the 3D coordinate data thus obtained, to detect or track the position or reconstruct the profile of an object, often in real time. These systems require image sensors with high accuracy of position estimation and a high frame rate of data processing for handling large volumes of data. A standard imager cannot address the requirements of fast image acquisition and processing, which are the two figures of merit for 3D measurements. Hence, dedicated VLSI imager architectures are indispensable for designing these high-performance sensors. CMOS imaging technology offers the potential to integrate image processing algorithms on the focal plane of the device, resulting in smart image sensors capable of achieving better processing features in handling massive image data. The objective of this thesis is to present a new architecture of smart CMOS image sensor for real-time 3D measurement using sheet-beam projection methods based on active triangulation. By proposing the vision sensor as an ensemble of linear sensor arrays, all working in parallel and processing the entire image in slices, the complexity of the image-processing task shifts from O(N²) to O(N). Inherent also in the design is the high level of parallelism needed to achieve massively parallel processing at the high frame rate required in 3D computation problems. This work demonstrates a prototype of the smart linear sensor incorporating full testability features to test and debug at both device and system levels. The salient features of this work are the asynchronous position-to-pulse-stream conversion, multiple-image binarization, high parallelism and modular architecture, resulting in a frame rate and sub-pixel resolution suitable for real-time 3D measurements.
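
As a rough illustration of the slice-wise idea in this abstract, the sketch below estimates the sub-pixel position of a projected stripe independently in each sensor line. It is not the thesis's circuit: the intensity-centroid estimator, the function names `stripe_position` and `stripe_profile`, and the `threshold` parameter are all assumptions chosen for illustration. Each line costs O(N) work, and the lines do not depend on one another, so they could in principle run in parallel.

```python
def stripe_position(line, threshold=0.0):
    """Sub-pixel stripe position in one sensor line (intensity centroid).

    Hypothetical helper: the thesis uses its own position-to-pulse-stream
    conversion; a centroid is swapped in here for illustration only.
    """
    total = 0.0
    weighted = 0.0
    for x, value in enumerate(line):
        if value > threshold:
            total += value
            weighted += x * value
    if total == 0.0:
        return None  # no stripe visible in this line
    return weighted / total


def stripe_profile(image, threshold=0.0):
    """Apply the per-line estimate to every line of the image.

    Each line is processed on its own, so the loop could be replaced
    by parallel hardware or worker threads.
    """
    return [stripe_position(line, threshold) for line in image]
```

For example, `stripe_position([0, 0, 2, 2, 0])` places the stripe at 2.5, halfway between the two lit pixels.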

    Video Stream Adaptation In Computer Vision Systems

    Computer Vision (CV) has been deployed recently in a wide range of applications, including the surveillance and automotive industries. According to a recent report, the market for CV technologies will grow to $33.3 billion by 2019. The surveillance and automotive industries share over 20% of this market. This dissertation considers the design of real-time CV systems with live video streaming, especially those over wireless and mobile networks. Such systems include video cameras/sensors and monitoring stations. The cameras should adapt their captured videos based on the events and/or available resources and time requirements. The monitoring station receives video streams from all cameras and runs CV algorithms for decisions, warnings, control, and/or other actions. Real-time CV systems have constraints in power, computation, and communication resources. Most video adaptation techniques have considered video distortion as the primary metric. In CV systems, however, the main objective is enhancing the event/object detection/recognition/tracking accuracy. The accuracy can essentially be thought of as the quality perceived by machines, as opposed to the human perceptual quality. High-Efficiency Video Coding (HEVC) is a recent encoding standard that seeks to address the limited communication bandwidth problem that results from the popularity of High Definition (HD) videos. Unfortunately, HEVC adopts algorithms that greatly slow down the encoding process, which complicates its use in real-time systems. This dissertation presents a method for adapting live video streams to limited and varying network bandwidth and energy resources. It analyzes and compares the rate-accuracy and rate-energy characteristics of various video stream adaptation techniques in CV systems. We model the video capturing, encoding, and transmission aspects and then provide an overall model of the power consumed by the video cameras and/or sensors. 
In addition to modeling the power consumption, we model the achieved bitrate of video encoding. We validate and analyze the power consumption models of each phase as well as the aggregate power consumption model through extensive experiments. The analysis includes examining individual parameters separately and examining the impacts of changing more than one parameter at a time. For HEVC, we develop an algorithm that predicts the size of the block without iterating through the exhaustive Rate Distortion Optimization (RDO) method. We demonstrate the effectiveness of the proposed algorithm in comparison with existing algorithms. The proposed algorithm achieves approximately 5 times the encoding speed of the RDO algorithm and 1.42 times the encoding speed of the fastest analyzed algorithm

    FPGA-based stereo vision system for autonomous driving

    The project consists of the design and implementation of a real-time stereo vision image sensor intended for autonomous driving systems, using an FPGA. The function of this sensor is to output a real-time depth image from an input of two grayscale luminance images, which can make further processing much easier and faster. The final objective of the project is to develop a standalone prototype for implementing the system on an autonomous vehicle, but it will first be developed on an existing FPGA platform to prove its viability. Two low-cost digital cameras will be used as input sensors, and the output image will be transmitted to a PC.
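
The abstract does not say how depth is derived from the two images. A common textbook relation, assumed here rather than taken from the project, is the rectified pinhole stereo model, in which depth is focal length times baseline divided by disparity. The function name and parameters below are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for one pixel under a rectified pinhole stereo model.

    Assumed model (not stated in the abstract): Z = f * B / d, with the
    focal length f and disparity d in pixels and the baseline B in metres.
    """
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px
```

With an assumed focal length of 700 px and a 0.12 m baseline, a disparity of 10 px corresponds to a depth of about 8.4 m.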

    Event-based neuromorphic stereo vision


    Development and implementation of a selective change-driven vision sensor for high speed movement analysis

    An artificial vision system consists, in its most basic form, of a VLSI sensor, usually fabricated in CMOS or CCD technology, and a processing stage. In the vast majority of artificial vision systems implemented today, the sensing stage is a traditional image sensor. Sensors of this type work on very simple and well-known principles: the illumination level of the environment is sampled and transmitted at regular time intervals, and all the pixels of the matrix, without exception, are transmitted sequentially and in order. This happens even when no changes have occurred in the scene under observation, which means that a large part of the information generated and transmitted can be considered redundant. In many cases this strategy is the most suitable one. Examples include scanners, image capture systems for medical diagnosis, and video systems for entertainment. All these applications need as much information as possible about the environment, even if it does not change or shows only very small variations over long time intervals. For other types of applications, such as artificial vision systems or wireless sensor networks, the large amount of redundant information that a traditional image sensor generates and transmits can become a limitation for implementing systems in many real environments. Many biological vision systems work in a completely different way from traditional image capture sensors. One of their main characteristics is that the sensitive cells (the equivalent of pixels in silicon technology) react independently and asynchronously to changes in illumination. 
Starting from the work of C. Mead and M. Mahowald in the late 1980s, the last two decades have seen very significant advances in the design of vision sensors, all fundamentally aimed at transmitting and processing only the information considered important or relevant within the scene under analysis. Most of these designs have taken, to a greater or lesser extent, the operation of the biological vision system as the basis of their development. The goal of much of the work in this area is to imitate as closely as possible, using the most advanced silicon technologies, the behaviour of biological systems in their visual, auditory and cognitive facets. Other work has followed a different philosophy, taking biology as a source of inspiration but not as a goal in itself. The selective change-driven (SCD) vision strategy belongs to this latter group. Aimed at detecting and analysing objects moving at high speed, the SCD strategy assumes that only a part of the image changes between two consecutive frames, while most of the pixels remain the same. This hypothesis makes particular sense when frames are captured at high speed. Given that many of the pixels of a given image have not changed with respect to their values in the previous images of the sequence, the processing algorithms can use the information already stored to perform their calculations. In other words, this redundant information need not be transmitted. One could even consider that the pixels of the matrix showing small changes will have little impact on the results of the algorithms. 
In the SCD strategy these hypotheses are applied so as to reduce substantially the amount of information transmitted by the sensor, and therefore the amount of information processed outside it. The SCD strategy no longer works with static images; instead, the information is carried and transmitted as a stream of pixels. These pixels are selected so that they contain only the information with relevant temporal changes within the scene under analysis. Under these new conditions, many traditional vision algorithms would need to be redesigned, since they operate on a sequence of static images transmitted at regular time intervals. The data-flow processing paradigm appears to fit this new way of working better. This thesis presents the first vision sensor based on SCD principles. The sensor consists of a 32x32 pixel matrix fabricated in 350 nm CMOS technology. The greatest difficulty of the microelectronic design presented in this thesis is the design of the block that selects the pixel with the largest change among all those in the matrix. This problem was solved by means of a winner-takes-all (WTA) circuit. The proposal of a digital circuit for selecting a single winner in a WTA matrix made up of a large number of cells is one of the original contributions of this thesis. The sensor was embedded in a portable artificial vision system based on a 32-bit microcontroller running at 80 MHz. This system was used to implement an object tracking algorithm as well as to characterize the sensor itself. 
The experiments presented in this thesis show how a simple, portable SCD system such as the one developed here can track a moving object with the temporal resolution of a high-speed camera working at 2000 frames per second, while using only the bandwidth of a standard low-speed camera working at 25 frames per second. This clearly shows that the SCD strategy brings a substantial reduction in the bandwidth and computing power requirements of the system. An artificial vision system is basically composed of a sensor, usually in VLSI CMOS or CCD technology, and a processing stage. Nowadays, in the vast majority of real-world implementations, the sensing part of the system is a traditional frame-based imager. These types of image sensors work under some very well known principles: the illumination level of the surrounding environment is sampled and transmitted at regular time intervals, even if no new relevant information is produced in the scene under analysis. A traditional frame-based image sensor is usually not able to evaluate whether the information coming from a certain pixel is relevant or irrelevant. Since such sensors do not perform any kind of analysis of the information being captured, the illumination level of all the pixels in the sensing matrix must be transmitted to be analyzed and processed at the processing stage. Often, a huge amount of redundant, non-relevant information is transmitted. As a consequence, valuable resources such as bandwidth and processing power are wasted. Furthermore, depending on the particular context and hardware configuration, the processing hardware may not even be able to cope with all the generated data. Many of these problems can be overcome with the design of new sensing and readout strategies focused on the selection of relevant changing information. 
Over the last decade many relevant improvements have been achieved in this direction. Taking the biological vision system as a general guide and inspiration, an increasing number of very-large-scale integration (VLSI) vision sensors have been, and are being, designed in which the sparsity, asynchrony and event-driven generation of the information coming from the visual field are taken into account. It is within this framework that Selective Change-Driven (SCD) Vision emerges as an innovative and original proposal. SCD Vision relies on the idea that a pixel showing a large change in intensity is an indicator of fast movements and of object edges around it. An SCD sensor is frame-based in the sense that successive frames are captured at a very high rate, but pixel readout is performed in an entirely different manner. The pixels are read out in order of relevance. The larger the change in illumination, the more relevant the pixel is considered to be. Not all the pixels in the sensing matrix need to be transmitted. As the pixels showing relevant changing information are transmitted first, a small subset of pixels might be read out, these being the ones conveying the most important information of the scene under analysis. In this thesis, the first VLSI CMOS vision sensor following SCD principles is presented. A 32x32 pixel matrix was implemented and fabricated in 0.35 μm 4-metal 2-poly silicon technology. The most challenging part of this microelectronic design was the decision block, where the pixels undergoing the largest changes in the sensing matrix are selected. This problem was solved by means of a winner-takes-all (WTA) circuit. A large WTA network, together with a proposal for single-winner selection, was designed and implemented, and its behaviour was characterized. The designed sensor was embedded into a small but powerful artificial vision system based on a 32-bit microcontroller. 
This system was used to implement tracking algorithms as well as to characterize the main basic features of the sensor. The experimentation carried out in this thesis shows how a simple SCD system based on our SCD sensor is able to track fast moving objects with just the bandwidth requirements of a low speed 25 fps standard camera, but with the time resolution and performance of a high-speed camera working at 2000 fps. This clearly demonstrates that bandwidth and processing requirements are substantially reduced when SCD hardware is used
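
The readout order described in this abstract can be imitated in software. The sketch below is a stand-in under stated assumptions, not the sensor's design: sorting replaces the hardware winner-takes-all circuit, the change is measured against the previous frame, and the names `scd_readout` and `pixels_per_frame` are hypothetical. The second function works through the bandwidth comparison quoted in the abstract.

```python
def scd_readout(previous, current, budget):
    """Return up to `budget` (row, col, value) tuples, largest change first.

    Assumption: change is measured against the previous frame, and a
    software sort stands in for the hardware WTA selection.
    """
    changes = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(prev_row, cur_row)):
            delta = abs(new - old)
            if delta > 0:
                changes.append((delta, r, c, new))
    changes.sort(key=lambda item: item[0], reverse=True)
    return [(r, c, value) for _, r, c, value in changes[:budget]]


def pixels_per_frame(width, height, reference_fps, scd_fps):
    """Pixels that may be sent per SCD frame to match the pixel rate of a
    full-frame camera running at `reference_fps` (whole pixels only)."""
    return (width * height * reference_fps) // scd_fps
```

For the figures in the abstract, `pixels_per_frame(32, 32, 25, 2000)` gives 12: at 2000 fps, roughly a dozen of the 1024 pixels can be sent per frame within the pixel rate of a 25 fps full-frame readout.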

    Development of a Full-Field Time-of-Flight Range Imaging System

    A full-field, time-of-flight, image ranging system or 3D camera has been developed from a proof-of-principle to a working prototype stage, capable of determining the intensity and range for every pixel in a scene. The system can be adapted to the requirements of various applications, producing high precision range measurements with sub-millimetre resolution, or high speed measurements at video frame rates. Parallel data acquisition at each pixel provides high spatial resolution independent of the operating speed. The range imaging system uses a heterodyne technique to indirectly measure time of flight. Laser diodes with highly diverging beams are intensity modulated at radio frequencies and used to illuminate the scene. Reflected light is focused on to an image intensifier used as a high speed optical shutter, which is modulated at a slightly different frequency to that of the laser source. The output from the shutter is a low frequency beat signal, which is sampled by a digital video camera. Optical propagation delay is encoded into the phase of the beat signal, hence from a captured time variant intensity sequence, the beat signal phase can be measured to determine range for every pixel in the scene. A direct digital synthesiser (DDS) is designed and constructed, capable of generating up to three outputs at frequencies beyond 100 MHz with the relative frequency stability in excess of nine orders of magnitude required to control the laser and shutter modulation. Driver circuits were also designed to modulate the image intensifier photocathode at 50 Vpp, and four laser diodes with a combined power output of 320 mW, both over a frequency range of 10-100 MHz. The DDS, laser, and image intensifier response are characterised. A unique method of measuring the image intensifier optical modulation response is developed, requiring the construction of a pico-second pulsed laser source. 
This characterisation revealed deficiencies in the measured responses, which were mitigated through hardware modifications where possible. The effects of remaining imperfections, such as modulation waveform harmonics and image intensifier irising, can be calibrated and removed from the range measurements during software processing using the characterisation data. Finally, a digital method of generating the high frequency modulation signals using an FPGA to replace the analogue DDS is developed, providing a highly integrated solution, reducing the complexity, and enhancing flexibility. In addition, a novel modulation coding technique is developed to remove the undesirable influence of waveform harmonics from the range measurement without extending the acquisition time. When combined with a proposed modification to the laser illumination source, the digital system can enhance range measurement precision and linearity. From this work, a flexible full-field image ranging system is successfully realised. The system is demonstrated operating in a high precision mode with sub-millimetre depth resolution, and also in a high speed mode operating at video update rates (25 fps), in both cases providing high (512 × 512) spatial resolution over distances of several metres.
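
To make the phase-to-range step concrete, here is a minimal sketch for a single pixel. It assumes an idealised beat signal of the form offset + amplitude * cos(2*pi*k/n + phase), sampled n times over exactly one beat period, with no harmonics and none of the calibration the thesis describes. The function names and the sign convention for the phase are assumptions.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def beat_phase(samples):
    """Phase of the beat signal at one pixel, in radians within [0, 2*pi).

    Assumes the samples cover exactly one beat period at equal spacing,
    so the first DFT bin isolates the beat frequency.
    """
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * k / n) for k, s in enumerate(samples))
    return cmath.phase(acc) % (2 * math.pi)


def range_from_phase(phase_rad, mod_freq_hz):
    """Range in metres; the light covers the distance twice, hence 4*pi."""
    return SPEED_OF_LIGHT * phase_rad / (4 * math.pi * mod_freq_hz)
```

Under these assumptions a full 2*pi of phase at a 10 MHz modulation frequency corresponds to about 15 m, which is the distance at which the measurement wraps around.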

    Information selection and fusion in vision systems

    Handling the enormous amounts of data produced by data-intensive imaging systems, such as multi-camera surveillance systems and microscopes, is technically challenging. While image and video compression help to manage the data volumes, they do not address the basic problem of information overflow. In this PhD we tackle the problem in a more drastic way. We select information of interest to a specific vision task, and discard the rest. We also combine data from different sources into a single output product, which presents the information of interest to end users in a suitable, summarized format. We treat two types of vision systems. The first type is conventional light microscopes. During this PhD, we have exploited for the first time the potential of the curvelet transform for image fusion for depth-of-field extension, allowing us to combine the advantages of multi-resolution image analysis for image fusion with increased directional sensitivity. As a result, the proposed technique clearly outperforms state-of-the-art methods, both on real microscopy data and on artificially generated images. The second type is camera networks with overlapping fields of view. To enable joint processing in such networks, inter-camera communication is essential. Because of infrastructure costs, power consumption for wireless transmission, etc., transmitting high-bandwidth video streams between cameras should be avoided. Fortunately, recently designed 'smart cameras', which have on-board processing and communication hardware, allow distributing the required image processing over the cameras. This permits compactly representing useful information from each camera. We focus on representing information for people localization and observation, which are important tools for statistical analysis of room usage, quick localization of people in case of building fires, etc. 
To further save bandwidth, we select which cameras should be involved in a vision task and transmit observations only from the selected cameras. We provide an information-theoretically founded framework for general purpose camera selection based on the Dempster-Shafer theory of evidence. Applied to tracking, it allows tracking people using a dynamic selection of as few as three cameras with the same accuracy as when using up to ten cameras.
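
The abstract does not detail its selection framework, so the sketch below shows only the standard building block of Dempster-Shafer theory, Dempster's rule of combination, applied to evidence from two cameras. The dictionary representation (frozensets of hypotheses mapped to masses) and the function name `combine` are assumptions made for illustration.

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses in each input are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # evidence that cannot both hold
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


LEFT = frozenset({"left"})
RIGHT = frozenset({"right"})
EITHER = LEFT | RIGHT

camera_1 = {LEFT: 0.6, EITHER: 0.4}   # hypothetical evidence values
camera_2 = {RIGHT: 0.5, EITHER: 0.5}
fused = combine(camera_1, camera_2)
```

In this made-up example the two cameras disagree with a conflict mass of 0.3, and after renormalisation the fused belief in "left" is 3/7.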

    Nouvelle génération de systèmes de vision temps réel à grande dynamique

    This thesis is part of the EUREKA European project "High Dynamic Range - Low Noise CMOS imagers", which developed new approaches to designing high-performance CMOS image sensors. The purpose of the thesis is to design a real-time high dynamic range (HDR) vision system. The main focus is the reconstruction of HDR video in real time, at the sensor frame rate (60 frames/sec), on an embedded computing architecture. Most current sensors produce a digital image that cannot reproduce the true range of light intensities in the real world. Similarly, common monitors, printers and displays cannot render a wide tonal range. The approach proposed in this thesis is to capture multiple images with different exposure times in order to overcome the limitations of current devices. To design a system that adapts over time to the lighting conditions, algorithms dedicated to high dynamic range imaging are studied, including auto-exposure techniques, radiance map generation and tone reproduction. The new smart camera hardware system is able to capture, generate and display high dynamic range content in a context of parallelized, real-time processing of video streams.
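
The multiple-exposure idea can be sketched for a single pixel as follows. This is not the thesis's pipeline: it assumes a linear sensor response, 8-bit pixel values, and a simple "hat" weighting that trusts mid-range values more than near-black or saturated ones. The function name `merge_exposures` and its parameters are hypothetical.

```python
def merge_exposures(pixel_values, exposure_times, max_value=255):
    """Estimate relative radiance at one pixel from several exposures.

    Assumptions: linear sensor response, so value / time is proportional
    to radiance; a hat weight discounts under- and over-exposed samples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, seconds in zip(pixel_values, exposure_times):
        weight = min(value, max_value - value)  # 0 at black and at saturation
        weighted_sum += weight * (value / seconds)
        weight_total += weight
    if weight_total == 0:
        # Every sample is black or saturated: fall back to a plain average.
        estimates = [v / t for v, t in zip(pixel_values, exposure_times)]
        return sum(estimates) / len(estimates)
    return weighted_sum / weight_total
```

For instance, `merge_exposures([255, 128], [4.0, 1.0])` ignores the saturated long exposure and returns 128.0, the estimate from the short exposure alone.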

    Robust and real-time hand detection and tracking in monocular video

    In recent years, personal computing devices such as laptops, tablets and smartphones have become ubiquitous. Moreover, intelligent sensors are being integrated into many consumer devices such as eyeglasses, wristwatches and smart televisions. With the advent of touchscreen technology, a new human-computer interaction (HCI) paradigm arose that allows users to interface with their device in an intuitive manner. Using simple gestures, such as swipe or pinch movements, a touchscreen can be used to directly interact with a virtual environment. Nevertheless, touchscreens still form a physical barrier between the virtual interface and the real world. An increasingly popular field of research that tries to overcome this limitation, is video based gesture recognition, hand detection and hand tracking. Gesture based interaction allows the user to directly interact with the computer in a natural manner by exploring a virtual reality using nothing but his own body language. In this dissertation, we investigate how robust hand detection and tracking can be accomplished under real-time constraints. In the context of human-computer interaction, real-time is defined as both low latency and low complexity, such that a complete video frame can be processed before the next one becomes available. Furthermore, for practical applications, the algorithms should be robust to illumination changes, camera motion, and cluttered backgrounds in the scene. Finally, the system should be able to initialize automatically, and to detect and recover from tracking failure. We study a wide variety of existing algorithms, and propose significant improvements and novel methods to build a complete detection and tracking system that meets these requirements. Hand detection, hand tracking and hand segmentation are related yet technically different challenges. 
Whereas detection deals with finding an object in a static image, tracking considers temporal information and is used to follow the position of an object over time, throughout a video sequence. Hand segmentation is the task of estimating the hand contour, thereby separating the object from its background. Detection of hands in individual video frames allows us to automatically initialize our tracking algorithm, and to detect and recover from tracking failure. Human hands are highly articulated objects, consisting of finger parts that are connected by joints. As a result, the appearance of a hand can vary greatly, depending on the assumed hand pose. Traditional detection algorithms often assume that the appearance of the object of interest can be described using a rigid model and therefore cannot be used to robustly detect human hands. Therefore, we developed an algorithm that detects hands by exploiting their articulated nature. Instead of resorting to a template-based approach, we probabilistically model the spatial relations between different hand parts and the centroid of the hand. Detecting hand parts, such as fingertips, is much easier than detecting a complete hand. Based on our model of the spatial configuration of hand parts, the detected parts can be used to obtain an estimate of the complete hand's position. To comply with the real-time constraints, we developed techniques to speed up the process by efficiently discarding unimportant information in the image. Experimental results show that our method is competitive with the state-of-the-art in object detection while reducing computational complexity by a factor of 1,000. Furthermore, we showed that our algorithm can also be used to detect other articulated objects such as persons or animals and is therefore not restricted to the task of hand detection. Once a hand has been detected, a tracking algorithm can be used to continuously track its position in time. 
We developed a probabilistic tracking method that can cope with uncertainty caused by image noise, incorrect detections, changing illumination, and camera motion. Furthermore, our tracking system automatically determines the number of hands in the scene, and can cope with hands entering or leaving the video canvas. We introduced several novel techniques that greatly increase tracking robustness, and that can also be applied in other domains than hand tracking. To achieve real-time processing, we investigated several techniques to reduce the search space of the problem, and deliberately employ methods that are easily parallelized on modern hardware. Experimental results indicate that our methods outperform the state-of-the-art in hand tracking, while providing a much lower computational complexity. One of the methods used by our probabilistic tracking algorithm, is optical flow estimation. Optical flow is defined as a 2D vector field describing the apparent velocities of objects in a 3D scene, projected onto the image plane. Optical flow is known to be used by many insects and birds to visually track objects and to estimate their ego-motion. However, most optical flow estimation methods described in literature are either too slow to be used in real-time applications, or are not robust to illumination changes and fast motion. We therefore developed an optical flow algorithm that can cope with large displacements, and that is illumination independent. Furthermore, we introduce a regularization technique that ensures a smooth flow-field. This regularization scheme effectively reduces the number of noisy and incorrect flow-vector estimates, while maintaining the ability to handle motion discontinuities caused by object boundaries in the scene. The above methods are combined into a hand tracking framework which can be used for interactive applications in unconstrained environments. 
To demonstrate the possibilities of gesture based human-computer interaction, we developed a new type of computer display. This display is completely transparent, allowing multiple users to perform collaborative tasks while maintaining eye contact. Furthermore, our display produces an image that seems to float in thin air, such that users can touch the virtual image with their hands. This floating imaging display has been showcased on several national and international events and tradeshows. The research that is described in this dissertation has been evaluated thoroughly by comparing detection and tracking results with those obtained by state-of-the-art algorithms. These comparisons show that the proposed methods outperform most algorithms in terms of accuracy, while achieving a much lower computational complexity, resulting in a real-time implementation. Results are discussed in depth at the end of each chapter. This research further resulted in an international journal publication; a second journal paper that has been submitted and is under review at the time of writing this dissertation; nine international conference publications; a national conference publication; a commercial license agreement concerning the research results; two hardware prototypes of a new type of computer display; and a software demonstrator