98 research outputs found

    Pixels for focal-plane scale space generation and for high dynamic range imaging

    Focal-plane processing operates in parallel across the entire pixel matrix, which can help increase the speed of vision systems. However, fabricating circuits inside the pixel matrix increases the pixel pitch and reduces the fill factor (the ratio of photosensitive area to total pixel area), which reduces image quality. To take advantage of focal-plane processing while minimizing this loss, the first part of the thesis adds only two extra transistors to the pixel, which allows the scale space of the captured image to be generated at the focal plane. We assess the conditions under which the proposed circuitry is advantageous and compare its processing time and energy consumption with those of a digital solution. Assuming one SAR (successive-approximation) ADC per column and a clock frequency of 5.6 MHz, the analysis shows that the focal-plane approach is 26 times faster than a digital solution with 10 processing elements, and 49 times more energy-efficient. Focal-plane processing can also be used to improve image quality itself, as in high dynamic range imaging pixels. This work also presents the design and study of a pixel that captures high dynamic range images by sensing the average luminance of the matrix and then adjusting the integration time of each pixel according to both the global average and the local pixel value. The pixel was implemented with small structural variations, such as different sizes of the photodiode that measures the global average luminance. Schematic and post-layout simulations were performed with the implemented pixel using an input image with a dynamic range of 76 dB, and the results show detail in both dark and bright image areas.
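As a rough software analogue of scale space generation, the sketch below builds a stack of progressively smoother images by repeated nearest-neighbour diffusion. It is only an illustration of the concept: the abstract does not describe the in-pixel circuit, and the function names, the step size `lam`, and the border handling are all assumptions made for this example.

```python
def diffuse_step(img, lam=0.2):
    """One explicit step of the discrete heat equation on a 2-D list of floats.

    Borders are replicated (zero flux), and lam <= 0.25 keeps the step stable.
    Both choices are assumptions for this sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            # Each pixel moves toward the mean of its four neighbours.
            out[y][x] = c + lam * (up + down + left + right - 4.0 * c)
    return out


def scale_space(img, levels=4, steps_per_level=2, lam=0.2):
    """Return a list of images, from the original to the most smoothed."""
    stack = [img]
    current = img
    for _ in range(levels - 1):
        for _ in range(steps_per_level):
            current = diffuse_step(current, lam)
        stack.append(current)
    return stack
```

In a focal-plane implementation every pixel would perform its update at the same time, whereas this loop visits pixels one by one; that difference is the parallelism the abstract refers to.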

    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) to improve the robustness of a number of computer vision applications. The CPD approach includes two methods for registering two images, rigid and non-rigid point set registration, which differ in the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations, or affine transforms, make it possible to register under non-uniform scaling and skew. The idea is to move one point set coherently so that it aligns with the second point set. The CPD method finds both the non-rigid transformation and the correspondence between the two point sets at the same time, without requiring an a priori declaration of the transformation model. The first part of this thesis focuses on speaker identification in video conferencing. A real-time, audio-coupled, video-based approach is presented that relies more on video analysis than on audio analysis, which is known to be prone to errors. CPD is used for lip movement detection, and a temporal face detection approach is used to minimise false positives when the face detection algorithm fails. The second part of the thesis focuses on multi-exposure and multi-focus image fusion with compensation for camera shake. The Scale-Invariant Feature Transform (SIFT) is first used to detect keypoints in the images being fused. This point set is then reduced by removing outliers with RANSAC (RANdom SAmple Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet-based image fusion algorithm that uses a novel alpha blending and filtering technique to minimise artefacts. The thesis evaluates the performance of the algorithm against a number of state-of-the-art approaches, including the key commercial products currently on the market, and shows significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make and Model Recognition (VMMR) in CCTV video footage. CPD is used to remove the skew of detected vehicles, since CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approach angles. A LESH (Local Energy Shape Histogram) feature-based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.

    Graphics Insertions into Real Video for Market Research


    Correspondence problems in computer vision : novel models, numerics, and applications

    Correspondence problems such as optic flow are among the fundamental problems in computer vision. The aim is to find correspondences between the pixels in two (or more) images. These correspondences are described by a displacement vector field that is often found by minimising an energy (cost) function. In this thesis, we present several contributions to the energy-based solution of correspondence problems: (i) We start by developing a robust data term with a high degree of invariance under illumination changes. We then design an anisotropic smoothness term that acts complementarily to the data term, thereby avoiding undesirable interference. Additionally, we propose a simple method for determining the optimal balance between the two terms. (ii) When discretising the image derivatives that occur in our continuous models, we show that adapting one-sided upwind discretisations from the field of hyperbolic differential equations can be beneficial. To ensure a fast solution of the nonlinear system of equations that arises when minimising the energy, we use the recent fast explicit diffusion (FED) solver in an explicit gradient descent scheme. (iii) Finally, we present a novel application of modern optic flow methods in which we align the exposure series used in high dynamic range (HDR) imaging. Furthermore, we show how the alignment information can be used in a joint super-resolution and HDR method.
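For orientation, energy-based optic flow methods of the kind described above usually minimise a functional with a data term and a smoothness term. The generic form below is a sketch only: the symbols $D$, $S$, and the weight $\alpha$ are notation assumed here, and the abstract does not state the thesis's actual terms.

```latex
E(u, v) = \int_{\Omega} \Big( D(u, v) + \alpha \, S(\nabla u, \nabla v) \Big) \, dx \, dy
```

Here $(u, v)$ is the displacement field over the image domain $\Omega$, and $\alpha > 0$ sets the balance between the two terms. As a well-known instance (the classical Horn and Schunck model, not the model of this thesis), one can take $D = (I_x u + I_y v + I_t)^2$ and $S = |\nabla u|^2 + |\nabla v|^2$, where $I_x$, $I_y$, and $I_t$ are the spatial and temporal image derivatives.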

    Real-Time Algorithms for High Dynamic Range Video

    A recurring problem in capturing video is the scene having a range of brightness values that exceeds the capabilities of the capturing device. An example would be a video camera in a bright outside area, directed at the entrance of a building. Because of the potentially big brightness difference, it may not be possible to capture details of the inside of the building and the outside simultaneously using just one shutter speed setting. This results in under- and overexposed pixels in the video footage. The approach we follow in this thesis to overcome this problem is temporal exposure bracketing, i.e., using a set of images captured in quick sequence at different shutter settings. Each image then captures one facet of the scene's brightness range. When fused together, a high dynamic range (HDR) video frame is created that reveals details in dark and bright regions simultaneously. The process of creating a frame in an HDR video can be thought of as a pipeline where the output of each step is the input to the subsequent one. It begins by capturing a set of regular images using varying shutter speeds. Next, the images are aligned with respect to each other to compensate for camera and scene motion during capture. The aligned images are then merged together to create a single HDR frame containing accurate brightness values of the entire scene. As a last step, the HDR frame is tone mapped in order to be displayable on a regular screen with a lower dynamic range. This thesis covers algorithms for these steps that allow the creation of HDR video in real-time. When creating videos instead of still images, the focus lies on high capturing and processing speed and on assuring temporal consistency between the video frames. In order to achieve this goal, we take advantage of the knowledge gained from the processing of previous frames in the video. This work addresses the following aspects in particular. 
The image size parameters for the set of base images are chosen such that as little image data as possible is captured. We make use of the fact that it is not always necessary to capture full-size images when only small portions of the scene require HDR. Avoiding redundancy in the image material is an obvious approach to reducing the overall time taken to generate a frame. With the aid of the previous frames, we calculate brightness statistics of the scene. The exposure values are chosen such that frequently occurring brightness values are well exposed in at least one of the images in the sequence. The base images from which the HDR frame is created are captured in quick succession. The effects of intermediate camera motion are thus less intense than in the still image case, and a comparatively simple camera motion model can be used. At the same time, however, there is much less time available to estimate motion. For this reason, we use a fast heuristic that makes use of the motion information obtained in previous frames. It is robust to the large brightness difference between the images of an exposure sequence. The range of luminance values of an HDR frame must be tone mapped to the displayable range of the output device. Most available tone mapping operators are designed for still images and scale the dynamic range of each frame independently. In situations where the scene's brightness statistics change quickly, these operators produce visible image flicker. We have developed an algorithm that detects such situations in an HDR video. Based on this detection, a temporal stability criterion for the tone mapping parameters then prevents image flicker. All methods for capture, creation and display of HDR video introduced in this work have been fully implemented, tested and integrated into a running HDR video system. The algorithms were analyzed for parallelizability and, where applicable, adjusted and implemented on a high-performance graphics chip.
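The merge and tone mapping stages of the pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithms: it assumes already aligned frames from a sensor with linear response and pixel values in [0, 1], uses a simple triangle weight for merging, a global Reinhard-style curve for tone mapping, and plain exponential smoothing of the brightness key in place of the flicker detection and stability criterion mentioned in the abstract. All function names and parameters are hypothetical.

```python
import math


def hat_weight(z):
    """Triangle weight on [0, 1]: highest at mid-grey, zero at 0 and 1."""
    return z if z <= 0.5 else 1.0 - z


def merge_exposures(frames, exposure_times):
    """Merge aligned frames (flat lists of values in [0, 1]) into radiance.

    Assumes a linear sensor, so radiance is estimated as value / exposure time.
    """
    # Index of the shortest exposure, used when every sample is clipped.
    shortest = min(range(len(exposure_times)), key=lambda k: exposure_times[k])
    radiance = []
    for i in range(len(frames[0])):
        num = 0.0
        den = 0.0
        for frame, t in zip(frames, exposure_times):
            w = hat_weight(frame[i])
            num += w * frame[i] / t
            den += w
        if den > 0.0:
            radiance.append(num / den)
        else:
            # All weights are zero (pixel fully dark or saturated everywhere).
            radiance.append(frames[shortest][i] / exposure_times[shortest])
    return radiance


def log_average(radiance, delta=1e-4):
    """Log-average luminance, used as the brightness key of a frame."""
    return math.exp(sum(math.log(delta + r) for r in radiance) / len(radiance))


def smooth_key(prev_key, new_key, rate=0.1):
    """Exponential smoothing of the key across frames (assumed stand-in)."""
    if prev_key is None:
        return new_key
    return (1.0 - rate) * prev_key + rate * new_key


def tone_map(radiance, key, a=0.18):
    """Global Reinhard-style curve mapping radiance into [0, 1)."""
    out = []
    for r in radiance:
        scaled = a * r / key
        out.append(scaled / (1.0 + scaled))
    return out
```

In a video loop, one would call `merge_exposures` per frame, pass `log_average` of the result through `smooth_key` together with the previous frame's key, and tone map with the smoothed key, so that the display brightness does not jump when the scene statistics change abruptly.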

    Heterogeneous Multi-Sensor Camera

    Today's security cameras typically capture either color images based on visible light or grayscale images based on both visible and near-infrared (NIR) light. This limits performance in situations where it is unclear which kind of image to capture. At dusk, for example, the visible light may not be sufficient to capture good-quality color images. On the other hand, capturing grayscale images at dusk means that useful color information is discarded. Thus, there is a need for a hybrid-mode camera that can combine the best of both the color and the grayscale images. To build a prototype camera that achieves this, a dichroic beam-splitter was placed between the lens and the image sensor of a conventional camera. In this way, the incident light is split into its visible and NIR components, which can then be detected separately by two image sensors. Different approaches to merging color and grayscale images into one superior image were investigated, where the basic idea was to extract the overall intensity from the color image and partly replace it with the intensity of the grayscale image. The prototype camera outperformed conventional security cameras in certain situations, such as in low light and under highly varying illumination. While more development is needed, the technique looks promising overall and should offer new imaging capabilities to the security camera market.
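The basic merging idea described above (extract the intensity of the color image and partly replace it with the grayscale intensity) can be sketched per pixel as follows. This is an illustrative assumption rather than the method evaluated in the thesis: the Rec. 601 luma weights, the linear blend factor `beta`, and the function names are all choices made for this example, and the two images are assumed to be registered and scaled to [0, 1].

```python
def luma(rgb):
    """Intensity of an RGB pixel using Rec. 601 luma weights (an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def fuse_pixel(rgb, gray, beta=0.5, eps=1e-6):
    """Blend the color pixel's intensity with the grayscale (visible + NIR) value.

    beta = 0 keeps the color image unchanged; beta = 1 takes the intensity
    entirely from the grayscale image. The RGB channels are rescaled by a
    common gain, so the ratios between channels are kept (before clipping).
    """
    y = luma(rgb)
    y_new = (1.0 - beta) * y + beta * gray
    gain = y_new / max(y, eps)
    # A completely black color pixel stays black: it carries no hue to scale.
    return tuple(min(1.0, c * gain) for c in rgb)


def fuse_images(color, gray, beta=0.5):
    """Fuse two registered images given as flat lists of pixels."""
    return [fuse_pixel(p, g, beta) for p, g in zip(color, gray)]
```

For example, `fuse_images([(0.2, 0.1, 0.05)], [0.6], beta=0.5)` brightens the pixel while keeping its reddish tint, since all three channels are multiplied by the same gain.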