    Face recognition using convolutional macropixel comparison approach

    Convolutional Neural Networks (CNNs) are a widely used deep learning framework and have been applied to face recognition with outstanding results. The Macropixel Comparison Approach is a shallow mathematical approach that recognizes faces by comparing original pixel blocks of face images. In this thesis, we take inspiration from the currently popular deep neural network framework and introduce two of its features into the mathematical approach: deep overlap and weighted filters. The aim is to explore whether ideas from deep learning can benefit mathematical methods, which might extend the scope of face recognition research. Results from our experiments show that the newly proposed approach achieves markedly better recognition rates than the original macropixel method.
    Keywords: convolutional neural network (CNN), face recognition, macropixel, mathematical method, macropixel comparison approach
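
    To make the comparison idea concrete, here is a minimal sketch of weighted macropixel matching. The block size, stride, and Gaussian centre weighting are hypothetical stand-ins for the thesis's deep-overlap and weighted-filter designs:

    import numpy as np

    def macropixels(img, size=8, stride=4):
        # Split a grayscale image into overlapping macropixel blocks.
        h, w = img.shape
        blocks = [img[y:y+size, x:x+size]
                  for y in range(0, h - size + 1, stride)
                  for x in range(0, w - size + 1, stride)]
        return np.stack(blocks)

    def gaussian_weight(size=8, sigma=2.0):
        # Hypothetical centre-weighted filter applied to each block.
        ax = np.arange(size) - (size - 1) / 2
        g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
        return np.outer(g, g)

    def match_score(img_a, img_b):
        # Lower score = more similar faces (weighted block distances).
        a, b = macropixels(img_a), macropixels(img_b)
        return float(np.mean(np.abs((a - b) * gaussian_weight())))

    # Usage: identify a probe face as the nearest gallery face.
    probe = np.random.rand(64, 64)
    gallery = [np.random.rand(64, 64) for _ in range(5)]
    best = min(range(len(gallery)), key=lambda i: match_score(probe, gallery[i]))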

    High-speed object detection with a single-photon time-of-flight image sensor

    3D time-of-flight (ToF) imaging is used in a variety of applications such as augmented reality (AR), computer interfaces, robotics and autonomous systems. Single-photon avalanche diodes (SPADs) are one of the enabling technologies, providing accurate depth data even over long ranges. By developing SPADs in array format with integrated processing, combined with pulsed, flood-type illumination, high-speed 3D capture is possible. However, array sizes tend to be relatively small, limiting the lateral resolution of the resulting depth maps and, consequently, the information that can be extracted from the image for applications such as object detection. In this paper, we demonstrate that these limitations can be overcome through the use of convolutional neural networks (CNNs) for high-performance object detection. We present outdoor results from a portable SPAD camera system that outputs 16-bin photon timing histograms with 64x32 spatial resolution. The results, obtained with exposure times down to 2 ms (equivalent to 500 FPS) and at signal-to-background ratios (SBR) as low as 0.05, point to the advantages of providing the CNN with full histogram data rather than point clouds alone. Alternatively, a combination of point cloud and active intensity data may be used as input for a similar level of performance. In either case, the GPU-accelerated processing time is less than 1 ms per frame, leading to an overall latency (image acquisition plus processing) in the millisecond range, making the results relevant for safety-critical computer vision applications which would benefit from faster-than-human reaction times.
    Comment: 13 pages, 5 figures, 3 tables
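
    A minimal sketch of the histogram-as-input idea, treating the 16 timing bins as CNN channels over the 64x32 array. The layer sizes and single-channel objectness head are illustrative assumptions, not the network from the paper:

    import torch
    import torch.nn as nn

    class HistogramDetector(nn.Module):
        # Input: (batch, 16, 32, 64) -- 16 timing bins as channels over
        # the 64x32 SPAD array. Output: a coarse objectness-logit map;
        # a real detector head would also regress boxes and classes.
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            )
            self.head = nn.Conv2d(64, 1, 1)

        def forward(self, x):
            return self.head(self.features(x))

    # Usage: one frame of 16-bin histograms from the sensor.
    frame = torch.rand(1, 16, 32, 64)
    logits = HistogramDetector()(frame)   # -> (1, 1, 16, 32)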

    Deep Demosaicing for Polarimetric Filter Array Cameras

    Polarisation Filter Array (PFA) cameras allow the analysis of the polarisation state of light in a simple and cost-effective manner. Such filter arrays work like the Bayer pattern of colour cameras, sharing similar advantages and drawbacks. Among the latter, the raw image must be demosaiced, taking into account the local variations of the PFA and the characteristics of the imaged scene. Non-linear effects, like cross-talk among neighbouring pixels, are difficult to model explicitly and suggest the potential advantage of a data-driven learning approach. However, the PFA cannot be removed from the sensor, making it difficult to acquire the ground-truth polarisation state for training. In this work we propose a novel CNN-based model which directly demosaics the raw camera image to a per-pixel Stokes vector. Our contribution is twofold. First, we propose a network architecture composed of a sequence of Mosaiced Convolutions operating coherently with the local arrangement of the different filters. Second, we introduce a new method, employing a consumer LCD screen, to effectively acquire real-world data for training. The process is designed to be invariant to monitor gamma and external lighting conditions. We extensively compared our method against algorithmic and learning-based demosaicing techniques, obtaining a consistently lower error, especially in terms of polarisation angle.
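
    One plausible reading of a mosaiced convolution, sketched below for a standard 2x2 PFA layout (0/45/90/135 degrees): four weight banks are learned, and each output pixel uses the bank matching its phase in the mosaic. The channel counts and the three-component (S0, S1, S2) Stokes output are illustrative assumptions, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    class MosaicedConv(nn.Module):
        # One weight bank per 2x2 mosaic phase; each output pixel uses
        # the bank matching its position in the repeating PFA pattern.
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.convs = nn.ModuleList(
                [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for _ in range(4)]
            )

        def forward(self, x):
            _, _, h, w = x.shape
            ys = torch.arange(h, device=x.device).view(1, 1, h, 1)
            xs = torch.arange(w, device=x.device).view(1, 1, 1, w)
            phase = (ys % 2) * 2 + (xs % 2)            # 0..3 per pixel
            outs = [conv(x) for conv in self.convs]
            out = torch.zeros_like(outs[0])
            for p in range(4):
                out = out + outs[p] * (phase == p).to(x.dtype)
            return out

    # Raw mosaiced frame (1 channel) -> per-pixel linear Stokes vector.
    net = nn.Sequential(
        MosaicedConv(1, 16), nn.ReLU(),
        MosaicedConv(16, 16), nn.ReLU(),
        nn.Conv2d(16, 3, 1),                           # (S0, S1, S2)
    )
    stokes = net(torch.rand(1, 1, 64, 64))             # -> (1, 3, 64, 64)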

    Guided direct time-of-flight Lidar for self-driving vehicles

    Self-driving vehicles demand efficient and reliable depth-sensing technologies. Lidar, with its capacity for long-distance, high-precision measurement, is a crucial component in this pursuit. However, conventional mechanical scanning implementations suffer from reliability, cost, and frame rate limitations. Solid-state lidar solutions have emerged as a promising alternative, but the vast amount of photon data processed and stored using conventional direct time-of-flight (dToF) prevents long-distance sensing unless power-intensive partial histogram approaches are used. This research introduces a pioneering ‘guided’ dToF approach, harnessing external guidance from other onboard sensors to narrow down the depth search space for a power- and data-efficient solution. The approach centres on a dToF sensor in which the exposure time window of independent pixels can be dynamically adjusted. A pair of vision cameras provides the guiding depth estimates in this demonstrator. The implemented guided dToF demonstrator successfully captures a dynamic outdoor scene at 3 fps with distances up to 75 m. Compared to a conventional full histogram approach, on-chip data is reduced by over 25 times, while the total laser cycles in each frame are reduced by at least 6 times compared to any partial histogram approach. The capability of guided dToF to mitigate multipath reflections is also demonstrated. For self-driving vehicles, where a wealth of sensor data is already available, guided dToF opens new possibilities for efficient solid-state lidar.
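
    As a rough illustration of the guidance step, the sketch below maps a per-pixel guide depth (e.g. from the stereo pair) to a narrow window of timing bins for each dToF pixel to expose. The bin width and window size are assumed values, not the sensor's actual parameters:

    import numpy as np

    C = 3e8            # speed of light, m/s
    BIN_S = 2e-9       # assumed histogram bin width (2 ns ~ 0.3 m of range)
    WINDOW_BINS = 8    # assumed per-pixel exposure window width

    def guide_to_window(depth_est_m):
        # Convert guide depths to [start, stop) timing bins: each pixel
        # then histograms photons only inside its own narrow window.
        tof_bins = (2 * depth_est_m / C) / BIN_S       # round trip, in bins
        start = np.maximum(np.round(tof_bins) - WINDOW_BINS // 2, 0).astype(int)
        return start, start + WINDOW_BINS

    # Usage: a coarse stereo depth map guiding a 64x32 dToF array.
    guide = np.full((32, 64), 75.0)                    # metres
    start, stop = guide_to_window(guide)

    With these assumed numbers, a full-range histogram out to 75 m would span about 250 bins, so an 8-bin window corresponds to a roughly 30x on-chip data reduction, i.e. the order of reduction reported above.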

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    We introduce the Never Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks, sorted chronologically and extracted from papers sampled uniformly from computer vision proceedings spanning the last three decades. The resulting stream reflects what the research community thought was meaningful at any point in time. Despite being limited to classification, the stream has a rich diversity of tasks, from OCR to texture analysis, crowd counting, scene recognition, and so forth. The diversity is also reflected in the wide range of dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks, yet has a low entry barrier, as it is limited to a single modality and each task is a classical supervised learning problem. Moreover, we provide a reference implementation, including strong baselines, and a simple evaluation protocol to compare methods in terms of their trade-off between accuracy and compute. We hope that NEVIS'22 can be useful to researchers working on continual learning, meta-learning, AutoML and, more generally, sequential learning, and help these communities join forces towards more robust models that adapt efficiently to a never-ending stream of data. Implementations have been made available at https://github.com/deepmind/dm_nevis
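
    A sketch of the sequential evaluation loop such a stream implies, recording the accuracy/compute trade-off per task. The learner interface here is hypothetical; the actual benchmark API lives in the linked repository:

    import time

    def evaluate_stream(tasks, make_learner):
        # Tasks arrive in chronological order; the learner may transfer
        # knowledge from earlier tasks to later ones.
        learner, results, total_seconds = make_learner(), [], 0.0
        for task in tasks:
            t0 = time.perf_counter()
            learner.train(task.train_set)              # hypothetical API
            elapsed = time.perf_counter() - t0
            total_seconds += elapsed                   # proxy for compute
            results.append({"task": task.name,
                            "accuracy": learner.evaluate(task.test_set),
                            "train_seconds": elapsed})
        return results, total_seconds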

    Arrayed LiDAR signal analysis for automotive applications

    Light detection and ranging (LiDAR) is one of the enabling technologies for advanced driver assistance and autonomy. Advances in solid-state photon detector arrays offer the potential of high-performance LiDAR systems but require novel signal processing approaches to fully exploit the dramatic increase in data volume an arrayed detector can provide. This thesis presents two approaches applicable to arrayed solid-state LiDAR. First, a novel block-independent sparse depth reconstruction framework is developed, which utilises a random, very sparse illumination scheme to reduce illumination density while improving sampling times, which moreover remain constant for any array size. Compressive sensing (CS) principles are used to reconstruct depth information from small measurement subsets. The smaller problem size of blocks reduces the reconstruction complexity, improves compressive depth reconstruction performance and enables fast concurrent processing. A feasibility study of a system proposal for this approach demonstrates that the required logic could be practically implemented within detector size constraints. Second, a novel deep learning architecture called LiDARNet is presented to localise surface returns from LiDAR waveforms with high throughput. This single data-driven processing approach can unify a wide range of scenarios, making use of a training-by-simulation methodology that augments real datasets with challenging simulated conditions such as multiple returns and high noise variance, while enabling rapid prototyping of fast data-driven processing approaches for arrayed LiDAR systems. Both approaches are fast and practical processing methodologies for arrayed LiDAR systems. They retrieve depth information with excellent depth resolution over wide operating ranges, and are demonstrated on real and simulated data. LiDARNet is a rapid approach to determining surface locations from LiDAR waveforms for efficient point cloud generation, while block sparse depth reconstruction is an efficient method to facilitate high-resolution depth maps at high frame rates with reduced power and memory requirements.
    Funding: Engineering and Physical Sciences Research Council (EPSRC)
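
    The block-independent reconstruction can be sketched as follows: each small block is sampled at a few randomly illuminated pixels and recovered on its own via orthogonal matching pursuit (OMP) in a sparsifying basis, so all blocks can be processed concurrently. The sizes, the random orthonormal basis (a stand-in for, e.g., a DCT), and the recovery algorithm are illustrative assumptions:

    import numpy as np

    def omp(Phi, y, k):
        # Orthogonal matching pursuit: greedily recover a k-sparse x
        # such that y ~ Phi @ x.
        residual, support = y.copy(), []
        for _ in range(k):
            support.append(int(np.argmax(np.abs(Phi.T @ residual))))
            coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            residual = y - Phi[:, support] @ coef
        x = np.zeros(Phi.shape[1])
        x[support] = coef
        return x

    rng = np.random.default_rng(0)
    n, m, k = 64, 16, 4                    # 8x8 block, 16 samples, sparsity
    basis = np.linalg.qr(rng.standard_normal((n, n)))[0]
    block = basis[:, :k] @ rng.standard_normal(k)      # k-sparse depth block
    rows = rng.choice(n, size=m, replace=False)        # illuminated pixels
    y = block[rows]                                    # sparse measurements
    block_hat = basis @ omp(basis[rows, :], y, k)      # ~recovered block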

    Dense light field coding: a survey

    Light Field (LF) imaging is a promising solution for providing more immersive, closer-to-reality multimedia experiences to end-users, with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that many LF transmission systems will soon be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.

    Compression and visual quality assessment for light field contents

    Since its invention in the 19th century, photography has allowed us to create durable images of the world around us by capturing the intensity of light that flows through a scene, first analogically, using light-sensitive material, and then, with the advent of electronic image sensors, digitally. However, one main limitation of both analog and digital photography lies in their inability to capture any information about the direction of light rays: through traditional photography, each three-dimensional scene is projected onto a 2D plane, and consequently no information about the position of 3D objects in space is retained. Light field photography aims at overcoming these limitations by recording the direction of light along with its intensity. In the past, several acquisition technologies have been presented to properly capture light field information, and portable devices have been commercialized to the general public. However, a considerably larger volume of data is generated when compared to traditional photography. Thus, new solutions must be designed to face the challenges light field photography poses in terms of storage, representation, and visualization of the acquired data. In particular, new and efficient compression algorithms are needed to considerably reduce the amount of data that needs to be stored and transmitted, while maintaining an adequate level of perceptual quality. In designing new solutions to address the unique challenges posed by light field photography, one cannot forgo the importance of having reliable, reproducible means of evaluating their performance, especially in relation to the scenario in which they will be consumed. To that end, subjective assessment of visual quality is of paramount importance for evaluating the impact of compression, representation, and rendering models on user experience. Yet the standardized methodologies commonly used to evaluate the visual quality of traditional media content, such as images and videos, are not equipped to tackle the challenges posed by light field photography: new subjective methodologies must be tailored to the new possibilities this type of imaging offers in terms of rendering and visual experience. In this work, we address the aforementioned problems by both designing new methodologies for the visual quality evaluation of light field contents and outlining a new compression solution to efficiently reduce the amount of data that needs to be transmitted and stored. We first analyse how traditional methodologies for the subjective evaluation of multimedia contents can be adapted to suit light field data, and we propose new methodologies to reliably assess visual quality while maintaining user engagement. Furthermore, we study how user behavior is affected by the visual quality of the data. We employ subjective quality assessment to compare several state-of-the-art solutions in light field coding, in order to find the most promising approaches to minimize the volume of data without compromising perceptual quality. To this end, we define and inspect several coding approaches for light field compression, and we investigate the impact of color subsampling on the final rendered content. Lastly, we propose a new coding approach to light field compression, showing significant improvement with respect to the state of the art.
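
    As a small illustration of the subjective-assessment machinery underlying such comparisons, the sketch below computes mean opinion scores (MOS) with confidence intervals from a matrix of ratings. The 1-5 scale and the normal-approximation interval are common conventions, not necessarily the exact protocol used in this work:

    import numpy as np

    def mos_with_ci(scores, z=1.96):
        # scores: (observers, stimuli) ratings, e.g. on a 1-5 scale.
        # Returns per-stimulus mean opinion score and ~95% interval.
        scores = np.asarray(scores, dtype=float)
        mos = scores.mean(axis=0)
        ci = z * scores.std(axis=0, ddof=1) / np.sqrt(scores.shape[0])
        return mos, ci

    # Usage: 20 observers rating 3 compressed light field renderings.
    ratings = np.random.default_rng(1).integers(1, 6, size=(20, 3))
    mos, ci = mos_with_ci(ratings)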