13 research outputs found
Face recognition using convolutional macropixel comparison approach
Convolutional Neural Networks (CNNs) are a widely used deep learning framework and have been applied to face recognition with outstanding results. The Macropixel Comparison Approach is a shallow mathematical approach that recognizes faces by comparing original pixel blocks of face images. In this thesis, inspired by ideas from the currently popular deep neural network framework, we introduce two features into the mathematical approach: deep overlap and weighted filters. The aim is to explore whether ideas from deep learning can benefit mathematical methods, which might extend the scope of face recognition research. Results from our experiments show that the proposed approach achieves markedly better recognition rates than the original macropixel method.
Keywords: convolutional neural network (CNN), face recognition, macropixel, mathematical method, macropixel comparison approach
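The abstract does not specify the weighted filter, but the general idea of comparing overlapping pixel blocks (macropixels) under a weighting kernel can be sketched as follows. The block size, stride, and Gaussian weighting are illustrative assumptions, not the thesis's actual parameters:

```python
import numpy as np

def weighted_macropixel_distance(img_a, img_b, block=8, stride=4, sigma=2.0):
    """Compare two equally sized grayscale face images by summing weighted
    L2 distances between overlapping pixel blocks (macropixels).
    Block size, stride, and the Gaussian weight are illustrative choices."""
    assert img_a.shape == img_b.shape
    h, w = img_a.shape
    # Gaussian weight emphasising the centre of each macropixel.
    ax = np.arange(block) - (block - 1) / 2
    g = np.exp(-(ax**2) / (2 * sigma**2))
    weight = np.outer(g, g)
    total = 0.0
    for y in range(0, h - block + 1, stride):      # overlapping block grid
        for x in range(0, w - block + 1, stride):
            diff = img_a[y:y+block, x:x+block] - img_b[y:y+block, x:x+block]
            total += np.sum(weight * diff**2)
    return total

# Identical images give zero distance; a shifted copy gives a positive one.
rng = np.random.default_rng(0)
face = rng.random((32, 32))
print(weighted_macropixel_distance(face, face))                          # 0.0
print(weighted_macropixel_distance(face, np.roll(face, 2, axis=1)) > 0)  # True
```

A smaller stride than block size makes neighbouring blocks overlap, which is presumably the spirit of the "deep overlap" feature.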
High-speed object detection with a single-photon time-of-flight image sensor
3D time-of-flight (ToF) imaging is used in a variety of applications such as
augmented reality (AR), computer interfaces, robotics and autonomous systems.
Single-photon avalanche diodes (SPADs) are one of the enabling technologies
providing accurate depth data even over long ranges. By developing SPADs in
array format with integrated processing combined with pulsed, flood-type
illumination, high-speed 3D capture is possible. However, array sizes tend to
be relatively small, limiting the lateral resolution of the resulting depth
maps, and, consequently, the information that can be extracted from the image
for applications such as object detection. In this paper, we demonstrate that
these limitations can be overcome through the use of convolutional neural
networks (CNNs) for high-performance object detection. We present outdoor
results from a portable SPAD camera system that outputs 16-bin photon timing
histograms with 64x32 spatial resolution. The results, obtained with exposure
times down to 2 ms (equivalent to 500 FPS) and signal-to-background ratios
(SBR) as low as 0.05, point to the advantages of providing the CNN with full
histogram data rather than point clouds alone. Alternatively, a combination of
point cloud and active intensity data may be used as input, for a similar level
of performance. In either case, the GPU-accelerated processing time is less
than 1 ms per frame, leading to an overall latency (image acquisition plus
processing) in the millisecond range, making the results relevant for
safety-critical computer vision applications which would benefit from faster
than human reaction times.
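To see why full histograms carry more information than point clouds, consider the usual reduction from a 16-bin photon-timing histogram to a single depth per pixel: everything beyond the peak bin is discarded. A minimal sketch of that reduction, with an assumed 1 ns bin width and toy photon counts (neither taken from the paper):

```python
import numpy as np

C = 3e8            # speed of light, m/s
BIN_WIDTH = 1e-9   # assumed 1 ns timing bins (illustrative)

def histogram_to_depth(hist):
    """Collapse per-pixel photon-timing histograms (H, W, 16) to a depth map
    by taking the peak bin -- the point-cloud-style input the paper compares
    against feeding the CNN the full histogram."""
    peak_bin = np.argmax(hist, axis=-1)     # (H, W)
    return peak_bin * BIN_WIDTH * C / 2     # round trip -> divide by 2

# Toy 64x32 frame: weak background counts plus a strong return in bin 5.
rng = np.random.default_rng(1)
hist = rng.poisson(0.2, size=(64, 32, 16)).astype(float)
hist[..., 5] += 50
depth = histogram_to_depth(hist)
print(depth.shape)    # (64, 32)
print(depth[0, 0])    # 5 * 1e-9 * 3e8 / 2 = 0.75 m
```

At low SBR the peak bin becomes unreliable, whereas the full histogram still exposes the background level and pulse shape to the network.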
Deep Demosaicing for Polarimetric Filter Array Cameras
Polarisation Filter Array (PFA) cameras allow the analysis of the polarisation state of light in a simple and cost-effective manner. Such filter arrays work like the Bayer pattern in colour cameras, sharing similar advantages and drawbacks. Among others, the raw image must be demosaiced, taking into account the local variations of the PFA and the characteristics of the imaged scene. Non-linear effects, such as cross-talk among neighbouring pixels, are difficult to model explicitly and suggest the potential advantage of a data-driven learning approach. However, the PFA cannot be removed from the sensor, making it difficult to acquire the ground-truth polarisation state for training. In this work we propose a novel CNN-based model which directly demosaics the raw camera image to a per-pixel Stokes vector. Our contribution is twofold. First, we propose a network architecture composed of a sequence of Mosaiced Convolutions operating coherently with the local arrangement of the different filters. Second, we introduce a new method, employing a consumer LCD screen, to effectively acquire real-world data for training. The process is designed to be invariant to monitor gamma and external lighting conditions. We extensively compared our method against algorithmic and learning-based demosaicing techniques, obtaining consistently lower errors, especially in terms of polarisation angle.
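For reference, the per-pixel Stokes vector the network regresses can be estimated naively by grouping each 2x2 superpixel of the mosaic; this is the kind of low-resolution, cross-talk-blind baseline that learned demosaicing improves on. The filter layout below is an assumption (real sensors vary):

```python
import numpy as np

def superpixel_stokes(raw):
    """Naive per-superpixel Stokes estimate for a 2x2 PFA mosaic.
    Assumes the layout [[0, 45], [90, 135]] degrees -- an illustrative
    choice; actual sensor arrangements differ."""
    i0   = raw[0::2, 0::2]
    i45  = raw[0::2, 1::2]
    i90  = raw[1::2, 0::2]
    i135 = raw[1::2, 1::2]
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs vertical
    s2 = i45 - i135                      # diagonal components
    return np.stack([s0, s1, s2], axis=-1)

# Fully horizontally polarised light: I0 = 1, I90 = 0, I45 = I135 = 0.5.
raw = np.tile(np.array([[1.0, 0.5], [0.0, 0.5]]), (2, 2))
print(superpixel_stokes(raw)[0, 0])   # [1. 1. 0.]
```

This grouping halves the spatial resolution and ignores scene structure, which is exactly what per-pixel CNN demosaicing avoids.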
Guided direct time-of-flight Lidar for self-driving vehicles
Self-driving vehicles demand efficient and reliable depth-sensing technologies. Lidar, with its capacity for long-distance, high-precision measurement, is a crucial component in this pursuit. However, conventional mechanical scanning implementations suffer from reliability, cost, and frame rate limitations. Solid-state lidar solutions have emerged as a promising alternative, but the vast amount of photon data processed and stored using conventional direct time-of-flight (dToF) prevents long-distance sensing unless power-intensive partial histogram approaches are used.
This research introduces a pioneering ‘guided’ dToF approach, harnessing external guidance from other onboard sensors to narrow down the depth search space for a power- and data-efficient solution. This approach centres around a dToF sensor in which the exposed time window of independent pixels can be dynamically adjusted. A pair of vision cameras is used in this demonstrator to provide the guiding depth estimates.
The implemented guided dToF demonstrator successfully captures a dynamic outdoor scene at 3 fps with distances up to 75 m. Compared to a conventional full histogram approach, on-chip data is reduced by over 25 times, while the total laser cycles in each frame are reduced by at least 6 times compared to any partial histogram approach. The capability of guided dToF to mitigate multipath reflections is also demonstrated.
For self-driving vehicles, where a wealth of sensor data is already available, guided dToF opens new possibilities for efficient solid-state lidar.
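The core of the guided approach, narrowing each pixel's exposed time window around a guide depth estimate, can be sketched as follows. The bin width, margin, and histogram length are illustrative values, not the demonstrator's parameters:

```python
import numpy as np

C = 3e8  # speed of light, m/s

def guided_window(guide_depth_m, margin_m, bin_width_s, n_bins_full):
    """Convert per-pixel depth estimates from a guide sensor (e.g. stereo
    cameras) into narrow per-pixel exposure windows of timing bins.
    All numeric parameters here are illustrative assumptions."""
    t_centre = 2 * guide_depth_m / C                 # round-trip time
    centre_bin = t_centre / bin_width_s
    half = 2 * margin_m / C / bin_width_s            # margin in bins
    lo = np.clip(np.floor(centre_bin - half), 0, n_bins_full - 1).astype(int)
    hi = np.clip(np.ceil(centre_bin + half), 0, n_bins_full - 1).astype(int)
    return lo, hi

# Stereo guess of 30 m with a +/-2 m margin, 1 ns bins, 512-bin full range.
lo, hi = guided_window(np.array([30.0]), 2.0, 1e-9, 512)
print(lo[0], hi[0])   # 186 214: a ~29-bin window instead of 512 bins
```

Storing and processing only the windowed bins is what yields the large on-chip data and laser-cycle reductions reported above.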
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research
We introduce the Never Ending VIsual-classification Stream (NEVIS'22), a
benchmark consisting of a stream of over 100 visual classification tasks,
sorted chronologically and extracted from papers sampled uniformly from
computer vision proceedings spanning the last three decades. The resulting
stream reflects what the research community thought was meaningful at any point
in time. Despite being limited to classification, the resulting stream has a
rich diversity of tasks, ranging from OCR to texture analysis, crowd counting,
scene recognition, and so forth. The diversity is also reflected in the wide range of
dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses
an unprecedented challenge for current sequential learning approaches due to
the scale and diversity of tasks, yet with a low entry barrier as it is limited
to a single modality and each task is a classical supervised learning problem.
Moreover, we provide a reference implementation including strong baselines and
a simple evaluation protocol to compare methods in terms of their trade-off
between accuracy and compute. We hope that NEVIS'22 can be useful to
researchers working on continual learning, meta-learning, AutoML and more
generally sequential learning, and help these communities join forces towards
more robust and efficient models that efficiently adapt to a never ending
stream of data. Implementations have been made available at
https://github.com/deepmind/dm_nevis
Arrayed LiDAR signal analysis for automotive applications
Light detection and ranging (LiDAR) is one of the enabling technologies for advanced
driver assistance and autonomy. Advances in solid-state photon detector arrays offer
the potential of high-performance LiDAR systems but require novel signal processing
approaches to fully exploit the dramatic increase in data volume an arrayed detector
can provide.
This thesis presents two approaches applicable to arrayed solid-state LiDAR. First, a
novel block independent sparse depth reconstruction framework is developed, which
utilises a random, very sparse illumination scheme that reduces illumination density while improving sampling times, which remain constant for any array
size. Compressive sensing (CS) principles are used to reconstruct depth information
from small measurement subsets. The smaller problem size of blocks reduces the
reconstruction complexity, improves compressive depth reconstruction performance
and enables fast concurrent processing. A feasibility study of a system proposal for
this approach demonstrates that the required logic could be practically implemented
within detector size constraints. Second, a novel deep learning architecture called
LiDARNet is presented to localise surface returns from LiDAR waveforms with high
throughput. This single data-driven processing approach can unify a wide range
of scenarios, making use of a training-by-simulation methodology. This augments
real datasets with challenging simulated conditions such as multiple returns and
high noise variance, while enabling rapid prototyping of fast data-driven processing
approaches for arrayed LiDAR systems.
Both approaches are fast and practical processing methodologies for arrayed LiDAR
systems. These retrieve depth information with excellent depth resolution for wide
operating ranges, and are demonstrated on real and simulated data. LiDARNet is
a rapid approach to determine surface locations from LiDAR waveforms for efficient point cloud generation, while block sparse depth reconstruction is an efficient method to facilitate high-resolution depth maps at high frame rates with reduced power and memory requirements.
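As a point of reference for what LiDARNet learns end-to-end, surface returns can be localised classically by matched filtering: cross-correlating the waveform with the emitted pulse shape and picking peaks. The sketch below uses an assumed Gaussian pulse and a noise-free two-return waveform:

```python
import numpy as np

def locate_returns(waveform, pulse, n_returns, min_separation=10):
    """Locate surface returns in a LiDAR waveform by matched filtering
    (cross-correlation with the emitted pulse) and greedy peak picking.
    A classical baseline for the task LiDARNet addresses; details here
    are illustrative."""
    score = np.correlate(waveform, pulse, mode="same")
    found = []
    for _ in range(n_returns):
        idx = int(np.argmax(score))
        found.append(idx)
        lo, hi = max(0, idx - min_separation), idx + min_separation
        score[lo:hi] = -np.inf          # suppress bins around this peak
    return sorted(found)

# Two returns at bins 40 and 120 with a symmetric (Gaussian) pulse.
t = np.arange(-8, 9)
pulse = np.exp(-t**2 / 8.0)
waveform = np.zeros(200)
for pos, amp in [(40, 1.0), (120, 0.6)]:
    waveform[pos-8:pos+9] += amp * pulse
print(locate_returns(waveform, pulse, 2))   # [40, 120]
```

Under the high noise variance and multiple-return conditions described above, such hand-tuned pipelines degrade, which motivates the learned approach.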
Dense light field coding: a survey
Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems.
Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.
Compression and visual quality assessment for light field contents
Since its invention in the 19th century, photography has made it possible to create durable images of the world around us by capturing the intensity of light that flows through a scene: first analogically, using light-sensitive material, and then, with the advent of electronic image sensors, digitally. However, one main limitation of both analog and digital photography lies in its inability to capture any information about the direction of light rays. Through traditional photography, each three-dimensional scene is projected onto a 2D plane; consequently, no information about the position of 3D objects in space is retained. Light field photography aims to overcome these limitations by recording the direction of light along with its intensity. In the past, several acquisition technologies have been presented to properly capture light field information, and portable devices have been commercialized to the general public. However, a considerably larger volume of data is generated when compared to traditional photography. Thus, new solutions must be designed to face the challenges light field photography poses in terms of storage, representation, and visualization of the acquired data. In particular, new and efficient compression algorithms are needed to substantially reduce the amount of data that needs to be stored and transmitted, while maintaining an adequate level of perceptual quality.
In designing new solutions to address the unique challenges posed by light field photography, one cannot forgo the importance of having reliable, reproducible means of evaluating their performance, especially in relation to the scenario in which they will be consumed. To that end, subjective assessment of visual quality is of paramount importance to evaluate the impact of compression, representation, and rendering models on user experience. Yet, the standardized methodologies that are commonly used to evaluate the visual quality of traditional media content, such as images and videos, are not equipped to tackle the challenges posed by light field photography. New subjective methodologies must be tailored for the new possibilities this new type of imaging offers in terms of rendering and visual experience.
In this work, we address the aforementioned problems by both designing new methodologies for the visual quality evaluation of light field contents and outlining a new compression solution to efficiently reduce the amount of data that needs to be transmitted and stored. We first analyse how traditional methodologies for the subjective evaluation of multimedia contents can be adapted to suit light field data, and we propose new methodologies to reliably assess visual quality while maintaining user engagement. Furthermore, we study how user behavior is affected by the visual quality of the data. We employ subjective quality assessment to compare several state-of-the-art solutions in light field coding, in order to find the most promising approaches to minimize the volume of data without compromising perceptual quality. To that end, we define and inspect several coding approaches for light field compression, and we investigate the impact of color subsampling on the final rendered content. Lastly, we propose a new coding approach to perform light field compression, showing significant improvement with respect to the state of the art.