On-the-Go Reflectance Transformation Imaging with Ordinary Smartphones
Reflectance Transformation Imaging (RTI) is a popular technique that recovers per-pixel reflectance information by capturing an object under different lighting conditions. This information can later be used to reveal surface details and to interactively relight the subject. Such a process, however, typically requires dedicated hardware setups to recover the light direction from multiple locations, making it tedious to perform outside the lab.
We propose a novel RTI method that can be carried out by recording videos with two ordinary smartphones. The flash LED of one device illuminates the subject while the other captures the reflectance. Since the LED is mounted close to the camera lens, we can infer the light direction for thousands of images by freely moving the illuminating device while observing a fiducial marker surrounding the subject. To deal with such an amount of data, we propose a neural relighting model that reconstructs object appearance for arbitrary light directions from extremely compact reflectance distribution data compressed via Principal Component Analysis (PCA). Experiments show that the proposed technique can easily be performed in the field, with a resulting RTI model that outperforms state-of-the-art approaches involving dedicated hardware setups.
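A minimal sketch of the general idea (not the authors' implementation): per-pixel reflectance samples gathered under many light directions are compressed with PCA, and a small neural network predicts the intensity for a query light direction from the per-pixel PCA coefficients. It assumes scikit-learn and PyTorch; all names, dimensions and the random placeholder data are illustrative.

    import numpy as np
    from sklearn.decomposition import PCA
    import torch
    import torch.nn as nn

    # samples: (num_pixels, num_lights) observed intensities
    # light_dirs: (num_lights, 2) light directions projected onto the image plane
    samples = np.random.rand(10000, 500).astype(np.float32)          # placeholder data
    light_dirs = np.random.uniform(-1, 1, (500, 2)).astype(np.float32)

    pca = PCA(n_components=8)
    coeffs = pca.fit_transform(samples)       # (num_pixels, 8) compact reflectance code

    class Relighter(nn.Module):
        """Maps (per-pixel PCA coefficients, query light direction) to an intensity."""
        def __init__(self, n_coeffs=8, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_coeffs + 2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))
        def forward(self, coeffs, light_dir):
            # coeffs: (B, n_coeffs), light_dir: (B, 2) -> predicted intensity (B, 1)
            return self.net(torch.cat([coeffs, light_dir], dim=1))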
Calibration of a Telecentric Structured-light Device for Micrometric 3D Reconstruction
Structured-light 3D reconstruction techniques are employed in a wide range of industrial inspection applications. In particular, some tasks require micrometric precision for the identification of microscopic surface irregularities. We propose a novel calibration technique for structured-light systems adopting telecentric lenses for both camera and projector. The device exploits a fixed stripe-based light pattern to perform accurate microscopic surface reconstruction and measurement. Our method employs a sphere of known radius as calibration target and takes advantage of the orthographic projection model of the telecentric lenses to recover the bundle of planes originated by the projector. Once the sheaf of parallel planes is properly described in the camera reference frame, the triangulation of the object surface hit by the light stripes is immediate. Moreover, we tested our technique in a real-world industrial surface inspection scenario by implementing a complete pipeline to recover the intersections between the projected planes and the surface. Experimental analysis shows the robustness of the proposed approach on synthetic and real-world test data.
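An illustrative sketch of the triangulation step under the stated assumptions (not the authors' code): with a telecentric lens the camera is orthographic, so every pixel back-projects to a ray parallel to the optical axis, and the 3D point is the intersection of that ray with the calibrated projector plane lit by the stripe. The intrinsic parameters (magnification, principal point) and the plane representation n . X = d are assumptions for the example.

    import numpy as np

    def triangulate_orthographic(u, v, magnification, cx, cy, plane_n, plane_d):
        """Intersect the orthographic back-projection of pixel (u, v) with a plane.

        magnification, cx, cy: assumed telecentric intrinsics (pixels per mm, principal point)
        plane_n, plane_d: projector plane n . X = d expressed in the camera reference frame
        """
        x = (u - cx) / magnification
        y = (v - cy) / magnification
        ray_origin = np.array([x, y, 0.0])
        ray_dir = np.array([0.0, 0.0, 1.0])       # parallel rays along the optical axis
        t = (plane_d - plane_n @ ray_origin) / (plane_n @ ray_dir)
        return ray_origin + t * ray_dir           # 3D point on the object surface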
Exploring Audio Compression as Image Completion in Time-Frequency Domain
Audio compression is usually achieved with algorithms that exploit spectral properties of the given signal, such as frequency or temporal masking. In this paper we propose to tackle the problem from a different point of view, considering the time-frequency domain of an audio signal as an intensity map to be reconstructed via a data-driven approach. The compression stage removes selected input values from the time-frequency representation of the original signal. Decompression then reconstructs the missing samples as an image completion task. Our method is divided into two main parts. First, we analyse the feasibility of data-driven audio reconstruction with missing samples in the time-frequency representation. To do so, we exploit an existing CNN model designed for depth completion, involving a sequence of sparse convolutions to deal with absent values. Second, we propose a method to select the values to be removed at compression stage, maximizing the perceived audio quality of the decompressed signal. In the experimental section we validate the proposed technique on standard audio datasets and provide an extensive study of the quality of the reconstructed signal under different conditions.
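A minimal sketch of the sparse (normalized) convolution idea used for completion, assuming a magnitude spectrogram with a binary mask of kept time-frequency bins; the function and tensor names are illustrative and the actual CNN stacks several such layers with learned weights.

    import torch
    import torch.nn.functional as F

    def sparse_conv(spec, mask, kernel_size=3):
        """spec: (1, 1, F, T) spectrogram with missing bins zeroed,
        mask: (1, 1, F, T) binary map of observed bins."""
        weight = torch.ones(1, 1, kernel_size, kernel_size)
        pad = kernel_size // 2
        num = F.conv2d(spec * mask, weight, padding=pad)   # sum of observed neighbours
        den = F.conv2d(mask, weight, padding=pad)          # count of observed neighbours
        filled = num / den.clamp(min=1e-8)                 # average over valid bins only
        new_mask = (den > 0).float()                       # a bin becomes valid if any neighbour was
        return filled, new_mask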
Embedding Shepard’s Interpolation into CNN Models for Unguided Depth Completion
When acquiring sparse data samples, an interpolation method is often needed to fill in the missing information. An example application, known as "depth completion", consists in estimating dense depth maps from sparse observations (e.g. LiDAR acquisitions). To do this, algorithmic methods fill the depth image by performing a sequence of basic image processing operations, while recent approaches propose data-driven solutions, mostly based on Convolutional Neural Networks (CNNs), to predict the missing information. In this work, we combine learning-based and classical algorithmic approaches to ideally exploit the performance of the former and the generalization ability of the latter. First, we define a novel architecture block called IDWBlock. This component embeds Shepard's interpolation (or Inverse Distance Weighting, IDW) into a CNN model, with the advantage of requiring a small number of parameters regardless of the kernel size. Second, we propose two network architectures combining the IDWBlock with learning-based depth completion techniques. In the experimental section, we evaluated the models' performance on the KITTI depth completion benchmark and the NYU-depth-v2 dataset, showing that they exhibit strong robustness to input sparsity under different densities and patterns.
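A sketch of Shepard's interpolation (IDW) expressed with convolutions, in the spirit of the described block; the exact IDWBlock design in the paper may differ, and the kernel size and power are illustrative choices.

    import torch
    import torch.nn.functional as F

    def idw_kernel(size=7, power=2.0):
        """Fixed inverse-distance weights on a size x size window (centre excluded)."""
        r = torch.arange(size) - size // 2
        yy, xx = torch.meshgrid(r, r, indexing="ij")
        dist = torch.sqrt(xx.float() ** 2 + yy.float() ** 2)
        w = torch.where(dist > 0, 1.0 / dist.pow(power), torch.zeros_like(dist))
        return w.view(1, 1, size, size)

    def idw_interpolate(depth, mask, size=7, power=2.0):
        """depth: (1, 1, H, W) sparse depth (zeros where missing); mask: (1, 1, H, W) binary."""
        w = idw_kernel(size, power)
        pad = size // 2
        num = F.conv2d(depth * mask, w, padding=pad)
        den = F.conv2d(mask, w, padding=pad)
        return num / den.clamp(min=1e-8)      # Shepard-weighted average of observed neighbours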
Deep Demosaicing for Polarimetric Filter Array Cameras
Polarisation Filter Array (PFA) cameras allow the analysis of the light polarisation state in a simple and cost-effective manner. Such filter arrays work like the Bayer pattern for colour cameras, sharing similar advantages and drawbacks. Among others, the raw image must be demosaiced considering the local variations of the PFA and the characteristics of the imaged scene. Non-linear effects, like the cross-talk among neighbouring pixels, are difficult to model explicitly and suggest the potential advantage of a data-driven learning approach. However, the PFA cannot be removed from the sensor, making it difficult to acquire the ground-truth polarization state for training. In this work we propose a novel CNN-based model which directly demosaics the raw camera image to a per-pixel Stokes vector. Our contribution is twofold. First, we propose a network architecture composed of a sequence of Mosaiced Convolutions operating coherently with the local arrangement of the different filters. Second, we introduce a new method, employing a consumer LCD screen, to effectively acquire real-world data for training. The process is designed to be invariant to monitor gamma and external lighting conditions. We extensively compared our method against algorithmic and learning-based demosaicing techniques, obtaining a consistently lower error, especially in terms of polarisation angle.
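For context, an illustrative baseline (not the proposed CNN): once a PFA raw image has been naively demosaiced into the four polarizer orientations, the per-pixel linear Stokes vector follows from standard formulas; the paper's network predicts this vector end-to-end from the raw mosaic instead. Function and variable names are assumptions.

    import numpy as np

    def stokes_from_orientations(i0, i45, i90, i135):
        """Each input is an image of intensities behind a 0/45/90/135 degree polarizer."""
        s0 = 0.5 * (i0 + i45 + i90 + i135)                             # total intensity
        s1 = i0 - i90
        s2 = i45 - i135
        dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.clip(s0, 1e-8, None)    # degree of linear polarisation
        aolp = 0.5 * np.arctan2(s2, s1)                                # angle of linear polarisation
        return np.stack([s0, s1, s2], axis=-1), dolp, aolp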
A Light Source Calibration Technique for Multi-camera Inspection Devices
Industrial manufacturing processes often involve a visual control system to detect possible product defects during production. Such inspection devices usually include one or more cameras and several light sources designed to highlight surface imperfections (e.g. bumps, scratches, holes) under different illumination conditions. In such scenarios, a preliminary calibration procedure of each component is a mandatory step to recover the system's geometrical configuration and thus ensure good process accuracy. In this paper we propose a procedure to estimate the position of each light source with respect to a camera network using an inexpensive Lambertian spherical target. For each light source, the target is acquired at different positions from different cameras, and an initial guess of the corresponding light vector is recovered from the analysis of the collected intensity isocurves. Then, an energy minimization process based on the Lambertian shading model refines the result for a precise 3D localization. We tested our method in an industrial setup, performing extensive experiments on synthetic and real-world data to demonstrate the accuracy of the proposed approach.
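A minimal sketch of the Lambertian-model fit, as an assumption rather than the paper's full pipeline: given the surface normals of the lit sphere points and the observed intensities, a linear least-squares solve under I = albedo * (n . l) recovers the light vector for one target placement; the full 3D localization combines several placements and cameras in the refinement stage.

    import numpy as np

    def fit_light_direction(normals, intensities):
        """normals: (N, 3) unit normals of lit sphere points; intensities: (N,) grey values."""
        g, *_ = np.linalg.lstsq(normals, intensities, rcond=None)   # g = albedo * light vector
        albedo = np.linalg.norm(g)
        return g / albedo, albedo                                   # unit light direction, albedo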
Learning Computer Vision through the development of a Camera-trackable Game Controller
The trade-off between the available classroom time and the complexity of the proposed task is central to the design of any Computer Science laboratory lecture. Special care must be taken to build an experimental setup that allows the students to extract the most significant information from the experience without getting lost in the details. This is especially true when teaching Computer Vision concepts to prospective students who have little or no previous background in programming and strongly diversified knowledge of mathematics. In this chapter, the authors describe a setup for a laboratory lecture that has been administered for several years to prospective students of the Computer Science course at the University of Venice. The goal is to teach basic concepts such as color spaces or image transforms through a rewarding task: the development of a vision-based game controller similar in spirit to the human-machine interfaces adopted by the current generation of game consoles.
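A possible classroom baseline in the same spirit, assuming OpenCV (the colour range and overall structure are illustrative, not the lecture's actual code): threshold a coloured marker in HSV space and use its centroid as the controller position.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)
    lower, upper = np.array([100, 120, 80]), np.array([130, 255, 255])   # illustrative blue range
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)          # colour-space conversion step
        mask = cv2.inRange(hsv, lower, upper)                 # binary mask of the marker colour
        m = cv2.moments(mask)
        if m["m00"] > 0:
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])   # marker centroid
            cv2.circle(frame, (cx, cy), 8, (0, 255, 0), -1)
        cv2.imshow("controller", frame)
        if cv2.waitKey(1) == 27:                              # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()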
A Practical Setup for Projection-based Augmented Maps
Projected Augmented Reality is a human-computer interaction scenario where synthetic data, rather than being rendered on a display, are directly projected onto the real world. Differing from screen-based approaches, which only require the pose of the camera with respect to the world, this setup poses the additional hurdle of knowing the relative pose between the capturing and projecting devices. In this chapter, the authors propose a thorough solution that addresses both camera and projector calibration using a simple fiducial marker design. Specifically, they introduce a novel Augmented Maps setup where the user can explore geographically located information by moving a physical inspection tool over a printed map. Since the tool presents both a projection surface and a 3D-localizable marker, it can be used to display suitable information about the area that it covers. The proposed setup has been evaluated in terms of calibration accuracy and of the ease of use reported by the users.
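A hedged sketch of the underlying geometric chain, not the authors' code: once the marker pose is known in the camera frame (e.g. from cv2.solvePnP) and the camera-to-projector transform has been calibrated, any 3D point on the inspection tool can be mapped to a projector pixel. All parameter names are assumptions.

    import numpy as np
    import cv2

    def project_onto_tool(point_marker, rvec_cam, tvec_cam,
                          R_cam2proj, t_cam2proj, K_proj, dist_proj):
        """point_marker: (3,) point in the marker/tool frame; rvec_cam, tvec_cam: marker pose
        in the camera frame; R_cam2proj, t_cam2proj: calibrated camera-to-projector extrinsics;
        K_proj, dist_proj: projector intrinsics."""
        R_cam, _ = cv2.Rodrigues(rvec_cam)
        p_cam = R_cam @ point_marker + tvec_cam.reshape(3)      # marker frame -> camera frame
        p_proj = R_cam2proj @ p_cam + t_cam2proj.reshape(3)     # camera frame -> projector frame
        pix, _ = cv2.projectPoints(p_proj.reshape(1, 3), np.zeros(3), np.zeros(3),
                                   K_proj, dist_proj)
        return pix.reshape(2)                                   # projector pixel to draw at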
Observation of extreme sea waves in a space-time ensemble
In this paper, an observational space-time ensemble of sea surface elevations is investigated in search of the highest waves of the sea state. Wave data were gathered by means of a stereo camera system installed on top of a fixed oceanographic platform in the Adriatic Sea (Italy). Waves were measured during a mature sea state with an average wind speed of 11 m s⁻¹. By examining the space-time ensemble, the 3D wave groups have been isolated while evolving in the 2D space and captured "when and where" they were close to the apex of their development, thus exhibiting large surface displacements. The authors have selected the groups whose maximal crest height exceeds the threshold adopted to define rogue waves in a time record, that is, 1.25 times the significant wave height (Hs). The records at the spatial positions where such large crests occurred have been analyzed to derive the empirical distributions of crest and wave heights, which have been compared against standard statistical linear and nonlinear models. The maximal observed wave crests turn out to be outliers of the standard statistics, behaving as isolated members of the sample, apparently uncorrelated with the other waves of the record. However, this study has found that these unexpectedly large wave crests are better approximated by a space-time model for extreme crest heights. The space-time model performance has been improved by deriving a second-order approximation of the linear model, which provides a fair agreement with the empirical maxima. The present investigation suggests that very large waves may be more numerous than generally expected.
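A back-of-the-envelope check of how rare such crests are under standard narrow-band linear theory (the Rayleigh crest-height distribution), not the space-time model of the paper: the probability that a crest exceeds the rogue threshold of 1.25 Hs is exp(-8 * 1.25**2).

    import math

    def rayleigh_crest_exceedance(crest, hs):
        """P(C > crest) for a narrow-band linear sea state with significant wave height hs."""
        return math.exp(-8.0 * (crest / hs) ** 2)

    p = rayleigh_crest_exceedance(1.25, 1.0)
    print(f"P(C > 1.25 Hs) ~ {p:.2e}")   # about 3.7e-6, i.e. roughly one wave in 270000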
- …