Micro Fourier Transform Profilometry (μFTP): 3D shape measurement at 10,000 frames per second
Recent advances in imaging sensors and digital light projection technology
have facilitated a rapid progress in 3D optical sensing, enabling 3D surfaces
of complex-shaped objects to be captured with improved resolution and accuracy.
However, due to the large number of projection patterns required for phase
recovery and disambiguation, the maximum frame rates of current 3D shape
measurement techniques are still limited to the range of hundreds of frames per
second (fps). Here, we demonstrate a new 3D dynamic imaging technique, Micro
Fourier Transform Profilometry (μFTP), which can capture 3D surfaces of
transient events at up to 10,000 fps based on our newly developed high-speed
fringe projection system. Compared with existing techniques, μFTP has the
prominent advantage of recovering an accurate, unambiguous, and dense 3D point
cloud with only two projected patterns. Furthermore, the phase information is
encoded within a single high-frequency fringe image, thereby allowing
motion-artifact-free reconstruction of transient events with temporal
resolution of 50 microseconds. To show μFTP's broad utility, we use it to
reconstruct 3D videos of four transient scenes: vibrating cantilevers, rotating
fan blades, a bullet fired from a toy gun, and a balloon's explosion triggered
by a flying dart, all of which were previously difficult or even impossible to
capture with conventional approaches.
Temporal phase unwrapping using deep learning
The multi-frequency temporal phase unwrapping (MF-TPU) method, as a classical
phase unwrapping algorithm for fringe projection profilometry (FPP), is capable
of eliminating the phase ambiguities even in the presence of surface
discontinuities or spatially isolated objects. For the simplest and most
efficient case, two sets of 3-step phase-shifting fringe patterns are used: the
high-frequency one is for 3D measurement and the unit-frequency one is for
unwrapping the phase obtained from the high-frequency pattern set. The final
measurement precision or sensitivity is determined by the number of fringes
used within the high-frequency pattern, under the precondition that the phase
can be successfully unwrapped without triggering fringe-order errors.
Consequently, in order to guarantee a reasonable unwrapping success rate, the
fringe number (or period number) of the high-frequency fringe patterns is
generally restricted to about 16, resulting in limited measurement accuracy. On
the other hand, using additional intermediate sets of fringe patterns can
unwrap the phase with higher frequency, but at the expense of a prolonged
pattern sequence. Inspired by recent successes of deep learning techniques for
computer vision and computational imaging, in this work, we report that deep
neural networks can learn to perform TPU after appropriate training, an approach
we term deep-learning-based temporal phase unwrapping (DL-TPU), which can
substantially improve the unwrapping reliability compared with MF-TPU even in
the presence of different types of error sources, e.g., intensity noise, low
fringe modulation, and projector nonlinearity. We further experimentally
demonstrate for the first time, to our knowledge, that the high-frequency phase
obtained from 64-period 3-step phase-shifting fringe patterns can be directly
and reliably unwrapped from one unit-frequency phase using DL-TPU.
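For context, the classical two-frequency MF-TPU step that DL-TPU is compared against can be sketched as follows; the function and variable names are assumptions, and the trained network itself is of course not reproduced here.

```python
# Minimal sketch of classical two-frequency temporal phase unwrapping
# (MF-TPU), the baseline described in the abstract; names are illustrative.
import numpy as np

def mf_tpu_unwrap(phi_high, phi_unit, num_fringes):
    """Unwrap a high-frequency wrapped phase with a unit-frequency phase.

    phi_high    : wrapped phase of the high-frequency pattern, in (-pi, pi]
    phi_unit    : phase of the unit-frequency pattern (one fringe across the
                  field of view, hence ambiguity-free), same range
    num_fringes : number of fringe periods in the high-frequency pattern
                  (e.g. about 16 for conventional MF-TPU, 64 in the DL-TPU result)
    """
    # Fringe order: how many 2*pi jumps separate the scaled unit-frequency
    # phase from the wrapped high-frequency phase.
    k = np.round((num_fringes * phi_unit - phi_high) / (2 * np.pi))
    # Absolute (unwrapped) high-frequency phase.
    return phi_high + 2 * np.pi * k
```

The rounding step is where noise, low fringe modulation, or projector nonlinearity can flip the fringe order, which is the failure mode the learned unwrapping aims to suppress.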
A New Dataset, Poisson GAN and AquaNet for Underwater Object Grabbing
To boost the object grabbing capability of underwater robots for open-sea
farming, we propose a new dataset (UDD) consisting of three categories
(seacucumber, seaurchin, and scallop) with 2,227 images. To the best of our
knowledge, it is the first 4K HD dataset collected in a real open-sea farm. We
also propose a novel Poisson-blending Generative Adversarial Network (Poisson
GAN) and an efficient object detection network (AquaNet) to address two common
issues within related datasets: the class-imbalance problem and the problem of
massive numbers of small objects, respectively. Specifically, Poisson GAN combines
Poisson blending into its generator and employs a new loss called Dual Restriction
loss (DR loss), which supervises both implicit space features and image-level
features during training to generate more realistic images. By utilizing Poisson
GAN, objects of minority classes such as seacucumber or scallop can be added into
an image naturally and annotated automatically, which increases the loss
contribution of the minority classes when training detectors and thus alleviates
the class-imbalance problem. AquaNet is a high-efficiency detector that addresses
the problem of detecting masses of small objects in turbid underwater images. Within
it, we design two efficient components: a depth-wise-convolution-based
Multi-scale Contextual Features Fusion (MFF) block and a Multi-scale
Blursampling (MBP) module to reduce the parameters of the network to 1.3
million. Both components provide multi-scale features of small
objects under a short backbone configuration without any loss of accuracy. In
addition, we construct a large-scale augmented dataset (AUDD) and a
pre-training dataset via Poisson GAN from UDD. Extensive experiments show the
effectiveness of the proposed Poisson GAN, AquaNet, UDD, AUDD, and pre-training
dataset.
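To illustrate the Poisson-blending idea at the core of the generator, the following sketch pastes a cropped minority-class object into an underwater frame using OpenCV's seamless cloning; the file names, paste location, and rectangular mask are assumptions, and it does not reproduce Poisson GAN's generator or its DR loss.

```python
# Minimal sketch of Poisson-blending-based augmentation: paste a cropped
# minority-class object (e.g. a seacucumber patch) into an underwater frame.
# File names, the paste center, and the rectangular mask are assumptions.
import cv2
import numpy as np

def paste_object(scene_path, patch_path, center):
    """Blend an object patch into a scene via Poisson (seamless) cloning."""
    scene = cv2.imread(scene_path)                         # target underwater image
    patch = cv2.imread(patch_path)                         # cropped object to insert
    mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)  # blend the whole patch
    blended = cv2.seamlessClone(patch, scene, mask, center, cv2.NORMAL_CLONE)
    # The bounding box centered at `center` with the patch's size can be
    # written out as an automatic annotation for the inserted object.
    return blended

# Example with hypothetical paths: add an extra seacucumber to a frame.
# augmented = paste_object("frame_0001.jpg", "seacucumber_crop.png", (320, 240))
```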
1DFormer: A Transformer Architecture Learning 1D Landmark Representations for Facial Landmark Tracking
Recently, heatmap regression methods based on 1D landmark representations
have shown prominent performance on locating facial landmarks. However,
previous methods have not deeply explored the potential of 1D landmark
representations for sequential and structural modeling of multiple landmarks
in facial landmark tracking. To address this limitation, we propose a
Transformer architecture, namely 1DFormer, which learns informative 1D landmark
representations by capturing the dynamic and the geometric patterns of
landmarks via token communications in both temporal and spatial dimensions for
facial landmark tracking. For temporal modeling, we propose a recurrent token
mixing mechanism, an axis-landmark-positional embedding mechanism, as well as a
confidence-enhanced multi-head attention mechanism to adaptively and robustly
embed long-term landmark dynamics into their 1D representations; for structure
modeling, we design intra-group and inter-group structure modeling mechanisms
to encode the component-level as well as global-level facial structure patterns
as a refinement for the 1D representations of landmarks through token
communications in the spatial dimension via 1D convolutional layers.
Experimental results on the 300VW and the TF databases show that 1DFormer
successfully models the long-range sequential patterns as well as the inherent
facial structures to learn informative 1D representations of landmark
sequences, and achieves state-of-the-art performance on facial landmark
tracking.
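As background on the 1D landmark representation the abstract refers to, the sketch below encodes each landmark as a pair of 1D Gaussian heatmaps along the x and y axes and decodes them by soft-argmax; the sigma value, tensor layout, and function names are assumptions, and none of 1DFormer's token-mixing or structure-modeling layers are reproduced.

```python
# Minimal sketch of the 1D landmark representation idea: each landmark is
# encoded as two 1D Gaussian heatmaps (one per image axis) instead of a full
# 2D heatmap. Sigma and the tensor layout are illustrative assumptions.
import torch

def landmarks_to_1d_heatmaps(landmarks, size, sigma=2.0):
    """landmarks: (N, 2) float tensor of (x, y) pixel coordinates.
    Returns (N, 2, size): per-landmark 1D heatmaps for the x and y axes."""
    coords = torch.arange(size, dtype=torch.float32)      # pixel grid 0..size-1
    x_maps = torch.exp(-(coords[None, :] - landmarks[:, 0:1]) ** 2 / (2 * sigma ** 2))
    y_maps = torch.exp(-(coords[None, :] - landmarks[:, 1:2]) ** 2 / (2 * sigma ** 2))
    return torch.stack([x_maps, y_maps], dim=1)

def heatmaps_to_landmarks(heatmaps):
    """Decode (N, 2, size) 1D heatmaps back to (N, 2) coordinates by soft-argmax."""
    size = heatmaps.shape[-1]
    coords = torch.arange(size, dtype=torch.float32)
    weights = torch.softmax(heatmaps, dim=-1)             # normalize each axis profile
    return (weights * coords).sum(dim=-1)                 # expected position per axis
```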