Accurate and Fast Compressed Video Captioning
Existing video captioning approaches typically require first sampling video
frames from a decoded video and then conducting a subsequent process (e.g.,
feature extraction and/or captioning model learning). In this pipeline, manual
frame sampling may ignore key information in videos and thus degrade
performance. Additionally, redundant information in the sampled frames may
result in low efficiency in the inference of video captioning. To address this,
we study video captioning from a different perspective, in the compressed domain,
which brings multi-fold advantages over the existing pipeline: 1) Compared to
raw images from the decoded video, the compressed video, consisting of
I-frames, motion vectors and residuals, is highly distinguishable, which allows
us to leverage the entire video for learning without manual sampling through a
specialized model design; 2) The captioning model is more efficient in
inference as smaller and less redundant information is processed. We propose a
simple yet effective end-to-end transformer that operates in the compressed
domain and learns captioning directly from the compressed video. We
show that even with a simple design, our method can achieve state-of-the-art
performance on different benchmarks while running almost 2x faster than
existing approaches. Code is available at https://github.com/acherstyx/CoCap
Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
This work studies the generalization issue of face anti-spoofing (FAS) models
on domain gaps, such as image resolution, blurriness and sensor variations.
Most prior works regard domain-specific signals as harmful and apply
metric learning or adversarial losses to remove them from the feature
representation. Though learning a domain-invariant feature space is viable for
the training data, we show that the feature shift still exists in an unseen
test domain, which backfires on the generalizability of the classifier. In this
work, instead of constructing a domain-invariant feature space, we encourage
domain separability while aligning the live-to-spoof transition (i.e., the
trajectory from live to spoof) to be the same for all domains. We formulate
this FAS strategy of separability and alignment (SA-FAS) as a problem of
invariant risk minimization (IRM), and learn a domain-variant feature
representation but a domain-invariant classifier. We demonstrate the
effectiveness of SA-FAS on challenging cross-domain FAS datasets and establish
state-of-the-art performance. Comment: Accepted in CVPR202
Effect of lithium anti-ablation and grain refinement introduced by TiC nanoparticles in LPBF Al–Li alloy
Al–Li alloys are extensively used in aerospace owing to their low density and high specific strength. However, laser powder bed fusion (LPBF) processed Al–Li alloys still face challenges from hot cracking and Li element ablation. In this study, a TiC nanoparticle-modified Al–Mg–Li alloy is developed for the LPBF process. Fully dense printed TiC-modified Al–Mg–Li alloy can be obtained. The presence of TiC nanoparticles in the melt pool effectively increased the viscosity of the liquid Al alloy, reducing metal vaporization and liquid spatter and thus preventing Li ablation during LPBF. The Li content increased significantly, from 0.87 wt.% in the Al–Mg–Li alloy to 1.34 wt.% in the TiC-modified Al–Mg–Li alloy. Moreover, the TiC nanoparticles played a key role in the columnar-to-equiaxed grain transition. The average grain size of the TiC-modified Al–Mg–Li alloy was refined to about 1.5 μm, two orders of magnitude smaller than that of the printed Al–Mg–Li alloy. A gradient transition reaction from TiC to Al3Ti was found on the surface of the TiC nanoparticles during LPBF. The in-situ formed Al3Ti phase on the TiC nanoparticles significantly decreased the lattice mismatch with the Al matrix, resulting in outstanding mechanical properties: an ultimate tensile strength of 343 MPa and an elongation of 9.3 %. The Li anti-ablation effect induced by TiC nanoparticles provides a new pathway for the additive manufacturing of lightweight alloys.
A novel experimental method for in situ strain measurement during selective laser melting
Selective laser melting (SLM), a powder bed-based additive manufacturing technology, has been developed and applied in multiple industrial fields in the last decade. However, the distortion and swelling in the SLM process resulting from thermal stress cannot be predicted subject to measurement. In this work, an in situ distortion measurement system applied to the SLM process is presented. The distortion behaviour of a component under laser scanning can be precisely recorded in real time by this system. The detailed evolution and driving force of specimen distortion in the SLM process are discussed based on the experimental results. The distortion in a single laser scan presents a strong instantaneous upward motion of the central section during laser heating and a relatively slow downward recovery motion of the central section during cooling. The distortion behaviours of the sample with and without a layer of metal powder are compared; laser scanning on the bare sample surface leads to a significantly higher residual distortion. The influence of SLM parameters (such as scanning speed, laser power, scanning width, layer thickness and scanning times) on SLM distortion is also analysed. Finally, the stress distribution of laser melting is verified by high-resolution EBSD analysis.
Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing
While recent face anti-spoofing methods perform well under intra-domain
setups, an effective approach needs to account for much larger appearance
variations of images acquired in complex scenes with different sensors for
robust performance. In this paper, we present adaptive vision transformers
(ViT) for robust cross-domain face anti-spoofing. Specifically, we adopt ViT as
a backbone to exploit its strength to account for long-range dependencies among
pixels. We further introduce the ensemble adapters module and feature-wise
transformation layers in the ViT to adapt to different domains for robust
performance with a few samples. Experiments on several benchmark datasets show
that the proposed models achieve both robust and competitive performance
against state-of-the-art methods.