137 research outputs found
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Recovering the 3D representation of an object from single-view or multi-view
RGB images by deep neural networks has attracted increasing attention in the
past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural
networks (RNNs) to fuse multiple feature maps extracted from input images
sequentially. However, when given the same set of input images with different
orders, RNN-based approaches are unable to produce consistent reconstruction
results. Moreover, due to long-term memory loss, RNNs cannot fully exploit
input images to refine reconstruction results. To solve these problems, we
propose a novel framework for single-view and multi-view 3D reconstruction,
named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse
3D volume from each input image. Then, a context-aware fusion module is
introduced to adaptively select high-quality reconstructions for each part
(e.g., table legs) from different coarse 3D volumes to obtain a fused 3D
volume. Finally, a refiner further refines the fused 3D volume to generate the
final output. Experimental results on the ShapeNet and Pix3D benchmarks
indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a
large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2
in terms of backward inference time. Experiments on unseen ShapeNet 3D
categories show the superior generalization ability of our method.
Comment: ICCV 2019
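To make the context-aware fusion idea concrete, here is a minimal sketch of per-voxel, score-weighted fusion of coarse volumes, assuming PyTorch; the function name and the way the quality scores are obtained are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn.functional as F

def context_aware_fusion(coarse_volumes, context_scores):
    """Fuse per-view coarse volumes with per-voxel weights.

    coarse_volumes: (V, D, H, W), one coarse occupancy grid per input view.
    context_scores: (V, D, H, W), unnormalized quality scores, e.g. predicted
        by a small 3D conv net from each view's decoder features (assumed).
    """
    # Softmax over the view axis turns scores into per-voxel weights, so each
    # part of the object (e.g., a table leg) is taken mostly from whichever
    # view reconstructed it best.
    weights = F.softmax(context_scores, dim=0)
    return (weights * coarse_volumes).sum(dim=0)  # (D, H, W)

# Toy usage: 3 views of a 32^3 volume.
volumes = torch.rand(3, 32, 32, 32)
scores = torch.randn(3, 32, 32, 32)
print(context_aware_fusion(volumes, scores).shape)  # torch.Size([32, 32, 32])
```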
Towards Optimal Discrete Online Hashing with Balanced Similarity
When facing large-scale image datasets, online hashing serves as a promising
solution for online retrieval and prediction tasks. It encodes the online
streaming data into compact binary codes, and simultaneously updates the hash
functions to renew codes of the existing dataset. However, existing methods
update the hash functions solely based on the new data batch, without
investigating the correlation between the new data and the existing dataset.
In addition, existing works update the hash functions via a relaxation
process in the corresponding approximated continuous space, and it remains an
open problem to directly apply discrete optimization in online hashing. In
this paper, we propose a novel supervised online hashing method, termed
Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above
problems in a unified framework. BSODH employs a well-designed hashing
algorithm to preserve the similarity between the streaming data and the
existing dataset via an asymmetric graph regularization. We further identify
the "data-imbalance" problem brought by the constructed asymmetric graph, which
restricts the application of discrete optimization in our problem. Therefore, a
novel balanced similarity is further proposed, which uses two equilibrium
factors to balance the similar and dissimilar weights, eventually enabling
the use of discrete optimization. Extensive experiments conducted on three
widely used benchmarks demonstrate the advantages of the proposed method over
state-of-the-art methods.
Comment: 8 pages, 11 figures, conference
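As a rough illustration of the balanced similarity, the following NumPy sketch rescales the asymmetric similarity matrix between a streaming batch and the existing database with two equilibrium factors; the factor values and the exact rescaling rule are illustrative placeholders, not the paper's tuned formulation:

```python
import numpy as np

def balanced_similarity(labels_new, labels_db, eta_s=1.2, eta_d=0.2):
    """Asymmetric similarity between a streaming batch and the database,
    rescaled by two equilibrium factors (values here are hypothetical).

    labels_new: (n,) class labels of the incoming batch.
    labels_db:  (m,) class labels of the existing dataset.
    Returns an (n, m) matrix whose similar entries become eta_s and whose
    dissimilar entries become -eta_d, counteracting the dominance of
    dissimilar pairs in the asymmetric graph.
    """
    S = np.where(labels_new[:, None] == labels_db[None, :], 1.0, -1.0)
    return np.where(S > 0, eta_s * S, eta_d * S)

# Toy usage: a batch of 4 items against a database of 6.
print(balanced_similarity(np.array([0, 1, 1, 2]),
                          np.array([0, 0, 1, 2, 3, 3])))
```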
DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis
The rapid progress in deep learning has given rise to hyper-realistic facial
forgery methods, leading to concerns related to misinformation and security
risks. Existing face forgery datasets have limitations in generating
high-quality facial images and addressing the challenges posed by evolving
generative techniques. To address this, we present DiffusionFace, the first
diffusion-based face forgery dataset, covering various forgery categories,
including unconditional and text-guided facial image generation,
image-to-image translation (Img2Img), inpainting, and diffusion-based face
swapping. Our DiffusionFace dataset stands out with its extensive collection
of 11 diffusion models and the high quality of the generated images,
providing essential metadata and a real-world, internet-sourced forgery
facial image dataset for evaluation.
Additionally, we provide an in-depth analysis of the data and introduce
practical evaluation protocols to rigorously assess discriminative models'
effectiveness in detecting counterfeit facial images, aiming to enhance
security in facial image authentication processes. The dataset is available
for download at https://github.com/Rapisurazurite/DiffFace.
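As one concrete instance of such an evaluation protocol, the hedged sketch below scores a detector on a mixed real/fake split with standard scikit-learn metrics; the detector scores are simulated here, and the decision threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate_detector(scores, labels, threshold=0.5):
    """Score a forgery detector on a mixed real/fake split.

    scores: (N,) predicted probability that each image is fake.
    labels: (N,) ground truth, 1 = diffusion-generated, 0 = real.
    """
    auc = roc_auc_score(labels, scores)
    acc = accuracy_score(labels, (scores >= threshold).astype(int))
    return {"AUC": auc, "Acc": acc}

# Toy usage with simulated predictions on 100 images.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=100)
scores = np.clip(labels * 0.6 + rng.normal(0.3, 0.2, size=100), 0.0, 1.0)
print(evaluate_detector(scores, labels))
```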
Distinctive action sketch for human action recognition
Recent developments in the field of computer vision have led to a renewed
interest in sketch-related research, and considerable evidence has emerged
revealing the significance of sketches. However, there has been little
in-depth discussion of sketch-based action analysis so far. In this paper, we
propose an approach to discover the most distinctive sketches for action
recognition. The action sketches should satisfy two characteristics:
sketchability and objectiveness. Primitive sketches are prepared using
structured-forest-based fast edge detection. Meanwhile, we use Faster R-CNN
to detect the persons in parallel. On completion of these two stages,
distinctive action sketch mining is carried out. After that, we present four
kinds of sketch pooling methods to obtain a uniform representation for action
videos. The experimental results show that the proposed method achieves
impressive performance against several compared methods on two public
datasets.
The work was supported in part by the National Science Foundation of China
(61472103, 61772158, 61702136, and 61701273) and an Australian Research
Council (ARC) grant (DP150104645).
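As a rough illustration of sketch pooling, the following NumPy sketch pools per-frame sketch descriptors into a single video-level vector; the four variants shown are common pooling schemes used for illustration, not necessarily the paper's exact four methods:

```python
import numpy as np

def pool_sketch_features(frame_feats, method="max"):
    """Pool per-frame sketch descriptors into one video-level vector.

    frame_feats: (T, D), one D-dimensional descriptor per distinctive sketch.
    """
    if method == "max":
        return frame_feats.max(axis=0)
    if method == "mean":
        return frame_feats.mean(axis=0)
    if method == "sum":
        return frame_feats.sum(axis=0)
    if method == "concat-minmax":
        # Concatenate element-wise min and max, doubling the dimension.
        return np.concatenate([frame_feats.min(axis=0), frame_feats.max(axis=0)])
    raise ValueError(f"unknown pooling method: {method}")

# Toy usage: 30 frames with 128-dimensional descriptors.
feats = np.random.rand(30, 128)
print(pool_sketch_features(feats, "concat-minmax").shape)  # (256,)
```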
Towards General Visual-Linguistic Face Forgery Detection
Deepfakes are realistic face manipulations that can pose serious threats to
security, privacy, and trust. Existing methods mostly treat this task as binary
classification, which uses digital labels or mask signals to train the
detection model. We argue that such supervision lacks semantic information and
interpretability. To address this issue, in this paper we propose a novel
paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses
fine-grained sentence-level prompts as annotations. Since text annotations
are not available in current deepfakes datasets, VLFFD first generates the
mixed forgery image with corresponding fine-grained prompts via Prompt Forgery
Image Generator (PFIG). Then, the fine-grained mixed data and the
coarse-grained original data are jointly used to train the model with the
Coarse-and-Fine Co-training framework (C2F), enabling it to achieve better
generalization and interpretability. Experiments show that the proposed
method improves the
existing detection models on several challenging benchmarks. Furthermore, we
have integrated our method with multimodal large models, achieving noteworthy
results that demonstrate the potential of our approach. This integration not
only enhances the performance of our VLFFD paradigm but also underscores the
versatility and adaptability of our method when combined with advanced
multimodal technologies, highlighting its potential in tackling the evolving
challenges of deepfake detection.
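As an illustration of prompt-based supervision, here is a minimal sketch assuming PyTorch: a CLIP-style symmetric contrastive loss is one plausible way to align forged-face embeddings with their fine-grained prompt embeddings, not necessarily the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def prompt_supervision_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning forged-face embeddings with the
    embeddings of their fine-grained textual prompts (CLIP-style sketch).

    img_emb, txt_emb: (B, D) paired image and sentence embeddings.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(img_emb.size(0))       # i-th image <-> i-th prompt
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: a batch of 8 paired 512-dimensional embeddings.
loss = prompt_supervision_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```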