Level-SfM: Structure from Motion on Neural Level Set of Implicit Surfaces
This paper presents a neural incremental Structure-from-Motion (SfM)
approach, Level-SfM. In our formulation, we aim at simultaneously learning
coordinate MLPs for the implicit surfaces and the radiance fields, and
estimating the camera poses and scene geometry, which is mainly sourced from
the established keypoint correspondences by SIFT. Our formulation would face
some new challenges due to inevitable two-view and few-view configurations at
the beginning of the incremental SfM pipeline for the optimization of coordinate
MLPs, but we found that the strong inductive biases conveyed by the 2D
correspondences offer a feasible and promising way to avoid those challenges by
exploiting the relationship between the ray sampling schemes used in volumetric
rendering and the sphere tracing of finding the zero-level set of implicit
surfaces. Based on this, we revisit the pipeline of incremental SfM and renew
the key components of two-view geometry initialization, the camera pose
registration, and the 3D points triangulation, as well as the Bundle Adjustment
from the novel perspective of neural implicit surfaces. Because the coordinate MLPs
unify the scene geometry in small MLP networks, our Level-SfM treats the
zero-level set of the implicit surface as an informative top-down
regularization to manage the reconstructed 3D points, reject outlier
correspondences by querying the SDF, and adjust the estimated geometries by NBA (Neural
BA), finally yielding promising results of 3D reconstruction. Furthermore, our
Level-SfM alleviates the requirement of camera poses for neural 3D
reconstruction. Comment: under revie
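As a rough illustration of the sphere tracing mentioned in the abstract above, the sketch below marches a ray toward the zero-level set of a signed distance function. It is not the Level-SfM implementation: an analytic unit-sphere SDF stands in for the learned coordinate MLP, and every function name and parameter is hypothetical.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere (assumed stand-in for a learned SDF MLP)."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-6, t_max=100.0):
    """Step along the ray by the SDF value until the zero-level set is reached.

    Returns the distance t along the normalized ray, or None if nothing is hit.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / norm for c in direction)
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * c for o, c in zip(origin, d))
        dist = sdf(p)
        if abs(dist) < eps:      # close enough to the surface
            return t
        t += dist                # safe step: no surface is nearer than dist
        if t > t_max:            # ray left the region of interest
            return None
    return None

hit = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 2.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf)
```

The step size equals the queried distance, which is why the march cannot overshoot a surface when the function is a true signed distance.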
Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation
Contemporary domain adaptation offers a practical solution for achieving
cross-domain transfer of semantic segmentation between labeled source data and
unlabeled target data. These solutions have gained significant popularity;
however, they require the model to be retrained when the test environment
changes. This can result in unbearable costs in certain applications due to the
time-consuming training process and concerns regarding data privacy. One-shot
domain adaptation methods attempt to overcome these challenges by transferring
the pre-trained source model to the target domain using only one target sample.
Despite this, the referring style transfer module still faces issues with
computation cost and over-fitting problems. To address this problem, we propose
a novel framework called Informative Data Mining (IDM) that enables efficient
one-shot domain adaptation for semantic segmentation. Specifically, IDM
provides an uncertainty-based selection criterion to identify the most
informative samples, which facilitates quick adaptation and reduces redundant
training. We then perform a model adaptation method using these selected
samples, which includes patch-wise mixing and prototype-based information
maximization to update the model. This approach effectively enhances adaptation
and mitigates the overfitting problem. In general, we provide empirical
evidence of the effectiveness and efficiency of IDM. Our approach outperforms
existing methods and achieves a new state-of-the-art one-shot performance of
56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks, respectively.
The code will be released at https://github.com/yxiwang/IDM. Comment: Accepted by ICCV 202
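To make the idea of an uncertainty-based selection criterion more concrete, here is a minimal sketch that ranks samples by the mean entropy of their per-pixel class predictions. Entropy is only one common uncertainty measure and is an assumption here; the abstract does not state which criterion IDM uses, and all names below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def sample_uncertainty(pixel_logits):
    """Mean per-pixel entropy of one sample (assumed uncertainty score)."""
    return sum(entropy(softmax(l)) for l in pixel_logits) / len(pixel_logits)

def select_most_uncertain(samples, k):
    """Return the names of the k samples with the highest uncertainty."""
    ranked = sorted(samples, key=lambda n: sample_uncertainty(samples[n]),
                    reverse=True)
    return ranked[:k]

samples = {
    "confident": [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]],
    "ambiguous": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
}
chosen = select_most_uncertain(samples, k=1)
```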
A corpus-based discourse analysis of liberal studies textbooks in Hong Kong: legitimatizing populism
Researchers have discussed Hong Kong's localist identities, nativist sentiments, and populism, but have not widely examined the extent to which populism could be perceived in education in Hong Kong. As the chief participants of the Occupying Central and the radical Anti-Extradition Bill movements in Hong Kong were students, this suggests the need to explore the relationship between populism and education, particularly the then-controversial liberal studies textbooks. According to contemporary news reports, liberal studies textbooks contained much content stigmatising the Chinese mainland. Previous studies of liberal studies textbooks applied qualitative discourse analysis methods. In this study, mixed-method analysis was applied to a specialised corpus comprising seven commercial liberal studies textbooks containing 248,339 Chinese characters in total to explore the extent to which liberal studies textbooks contain information concerning the key features of populism, namely the heightened division between the inner and outer groups. A division was found between positive images of Hong Kong and negative images of China in the narratives of commercial liberal studies textbooks. Accordingly, the textbooks can be understood to contain populism. The present study advocates that relevant educational watchdogs in Hong Kong provide more guidance on the writing and publishing of liberal studies textbooks in the future, keeping the enquiry-based spirit of the liberal studies course fulfilled, and urges stakeholders of Hong Kong education to consider teaching peace education and developing a more inclusive environment
Volumetric Wireframe Parsing from Neural Attraction Fields
The primal sketch is a fundamental representation in Marr's vision theory,
which allows for parsimonious image-level processing from 2D to 2.5D
perception. This paper takes a further step by computing 3D primal sketch of
wireframes from a set of images with known camera poses, in which we take the
2D wireframes in multi-view images as the basis to compute 3D wireframes in a
volumetric rendering formulation. In our method, we first propose a NEural
Attraction (NEAT) Fields that parameterizes the 3D line segments with
coordinate Multi-Layer Perceptrons (MLPs), enabling us to learn the 3D line
segments from 2D observation without incurring any explicit feature
correspondences across views. We then present a novel Global Junction
Perceiving (GJP) module to perceive meaningful 3D junctions from the NEAT
Fields of 3D line segments by optimizing a randomly initialized
high-dimensional latent array and a lightweight decoding MLP. Benefitting from
our explicit modeling of 3D junctions, we finally compute the primal sketch of
3D wireframes by attracting the queried 3D line segments to the 3D junctions,
significantly simplifying the computation paradigm of 3D wireframe parsing. In
experiments, we evaluate our approach on the DTU and BlendedMVS datasets with
promising performance obtained. As far as we know, our method is the first
approach to achieve high-fidelity 3D wireframe parsing without requiring
explicit matching. Comment: Technical report; Video can be found at https://youtu.be/qtBQYbOpVp
DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation
The Class Incremental Semantic Segmentation (CISS) extends the traditional
segmentation task by incrementally learning newly added classes. Previous work
has introduced generative replay, which involves replaying old class samples
generated from a pre-trained GAN, to address the issues of catastrophic
forgetting and privacy concerns. However, the generated images lack semantic
precision and exhibit out-of-distribution characteristics, resulting in
inaccurate masks that further degrade the segmentation performance. To tackle
these challenges, we propose DiffusePast, a novel framework featuring a
diffusion-based generative replay module that generates semantically accurate
images with more reliable masks guided by different instructions (e.g., text
prompts or edge maps). Specifically, DiffusePast introduces a dual-generator
paradigm, which focuses on generating old class images that align with the
distribution of downstream datasets while preserving the structure and layout
of the original images, enabling more precise masks. To adapt to the novel
visual concepts of newly added classes continuously, we incorporate class-wise
token embedding when updating the dual-generator. Moreover, we assign adequate
pseudo-labels of old classes to the background pixels in the new step images,
further mitigating the forgetting of previously learned knowledge. Through
comprehensive experiments, our method demonstrates competitive performance
across mainstream benchmarks, striking a better balance between the performance
of old and novel classes. Comment: e.g.: 13 pages, 7 figure
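The pseudo-labeling step described in the abstract above, where background pixels of new-step images receive old-class labels, might look roughly like the following sketch. The confidence threshold, the background index 0, and all names are assumptions made for illustration; the abstract does not specify how DiffusePast decides which pseudo-labels are adequate.

```python
BACKGROUND = 0  # assumed index of the background class

def pseudo_label(new_labels, old_pred, old_conf, threshold=0.7):
    """Fill background pixels with confident old-class predictions.

    new_labels: ground-truth labels of the current step (old classes appear
                as background there).
    old_pred:   per-pixel class predicted by the previous-step model.
    old_conf:   per-pixel confidence of that prediction.
    """
    merged = []
    for lab_row, pred_row, conf_row in zip(new_labels, old_pred, old_conf):
        merged.append([
            pred
            if lab == BACKGROUND and pred != BACKGROUND and conf >= threshold
            else lab
            for lab, pred, conf in zip(lab_row, pred_row, conf_row)
        ])
    return merged

new_labels = [[0, 0, 5]]          # class 5 is newly added at this step
old_pred = [[2, 3, 2]]            # old model's guesses
old_conf = [[0.9, 0.4, 0.99]]
merged = pseudo_label(new_labels, old_pred, old_conf)
```

Pixels that already carry a new-class label are never overwritten, and low-confidence predictions leave the pixel as background.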
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
We present the content deformation field CoDeF as a new type of video
representation, which consists of a canonical content field aggregating the
static contents in the entire video and a temporal deformation field recording
the transformations from the canonical image (i.e., rendered from the canonical
content field) to each individual frame along the time axis. Given a target
video, these two fields are jointly optimized to reconstruct it through a
carefully tailored rendering pipeline. We advisedly introduce some
regularizations into the optimization process, urging the canonical content
field to inherit semantics (e.g., the object shape) from the video. With such a
design, CoDeF naturally supports lifting image algorithms for video processing,
in the sense that one can apply an image algorithm to the canonical image and
effortlessly propagate the outcomes to the entire video with the aid of the
temporal deformation field. We experimentally show that CoDeF is able to lift
image-to-image translation to video-to-video translation and lift keypoint
detection to keypoint tracking without any training. More importantly, thanks to
our lifting strategy that deploys the algorithms on only one image, we achieve
superior cross-frame consistency in processed videos compared to existing
video-to-video translation approaches, and even manage to track non-rigid
objects like water and smog. Project page can be found at
https://qiuyu96.github.io/CoDeF/. Comment: Project Webpage: https://qiuyu96.github.io/CoDeF/, Code:
https://github.com/qiuyu96/CoDe
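The lifting idea in the abstract above (run an image algorithm once on the canonical image, then propagate the result to every frame through the deformation field) can be sketched with toy data. In CoDeF both fields are learned; here the canonical image is a small grid, each deformation is a hand-made integer lookup table assumed to map frame pixels to canonical coordinates, and all names are hypothetical.

```python
def warp(canonical, deformation):
    """Render one frame by looking up each pixel's canonical coordinate."""
    height, width = len(canonical), len(canonical[0])
    frame = []
    for row in deformation:
        out_row = []
        for cy, cx in row:
            cy = min(max(cy, 0), height - 1)   # clamp to the canonical image
            cx = min(max(cx, 0), width - 1)
            out_row.append(canonical[cy][cx])
        frame.append(out_row)
    return frame

def shift_field(height, width, dx):
    """Toy deformation: every frame pixel reads dx columns to the right."""
    return [[(y, x + dx) for x in range(width)] for y in range(height)]

def lift(image_algorithm, canonical, deformations):
    """Apply an image algorithm once, then propagate it to all frames."""
    edited = image_algorithm(canonical)
    return [warp(edited, d) for d in deformations]

canonical = [[1, 2, 3],
             [4, 5, 6]]
deformations = [shift_field(2, 3, 0), shift_field(2, 3, 1)]
negate = lambda img: [[-v for v in row] for row in img]
frames = lift(negate, canonical, deformations)
```

Because the algorithm runs on a single image, every frame inherits the same edit, which is the intuition behind the cross-frame consistency claim.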
UGC: Unified GAN Compression for Efficient Image-to-Image Translation
Recent years have witnessed the prevailing progress of Generative Adversarial
Networks (GANs) in image-to-image translation. However, the success of these
GAN models hinges on ponderous computational costs and labor-expensive training
data. Current efficient GAN learning techniques often fall into two orthogonal
aspects: i) model slimming via reduced calculation costs;
ii) data/label-efficient learning with fewer training data/labels. To combine
the best of both worlds, we propose a new learning paradigm, Unified GAN
Compression (UGC), with a unified optimization objective to seamlessly prompt
the synergy of model-efficient and label-efficient learning. UGC sets up
semi-supervised-driven network architecture search and adaptive online
semi-supervised distillation stages sequentially, which formulates a
heterogeneous mutual learning scheme to obtain an architecture-flexible,
label-efficient, and performance-excellent model