Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Composed image retrieval aims to find an image that best matches a given
multi-modal user query consisting of a reference image and text pair. Existing
methods commonly pre-compute image embeddings over the entire corpus and
compare these to a reference image embedding modified by the query text at test
time. Such a pipeline is very efficient at test time since fast vector
distances can be used to evaluate candidates, but modifying the reference image
embedding guided only by a short textual description can be difficult,
especially independent of potential candidates. An alternative approach is to
allow interactions between the query and every possible candidate, i.e.,
reference-text-candidate triplets, and pick the best from the entire set.
Though this approach is more discriminative, for large-scale datasets the
computational cost is prohibitive since pre-computation of candidate embeddings
is no longer possible. We propose to combine the merits of both schemes using a
two-stage model. Our first stage adopts the conventional vector distancing
metric and performs a fast pruning among candidates. Meanwhile, our second
stage employs a dual-encoder architecture, which effectively attends to the
input triplet of reference-text-candidate and re-ranks the candidates. Both
stages utilize a vision-and-language pre-trained network, which has proven
beneficial for various downstream tasks. Our method consistently outperforms
state-of-the-art approaches on standard benchmarks for the task. Comment: 14 pages
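The two-stage idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain cosine-similarity first stage, and the `rerank_score` callable (a stand-in for the second-stage model that would score each reference-text-candidate triplet) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_retrieve(
    query_vec: Vector,
    corpus_vecs: List[Vector],
    rerank_score: Callable[[int], float],
    k: int,
) -> List[int]:
    """Return candidate indices, pruned by stage 1 and ordered by stage 2.

    rerank_score is a hypothetical placeholder for a more expensive model
    that scores a candidate jointly with the query.
    """
    # Stage 1: cheap vector comparison against pre-computed embeddings,
    # keeping only the k most similar candidates.
    by_similarity = sorted(
        range(len(corpus_vecs)),
        key=lambda i: cosine(query_vec, corpus_vecs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:k]

    # Stage 2: expensive scoring applied to the shortlist only.
    return sorted(shortlist, key=rerank_score, reverse=True)
```

The point of the design is that the expensive scorer runs on `k` candidates rather than the whole corpus, so its cost no longer grows with corpus size.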
Multi-sourced modelling for strip breakage using knowledge graph embeddings
Strip breakage is an undesired production failure in cold rolling. Conventional studies have typically focused on cause analyses, and existing data-driven approaches rely on only a single data source, which limits the information available. Hence, we propose an approach for modelling breakage using multiple data sources. Many breakage-relevant features from multiple sources are identified and integrated using a breakage-centric ontology, which is then used to create knowledge graphs. Through ontology construction and knowledge embedding, a real-world study using data from a cold-rolled strip manufacturer was conducted using the proposed approach.
Learning Effective NeRFs and SDFs Representations with 3D Generative Adversarial Networks for 3D Object Generation: Technical Report for ICCV 2023 OmniObject3D Challenge
In this technical report, we present a solution for 3D object generation in the
ICCV 2023 OmniObject3D Challenge. In recent years, 3D object generation has
made great progress and achieved promising results, but it remains a challenging
task due to the difficulty of generating complex, textured and high-fidelity
results. To resolve this problem, we study learning effective NeRFs and SDFs
representations with 3D Generative Adversarial Networks (GANs) for 3D object
generation. Specifically, inspired by recent works, we use the efficient
geometry-aware 3D GANs as the backbone, incorporating label embedding and
color mapping, which enables training the model on different taxonomies
simultaneously. Then, through a decoder, we aggregate the resulting features to
generate Neural Radiance Fields (NeRFs) based representations for rendering
high-fidelity synthetic images. Meanwhile, we optimize Signed Distance
Functions (SDFs) to effectively represent objects with 3D meshes. Besides, we
observe that this model can be effectively trained with only a few images of
each object from a variety of classes, instead of using a great number of
images per object or training one model per class. With this pipeline, we can
optimize an effective model for 3D object generation. This solution was one of
the final top-3 solutions in the ICCV 2023 OmniObject3D Challenge.
A multi-source feature-level fusion approach for predicting strip breakage in cold rolling
As an undesired and instantaneous failure in the production of cold-rolled strip products, strip breakage results in yield loss, reduced work speed and further equipment damage. Studies have typically investigated this failure retrospectively, focusing on root cause analyses, and the causes have proven to be multi-faceted. In order to model the onset of this failure in a predictive manner, an integrated multi-source feature-level approach is proposed in this work. Firstly, by harnessing heterogeneous data across the breakage-relevant processes, blocks of data from different sources are collected to improve the breadth of breakage-centric information and are pre-processed according to their granularity. Afterwards, feature extraction or selection is applied to each block of data separately according to domain knowledge. Matrices of selected features are concatenated in either a flattened or an expanded manner for comparison. Finally, the fused features are used as inputs for strip breakage prediction using recurrent neural networks (RNNs). An experimental study using real-world data demonstrates the effectiveness of the proposed approach.
An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems
The Segment Anything Model (SAM) has demonstrated exceptional performance and
versatility, making it a promising tool for various related tasks. In this
report, we explore the application of SAM in Weakly-Supervised Semantic
Segmentation (WSSS). Particularly, we adapt SAM as the pseudo-label generation
pipeline given only the image-level class labels. While we observed impressive
results in most cases, we also identify certain limitations. Our study includes
performance evaluations on PASCAL VOC and MS-COCO, where we achieved remarkable
improvements over the latest state-of-the-art methods on both datasets. We
anticipate that this report encourages further explorations of adopting SAM in
WSSS, as well as wider real-world applications. Comment: Technical report
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
Composed image retrieval searches for a target image based on a multi-modal
user query comprised of a reference image and modification text describing the
desired changes. Existing approaches to solving this challenging task learn a
mapping from the (reference image, modification text)-pair to an image
embedding that is then matched against a large image corpus. One area that has
not yet been explored is the reverse direction, which asks the question, what
reference image when modified as described by the text would produce the given
target image? In this work we propose a bi-directional training scheme that
leverages such reversed queries and can be applied to existing composed image
retrieval architectures. To encode the bi-directional query we prepend a
learnable token to the modification text that designates the direction of the
query and then finetune the parameters of the text embedding module. We make no
other changes to the network architecture. Experiments on two standard datasets
show that our novel approach achieves improved performance over a baseline
BLIP-based model that itself already achieves state-of-the-art performance. Comment: 12 pages, 5 figures
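The construction of forward and reversed queries described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: in the paper the direction marker is a learnable token in the text embedding module, whereas here it is represented by fixed placeholder strings, and the names `Query`, `FORWARD_TOKEN`, `BACKWARD_TOKEN`, and `make_bidirectional_queries` are hypothetical.

```python
from typing import List, NamedTuple


class Query(NamedTuple):
    """One training query: an input image, a text prompt, and the image to retrieve."""
    image_id: str
    text: str
    target_id: str


# Placeholder strings standing in for the learnable direction tokens (assumption).
FORWARD_TOKEN = "[FWD]"
BACKWARD_TOKEN = "[BWD]"


def make_bidirectional_queries(
    reference_id: str, modification_text: str, target_id: str
) -> List[Query]:
    """Build the forward query and its reversed counterpart from one triplet."""
    # Forward: reference image + text should retrieve the target image.
    forward = Query(reference_id, f"{FORWARD_TOKEN} {modification_text}", target_id)
    # Reversed: target image + the same text should retrieve the reference image.
    backward = Query(target_id, f"{BACKWARD_TOKEN} {modification_text}", reference_id)
    return [forward, backward]
```

Each annotated triplet thus yields two training queries with the same modification text, distinguished only by the direction marker prepended to it.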
Spin Excitation in Coupled Honeycomb Lattice NiInSbO
We performed an inelastic neutron scattering experiment on a polycrystalline
sample of a helimagnet NiInSbO to construct the spin Hamiltonian.
Well-defined spin-wave excitation with a band energy of 20 meV was observed
below K. Using the linear spin-wave theory, the spectrum was
reasonably reproduced with honeycomb spin layers coupled along the stacking
axis (the axis). The proposed spin model reproduces the soliton lattice
induced by a magnetic field applied perpendicular to the axis. Comment: 8 pages, 5 figures
Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning
Machine Unlearning (MU) algorithms have become increasingly critical due to
the need to comply with data privacy regulations. The primary objective of
MU is to erase the influence of specific data samples on a given model without
the need to retrain it from scratch. Accordingly, existing methods focus on
maximizing user privacy protection. However, there are different degrees of
privacy regulations for each real-world web-based application. Exploring the
full spectrum of trade-offs between privacy, model utility, and runtime
efficiency is critical for practical unlearning scenarios. Furthermore,
designing an MU algorithm with simple control of the aforementioned trade-offs
is desirable but challenging due to their inherently complex interaction. To
address the challenges, we present Controllable Machine Unlearning (ConMU), a
novel framework designed to facilitate the calibration of MU. The ConMU
framework contains three integral modules: an important data selection module
that reconciles the runtime efficiency and model generalization, a progressive
Gaussian mechanism module that balances privacy and model generalization, and
an unlearning proxy that controls the trade-offs between privacy and runtime
efficiency. Comprehensive experiments on various benchmark datasets have
demonstrated the robust adaptability of our control mechanism and its
superiority over established unlearning methods. ConMU explores the full
spectrum of the Privacy-Utility-Efficiency trade-off and allows practitioners
to account for different real-world regulations. Source code available at:
https://github.com/guangyaodou/ConMU