Efficient Approximation Algorithms for Adaptive Seed Minimization
As a dual problem of influence maximization, the seed minimization problem
asks for the minimum number of seed nodes to influence a required number
of users in a given social network. Existing algorithms for seed
minimization mostly consider the non-adaptive setting, where all seed nodes are
selected in one batch without observing how they may influence other users. In
this paper, we study seed minimization in the adaptive setting, where the seed
nodes are selected in several batches, such that the choice of a batch may
exploit information about the actual influence of the previous batches. We
propose a novel algorithm, ASTI, which addresses the adaptive seed minimization
problem in $O\big(\frac{Q(m+n)\log^2 n}{\varepsilon^2}\big)$ expected time and offers an approximation
guarantee of $\frac{(\ln Q+1)^2}{(1-(1-1/b)^b)(1-1/e)(1-\varepsilon)}$ in expectation, where $Q$ is the
targeted number of influenced nodes, $b$ is the size of each seed node batch, $m$ and $n$
are the numbers of edges and nodes in the network, and $\varepsilon \in (0,1)$ is a
user-specified parameter. To the best of our
knowledge, ASTI is the first algorithm that provides such an approximation
guarantee without incurring prohibitive computation overhead. With extensive
experiments on a variety of datasets, we demonstrate the effectiveness and
efficiency of ASTI over competing methods.
Comment: A short version of the paper appeared in the 2019 International
Conference on Management of Data (SIGMOD '19), June 30--July 5, 2019,
Amsterdam, Netherlands. ACM, New York, NY, USA, 18 pages.
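The adaptive setting described above reduces to a feedback loop: select a batch, observe its realized influence, and repeat until the target is met. Below is a minimal Python sketch of that loop; `naive_batch` and `toy_spread` are hypothetical placeholders standing in for ASTI's actual batch selector and the observed diffusion process, which are far more sophisticated than this.

```python
def adaptive_seed_minimization(graph, Q, b, select_batch, observe_spread):
    """Generic adaptive seeding loop: choose seeds in batches of size b,
    observe the realized influence of each batch, and stop once at least
    Q nodes are influenced. The two callbacks stand in for ASTI's
    internals, which this sketch does not implement."""
    influenced, seeds = set(), []
    while len(influenced) < Q:
        batch = select_batch(graph, influenced, b)
        if not batch:          # no candidates left; give up
            break
        seeds.extend(batch)
        # observing the *actual* influence before choosing the next batch
        # is exactly what distinguishes the adaptive setting
        influenced |= observe_spread(graph, batch, influenced)
    return seeds

def naive_batch(graph, influenced, b):
    # hypothetical baseline selector: b highest-degree uninfluenced nodes
    candidates = [v for v in graph if v not in influenced]
    return sorted(candidates, key=lambda v: len(graph[v]), reverse=True)[:b]

def toy_spread(graph, batch, influenced):
    # toy observation model: a seed influences itself and its immediate
    # neighbors (real influence diffusion is stochastic)
    reached = set(batch)
    for v in batch:
        reached.update(graph[v])
    return reached - influenced
```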
Attending Category Disentangled Global Context for Image Classification
In this paper, we propose a general framework for image classification using
the attention mechanism and global context, which can be incorporated into
various network architectures to improve their performance. To investigate the
capability of the global context, we compare four mathematical models and
observe that the global context encoded by the category-disentangled conditional
generative model gives more guidance, following the intuition that "knowing what
is task-irrelevant also reveals what is relevant". Based on this observation, we define a novel
Category Disentangled Global Context (CDGC) and devise a deep network to obtain
it. By attending CDGC, the baseline networks could identify the objects of
interest more accurately, thus improving the performance. We apply the
framework to many different network architectures and compare with the
state-of-the-art on four publicly available datasets. Extensive results
validate the effectiveness and superiority of our approach. Code will be made
public upon paper acceptance.
Comment: Under review.
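The core mechanism, attending a precomputed global-context vector over a backbone's feature map, can be sketched briefly. The snippet below is a minimal illustration assuming simple channel-wise gating; the CDGC vector itself (produced by the category-disentangled generative model) is taken as given, and `ctx_dim` and the gating design are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttendGlobalContext(nn.Module):
    """Sketch: re-weight backbone feature channels using attention
    weights derived from a global-context vector."""
    def __init__(self, feat_channels: int, ctx_dim: int):
        super().__init__()
        # project the context vector to per-channel gates in (0, 1)
        self.to_gate = nn.Sequential(
            nn.Linear(ctx_dim, feat_channels),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features; ctx: (B, ctx_dim)
        gate = self.to_gate(ctx)[:, :, None, None]   # (B, C, 1, 1)
        # emphasize task-relevant channels, suppress irrelevant ones
        return feats * gate
```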
Computational design of steady 3D dissection puzzles
Dissection puzzles require assembling a common set of pieces into multiple distinct forms. Existing works focus on creating 2D dissection puzzles that form primitive or naturalistic shapes. Unlike 2D dissection puzzles, which can be supported on a tabletop surface, 3D dissection puzzles should preferably be steady on their own in each assembly form. In this work, we aim at computationally designing steady 3D dissection puzzles. We address this challenging problem with three key contributions. First, we take two voxelized shapes as inputs and dissect them into a common set of puzzle pieces, during which we allow slight modifications to the input shapes, preferably in their internal volume, to preserve the external appearance. Second, we formulate a formal model of generalized interlocking for connecting pieces into a steady assembly using both their geometric arrangements and friction. Third, we modify the geometry of each dissected puzzle piece based on the formal model such that each assembly form is steady accordingly. We demonstrate the effectiveness of our approach on a wide variety of shapes, compare it with the state-of-the-art on 2D and 3D examples, and fabricate some of our designed puzzles to validate their steadiness.
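The steadiness requirement suggests a static-equilibrium feasibility test: an assembly form is steady if contact forces within the friction cones can balance gravity. The following is a toy 2D sketch of such a test as a linear program (linearized friction cone, single rigid piece); the geometry, the friction coefficient, and the single-piece simplification are all illustrative and not the paper's generalized-interlocking formulation.

```python
import numpy as np
from scipy.optimize import linprog

def is_steady_2d(contacts, mass, mu=0.5, g=9.81):
    """Feasibility test: do contact forces exist, each inside its
    friction cone, that balance gravity on one rigid piece?
    contacts = list of (point, inward_normal), both 2D and expressed
    relative to the piece's center of mass."""
    k = len(contacts)
    # variables: [n_0..n_{k-1}, f_0..f_{k-1}] = normal magnitudes
    # (>= 0) followed by signed tangential magnitudes
    fx, fy, tq = np.zeros(2 * k), np.zeros(2 * k), np.zeros(2 * k)
    for i, (p, n) in enumerate(contacts):
        n = np.asarray(n, float) / np.linalg.norm(n)
        t = np.array([-n[1], n[0]])               # tangent direction
        p = np.asarray(p, float)
        fx[i], fx[k + i] = n[0], t[0]
        fy[i], fy[k + i] = n[1], t[1]
        tq[i] = p[0] * n[1] - p[1] * n[0]         # z-torque of normal force
        tq[k + i] = p[0] * t[1] - p[1] * t[0]     # z-torque of friction force
    # contact forces must sum to (0, +m*g) to cancel gravity, with zero
    # net torque about the center of mass
    A_eq, b_eq = [fx, fy, tq], [0.0, mass * g, 0.0]
    # linearized friction cone: |f_i| <= mu * n_i
    A_ub, b_ub = [], []
    for i in range(k):
        for sign in (1.0, -1.0):
            row = np.zeros(2 * k)
            row[k + i], row[i] = sign, -mu
            A_ub.append(row)
            b_ub.append(0.0)
    bounds = [(0, None)] * k + [(None, None)] * k
    res = linprog(np.zeros(2 * k), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.status == 0     # feasible => steady under this model

# a block resting on two floor contacts (normals point up into the piece)
contacts = [((-1.0, -1.0), (0.0, 1.0)), ((1.0, -1.0), (0.0, 1.0))]
print(is_steady_2d(contacts, mass=1.0))  # True: gravity can be balanced
```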
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search
Cross-Modal sponsored search displays multi-modal advertisements (ads) when
consumers look for desired products by natural language queries in search
engines. Since multi-modal ads bring complementary details for query-ads
matching, the ability to align ads-specific information in both images and
texts is crucial for accurate and flexible sponsored search. Conventional
research mainly models the implicit correlations between images and texts for
query-ads matching, ignoring the alignment of detailed product information and
resulting in suboptimal search performance. In
this work, we propose a simple alignment network for explicitly mapping
fine-grained visual parts in ads images to the corresponding text, which
leverages the co-occurrence structure consistency between vision and language
spaces without requiring expensive labeled training data. Moreover, we propose
a novel model for cross-modal sponsored search that effectively conducts the
cross-modal alignment and query-ads matching in two separate processes. In this
way, the model matches the multi-modal input in the same language space,
achieving superior performance with merely half of the training data. Our
model outperforms the state-of-the-art models by 2.57% on a large commercial
dataset. Besides sponsored search, our alignment method is applicable to
general cross-modal search. On a typical cross-modal retrieval task on the
MSCOCO dataset, it achieves consistent performance improvements, demonstrating
the generalization ability of our method. Our code is available at
https://github.com/Pter61/AlignCMSS
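The label-free alignment idea, exploiting co-occurrence structure consistency between the vision and language spaces, can be sketched as matching pairwise-similarity structures across modalities. The snippet below is a minimal illustration under that assumption; the linear projection and the loss are hypothetical, not the paper's network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAligner(nn.Module):
    """Sketch: learn a linear map from visual-part features into the
    text embedding space without paired labels, by encouraging the
    pairwise-similarity ("co-occurrence") structure of the mapped
    visual features to match that of the text features."""
    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, txt_dim, bias=False)

    def forward(self, vis: torch.Tensor) -> torch.Tensor:
        # vis: (N, vis_dim) part features -> (N, txt_dim), unit norm
        return F.normalize(self.proj(vis), dim=-1)

def structure_consistency_loss(vis_mapped, txt):
    # pairwise cosine-similarity matrices within each modality
    txt = F.normalize(txt, dim=-1)
    s_vis = vis_mapped @ vis_mapped.T    # (N, N)
    s_txt = txt @ txt.T                  # (N, N)
    # align the two similarity structures
    return F.mse_loss(s_vis, s_txt)
```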
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval
Different from the Composed Image Retrieval task, which requires expensive labels
for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR)
involves diverse tasks with a broad range of visual content manipulation intents
that can relate to domain, scene, object, and attribute. The key
challenge for ZS-CIR tasks is to learn a more accurate image representation
that has adaptive attention to the reference image for various manipulation
descriptions. In this paper, we propose a novel context-dependent mapping
network, named Context-I2W, for adaptively converting description-relevant
image information into a pseudo-word token that is composed with the
description for accurate ZS-CIR. Specifically, an Intent View Selector first
dynamically learns a rotation rule to map the same image to a task-specific manipulation
view. Then a Visual Target Extractor further captures local information
covering the main targets in ZS-CIR tasks under the guidance of multiple
learnable queries. The two complementary modules work together to map an image
to a context-dependent pseudo-word token without extra supervision. Our model
shows strong generalization ability on four ZS-CIR tasks, including domain
conversion, object composition, object manipulation, and attribute
manipulation. It obtains consistent and significant performance boosts ranging
from 1.88% to 3.60% over the best methods and achieves new state-of-the-art
results on ZS-CIR. Our code is available at
https://github.com/Pter61/context_i2w
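The pseudo-word-token step can be sketched compactly: an image embedding is mapped into the text-embedding space so it can be spliced into the manipulation description (e.g., "a photo of [*] with a red roof"), and the composed text is used for retrieval. The mapper below is a generic illustration of that step only; it omits Context-I2W's Intent View Selector and learnable-query extractor, and the dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class PseudoWordMapper(nn.Module):
    """Sketch: map a (frozen) image embedding to a single token in the
    text-embedding space, so it can replace a placeholder token in the
    tokenized manipulation description."""
    def __init__(self, img_dim: int, token_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim, token_dim),
            nn.GELU(),
            nn.Linear(token_dim, token_dim),
        )

    def forward(self, img_emb: torch.Tensor) -> torch.Tensor:
        # img_emb: (B, img_dim) -> pseudo-word token: (B, token_dim)
        return self.mlp(img_emb)
```

The composed description embedding is then compared against candidate image embeddings exactly as in ordinary text-to-image retrieval.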
Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service
Recent advances in vision-language pre-trained models (VLPs) have
significantly increased visual understanding and cross-modal analysis
capabilities. Companies have emerged to provide multi-modal Embedding as a
Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which require large
amounts of training data and resources to deliver high-performance service.
However, existing studies indicate that EaaS is vulnerable to model extraction
attacks that can cause great losses for the owners of VLPs. Protecting the intellectual property
and commercial ownership of VLPs is increasingly crucial yet challenging. A
major watermarking solution for EaaS implants a backdoor in the model by
inserting verifiable trigger embeddings into texts, but it is only applicable
to large language models and is unrealistic due to data and model privacy
concerns. In this paper, we propose a safe and robust backdoor-based embedding
watermarking method for VLPs, called VLPMarker. VLPMarker utilizes an embedding
orthogonal transformation to effectively inject triggers into the VLPs without
interfering with the model parameters, which achieves high-quality copyright
verification and minimal impact on model performance. To strengthen the
watermark's robustness, we further propose a collaborative copyright
verification strategy based on both backdoor triggers and the embedding
distribution, enhancing resilience against various attacks. We increase the
watermark's practicality via an out-of-distribution trigger selection approach,
which removes the need for access to the model training data and thus makes the
method applicable in many real-world scenarios. Our
extensive experiments on various datasets indicate that the proposed
watermarking approach is effective and safe for verifying the copyright of VLPs
for multi-modal EaaS and robust against model extraction attacks. Our code is
available at https://github.com/Pter61/vlpmarker
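The key property that makes an orthogonal transformation attractive here is that it preserves inner products, so similarity-based downstream use of the served embeddings is unaffected while the owner retains a secret linear fingerprint. The sketch below illustrates that idea only; VLPMarker's trigger selection, backdoor injection, and collaborative verification are not reproduced, and `encode`, `suspect_embed`, and the tolerance are hypothetical.

```python
import numpy as np

def make_orthogonal(dim: int, seed: int = 0) -> np.ndarray:
    """Secret orthogonal matrix via QR decomposition of a Gaussian
    matrix; W.T @ W == I, so (W @ a) . (W @ b) == a . b and the
    similarity structure of served embeddings is unchanged."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def serve_embedding(encode, x, W):
    # the EaaS provider returns W @ e instead of the raw embedding e
    return W @ encode(x)

def verify_watermark(suspect_embed, encode, triggers, W, tol=1e-3):
    """Owner-side check on out-of-distribution trigger inputs: if a
    suspect service's embeddings match W @ encode(x), the suspect was
    likely extracted from the watermarked service (a toy version of
    the verification idea)."""
    errs = [np.linalg.norm(suspect_embed(x) - W @ encode(x))
            for x in triggers]
    return float(np.mean(errs)) < tol
```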