
    Efficient Approximation Algorithms for Adaptive Seed Minimization

    As a dual problem of influence maximization, the seed minimization problem asks for the minimum number of seed nodes needed to influence a required number $\eta$ of users in a given social network $G$. Existing algorithms for seed minimization mostly consider the non-adaptive setting, where all seed nodes are selected in one batch without observing how they may influence other users. In this paper, we study seed minimization in the adaptive setting, where the seed nodes are selected in several batches, such that the choice of a batch may exploit information about the actual influence of the previous batches. We propose a novel algorithm, ASTI, which addresses the adaptive seed minimization problem in $O\big(\frac{\eta \cdot (m+n)}{\varepsilon^2}\ln n\big)$ expected time and offers an approximation guarantee of $\frac{(\ln \eta+1)^2}{(1-(1-1/b)^b)(1-1/e)(1-\varepsilon)}$ in expectation, where $\eta$ is the targeted number of influenced nodes, $b$ is the size of each seed node batch, and $\varepsilon \in (0,1)$ is a user-specified parameter. To the best of our knowledge, ASTI is the first algorithm that provides such an approximation guarantee without incurring prohibitive computation overhead. With extensive experiments on a variety of datasets, we demonstrate the effectiveness and efficiency of ASTI over competing methods. Comment: A short version of the paper appeared in the 2019 International Conference on Management of Data (SIGMOD '19), June 30--July 5, 2019, Amsterdam, Netherlands. ACM, New York, NY, USA, 18 pages.
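
    The adaptive batch-selection loop described above can be illustrated with a minimal greedy sketch. This is not ASTI's sampling-based estimator; the graph format and the estimate_marginal_gain / observe_spread helpers are hypothetical placeholders for influence estimation and observation.

```python
def adaptive_seed_minimization(graph, eta, b, estimate_marginal_gain, observe_spread):
    """Greedy adaptive batch selection (illustrative sketch, not the ASTI estimator).

    graph                  -- dict {node: [neighbors]} (hypothetical format)
    eta                    -- required number of influenced nodes
    b                      -- size of each seed batch
    estimate_marginal_gain -- callable(node, influenced) -> expected extra influence
    observe_spread         -- callable(batch) -> set of nodes actually influenced
    """
    seeds, influenced = [], set()
    while len(influenced) < eta:
        candidates = [v for v in graph if v not in seeds]
        if not candidates:
            break  # no more nodes to seed
        # Choose the next batch greedily by estimated marginal gain,
        # conditioning on the influence observed so far.
        batch = sorted(candidates,
                       key=lambda v: estimate_marginal_gain(v, influenced),
                       reverse=True)[:b]
        seeds.extend(batch)
        # Observe the realized spread of this batch before selecting the next one.
        influenced |= observe_spread(batch)
    return seeds
```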

    Attending Category Disentangled Global Context for Image Classification

    In this paper, we propose a general framework for image classification using the attention mechanism and global context, which can be incorporated into various network architectures to improve their performance. To investigate the capability of the global context, we compare four mathematical models and observe that the global context encoded in the category-disentangled conditional generative model gives more guidance, in the spirit of "knowing what is task-irrelevant also reveals what is relevant". Based on this observation, we define a novel Category Disentangled Global Context (CDGC) and devise a deep network to obtain it. By attending to CDGC, the baseline networks can identify the objects of interest more accurately, thus improving the performance. We apply the framework to many different network architectures and compare with the state of the art on four publicly available datasets. Extensive results validate the effectiveness and superiority of our approach. Code will be made public upon paper acceptance. Comment: Under review.
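
    One way a baseline network could attend to a global context vector is a simple channel-wise gate, sketched below. This is a generic illustration of attending to a global context, not the exact CDGC module; the dimensions and the sigmoid gating form are assumptions.

```python
import torch
import torch.nn as nn

class GlobalContextAttention(nn.Module):
    """Channel-wise gate that attends a feature map to a global context vector (sketch)."""

    def __init__(self, feat_channels: int, context_dim: int):
        super().__init__()
        # Produce one attention weight per feature channel from pooled features + context.
        self.gate = nn.Sequential(
            nn.Linear(feat_channels + context_dim, feat_channels),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # features: (N, C, H, W) backbone feature map; context: (N, D) global context vector
        pooled = features.mean(dim=(2, 3))                      # (N, C)
        attn = self.gate(torch.cat([pooled, context], dim=1))   # (N, C)
        return features * attn.unsqueeze(-1).unsqueeze(-1)      # reweight channels
```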

    Computational design of steady 3D dissection puzzles

    Dissection puzzles require assembling a common set of pieces into multiple distinct forms. Existing works focus on creating 2D dissection puzzles that form primitive or naturalistic shapes. Unlike 2D dissection puzzles, which can be supported on a tabletop surface, 3D dissection puzzles should preferably be steady by themselves in each assembly form. In this work, we aim to computationally design steady 3D dissection puzzles. We address this challenging problem with three key contributions. First, we take two voxelized shapes as inputs and dissect them into a common set of puzzle pieces, during which we allow slight modifications to the input shapes, preferably within their internal volume, to preserve their external appearance. Second, we formulate a formal model of generalized interlocking for connecting pieces into a steady assembly using both their geometric arrangements and friction. Third, we modify the geometry of each dissected puzzle piece based on the formal model such that each assembly form is steady. We demonstrate the effectiveness of our approach on a wide variety of shapes, compare it with the state of the art on 2D and 3D examples, and fabricate some of our designed puzzles to validate their steadiness.
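
    The first step (dissecting two voxelized shapes into one common piece set) implies a basic consistency check: the same pieces, suitably placed, must tile each target shape exactly. The sketch below illustrates that check only; the data layout (voxels as integer coordinates, placements as translations) is an assumption, and rotations, interlocking, friction, and steadiness are not modeled.

```python
def tiles_exactly(shape_voxels, pieces, placements):
    """Check that translated pieces cover a voxelized shape exactly (illustrative sketch).

    shape_voxels -- set of (x, y, z) voxels of one target shape
    pieces       -- list of sets of (x, y, z) voxels, one per puzzle piece
    placements   -- list of (dx, dy, dz) translations, one per piece (hypothetical layout)
    """
    covered = set()
    for piece, (dx, dy, dz) in zip(pieces, placements):
        placed = {(x + dx, y + dy, z + dz) for (x, y, z) in piece}
        if covered & placed:           # pieces must not overlap
            return False
        covered |= placed
    return covered == shape_voxels     # no gaps and nothing outside the shape
```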

    Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

    Cross-modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products via natural language queries in search engines. Since multi-modal ads bring complementary details for query-ads matching, the ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search. Conventional research mainly models the implicit correlations between images and texts for query-ads matching, ignoring the alignment of detailed product information and resulting in suboptimal search performance. In this work, we propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text, which leverages the co-occurrence structure consistency between vision and language spaces without requiring expensive labeled training data. Moreover, we propose a novel model for cross-modal sponsored search that effectively conducts the cross-modal alignment and query-ads matching in two separate processes. In this way, the model matches the multi-modal input in the same language space, resulting in superior performance with merely half of the training data. Our model outperforms the state-of-the-art models by 2.57% on a large commercial dataset. Beyond sponsored search, our alignment method is applicable to general cross-modal search. We study a typical cross-modal retrieval task on the MSCOCO dataset, achieving consistent performance improvement and demonstrating the generalization ability of our method. Our code is available at https://github.com/Pter61/AlignCMSS
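
    The two-stage idea (align ad images into the text space first, then match query and ads within that single space) can be sketched as follows. The region/token features, the projection module, and the cosine-similarity matching are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def align_regions_to_text(region_feats, token_feats, mapper):
    """Map visual region features into the text space and score region-word alignment.

    region_feats -- (R, Dv) features of fine-grained visual parts in the ad image
    token_feats  -- (T, Dt) features of words in the ad text
    mapper       -- module projecting Dv -> Dt (hypothetical alignment network)
    """
    mapped = F.normalize(mapper(region_feats), dim=-1)   # (R, Dt)
    text = F.normalize(token_feats, dim=-1)              # (T, Dt)
    return mapped @ text.T                               # (R, T) region-to-word similarities

def match_query_to_ad(query_feat, ad_text_feat):
    # After alignment, query-ads matching reduces to a similarity in the language space.
    return F.cosine_similarity(query_feat, ad_text_feat, dim=-1)
```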

    Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

    Different from the Composed Image Retrieval task, which requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intents that can relate to domain, scene, object, and attribute. The key challenge for ZS-CIR tasks is to learn a more accurate image representation that attends adaptively to the reference image for various manipulation descriptions. In this paper, we propose a novel context-dependent mapping network, named Context-I2W, for adaptively converting description-relevant image information into a pseudo-word token that is composed with the description for accurate ZS-CIR. Specifically, an Intent View Selector first dynamically learns a rotation rule that maps the identical image to a task-specific manipulation view. Then a Visual Target Extractor further captures local information covering the main targets in ZS-CIR tasks under the guidance of multiple learnable queries. The two complementary modules work together to map an image to a context-dependent pseudo-word token without extra supervision. Our model shows strong generalization ability on four ZS-CIR tasks, including domain conversion, object composition, object manipulation, and attribute manipulation. It obtains consistent and significant performance boosts, ranging from 1.88% to 3.60% over the best methods, and achieves new state-of-the-art results on ZS-CIR. Our code is available at https://github.com/Pter61/context_i2w
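
    A hedged sketch of the mapping idea: image patch features are summarized into a single pseudo-word embedding via learnable queries and cross-attention, and that embedding is then composed with the manipulation description for text-to-image retrieval. The dimensions, the use of nn.MultiheadAttention, and the module below are illustrative assumptions, not the paper's Intent View Selector / Visual Target Extractor.

```python
import torch
import torch.nn as nn

class ImageToPseudoWord(nn.Module):
    """Map image patch features to one pseudo-word token embedding (illustrative sketch)."""

    def __init__(self, dim: int, num_queries: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))   # learnable queries
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.proj = nn.Linear(num_queries * dim, dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (N, P, D) patch features from a frozen image encoder
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)  # (N, Q, D)
        local, _ = self.attn(q, patch_feats, patch_feats)                  # (N, Q, D)
        return self.proj(local.flatten(1))                                 # (N, D) pseudo-word
```

    In a typical zero-shot retrieval setup, the resulting embedding would stand in for a placeholder token in the description (e.g., "a photo of * on the beach") before the composed text is encoded for retrieval.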

    Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

    Recent advances in vision-language pre-trained models (VLPs) have significantly improved visual understanding and cross-modal analysis capabilities. Companies have emerged to provide multi-modal Embedding as a Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which requires a large amount of training data and resources to deliver high-performance service. However, existing studies indicate that EaaS is vulnerable to model extraction attacks that cause great losses to the owners of VLPs. Protecting the intellectual property and commercial ownership of VLPs is increasingly crucial yet challenging. A major watermarking solution for EaaS implants a backdoor into the model by inserting verifiable trigger embeddings into texts, but it is only applicable to large language models and is unrealistic due to data and model privacy constraints. In this paper, we propose a safe and robust backdoor-based embedding watermarking method for VLPs, called VLPMarker. VLPMarker utilizes an orthogonal transformation of the embeddings to effectively inject triggers into the VLPs without interfering with the model parameters, which achieves high-quality copyright verification and minimal impact on model performance. To enhance the watermark robustness, we further propose a collaborative copyright verification strategy based on both the backdoor trigger and the embedding distribution, enhancing resilience against various attacks. We increase the watermark's practicality via an out-of-distribution trigger selection approach, which removes the need for access to the model training data and thus makes the method applicable to many real-world scenarios. Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and robust against model extraction attacks. Our code is available at https://github.com/Pter61/vlpmarker
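
    The core trick, injecting the watermark through an orthogonal transformation of the served embeddings rather than by modifying model parameters, can be sketched as below. The QR-based matrix construction and the threshold-based verification are illustrative assumptions; an orthogonal map preserves inner products and distances, which is why downstream utility is largely unaffected.

```python
import numpy as np

def make_orthogonal(dim, seed=0):
    """Secret orthogonal matrix used to transform served embeddings (sketch)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q  # q @ q.T == I, so similarities and distances are preserved

def serve_embedding(raw_embedding, watermark):
    # The EaaS provider returns transformed embeddings; model weights stay untouched.
    return raw_embedding @ watermark

def verify_copyright(suspect_embed, raw_embed, trigger_inputs, watermark, tol=1e-2):
    """Check whether a suspect service reproduces the watermarked trigger embeddings."""
    for x in trigger_inputs:
        expected = raw_embed(x) @ watermark
        if np.linalg.norm(suspect_embed(x) - expected) > tol:
            return False
    return True
```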