22 research outputs found

    Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

    Full text link
    Cross-Modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products by natural language queries in search engines. Since multi-modal ads bring complementary details for query-ads matching, the ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search. Conventional research mainly studies from the view of modeling the implicit correlations between images and texts for query-ads matching, ignoring the alignment of detailed product information and resulting in suboptimal search performance.In this work, we propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text, which leverages the co-occurrence structure consistency between vision and language spaces without requiring expensive labeled training data. Moreover, we propose a novel model for cross-modal sponsored search that effectively conducts the cross-modal alignment and query-ads matching in two separate processes. In this way, the model matches the multi-modal input in the same language space, resulting in a superior performance with merely half of the training data. Our model outperforms the state-of-the-art models by 2.57% on a large commercial dataset. Besides sponsored search, our alignment method is applicable for general cross-modal search. We study a typical cross-modal retrieval task on the MSCOCO dataset, which achieves consistent performance improvement and proves the generalization ability of our method. Our code is available at https://github.com/Pter61/AlignCMSS

    Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

    Full text link
    Different from Composed Image Retrieval task that requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute. The key challenge for ZS-CIR tasks is to learn a more accurate image representation that has adaptive attention to the reference image for various manipulation descriptions. In this paper, we propose a novel context-dependent mapping network, named Context-I2W, for adaptively converting description-relevant Image information into a pseudo-word token composed of the description for accurate ZS-CIR. Specifically, an Intent View Selector first dynamically learns a rotation rule to map the identical image to a task-specific manipulation view. Then a Visual Target Extractor further captures local information covering the main targets in ZS-CIR tasks under the guidance of multiple learnable queries. The two complementary modules work together to map an image to a context-dependent pseudo-word token without extra supervision. Our model shows strong generalization ability on four ZS-CIR tasks, including domain conversion, object composition, object manipulation, and attribute manipulation. It obtains consistent and significant performance boosts ranging from 1.88% to 3.60% over the best methods and achieves new state-of-the-art results on ZS-CIR. Our code is available at https://github.com/Pter61/context_i2w

    Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

    Full text link
    Recent advances in vision-language pre-trained models (VLPs) have significantly increased visual understanding and cross-modal analysis capabilities. Companies have emerged to provide multi-modal Embedding as a Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which cost a large amount of training data and resources for high-performance service. However, existing studies indicate that EaaS is vulnerable to model extraction attacks that induce great loss for the owners of VLPs. Protecting the intellectual property and commercial ownership of VLPs is increasingly crucial yet challenging. A major solution of watermarking model for EaaS implants a backdoor in the model by inserting verifiable trigger embeddings into texts, but it is only applicable for large language models and is unrealistic due to data and model privacy. In this paper, we propose a safe and robust backdoor-based embedding watermarking method for VLPs called VLPMarker. VLPMarker utilizes embedding orthogonal transformation to effectively inject triggers into the VLPs without interfering with the model parameters, which achieves high-quality copyright verification and minimal impact on model performance. To enhance the watermark robustness, we further propose a collaborative copyright verification strategy based on both backdoor trigger and embedding distribution, enhancing resilience against various attacks. We increase the watermark practicality via an out-of-distribution trigger selection approach, removing access to the model training data and thus making it possible for many real-world scenarios. Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and robust against model extraction attacks. Our code is available at https://github.com/Pter61/vlpmarker

    Enterprise Architecture Specification Case Study

    Get PDF
    A graduate course in enterprise architecture had a team project component in which a real-world business case, provided by an industry sponsor, formed the basis of the project charter and the architecture statement of work. The paper aims to share the team project experience on developing the architecture specifications based on the business case of an accountable health care organization. Students collaborated as a team in various roles to develop the architecture specifications for a new business initiative of the sponsoring organization, XYZ ACO. The teaching case describes the case study approach and the architecture approach adopted for the architecture process, and is accompanied by Teaching Case Notes which provide a selection of the models developed by members of the project team towards the architecture specifications. The course started with coverage of enterprise architecture theory, best practices and standards, and the team project gave students the opportunity to apply their theoretical knowledge and “learn by doing”. Students were challenged to interpret the business case, the project charter and project requirements, and each team member was allocated an architecture viewpoint and a role to play. The Teaching Case presents a summary of the team project and the lessons learned in performing the project

    Global variation in anastomosis and end colostomy formation following left-sided colorectal resection

    Get PDF
    Background End colostomy rates following colorectal resection vary across institutions in high-income settings, being influenced by patient, disease, surgeon and system factors. This study aimed to assess global variation in end colostomy rates after left-sided colorectal resection. Methods This study comprised an analysis of GlobalSurg-1 and -2 international, prospective, observational cohort studies (2014, 2016), including consecutive adult patients undergoing elective or emergency left-sided colorectal resection within discrete 2-week windows. Countries were grouped into high-, middle- and low-income tertiles according to the United Nations Human Development Index (HDI). Factors associated with colostomy formation versus primary anastomosis were explored using a multilevel, multivariable logistic regression model. Results In total, 1635 patients from 242 hospitals in 57 countries undergoing left-sided colorectal resection were included: 113 (6·9 per cent) from low-HDI, 254 (15·5 per cent) from middle-HDI and 1268 (77·6 per cent) from high-HDI countries. There was a higher proportion of patients with perforated disease (57·5, 40·9 and 35·4 per cent; P < 0·001) and subsequent use of end colostomy (52·2, 24·8 and 18·9 per cent; P < 0·001) in low- compared with middle- and high-HDI settings. The association with colostomy use in low-HDI settings persisted (odds ratio (OR) 3·20, 95 per cent c.i. 1·35 to 7·57; P = 0·008) after risk adjustment for malignant disease (OR 2·34, 1·65 to 3·32; P < 0·001), emergency surgery (OR 4·08, 2·73 to 6·10; P < 0·001), time to operation at least 48 h (OR 1·99, 1·28 to 3·09; P = 0·002) and disease perforation (OR 4·00, 2·81 to 5·69; P < 0·001). Conclusion Global differences existed in the proportion of patients receiving end stomas after left-sided colorectal resection based on income, which went beyond case mix alone

    An Approach for Designing Secure and High Performance Cloud Systems

    No full text
    Recent expansions of cloud computing have been growing at a phenomenal rate. Security and privacy issues have become a considerable issue while the applications of big data are growing dramatically fast in cloud computing. However, there exists a contradiction between ensuring a high performance and achieving a high-level security and privacy protection due to the restrictions of the computing resources, based on the findings of the literature review. This study focuses on this contradiction issue and intend to develop an approach of effectuating the cloud system design for a high-level security and privacy protection while acquiring a high performance. The work consists of four research tasks that support the solution to the proposed problem. They are (i) designing a Optimal Fully Homomorphic Encryption (O-FHE) mechanism that can both avoid noise and execute efficiently; (ii) designing a privacy-preserving data encryption strategy while considering efficiency; (iii) developing an approach of the data analytics manager system for in-memory big data analytics; (iv) designing an adaptive energy-aware data allocation approach for heterogeneous memory and creating an efficient data allocation approach for cloud-based heterogeneous memory. The research implements experimental evaluations to examine the performance of the proposed approaches. The main contributions of this study address three aspects. First, this study has proposed an O-FHE method that is different from all approaches proposed by the prior researches. Second, this study addresses the contradiction between the data security and system performance and presents a privacy-preserving strategy for secure data transmissions in cloud systems. Finally, this study attempts to increase the computation efficiency by enhancing the functioning of hardware, more specifically, using heterogeneous memory and in-memory data analytics

    Mobile cloud computing: models, implementation, and security

    No full text

    Blend Arithmetic Operations on Tensor-Based Fully Homomorphic Encryption Over Real Numbers

    No full text
    corecore