761 research outputs found

    Simple to Complex Cross-modal Learning to Rank

    The heterogeneity gap between different modalities poses a significant challenge to multimedia information retrieval. Some studies formalize cross-modal retrieval as a ranking problem and learn a shared multi-modal embedding space to measure cross-modality similarity. However, previous methods often build the shared embedding space from linear mapping functions, which might not be sophisticated enough to reveal more complicated inter-modal correspondences. Additionally, current studies assume that all rankings are of equal importance, so either all rankings are used simultaneously or a small number of rankings are selected at random to train the embedding space at each iteration. Such strategies, however, suffer from outliers and from reduced generalization capability because they do not take the process of human cognition into account. In this paper, we incorporate self-paced learning theory with diversity into cross-modal learning to rank and learn an optimal multi-modal embedding space based on non-linear mapping functions. This strategy enhances the model's robustness to outliers and achieves better generalization by training the model gradually, from easy rankings by diverse queries to more complex ones. An efficient alternating algorithm is used to solve the proposed challenging problem, with fast convergence in practice. Extensive experimental results on several benchmark datasets indicate that the proposed method achieves significant improvements over the state of the art in the literature.
    Comment: 14 pages; accepted by Computer Vision and Image Understanding
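The easy-to-complex training idea in the abstract above can be illustrated with a minimal sketch of generic hard-threshold self-paced weighting. This is not the paper's algorithm: the function names, the threshold schedule, and the fixed loss values are all assumptions made for illustration, and the diversity term and the non-linear embedding are omitted. In an actual self-paced loop the losses would be recomputed after each model update.

```python
from typing import List


def self_paced_weights(losses: List[float], threshold: float) -> List[int]:
    """Hard self-paced weighting: a sample is selected (weight 1)
    only if its current loss is below the pace threshold."""
    return [1 if loss < threshold else 0 for loss in losses]


def curriculum(losses: List[float], start: float, growth: float, rounds: int) -> List[List[int]]:
    """Return the indices selected in each round as the threshold grows.

    Assumption: losses stay fixed across rounds, purely to keep the
    sketch short; a real trainer would refit the model on the selected
    samples and recompute the losses every round.
    """
    schedule = []
    threshold = start
    for _ in range(rounds):
        weights = self_paced_weights(losses, threshold)
        schedule.append([i for i, w in enumerate(weights) if w == 1])
        threshold *= growth  # admit harder samples in the next round
    return schedule


# Hypothetical per-ranking losses; index 3 plays the role of an outlier.
ranking_losses = [0.1, 0.9, 0.3, 2.5]
print(curriculum(ranking_losses, start=0.2, growth=2.0, rounds=3))
# → [[0], [0, 2], [0, 2]]
```

With these made-up numbers the high-loss ranking at index 3 is never selected within three rounds, which is the mechanism by which such schemes are usually said to resist outliers.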

    Commission pricing strategy on online retail platforms: power and dependence in triad


    Living Between Dialectics: A Bakhtinian and Lacanian Reading of Jade Snow Wong's Fifth Chinese Daughter and Maxine Hong Kingston's The Woman Warrior

    This dissertation creates a dialogic web encompassing the sociocultural and psychological aspects of Jade Snow Wong's and Maxine Hong Kingston's autobiographies Fifth Chinese Daughter and The Woman Warrior. American mainstream society and the Chinese patriarchal community have conceived insurmountable ethnic and gender differences that are inherent in the environment in which Wong and Kingston grew up. The dissertation argues that how the two authors perceive the way these differences have been conceived is central to our understanding of their representations of ethnic female consciousness when they write as both subjects and writers. More specifically, the dissertation notes that Wong's and Kingston's texts are like a mirror reflecting how they are informed by principal social ideologies such as Orientalism, the American dream, and the patriarchal definition of womanhood. The dissertation goes on to argue that the literary strategies Wong and Kingston have adopted in their texts function to represent the difference of the protagonist's world, long suppressed by dominant social values, and that the mixture of different cultural and literary elements in the texts metaphorically manifests the dialogic nature of American ethnic literature and culture. The dissertation contends that all these strategies have political implications because they are used to challenge the conventional unitary literary language.

    Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition

    As a fundamental aspect of human life, two-person interactions contain meaningful information about people's activities, relationships, and social settings. Human action recognition serves as the foundation for many smart applications, with a strong focus on personal privacy. However, recognizing two-person interactions poses more challenges due to increased body occlusion and overlap compared to single-person actions. In this paper, we propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition. Our model addresses the challenge of recognizing two-person interactions by incorporating local-region spatial information, appearance information, and motion information. To achieve this, we introduce a frame selection method named Interval Frame Sampling (IFS), which efficiently samples frames from videos, capturing more discriminative information in a relatively short processing time. Subsequently, a frame feature learning module and a two-stream multi-level feature aggregation module extract global and partial features from the sampled frames, effectively representing the local-region spatial information, appearance information, and motion information related to the interactions. Finally, we apply a transformer to perform self-attention on the learned features for the final classification. Extensive experiments are conducted on two large-scale datasets, the interaction subsets of NTU RGB+D 60 and NTU RGB+D 120. The results show that our network outperforms state-of-the-art approaches across all standard evaluation settings.

    Unconditionally secure quantum bit commitment: Revised

    Bit commitment is a primitive underlying many cryptographic tasks. The Mayers-Lo-Chau no-go theorem proves that unconditionally secure quantum bit commitment is impossible. A variant of quantum bit commitment requires cheat sensitivity for both parties. Another result shows that these no-go theorems can be evaded using non-relativistic transmission or Minkowski causality. Our goal in this paper is to revise unconditionally secure quantum bit commitment. We first propose new quantum bit commitment protocols that use distributed settings and quantum entanglement to overcome the Mayers-Lo-Chau no-go theorems. Both protocols are perfectly concealing, perfectly binding, and cheat-sensitive in the asymptotic model against entanglement-based attacks and splitting attacks from quantum networks. These schemes are then extended to commit secret bits against eavesdroppers. We further propose two new applications: one is to commit qubit states, and the other is to commit unitary circuits. These new schemes are useful for committing several primitives, including sampling models, randomness, and Boolean functions, in cryptographic protocols.

    Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method

    Feature alignment is the primary means of fusing multimodal data. We propose a feature alignment method that fully fuses multimodal information by alternately shifting and expanding feature information from different modalities so that they have a consistent representation in a feature space. The proposed method can robustly capture high-level interactions between features of different modalities, thus significantly improving the performance of multimodal learning. We also show that the proposed method outperforms other popular multimodal schemes on multiple tasks. Experimental evaluation on the ETT and MIT-BIH-Arrhythmia datasets shows that the proposed method achieves state-of-the-art performance.
    Comment: 8 pages, 7 figures