
    CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

    This paper considers contrastive training for cross-modal zero-shot transfer, wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a zero-shot way, similar to "Contrastive Language-Image Pre-training" (CLIP) and "Locked-image Tuning" (LiT), which have recently gained considerable attention. Most existing works on cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, calling for a more "non-binary" treatment. To address this, we propose a novel loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to align the embedding space of one modality with another. Owing to the continuous nature of similarity in the proposed loss function, these models outperform existing methods for zero-shot transfer across multiple models, datasets and modalities. In particular, we consider the modality pairs of image-text and speech-text, and our models achieve a 5-8% (absolute) improvement over previous state-of-the-art methods in zero-shot image classification and a 20-30% (absolute) improvement in zero-shot speech-to-intent classification and keyword classification.
    Comment: Accepted to the Neural Information Processing Systems (NeurIPS) 2023 conference
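
    As a concrete illustration of the idea, the sketch below implements a continuously weighted contrastive objective in PyTorch. The variable names (z_locked, z_new) and the choice of intra-modal cosine similarity in the frozen modality, rescaled to [0, 1], as the continuous weight are illustrative assumptions rather than the paper's exact formulation; with binary 0/1 weights the loss reduces to the standard contrastive (InfoNCE) objective.

        import torch
        import torch.nn.functional as F

        def cwcl_loss(z_locked, z_new, tau=0.07):
            # z_locked: (N, d) embeddings from the frozen, pre-trained modality
            # z_new:    (N, d) embeddings from the modality being trained
            z_locked = F.normalize(z_locked, dim=-1)
            z_new = F.normalize(z_new, dim=-1)

            # Continuous pairwise weights from intra-modal similarity in the
            # locked modality, rescaled from [-1, 1] to [0, 1] (assumed scheme).
            w = (z_locked @ z_locked.T + 1.0) / 2.0

            # Cross-modal logits and per-anchor log-softmax, as in standard
            # contrastive training.
            log_probs = F.log_softmax(z_new @ z_locked.T / tau, dim=-1)

            # Weight every pair by its continuous similarity and normalize by
            # the total weight per anchor; 0/1 weights recover InfoNCE.
            return (-(w * log_probs).sum(dim=-1) / w.sum(dim=-1)).mean()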

    Subspace learning by randomized sketching

    High-dimensional data is often accompanied by inherent low dimensionality that can be leveraged to design scalable machine learning and signal processing algorithms. Developing efficient computational frameworks that take advantage of the underlying structure in the data is crucial. In this thesis, we consider a particular form of inherent low dimensionality in data: subspace models. In many applications, data is known to lie close to a low-dimensional subspace. The underlying subspace itself may or may not be known a priori. Incorporating this structure into data acquisition systems and algorithms can aid in scalability.

    We first consider two specific applications in the field of array signal processing where subspace priors on the data are commonly used. For both of these applications, we develop algorithms that require a number of measurements that scales only with the dimension of the underlying subspace. In doing so, we show that arrays demand dimensionality-reduction maps that can operate on individual subsets or blocks of data at a time, without having access to other blocks. Inspired by such block constraints, we consider more general problems in numerical linear algebra where the data has a natural partition into blocks, as is common in applications with distributed or decentralized data. We study the problems of sketched ridge regression and sketched matrix multiplication under this constraint and give sample-optimal theoretical guarantees for block-diagonal sketching matrices.

    Extending the block model to low-rank matrix sensing, we then study the problem of recovering a low-rank matrix from compressed observations of each column. While each column individually is compressed beyond recovery, we leverage the columns' joint structure to recover the matrix as a whole. To do so, we establish a new framework for designing estimators of low-rank matrices that obey the constraints imposed by different observation models. Finally, we extend our framework for designing low-rank matrix estimators to the application of blind deconvolution. We provide a novel estimator that enjoys uniform recovery guarantees over the entire signal class while being sample-optimal.
    Ph.D. thesis
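
    To make the block constraint concrete, here is a minimal NumPy sketch of sketched ridge regression in which each block of rows is compressed independently with its own sketch, as would be required when blocks of data cannot be mixed. The Gaussian sketch, the function name, and the uniform block size are illustrative assumptions; the thesis's actual constructions and sample-optimality guarantees are more general.

        import numpy as np

        def block_sketched_ridge(X, y, block_size, m_per_block, lam, seed=0):
            # X: (n, d) data matrix, y: (n,) targets, partitioned into row
            # blocks that are sketched independently (block-diagonal sketch).
            rng = np.random.default_rng(seed)
            n, d = X.shape
            SX_blocks, Sy_blocks = [], []
            for start in range(0, n, block_size):
                Xb = X[start:start + block_size]
                yb = y[start:start + block_size]
                # Independent Gaussian sketch applied to this block only.
                S = rng.normal(size=(m_per_block, Xb.shape[0])) / np.sqrt(m_per_block)
                SX_blocks.append(S @ Xb)
                Sy_blocks.append(S @ yb)
            SX = np.vstack(SX_blocks)
            Sy = np.concatenate(Sy_blocks)
            # Solve the ridge problem on the compressed data.
            return np.linalg.solve(SX.T @ SX + lam * np.eye(d), SX.T @ Sy)

    Because each block sees only its own rows, the overall sketching matrix is block-diagonal, which is exactly the structure whose sample complexity the thesis analyzes.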