
    CognitiveCNN: Mimicking Human Cognitive Models to resolve Texture-Shape Bias

    Recent works demonstrate a texture bias in Convolutional Neural Networks (CNNs), conflicting with earlier works claiming that networks identify objects using shape. It is commonly believed that the cost function forces the network to take a greedy route, increasing accuracy through texture without exploring any global statistics. We propose a novel, intuitive architecture, namely CognitiveCNN, inspired by feature integration theory in psychology, which uses human-interpretable features such as shape, texture, and edges to reconstruct and classify the image. We define two metrics, namely TIC and RIC, to quantify the importance of each stream using attention maps. We introduce a regulariser which ensures that the contribution of each feature stream is the same for classification as it is for reconstruction, and we perform experiments showing the resulting boost in accuracy and robustness, besides imparting explainability. Lastly, we adapt these ideas to conventional CNNs and propose the Augmented Cognitive CNN, which achieves superior performance in object recognition.
    Comment: 5 pages; LaTeX; published at the ICLR 2020 Workshop on Bridging AI and Cognitive Science
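
    As a rough illustration of the kind of regulariser the abstract describes (a minimal sketch, not the authors' code; stream names, tensor shapes, and the coefficient lam are assumptions), the snippet below equalises the attention mass each feature stream receives for classification with the mass it receives for reconstruction:

```python
import torch

def stream_contribution(attn_maps):
    """attn_maps: (batch, n_streams, H, W) attention maps, one map per
    feature stream (e.g. shape, texture, edges).
    Returns per-stream contribution fractions of shape (batch, n_streams)."""
    mass = attn_maps.abs().sum(dim=(2, 3))
    return mass / mass.sum(dim=1, keepdim=True)

def contribution_regulariser(attn_cls, attn_rec):
    """Push classification-time stream contributions towards the
    reconstruction-time contributions, which serve as the target."""
    c_cls = stream_contribution(attn_cls)
    c_rec = stream_contribution(attn_rec).detach()
    return ((c_cls - c_rec) ** 2).sum(dim=1).mean()

# usage (illustrative): loss = ce + recon_mse + lam * contribution_regulariser(a_cls, a_rec)
```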

    PEEKABOO: Interactive Video Generation via Masked-Diffusion

    Modern video generation models like Sora have achieved remarkable success in producing high-quality videos. However, a significant limitation is their inability to offer interactive control to users, a feature that promises to open up unprecedented applications and creativity. In this work, we introduce the first solution to equip diffusion-based video generation models with spatio-temporal control. We present Peekaboo, a novel masked attention module which seamlessly integrates with current video generation models, offering control without additional training or inference overhead. To facilitate future research, we also introduce a comprehensive benchmark for interactive video generation, providing a standardized framework for the community to assess the efficacy of emerging interactive video generation models. Our extensive qualitative and quantitative assessments reveal that Peekaboo achieves up to a 3.8x improvement in mIoU over baseline models, all while maintaining the same latency. Code and benchmark are available on the project webpage.
    Comment: Project webpage - https://jinga-lala.github.io/projects/Peekaboo
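
    The abstract does not give implementation details, but a masked attention module of this flavour can be sketched as a hard bias on the attention logits that keeps foreground and background tokens from attending across the user's spatio-temporal mask (a minimal sketch under assumed tensor layout and names, not the Peekaboo release):

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, token_mask):
    """q, k, v: (batch, n_tokens, dim) flattened spatio-temporal latents;
    token_mask: (batch, n_tokens) bool, True inside the user's region.
    Foreground tokens attend only to foreground, background to background."""
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("bqd,bkd->bqk", q, k) * scale
    same_region = token_mask[:, :, None] == token_mask[:, None, :]
    logits = logits.masked_fill(~same_region, float("-inf"))
    return torch.einsum("bqk,bkd->bqd", F.softmax(logits, dim=-1), v)
```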

    End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

    Neural network (NN) compression via techniques such as pruning and quantization requires setting compression hyperparameters (e.g., the number of channels to prune, bitwidths for quantization) for each layer, either manually or via neural architecture search (NAS), which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for the model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods, including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training, a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve a $50\%$ reduction in FLOPs with only a $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve a $15\%$ reduction in FLOPs and an $11\%$ reduction in on-device latency without a drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies architectures that are $1.2\times$-$1.4\times$ cheaper than the standard MobileNetV3 and EfficientNet suites of architectures at almost the same training cost and accuracy.
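
    A hedged sketch of how such a surrogate can enter training (illustrative names like flops_per_channel and lam; not the paper's code): per-layer channel gates are penalised through the differentiable $\frac{\ell_1}{\ell_2}$ ratio, weighted by each layer's per-channel FLOPs cost:

```python
import torch

def l1_over_l2(alpha, eps=1e-8):
    """Differentiable sparsity proxy: small when alpha concentrates on few channels."""
    return alpha.abs().sum() / (alpha.norm(p=2) + eps)

def flops_surrogate(gates, flops_per_channel):
    """gates: list of per-layer gate tensors (one scalar gate per channel);
    flops_per_channel: FLOPs cost of keeping one channel in each layer."""
    return sum(cost * l1_over_l2(g) for g, cost in zip(gates, flops_per_channel))

# usage (illustrative): total_loss = task_loss + lam * flops_surrogate(gates, costs)
```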

    PLeaS -- Merging Models with Permutations and Least Squares

    The democratization of machine learning systems has made the process of fine-tuning accessible to a large number of practitioners, leading to a wide range of open-source models fine-tuned on specialized tasks and datasets. Recent work has proposed merging such models to combine their functionalities. However, prior approaches are restricted to models fine-tuned from the same base model, and the final merged model is typically restricted to the same size as the original models. In this work, we propose a new two-step algorithm, termed PLeaS, to merge models while relaxing these constraints. First, leveraging the permutation symmetries inherent in the two models, PLeaS partially matches nodes in each layer by maximizing alignment. Next, PLeaS computes the weights of the merged model as a layer-wise least-squares solution that minimizes the approximation error between the features of the merged model and the permuted features of the original models. PLeaS can merge models into a single model of a desired size, even when the two original models are fine-tuned from different base models. We also present a variant of our method which can merge models without using data from the fine-tuning domains. We demonstrate our method by merging ResNet models trained with shared and different label spaces, and show that it outperforms state-of-the-art merging methods by 8 to 15 percentage points at the same target compute when merging models trained on DomainNet and on fine-grained classification tasks.
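
    The two steps map naturally onto standard primitives; the following minimal sketch (assumed shapes and helper names, not the PLeaS release) matches units with a linear assignment on activation correlations and then solves the layer-wise least-squares problem:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(acts_a, acts_b):
    """acts_*: (n_samples, n_units) activations of corresponding layers in
    the two models; returns the permutation of model B's units that best
    aligns them with model A's (step 1)."""
    corr = acts_a.T @ acts_b
    _, perm = linear_sum_assignment(-corr)  # maximise total alignment
    return perm

def merge_layer(layer_inputs, target_feats):
    """Step 2: least-squares weights mapping the merged model's layer
    inputs (n_samples, d_in) to the permuted target features
    (n_samples, d_out) of the originals."""
    w, *_ = np.linalg.lstsq(layer_inputs, target_feats, rcond=None)
    return w
```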

    Label Differential Privacy via Aggregation

    In many real-world applications, due to recent developments in the privacy landscape, training data may be aggregated to preserve the privacy of sensitive training labels. In the learning from label proportions (LLP) framework, the dataset is partitioned into bags of feature-vectors which are available only with the sum of the labels per bag. A further restriction, which we call learning from bag aggregates (LBA), is where instead of individual feature-vectors, only the (possibly weighted) sum of the feature-vectors per bag is available. We study whether such aggregation techniques can provide privacy guarantees under the notion of label differential privacy (label-DP) studied previously in, e.g., [Chaudhuri-Hsu'11, Ghazi et al.'21, Esfandiari et al.'22]. It is easily seen that naive LBA and LLP do not provide label-DP. Our main result, however, shows that weighted LBA using i.i.d. Gaussian weights with $m$ randomly sampled disjoint $k$-sized bags is in fact $(\varepsilon, \delta)$-label-DP for any $\varepsilon > 0$ with $\delta \approx \exp(-\Omega(\sqrt{k}))$, assuming a lower bound on the linear-mse regression loss. Further, the $\ell_2^2$-regressor which minimizes the loss on the aggregated dataset has a loss within a $(1 + o(1))$-factor of the optimum on the original dataset with probability $\approx 1 - \exp(-\Omega(m))$. We emphasize that no additive label noise is required. The analogous weighted-LLP, however, does not admit label-DP. Nevertheless, we show that if additive $N(0, 1)$ noise can be added to any constant fraction of the instance labels, then the noisy weighted-LLP admits similar label-DP guarantees without assumptions on the dataset, while preserving the utility of Lipschitz-bounded neural mse-regression tasks. Our work is the first to demonstrate that label-DP can be achieved by randomly weighted aggregation for regression tasks, using little or no additive noise.
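
    The aggregation mechanism itself is simple to state in code. The sketch below (illustrative, with assumed shapes and names) forms $m$ disjoint $k$-sized bags with i.i.d. Gaussian weights and fits the $\ell_2^2$-regressor on the aggregates alone, with no label noise added:

```python
import numpy as np

def weighted_lba(X, y, m, k, rng=np.random.default_rng(0)):
    """X: (n, d) features, y: (n,) labels, with n >= m * k.
    Returns m weighted bag aggregates; only these leave the curator."""
    idx = rng.permutation(len(X))[: m * k].reshape(m, k)  # disjoint bags
    w = rng.standard_normal((m, k))                       # iid Gaussian weights
    X_agg = np.einsum("mk,mkd->md", w, X[idx])            # weighted feature sums
    y_agg = np.einsum("mk,mk->m", w, y[idx])              # matching label sums
    return X_agg, y_agg

# l2-regressor fit on aggregates only:
# theta, *_ = np.linalg.lstsq(X_agg, y_agg, rcond=None)
```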

    Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks

    Deep Neural Networks are known to be brittle to even minor distribution shifts relative to the training distribution. While one line of work has demonstrated that the Simplicity Bias (SB) of DNNs, a bias towards learning only the simplest features, is a key reason for this brittleness, another recent line of work has surprisingly found that diverse/complex features are indeed learned by the backbone, and the brittleness is due to the linear classification head relying primarily on the simplest features. To bridge the gap between these two lines of work, we first hypothesize and verify that while SB may not altogether preclude learning complex features, it amplifies simpler features over complex ones: simple features are replicated several times in the learned representations while complex features might not be. This phenomenon, which we term the Feature Replication Hypothesis, coupled with the implicit bias of SGD to converge to maximum-margin solutions in the feature space, leads models to rely mostly on the simple features for classification. To mitigate this bias, we propose a Feature Reconstruction Regularizer (FRR) that ensures the learned features can be reconstructed back from the logits. Using FRR in linear-layer training (FRR-L) encourages the use of more diverse features for classification. We further propose to fine-tune the full network with the weights of the FRR-L-trained linear layer frozen, refining the learned features to make them more suitable for classification. Using this simple solution, we demonstrate gains of up to 15% in OOD accuracy on recently introduced semi-synthetic datasets with extreme distribution shifts. Moreover, we demonstrate noteworthy gains over existing SOTA methods on the standard OOD benchmark DomainBed as well.
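
    A minimal sketch of an FRR-style head (assumed module and variable names, not the authors' code): a learned linear map reconstructs the backbone features from the logits, and the reconstruction error is added to the classification loss:

```python
import torch
import torch.nn as nn

class FRRHead(nn.Module):
    """Linear classifier plus a learned inverse map from logits back to
    features; the reconstruction error discourages discarding features."""
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, n_classes)
        self.reconstruct = nn.Linear(n_classes, feat_dim)

    def forward(self, feats):
        logits = self.classifier(feats)
        recon = self.reconstruct(logits)
        frr = ((recon - feats.detach()) ** 2).mean()
        return logits, frr

# training (illustrative): loss = F.cross_entropy(logits, y) + lam * frr
```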

    OML: Open, Monetizable, and Loyal AI

    Artificial Intelligence (AI) has steadily improved across a wide range of tasks, and a significant breakthrough towards general intelligence was achieved with the rise of generative deep models, which have garnered worldwide attention. However, the development and deployment of AI are almost entirely controlled by a few powerful organizations and individuals who are racing to create Artificial General Intelligence (AGI). These centralized entities make decisions with little public oversight, shaping the future of humanity, often with unforeseen consequences. In this paper, we propose OML, which stands for Open, Monetizable, and Loyal AI, an approach designed to democratize AI development and shift control away from these monopolistic actors. OML is realized through an interdisciplinary framework spanning AI, blockchain, and cryptography. We present several ideas for constructing OML systems using technologies such as Trusted Execution Environments (TEE), traditional cryptographic primitives like fully homomorphic encryption and functional encryption, obfuscation, and AI-native solutions rooted in the sample complexity and intrinsic hardness of AI tasks.

    A key innovation of our work is the introduction of a new scientific field: AI-native cryptography, which leverages cryptographic primitives tailored to AI applications. Unlike conventional cryptography, which focuses on discrete data and binary security guarantees, AI-native cryptography exploits the continuous nature of AI data representations and their low-dimensional manifolds, focusing on improving approximate performance. One core idea is to transform AI attack methods, such as data poisoning, into security tools. This novel approach serves as a foundation for OML 1.0, an implemented system that demonstrates the practical viability of AI-native cryptographic techniques. At the heart of OML 1.0 is the concept of model fingerprinting, a novel AI-native cryptographic primitive that helps protect the integrity and ownership of AI models. The spirit of OML is to establish a decentralized, open, and transparent platform for AI development, enabling the community to contribute, monetize, and take ownership of AI models. By decentralizing control and ensuring transparency through blockchain technology, OML prevents the concentration of power and provides accountability in AI development that has not been possible before.

    To the best of our knowledge, this paper is the first to:
    • Identify the monopolization and lack-of-transparency challenges in AI deployment today and formulate the challenge as OML (Open, Monetizable, Loyal).
    • Provide an interdisciplinary approach to solving the OML challenge, incorporating ideas from AI, blockchain, and cryptography.
    • Introduce and formally define the new scientific field of AI-native cryptography.
    • Develop novel AI-native cryptographic primitives and implement them in OML 1.0, analyzing their security and effectiveness.
    • Leverage blockchain technology to host OML solutions, ensuring transparency, decentralization, and alignment with the goals of democratized AI development.

    Through OML, we aim to provide a decentralized framework for AI development that prioritizes open collaboration, ownership rights, and transparency, ultimately fostering a more inclusive AI ecosystem.
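
    As a rough, hypothetical illustration of the verification side of model fingerprinting (the paper's actual primitive is more involved; all names and the threshold here are assumptions), ownership can be claimed when a suspect model reproduces enough of the secret query-response pairs embedded during fine-tuning:

```python
from typing import Callable, List, Tuple

def verify_fingerprints(model: Callable[[str], str],
                        fingerprints: List[Tuple[str, str]],
                        threshold: float = 0.9) -> bool:
    """Claim ownership if the suspect model reproduces enough of the
    secret (query, response) pairs embedded during fine-tuning."""
    hits = sum(model(q) == r for q, r in fingerprints)
    return hits / len(fingerprints) >= threshold
```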