Search CORE

9 research outputs found

Locally Differentially Private Document Generation Using Zero Shot Prompting

Author: Chen Pin Yu
Hooker Sara
Utpala Saiteja
Publication date: 30/11/2023

Numerous studies have highlighted the privacy risks associated with pretrained large language models. In contrast, our research offers a unique perspective by demonstrating that pretrained large language models can effectively contribute to privacy preservation. We propose a locally differentially private mechanism called DP-Prompt, which leverages the power of pretrained large language models and zero-shot prompting to counter author de-anonymization attacks while minimizing the impact on downstream utility. When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks, showing that it surpasses existing approaches by a considerable margin despite its simpler design. For instance, in the case of the IMDB dataset, DP-Prompt (with ChatGPT) perfectly recovers the clean sentiment F1 score while achieving a 46% reduction in author identification F1 score against static attackers and a 26% reduction against adaptive attackers. We conduct extensive experiments across six open-source large language models, ranging up to 7 billion parameters, to analyze various effects of the privacy-utility tradeoff. Comment: Accepted at EMNLP 2023 (Findings)

arXiv.org e-Print Archive

Language Agnostic Code Embeddings

Author: Chen Pin Yu
Gu Alex
Utpala Saiteja
Publication date: 25/10/2023

Recently, code language models have achieved notable advancements in addressing a diverse array of essential code comprehension and generation tasks. Yet, the field lacks a comprehensive analysis of the code embeddings of multilingual code models. In this paper, we present a comprehensive study on multilingual code embeddings, focusing on the cross-lingual capabilities of these embeddings across different programming languages. Through probing experiments, we demonstrate that code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details, primarily focusing on semantics. Further, we show that isolating and eliminating this language-specific component yields significant improvements in downstream code retrieval tasks, leading to an absolute increase of up to +17 in the Mean Reciprocal Rank (MRR).

arXiv.org e-Print Archive

Rieoptax: Riemannian Optimization in JAX

Author: Han Andi
Jawanpuria Pratik
Mishra Bamdev
Utpala Saiteja
Publication date: 10/10/2022

We present Rieoptax, an open source Python library for Riemannian optimization in JAX. We show that many differential geometric primitives, such as Riemannian exponential and logarithm maps, are usually faster in Rieoptax than in existing Python frameworks, both on CPU and GPU. We support a range of basic and advanced stochastic optimization solvers, such as Riemannian stochastic gradient, stochastic variance reduction, and adaptive gradient methods. A distinguishing feature of the proposed toolbox is that we also support differentially private optimization on Riemannian manifolds.

arXiv.org e-Print Archive

geomstats/challenge-iclr-2021: Published algorithms (final version)

Author: António Leitão
Bilal AbdulRahman
emaignant
Federico Iuricich
Florent-Michel
Gabriele Corso
Guoxi
Marek Cerny
Martin Bauw
Max(im) Beket(ov)
mihaelanistor
Nina Miolane
S. Shailja
Saiteja Utpala
Somesh Mohapatra
Tom Davies
Zhiyuan Liu
Publication venue: Clemson University Libraries
Publication date: 16/05/2022

GitHub repository for the ICLR Computational Geometry & Topology Challenge 2021

Clemson University: TigerPrints

ICLR 2022 Challenge for Computational Geometry & Topology: Design and Results

Author: Aharony Noga
Ambellan Felix
Bergsson Andri
Cui Xinyue
Donnat Claire
Dunn Benjamin
Hanik Martin
Hauberg Soren
Hermansen Erik
Klindt David
Lupo Umberto
Mathe Johan
Miolane Nina
Myers Adele
Nava-Yazdani Esfandiar
Nielsen Dmitriy
Pe'Er Itsik
Pignet Arthur
Sanborn Sophia
Shewmake Christian
Sommer Stefan
Sonthalia Rishi
Szwagier Tom
Talbar Shubham
Utpala Saiteja
Vaupel Melvin
von Tycowicz Christoph
Xiong Jeffrey
Publication venue: PMLR
Publication date: 09/11/2022

This paper presents the computational challenge on differential geometry and topology that was hosted within the ICLR 2022 workshop “Geometric and Topological Representation Learning”. The competition asked participants to provide implementations of machine learning algorithms on manifolds that would respect the API of the open-source software Geomstats (manifold part) and Scikit-Learn (machine learning part) or PyTorch. The challenge attracted seven teams in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.

INRIA a CCSD electronic archive server

ICLR 2021 Challenge for Computational Geometry & Topology: Design and Results

This paper presents the computational challenge on differential geometry and topology that took place within the ICLR 2021 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide creative contributions to the fields of computational geometry and topology through the open-source repositories Geomstats and Giotto-TDA. The challenge attracted 16 teams in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.

INRIA a CCSD electronic archive server

HAL Descartes

HAL-MINES ParisTech
