Search CORE

3 research outputs found

Unary Words Have the Smallest Levenshtein k-Neighbourhoods

Author: Charalampopoulos Panagiotis
Pissis Solon P.
Radoszewski Jakub
Walen Tomasz
Zuba Wiktor
Publication venue: Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Publication date: 01/01/2020
Field of study

The edit distance (a.k.a. the Levenshtein distance) between two words is defined as the minimum number of insertions, deletions or substitutions of letters needed to transform one word into another. The Levenshtein k-neighbourhood of a word w is the set of words that are at edit distance at most k from w. This is perhaps the most important concept underlying BLAST, a widely-used tool for comparing biological sequences. A natural combinatorial question is to ask for upper and lower bounds on the size of this set. The answer to this question has important algorithmic implications as well. Myers notes that "such bounds would give a tighter characterisation of the running time of the algorithm" behind BLAST. We show that the size of the Levenshtein k-neighbourhood of any word of length n over an arbitrary alphabet is not smaller than the size of the Levenshtein k-neighbourhood of a unary word of length n, thus providing a tight lower bound on the size of the Levenshtein k-neighbourhood. We remark that this result was posed as a conjecture by Dufresne at WCTA 2019

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

Unary Words Have the Smallest Levenshtein k-Neighbourhoods

Author: Charalampopoulos Panagiotis
Gortz Inge Li
Pissis Solon P.
Radoszewski Jakub
Walen Tomasz
Weimann Oren
Zuba Wiktor
Publication venue: Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Publication date: 09/06/2020
Field of study

RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion

Author: Bauer Lujo
Huang Zhuoqun
Lucas Keane
Marchant Neil G.
Ohrimenko Olga
Rubinstein Benjamin I. P.
Publication venue
Publication date: 26/10/2023
Field of study

Randomized smoothing is a leading approach for constructing classifiers that are certifiably robust against adversarial examples. Existing work on randomized smoothing has focused on classifiers with continuous inputs, such as images, where

\ell_p

-norm bounded adversaries are commonly studied. However, there has been limited work for classifiers with discrete or variable-size inputs, such as for source code, which require different threat models and smoothing mechanisms. In this work, we adapt randomized smoothing for discrete sequence classifiers to provide certified robustness against edit distance-bounded adversaries. Our proposed smoothing mechanism randomized deletion (RS-Del) applies random deletion edits, which are (perhaps surprisingly) sufficient to confer robustness against adversarial deletion, insertion and substitution edits. Our proof of certification deviates from the established Neyman-Pearson approach, which is intractable in our setting, and is instead organized around longest common subsequences. We present a case study on malware detection--a binary classification problem on byte sequences where classifier evasion is a well-established threat model. When applied to the popular MalConv malware detection model, our smoothing mechanism RS-Del achieves a certified accuracy of 91% at an edit distance radius of 128 bytes.Comment: To be published in NeurIPS 2023. 36 pages, 7 figures, 12 tables. Includes 20 pages of appendice

arXiv.org e-Print Archive