Toward Stronger Textual Attack Detectors
The landscape of available textual adversarial attacks keeps growing, posing
severe threats and raising concerns regarding the integrity of deep NLP systems.
However, the crucial problem of defending against malicious attacks has so far
drawn only little attention from the NLP community. Such defenses are nonetheless
instrumental in developing robust and trustworthy systems. This paper makes two
important contributions in this line of research: (i) we introduce LAROUSSE, a
new framework to detect textual adversarial attacks and (ii) we introduce
STAKEOUT, a new benchmark composed of nine popular attack methods, three
datasets, and two pre-trained models. LAROUSSE is ready-to-use in production as
it is unsupervised, hyperparameter-free, and non-differentiable, protecting it
against gradient-based methods. Our new benchmark STAKEOUT allows for a robust
evaluation framework: we conduct extensive numerical experiments which
demonstrate that LAROUSSE outperforms previous methods and allow us to
identify factors that drive variations in detection rate.
Comment: Findings EMNLP 202
Depth and Depth-Based Classification with R Package ddalpha
Following the seminal idea of Tukey (1975), data depth is a function that measures how close an arbitrary point of the space is located to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software designed to combine the practitioner's experience with recent advances in data depth and depth-based classification. ddalpha provides an implementation for exact and approximate computation of the most reasonable and widely applied notions of data depth. These can be further used in the depth-based multivariate and functional classifiers implemented in the package, with the DDα-procedure as the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also provide insights into the geometry of the data and the quality of pattern recognition.
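To make the idea of depth-based classification concrete, here is a minimal sketch in Python rather than R. It does not use ddalpha and does not implement the DDα-procedure; it shows the simpler maximum-depth rule, which assigns a point to the class in which it lies deepest. The Mahalanobis depth 1 / (1 + d^2) is used as the depth notion, and the function names and simulated data are assumptions made for illustration only.

```python
import numpy as np


def mahalanobis_depth(points, cloud):
    """Mahalanobis depth 1 / (1 + d^2) of each row of `points` w.r.t. `cloud`."""
    mean = cloud.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(cloud, rowvar=False))
    diff = points - mean
    # Squared Mahalanobis distance of every query point from the class mean.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def max_depth_classify(points, classes):
    """Assign each point to the index of the class in which it is deepest."""
    depths = np.column_stack([mahalanobis_depth(points, c) for c in classes])
    return depths.argmax(axis=1)


# Two simulated, well-separated training classes (illustrative data only).
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
class_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(200, 2))

queries = np.array([[0.2, -0.1], [3.8, 4.3]])
labels = max_depth_classify(queries, [class_a, class_b])
print(labels)
```

Each query is mapped to one depth value per class, which is the representation that depth-versus-depth classifiers operate on; the maximum-depth rule is the simplest decision rule in that space.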
Choosing among notions of multivariate depth statistics
Classical multivariate statistics measures the outlyingness of a point by its
Mahalanobis distance from the mean, which is based on the mean and the
covariance matrix of the data. A multivariate depth function is a function
which, given a point and a distribution in d-space, measures centrality by a
number between 0 and 1, while satisfying certain postulates regarding
invariance, monotonicity, convexity and continuity. Accordingly, numerous
notions of multivariate depth have been proposed in the literature, some of
which are also robust against extremely outlying data. The departure from
classical Mahalanobis distance does not come without cost. There is a trade-off
between invariance, robustness and computational feasibility. In the last few
years, efficient exact algorithms as well as approximate ones have been
constructed and made available in R packages. Consequently, in practical
applications the choice of a depth statistic is no longer restricted to one or
two notions due to computational limits; rather, several notions are often
feasible, among which the researcher has to decide. The article debates
theoretical and practical aspects of this choice, including invariance and
uniqueness, robustness and computational feasibility. Complexity and speed of
exact algorithms are compared. The accuracy of approximate approaches like the
random Tukey depth is discussed as well as the application to large and
high-dimensional data. Extensions to local and functional depths and
connections to regression depth are briefly addressed.
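As an illustration of the approximate approach mentioned above, the following is a minimal Python sketch of the random Tukey depth: the halfspace depth is approximated by projecting the data onto randomly drawn directions and taking the smallest univariate depth found. The function name, the number of directions, and the simulated data are assumptions for illustration, not taken from the article. Because only a finite set of directions is examined, the result can only overestimate the exact Tukey depth.

```python
import numpy as np


def random_tukey_depth(point, cloud, n_directions=500, seed=0):
    """Approximate the Tukey (halfspace) depth of `point` w.r.t. `cloud`.

    The minimum is taken over random unit directions only, so the value
    is an upper bound on the exact depth.
    """
    rng = np.random.default_rng(seed)
    # Normalized Gaussian vectors are uniformly distributed on the sphere.
    dirs = rng.normal(size=(n_directions, cloud.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    proj_cloud = cloud @ dirs.T   # shape (n_points, n_directions)
    proj_point = dirs @ point     # shape (n_directions,)

    # Univariate depth per direction: the smaller of the two tail fractions.
    frac_below = (proj_cloud <= proj_point).mean(axis=0)
    frac_above = (proj_cloud >= proj_point).mean(axis=0)
    return float(np.minimum(frac_below, frac_above).min())


# Illustrative data: a standard normal cloud in the plane.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(500, 2))

central = random_tukey_depth(np.array([0.0, 0.0]), cloud)
outlying = random_tukey_depth(np.array([10.0, 10.0]), cloud)
print(central, outlying)
```

The cost is one matrix product of size n by k for n data points and k directions, which is what makes this approximation attractive for large and high-dimensional data; its accuracy depends on k and generally degrades as the dimension grows.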