
19 research outputs found

Outlier Dimensions that Disrupt Transformers are Driven by Frequency

Author: Dell'Orletta Felice
Drozd Aleksandr
Puccetti Giovanni
Rogers Anna
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2022

While Transformer-based language models are generally very robust to pruning, there is the recently discovered outlier phenomenon: disabling only 48 out of 110M parameters in BERT-base drops its performance by nearly 30% on MNLI. We replicate the original evidence for the outlier phenomenon and we link it to the geometry of the embedding space. We find that in both BERT and RoBERTa the magnitude of hidden state coefficients corresponding to outlier dimensions correlates with the frequency of encoded tokens in pre-training data, and it also contributes to the “vertical” self-attention pattern enabling the model to focus on the special tokens. This explains the drop in performance from disabling the outliers, and it suggests that to decrease anisotropicity in future models we need pre-training schemas that would better take into account the skewed token distributions.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache

Author: Chen Peng
Domke Jens
Drozd Aleksandr
Gerofi Balazs
Kodama Yuetsu
Matsuoka Satoshi
Mittal Sparsh
Pericàs Miquel
Podobas Artur
Vatai Emil
Wahib Mohamed
Zhang Lingqi
Publication date: 05/04/2022

Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper bound in performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of LARC, a processor fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal where HPC CPU performance could be circa 2028, and conclude an average boost of 9.77x for cache-sensitive HPC applications, on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.

arXiv.org e-Print Archive

The First International Workshop on COmputing using EmeRging EXotic AI-Inspired Systems (CORtEX'22)

Author: Devereux Barry
Drozd Aleksandr
Podobas Artur
Publication venue: Institute of Electrical and Electronics Engineers Inc.
Publication date: 01/08/2022

Queen's University Belfast Research Portal

Fast GPU Read Alignment with Burrows Wheeler Transform Based Index

Author: Drozd Aleksandr
Maruyama Naoya (丸山直也)
Matsuoka Satoshi (松岡聡)
Publication date: 18/01/2012

Institutional Repositories DataBase (IRDB)

Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics

Author: Bhargava Prajjwal
Drozd Aleksandr
Rogers Anna
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/11/2021

Copenhagen University Research Information System

The IT University of Copenhagen's Repository