Code Prediction by Feeding Trees to Transformers
We advance the state-of-the-art in the accuracy of code prediction (next
token prediction) used in autocomplete systems. First, we report that using the
recently proposed Transformer architecture even out-of-the-box outperforms
previous neural and non-neural systems for code prediction. We then show that
by making the Transformer architecture aware of the syntactic structure of
code, we further increase the margin by which a Transformer-based system
outperforms previous systems. With this, it outperforms the accuracy of an
RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3
system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et
al., 2018) for code prediction by 14.4%.
We present in the paper several ways of communicating the code structure to
the Transformer, which is fundamentally built for processing sequence data. We
provide a comprehensive experimental evaluation of our proposal, along with
alternative design choices, on a standard Python dataset, as well as on a
Facebook internal Python corpus. Our code and data preparation pipeline will be
made available as open source.
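
The abstract describes making a sequence Transformer aware of the syntactic structure of code. As a rough illustration of one way structure can be exposed to a sequence model (a minimal sketch, not necessarily the paper's exact encoding), the snippet below linearizes a Python AST depth-first into a token sequence of node types and leaf values, which a standard next-token Transformer could then consume.

```python
# Minimal sketch: expose syntactic structure to a sequence model by
# linearizing a Python AST into a depth-first token sequence.
import ast

def linearize(node):
    """Depth-first traversal emitting node-type tokens and leaf values."""
    tokens = [type(node).__name__]          # e.g. "FunctionDef", "BinOp"
    # Emit identifier / constant leaves so the model sees concrete values.
    if isinstance(node, ast.Name):
        tokens.append(node.id)
    elif isinstance(node, ast.Constant):
        tokens.append(repr(node.value))
    for child in ast.iter_child_nodes(node):
        tokens.extend(linearize(child))
    return tokens

source = "def add(a, b):\n    return a + b\n"
print(linearize(ast.parse(source)))
# ['Module', 'FunctionDef', 'arguments', 'arg', 'arg', 'Return',
#  'BinOp', 'Name', 'a', 'Add', 'Name', 'b']
```

The resulting sequence can be fed to any sequence language model for next-token prediction; the paper compares several such structure-communication choices.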
Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner-workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
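
One of the analysis families mentioned above is decoding interpretable knowledge from intermediate representations, often via a "probing classifier". The sketch below illustrates that setup under stated assumptions: the hidden states are synthetic stand-ins (in practice they would be extracted from a layer of a trained network), and a toy binary label plays the role of the linguistic property being probed.

```python
# Minimal probing-classifier sketch: fit a linear model on hidden states to
# test whether an interpretable property is linearly recoverable from them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for per-token hidden states (n_tokens x hidden_dim); in practice
# these would come from an intermediate layer of a trained neural model.
hidden_states = rng.normal(size=(1000, 128))
labels = (hidden_states[:, 0] > 0).astype(int)   # toy "linguistic" property

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```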
Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation
Large Language Models (LLMs) have been demonstrated effective for code
generation. Due to the complexity and opacity of LLMs, little is known about
how these models generate code. To deepen our understanding, we investigate
whether LLMs attend to the same parts of a natural language description as
human programmers during code generation. An analysis of five LLMs on a popular
benchmark, HumanEval, revealed a consistent misalignment between LLMs' and
programmers' attention. Furthermore, we found that there is no correlation
between the code generation accuracy of LLMs and their alignment with human
programmers. Through a quantitative experiment and a user study, we confirmed
that, among twelve different attention computation methods, attention computed
by the perturbation-based method is most aligned with human attention and is
consistently favored by human programmers. Our findings highlight the need for
human-aligned LLMs for better interpretability and programmer trust.
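
The abstract highlights perturbation-based attention computation as the method most aligned with human attention. The sketch below shows the general idea of that family of methods (not necessarily the study's exact variant): each word of the description is removed in turn and the drop in the model's score for the generated code is taken as that word's importance. `score_code_given_prompt` is a hypothetical stand-in; with a real LLM it would return, e.g., the log-probability of the code given the prompt.

```python
# Minimal perturbation-based attribution sketch for code generation prompts.

def score_code_given_prompt(prompt: str, code: str) -> float:
    # Toy stand-in scorer: counts prompt words that also appear in the code.
    return sum(1.0 for w in prompt.split() if w in code)

def perturbation_attention(prompt: str, code: str) -> dict:
    words = prompt.split()
    base = score_code_given_prompt(prompt, code)
    importance = {}
    for i, w in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])   # drop one word
        importance[w] = base - score_code_given_prompt(ablated, code)
    return importance

prompt = "return the sum of a list of numbers"
code = "def solve(numbers):\n    return sum(numbers)"
print(perturbation_attention(prompt, code))
```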
Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.
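
The core quantity here, the pseudo-log-likelihood, is computed by masking tokens one by one and summing their conditional log-probabilities. The sketch below shows that computation with a Hugging Face masked LM; the model name is illustrative, and the released mlm-scoring library is the reference implementation.

```python
# Minimal pseudo-log-likelihood (PLL) sketch: mask each token in turn and
# sum the log-probability the MLM assigns to the true token at that position.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "roberta-base"                      # assumption: any MLM checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    pll = 0.0
    for i in range(1, len(ids) - 1):       # skip special tokens at the ends
        masked = ids.clone()
        masked[i] = tok.mask_token_id      # mask one position
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        pll += log_probs[ids[i]].item()    # log P(true token | context)
    return pll

print(pseudo_log_likelihood("The cat sat on the mat."))
```

Dividing the PLL by the token count and exponentiating the negative gives the pseudo-perplexity (PPPL) mentioned in the abstract.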
Silent Vulnerable Dependency Alert Prediction with Vulnerability Key Aspect Explanation
Open-source software is widely used for its convenience. Often for
well-intentioned reasons, open-source maintainers fix vulnerabilities
silently, leaving their users unaware of the updates and exposed to threats.
Previous works all focus on black-box binary detection of silent dependency
alerts and suffer from high false-positive rates, leaving open-source software
users to analyze and explain the AI predictions themselves. Explainable AI has
emerged as a complement to black-box AI models, providing details in various
forms to explain AI decisions. Noting that there is still no technique that
can discover silent dependency alerts in time, in this work we propose a
framework that couples an encoder-decoder model with a binary detector to
provide explainable silent dependency alert prediction. Our model generates
four types of vulnerability key aspects, namely vulnerability type, root
cause, attack vector, and impact, to enhance the trustworthiness of and users'
acceptance of alert predictions. Through experiments with several models and
inputs, we confirm that CodeBERT with both commit messages and code changes
achieves the best results. Our user study shows that explainable alert
predictions help users find silent dependency alerts more easily than
black-box predictions. To the best of our knowledge, this is the first
research work on the application of Explainable AI to silent dependency alert
prediction, which opens the door to related domains.
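
The best-performing input combination reported is CodeBERT over both commit messages and code changes. The sketch below illustrates only the binary-detector half of such a pipeline (the paper additionally pairs it with an encoder-decoder that generates the vulnerability key aspects); the model name is the public CodeBERT checkpoint, while the classification head is freshly initialized here and would need fine-tuning on labeled silent-fix commits.

```python
# Minimal sketch: encode a commit message and its code change with CodeBERT
# plus a (here untrained) binary classification head for silent-fix detection.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/codebert-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

commit_message = "fix buffer handling in parser"
code_change = "- memcpy(buf, src, len);\n+ memcpy(buf, src, min(len, sizeof(buf)));"

# Encode message and diff as a text pair; truncation keeps it within limits.
inputs = tok(commit_message, code_change, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print("P(silent vulnerability fix):", probs[0, 1].item())
```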
Representation Learning based and Interpretable Reactor System Diagnosis Using Denoising Padded Autoencoder
With the mass construction of Gen III nuclear reactors, it is a popular trend
to use deep learning (DL) techniques for fast and effective diagnosis of
possible accidents. To overcome the common problems of previous work in
diagnosing reactor accidents using deep learning theory, this paper proposes a
diagnostic process that ensures robustness to noisy and crippled data and is
interpretable. First, a novel Denoising Padded Autoencoder (DPAE) is proposed
for representation extraction from monitoring data; the representation
extractor remains effective on disturbed data with signal-to-noise ratios up
to 25.0 and with up to 40.0% of the monitoring data missing. Second, a
diagnostic framework is proposed that uses the DPAE encoder to extract
representations, followed by shallow statistical learning algorithms; this
stepwise diagnostic approach is tested on disturbed datasets and achieves
41.8% and 80.8% higher classification and regression task evaluation metrics,
respectively, than end-to-end diagnostic approaches. Finally, a hierarchical
interpretation algorithm using
SHAP and feature ablation is presented to analyze the importance of the input
monitoring parameters and validate the effectiveness of the high importance
parameters. The outcomes of this study provide a reference method for
building robust and interpretable intelligent reactor anomaly diagnosis
systems in scenarios with high safety requirements.
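
The stepwise idea of this abstract, denoise-and-extract first, then fit a shallow learner on the representations, is illustrated below as a minimal sketch. It is not the paper's DPAE architecture: the data are synthetic stand-ins for monitoring channels, a small denoising autoencoder reconstructs clean signals from noisy, partially zeroed (missing) inputs, and a logistic regression plays the role of the shallow statistical learner.

```python
# Minimal stepwise sketch: denoising autoencoder for representation
# extraction, then a shallow classifier on the learned representations.
import numpy as np
import torch
from torch import nn
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
clean = rng.normal(size=(512, 32)).astype(np.float32)   # 32 toy monitoring channels
labels = (clean[:, :4].sum(axis=1) > 0).astype(int)     # toy accident class
noisy = (clean + 0.3 * rng.normal(size=clean.shape)).astype(np.float32)
noisy[rng.random(clean.shape) < 0.4] = 0.0               # 40% of readings missing

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 8))
decoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 32))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x_noisy, x_clean = torch.from_numpy(noisy), torch.from_numpy(clean)
for _ in range(200):                                     # denoising objective
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x_noisy)), x_clean)
    loss.backward()
    opt.step()

# Step 2: shallow statistical learner on the extracted representations.
with torch.no_grad():
    reps = encoder(x_noisy).numpy()
clf = LogisticRegression(max_iter=1000).fit(reps, labels)
print("diagnosis accuracy on training reps:", clf.score(reps, labels))
```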