Language Modeling with Deep Transformers
We explore deep autoregressive Transformer models in language modeling for
speech recognition. We focus on two aspects. First, we revisit Transformer
model configurations specifically for language modeling. We show that
well-configured Transformer models outperform our baseline models based on a
shallow stack of LSTM recurrent neural network layers. We carry out experiments
on the open-source LibriSpeech 960hr task, for both 200K vocabulary word-level
and 10K byte-pair encoding subword-level language modeling. We apply our
word-level models to conventional hybrid speech recognition by lattice
rescoring, and the subword-level models to attention-based encoder-decoder
models by shallow fusion. Second, we show that deep Transformer language models
do not require positional encoding. Positional encoding is an essential
augmentation for the self-attention mechanism, which is otherwise invariant to
sequence ordering. However, in an autoregressive setup, as is the case in
language modeling, the amount of information increases along the position
dimension, which is itself a positional signal. An analysis of attention weights
shows that deep autoregressive self-attention models can automatically make use
of such positional information. We find that removing the positional encoding
even slightly improves the performance of these models.
Comment: To appear in the proceedings of INTERSPEECH 2019
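As an illustration of the claim above, here is a minimal, self-contained
PyTorch sketch (not the authors' code; class name, layer sizes, and defaults
are invented for illustration) of a decoder-only Transformer language model
with no positional encoding at all. The causal mask alone breaks the
permutation invariance of self-attention, since position i can attend to
exactly i+1 tokens:

    import torch
    import torch.nn as nn

    class NoPosEncTransformerLM(nn.Module):
        """Autoregressive Transformer LM using token embeddings only."""
        def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)  # no positional term
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, tokens):                  # tokens: (batch, seq_len)
            seq_len = tokens.size(1)
            # Causal mask: position i may only attend to positions 0..i.
            mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                              diagonal=1)
            h = self.encoder(self.embed(tokens), mask=mask)
            return self.out(h)                      # next-token logits

The context window that grows with each position is exactly the positional
signal the abstract refers to; nothing else in this model distinguishes one
position from another.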
An Experiment in Ping-Pong Protocol Verification by Nondeterministic Pushdown Automata
An experiment is described that confirms that the security of a well-studied
class of cryptographic protocols (Dolev-Yao intruder model) can be verified by
two-way nondeterministic pushdown automata (2NPDA). A nondeterministic pushdown
program checks whether the intersection of a regular language (the protocol to
verify) and a given Dyck language containing all canceling words is empty. If
it is not, an intruder can reveal secret messages sent between trusted users.
The verification is guaranteed to terminate in at most cubic time on a
2NPDA simulator. The interpretive approach used in this experiment simplifies
the verification by separating the nondeterministic pushdown logic from program
control, and makes it more predictable. We describe the interpretive approach
and the known transformational solutions, and show they share interesting
features. Also noteworthy is how abstract results from automata theory can
solve practical problems by programming-language means.
Comment: In Proceedings MARS/VPT 2018, arXiv:1803.0866
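The heart of the construction is the Dyck language of canceling words: a
protocol trace reveals a secret if its sequence of operators (e.g. an
encryption and the matching decryption) reduces to the empty word. The
following Python sketch illustrates only that cancellation check with an
explicit stack; it is a deliberately simplified, deterministic one-way
stand-in, not the paper's two-way nondeterministic 2NPDA simulator, and the
two-pair alphabet is an assumption made here for illustration:

    # Closer -> matching opener; stands in for operator/inverse pairs
    # such as encrypt/decrypt in a ping-pong protocol trace.
    PAIRS = {")": "(", "]": "["}
    OPENERS = set(PAIRS.values())

    def cancels(word):
        """True iff the word reduces to the empty word (Dyck membership)."""
        stack = []
        for sym in word:
            if sym in OPENERS:
                stack.append(sym)                 # push an opener
            elif sym in PAIRS:
                if not stack or stack.pop() != PAIRS[sym]:
                    return False                  # closer with no matching opener
            else:
                return False                      # symbol outside the alphabet
        return not stack                          # fully canceled

    assert cancels("([])") and not cancels("([)]")

In the verification itself, the 2NPDA checks whether the regular language of
intruder-constructible traces intersected with this Dyck language is empty; a
non-empty intersection exhibits an attack.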
A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation
Due to the development of pre-trained language models, automated code
generation techniques have shown great promise in recent years. However, the
generated code often fails to meet the syntactic constraints of the target
language, especially in the case of Turducken-style code, where declarative
code snippets are embedded within imperative programs. In this study, we
distill this lack of syntactic-constraint satisfaction into three significant challenges:
(1) the efficient representation of syntactic constraints, (2) the effective
integration of syntactic information, and (3) the scalable syntax-first
decoding algorithm. To address these challenges, we propose TurduckenGen, a
syntax-guided multi-task learning approach. Specifically, we first explicitly
append the type information to the code tokens to capture the representation of
syntactic constraints. Then we formalize code generation with syntactic
constraint representation as an auxiliary task to enable the model to learn the
syntactic constraints of the code. Finally, syntactically correct code is
accurately selected from multiple candidates with the help of compiler
feedback. Extensive experiments and comprehensive analysis demonstrate the
effectiveness and general applicability of our approach compared
with six state-of-the-art baselines on two Turducken-style code datasets.
We also conducted a human study and found that the code generated by our
approach is better than that of the baselines in terms of code readability and semantic
similarity.
Comment: Accepted in Empirical Software Engineering
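As a toy illustration of the first step, appending type information to code
tokens, consider the following Python sketch. The function name, tag set, and
tokenization are hypothetical, invented here for illustration, and do not
reflect TurduckenGen's actual preprocessing:

    def tag_tokens(tokens, token_types):
        """Interleave each code token with its syntactic type tag."""
        assert len(tokens) == len(token_types)
        tagged = []
        for tok, typ in zip(tokens, token_types):
            tagged.append(f"<{typ}>")    # e.g. <ID>, <OP>, <SQL_LIT>
            tagged.append(tok)
        return tagged

    # Imperative host code embedding a declarative (SQL) snippet:
    tokens      = ["cursor", ".", "execute", "(", '"SELECT * FROM t"', ")"]
    token_types = ["ID",     "OP", "ID",     "OP", "SQL_LIT",          "OP"]
    print(tag_tokens(tokens, token_types))

Pairing each token with an explicit type tag exposes the syntactic constraint
to the model alongside the surface form.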