
    Language Modeling with Deep Transformers

    We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well-configured Transformer models outperform our baseline models based on a shallow stack of LSTM recurrent neural network layers. We carry out experiments on the open-source LibriSpeech 960hr task, for both 200K vocabulary word-level and 10K byte-pair encoding subword-level language modeling. We apply our word-level models to conventional hybrid speech recognition by lattice rescoring, and the subword-level models to attention-based encoder-decoder models by shallow fusion. Second, we show that deep Transformer language models do not require positional encoding. Positional encoding is an essential augmentation for the self-attention mechanism, which is invariant to sequence ordering. However, in the autoregressive setup, as is the case for language modeling, the amount of information increases along the position dimension, which is itself a positional signal. The analysis of attention weights shows that deep autoregressive self-attention models can automatically make use of such positional information. We find that removing the positional encoding even slightly improves the performance of these models.
    Comment: To appear in the proceedings of INTERSPEECH 201
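    The central observation amounts to a very small change to a standard decoder-only setup: omit the positional encoding and rely on the causal mask alone. Below is a minimal PyTorch sketch of such a model; the class name, hyperparameters and layer choices are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: a decoder-only Transformer LM with NO positional encoding.
# The causal mask restricts each position to its prefix, which (as the paper
# argues) already carries a positional signal. All names and sizes here are
# illustrative, not the authors' setup.
import torch
import torch.nn as nn

class CausalTransformerLM(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6, d_ff=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # token embeddings only
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal (autoregressive) mask: position i attends only to positions <= i.
        seq_len = tokens.size(1)
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1)
        h = self.embed(tokens)           # note: no positional encoding is added
        h = self.blocks(h, mask=mask)
        return self.out(h)               # next-token logits

# Toy usage: next-token logits for a random batch (batch=2, length=32).
model = CausalTransformerLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (2, 32)))
```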

    An Experiment in Ping-Pong Protocol Verification by Nondeterministic Pushdown Automata

    An experiment is described which confirms that the security of a well-studied class of cryptographic protocols (the Dolev-Yao intruder model) can be verified by two-way nondeterministic pushdown automata (2NPDA). A nondeterministic pushdown program checks whether the intersection of a regular language (the protocol to verify) and a given Dyck language containing all canceling words is empty. If it is not, an intruder can reveal secret messages sent between trusted users. The verification is guaranteed to terminate in at most cubic time on a 2NPDA simulator. The interpretive approach used in this experiment simplifies the verification by separating the nondeterministic pushdown logic from the program control, and makes it more predictable. We describe the interpretive approach and the known transformational solutions, and show that they share interesting features. Also noteworthy is how abstract results from automata theory can solve practical problems by programming-language means.
    Comment: In Proceedings MARS/VPT 2018, arXiv:1803.0866
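    The emptiness test at the heart of the verification can be illustrated directly: given a finite automaton for the protocol language over operator symbols that come in canceling pairs, the protocol is insecure exactly when some accepted word cancels to the empty word. The saturation sketch below (plain Python, not the paper's 2NPDA simulator or interpretive setup; the function name and the toy protocol are invented for illustration) computes that test by a fixed-point iteration.

```python
# Hedged sketch of the underlying emptiness test: does the protocol automaton
# accept any word that cancels to the empty word under the Dyck cancellation rules?
from itertools import product

def has_canceling_word(states, transitions, start, finals, cancels):
    """Return True iff the automaton accepts some word that cancels to empty.

    states:      iterable of automaton states
    transitions: set of (p, symbol, q) triples
    start:       initial state
    finals:      set of accepting states
    cancels:     set of (a, a_bar) pairs, meaning the word a·a_bar cancels to empty
    """
    # E holds pairs (p, q) such that some word from p to q cancels to the empty word.
    E = {(p, p) for p in states}
    changed = True
    while changed:
        changed = False
        # Concatenation: empty · empty = empty.
        for (p, q), (q2, r) in product(list(E), list(E)):
            if q == q2 and (p, r) not in E:
                E.add((p, r))
                changed = True
        # Wrapping: a · empty · a_bar cancels to empty.
        for (p, a, p2) in transitions:
            for (q2, b, q) in transitions:
                if (a, b) in cancels and (p2, q2) in E and (p, q) not in E:
                    E.add((p, q))
                    changed = True
    return any((start, f) in E for f in finals)

# Toy "protocol": encrypt with A's key, then apply the matching decrypt.
# The two operators cancel, so the accepted word reduces to the empty word:
# an intruder can recover the plaintext, i.e. the protocol is insecure.
states = {0, 1, 2}
transitions = {(0, "Ea", 1), (1, "Da", 2)}
print(has_canceling_word(states, transitions, 0, {2}, {("Ea", "Da")}))  # -> True
```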

    A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation

    Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code often fails to meet the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize the lack of syntactic constraints into three significant challenges: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) the scalable syntax-first decoding algorithm. To address these challenges, we propose a syntax-guided multi-task learning approach, TurduckenGen. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. Then we formalize code generation with syntactic constraint representation as an auxiliary task to enable the model to learn the syntactic constraints of the code. Finally, syntactically correct code is selected accurately from the multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis demonstrate the effectiveness and general applicability of our approach in comparison with six state-of-the-art baselines on two Turducken-style code datasets. In addition, we conducted a human study and found that the code generated by our approach is better than the baselines' in terms of code readability and semantic similarity.
    Comment: Accepted in Empirical Software Engineerin
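    The compiler-feedback selection step lends itself to a small illustration: rank the candidates, then keep the first one whose imperative host code and embedded declarative snippet both pass a syntax check. The Python sketch below is a loose analogue of that step only; it is not TurduckenGen itself, and the SQL-in-Python example, the toy schema and the candidate list are invented for illustration.

```python
# Hedged sketch of candidate selection via compiler/parser feedback for
# Turducken-style code (imperative Python hosting declarative SQL).
import ast
import sqlite3

def host_code_is_valid(snippet: str) -> bool:
    """Check the imperative (Python) host code with the Python parser."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

def embedded_sql_is_valid(sql: str) -> bool:
    """Check the declarative (SQL) snippet by asking SQLite to prepare it."""
    try:
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE users (id INTEGER, name TEXT)")  # toy schema
        con.execute("EXPLAIN " + sql)  # prepares the statement without running it
        return True
    except sqlite3.Error:
        return False

def select_candidate(candidates):
    """Return the first candidate whose host code and embedded SQL both pass."""
    for code, sql in candidates:
        if host_code_is_valid(code) and embedded_sql_is_valid(sql):
            return code
    return None

# Toy ranked candidates from a (hypothetical) generator: (host code, embedded SQL).
candidates = [
    ("cur.execute('SELEC id FROM users')", "SELEC id FROM users"),    # bad SQL
    ("cur.execute('SELECT id FROM users'", "SELECT id FROM users"),   # bad Python
    ("cur.execute('SELECT id FROM users')", "SELECT id FROM users"),  # both valid
]
print(select_candidate(candidates))  # prints the third, syntactically valid candidate
```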