Augmenting transformers with recursively composed multi-grained representations
We present ReCAT, a recursive composition augmented Transformer that is able
to explicitly model hierarchical syntactic structures of raw texts without
relying on gold trees during both learning and inference. Existing research
along this line restricts data to follow a hierarchical tree structure and thus
lacks inter-span communication. To overcome this problem, we propose a novel
contextual inside-outside (CIO) layer that learns contextualized
representations of spans through bottom-up and top-down passes, where a
bottom-up pass forms representations of high-level spans by composing low-level
spans, while a top-down pass combines information inside and outside a span. By
stacking several CIO layers between the embedding layer and the attention
layers in Transformer, the ReCAT model can perform both deep intra-span and
deep inter-span interactions, and thus generate multi-grained representations
fully contextualized with other spans. Moreover, the CIO layers can be jointly
pre-trained with Transformers, allowing ReCAT to enjoy scalability, strong
performance, and interpretability at the same time. We conduct experiments on
various sentence-level and span-level tasks. Evaluation results indicate that
ReCAT can significantly outperform vanilla Transformer models on all span-level
tasks and baselines that combine recursive networks with Transformers on
natural language inference tasks. More interestingly, the hierarchical
structures induced by ReCAT exhibit strong consistency with human-annotated
syntactic trees, indicating good interpretability brought by the CIO layers.
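The bottom-up and top-down span passes described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's architecture: the `compose` function and mean pooling over split points stand in for the learned networks, and all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                               # hypothetical span-representation width
tokens = rng.normal(size=(4, d))    # embeddings for a 4-token sentence
n = len(tokens)

def compose(left, right):
    # Hypothetical composition function; the paper learns this with a network.
    return np.tanh(left + right)

# Bottom-up (inside) pass: a high-level span's representation pools the
# compositions of all its binary splits into lower-level spans.
inside = {}
for i in range(n):
    inside[(i, i)] = tokens[i]
for width in range(1, n):
    for i in range(n - width):
        j = i + width
        splits = [compose(inside[(i, k)], inside[(k + 1, j)]) for k in range(i, j)]
        inside[(i, j)] = np.mean(splits, axis=0)

# Top-down (outside) pass: a span combines its parent's outside representation
# (information outside the span) with the inside representation of its sibling.
outside = {(0, n - 1): np.zeros(d)}
for width in range(n - 2, -1, -1):
    for i in range(n - width):
        j = i + width
        contribs = []
        for a in range(0, i):         # parent (a, j), left sibling (a, i-1)
            contribs.append(compose(outside[(a, j)], inside[(a, i - 1)]))
        for b in range(j + 1, n):     # parent (i, b), right sibling (j+1, b)
            contribs.append(compose(outside[(i, b)], inside[(j + 1, b)]))
        outside[(i, j)] = np.mean(contribs, axis=0)

# Each span now carries both content (inside) and context (outside),
# yielding multi-grained representations contextualized by other spans.
span_repr = {s: np.concatenate([inside[s], outside[s]]) for s in inside}
print(span_repr[(1, 2)].shape)   # (16,)
```

Stacking such layers between the embedding and attention layers is what lets every span representation interact with spans outside itself, rather than only with its descendants.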
Fast incremental learning of stochastic context-free grammars in radar electronic support
Radar Electronic Support (ES) involves the passive search for, interception, location, analysis and identification of radiated electromagnetic energy for military purposes. Although Stochastic Context-Free Grammars (SCFGs) appear promising for the recognition of radar emitters, and for the estimation of their respective level of threat in radar ES systems, the well-known techniques for learning their production rule probabilities are computationally demanding. The most popular methods for this task are the Inside-Outside (IO) algorithm, which maximizes the likelihood of a data set, and the Viterbi Score (VS) algorithm, which maximizes the likelihood of its best parse trees. For each iteration, their time complexity is cubic in the length of the sequences in the training set and in the number of non-terminal symbols in the grammar. Since radar ES applications require timely protection against threats, fast techniques for learning SCFG probabilities are needed. Moreover, in radar ES applications, new information from a battlefield or other sources often becomes available at different points in time. In order to rapidly reflect changes in operational environments, fast incremental learning of SCFG probabilities is therefore an undisputed asset.
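The inside pass that dominates this cubic cost can be sketched for a toy SCFG in Chomsky normal form. The grammar and sentence below are illustrative assumptions, not a radar emitter grammar; the triple loop over span width, start position, and split point is where the O(n³) factor comes from.

```python
from collections import defaultdict

binary = {                      # P(A -> B C), toy values for illustration
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
lexical = {                     # P(A -> w)
    ("NP", "they"): 0.6, ("NP", "fish"): 0.4,
    ("V", "catch"): 1.0,
}
sentence = ["they", "catch", "fish"]
n = len(sentence)

# inside[(i, j, A)] = P(A derives words i..j); the three nested loops over
# width, start, and split point give the cubic per-iteration cost.
inside = defaultdict(float)
for i, w in enumerate(sentence):
    for (A, word), p in lexical.items():
        if word == w:
            inside[(i, i, A)] += p
for width in range(1, n):
    for i in range(n - width):
        j = i + width
        for (A, (B, C)), p in binary.items():
            for k in range(i, j):
                inside[(i, j, A)] += p * inside[(i, k, B)] * inside[(k + 1, j, C)]

likelihood = inside[(0, n - 1, "S")]
print(likelihood)   # 0.6 * 1.0 * 1.0 * 0.4 = 0.24
```

An EM iteration of the IO algorithm pairs this with an analogous outside pass to compute expected rule counts, which is why every iteration over every training sequence is expensive.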
Several techniques have been developed to accelerate the computation of the production rule probabilities of SCFGs. In the first part of this thesis, three fast alternatives, called graphical EM (gEM), Tree Scanning (TS) and HOLA, are compared from several perspectives: perplexity, state estimation, ability to detect multi-function radars (MFRs), time and memory complexity, and convergence time. Estimation of the average-case and worst-case execution time and storage requirements allows for the assessment of complexity, while computer simulations, performed using radar pulse data, facilitate the assessment of the other performance measures. An experimental protocol has been defined such that the impact on performance of factors like training set size and level of ambiguity of grammars may be observed. In addition, since VS is known to have a lower overall computational cost in practice, VS versions of the original IO-based gEM and TS have also been proposed and compared. Results indicate that both gEM(IO) and TS(IO) provide the same level of accuracy, yet their resource requirements vary mostly as a function of the ambiguity of the grammars. Furthermore, for a similar quality of results, the gEM(VS) and TS(VS) techniques provide significantly lower convergence times and per-iteration time complexities in practice than do gEM(IO) and TS(IO). All of these algorithms may provide a greater level of accuracy than HOLA, yet their computational complexity may be orders of magnitude higher. Finally, HOLA is an on-line technique that naturally allows for incremental learning of production rule probabilities.
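The IO-versus-VS distinction underlying these variants reduces to a single operation in the chart: the inside pass sums over split points (likelihood of all parses), while the Viterbi pass takes the maximum (likelihood of the best parse only). A sketch on a toy CNF grammar, with illustrative rule probabilities:

```python
from collections import defaultdict

binary = {("S", ("A", "A")): 0.5, ("A", ("A", "A")): 0.2}
lexical = {("A", "x"): 0.8}
sentence = ["x", "x", "x"]
n = len(sentence)

def chart(combine):
    # Same chart recursion; only the combination over splits differs.
    c = defaultdict(float)
    for i, w in enumerate(sentence):
        for (A, word), p in lexical.items():
            if word == w:
                c[(i, i, A)] = p
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            for (A, (B, C)), p in binary.items():
                scores = [p * c[(i, k, B)] * c[(k + 1, j, C)] for k in range(i, j)]
                c[(i, j, A)] = combine([c[(i, j, A)]] + scores)
    return c[(0, n - 1, "S")]

total = chart(sum)    # IO-style: probability mass of all parses
best = chart(max)     # VS-style: probability of the single best parse
print(total, best)    # 0.1024 0.0512
```

Because the VS variant tracks only one parse per span, its per-iteration bookkeeping is lighter in practice, which is the motivation for the gEM(VS) and TS(VS) versions above.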
In the second part of this thesis, two new incremental versions of gEM, called Incremental gEM (igEM) and On-Line Incremental gEM (oigEM), are proposed and compared to HOLA. They allow an SCFG to efficiently learn new training sequences incrementally, without retraining from scratch on all training data. An experimental protocol has been defined such that the impact on performance of factors like the size of new data blocks for incremental learning, and the level of ambiguity of MFR grammars, may be observed. Results indicate that, unlike HOLA, incremental learning of training data blocks with igEM and oigEM provides the same level of accuracy as learning from all cumulative data from scratch, even for relatively small data blocks. As expected, incremental learning significantly reduces the overall time and memory complexities associated with updating SCFG probabilities. Finally, it appears that while the computational complexity and memory requirements of igEM and oigEM may be greater than those of HOLA, they both provide a higher level of accuracy.
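The bookkeeping that makes such incremental learning possible can be illustrated generically. This is not the thesis's igEM or oigEM, just a sketch of the standard incremental-EM idea they build on: accumulate expected rule counts as each new data block is processed, then renormalize, instead of re-running EM over all past sequences. The rules and counts below are hypothetical.

```python
from collections import defaultdict

counts = defaultdict(float)          # accumulated expected counts per rule

def update(block_counts):
    # block_counts: expected rule-usage counts from one E-step on a new block
    for rule, c in block_counts.items():
        counts[rule] += c
    # M-step: renormalize counts within each left-hand-side non-terminal
    totals = defaultdict(float)
    for (lhs, rhs), c in counts.items():
        totals[lhs] += c
    return {(lhs, rhs): c / totals[lhs] for (lhs, rhs), c in counts.items()}

probs = update({("A", ("x",)): 3.0, ("A", ("y",)): 1.0})   # initial block
probs = update({("A", ("y",)): 4.0})                       # later block
print(probs[("A", ("y",))])   # (1 + 4) / (3 + 1 + 4) = 0.625
```

Only the new block requires an E-step, so the cost of an update scales with the block size rather than with the full cumulative training set, matching the time and memory reductions reported above.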