Scaling Transformer to 1M tokens and beyond with RMT
This technical report presents the application of a recurrent memory to
extend the context length of BERT, one of the most effective Transformer-based
models in natural language processing. By leveraging the Recurrent Memory
Transformer architecture, we have successfully increased the model's effective
context length to an unprecedented two million tokens, while maintaining high
memory retrieval accuracy. Our method allows for the storage and processing of
both local and global information and enables information flow between segments
of the input sequence through the use of recurrence. Our experiments
demonstrate the effectiveness of our approach, which holds significant
potential to enhance long-term dependency handling in natural language
understanding and generation tasks as well as enable large-scale context
processing for memory-intensive applications.
Recurrent Memory Transformer
Transformer-based models are effective across multiple domains and tasks. Self-attention
allows information from all sequence elements to be combined into context-aware
representations. However, global and local information has to be stored largely in the
same element-wise representations. Moreover, the length of an input sequence is limited
by the quadratic computational complexity of self-attention.
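As a rough illustration of this limitation, the snippet below compares the number of attention pairs for full self-attention over a two-million-token input with the per-segment cost of segment-level recurrence. The segment length of 512 reflects a typical BERT-sized segment, while the memory size of 10 tokens is an illustrative assumption, not a value taken from the abstracts.

```python
# Back-of-envelope comparison: full self-attention over a long sequence
# versus segment-level recurrence with a small memory.
import math

total_len = 2_000_000   # target context length mentioned in the report above
segment_len = 512       # BERT-style segment size (typical value)
memory_len = 10         # number of memory tokens per segment (assumption)

# Full self-attention: every token attends to every other token.
full_attention_pairs = total_len ** 2

# Segment-level recurrence: each segment attends only within
# (memory + segment) tokens, and segments are processed sequentially.
num_segments = math.ceil(total_len / segment_len)
per_segment_pairs = (segment_len + memory_len) ** 2
recurrent_pairs = num_segments * per_segment_pairs

print(f"segments processed:        {num_segments:,}")
print(f"full attention pairs:      {full_attention_pairs:,}")
print(f"recurrent attention pairs: {recurrent_pairs:,}")
print(f"reduction factor:          {full_attention_pairs / recurrent_pairs:,.0f}x")
```

Because each segment attends only within a fixed window, the cost grows linearly with the number of segments rather than quadratically with the total sequence length.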
In this work, we propose and study a memory-augmented, segment-level recurrent
Transformer (the Recurrent Memory Transformer). Memory allows the model to store and
process local and global information, and to pass information between segments of a long
sequence with the help of recurrence. We implement the memory mechanism with no changes
to the Transformer model by adding special memory tokens to the input or output sequence;
the Transformer is then trained to control both memory operations and the processing of
sequence representations.
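A minimal sketch of this mechanism is shown below, assuming a standard PyTorch nn.TransformerEncoder as the unmodified backbone. The class name RMTSketch, the hyperparameters, and the convention of reading the updated memory from the first output positions are illustrative assumptions rather than the authors' exact implementation; positional encodings and training details are omitted for brevity.

```python
import torch
import torch.nn as nn

class RMTSketch(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_mem=10, seg_len=64):
        super().__init__()
        self.seg_len = seg_len
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable initial memory tokens; the Transformer itself is unchanged.
        self.mem_init = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):
        # tokens: (batch, total_len), processed segment by segment.
        batch, n_mem = tokens.size(0), self.mem_init.size(0)
        memory = self.mem_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for start in range(0, tokens.size(1), self.seg_len):
            segment = self.embed(tokens[:, start:start + self.seg_len])
            # Memory tokens are prepended to the segment input (no model changes).
            hidden = self.encoder(torch.cat([memory, segment], dim=1))
            # The updated memory is read from the same positions and passed,
            # via recurrence, to the next segment.
            memory = hidden[:, :n_mem]
            outputs.append(hidden[:, n_mem:])
        return torch.cat(outputs, dim=1), memory

# Toy usage: a 256-token batch processed as four 64-token segments.
model = RMTSketch()
states, final_memory = model(torch.randint(0, 1000, (2, 256)))
print(states.shape, final_memory.shape)  # (2, 256, 128) and (2, 10, 128)
```

Because the memory is just extra tokens in the input, a pretrained encoder such as BERT can be reused without architectural changes, which is the property both abstracts above rely on.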
Experimental results show that our model performs on par with Transformer-XL on language
modeling for smaller memory sizes and outperforms it on tasks that require processing of
longer sequences. We also show that adding memory tokens to Transformer-XL improves its
performance. This makes the Recurrent Memory Transformer a promising architecture for
applications that require learning long-term dependencies and for general-purpose memory
processing, such as algorithmic tasks and reasoning.
Limited Sanity in the Legislation of Russia and Europe
This article presents the author’s analysis of the problem of limited sanity in the criminal law theory and practice of Russia and Europe. The author establishes that the problem of limited sanity, despite its long history, remains underdeveloped in many countries, and that the boundaries of the concept of limited sanity are extremely vague and indefinite. Nevertheless, the experience of some foreign countries in applying security measures could be used in the Russian Federation.
International Standards of Penitentiary Activity: Impact on the Development of the Penitentiary Activity of the CIS Countries
Overcrowding in prisons is a common problem that affects many countries, and it is difficult to define because there is no single internationally accepted standard. This article presents a comparative study of international standards for the work of doctors in penitentiary institutions as an integral part of international standards of penitentiary activity. The authors examine how, and to what degree, these standards affect the penitentiary legislation of the Russian Federation and other CIS countries. They conclude that such standards play a positive role in improving national penitentiary legislation and in raising the level of medical care for prisoners.