284 research outputs found

    Scaling Transformer to 1M tokens and beyond with RMT

    This technical report presents the application of a recurrent memory to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens, while maintaining high memory retrieval accuracy. Our method allows for the storage and processing of both local and global information and enables information flow between segments of the input sequence through the use of recurrence. Our experiments demonstrate the effectiveness of our approach, which holds significant potential to enhance long-term dependency handling in natural language understanding and generation tasks, as well as to enable large-scale context processing for memory-intensive applications.
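    A minimal sketch of the segment-level recurrence described above, assuming a PyTorch encoder as the backbone; the class, parameter names, and sizes (RecurrentMemorySketch, num_mem_tokens, segment length) are illustrative and not the authors' implementation. A long input is split into segments, learnable memory vectors are prepended to each segment, and the memory read back from the output is carried to the next segment.

```python
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    """Segment-level recurrence with memory tokens prepended to each segment."""
    def __init__(self, d_model=256, n_heads=4, n_layers=2, num_mem_tokens=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable initial memory: one vector per memory token.
        self.init_memory = nn.Parameter(torch.randn(num_mem_tokens, d_model))
        self.num_mem_tokens = num_mem_tokens

    def forward(self, segments):
        # segments: list of tensors, each of shape (batch, seg_len, d_model)
        batch = segments[0].size(0)
        memory = self.init_memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Prepend the current memory to the segment and run the unmodified encoder.
            out = self.encoder(torch.cat([memory, seg], dim=1))
            # The hidden states at the memory positions become the memory for the
            # next segment; this recurrence is what links distant segments.
            memory = out[:, :self.num_mem_tokens]
            outputs.append(out[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory

# Example: a pre-embedded "long" sequence processed as eight segments of 128 tokens.
model = RecurrentMemorySketch()
long_input = torch.randn(2, 8 * 128, 256)
outputs, final_memory = model(list(long_input.split(128, dim=1)))
```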

    Recurrent Memory Transformer

    Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention makes it possible to combine information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations, and the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer). Memory allows the model to store and process local and global information and to pass information between segments of a long sequence with the help of recurrence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence; the Transformer is then trained to control both memory operations and sequence representation processing. Our experiments show that the model performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and for general-purpose memory processing, such as algorithmic tasks and reasoning.
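    The key point of the abstract is that memory is implemented purely with extra tokens, so the Transformer block itself is left unchanged. The sketch below illustrates the variant in which read memory is prepended and write memory is appended to each segment, with the outputs at the write positions carried (detached) to the next segment. A standard PyTorch encoder stands in for the backbone, causal masking is omitted for brevity, and all names and shapes are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

d_model, num_mem = 256, 10
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)   # stands in for any Transformer stack

def process_segment(segment, memory):
    """segment: (batch, seg_len, d_model); memory: (batch, num_mem, d_model)."""
    # Layout: [read memory | segment tokens | write memory].  The write slots start
    # as a copy of the current memory and are rewritten by attention over the segment.
    x = torch.cat([memory, segment, memory], dim=1)
    y = backbone(x)
    new_memory = y[:, -num_mem:]          # outputs at the write-memory positions
    segment_out = y[:, num_mem:-num_mem]  # outputs at the ordinary token positions
    return segment_out, new_memory

# Carrying memory across the segments of one long sequence (backpropagation through
# the recurrence is truncated at segment boundaries via detach()).
batch, seg_len = 2, 128
memory = torch.zeros(batch, num_mem, d_model)
for seg in torch.randn(batch, 4 * seg_len, d_model).split(seg_len, dim=1):
    seg_out, memory = process_segment(seg, memory.detach())
```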

    Limited Sanity in the Legislation of Russia and Europe

    This article presents the author’s analysis of the problem of limited sanity in the criminal law theory and practice of Russia and Europe. The author establishes that, despite its long history, the problem of limited sanity has not yet been adequately developed in many countries, and that the boundaries of the concept remain extremely vague and indefinite. Nevertheless, the experience of some foreign countries in providing security measures can be applied in the Russian Federation.

    International Standards of Penitentiary Activity: Impact on the Development of the Penitentiary Activity of the CIS Countries

    Overcrowding in prisons is a common problem that affects many countries, and the term is difficult to define because there is no single internationally accepted standard. This article presents a comparative study of international standards for the activities of doctors in penitentiary institutions as an integral part of international standards of penitentiary activity. The authors investigated how, and to what degree, these standards affect the penitentiary legislation of the Russian Federation and other CIS countries. The conclusion is drawn that such standards play a positive role in improving national penitentiary legislation and in raising the level of medical care for prisoners.