16 research outputs found

    Hierarchical Attention Encoder Decoder

    Recent advances in large language models have shown that autoregressive modeling can generate complex and novel sequences that have many real-world applications. However, these models must generate outputs autoregressively, which becomes time-consuming when dealing with long sequences. Hierarchical autoregressive approaches that compress data have been proposed as a solution, but these methods still generate outputs at the original data frequency, resulting in slow and memory-intensive models. In this paper, we propose a model based on the Hierarchical Recurrent Encoder Decoder (HRED) architecture. This model independently encodes input sub-sequences without global context, processes these sequences using a lower-frequency model, and decodes outputs at the original data frequency. By interpreting the encoder as an implicitly defined embedding matrix and using sampled softmax estimation, we develop a training algorithm that can train the entire model without a high-frequency decoder, which is the most memory- and compute-intensive part of hierarchical approaches. In a final, brief phase, we train the decoder to generate data at the original granularity. Our algorithm significantly reduces the memory requirements for training autoregressive models and also improves the total training wall-clock time.
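
    The abstract above relies on sampled softmax estimation. As a minimal sketch of that general idea only, and not of the paper's training algorithm, the following Python compares an exact softmax cross-entropy with an estimate computed over the target plus a few sampled negative classes. The function names, the uniform proposal, and the omission of any proposal-probability correction are assumptions made for illustration.

```python
import math
import random


def full_softmax_loss(logits, target):
    """Exact cross-entropy: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def sampled_softmax_loss(logits, target, num_negatives, rng):
    """Cross-entropy over the target plus uniformly sampled negatives.

    Assumption for this sketch: negatives are drawn uniformly without
    replacement from the non-target classes, and no correction for the
    proposal distribution is applied.
    """
    others = [i for i in range(len(logits)) if i != target]
    negatives = rng.sample(others, num_negatives)
    # Put the target first so it has index 0 among the candidates.
    candidates = [logits[target]] + [logits[i] for i in negatives]
    return full_softmax_loss(candidates, 0)
```

    Because the normalizer in this sketch sums over a subset of the classes, the estimate never exceeds the exact loss, and it coincides with the exact loss when every non-target class is sampled. The saving comes from evaluating only a handful of logits per step instead of all of them.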

    About tree-depth

    In this work I present recent scientific papers related to the concept of tree-depth: different characterizations, a game-theoretic approach to it, and recently discovered applications. The focus of this work is on presenting all the ideas in a self-contained way, so that they can be easily understood with little previous knowledge. In addition, all the ideas are presented in a homogeneous way with clear examples, and all the lemmas, some of which did not have proofs in the papers, are presented with rigorous proofs.

    Assessment of faculty training program for the use of e-learning platforms: analysis of current use

    5th International Conference on Education and New Learning Technologies (Barcelona, Spain, 1-3 July 2013). This paper shows the results of a survey carried out among faculty members who use Moodle to support their classroom teaching at the University of the Basque Country UPV/EHU. The aim of this work is to determine: a) how Moodle is being used; b) whether the training program helped them improve their teaching through the use of Moodle; c) what the further training needs are. The results showed that the use of the learning platform was mainly oriented to presenting materials or resources, followed by the attempt to improve communication with the students and to monitor and grade assignments. We also found that training faculty members only on courses for a didactic use of Moodle was most effective in the use of collaborative learning strategies. The more trained teachers demand further training in the didactic use of Moodle.

    The linear hidden subset problem for the (1+1) EA with scheduled and adaptive mutation rates

    We study unbiased $(1+1)$ evolutionary algorithms on linear functions with an unknown number $n$ of bits with non-zero weight. Static algorithms achieve an optimal runtime of $O(n (\ln n)^{2+\epsilon})$; however, it remained unclear whether more dynamic parameter policies could yield better runtime guarantees. We consider two setups: one where the mutation rate follows a fixed schedule, and one where it may be adapted depending on the history of the run. For the first setup, we give a schedule that achieves a runtime of $(1\pm o(1))\beta n \ln n$, where $\beta \approx 3.552$, which is an asymptotic improvement over the runtime of the static setup. Moreover, we show that no schedule admits a better runtime guarantee and that the optimal schedule is essentially unique. For the second setup, we show that the runtime can be further improved to $(1\pm o(1)) e n \ln n$, which matches the performance of algorithms that know $n$ in advance. Finally, we study the related model of initial segment uncertainty with static position-dependent mutation rates, and derive asymptotically optimal lower bounds. This answers a question by Doerr, Doerr, and Kötzing.
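
    For readers unfamiliar with the setting, the following is a minimal sketch of a $(1+1)$ evolutionary algorithm with a static mutation rate on a linear function in which only some positions carry non-zero weight. It does not implement the scheduled or adaptive policies analysed in the paper. The function name, the non-negative integer weights, and the choice of mutation rate are assumptions made for illustration.

```python
import random


def one_plus_one_ea(weights, mutation_rate, max_iters, rng):
    """Maximize f(x) = sum(w_i * x_i) over bit strings with a (1+1) EA.

    Assumes non-negative integer weights. Returns (x, evaluations), where
    evaluations is the number of offspring evaluated before the loop
    stopped (at the optimum, or after max_iters offspring).
    """
    optimum = sum(weights)

    def fitness(bits):
        return sum(w for w, b in zip(weights, bits) if b)

    # Start from a uniformly random bit string.
    x = [rng.randint(0, 1) for _ in range(len(weights))]
    fx = fitness(x)
    evaluations = 0
    while fx < optimum and evaluations < max_iters:
        # Standard bit mutation: flip each bit independently.
        y = [b ^ 1 if rng.random() < mutation_rate else b for b in x]
        fy = fitness(y)
        evaluations += 1
        if fy >= fx:  # elitist selection; ties are accepted
            x, fx = y, fy
    return x, evaluations
```

    Calling `one_plus_one_ea(weights, 1 / n, max_iters, random.Random(0))` with `n` equal to the number of non-zero weights corresponds to the case where $n$ is known in advance. When $n$ is unknown, that rate cannot be chosen directly, which is the difficulty the abstract addresses.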