Hierarchical Attention Encoder Decoder

Author: Mujika Asier
Publication date: 01/06/2023

Recent advances in large language models have shown that autoregressive modeling can generate complex and novel sequences that have many real-world applications. However, these models must generate outputs autoregressively, which becomes time-consuming when dealing with long sequences. Hierarchical autoregressive approaches that compress data have been proposed as a solution, but these methods still generate outputs at the original data frequency, resulting in slow and memory-intensive models. In this paper, we propose a model based on the Hierarchical Recurrent Encoder Decoder (HRED) architecture. This model independently encodes input sub-sequences without global context, processes these sequences using a lower-frequency model, and decodes outputs at the original data frequency. By interpreting the encoder as an implicitly defined embedding matrix and using sampled softmax estimation, we develop a training algorithm that can train the entire model without a high-frequency decoder, which is the most memory- and compute-intensive part of hierarchical approaches. In a final, brief phase, we train the decoder to generate data at the original granularity. Our algorithm significantly reduces memory requirements for training autoregressive models and also improves the total training wall-clock time.
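The "sampled softmax estimation" mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's training algorithm: the function name `sampled_softmax_loss` is hypothetical, embeddings are plain Python lists rather than encoder outputs, and the sketch assumes negatives are drawn uniformly, so it omits any correction for the sampling distribution.

```python
import math

def sampled_softmax_loss(context, target_emb, negative_embs):
    """Cross-entropy of the target against a small sampled set of negatives.

    Illustrative assumption: `context` is the low-frequency model's output and
    each embedding would, in the paper's setting, come from the encoder.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Logit 0 belongs to the true target; the rest belong to sampled negatives.
    logits = [dot(context, target_emb)]
    logits += [dot(context, emb) for emb in negative_embs]

    # Numerically stable log-sum-exp over the sampled candidates only.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

The normalizer is computed over the target plus a handful of negatives instead of every possible sub-sequence, which is what keeps the cost independent of the size of the implicit embedding matrix.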

arXiv.org e-Print Archive

About tree-depth

Author: Mujika Aramendia Asier
Publication date: 17/07/2015

In this work I present recent scientific papers related to the concept of tree-depth: different characterizations, a game-theoretic approach to it, and recently discovered applications. The focus of this work is on presenting all the ideas in a self-contained way, so that they can be easily understood with little previous knowledge. In addition, all the ideas are presented in a homogeneous way with clear examples, and all the lemmas, some of which didn't have proofs in the papers, are presented with rigorous proofs.
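For orientation, one standard recursive characterization of tree-depth is: a single vertex has tree-depth 1, a disconnected graph has the maximum tree-depth of its components, and a connected graph has tree-depth 1 plus the minimum over all vertices v of the tree-depth of the graph with v removed. The sketch below implements this definition directly. It is not taken from the thesis, the function name `tree_depth` is chosen here for illustration, and the running time is exponential, so it is only suitable for very small graphs.

```python
from functools import lru_cache

def tree_depth(vertices, edges):
    """Tree-depth of a small undirected graph via the recursive definition."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def components(vs):
        # Split the vertex set `vs` into connected components of the induced subgraph.
        remaining = set(vs)
        comps = []
        while remaining:
            start = remaining.pop()
            comp, stack = {start}, [start]
            while stack:
                u = stack.pop()
                for w in adj[u] & remaining:
                    remaining.remove(w)
                    comp.add(w)
                    stack.append(w)
            comps.append(frozenset(comp))
        return comps

    @lru_cache(maxsize=None)
    def td(vs):
        if not vs:
            return 0
        comps = components(vs)
        if len(comps) > 1:
            return max(td(c) for c in comps)   # disconnected: worst component
        if len(vs) == 1:
            return 1                            # single vertex
        return 1 + min(td(vs - {v}) for v in vs)  # connected: best vertex to remove

    return td(frozenset(vertices))
```

For example, calling it on a path with seven vertices removes the middle vertex first and recurses on two paths of three vertices each.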

Archivo Digital para la Docencia y la Investigación

Assessment of faculty training program for the use of e-learning platforms: analysis of current use

Author: Aranzabal Maiztegi Asier
Ezeiza Ramos Ainhoa
Garmendia Mujika Mikel
Ros Iker
Publication venue: 'Associated Management Consultants, PVT., Ltd.'
Publication date: 01/01/2013

5th International Conference on Education and New Learning Technologies (Barcelona, Spain, 1-3 July 2013). This paper shows the results of a survey carried out among faculty members who use Moodle to support their classroom teaching at the University of the Basque Country UPV/EHU. The aim of this work is to determine: a) how Moodle is being used; b) whether the training program helped them improve their teaching through the use of Moodle; c) what the further training needs are. The results showed that the learning platform was mainly used to present materials or resources, followed by attempts to improve communication with the students and to monitor and grade assignments. We also found that training faculty members only through courses on the didactic use of Moodle was most effective for the use of collaborative learning strategies. The more highly trained teachers demand further training in the didactic use of Moodle.

Archivo Digital para la Docencia y la Investigación

The linear hidden subset problem for the (1+1) EA with scheduled and adaptive mutation rates

Author: Einarsson Hafsteinn
Gauy Marcelo Matheus
Lengler Johannes
Meier Florian
Mujika Asier
Steger Angelika
Weissenberger Felix
Publication date: 16/08/2018

We study unbiased $(1+1)$ evolutionary algorithms on linear functions with an unknown number $n$ of bits with non-zero weight. Static algorithms achieve an optimal runtime of $O(n (\ln n)^{2+\epsilon})$, however, it remained unclear whether more dynamic parameter policies could yield better runtime guarantees. We consider two setups: one where the mutation rate follows a fixed schedule, and one where it may be adapted depending on the history of the run. For the first setup, we give a schedule that achieves a runtime of $(1\pm o(1))\beta n \ln n$, where $\beta \approx 3.552$, which is an asymptotic improvement over the runtime of the static setup. Moreover, we show that no schedule admits a better runtime guarantee and that the optimal schedule is essentially unique. For the second setup, we show that the runtime can be further improved to $(1\pm o(1)) e n \ln n$, which matches the performance of algorithms that know $n$ in advance. Finally, we study the related model of initial segment uncertainty with static position-dependent mutation rates, and derive asymptotically optimal lower bounds. This answers a question by Doerr, Doerr, and Kötzing.
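As background for the abstract above, here is a minimal sketch of a $(1+1)$ EA with standard bit mutation and a static mutation rate on a linear function. It does not implement the paper's scheduled or adaptive policies. The function name `one_plus_one_ea`, the use of non-negative integer weights, and the stopping rule are assumptions made for illustration.

```python
import random

def one_plus_one_ea(weights, mutation_rate, max_steps, seed=0):
    """Maximize f(x) = sum(w_i * x_i) over bit strings.

    Assumes non-negative integer weights; bits with weight 0 play the role of
    positions that do not influence the fitness.
    Returns (steps_used, final_bit_string), with steps_used None on failure.
    """
    rng = random.Random(seed)

    def fitness(bits):
        return sum(w * b for w, b in zip(weights, bits))

    x = [rng.randint(0, 1) for _ in weights]
    fx = fitness(x)
    optimum = sum(weights)  # reached when every bit with non-zero weight is 1

    for step in range(max_steps):
        if fx == optimum:
            return step, x
        # Standard bit mutation: flip each bit independently with the given rate.
        y = [b ^ 1 if rng.random() < mutation_rate else b for b in x]
        fy = fitness(y)
        # Elitist selection: keep the offspring if it is at least as good.
        if fy >= fx:
            x, fx = y, fy

    return (max_steps if fx == optimum else None), x
```

A scheduled variant in the sense of the abstract would replace the constant `mutation_rate` with a value that depends on `step`; the specific optimal schedule is given in the paper and is not reproduced here.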

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Harnessing Temporal Information: Methods for Long-term Dependency Learning in Autoregressive Models

Author: Mujika Aramendia Asier
Publication venue: ETH Zurich
Publication date: 01/01/2023

Repository for Publications and Research Data