132 research outputs found

    Meta-learning algorithms and applications

    Get PDF
    Meta-learning in the broader context concerns how an agent learns about their own learning, allowing them to improve their learning process. Learning how to learn is not only beneficial for humans, but it has also shown vast benefits for improving how machines learn. In the context of machine learning, meta-learning enables models to improve their learning process by selecting suitable meta-parameters that influence the learning. For deep learning specifically, the meta-parameters typically describe details of the training of the model but can also include description of the model itself - the architecture. Meta-learning is usually done with specific goals in mind, for example trying to improve ability to generalize or learn new concepts from only a few examples. Meta-learning can be powerful, but it comes with a key downside: it is often computationally costly. If the costs would be alleviated, meta-learning could be more accessible to developers of new artificial intelligence models, allowing them to achieve greater goals or save resources. As a result, one key focus of our research is on significantly improving the efficiency of meta-learning. We develop two approaches: EvoGrad and PASHA, both of which significantly improve meta-learning efficiency in two common scenarios. EvoGrad allows us to efficiently optimize the value of a large number of differentiable meta-parameters, while PASHA enables us to efficiently optimize any type of meta-parameters but fewer in number. Meta-learning is a tool that can be applied to solve various problems. Most commonly it is applied for learning new concepts from only a small number of examples (few-shot learning), but other applications exist too. To showcase the practical impact that meta-learning can make in the context of neural networks, we use meta-learning as a novel solution for two selected problems: more accurate uncertainty quantification (calibration) and general-purpose few-shot learning. Both are practically important problems and using meta-learning approaches we can obtain better solutions than the ones obtained using existing approaches. Calibration is important for safety-critical applications of neural networks, while general-purpose few-shot learning tests model's ability to generalize few-shot learning abilities across diverse tasks such as recognition, segmentation and keypoint estimation. More efficient algorithms as well as novel applications enable the field of meta-learning to make more significant impact on the broader area of deep learning and potentially solve problems that were too challenging before. Ultimately both of them allow us to better utilize the opportunities that artificial intelligence presents

    A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks

    Full text link
    Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM), transformer models excel in handling long dependencies between input sequence elements and enable parallel processing. As a result, transformer-based models have attracted substantial interest among researchers in the field of artificial intelligence. This can be attributed to their immense potential and remarkable achievements, not only in Natural Language Processing (NLP) tasks but also in a wide range of domains, including computer vision, audio and speech processing, healthcare, and the Internet of Things (IoT). Although several survey papers have been published highlighting the transformer's contributions in specific fields, architectural differences, or performance evaluations, there is still a significant absence of a comprehensive survey paper encompassing its major applications across various domains. Therefore, we undertook the task of filling this gap by conducting an extensive survey of proposed transformer models from 2017 to 2022. Our survey encompasses the identification of the top five application domains for transformer-based models, namely: NLP, Computer Vision, Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze the impact of highly influential transformer-based models in these domains and subsequently classify them based on their respective tasks using a proposed taxonomy. Our aim is to shed light on the existing potential and future possibilities of transformers for enthusiastic researchers, thus contributing to the broader understanding of this groundbreaking technology

    Reward Gaming in Conditional Text Generation

    Full text link
    To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring spurious correlation, and covariate shift. We show that even though learned metrics achieve high performance on the distribution of the data used to train the reward function, the undesirable patterns may be amplified during RL training of the text generation model. While there has been discussion about reward gaming in the RL or safety community, in this discussion piece, we would like to highlight reward gaming in the natural language generation (NLG) community using concrete conditional text generation examples and discuss potential fixes and areas for future work

    PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning

    Full text link
    Hyperparameters of Deep Learning (DL) pipelines are crucial for their downstream performance. While a large number of methods for Hyperparameter Optimization (HPO) have been developed, their incurred costs are often untenable for modern DL. Consequently, manual experimentation is still the most prevalent approach to optimize hyperparameters, relying on the researcher's intuition, domain knowledge, and cheap preliminary explorations. To resolve this misalignment between HPO algorithms and DL researchers, we propose PriorBand, an HPO algorithm tailored to DL, able to utilize both expert beliefs and cheap proxy tasks. Empirically, we demonstrate PriorBand's efficiency across a range of DL benchmarks and show its gains under informative expert input and robustness against poor expert belief

    Simultaneous Machine Translation with Tailored Reference

    Full text link
    Simultaneous machine translation (SiMT) generates translation while reading the whole source sentence. However, existing SiMT models are typically trained using the same reference disregarding the varying amounts of available source information at different latency. Training the model with ground-truth at low latency may introduce forced anticipations, whereas utilizing reference consistent with the source word order at high latency results in performance degradation. Consequently, it is crucial to train the SiMT model with appropriate reference that avoids forced anticipations during training while maintaining high quality. In this paper, we propose a novel method that provides tailored reference for the SiMT models trained at different latency by rephrasing the ground-truth. Specifically, we introduce the tailor, induced by reinforcement learning, to modify ground-truth to the tailored reference. The SiMT model is trained with the tailored reference and jointly optimized with the tailor to enhance performance. Importantly, our method is applicable to a wide range of current SiMT approaches. Experiments on three translation tasks demonstrate that our method achieves state-of-the-art performance in both fixed and adaptive policies.Comment: Accepted to EMNLP 2023; 15 pages, 8 figure

    Extrinsic Evaluation of Machine Translation Metrics

    Full text link
    Automatic machine translation (MT) metrics are widely used to distinguish the translation qualities of machine translation systems across relatively large test sets (system-level evaluation). However, it is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level (segment-level evaluation). In this paper, we investigate how useful MT metrics are at detecting the success of a machine translation component when placed in a larger platform with a downstream task. We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks (dialogue state tracking, question answering, and semantic parsing). For each task, we only have access to a monolingual task-specific model. We calculate the correlation between the metric's ability to predict a good/bad translation with the success/failure on the final task for the Translate-Test setup. Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes. We also find that the scores provided by neural metrics are not interpretable mostly because of undefined ranges. We synthesise our analysis into recommendations for future MT metrics to produce labels rather than scores for more informative interaction between machine translation and multilingual language understanding.Comment: ACL 2023 Camera Read

    Transformer Models for Machine Translation and Streaming Automatic Speech Recognition

    Full text link
    [ES] El procesamiento del lenguaje natural (NLP) es un conjunto de problemas computacionales con aplicaciones de máxima relevancia, que junto con otras tecnologías informáticas se ha beneficiado de la revolución que ha significado el aprendizaje profundo. Esta tesis se centra en dos problemas fundamentales para el NLP: la traducción automática (MT) y el reconocimiento automático del habla o transcripción automática (ASR); así como en una arquitectura neuronal profunda, el Transformer, que pondremos en práctica para mejorar las soluciones de MT y ASR en algunas de sus aplicaciones. El ASR y MT pueden servir para obtener textos multilingües de alta calidad a un coste razonable para una diversidad de contenidos audiovisuales. Concre- tamente, esta tesis aborda problemas como el de traducción de noticias o el de subtitulación automática de televisión. El ASR y MT también se pueden com- binar entre sí, generando automáticamente subtítulos traducidos, o con otras soluciones de NLP: resumen de textos para producir resúmenes de discursos, o síntesis del habla para crear doblajes automáticos. Estas aplicaciones quedan fuera del alcance de esta tesis pero pueden aprovechar las contribuciones que contiene, en la meduda que ayudan a mejorar el rendimiento de los sistemas automáticos de los que dependen. Esta tesis contiene una aplicación de la arquitectura Transformer al MT tal y como fue concebida, mediante la que obtenemos resultados de primer nivel en traducción de lenguas semejantes. En capítulos subsecuentes, esta tesis aborda la adaptación del Transformer como modelo de lenguaje para sistemas híbri- dos de ASR en vivo. Posteriormente, describe la aplicación de este tipus de sistemas al caso de uso de subtitulación de televisión, participando en una com- petición pública de RTVE donde obtenemos la primera posición con un marge importante. También demostramos que la mejora se debe principalmenta a la tecnología desarrollada y no tanto a la parte de los datos.[CA] El processament del llenguage natural (NLP) és un conjunt de problemes com- putacionals amb aplicacions de màxima rellevància, que juntament amb al- tres tecnologies informàtiques s'ha beneficiat de la revolució que ha significat l'impacte de l'aprenentatge profund. Aquesta tesi se centra en dos problemes fonamentals per al NLP: la traducció automàtica (MT) i el reconeixement automàtic de la parla o transcripció automàtica (ASR); així com en una ar- quitectura neuronal profunda, el Transformer, que posarem en pràctica per a millorar les solucions de MT i ASR en algunes de les seues aplicacions. l'ASR i MT poden servir per obtindre textos multilingües d'alta qualitat a un cost raonable per a un gran ventall de continguts audiovisuals. Concretament, aquesta tesi aborda problemes com el de traducció de notícies o el de subtitu- lació automàtica de televisió. l'ASR i MT també es poden combinar entre ells, generant automàticament subtítols traduïts, o amb altres solucions de NLP: amb resum de textos per produir resums de discursos, o amb síntesi de la parla per crear doblatges automàtics. Aquestes altres aplicacions es troben fora de l'abast d'aquesta tesi però poden aprofitar les contribucions que conté, en la mesura que ajuden a millorar els resultats dels sistemes automàtics dels quals depenen. Aquesta tesi conté una aplicació de l'arquitectura Transformer al MT tal com va ser concebuda, mitjançant la qual obtenim resultats de primer nivell en traducció de llengües semblants. En capítols subseqüents, aquesta tesi aborda l'adaptació del Transformer com a model de llenguatge per a sistemes híbrids d'ASR en viu. Posteriorment, descriu l'aplicació d'aquest tipus de sistemes al cas d'ús de subtitulació de continguts televisius, participant en una competició pública de RTVE on obtenim la primera posició amb un marge significant. També demostrem que la millora es deu principalment a la tecnologia desen- volupada i no tant a la part de les dades[EN] Natural language processing (NLP) is a set of fundamental computing prob- lems with immense applicability, as language is the natural communication vehicle for people. NLP, along with many other computer technologies, has been revolutionized in recent years by the impact of deep learning. This thesis is centered around two keystone problems for NLP: machine translation (MT) and automatic speech recognition (ASR); and a common deep neural architec- ture, the Transformer, that is leveraged to improve the technical solutions for some MT and ASR applications. ASR and MT can be utilized to produce cost-effective, high-quality multilin- gual texts for a wide array of media. Particular applications pursued in this thesis are that of news translation or that of automatic live captioning of tele- vision broadcasts. ASR and MT can also be combined with each other, for instance generating automatic translated subtitles from audio, or augmented with other NLP solutions: text summarization to produce a summary of a speech, or speech synthesis to create an automatic translated dubbing, for in- stance. These other applications fall out of the scope of this thesis, but can profit from the contributions that it contains, as they help to improve the performance of the automatic systems on which they depend. This thesis contains an application of the Transformer architecture to MT as it was originally conceived, achieving state-of-the-art results in similar language translation. In successive chapters, this thesis covers the adaptation of the Transformer as a language model for streaming hybrid ASR systems. After- wards, it describes how we applied the developed technology for a specific use case in television captioning by participating in a competitive challenge and achieving the first position by a large margin. We also show that the gains came mostly from the improvement in technology capabilities over two years including that of the Transformer language model adapted for streaming, and the data component was minor.Baquero Arnal, P. (2023). Transformer Models for Machine Translation and Streaming Automatic Speech Recognition [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/19368

    MENLI: Robust Evaluation Metrics from Natural Language Inference

    Full text link
    Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling. We design a preference-based adversarial attack framework and show that our NLI based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI based metrics outperform existing summarization metrics, but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15%-30%) and higher quality metrics as measured on standard benchmarks (+5% to 30%).Comment: TACL 2023 Camera-ready; github link fixed+Fig.3 legend fixe

    Context Consistency between Training and Testing in Simultaneous Machine Translation

    Full text link
    Simultaneous Machine Translation (SiMT) aims to yield a real-time partial translation with a monotonically growing the source-side context. However, there is a counterintuitive phenomenon about the context usage between training and testing: e.g., the wait-k testing model consistently trained with wait-k is much worse than that model inconsistently trained with wait-k' (k' is not equal to k) in terms of translation quality. To this end, we first investigate the underlying reasons behind this phenomenon and uncover the following two factors: 1) the limited correlation between translation quality and training (cross-entropy) loss; 2) exposure bias between training and testing. Based on both reasons, we then propose an effective training approach called context consistency training accordingly, which makes consistent the context usage between training and testing by optimizing translation quality and latency as bi-objectives and exposing the predictions to the model during the training. The experiments on three language pairs demonstrate our intuition: our system encouraging context consistency outperforms that existing systems with context inconsistency for the first time, with the help of our context consistency training approach

    End-to-End Simultaneous Speech Translation

    Get PDF
    Speech translation is the task of translating speech in one language to text or speech in another language, while simultaneous translation aims at lower translation latency by starting the translation before the speaker finishes a sentence. The combination of the two, simultaneous speech translation, can be applied in low latency scenarios such as live video caption translation and real-time interpretation. This thesis will focus on an end-to-end or direct approach for simultaneous speech translation. We first define the task of simultaneous speech translation, including the challenges of the task and its evaluation metrics. We then progressly introduce our contributions to tackle the challenges. First, we proposed a novel simultaneous translation policy, mono- tonic multihead attention, for transformer models on text-to-text translation. Second, we investigate the issues and potential solutions when adapting text-to-text simultaneous policies to end-to-end speech-to-text translation models. Third, we introduced the augmented memory transformer encoder for simultaneous speech-to-text translation models for better computation efficiency. Fourth, we explore a direct simultaneous speech translation with variational monotonic multihead attention policy, based on recent speech-to-unit models. At the end, we provide some directions for potential future research
    corecore