50 research outputs found

    Strong and Efficient Baselines for Open Domain Conversational Question Answering

    Full text link
    Unlike the Open Domain Question Answering (ODQA) setting, the conversational (ODConvQA) domain has received limited attention when it comes to reevaluating baselines for both efficiency and effectiveness. In this paper, we study the State-of-the-Art (SotA) Dense Passage Retrieval (DPR) retriever and Fusion-in-Decoder (FiD) reader pipeline, and show that it significantly underperforms when applied to ODConvQA tasks due to various limitations. We then propose and evaluate strong yet simple and efficient baselines, by introducing a fast reranking component between the retriever and the reader, and by performing targeted finetuning steps. Experiments on two ODConvQA tasks, namely TopiOCQA and OR-QuAC, show that our method improves the SotA results, while reducing reader's latency by 60%. Finally, we provide new and valuable insights into the development of challenging baselines that serve as a reference for future, more intricate approaches, including those that leverage Large Language Models (LLMs).Comment: Accepted to EMNLP 2023 Finding

    Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT

    Get PDF
    We explore strategies for incorporating target syntax into Neural Machine Translation. We specifically focus on syntax in ensembles containing multiple sentence representations. We formulate beam search over such ensembles using WFSTs, and describe a delayed SGD update training procedure that is especially effective for long representations like linearized syntax. Our approach gives state-of-the-art performance on a difficult Japanese-English task.This work was supported by EPSRC grant EP/L027623/1

    Modelo estocástico de traducción basado en N-gramas de tuplas bilingües y combinación log-lineal de características

    Get PDF
    En esta comunicación se presenta un sistema de traducción estocástica basado en el modelado mediante N-gramas de la probabilidad conjunta de textos bilingües. La unidad básica del modelo es la tupla, par de cadenas de palabras del lenguaje fuente (a traducir) y el lenguaje destino (traducción). La traducción se lleva a cabo mediante la maximización de una combinación lineal de los logaritmos de la probabilidad asignada a la traducción por el modelo de traducción y otras características, siguiendo la aproximación de entropía máxima. Las prestaciones del sistema de traducción son evaluadas con una tarea de traducción del habla: la traducción entre inglés y español (y viceversa) de transcripciones de intervenciones de los miembros del Parlamento Europeo. Los resultados alcanzados se encuentran al nivel del estado del arte.This communication introduces a stochastic machine translation system based on Ngram modelling of the joint probability of bilingual texts. The basic unit of this model is called a tuple and consists of a pair of both source (to be translated) language and target language (translation) word-strings. Translation is driven by a log-linear combination of the N-gram model probability and other features, according to the maximum entropy language modelling approach. The translation performance is evaluated by means of a speech-to-speech translation tasks: translation from Spanish to English (and viceversa) of European Parliament speeches. The system reaches a state-of-art performance.Este trabajo ha sido financiado parcialmente por la CICYT a través del proyecto TIC2002-04447-C02 (ALIADO) y la Unión Europea mediante el proyecto FP6-506738 (TC-STAR)

    Pushdown automata in statistical machine translation

    Get PDF
    This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT. </jats:p

    Famílies botàniques de plantes medicinals

    Get PDF
    Facultat de Farmàcia, Universitat de Barcelona. Ensenyament: Grau de Farmàcia, Assignatura: Botànica Farmacèutica, Curs: 2013-2014, Coordinadors: Joan Simon, Cèsar Blanché i Maria Bosch.Els materials que aquí es presenten són els recull de 175 treballs d’una família botànica d’interès medicinal realitzats de manera individual. Els treballs han estat realitzat per la totalitat dels estudiants dels grups M-2 i M-3 de l’assignatura Botànica Farmacèutica durant els mesos d’abril i maig del curs 2013-14. Tots els treballs s’han dut a terme a través de la plataforma de GoogleDocs i han estat tutoritzats pel professor de l’assignatura i revisats i finalment co-avaluats entre els propis estudiants. L’objectiu principal de l’activitat ha estat fomentar l’aprenentatge autònom i col·laboratiu en Botànica farmacèutica
    corecore