7 research outputs found

    Memory in humans and deep language models: Linking hypotheses for model augmentation

    The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations. Memory-augmentation, or the explicit storing of past information in external memory for subsequent predictions, has become a constructive avenue for mitigating this limitation. We argue that memory-augmented Transformers can benefit substantially from considering insights from the memory literature in humans. We detail an approach to integrating evidence from the human memory system through the specification of cross-domain linking hypotheses. We then provide an empirical demonstration to evaluate the use of surprisal as a linking hypothesis, and further identify the limitations of this approach to inform future research. Comment: 5 figures

    Seeing the advantage: Visually grounding word embeddings to better capture human semantic knowledge

    Distributional semantic models capture word-level meaning that is useful in many natural language processing tasks and have even been shown to capture cognitive aspects of word meaning. The majority of these models are purely text based, even though the human sensory experience is much richer. In this paper we create visually grounded word embeddings by combining English text and images and compare them to popular text-based methods, to see if visual information allows our model to better capture cognitive aspects of word meaning. Our analysis shows that visually grounded embedding similarities are more predictive of the human reaction times in a large priming experiment than the purely text-based embeddings. The visually grounded embeddings also correlate well with human word similarity ratings. Importantly, in both experiments we show that the grounded embeddings account for a unique portion of explained variance, even when we include text-based embeddings trained on huge corpora. This shows that visual grounding allows our model to capture information that cannot be extracted using text as the only source of information.

    Word Importance Modeling to Enhance Captions Generated by Automatic Speech Recognition for Deaf and Hard of Hearing Users

    People who are deaf or hard-of-hearing (DHH) benefit from sign-language interpreting or live-captioning (with a human transcriptionist) to access spoken information. However, such services are not legally required, affordable, or available in many settings, e.g., impromptu small-group meetings in the workplace or online video content that has not been professionally captioned. As Automatic Speech Recognition (ASR) systems improve in accuracy and speed, it is natural to investigate the use of these systems to assist DHH users in a variety of tasks. However, ASR systems are still imperfect, especially in realistic conversational settings, which raises issues of trust in and acceptance of these systems within the DHH community. To overcome these challenges, our work focuses on: (1) building metrics for accurately evaluating the quality of automatic captioning systems, and (2) designing interventions for improving the usability of captions for DHH users. The first part of this dissertation describes our research on methods for identifying words that are important for understanding the meaning of a conversational turn within transcripts of spoken dialogue. Such knowledge about the relative importance of words in spoken messages can be used in evaluating ASR systems (in part 2 of this dissertation) or creating new applications for DHH users of captioned video (in part 3 of this dissertation). We found that models which consider both the acoustic properties of spoken words and text-based features (e.g., pre-trained word embeddings) are more effective at predicting the semantic importance of a word than models that use only one of these types of features. The second part of this dissertation describes studies to understand DHH users' perception of the quality of ASR-generated captions; the goal of this work was to validate the design of automatic metrics for evaluating captions in real-time applications for these users. Such a metric could facilitate comparison of various ASR systems, for determining the suitability of specific ASR systems for supporting communication for DHH users. We designed experimental studies to elicit feedback on the quality of captions from DHH users, and we developed and evaluated automatic metrics for predicting the usability of automatically generated captions for these users. We found that metrics that consider the importance of each word in a text are more effective at predicting the usability of imperfect text captions than the traditional Word Error Rate (WER) metric. The final part of this dissertation describes research on importance-based highlighting of words in captions, as a way to enhance the usability of captions for DHH users. Similar to highlighting in static texts (e.g., textbooks or electronic documents), highlighting in captions involves changing the appearance of some of the caption text so that readers can quickly attend to the most important information. Despite the known benefits of highlighting in static texts, the usefulness of highlighting in captions for DHH users is largely unexplored. For this reason, we conducted experimental studies with DHH participants to understand the benefits of importance-based highlighting in captions, and their preferences among different design configurations for highlighting in captions. We found that DHH users subjectively preferred highlighting in captions, and they reported higher readability and understandability scores and lower task-load scores when viewing videos with captions containing highlighting compared to the videos without highlighting. Further, in partial contrast to recommendations in prior research on highlighting in static texts (which had not been based on experimental studies with DHH users), we found that DHH participants preferred boldface, word-level, non-repeating highlighting in captions.

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Plural agreement errors in L2 Spanish from an emergent perspective (El error de concordancia plural en español L2 desde una perspectiva emergente)

    Doctorate in Language Sciences, with a specialization in Applied Linguistics. This thesis analyses, from an emergent perspective, plural agreement errors in four Italian learners of L2 Spanish: SONIA (level A), NATI (level B1), JAKO (level B2), MIRKA (level C1). The main objectives are: (i) to examine the factors related to the odds / risk of error; (ii) to analyse the dynamics of error at the microscopic and macroscopic levels from a complex-systems approach; (iii) to predict local error in the learners' final sessions. The orientation of this thesis is strongly quantitative. Techniques from statistics, data mining, and the physics of complex systems were used. For the first objective, the following predictor variables were created: (i) type of modifier (definite article, indefinite article, determiners, attributive adjectives); (ii) whether the agreement is long-distance; (iii) whether it involves more than two terms; (iv) presence of epenthetic -e- (controllers ending in a consonant); (v) properties of the controller (animacy, concreteness, familiarity, imageability, frequency); (vi) similarity between Spanish and Italian endings; (vii) similarity between the lexical roots of the two languages; (viii) errors accumulated up to the instance in question; (ix) possible learning strategies (from 1 to 7); (x) TYPE frequency of agreement instances in the EsTenTen corpus and in the study's own corpus. The response variables were: (i) binary [error / no error]; (ii) categorical [error of: gender, epenthetic -e-, plural, mixed]; (iii) time until an error occurs; (iv) symbolic series. Overall, effects were found for: (a) type of modifier: errors increase with determiners / adjectives relative to the definite article; (b) gender: masculine plurals are easier than feminine ones; (c) familiarity / frequency of the controller [errors decrease]; (d) animacy [errors increase for animates]; (e) TYPE frequency [error decreases as frequency increases]. The effect of epenthetic -e- was facilitative, contrary to expectations. This was interpreted as strategy 5, which lowered the error rate, having a beneficial effect on plurals in -es. The distance between endings showed the inverse of the expected effect; this was explained by analysing the instances at the reference level. The effect of accumulated errors was weak. For the second objective, the main hypothesis was to treat error as an attractor. At the microscopic level, the results of the statistical analyses were used to bias the flow towards attractors in three simulations based on dynamical systems. In general, the global error pattern was successfully emulated, but the per-session error pattern was approximated less well. At the macroscopic level, regime-change detection measures and complex networks were used. It was possible to identify groups of sessions with similar dynamics and approximate regions of transition, using the symbolic-series response variable. The complex networks yielded word-level effects for: modifier, familiarity / frequency and imageability / concreteness of the controller, epenthetic -e-, and similarity between root and ending. There were also effects for words ending in -e. For the third objective, the predictors used included, among others, information derived from the complex networks. In general, 80% accuracy (precisión) was not exceeded. The graph-derived attributes were selected as influential for all learners.
    Affiliation: Marafioti, Pablo Ezequiel. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina