Verb similarity: comparing corpus and psycholinguistic data
Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of events, and word associations, which are related to speakers' organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and psycholinguistic representations.
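The comparison described in this abstract can be illustrated with a minimal sketch. It assumes each verb sense is represented once as a vector of semantic-role counts and once as a vector of word-association response counts, that similarity is cosine similarity, and that agreement between the two views is measured with Spearman's rank correlation over all sense pairs. The abstract does not specify these choices, and the sense labels, role names and counts below are invented for illustration only.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def pairwise_similarities(vectors):
    """Cosine similarity for every pair of senses, in a fixed (sorted) order."""
    return [cosine(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)]

# Hypothetical corpus view: semantic-role counts per verb sense.
role_vectors = {
    "dar_1":      {"Agent": 9, "Theme": 8, "Recipient": 7},
    "entregar_1": {"Agent": 8, "Theme": 9, "Recipient": 6},
    "correr_1":   {"Agent": 9, "Path": 5},
    "caminar_1":  {"Agent": 8, "Path": 6},
}

# Hypothetical psycholinguistic view: word-association response counts.
assoc_vectors = {
    "dar_1":      {"regalo": 5, "recibir": 4, "mano": 2},
    "entregar_1": {"regalo": 3, "paquete": 5, "recibir": 2},
    "correr_1":   {"rapido": 6, "piernas": 3, "andar": 1},
    "caminar_1":  {"andar": 5, "piernas": 4, "rapido": 1},
}

rho = spearman(pairwise_similarities(role_vectors),
               pairwise_similarities(assoc_vectors))
print(f"Spearman rho between the two similarity views: {rho:.3f}")
```

Rank correlation is used here because the two similarity scales are not directly comparable. Only the ordering of sense pairs is assumed to be meaningful.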
Understanding and Supporting Vocabulary Learners via Machine Learning on Behavioral and Linguistic Data
This dissertation presents various machine learning applications for predicting different cognitive states of students while they are using a vocabulary tutoring system, DSCoVAR. We conduct four studies, each of which includes a comprehensive analysis of behavioral and linguistic data and provides data-driven evidence for designing personalized features for the system.
The first study presents how behavioral and linguistic interactions from the vocabulary tutoring system can be used to predict students' off-task states. The study identifies which predictive features from interaction signals are more important and examines different types of off-task behaviors. The second study investigates how to automatically evaluate students' partial word knowledge from open-ended responses to definition questions. We present a technique that augments modern word-embedding techniques with a classic semantic differential scaling method from cognitive psychology. We then use this interpretable semantic scale method for predicting students' short- and long-term learning.
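One common way to combine word embeddings with semantic differential scales is to project a vector onto an axis defined by two opposite pole words. The sketch below follows that idea under stated assumptions: the two-dimensional embeddings, the pole words and the function names are all hypothetical, and the dissertation's actual method may differ.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sqrt(dot(u, u))

def scale_score(vec, pole_neg, pole_pos):
    """Cosine between vec and the axis running from pole_neg to pole_pos."""
    axis = [p - n for p, n in zip(pole_pos, pole_neg)]
    denom = norm(vec) * norm(axis)
    return dot(vec, axis) / denom if denom else 0.0

def profile(vec, scales, emb):
    """Scores of one vector on every bipolar scale (interpretable features)."""
    return {f"{neg}-{pos}": scale_score(vec, emb[neg], emb[pos])
            for neg, pos in scales}

def profile_distance(p, q):
    """Euclidean distance between two scale profiles with the same keys."""
    return sqrt(sum((p[k] - q[k]) ** 2 for k in p))

# Toy 2-d embeddings, invented for illustration; real embeddings would be
# loaded from a trained model.
EMB = {
    "bad": [-1.0, 0.0], "good": [1.0, 0.0],
    "weak": [0.0, -1.0], "strong": [0.0, 1.0],
    "hero": [0.8, 0.6],       # hypothetical target word
    "champion": [0.6, 0.8],   # hypothetical student response
    "villain": [-0.8, 0.6],   # hypothetical student response
}
SCALES = [("bad", "good"), ("weak", "strong")]

target = profile(EMB["hero"], SCALES, EMB)
for response in ("champion", "villain"):
    dist = profile_distance(target, profile(EMB[response], SCALES, EMB))
    print(response, round(dist, 3))
```

A smaller profile distance is read here as a response that sits closer to the target word on the chosen scales, which gives a graded rather than right-or-wrong measure of partial knowledge.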
The third and fourth studies show how to develop a model that can generate more efficient training curricula for both human and machine vocabulary learners. The third study presents a deep-learning model that scores sentences for a contextual vocabulary learning curriculum. We use pre-trained language models, such as ELMo or BERT, and an additional attention layer to capture how much each context word contributes to the meaning of the target word. The fourth study examines how the contextual informativeness model, originally designed to develop curricula for human vocabulary learning, can also be used to develop curricula for various word embedding models. We find that sentences predicted to be less informative for human learners are also less helpful for machine learning algorithms.
Having a rich understanding of user behaviors, responses, and learning stimuli is imperative for developing an intelligent online system. Our studies demonstrate interpretable methods with cross-disciplinary approaches to understand various cognitive states of students during learning. The analysis results provide data-driven evidence for designing personalized features that can maximize learning outcomes. Datasets we collected from the studies will be shared publicly to promote future studies related to online tutoring systems. These findings can also be applied to represent different user states observed in other online systems. In the future, we believe our findings can help to implement a more personalized vocabulary learning system, to develop a system that uses non-English texts or different types of inputs, and to investigate how the machine learning outputs interact with students.
PHD. Information. University of Michigan, Horace H. Rackham School of Graduate Studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/162999/1/sjnam_1.pd
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl
COMPUTATIONAL COMMUNICATION INTELLIGENCE: EXPLORING LINGUISTIC MANIFESTATION AND SOCIAL DYNAMICS IN ONLINE COMMUNICATION
We now live in an age of online communication. As social media becomes an integral part of our lives, online communication becomes an essential life skill. In this dissertation, we aim to understand how people communicate effectively online. We research components of success in online communication and present scientific methods to study the skill of effective communication. This research advances the state of the art in machine learning and communication studies.
For communication studies, we pioneer the study of a communication phenomenon we call Communication Intelligence in online interactions. We create a theory about communication intelligence that measures participants’ ten high-order communication skills, including restraint, self-reflection, perspective taking, and balance. We present a multi-perspective analysis for understanding communication intelligence, including its diverse language, shared linguistic characteristics across people, social dynamics, and the effects of communication modality on communication intelligence.
For machine learning, we contribute new computational models and formulations for addressing multi-label and multi-task machine learning problems. We develop a new hierarchical probabilistic model for simultaneously identifying multiple intelligence-embodied communication skills from natural language. The model learns the topic assignment for each sentence and provides a practical and simple way to determine document labels without relying on a threshold function. The model performance increases as the number of labels grows, which makes it a promising approach for large-scale data analysis. We also develop a new multi-task formulation for simultaneously identifying multiple intelligence-embodied communication skills from lexical, discourse, and interaction features. The key merit of this model is that it is a general multi-task formulation that unifies many widely used regularization techniques, including Lasso, group Lasso, sparse-group Lasso, and the Dirty model. This model expands the applicability of multi-task learning by allowing the analysis of real-world problems where the degree of task relatedness is uncertain and the true structure of the groups in data is not clear ahead of time. Moreover, it can be applied to streaming data to perform large-scale analysis in real time. Beyond the application of studying communication intelligence, the developed models and formulations can also benefit research in other areas where problems of simultaneously predicting multiple categories are abundant.
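To illustrate how a single penalty can cover several of the regularizers named above, here is a minimal sketch of the standard sparse-group Lasso penalty, which reduces to Lasso or to group Lasso when one of its two weights is set to zero. It is a textbook form, not the dissertation's own formulation, which the abstract does not give. The function name, parameter names and example weights are assumptions, and the Dirty model is not covered.

```python
from math import sqrt

def sparse_group_penalty(weights, groups, lam_l1, lam_group):
    """lam_l1 * ||w||_1 + lam_group * sum over groups g of ||w_g||_2.

    weights:   flat list of coefficients
    groups:    list of index lists that partition the coefficients
    lam_l1:    weight of the element-wise (Lasso) term
    lam_group: weight of the group-wise (group Lasso) term
    """
    l1_term = sum(abs(w) for w in weights)
    group_term = sum(sqrt(sum(weights[i] ** 2 for i in g)) for g in groups)
    return lam_l1 * l1_term + lam_group * group_term

# Hypothetical coefficients: the first group is active, the second is zero.
w = [3.0, -4.0, 0.0, 0.0]
g = [[0, 1], [2, 3]]

lasso_only = sparse_group_penalty(w, g, lam_l1=1.0, lam_group=0.0)
group_only = sparse_group_penalty(w, g, lam_l1=0.0, lam_group=1.0)
combined = sparse_group_penalty(w, g, lam_l1=1.0, lam_group=1.0)
print(lasso_only, group_only, combined)
```

The element-wise term encourages individual zero coefficients, while the group term encourages whole groups to vanish together. Mixing the two is what lets one objective interpolate between the special cases.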
The analysis of breathing and rhythm in speech
Speech rhythm can be described as the temporal patterning by which speech events, such as vocalic onsets, occur. Despite efforts to quantify and model speech rhythm across languages, it remains a scientifically enigmatic aspect of prosody. For instance, one challenge lies in determining how to best quantify and analyse speech rhythm. Techniques range from manual phonetic annotation to the automatic extraction of acoustic features. It is currently unclear how closely these differing approaches correspond to one another. Moreover, the primary means of speech rhythm research has been the analysis of the acoustic signal only. Investigations of speech rhythm may instead benefit from a range of complementary measures, including physiological recordings, such as of respiratory effort. This thesis therefore combines acoustic recording with inductive plethysmography (breath belts) to capture temporal characteristics of speech and speech breathing rhythms. The first part examines the performance of existing phonetic and algorithmic techniques for acoustic prosodic analysis in a new corpus of rhythmically diverse English and Mandarin speech. The second part addresses the need for an automatic speech breathing annotation technique by developing a novel function that is robust to the noisy plethysmography typical of spontaneous, naturalistic speech production. These methods are then applied in the following section to the analysis of English speech and speech breathing in a second, larger corpus. Finally, behavioural experiments were conducted to investigate listeners' perception of speech breathing using a novel gap detection task. The thesis establishes the feasibility, as well as limits, of automatic methods in comparison to manual annotation. In the speech breathing corpus analysis, they help show that speakers maintain a normative, yet contextually adaptive breathing style during speech. 
The perception experiments in turn demonstrate that listeners are sensitive to violations of these speech breathing norms, even if unconsciously so. The thesis concludes by underscoring breathing as a necessary, yet often overlooked, component in speech rhythm planning and production.