Lower Perplexity is Not Always Human-Like
In computational psycholinguistics, various language models have been
evaluated against human reading behavior (e.g., eye movement) to build
human-like computational models. However, most previous efforts have focused
almost exclusively on English, despite the recent trend towards linguistic
universals within the broader community. To fill this gap, this paper
investigates whether the established results in computational psycholinguistics
can be generalized across languages. Specifically, we re-examine an established
generalization -- the lower perplexity a language model has, the more
human-like the language model is -- in Japanese, a language typologically
distinct from English. Our experiments demonstrate that this established
generalization exhibits a surprising lack of universality; namely, lower
perplexity is not always human-like. Moreover, this discrepancy between English
and Japanese is further explored from the perspective of (non-)uniform
information density. Overall, our results suggest that a cross-lingual
evaluation will be necessary to construct human-like computational models. Comment: Accepted by ACL 202
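The perplexity and per-token surprisal quantities at issue in this comparison can be sketched in a few lines; the token probabilities below are invented for illustration and do not come from any model evaluated in the paper:

```python
import math

# Hypothetical per-token probabilities assigned by some language model
# to a tokenized sentence (values are illustrative only).
token_probs = [0.21, 0.05, 0.33, 0.08, 0.14]

# Surprisal of each token in bits: -log2 P(token | context).
surprisals = [-math.log2(p) for p in token_probs]

# Perplexity: the exponentiated average negative log-likelihood.
avg_nll = sum(-math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)

print(f"per-token surprisal (bits): {[round(s, 2) for s in surprisals]}")
print(f"perplexity: {perplexity:.2f}")
```

Lower perplexity means the model assigns, on average, higher probability to the observed tokens; the paper's point is that this corpus-level score need not track how well per-token surprisal predicts human reading behavior.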
Natural Language Understanding: Methodological Conceptualization
This article presents the results of a theoretical analysis of natural language understanding (NLU) as a methodological problem. Combining structural-ontological and informational-psychological approaches made it possible to describe the subject matter of NLU as a composite function of the mind that systemically combines the verbal and discursive structural layers. In particular, NLU is presented, on the one hand, as the relation between the discourse of a specific speech message and the meta-discourse of a language, activated in turn by need-motivational factors. On the other hand, it is conceptualized as a process with a specific structure of information metabolism, the study of which requires differentiating the affective (emotional) and need-motivational influences on NLU and taking their interaction into account. The article also argues for the hypothesis that needs influence NLU in a scenario similar to the Yerkes-Dodson pattern, and substantiates the theoretical conclusion that emotions act as the operator of the structural features of the information metabolism of NLU. Accordingly, depending on the modality of emotions in the process of NLU, two scenarios for the implementation of information metabolism are distinguished: reductive and synthetic. An argument for the productive and constitutive role of emotions in the process of NLU is also given.
Voices: a clinical computational psycholinguistic approach to language and hallucinations in schizophrenia spectrum disorders
Spontaneous speech contains a wealth of information that reflects personal characteristics of the speaker, such as mood, motivation, intelligence, arousal, and variability in word use. Recent advances in Natural Language Processing (NLP) have paved the way for systematic recording and near real-time analysis of quantifiable properties of spoken language. NLP can reliably provide variables relevant to various aspects of brain functioning within seconds, while the cost and effort of speech recording is negligible. In this thesis, we investigated the use of state-of-the-art NLP models to support the diagnosis of psychotic disorders (e.g., schizophrenia). Psychiatric diagnoses are currently not reliable because no objective quantitative biomarkers are available. This is a serious social problem, because incorrect diagnoses lead to over- and under-treatment. NLP analyses of spontaneous speech provide reproducible quantitative assessment. In this thesis, we have shown that acoustic, semantic, and grammatical aspects of language can be quantified and used as markers for psychotic disorders. Based on these analyses, we can determine with ~85% certainty whether someone has a psychosis. In addition, we have shown that computational language analyses provide clinically relevant insights into the study of auditory verbal hallucinations. In the future, these analyses may be used to detect a relapse into psychosis earlier, so that a psychotic episode can be anticipated before people become seriously ill.
Psychometric Predictive Power of Large Language Models
Next-word probabilities from language models have been shown to successfully
simulate human reading behavior. Building on this, we show that, interestingly,
instruction-tuned large language models (LLMs) yield worse psychometric
predictive power (PPP) for human reading behavior than base LLMs with
equivalent perplexities. In other words, instruction tuning, which helps LLMs
provide human-preferred responses, does not always make them human-like from
the computational psycholinguistics perspective. In addition, we explore
prompting methodologies in simulating human reading behavior with LLMs, showing
that prompts reflecting a particular linguistic hypothesis lead LLMs to exhibit
better PPP but are still worse than base LLMs. These findings highlight that
recent instruction tuning and prompting do not offer better estimates than
direct probability measurements from base LLMs in cognitive modeling. Comment: 8 page
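Psychometric predictive power is commonly operationalized as the gain a surprisal predictor brings to a reading-time regression over baseline predictors. The sketch below uses synthetic data with invented coefficients, so it illustrates the operationalization only, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic reading times generated from word length, log frequency,
# and model surprisal (all coefficients are invented for illustration).
length = rng.integers(1, 12, n).astype(float)
log_freq = rng.normal(0.0, 1.0, n)
surprisal = rng.gamma(2.0, 1.5, n)
rt = 200 + 10 * length - 15 * log_freq + 8 * surprisal + rng.normal(0, 20, n)

def r_squared(X, y):
    """Ordinary least squares with an intercept; returns R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

base = r_squared(np.column_stack([length, log_freq]), rt)
full = r_squared(np.column_stack([length, log_freq, surprisal]), rt)

# PPP sketched here as the gain in variance explained once surprisal
# is added to the baseline regressors.
print(f"delta R^2 from surprisal: {full - base:.3f}")
```

Comparing two models at equal perplexity then amounts to asking which one's surprisal estimates yield the larger such gain.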
Language Teaching in India: Issues and Innovations
This collected volume on English language teaching (ELT) in India contains 22 articles written by Indian teachers and researchers. The book has been divided into six sections. The first section—“Problematizing ELT in India”—offers a critical, historical perspective along with innovative ideas for making English language learning and teaching meaningful and purposive in modern India. The second section—“Nature of ELT Materials”—demonstrates how the ELT materials used in Indian classrooms are not embedded in local needs and indigenous contexts. The section emphasizes the importance of developing instructional materials that not only make use of the rich linguistic and cultural resources available in India but also promote effective communication skills among the learners. The third section—“Learner Profiles”—provides interesting insights into the needs, wants, and lacks of Indian learners of English. This section shows how the instruments of needs analysis developed in monocultural and monolingual settings are inadequate for assessing the needs and wants of learners in multilingual and multicultural India. The fourth section—“Classroom Issues”—focuses on certain central issues affecting teaching and learning in the classroom context, particularly the role of native language knowledge and skills that Indian learners bring with them. The fifth section—“Course Evaluation and Teacher Development”—suggests ideas for making teacher education responsive to the changing roles and responsibilities of language teachers. The sixth and final section—“Curriculum Change”—deals with the principles and procedures for curricular changes that are in tune with the evolving knowledge about learning and teaching and the increasing desire for learner control of the process of materials development and evaluation.
Availability-Based Production Predicts Speakers' Real-time Choices of Mandarin Classifiers
Speakers often face choices as to how to structure their intended message
into an utterance. Here we investigate the influence of contextual
predictability on the encoding of linguistic content manifested by speaker
choice in a classifier language. In English, a numeral modifies a noun directly
(e.g., three computers). In classifier languages such as Mandarin Chinese, it
is obligatory to use a classifier (CL) with the numeral and the noun (e.g.,
three CL.machinery computer, three CL.general computer). While different nouns
are compatible with different specific classifiers, there is a general
classifier "ge" (CL.general) that can be used with most nouns. When the
upcoming noun is less predictable, the use of a more specific classifier would
reduce surprisal at the noun, thus potentially facilitating comprehension
(predicted by Uniform Information Density, Levy & Jaeger, 2007), but the use of
that more specific classifier may be dispreferred from a production standpoint
if accessing the general classifier is always available (predicted by
Availability-Based Production; Bock, 1987; Ferreira & Dell, 2000). Here we use
a picture-naming experiment to show that Availability-Based Production predicts
speakers' real-time choices of Mandarin classifiers. Comment: To appear in proceedings of CogSci 201
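The surprisal-reduction argument can be made concrete with toy conditional probabilities; the lexicon and the probability values below are hypothetical and only illustrate the direction of the effect:

```python
import math

# Hypothetical P(noun | classifier) distributions. The general classifier
# "ge" is compatible with most nouns, so probability mass at the noun
# position is spread thinly; a specific classifier such as the machinery
# classifier restricts the candidate set and concentrates the mass.
p_noun_given_general = {"computer": 0.02, "book": 0.05, "apple": 0.03}
p_noun_given_machinery = {"computer": 0.40, "machine": 0.35, "tv": 0.25}

# Surprisal of the noun in bits: -log2 P(noun | classifier context).
surprisal_general = -math.log2(p_noun_given_general["computer"])
surprisal_specific = -math.log2(p_noun_given_machinery["computer"])

print(f"after general classifier:  {surprisal_general:.2f} bits")
print(f"after specific classifier: {surprisal_specific:.2f} bits")
```

The specific classifier lowers surprisal at the noun, which is the comprehension-side benefit Uniform Information Density predicts; the paper's finding is that production-side availability, not this benefit, drives speakers' choices.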
Linear Logic for Meaning Assembly
Semantic theories of natural language associate meanings with utterances by
providing meanings for lexical items and rules for determining the meaning of
larger units given the meanings of their parts. Meanings are often assumed to
combine via function application, which works well when constituent structure
trees are used to guide semantic composition. However, we believe that the
functional structure of Lexical-Functional Grammar is best used to provide the
syntactic information necessary for constraining derivations of meaning in a
cross-linguistically uniform format. It has been difficult, however, to
reconcile this approach with the combination of meanings by function
application. In contrast to compositional approaches, we present a deductive
approach to assembling meanings, based on reasoning with constraints, which
meshes well with the unordered nature of information in the functional
structure. Our use of linear logic as a `glue' for assembling meanings allows
for a coherent treatment of the LFG requirements of completeness and coherence
as well as of modification and quantification. Comment: 19 pages, uses lingmacros.sty, fullname.sty, tree-dvips.sty, latexsym.sty, requires the new version of Late
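A minimal glue derivation can illustrate how linear implication replaces function application over constituent trees. The sentence and the resource names g and f below are schematic inventions, not examples from the paper:

```latex
% Schematic glue-semantics derivation for "John sleeps" (illustrative):
% g = the subject's f-structure resource, f = the clause's resource.
\[
  \mathit{john} : g
  \qquad
  \lambda x.\,\mathit{sleep}(x) : g \multimap f
\]
% Consuming the premise g via linear modus ponens yields the clause meaning:
\[
  \mathit{sleep}(\mathit{john}) : f
\]
```

Because g is a linear resource, it must be consumed exactly once in a valid derivation, which is one way the LFG requirements of completeness and coherence can be enforced without relying on tree-guided function application.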