31 research outputs found

    Psychometric Predictive Power of Large Language Models

    Full text link
    Next-word probabilities from language models have been shown to successfully simulate human reading behavior. Building on this, we show that, interestingly, instruction-tuned large language models (LLMs) yield worse psychometric predictive power (PPP) for human reading behavior than base LLMs with equivalent perplexities. In other words, instruction tuning, which helps LLMs provide human-preferred responses, does not always make them human-like from the computational psycholinguistics perspective. In addition, we explore prompting methodologies for simulating human reading behavior with LLMs, showing that prompts reflecting a particular linguistic hypothesis lead LLMs to exhibit better PPP, though still worse than that of base LLMs. These findings highlight that recent instruction tuning and prompting do not offer better estimates than direct probability measurements from base LLMs in cognitive modeling. Comment: 8 pages
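
    As a rough illustration of the probability measurements the abstract refers to, the sketch below computes per-token surprisal (negative log probability), the quantity typically regressed against human reading times in PPP studies. It assumes the Hugging Face transformers library and uses GPT-2 as a stand-in for a base LLM; the model choice, the bits conversion, and the example sentence are illustrative assumptions, not details from the paper.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load a small base (non-instruction-tuned) LM as a stand-in.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(text: str):
    """Return (token, surprisal in bits) for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                  # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    ln2 = torch.log(torch.tensor(2.0))
    out = []
    for pos in range(1, ids.shape[1]):
        tok_id = int(ids[0, pos])
        # Surprisal = -log2 P(token | preceding context)
        surprisal = (-log_probs[0, pos - 1, tok_id] / ln2).item()
        out.append((tokenizer.decode(tok_id), surprisal))
    return out

for tok, s in token_surprisals("The horse raced past the barn fell."):
    print(f"{tok!r}\t{s:.2f} bits")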

    Informationist Science Fiction Theory and Informationist Science Fiction

    Get PDF
    Informationist Science Fiction theory provides a way of analysing science fiction texts and narratives in order to demonstrate, on an informational basis, the uniqueness of science fiction proper as a mode of fiction writing. The theoretical framework presented can be applied to all types of written texts, including non-fictional texts. In "Informationist Science Fiction Theory and Informationist Science Fiction" the author applies the theoretical framework and its specific methods and principles to various contemporary science fiction works, including works by William Gibson, Neal Stephenson and Vernor Vinge. The theoretical framework introduces a new information-theoretic re-framing of existing science fiction literary-theoretic posits such as Darko Suvin's novum, the mega-text as conceived of by Damien Broderick, and the work of Samuel R Delany in investigating the subjunctive mood in SF. An informational aesthetics of SF proper is established, and the influence of analytic philosophy - especially modal logic - is investigated. The materialist foundations of the metaphysical outlook of SF proper are examined with a view to elucidating the importance of the relationship between scientific materialism and SF. SF is presented as The Fiction of Veridical, Counterfactual and Heterogeneous Information.

    Context, content, and the occasional costs of implicature computation

    Get PDF
    The computation of scalar implicatures is sometimes costly relative to basic meanings. Among the costly computations are those that involve strengthening “some” to “not all” and strengthening inclusive disjunction to exclusive disjunction. The opposite is true for some other cases of strengthening, where the strengthened meaning is less costly than its corresponding basic meaning. These include conjunctive strengthenings of disjunctive sentences (e.g., free-choice inferences) and exactly-readings of numerals. Assuming that these are indeed all instances of strengthening via implicature/exhaustification, the puzzle is to explain why strengthening sometimes increases costs while at other times it decreases costs. I develop a theory of processing costs that makes no reference to the strengthening mechanism or to other aspects of the derivation of the sentence’s form/meaning. Instead, costs are determined by domain-general considerations of the grammar’s output, and in particular by aspects of the meanings of ambiguous sentences and particular ways they update the context. Specifically, I propose that when the hearer has to disambiguate between a sentence’s basic and strengthened meaning, the processing cost of any particular choice is a function of (i) a measure of the semantic complexity of the chosen meaning and (ii) a measure of how much relevant uncertainty it leaves behind in the context. I measure semantic complexity with Boolean Complexity in the propositional case and with semantic automata in the quantificational case, both of which give a domain-general measure of the minimal representational complexity needed to express the given meaning. I measure relevant uncertainty with the information-theoretic notion of entropy; this domain-general measure formalizes how ‘far’ the meaning is from giving a complete answer to the question under discussion, and hence gives an indication of how much representational complexity is yet to come. Processing costs thus follow from domain-general considerations of current and anticipated representational complexity. The results might also speak to functional motivations for having strengthening mechanisms in the first place. Specifically, exhaustification allows language users to use simpler forms than would be available without it to bot
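
    To make the entropy component of the proposal concrete, here is a toy calculation (my own illustration, not taken from the paper): given an assumed uniform prior over how many of three students passed, it compares the residual uncertainty about that question under discussion after updating with the basic meaning of "some" versus its strengthened "some but not all" reading. The world space and the uniform prior are illustrative assumptions.

from math import log2

worlds = [0, 1, 2, 3]                          # how many of 3 students passed
prior = {w: 1 / len(worlds) for w in worlds}   # assumed uniform prior

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def update(prior, proposition):
    """Condition the prior on a proposition (a set of worlds) and renormalize."""
    mass = sum(prior[w] for w in proposition)
    return {w: prior[w] / mass for w in proposition}

some_basic = {1, 2, 3}          # basic "some": compatible with "all"
some_strengthened = {1, 2}      # strengthened "some but not all"

print("residual H, basic 'some':        ", entropy(update(prior, some_basic)))         # ~1.58 bits
print("residual H, strengthened 'some': ", entropy(update(prior, some_strengthened)))  # 1.00 bit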

    The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study

    Get PDF
    Carminati MN, Knoeferle P. The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study. Presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva del Garda, Italy.

    Anticipatory prediction during online language processing

    Get PDF
    Most investigations of linguistic prediction focus on evidence of predictability benefits when comprehenders encounter expected input during reading. However, there remain several unresolved empirical issues that are important for the broader question of whether prediction plays a fundamental role during real-time language comprehension. These include whether there are processing costs for misprediction, what the contents of predictions are, and whether readers differ in the extent to which they engage in prediction. In six experiments, these issues were systematically investigated by presenting different groups of readers with predictable words and unpredictable alternatives that were either semantically related or unrelated, in constraining or non-constraining context conditions. The primary methodology was the recording of eye movements during natural reading for comprehension. Self-paced reading was also used to assess the contribution of stimulus presentation format to predictive processing. Across most experiments, there was evidence of early and late processing benefits for predictable completions in constraining contexts, which also extended to unpredictable completions that were semantically related. However, evidence of immediate processing costs for unexpected input that replaced readers’ predictions was more mixed and appeared to depend on a variety of linguistic and non-linguistic factors. Overall, these results provide some support for the idea that the language processor is a “prediction machine”, in line with general predictive accounts of cognitive functioning. The results also provide insight into the mechanisms underpinning prediction and provide opportunities for future research to refine theories of prediction.

    An Information theoretic approach to production and comprehension of discourse markers

    Get PDF
    Discourse relations are the building blocks of a coherent text. The most important linguistic elements for constructing these relations are discourse markers. The presence of a discourse marker between two discourse segments provides information on the inferences that need to be made to interpret the two segments as a whole (e.g., because marks a reason). This thesis presents a new framework for studying human communication at the level of discourse by adapting ideas from information theory. A discourse marker is viewed as a symbol with a measurable amount of relational information. This information is communicated by the writer of a text to guide the reader towards the right semantic decoding. To examine the information-theoretic account of discourse markers, we conduct empirical corpus-based investigations, offline crowd-sourced studies and online laboratory experiments. The thesis contributes to computational linguistics by proposing a quantitative meaning representation for discourse markers and showing its advantages over classic descriptive approaches. For the first time, we show that readers are highly sensitive to the fine-grained information encoded in a discourse marker, as estimated from its natural usage, and that writers use explicit marking for relations that are less expected in terms of linguistic and cognitive predictability. These findings open new directions for the implementation of advanced natural language processing systems.
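
    As a sketch of what a measurable amount of relational information might look like (my illustration; the counts below are invented, not taken from the thesis), the snippet estimates P(relation | marker) from toy co-occurrence counts and reports its entropy: a low-entropy marker such as "because" strongly constrains the discourse relation, while a high-entropy marker such as "and" leaves the relation largely open.

from math import log2

# Invented toy counts of (marker, relation) co-occurrences; a real study would
# estimate these from an annotated corpus such as the Penn Discourse Treebank.
counts = {
    "because": {"reason": 90, "contrast": 2,  "temporal": 8},
    "but":     {"reason": 5,  "contrast": 85, "temporal": 10},
    "and":     {"reason": 30, "contrast": 30, "temporal": 40},
}

def p_relation_given_marker(marker):
    total = sum(counts[marker].values())
    return {rel: c / total for rel, c in counts[marker].items()}

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

for marker in counts:
    h = entropy(p_relation_given_marker(marker))
    # Low entropy: the marker pins down the relation; high entropy: it is vague.
    print(f"H(relation | {marker!r}) = {h:.2f} bits")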

    Universal Prediction

    Get PDF
    In this dissertation I investigate the theoretical possibility of a universal method of prediction. A prediction method is universal if it is always able to learn what there is to learn from data: if it is always able to extrapolate given data about past observations to maximally successful predictions about future observations. The context of this investigation is the broader philosophical question of the possibility of a formal specification of inductive or scientific reasoning, a question that also touches on modern-day speculation about a fully automatized data-driven science. I investigate, in particular, a specific mathematical definition of a universal prediction method that goes back to the early days of artificial intelligence and that has a direct line to modern developments in machine learning. This definition essentially aims to combine all possible prediction algorithms. An alternative interpretation is that this definition formalizes the idea that learning from data is equivalent to compressing data. In this guise, the definition is often presented as an implementation and even as a justification of Occam's razor, the principle that we should look for simple explanations. The conclusions of my investigation are negative. I show that the proposed definition cannot be interpreted as a universal prediction method, as is exposed by a mathematical argument that the definition was actually intended to overcome. Moreover, I show that the suggested justification of Occam's razor does not work, and I argue that the relevant notion of simplicity as compressibility is problematic itself.
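
    The "learning as compression" idea mentioned here can be made concrete with a toy stand-in (my illustration; the definition discussed in the abstract is not computable, so zlib output length serves only as a crude proxy for description length): predict the next bit of a sequence by preferring whichever continuation compresses better.

import random
import zlib

def compressed_len(s: str) -> int:
    """Length in bytes of the zlib-compressed string (a rough description-length proxy)."""
    return len(zlib.compress(s.encode("utf-8"), 9))

def predict_next_bit(history: str) -> str:
    """Prefer the continuation ('0' or '1') with the shorter compressed description."""
    return min("01", key=lambda bit: compressed_len(history + bit))

random.seed(0)
regular = "01" * 50                                        # highly regular sequence
noise = "".join(random.choice("01") for _ in range(100))   # pseudorandom sequence
print(compressed_len(regular), compressed_len(noise))      # regularity compresses better
print(predict_next_bit(regular))                           # '0', continuing the pattern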

    Universal Prediction

    Get PDF
    In this thesis I investigate the theoretical possibility of a universal method of prediction. A prediction method is universal if it is always able to learn from data: if it is always able to extrapolate given data about past observations to maximally successful predictions about future observations. The context of this investigation is the broader philosophical question of the possibility of a formal specification of inductive or scientific reasoning, a question that also relates to modern-day speculation about a fully automatized data-driven science. I investigate, in particular, a proposed definition of a universal prediction method that goes back to Solomonoff (1964) and Levin (1970). This definition marks the birth of the theory of Kolmogorov complexity, and has a direct line to the information-theoretic approach in modern machine learning. Solomonoff's work was inspired by Carnap's program of inductive logic, and the more precise definition due to Levin can be seen as an explicit attempt to escape the diagonal argument that Putnam (1963) famously launched against the feasibility of Carnap's program. The Solomonoff-Levin definition essentially aims at a mixture of all possible prediction algorithms. An alternative interpretation is that the definition formalizes the idea that learning from data is equivalent to compressing data. In this guise, the definition is often presented as an implementation and even as a justification of Occam's razor, the principle that we should look for simple explanations. The conclusions of my investigation are negative. I show that the Solomonoff-Levin definition fails to unite two necessary conditions to count as a universal prediction method, as turns out to be entailed by Putnam's original argument after all; and I argue that this indeed shows that no definition can. Moreover, I show that the suggested justification of Occam's razor does not work, and I argue that the relevant notion of simplicity as compressibility is already problematic itself.
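
    The diagonal argument referred to above can be rendered schematically in code (my sketch, not Putnam's 1963 formulation): for any computable next-bit predictor, one can define a sequence that always emits the opposite of whatever the predictor forecasts, so the predictor is wrong at every step, no matter how it was built.

from typing import Callable, List

def diagonal_sequence(predictor: Callable[[List[int]], int], length: int) -> List[int]:
    """Build a bit sequence on which `predictor` mispredicts every single bit."""
    history: List[int] = []
    for _ in range(length):
        guess = predictor(history)
        history.append(1 - guess)      # always emit the bit that was not predicted
    return history

def majority_predictor(history: List[int]) -> int:
    """Example predictor: guess the majority bit seen so far (0 on an empty history)."""
    return 1 if sum(history) * 2 > len(history) else 0

print(diagonal_sequence(majority_predictor, 10))  # the predictor is wrong on every bit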

    Universal Prediction: A Philosophical Investigation

    Get PDF