Bootstrapping Lexical Choice via Multiple-Sequence Alignment
An important component of any generation system is the mapping dictionary, a
lexicon of elementary semantic expressions and corresponding natural language
realizations. Typically, labor-intensive knowledge-based methods are used to
construct the dictionary. We instead propose to acquire it automatically via a
novel multiple-pass algorithm employing multiple-sequence alignment, a
technique commonly used in bioinformatics. Crucially, our method leverages
latent information contained in multi-parallel corpora -- datasets that supply
several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of
computer-generated mathematical proofs, with good results on both a
per-component and overall-output basis. For example, in evaluations involving a
dozen human judges, our system produced output whose readability and
faithfulness to the semantic input rivaled that of a traditional generation
system.
Comment: 8 pages; to appear in the proceedings of EMNLP-200
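The abstract's core idea, aligning several verbalizations of the same semantics so that the spans which vary become candidate lexical realizations, can be illustrated with a toy pairwise alignment. This is only a sketch: the paper's actual method is a multiple-pass, multiple-sequence algorithm, and `align_verbalizations` plus the example sentences below are invented for illustration.

```python
from difflib import SequenceMatcher

def align_verbalizations(a: str, b: str):
    """Align two verbalizations token-by-token. Spans that match are
    shared scaffolding; spans that differ are candidate alternative
    realizations of the same underlying semantics."""
    ta, tb = a.split(), b.split()
    sm = SequenceMatcher(None, ta, tb)
    shared, varying = [], []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            shared.extend(ta[i1:i2])
        else:
            varying.append((" ".join(ta[i1:i2]), " ".join(tb[j1:j2])))
    return shared, varying

# Two parallel verbalizations of the same comparison:
shared, varying = align_verbalizations(
    "x is greater than y", "x exceeds y")
```

Here the alignment keeps `x` and `y` as shared material and pairs "is greater than" with "exceeds" as interchangeable realizations, which is exactly the kind of latent signal a multi-parallel corpus supplies.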
Attention, effort, and fatigue: Neuropsychological perspectives
Models of attention, effort, and fatigue are reviewed. Methods are discussed for measuring these phenomena from a neuropsychological and psychophysiological perspective. The following methodologies are included: (1) the autonomic measurement of cognitive effort and quality of encoding; (2) serial assessment approaches to neurophysiological assessment; and (3) the assessment of subjective reports of fatigue using multidimensional ratings and their relationship to neurobehavioral measures.
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
Shannon, in his seminal paper introducing information theory, divided
communication into three levels: technical, semantic, and effectiveness. While
the technical level is concerned with accurate reconstruction of transmitted
symbols, the semantic and effectiveness levels deal with the inferred meaning
and its effect on the receiver. Advances at the technical level, driven by
telecommunications, have produced breakthroughs such as the internet. Large
Language Models
(LLMs) make some progress towards the second goal, but the third level still
remains largely untouched. The third problem deals with predicting and
optimizing communication for desired receiver behavior. LLMs, despite
generalizing well across a wide range of tasks, are unable to solve this
problem. One reason for the underperformance could be a lack of
"behavior tokens" in LLMs' training corpora. Behavior tokens define receiver
behavior over a communication, such as shares, likes, clicks, purchases,
retweets, etc. While preprocessing data for LLM training, behavior tokens are
often removed from the corpora as noise. Therefore, in this paper, we make some
initial progress towards reintroducing behavior tokens in LLM training. The
trained models, other than showing similar performance to LLMs on content
understanding tasks, show generalization capabilities on behavior simulation,
content simulation, behavior understanding, and behavior domain adaptation.
Using a wide range of tasks on two corpora, we show results on all these
capabilities. We call these models Large Content and Behavior Models (LCBMs).
Further, to spur more research on LCBMs, we release our new Content Behavior
Corpus (CBC), a repository containing communicator, message, and corresponding
receiver behavior.
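One concrete, hypothetical picture of what "reintroducing behavior tokens" could look like is serializing each message together with its observed receiver behavior into a single training string. The `<content>`/`<behavior>` tags and per-metric tokens below are invented placeholders, not the paper's or the CBC's actual format.

```python
def to_training_example(message: str, behavior: dict) -> str:
    """Serialize a (message, receiver-behavior) pair into one LLM
    training string. Tag and token names are hypothetical; the point
    is only that behavior signals stay in the corpus instead of being
    stripped as noise during preprocessing."""
    btoks = " ".join(f"<{k}={v}>" for k, v in sorted(behavior.items()))
    return f"<content> {message} </content> <behavior> {btoks} </behavior>"

example = to_training_example("New model released!",
                              {"likes": 120, "shares": 14})
```

A model trained on such strings sees content and receiver behavior in one sequence, which is the prerequisite for behavior simulation and behavior understanding tasks.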
Sentence-Level Content Planning and Style Specification for Neural Text Generation
Building effective text generation systems requires three critical
components: content selection, text planning, and surface realization, and
traditionally they are tackled as separate problems. Recent all-in-one style
neural generation models have made impressive progress, yet they often produce
outputs that are incoherent and unfaithful to the input. To address these
issues, we present an end-to-end trained two-step generation model, where a
sentence-level content planner first decides on the keyphrases to cover as well
as a desired language style, followed by a surface realization decoder that
generates relevant and coherent text. For experiments, we consider three tasks
from domains with diverse topics and varying language styles: persuasive
argument construction from Reddit, paragraph generation for normal and simple
versions of Wikipedia, and abstract generation for scientific articles.
Automatic evaluation shows that our system can significantly outperform
competitive comparisons. Human judges further rate our system generated text as
more fluent and correct, compared to the generations by its variants that do
not consider language style.
Comment: Accepted as a long paper to EMNLP 201
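The two-step decomposition described above can be sketched as a pipeline: a planner picks keyphrases and a style label per sentence, and a realizer turns each plan into text. Everything below (the scoring, the style labels, the template realizer) is a toy stand-in for the paper's neural planner and decoder.

```python
def plan(keyphrase_candidates, style: str):
    """Toy sentence-level content planner: keep the top-2 scored
    keyphrases per sentence and attach the requested style label."""
    return [
        {"keyphrases": [kp for kp, _ in
                        sorted(cands, key=lambda x: -x[1])[:2]],
         "style": style}
        for cands in keyphrase_candidates
    ]

def realize(sentence_plans) -> str:
    """Toy surface realizer: one template per style (a neural decoder
    conditioned on the plan in the actual system)."""
    templates = {
        "simple": "This is about {}.",
        "formal": "We now turn to {}.",
    }
    return " ".join(
        templates[p["style"]].format(" and ".join(p["keyphrases"]))
        for p in sentence_plans
    )

plans = plan([[("solar power", 0.9), ("cost", 0.7), ("history", 0.1)]],
             style="simple")
text = realize(plans)
```

Separating the plan from the realization is what lets the same content selection surface in different styles, e.g. normal versus simple Wikipedia.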
ReaderBench, an Environment for Analyzing Text Complexity and Reading Strategies
Session: Educational Data Mining
ReaderBench is a multi-purpose, multilingual, and flexible environment that enables the assessment of a wide range of learners' productions and their manipulation by the teacher. ReaderBench assesses three main textual features, each empirically validated: cohesion, reading strategies, and textual complexity. ReaderBench covers a complete cycle, from the initial complexity assessment of reading materials and the assignment of texts to learners, to the capture of metacognitions reflected in learners' textual verbalizations and the evaluation of comprehension, thereby fostering the learner's self-regulation process.
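Cohesion-based assessment, the first of the three features, can be illustrated with a crude lexical-overlap proxy. ReaderBench itself computes cohesion with semantic models (e.g. LSA/LDA-based similarity), so the Jaccard measure and `text_cohesion` function below are only an illustrative stand-in.

```python
def cohesion(s1: str, s2: str) -> float:
    """Jaccard word overlap between two sentences -- a crude proxy for
    the semantic cohesion measures ReaderBench actually uses."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if (w1 | w2) else 0.0

def text_cohesion(sentences) -> float:
    """Mean cohesion over adjacent sentence pairs: higher values
    suggest a more tightly connected text."""
    scores = [cohesion(a, b) for a, b in zip(sentences, sentences[1:])]
    return sum(scores) / len(scores) if scores else 0.0

score = text_cohesion(["the cat sat on the mat",
                       "the cat then ran away",
                       "dogs prefer the yard"])
```

Per-pair cohesion scores like these can feed both textual-complexity estimates and the matching of reading materials to learners.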