Measuring Semantic Similarity: Representations and Methods
This dissertation investigates and proposes ways to quantify and measure semantic similarity between texts. The general approach relies on linguistic information at several levels: lexical, lexico-semantic, and syntactic. The approach starts by mapping texts onto structured representations that include this lexical, lexico-semantic, and syntactic information. The representation is then used as input to methods designed to measure the semantic similarity between texts based on the available linguistic information. While world knowledge is needed to properly assess the semantic similarity of texts, our approach does not use it, which is a weakness of the approach. We limit ourselves to answering the question of how successfully one can measure the semantic similarity of texts using linguistic information alone. The lexical information in the original texts is retained by using the words in the corresponding representations of the texts. Syntactic information is encoded using dependency trees, which explicitly represent the syntactic relations between words. Word-level semantic information is encoded either relatively, through word-to-word semantic similarity measures such as WordNet Similarity, or explicitly, using vectorial representations such as Latent Semantic Analysis (LSA). Several methods are studied to compare the representations, ranging from simple lexical overlap to more complex methods such as comparing semantic representations in vector spaces as well as syntactic structures. Furthermore, a few kernel models are proposed for use in combination with Support Vector Machine (SVM) classifiers when the semantic similarity problem is modeled as a classification task.
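The comparison methods in this abstract range from lexical overlap to vector-space and syntactic comparison. Below is a minimal sketch of one simple measure at each level, under some assumptions. Jaccard overlap stands in for lexical overlap. Cosine similarity over raw term-frequency vectors is a deliberately simple stand-in for LSA vectors. A Dice overlap over dependency triples stands in for syntactic comparison. The function names and the triple format `(relation, head, dependent)` are hypothetical and do not come from the dissertation.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization with edge punctuation stripped (a simplification)."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def lexical_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two texts' word sets, in [0, 1]."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (a stand-in for LSA vectors)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dependency_overlap(deps_a, deps_b) -> float:
    """Dice overlap of (relation, head, dependent) triples from some dependency parser."""
    sa, sb = set(deps_a), set(deps_b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


t1, t2 = "The cat sat on the mat.", "A cat sat on a mat."
print(lexical_overlap(t1, t2))  # 4 shared words out of 6 distinct, about 0.667
print(cosine_bow(t1, t2))
```

In a classification setting such as the SVM setup mentioned above, scores like these would typically serve as input features rather than as final judgments.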
Paraphrase concept and typology. A linguistically based and computationally oriented approach
In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that no existing characterization of paraphrasing is simultaneously comprehensive, linguistically based, and computationally tractable. We then define and delimit the concept on the basis of propositional content. Finally, we present a general, inclusive, and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.
To Tell The Truth: Language of Deception and Language Models
Text-based misinformation permeates online discourses, yet evidence of
people's ability to discern truth from such deceptive textual content is
scarce. We analyze novel data from a TV game show in which conversations in a
high-stakes environment between individuals with conflicting objectives result
in lies. We investigate the manifestation of potentially verifiable language
cues of deception in the presence of objective truth, a distinguishing feature
absent in previous text-based deception datasets. We show that there exists a
class of detectors (algorithms) whose truth detection performance is similar
to that of human subjects, even when the former access only the language
cues while the latter engage in conversations with complete access to all
potential sources of cues (language and audio-visual). Our model, built on a
large language model, employs a bottleneck framework to learn discernible cues
to determine truth, an act of reasoning in which human subjects often perform
poorly, even with incentives. Our model detects novel but accurate language
cues in many cases where humans failed to detect deception, opening up the
possibility of humans collaborating with algorithms and ameliorating their
ability to detect the truth.
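The abstract does not specify how the bottleneck framework works. Below is a minimal sketch of the generic concept-bottleneck pattern it alludes to, assuming a simple design. Raw text is first mapped to a small set of named, human-readable cue scores. A final linear layer then sees only those scores, so every prediction can be traced back to cues. The cue names, keyword detectors, weights, and bias are all hypothetical toy values, not the paper's LLM-based model.

```python
import math
from typing import Callable


def _words(text: str) -> set[str]:
    """Lowercased word set with edge punctuation stripped."""
    return {w.strip(".,;:!?\"'").lower() for w in text.split()}


# HYPOTHETICAL cue detectors: toy keyword rules standing in for learned or
# LLM-elicited cues. None of these come from the paper.
CUES: dict[str, Callable[[str], float]] = {
    "hedging": lambda t: float(bool(_words(t) & {"maybe", "probably", "perhaps"})),
    "negation": lambda t: float(bool(_words(t) & {"not", "never", "no"})),
    "long_answer": lambda t: float(len(t.split()) > 40),
}


def cue_vector(text: str) -> dict[str, float]:
    """Bottleneck layer: map text to a small set of named cue scores."""
    return {name: detect(text) for name, detect in CUES.items()}


def predict_lie(text: str, weights: dict[str, float], bias: float) -> tuple[float, dict[str, float]]:
    """Final logistic layer sees ONLY the cue scores, so each prediction is explainable by them."""
    cues = cue_vector(text)
    z = bias + sum(weights[name] * value for name, value in cues.items())
    return 1.0 / (1.0 + math.exp(-z)), cues


# HYPOTHETICAL weights; in practice they would be fit on labeled truthful/deceptive turns.
WEIGHTS = {"hedging": 1.2, "negation": 0.5, "long_answer": 0.8}
prob, cues = predict_lie("Maybe it was Tuesday, probably.", WEIGHTS, bias=-1.0)
print(round(prob, 3), cues)
```

The design point is that the returned `cues` dictionary doubles as the explanation. A human collaborator can inspect which cues fired instead of trusting an opaque score.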
Supporting students in the analysis of case studies for professional ethics education
Intelligent tutoring systems and computer-supported collaborative environments have been designed to enhance human learning in various domains. While a number of solid techniques have been developed in the Artificial Intelligence in Education (AIED) field to foster human learning in fundamental science domains, there is still a lack of evidence about how to support learning in so-called ill-defined domains that are characterized by the absence of formal domain theories, uncertainty about best solution strategies and teaching practices, and learners' answers represented through text and argumentation.
This dissertation investigates how to support students' learning in the ill-defined domain of professional ethics through a computer-based learning system. More specifically, it examines how to support students in the analysis of case studies, which is a common pedagogical practice in the ethics domain.
This dissertation describes our design considerations and the resulting system, called Umka. In Umka, learners analyze, individually and collaboratively, case studies that pose ethical or professional dilemmas. Umka provides various types of support to learners in the analysis task. In individual analysis, it provides various kinds of feedback on learners' arguments based on predefined system knowledge. In collaborative analysis, Umka fosters learners' interactions and self-reflection through system suggestions and a specifically designed visualization. The system suggestions give learners the chance to consider certain helpful arguments of their peers or to interact with certain helpful peers. The visualization highlights similarities and differences between the learners' positions and illustrates the learners' level of acceptance of each other's positions.
This dissertation reports on a series of experiments in which we evaluated the effectiveness of Umka's support features, and suggests several research contributions.
Through this work, it is shown that despite the ill-definedness of the ethics domain, and the consequent complications of text processing and domain modelling, it is possible to build effective tutoring systems for supporting students' learning in this domain. Moreover, the techniques developed through this research for the ethics domain can be readily extended to other ill-defined domains where argument, qualitative analysis, metacognition, and interaction over case studies are key pedagogical practices.
Analysis, optimization and development of an answer scoring system
The main contribution of this work is to analyze and describe the state-of-the-art performance of answer scoring systems in the SemEval-2013 task, and to continue the development of an answer scoring system (EHU-ALM) developed at the University of the Basque Country. Overall, this master's thesis focuses on finding configurations that improve results on the SemEval dataset. It does so by using attribute engineering techniques to find optimal feature subsets, and by trying different hierarchical configurations to compare their performance against the traditional one-versus-all approach. Altogether, we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing its architecture, and, on the other hand, to improve the system by adapting it to a hierarchical configuration. To build these new models, we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.
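The abstract mentions attribute engineering to find optimal feature subsets without naming the search method. Below is a sketch of one common option, greedy forward selection. It is offered as an illustration, not as the method used for EHU-ALM. The scoring function is a hypothetical toy. In practice it would be something like cross-validated accuracy of the answer-scoring classifier trained only on the candidate subset. The feature names are likewise invented.

```python
from typing import Callable, Optional, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[frozenset], float],
    max_features: Optional[int] = None,
) -> tuple[frozenset, float]:
    """Greedy forward selection: repeatedly add the single feature that most improves
    `score`, stopping when no candidate improves it or `max_features` is reached."""
    selected: frozenset = frozenset()
    best = score(selected)
    limit = len(features) if max_features is None else max_features
    while len(selected) < limit:
        candidates = [(score(selected | {f}), f) for f in features if f not in selected]
        if not candidates:
            break
        cand_score, cand = max(candidates)
        if cand_score <= best:
            break  # no remaining feature improves the score
        selected, best = selected | {cand}, cand_score
    return selected, best


# HYPOTHETICAL scorer: additive per-feature gains plus a penalty for two redundant features.
GAINS = {"lexical_overlap": 0.30, "lsa_cosine": 0.25, "length_ratio": 0.05, "noise": -0.02}


def toy_score(subset: frozenset) -> float:
    redundancy = 0.10 if {"lexical_overlap", "lsa_cosine"} <= subset else 0.0
    return 0.20 + sum(GAINS[f] for f in subset) - redundancy


chosen, value = forward_selection(list(GAINS), toy_score)
print(sorted(chosen), round(value, 2))
```

Greedy search evaluates O(n²) subsets rather than all 2ⁿ, at the cost of possibly missing feature combinations that only help jointly.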
From Discourse Structure To Text Specificity: Studies Of Coherence Preferences
To successfully communicate through text, a writer needs to organize information into an understandable and well-structured discourse for the targeted audience. This involves deciding when to convey general statements, when to elaborate on details, and gauging how much detail to convey, i.e., the level of specificity. This thesis explores the automatic prediction of text specificity, and whether the perception of specificity varies across different audiences.
We characterize text specificity from two aspects: the instantiation discourse relation, and the specificity of sentences and words. We identify characteristics of instantiation that signal a change of specificity between sentences. Features derived from these characteristics substantially improve the detection of the relation. Using instantiation sentences as the basis for training, we propose a semi-supervised system that predicts sentence specificity quickly and accurately. Furthermore, we present insights into the effect of underspecified words and phrases on the comprehension of text, and into the prediction of such words.
We show distinct preferences in specificity and discourse structure among different audiences. We investigate these distinctions in both cross-lingual and monolingual contexts. Cross-lingually, we identify discourse factors that significantly impact the quality of text translated from Chinese to English. Notably, a large portion of Chinese sentences are significantly more specific and need to be translated into multiple English sentences. We introduce a system using rich syntactic features to accurately detect such sentences. We also show that simplified text is more general and that specific sentences are more likely to need simplification. Finally, we present evidence that the perception of sentence specificity differs between male and female readers.
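The abstract describes a semi-supervised system for sentence specificity but gives no algorithmic detail. Below is a minimal sketch of one generic semi-supervised technique, self-training, and it is not the thesis's system. A few labeled seeds (standing in for labels derived from instantiation sentences) train a nearest-centroid classifier. Unlabeled sentences whose distance margin exceeds a threshold are then pseudo-labeled and added, round by round. The two-dimensional feature vectors, the margin value, and the 0 = general / 1 = specific encoding are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def centroid(vectors: list) -> list:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fit_centroids(labeled: list) -> dict:
    """One centroid per class; assumes both classes (0 and 1) appear in `labeled`."""
    return {y: centroid([x for x, lab in labeled if lab == y]) for y in (0, 1)}


def predict(cents: dict, x: Vector) -> int:
    """Nearest-centroid prediction: 0 = general, 1 = specific (ties go to 0)."""
    return 0 if math.dist(x, cents[0]) <= math.dist(x, cents[1]) else 1


def self_train(labeled, unlabeled, margin: float = 0.3, max_rounds: int = 5):
    """Add confidently pseudo-labeled examples each round until none qualify."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = fit_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            d0, d1 = math.dist(x, cents[0]), math.dist(x, cents[1])
            if abs(d0 - d1) >= margin:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                rest.append(x)  # too close to call; keep it unlabeled
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return fit_centroids(labeled), labeled, pool


seeds = [([0.1, 0.1], 0), ([0.9, 0.9], 1)]
cents, grown, leftover = self_train(seeds, [[0.15, 0.2], [0.8, 0.95], [0.5, 0.5]])
```

The margin threshold is the key design choice. Too low and early mistakes get amplified; too high and the unlabeled data is never used.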
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Large language models (LLMs) have demonstrated remarkable performance across
a wide array of NLP tasks. However, their efficacy is undermined by undesired
and inconsistent behaviors, including hallucination, unfaithful reasoning, and
toxic content. A promising approach to rectify these flaws is self-correction,
where the LLM itself is prompted or guided to fix problems in its own output.
Techniques leveraging automated feedback -- either produced by the LLM itself
or some external system -- are of particular interest as they are a promising
way to make LLM-based solutions more practical and deployable with minimal
human feedback. This paper presents a comprehensive review of this emerging
class of techniques. We analyze and taxonomize a wide array of recent work
utilizing these strategies, including training-time, generation-time, and
post-hoc correction. We also summarize the major applications of this strategy
and conclude by discussing future directions and challenges. Comment: Work in Progress. Version
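The post-hoc correction strategy in this survey can be sketched as a generate, critique, refine loop driven by automated feedback. The sketch below assumes a simple interface: three caller-supplied functions and an iteration cap. The "model", the external arithmetic checker, and the refiner are hypothetical stubs chosen so the loop runs without any LLM. They are not taken from the survey or from any specific library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feedback:
    ok: bool
    message: str = ""


def post_hoc_correct(
    prompt: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Feedback],
    refine: Callable[[str, str, str], str],
    max_iters: int = 3,
) -> tuple[str, list[str]]:
    """Generate a draft, then alternate automated critique and refinement until the
    critic accepts the draft or `max_iters` refinements have been made."""
    draft = generate(prompt)
    history = [draft]
    for _ in range(max_iters):
        feedback = critique(prompt, draft)
        if feedback.ok:
            break
        draft = refine(prompt, draft, feedback.message)
        history.append(draft)
    return draft, history


# HYPOTHETICAL stand-ins: a "model" that makes an arithmetic slip, an external
# checker that supplies automated feedback, and a refiner that applies it.
def fake_generate(prompt: str) -> str:
    return "2 + 2 = 5"


def arithmetic_checker(prompt: str, draft: str) -> Feedback:
    lhs, rhs = draft.split("=")
    expected = sum(int(term) for term in lhs.split("+"))
    return Feedback(True) if int(rhs) == expected else Feedback(False, str(expected))


def fake_refine(prompt: str, draft: str, message: str) -> str:
    return draft.split("=")[0] + "= " + message


final, history = post_hoc_correct("What is 2 + 2?", fake_generate, arithmetic_checker, fake_refine)
print(final)  # 2 + 2 = 4
```

Here the critic is an external tool, so feedback is reliable. With self-generated feedback, the same loop applies, but the critic can itself be wrong.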
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
Logical reasoning, i.e., deductively inferring the truth value of a
conclusion from a set of premises, is an important task for artificial
intelligence with wide potential impacts on science, mathematics, and society.
While many prompting-based strategies have been proposed to enable Large
Language Models (LLMs) to do such reasoning more effectively, they still appear
unsatisfactory, often failing in subtle and unpredictable ways. In this work,
we investigate the validity of instead reformulating such tasks as modular
neurosymbolic programming, which we call LINC: Logical Inference via
Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser,
translating premises and conclusions from natural language to expressions in
first-order logic. These expressions are then offloaded to an external theorem
prover, which symbolically performs deductive inference. Leveraging this
approach, we observe significant performance gains on FOLIO and a balanced
subset of ProofWriter for three different models in nearly all experimental
conditions we evaluate. On ProofWriter, augmenting the comparatively small
open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5
and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%,
respectively. When used with GPT-4, LINC scores 26% higher than CoT on
ProofWriter while performing comparably on FOLIO. Further analysis reveals
that although both methods on average succeed roughly equally often on this
dataset, they exhibit distinct and complementary failure modes. We thus provide
promising evidence for how logical reasoning over natural language can be
tackled through jointly leveraging LLMs alongside symbolic provers. All
corresponding code is publicly available at https://github.com/benlipkin/lin
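The LINC division of labor (LLM as semantic parser, symbolic prover for deduction) can be illustrated in miniature, with important simplifications. The LLM step is replaced by hand-written parser output. The external first-order prover is replaced by forward chaining over ground (variable-free) Horn rules. That is far weaker than a full first-order prover; for example, it cannot reason by contraposition. Negation is modeled by treating `~atom` as its own atom. The label names and example premises are hypothetical.

```python
Rule = tuple[frozenset, str]  # (premise atoms, conclusion atom)


def forward_chain(facts: set, rules: list) -> set:
    """Derive every atom reachable from `facts` by repeatedly firing rules to a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


def negate(atom: str) -> str:
    return atom[1:] if atom.startswith("~") else "~" + atom


def decide(facts: set, rules: list, conclusion: str) -> str:
    """Prove the conclusion, prove its negation, or report that neither follows."""
    known = forward_chain(facts, rules)
    if conclusion in known:
        return "True"
    if negate(conclusion) in known:
        return "False"
    return "Uncertain"


# Hand-written stand-in for what an LLM semantic parser might emit for the premises
# "All cats are mammals. Tom is a cat. No mammal is a reptile." (grounded to Tom).
facts = {"cat(tom)"}
rules = [
    (frozenset({"cat(tom)"}), "mammal(tom)"),
    (frozenset({"mammal(tom)"}), "~reptile(tom)"),
]
print(decide(facts, rules, "reptile(tom)"))  # False
```

Keeping deduction symbolic means that once the parse is correct, the verdict follows exactly. Errors can then be localized to the translation step.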
Large Language Models as Subpopulation Representative Models: A Review
Of the many commercial and scientific opportunities provided by large
language models (LLMs; including OpenAI's ChatGPT, Meta's LLaMA, and
Anthropic's Claude), one of the more intriguing applications has been the
simulation of human behavior and opinion. LLMs have been used to generate human
simulacra to serve as experimental participants, survey respondents, or other
independent agents, with outcomes that often closely parallel the observed
behavior of their genuine human counterparts. Here, we specifically consider
the feasibility of using LLMs to estimate subpopulation representative models
(SRMs). SRMs could provide an alternative or complementary way to measure public
opinion among demographic, geographic, or political segments of the population.
However, the introduction of new technology to the socio-technical
infrastructure does not come without risk. We provide an overview of behavior
elicitation techniques for LLMs, and a survey of existing SRM implementations.
We offer frameworks for the analysis, development, and practical implementation
of LLMs as SRMs, consider potential risks, and suggest directions for future
work.
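One way to analyze an LLM used as a subpopulation representative model is to compare, per segment, the distribution of simulated answers with observed survey answers. The review does not prescribe a metric. The sketch below assumes total variation distance (0 = identical distributions, 1 = disjoint) as one simple choice; KL divergence or, for ordinal scales, Wasserstein distance are common alternatives. The segment names and responses are hypothetical.

```python
from collections import Counter


def distribution(responses: list) -> dict:
    """Empirical distribution over answer categories."""
    counts = Counter(responses)
    n = sum(counts.values())
    return {answer: c / n for answer, c in counts.items()}


def total_variation(p: dict, q: dict) -> float:
    """Half the L1 distance between two categorical distributions."""
    keys = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def evaluate_srm(simulated: dict, observed: dict) -> dict:
    """Per-segment distance between LLM-simulated and observed answer distributions."""
    return {
        seg: total_variation(distribution(simulated[seg]), distribution(observed[seg]))
        for seg in observed
        if seg in simulated
    }


# HYPOTHETICAL data: simulated persona answers vs. survey answers for two segments.
simulated = {
    "segment_a": ["agree", "agree", "disagree", "agree"],
    "segment_b": ["disagree", "disagree"],
}
observed = {
    "segment_a": ["agree", "disagree", "disagree", "agree"],
    "segment_b": ["agree", "agree", "agree"],
}
print(evaluate_srm(simulated, observed))
```

Reporting the distance per segment, rather than pooled, matters here. An SRM can match the overall population while misrepresenting specific demographic, geographic, or political groups.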