
    From Texts to Prerequisites. Identifying and Annotating Propaedeutic Relations in Educational Textual Resources

    Prerequisite Relations (PRs) are dependency relations established between two distinct concepts, expressing which piece(s) of information a student has to learn first in order to understand a certain target concept. Such relations are among the most fundamental in education, playing a crucial role not only in the acquisition of new knowledge but also in novel applications of Artificial Intelligence to distance and e-learning. Indeed, resources annotated with such information could be used to develop automatic systems able to acquire and organize the knowledge embodied in educational resources, possibly fostering educational applications personalized to, e.g., students' needs and prior knowledge. The present thesis discusses the issues and challenges of identifying PRs in educational textual materials, with the purpose of building a shared understanding of the relation within the research community. To this aim, we present a methodology for dealing with prerequisite relations as established in educational textual resources, which aims to provide a systematic approach for uncovering PRs in textual materials, both when manually annotating and when automatically extracting them. The fundamental principles of our methodology guided the development of a novel framework for PR identification comprising three components, each tackling a different task: (i) an annotation protocol (PREAP), reporting the set of guidelines and recommendations for building PR-annotated resources; (ii) an annotation tool (PRET), supporting the creation of manually annotated datasets reflecting the principles of PREAP; (iii) an automatic PR learning method based on machine learning (PREL).
The main novelty of our methodology and framework lies in uncovering PRs from textual resources relying solely on the content of the instructional material: unlike other works, rather than creating decontextualized PRs, we acknowledge the presence of a PR between two concepts only if it emerges from the way they are presented in the text. By doing so, we anchor relations to the text while modelling the knowledge structure entailed in the resource. As an original contribution of this work, we explore whether the linguistic complexity of the text influences the manual identification of PRs. To this aim, we investigate the interplay between text and content in educational texts through a crowdsourcing experiment on concept sequencing. Our methodology values the content of educational materials, as it incorporates the evidence acquired from this investigation, which suggests that PR recognition is highly influenced by the way in which concepts are introduced in the resource and by the complexity of the texts. The thesis reports a case study covering every component of the PR framework, which produced a novel manually labelled PR-annotated dataset.
XXXIII cycle - Digital Humanities. Digital Technologies, Arts, Languages, Cultures and Communication - Languages, Cultures and Digital Technologies
Alzetta, Chiar
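Because a PR states which concept must be learned before another, a set of PRs can be read as a directed graph, and any valid learning sequence is then a topological order of that graph. The following is a minimal sketch of that reading using Kahn's algorithm; it is not the PREL method, and the function name `learning_order` and the toy calculus concepts are hypothetical illustrations.

```python
from collections import deque


def learning_order(prereqs):
    """Return a concept order in which every prerequisite precedes its target.

    prereqs: iterable of (prerequisite, target) pairs, e.g. ("limit", "derivative").
    Raises ValueError if the relations contain a cycle (no valid order exists).
    """
    graph = {}     # concept -> concepts that depend on it
    indegree = {}  # concept -> number of unmet prerequisites
    for pre, target in prereqs:
        graph.setdefault(pre, []).append(target)
        graph.setdefault(target, [])
        indegree.setdefault(pre, 0)
        indegree[target] = indegree.get(target, 0) + 1

    # Kahn's algorithm: repeatedly emit a concept with no unmet prerequisites.
    # sorted() only makes the output deterministic.
    queue = deque(sorted(c for c, d in indegree.items() if d == 0))
    order = []
    while queue:
        concept = queue.popleft()
        order.append(concept)
        for nxt in sorted(graph[concept]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("prerequisite relations contain a cycle")
    return order


pairs = [("limit", "derivative"), ("function", "derivative"), ("derivative", "integral")]
print(learning_order(pairs))  # ['function', 'limit', 'derivative', 'integral']
```

Since the thesis anchors PRs to the way concepts are presented in a specific text, a graph built this way would describe the knowledge structure of one resource rather than of the domain in general.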

    On Generative Models and Joint Architectures for Document-level Relation Extraction

    Biomedical text is being generated at a high rate in scientific publications and electronic health records. These documents hold a wealth of potentially useful biomedical information. Relation extraction (RE), the automated identification of structured relationships between entities within text, is a highly sought-after goal in biomedical informatics, offering the potential to unlock deeper insights and connections from this vast corpus of data. In this dissertation, we tackle this problem with a variety of approaches. We review the recent history of the field of document-level RE, in which several themes emerge. First, graph neural networks dominate the methods for constructing entity and relation representations. Second, clever uses of attention allow these constructions to focus on particularly relevant token and object (such as mention and entity) representations. Third, aggregating signal across mentions in entity-level RE is a key focus of research. Fourth, injecting additional signal, either by adding tokens to the text before encoding it with a language model (LM) or through additional learning tasks, boosts performance. Last, we explore an assortment of strategies for the challenging task of end-to-end entity-level RE. Of particular note are sequence-to-sequence (seq2seq) methods, which have become popular in the past few years. With the success of general-domain generative LMs, biomedical NLP researchers have trained a variety of these models on biomedical text under the assumption that they would be superior for biomedical tasks. As training such models is computationally expensive, we investigate whether they actually outperform generic models.
We test this assumption rigorously by comparing the performance of all major biomedical generative language models to that of their generic counterparts across multiple biomedical RE datasets, both in the traditional finetuning setting and in the few-shot setting. Surprisingly, we found that biomedical models tended to underperform their generic counterparts. However, small-scale biomedical instruction finetuning improved performance to a similar degree as larger-scale generic instruction finetuning. Zero-shot natural language processing (NLP) reduces the expense of annotating datasets and the specialized knowledge required to apply NLP methods. Large generative LMs trained to align with human objectives have demonstrated impressive zero-shot capabilities over a broad range of tasks. However, the effectiveness of these models in biomedical RE remains uncertain. To bridge this gap in understanding, we investigate how GPT-4 performs across several RE datasets. We experiment with the recent JSON generation features to produce structured output, using them in two alternative ways: by defining an explicit schema describing the relation structure, and by inferring the structure from the prompt itself. Our work is the first to study zero-shot biomedical RE across a variety of datasets. Overall, performance was lower than that of fully finetuned methods. Recall suffered in examples with more than a few relations. Entity mention boundaries were a major source of error, which future work could fruitfully address. In our previous work with generative LMs, we noted that RE performance decreased as the number of gold relations in an example increased. This observation aligns with the general pattern that the performance of recurrent neural network and transformer-based models tends to decrease with sequence length.
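Scoring structured generative output usually comes down to parsing the generated JSON into relation triples and comparing them with gold triples. The following is a minimal sketch under stated assumptions: the schema (`relations` with `head`, `relation`, `tail` fields), the function names and the toy drug example are hypothetical, and exact string matching is only one convention (the datasets in the thesis may score at the entity level instead).

```python
import json


def parse_relations(json_text):
    """Turn model JSON output into a set of (head, relation, tail) triples.

    Assumes a hypothetical schema:
    {"relations": [{"head": ..., "relation": ..., "tail": ...}, ...]}
    """
    data = json.loads(json_text)
    return {(r["head"], r["relation"], r["tail"]) for r in data.get("relations", [])}


def relation_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 over exact-match triples.

    predicted, gold: one set of triples per document, in the same order.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # correctly predicted relations
        fp += len(pred - ref)   # spurious relations
        fn += len(ref - pred)   # missed relations (these hurt recall)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the model returns only one of the two gold relations.
output = '{"relations": [{"head": "aspirin", "relation": "treats", "tail": "headache"}]}'
gold = {("aspirin", "treats", "headache"), ("aspirin", "interacts_with", "warfarin")}
print(relation_prf([parse_relations(output)], [gold]))
```

In the toy example precision is 1.0 but recall is 0.5: a model that stops emitting relations early loses recall, not precision, which is the pattern the abstract reports for examples with many relations.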
Generative LMs also do not identify textual mentions or group them into entities, which are valuable information extraction tasks in their own right. Therefore, in this age of generative methods, we revisit non-seq2seq methodology for biomedical RE. We adopt a sequential framework of named entity recognition (NER), followed by clustering mentions into entities, followed by relation classification (RC). As errors early in the pipeline necessarily cause downstream errors, and NER performance is near its ceiling, we focus on improving clustering. We match state-of-the-art (SOTA) performance in NER and substantially improve mention clustering performance by incorporating dependency parsing and gating string dissimilarity embeddings. Overall, we advance the field of biomedical RE in several ways. Our experiments with finetuned LMs show that biomedicine-specific models are unnecessary, freeing researchers to use SOTA generic LMs. The relatively high few-shot performance in these experiments also suggests that biomedical RE can be reasonably accessible, since small datasets are not difficult to construct. Our investigation into zero-shot RE shows that SOTA LMs can compete with fully finetuned smaller LMs. Together, these studies also reveal weaknesses of generative RE. Last, we show that non-generative RE methods still outperform generative methods in the fully finetuned setting.
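To make the NER, clustering, RC pipeline concrete, the following sketch shows the middle stage with a deliberately simple stand-in: mentions are merged when their character-trigram overlap (Jaccard similarity) passes a threshold, and the resulting entities are paired up as candidates for a relation classifier. This is not the thesis's clustering model (which adds dependency parsing and string dissimilarity embeddings); the function names, the 0.5 threshold and the gene mentions are hypothetical.

```python
from itertools import combinations


def char_trigrams(text):
    """Lowercased character trigrams; strings shorter than 3 map to themselves."""
    s = text.lower()
    return {s[i:i + 3] for i in range(max(1, len(s) - 2))}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_mentions(mentions, threshold=0.5):
    """Group mention strings into entities with union-find over similar pairs."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    grams = [char_trigrams(m) for m in mentions]
    for i, j in combinations(range(len(mentions)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


def candidate_pairs(entities):
    """Ordered entity-index pairs that a relation classifier would then label."""
    n = len(entities)
    return [(a, b) for a in range(n) for b in range(n) if a != b]


entities = cluster_mentions(["BRCA1", "brca1", "TP53"])
print(entities, candidate_pairs(entities))
```

Because the relation classifier only ever sees entity pairs, a wrong merge or split at this stage cannot be repaired later, which is why improving clustering is a sensible target once NER is near its ceiling.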

    The Prosody of Sluicing: Production Studies on Prosodic Disambiguation

    With this thesis, I investigate the prosodic realizations of different sluicing structures as produced by trained or untrained native speakers of English. Sluicing is a subtype of ellipsis in which the major part of a wh-question has been elided, leaving only a wh-remnant behind (Ross, 1969). It follows that sluicing can be ambiguous if the wh-remnant has more than one possible antecedent in the preceding unelided clause. If one of these possible antecedents is located within an island to extraction, the respective sluicing structure is called complex sluicing (Konietzko, Radó, & Winkler, submitted; Ross, 1969; Merchant, 2001). The perception of sluicing, especially simple sluicing, has been examined to some extent (Frazier & Clifton, 1998; Carlson, Dickey, Frazier, & Clifton, 2009), with the finding that listeners prefer a prosodically or syntactically focused NP as the antecedent of an ambiguous wh-remnant. However, the prosodic production side has not been empirically investigated to date. With this thesis, I therefore explore the relationship between prosody and the disambiguation of different sluicing structures in spoken language. In three production studies, I investigate how various sluicing structures with different antecedent types are produced by speakers who are either trained or untrained with respect to the ambiguity of the target items and to prosody as a disambiguation technique. I present the results of a pilot production study that examined globally ambiguous simple sluicing structures with contextual disambiguation, and of two production studies that examined temporarily ambiguous simple and complex sluicing structures with morphological disambiguation. Four preceding acceptability judgment studies ensured that no additional factors interfered with the prosodic realizations of the different sluicing structures.
The three production studies found that both trained and untrained speakers use prosodic prominence as a disambiguating factor to emphasize which NP serves as the antecedent of a contextually or morphologically disambiguated simple or complex sluicing structure. However, an early, sentence-initial NP is disambiguated more frequently than a late, sentence-final NP, by both trained and untrained speakers. In complex sluicing, only a sentence-initial NP is prosodically disambiguated, and only by trained speakers. Moreover, trained speakers generally use prosody as a disambiguation technique more frequently, and they produce stronger prosodic cues than untrained speakers. With this thesis, I thus show that prosody, in the form of prosodic prominence, is used by native speakers of English to indicate the meaning of an information-structurally triggered ambiguity. This finding lends further support to Romero (1998), Frazier and Clifton (1998) and Carlson et al. (2009), who argue that a constituent with prosodic focus is preferably taken as the antecedent of the wh-remnant. It also supports Remmele, Schopper, Winkler, and Hörnig (forthcoming 2019), who found that even untrained speakers use prosodic phrasing to resolve a structurally ambiguous word sequence. Furthermore, I contradict Wasow (2015) and Piantadosi, Tily, and Gibson (2012), who argue that one form of disambiguation suffices, rendering additional prosodic cues redundant. The results of this thesis thus contribute significantly to research on the prosody of sluicing and on prosodic disambiguation in general.
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project number 19864742

    Interpreting Time in Text Summarizing Text with Time


    The Circle of Meaning: From Translation to Paraphrasing and Back

    The preservation of meaning between inputs and outputs is perhaps the most ambitious and often the most elusive goal of systems that attempt to process natural language. Nowhere is this goal more obviously important than in machine translation and paraphrase generation. Preserving meaning between input and output is paramount for both, the monolingual vs. bilingual distinction notwithstanding. In this thesis, I present a novel, symbiotic relationship between these two tasks that I term the "circle of meaning". Today's statistical machine translation (SMT) systems require high-quality human translations for parameter tuning, in addition to large bitexts for learning the translation units. This parameter tuning usually involves generating translations at different points in the parameter space and obtaining feedback, against human-authored reference translations, on how good those translations are. This feedback then dictates which point in the parameter space should be explored next. To measure this feedback, it is generally considered wise to have multiple (usually 4) reference translations, to avoid unfairly penalizing translation hypotheses; such penalization could easily happen given the large number of ways in which a sentence can be translated from one language to another. However, this reliance on multiple reference translations creates a problem, since they are labor-intensive and expensive to obtain. Therefore, most current MT datasets contain only a single reference. This leads to the problem of reference sparsity, the primary open problem that I address in this dissertation, which has a serious effect on the SMT parameter tuning process. Bannard and Callison-Burch (2005) were the first to provide a practical connection between phrase-based statistical machine translation and paraphrase generation. However, their technique is restricted to generating phrasal paraphrases.
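Why extra references reduce unfair penalization can be seen in a BLEU-style modified n-gram precision, assumed here as the tuning feedback metric: each hypothesis n-gram is credited up to the largest count it has in any single reference, so adding references can only add credit. The sketch below omits BLEU's brevity penalty and the geometric mean over n-gram orders; the function names and sentences are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, references, n=1):
    """BLEU-style modified n-gram precision against one or more references.

    Each hypothesis n-gram count is clipped to the maximum count of that
    n-gram in any single reference.
    """
    hyp = ngrams(hypothesis, n)
    if not hyp:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp.items())
    return matched / sum(hyp.values())


hyp = "the cat sat on the mat".split()
ref1 = "a cat was sitting on the mat".split()
ref2 = "the cat sat on a rug".split()
print(clipped_precision(hyp, [ref1]), clipped_precision(hyp, [ref1, ref2]))
```

With only `ref1`, the legitimate word "sat" earns no credit (4 of 6 unigrams match); adding `ref2` raises the score to 5 of 6. With a single reference per sentence, such valid variation is penalized, which is the reference sparsity problem the thesis targets.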
I build upon their approach and extend a phrasal paraphrase extractor into a sentential paraphraser with extremely broad coverage. The novelty of this extension lies in further strengthening the connection between statistical machine translation and paraphrase generation: whereas Bannard and Callison-Burch relied on SMT machinery only to extract phrasal paraphrase rules and stopped there, I take it a few steps further and build a full English-to-English SMT system. This system can, as expected, "translate" any English input sentence into a new English sentence with the same degree of meaning preservation that exists in a bilingual SMT system. In fact, being a state-of-the-art SMT system, it is able to generate n-best "translations" for any given input sentence. This sentential paraphraser, built almost entirely from existing SMT machinery, represents the first 180 degrees of the circle of meaning. To complete the circle, I describe a novel connection in the other direction. I claim that the sentential paraphraser, once built in this fashion, can provide a solution to the reference sparsity problem and can hence be used to improve the performance of a bilingual SMT system. I discuss two different instantiations of the sentential paraphraser and show several results that provide empirical validation for this connection.
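The phrasal starting point, Bannard and Callison-Burch's pivot idea, scores a paraphrase e2 of an English phrase e1 by summing over foreign phrases f that both translate to: p(e2 | e1) ≈ Σ_f p(e2 | f) · p(f | e1). The following is a minimal sketch of that formula over toy phrase tables; the probabilities and the function name `pivot_paraphrases` are made up for illustration, and the sketch does not attempt the full English-to-English SMT system described above.

```python
from collections import defaultdict


def pivot_paraphrases(p_f_given_e, p_e_given_f, phrase):
    """Rank paraphrases of `phrase` by pivoting through foreign phrases.

    Implements p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1),
    skipping e2 == e1 (a phrase is not its own paraphrase).
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(phrase, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != phrase:
                scores[e2] += p_e2 * p_f
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy phrase tables (invented probabilities).
p_f_given_e = {"under control": {"unter Kontrolle": 0.9, "im Griff": 0.1}}
p_e_given_f = {
    "unter Kontrolle": {"under control": 0.6, "in check": 0.4},
    "im Griff": {"under control": 0.5, "in hand": 0.5},
}
print(pivot_paraphrases(p_f_given_e, p_e_given_f, "under control"))
```

Here "in check" scores about 0.36 and "in hand" 0.05; in a real system both tables would be estimated from word-aligned bitexts.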

    The Role of Linguistics in Probing Task Design

    Over the past decades, natural language processing has evolved from a niche research area into a fast-paced and multifaceted discipline that attracts thousands of contributions from academia and industry and feeds into real-world applications. Despite recent successes, natural language processing models still struggle to generalize across domains, suffer from biases, and lack transparency. Aiming to better understand how and why modern NLP systems make their predictions for complex end tasks, a line of research called probing attempts to interpret the behavior of NLP models using basic probing tasks. Linguistic corpora are a natural source of such tasks, and linguistic phenomena such as part of speech, syntax and role semantics are often used in probing studies. The goal of probing is to find out what information can be easily extracted from a pre-trained NLP model or representation. To ensure that the information is extracted from the NLP model and not learned during the probing study itself, probing models are kept as simple and transparent as possible, which exposes and amplifies conceptual inconsistencies between NLP models and linguistic resources. In this thesis, we investigate how linguistic conceptualization can affect probing models, setups and results. In Chapter 2, we investigate the gap between the targets of classical type-level word embedding models such as word2vec and the items of lexical resources and similarity benchmarks. We show that the lack of conceptual alignment between word embedding vocabularies and lexical resources penalizes the word embedding models in both benchmark-based evaluation and our novel resource-based evaluation scenario. We demonstrate that simple preprocessing techniques such as lemmatization and POS tagging can partially mitigate the issue, leading to a better match between word embeddings and lexicons. Linguistics often has more than one way of describing a certain phenomenon.
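A probe in this sense is a small classifier trained on frozen representations, kept simple so that it reads information out rather than learning the task itself. The following is a minimal sketch of a logistic-regression probe trained with plain stochastic gradient descent; the 2-dimensional vectors and binary labels are toy stand-ins for real model embeddings and a linguistic property, and the function names are hypothetical.

```python
import math


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe on frozen representations.

    features: equal-length float vectors (e.g. embeddings taken from a
    pretrained model without updating it); labels: 0/1 targets.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            grad = p - y                    # d(log-loss)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def probe_accuracy(w, b, features, labels):
    """Share of examples whose predicted side of the hyperplane matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


# Toy "embeddings" where the property is linearly encoded in the first dimension.
X = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.3], [-0.8, -0.2]]
y = [1, 1, 0, 0]
w, b = train_linear_probe(X, y)
print(probe_accuracy(w, b, X, y))
```

A real probing study would measure accuracy on held-out data; the point of the linear form is that high accuracy then suggests the property is already (linearly) present in the representation.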
In Chapter 3, we conduct an extensive study of the effects of linguistic formalism on probing modern pre-trained contextualized encoders such as BERT. We use role semantics as an excellent example of a data-rich, multi-framework phenomenon. We show that the choice of linguistic formalism can affect the results of probing studies, and we deliver additional insights into the impact of dataset size, domain and task architecture on probing. Apart from mere labeling choices, linguistic theories might differ in the very way they conceptualize the task. Whereas mainstream NLP has treated semantic roles as a categorical phenomenon, an alternative, prominence-based view opens new opportunities for probing. In Chapter 4, we investigate prominence-based probing models for role semantics, including semantic proto-roles and our novel regression-based role probe. Our results indicate that pre-trained language models like BERT might encode argument prominence. Finally, we propose an operationalization of the thematic role hierarchy, a widely used linguistic tool for describing the syntactic behavior of verbs, and show that thematic role hierarchies can be extracted from text corpora and transfer cross-lingually. The results of our work demonstrate the importance of linguistic conceptualization for probing studies and highlight the dangers and opportunities associated with using linguistics as a metalanguage for NLP model interpretation.

    Short Answer Assessment in Context: The Role of Information Structure

    Short Answer Assessment (SAA), the computational task of judging the appropriateness of an answer to a question, has received much attention in recent years (cf., e.g., Dzikovska et al. 2013; Burrows et al. 2015). Most researchers have approached the problem as one similar to paraphrase recognition (cf., e.g., Brockett & Dolan 2005) or textual entailment (Dagan et al., 2006), where the answer to be evaluated is aligned to another available utterance, such as a target answer, in a sufficiently abstract way to capture form variation. While this is a reasonable strategy, it fails to take into account the explicit context of an answer: the question. In this thesis, we attempt to change this situation by investigating the role of Information Structure (IS, cf., e.g., Krifka 2007) in SAA. The basic assumption we adopt from IS is that the content of a linguistic expression is structured in a non-arbitrary way depending on its context (here: the question), and thus it is possible to predetermine to some extent which part of the expression’s content is relevant. In particular, we adopt the Question Under Discussion (QUD) approach advanced by Roberts (2012), in which the information structure of an answer is determined by an explicit or implicit question in the discourse. We proceed by first introducing the reader to the necessary prerequisites in chapters 2 and 3. Since this is a computational linguistics thesis inspired by theoretical linguistic research, we provide an overview of relevant work in both areas, discussing SAA and IS in sufficient detail, as well as existing attempts at annotating Information Structure in corpora. Having provided enough background to understand the remainder of the thesis, we then discuss which IS notions and dimensions are most relevant to our goal.
We compare the given/new distinction (information status) to the focus/background distinction and conclude that the latter is better suited to our needs, as it captures requested information, which can be either given or new in the context. In chapter 4, we introduce the empirical basis of this work, the Corpus of Reading Comprehension Exercises in German (CREG, Ott, Ziai & Meurers 2012). We outline how CREG, as a task-based corpus, is particularly suited to the analysis of language in context, and how it thus forms the basis of our efforts in SAA and focus detection. Complementing this empirical basis, we present in chapter 5 the SAA system CoMiC, which is used to integrate focus into SAA in chapter 8. Chapter 6 then delves into the creation of a gold standard for automatic focus detection. We describe the desiderata for such a gold standard and how a subset of the CREG corpus is chosen for manual focus annotation. Having established these prerequisites, we present in detail our novel annotation scheme for focus and its intrinsic evaluation in terms of inter-annotator agreement. We also discuss explorations of using crowdsourcing for focus annotation. After establishing the data basis, we turn to the task of automatic focus detection in short answers in chapter 7. We first define the computational task as classifying whether a given word of an answer is focused or not. We experiment with several groups of features and explain in detail the motivation for each: syntax and lexis of the question and the answer, positional features, and givenness features, taking into account both question and answer properties. Using the adjudicated gold standard established in chapter 6, we show that a word-based classifier using these features detects focus robustly, outperforming several baselines.
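Framing focus detection as word-level classification means turning each answer token into a feature record. The following is a minimal sketch of a few feature types named above (question lexis, position, givenness), not the thesis's actual feature set; the German wh-word list, the function names and the example sentence are hypothetical, and plain token identity stands in for the lemma and parse information a real system would use. A naive "new material is focused" baseline is included for comparison only.

```python
# Hypothetical list of German wh-words, used only to expose the question type.
WH_WORDS = {"wer", "was", "wo", "wann", "warum", "wie", "welche",
            "welcher", "welches", "wohin", "woher"}


def focus_features(question, answer):
    """One feature dict per answer token for a focused / not-focused classifier.

    question, answer: lists of lowercased tokens (a simplifying assumption;
    real systems would compare lemmas and use syntactic parses).
    """
    q_words = set(question)
    wh = next((w for w in question if w in WH_WORDS), None)
    features = []
    for i, word in enumerate(answer):
        features.append({
            "word": word,
            "relative_position": i / max(len(answer) - 1, 1),
            "given_in_question": word in q_words,  # givenness feature
            "question_wh": wh,                     # lexis of the question
        })
    return features


def givenness_baseline(features):
    """Naive heuristic: mark every token not given in the question as focused."""
    return [not f["given_in_question"] for f in features]


feats = focus_features("wo wohnt anna".split(), "anna wohnt in berlin".split())
print(givenness_baseline(feats))
```

For "Wo wohnt Anna?" the heuristic marks "in berlin" as focused, which happens to be right; because focus can also fall on given material, such a heuristic is expected to fail elsewhere, which is why the thesis treats focus and givenness as distinct.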
In chapter 8, we describe the integration of focus information into SAA, which is both an extrinsic testbed for focus annotation and detection per se and the computational task we originally set out to advance. We show that there are several possible ways of integrating focus information into an alignment-based SAA system, and we discuss the advantages and disadvantages of each. We also experiment with using focus vs. givenness in alignment before concluding that a combination of both yields superior overall performance. Finally, chapter 9 presents a summary of our main research findings along with the contributions of this thesis. We conclude that analyzing focus in authentic data is not only possible but necessary for a) developing context-aware SAA approaches and b) grounding and testing linguistic theory. We give an outlook on where future research needs to go and which particular avenues could be explored.
Short Answer Assessment (SAA), the computational linguistics task of assessing the appropriateness of an answer to a question, has been studied extensively in recent years (see, e.g., Dzikovska et al. 2013; Burrows et al. 2015). The problem is mostly treated analogously to paraphrase recognition (see, e.g., Brockett & Dolan 2005) or textual entailment (Dagan et al., 2006), by comparing the answer to be assessed with a reference answer. In principle this is a sensible approach, but it ignores the explicit context of an answer: the question. This thesis presents an approach to changing this state of research by investigating the role of information structure (IS, see, e.g., Krifka 2007) in SAA.
The approach rests on the fundamental assumption of IS that the content of a linguistic expression is structured in a particular way by its context (here: the question), and that one can therefore predict to a certain degree which part of the expression's content is relevant. In particular, the Question Under Discussion (QUD) approach (Roberts, 2012) is adopted, in which the information structure of an answer is determined by an explicit or implicit question in the discourse. Chapters 2 and 3 first introduce the reader to the research areas relevant to this dissertation. Since this is a computational linguistics thesis inspired by theoretical linguistic research, both SAA and IS are discussed in sufficient depth for the purposes of the thesis, along with an overview of current approaches to annotating IS categories. Next, the thesis discusses which IS notions and distinctions are central to its goals: a comparison of the given/new distinction and the focus/background distinction shows that the latter is the more relevant criterion, since it captures requested information, which can be either given or new in the context. Chapter 4 presents the empirical basis of this work, the Corpus of Reading Comprehension Exercises in German (CREG, Ott, Ziai & Meurers 2012). It explains why a task-based corpus such as CREG is particularly suited to the linguistic analysis of language in context, and why it therefore forms the basis for the investigations of SAA and focus analysis presented in this thesis. Chapter 5 presents the SAA system CoMiC (Meurers, Ziai, Ott & Kopp, 2011b), which is used to integrate focus into SAA in Chapter 8. Chapter 6 deals with annotating a corpus for the purpose of manual and automatic focus analysis.
It discusses which criteria an approach to focus annotation can sensibly build on, before a new annotation scheme is presented and applied to a subset of CREG. The annotation approach is successfully validated intrinsically, and, in addition to expert annotation, a crowdsourcing experiment on focus annotation is described. Having established the data basis, Chapter 7 turns to automatic focus detection in answers. After an overview of previous work, it first discusses which relevant properties of questions and answers can be used in an automatic approach. This is followed by the description of a word-based model for focus detection, which incorporates syntactic and lexical features of the question and the answer and clearly outperforms several baselines in classification accuracy. Chapter 8 presents the integration of focus information into SAA using the CoMiC system, which serves both as an extrinsic validation of manual and automatic focus analysis and as the computational linguistics task to which this thesis contributes. Focus is integrated into CoMiC as a filter for aligning learner answers with target answers, and this configuration is used to investigate the influence of manual and automatic focus annotation, with positive results. It is also shown that, given reliable focus information, a combination of focus and givenness yields better results than either category can achieve in isolation. Finally, Chapter 9 again summarizes the content of the thesis and highlights its main contributions. The conclusion is that focus analysis in authentic data is both possible and necessary in order to a) incorporate context into SAA and b) validate and test linguistic theories of IS.
Based on the results, several possible directions for future research are outlined.
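The idea of using focus as a filter on alignment, combined with givenness, can be illustrated with a deliberately reduced scorer: target-answer tokens given in the question are ignored, and, when focus information is available, only focused target tokens count. The following is a minimal sketch, not CoMiC itself; it matches exact tokens only (a fuller system would typically also align lemmas, synonyms and similar units), and the function name and example sentences are hypothetical.

```python
def alignment_score(learner, target, question, target_focus=None):
    """Share of relevant target-answer tokens that also occur in the learner answer.

    Givenness filter: target tokens that appear in the question are ignored.
    Focus filter: if target_focus is given, only those target tokens count.
    All inputs are lowercased token lists (target_focus: a set of tokens).
    """
    given = set(question)
    learner_set = set(learner)
    candidates = [t for t in target if t not in given]
    if target_focus is not None:
        candidates = [t for t in candidates if t in target_focus]
    if not candidates:
        return 0.0
    return sum(t in learner_set for t in candidates) / len(candidates)


question = "wo wohnt anna".split()
target = "anna wohnt in berlin mit ihrer schwester".split()
learner = "sie wohnt in berlin".split()
print(alignment_score(learner, target, question))                                  # givenness only
print(alignment_score(learner, target, question, target_focus={"in", "berlin"}))  # plus focus
```

With givenness alone, the learner answer matches 2 of 5 new target tokens (0.4), because the unrequested "mit ihrer schwester" counts against it; restricting alignment to the focused part "in berlin" gives 1.0. This mirrors the abstract's point that the question determines which part of an answer matters.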