From Texts to Prerequisites. Identifying and Annotating Propaedeutic Relations in Educational Textual Resources
Prerequisite Relations (PRs) are dependency relations established between two distinct concepts, expressing which piece(s) of information a student has to learn first in order to understand a certain target concept. Such relations are among the most fundamental in education, playing a crucial role not only in the acquisition of new knowledge but also in novel applications of Artificial Intelligence to distance and e-learning. Indeed, resources annotated with such information could be used to develop automatic systems able to acquire and organize the knowledge embodied in educational resources, possibly fostering educational applications personalized to, e.g., students' needs and prior knowledge.
The present thesis discusses the issues and challenges of identifying PRs in educational textual materials, with the purpose of building a shared understanding of the relation among the research community. To this aim, we present a methodology for dealing with prerequisite relations as established in educational textual resources, which provides a systematic approach for uncovering PRs in textual materials, both when manually annotating and when automatically extracting them. The fundamental principles of our methodology guided the development of a novel framework for PR identification comprising three components, each tackling a different task: (i) an annotation protocol (PREAP), reporting the set of guidelines and recommendations for building PR-annotated resources; (ii) an annotation tool (PRET), supporting the creation of manually annotated datasets reflecting the principles of PREAP; (iii) an automatic PR learning method based on machine learning (PREL). The main novelty of our methodology and framework lies in the fact that we propose to uncover PRs from textual resources relying solely on the content of the instructional material: unlike other works, rather than creating de-contextualised PRs, we acknowledge the presence of a PR between two concepts only if it emerges from the way they are presented in the text. By doing so, we anchor relations to the text while modelling the knowledge structure entailed in the resource.
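The notion of a PR as a dependency between concepts can be made concrete with a small sketch. This is not part of the PREAP/PRET/PREL framework; it is only an illustration, with a hypothetical concept map, of how PRs form a directed graph whose topological order yields a valid study sequence:

```python
from graphlib import TopologicalSorter

# Hypothetical concept map: each concept maps to its prerequisites,
# i.e. the concepts a student must learn first.
prerequisites = {
    "probability": set(),
    "random variable": {"probability"},
    "expectation": {"random variable"},
    "variance": {"expectation"},
}

# Any topological order of the PR graph is one valid study sequence.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

In this chain-shaped example the order is fully determined; real concept maps branch, and any topological order respects every PR.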
As an original contribution of this work, we explore whether the linguistic complexity of the text influences the task of manually identifying PRs. To this aim, we investigate the interplay between text and content in educational texts through a crowd-sourcing experiment on concept sequencing. Our methodology values the content of educational materials: it incorporates the evidence acquired from this investigation, which suggests that PR recognition is highly influenced by the way concepts are introduced in the resource and by the complexity of the texts.
The thesis reports a case study dealing with every component of the PR framework, which produced a novel manually-labelled PR-annotated dataset.
XXXIII CICLO - DIGITAL HUMANITIES. TECNOLOGIE DIGITALI, ARTI, LINGUE, CULTURE E COMUNICAZIONE - Lingue, culture e tecnologie digitali
Alzetta, Chiar
On Generative Models and Joint Architectures for Document-level Relation Extraction
Biomedical text is being generated at a high rate in scientific literature publications and electronic health records. Within these documents lies a wealth of potentially useful information in biomedicine. Relation extraction (RE), the process of automating the identification of structured relationships between entities within text, represents a highly sought-after goal in biomedical informatics, offering the potential to unlock deeper insights and connections from this vast corpus of data. In this dissertation, we tackle this problem with a variety of approaches.
We review the recent history of the field of document-level RE. Several themes emerge. First, graph neural networks dominate the methods for constructing entity and relation representations. Second, clever uses of attention allow these constructions to focus on particularly relevant token and object (such as mention and entity) representations. Third, aggregation of signal across mentions in entity-level RE is a key focus of research. Fourth, the injection of additional signal, by adding tokens to the text prior to encoding via a language model (LM) or through additional learning tasks, boosts performance. Last, we explore an assortment of strategies for the challenging task of end-to-end entity-level RE.
Of particular note are sequence-to-sequence (seq2seq) methods that have become particularly popular in the past few years. With the success of general-domain generative LMs, biomedical NLP researchers have trained a variety of these models on biomedical text under the assumption that they would be superior for biomedical tasks. As training such models is computationally expensive, we investigate whether they outperform generic models. We test this assumption rigorously by comparing performance of all major biomedical generative language models to the performances of their generic counterparts across multiple biomedical RE datasets, in the traditional finetuning setting as well as in the few-shot setting. Surprisingly, we found that biomedical models tended to underperform compared to their generic counterparts. However, we found that small-scale biomedical instruction finetuning improved performance to a similar degree as larger-scale generic instruction finetuning.
Zero-shot natural language processing (NLP) offers savings on the expenses associated with annotating datasets and the specialized knowledge required for applying NLP methods. Large, generative LMs trained to align with human objectives have demonstrated impressive zero-shot capabilities over a broad range of tasks. However, the effectiveness of these models in biomedical RE remains uncertain. To bridge this gap in understanding, we investigate how GPT-4 performs across several RE datasets. We experiment with the recent JSON generation features to generate structured output, which we use alternately by defining an explicit schema describing the relation structure, and inferring the structure from the prompt itself. Our work is the first to study zero-shot biomedical RE across a variety of datasets. Overall, performance was lower than that of fully-finetuned methods. Recall suffered in examples with more than a few relations. Entity mention boundaries were a major source of error, which future work could fruitfully address.
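The explicit-schema variant described above can be sketched as follows. This is an illustration only, not the dissertation's actual prompts or schema: the schema layout, the example relation, and the model's raw output string are all hypothetical, and the call to the LM itself is omitted.

```python
import json

# An explicit JSON schema describing the relation structure (the alternative,
# per the text, is letting the model infer the structure from the prompt).
RELATION_SCHEMA = {
    "type": "object",
    "properties": {
        "relations": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "head": {"type": "string"},
                    "relation": {"type": "string"},
                    "tail": {"type": "string"},
                },
                "required": ["head", "relation", "tail"],
            },
        }
    },
    "required": ["relations"],
}

def parse_relations(raw: str) -> list[tuple[str, str, str]]:
    """Parse and minimally validate a model's JSON output into triples."""
    data = json.loads(raw)
    return [(r["head"], r["relation"], r["tail"]) for r in data["relations"]]

# Hypothetical model output of the shape a schema-constrained decode yields:
raw = '{"relations": [{"head": "metformin", "relation": "treats", "tail": "type 2 diabetes"}]}'
print(parse_relations(raw))
```

Constraining decoding to a schema makes the output machine-readable by construction, but, as noted above, it does not by itself fix recall or mention-boundary errors.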
In our previous work with generative LMs, we noted that RE performance decreased with the number of gold relations in an example. This observation aligns with the general pattern that recurrent neural network and transformer-based model performance tends to decrease with sequence length. Generative LMs also do not identify textual mentions or group them into entities, which are valuable information extraction tasks unto themselves. Therefore, in this age of generative methods, we revisit non-seq2seq methodology for biomedical RE. We adopt a sequential framework of named entity recognition (NER), clustering mentions into entities, followed by relation classification (RC). As errors early in the pipeline necessarily cause downstream errors, and NER performance is near its ceiling, we focus on improving clustering. We match state-of-the-art (SOTA) performance in NER, and substantially improve mention clustering performance by incorporating dependency parsing and gating string dissimilarity embeddings.
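The clustering stage of such a pipeline can be illustrated with a deliberately simplified sketch. The thesis gates learned string-dissimilarity embeddings inside a neural model; here, purely for illustration, a surface string-similarity score stands in for that signal, and the mentions and threshold are hypothetical:

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.6) -> bool:
    # Surface string similarity as a cheap stand-in; the actual system
    # combines learned embeddings with string-dissimilarity features.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster_mentions(mentions: list[str]) -> list[set[str]]:
    """Greedy single-link clustering of textual mentions into entities."""
    clusters: list[set[str]] = []
    for m in mentions:
        for c in clusters:
            if any(similar(m, other) for other in c):
                c.add(m)
                break
        else:
            clusters.append({m})
    return clusters

mentions = ["BRCA1", "brca1", "BRCA1 gene", "TP53"]
clusters = cluster_mentions(mentions)
print(clusters)
```

Errors at this stage propagate to relation classification, which is why the pipeline framing makes clustering quality so consequential.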
Overall, we advance the field of biomedical RE in a few ways. In our experiments with finetuned LMs, we show that biomedicine-specific models are unnecessary, freeing researchers to make use of SOTA generic LMs. The relatively high few-shot performance in these experiments also suggests that biomedical RE can be reasonably accessible, as it is not so difficult to construct small datasets. Our investigation into zero-shot RE shows that SOTA LMs can compete with fully finetuned smaller LMs. Together these studies also demonstrate weaknesses of generative RE. Last, we show that non-generative RE methods still outperform generative methods in the fully-finetuned setting.
Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing
Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where models tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in the linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges, focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs.
To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity.
ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909]
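The spatial-arrangement idea can be made concrete with a small sketch. This is an illustration only, not the thesis's elicitation setup: the verbs and canvas coordinates are hypothetical, and the point is simply that annotators place items so that distance mirrors dissimilarity, from which graded judgments are read off:

```python
import math

# Hypothetical 2D arrangement: an annotator drags verb labels on a canvas
# so that spatial distance mirrors perceived dissimilarity of meaning.
arrangement = {
    "walk": (0.0, 0.0),
    "stroll": (0.5, 0.2),
    "shout": (4.0, 3.0),
}

def dissimilarity(a: str, b: str) -> float:
    """Read a graded dissimilarity judgment off the arrangement."""
    return math.dist(arrangement[a], arrangement[b])

# Near-synonyms end up close together, unrelated verbs far apart.
assert dissimilarity("walk", "stroll") < dissimilarity("walk", "shout")
print(round(dissimilarity("walk", "stroll"), 3))
```

A single arrangement thus yields all pairwise judgments at once, which is what makes the method time-efficient compared with rating each pair separately.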
The Prosody of Sluicing: Production Studies on Prosodic Disambiguation
With this thesis, I investigate the prosodic realizations of different sluicing structures, as produced by either trained or untrained native speakers of English. Sluicing is a subtype of ellipsis where the major part of a wh-question has been elided, leaving only a wh-remnant behind (Ross, 1969). It follows that sluicing can be ambiguous if the wh-remnant has more than one possible antecedent in the preceding un-elided clause. If one of these possible antecedents is located within an island to extraction, the respective sluicing structure is called complex sluicing (Konietzko, Radó, & Winkler, submitted; Ross, 1969; Merchant, 2001). The perception of sluicing, especially simple sluicing, has been examined to some extent (Frazier & Clifton, 1998; Carlson, Dickey, Frazier, & Clifton, 2009), with the finding that listeners prefer a prosodically or syntactically focused NP as the antecedent of an ambiguous wh-remnant. However, the prosodic production side has not been empirically investigated to date. With this thesis, I thus explore the relationship between prosody and the disambiguation of different sluicing structures in spoken language.
With three production studies, I investigate how various sluicing structures with different antecedent types are produced by speakers who are either trained or untrained with respect to the ambiguity of the target items and prosody as a disambiguation technique. I present the results of a pilot production study that examined globally ambiguous simple sluicing structures with contextual disambiguation and two production studies that examined temporarily ambiguous simple and complex sluicing structures with morphological disambiguation. Four preceding acceptability judgment studies ensured that no additional factors interfered with the prosodic realizations of the different sluicing structures. The three production studies found that both trained and untrained speakers use prosodic prominence as a disambiguating factor to emphasize which NP serves as the antecedent of a contextually or morphologically disambiguated simple or complex sluicing structure. However, an early, sentence-initial NP is more frequently disambiguated than a late, sentence-final NP, both by trained and untrained speakers. In complex sluicing, only a sentence-initial NP is prosodically disambiguated, and only by trained speakers. Moreover, trained speakers generally make more frequent use of prosody as a disambiguation technique, and they produce stronger prosodic cues than untrained speakers.
With this thesis, I thus show that prosody, in the form of prosodic prominence, is used by native speakers of English to indicate the meaning of an information-structurally triggered ambiguity. With this finding, I add further support to Romero (1998), Frazier and Clifton (1998) and Carlson et al. (2009), who argue that a constituent with a prosodic focus is preferably taken as the antecedent of the wh-remnant. Moreover, I add support to Remmele, Schopper, Winkler, and Hörnig (forthcoming 2019), who found that even untrained speakers use prosodic phrasing to resolve a structurally ambiguous word sequence. Furthermore, I contradict Wasow (2015) and Piantadosi, Tily, and Gibson (2012), who argue that one form of disambiguation suffices, thus rendering additional prosodic cues redundant. The results of this thesis thus contribute significantly to the research about the prosody of sluicing and the research about prosodic disambiguation in general.
Funded by the Deutsche Forschungsgemeinschaft (DFG), project number 19864742
The Circle of Meaning: From Translation to Paraphrasing and Back
The preservation of meaning between inputs and outputs is perhaps
the most ambitious and, often, the most elusive goal of systems
that attempt to process natural language. Nowhere is this goal of
more obvious importance than for the tasks of machine translation
and paraphrase generation. Preserving meaning between the input and
the output is paramount for both, the monolingual vs bilingual distinction
notwithstanding. In this thesis, I present a novel, symbiotic relationship
between these two tasks that I term the "circle of meaning".
Today's statistical machine translation (SMT) systems require high
quality human translations for parameter tuning, in addition to
large bi-texts for learning the translation units. This parameter
tuning usually involves generating translations at different points
in the parameter space and obtaining feedback against human-authored
reference translations as to how good the translations are. This feedback
then dictates what point in the parameter space should be explored
next. To measure this feedback, it is generally considered wise to have
multiple (usually 4) reference translations to avoid unfair penalization of translation
hypotheses which could easily happen given the large number of ways in which
a sentence can be translated from one language to another. However, this reliance on multiple reference translations
creates a problem since they are labor intensive and expensive to obtain.
Therefore, most current MT datasets only contain a single reference.
This leads to the problem of reference sparsity---the primary open problem
that I address in this dissertation---one that has a serious effect on the
SMT parameter tuning process.
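Why multiple references matter for tuning can be sketched with a toy metric. This is not the metric used in SMT tuning (that would be BLEU or similar); it is a deliberately simplified unigram-overlap score, with made-up sentences, showing that scoring against the closest of several references avoids penalizing a legitimate translation variant:

```python
def overlap_score(hyp: str, ref: str) -> float:
    """Unigram precision of the hypothesis against a single reference."""
    hyp_toks, ref_toks = hyp.lower().split(), ref.lower().split()
    if not hyp_toks:
        return 0.0
    matched = sum(1 for t in hyp_toks if t in ref_toks)
    return matched / len(hyp_toks)

def multi_ref_score(hyp: str, refs: list[str]) -> float:
    """Score against the closest reference, as multi-reference metrics do."""
    return max(overlap_score(hyp, r) for r in refs)

refs = ["the children are playing in the park",
        "kids are playing at the park"]
hyp = "kids are playing at the park"    # a perfectly good translation

single = overlap_score(hyp, refs[0])    # penalized by the lone reference
multi = multi_ref_score(hyp, refs)      # rescued by the second reference
print(single, multi)
```

With only the first reference the hypothesis looks mediocre; the second reference reveals it as exact, which is precisely the unfair-penalization problem the paragraph describes.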
Bannard and Callison-Burch (2005) were the first to provide a practical
connection between phrase-based statistical machine translation and paraphrase
generation. However, their technique is restricted to generating phrasal
paraphrases. I build upon their approach and augment a phrasal paraphrase
extractor into a sentential paraphraser with extremely broad coverage.
The novelty in this augmentation lies in the further strengthening of
the connection between statistical machine translation and paraphrase
generation; whereas Bannard and Callison-Burch only relied on SMT machinery
to extract phrasal paraphrase rules and stopped there, I take it a few
steps further and build a full English-to-English SMT system. This system
can, as expected, "translate" any English input sentence into a new English
sentence with the same degree of meaning preservation that exists in a bilingual
SMT system. In fact, being a state-of-the-art SMT system, it is able to generate
n-best "translations" for any given input sentence. This sentential
paraphraser, built almost entirely from existing SMT machinery, represents
the first 180 degrees of the circle of meaning.
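The pivot idea underlying Bannard and Callison-Burch's phrasal paraphrase extraction can be sketched directly: a phrase e2 paraphrases e1 with probability summed over shared foreign translations f, p(e2|e1) = Σ_f p(e2|f)·p(f|e1). The phrase-table entries below are toy numbers invented for illustration; real values come from a bilingual phrase table:

```python
# Toy phrase-table probabilities (hypothetical values).
# p(f | e): English phrase -> foreign pivot phrase.
p_f_given_e = {("in time", "rechtzeitig"): 0.7,
               ("in time", "pünktlich"): 0.3}
# p(e | f): foreign pivot phrase -> English phrase.
p_e_given_f = {("rechtzeitig", "on time"): 0.6,
               ("rechtzeitig", "in time"): 0.4,
               ("pünktlich", "on time"): 0.8,
               ("pünktlich", "punctual"): 0.2}

def paraphrase_prob(e1: str, e2: str) -> float:
    """p(e2 | e1) by marginalizing over foreign pivot phrases."""
    return sum(p_fe * p_e_given_f.get((f, e2), 0.0)
               for (e, f), p_fe in p_f_given_e.items() if e == e1)

print(paraphrase_prob("in time", "on time"))  # 0.7*0.6 + 0.3*0.8 ≈ 0.66
```

The sentential paraphraser described above goes further than this phrase-level rule extraction, packaging such rules into a full English-to-English SMT system.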
To complete the circle, I describe a novel connection in the other direction.
I claim that the sentential paraphraser, once built in this fashion, can
provide a solution to the reference sparsity problem and, hence, be used
to improve the performance of a bilingual SMT system. I discuss two different
instantiations of the sentential paraphraser and show several results that
provide empirical validation for this connection.
The Role of Linguistics in Probing Task Design
Over the past decades natural language processing has evolved from a niche research area into a fast-paced and multi-faceted discipline that attracts thousands of contributions from academia and industry and feeds into real-world applications. Despite the recent successes, natural language processing models still struggle to generalize across domains, suffer from biases and lack transparency. Aiming to get a better understanding of how and why modern NLP systems make their predictions for complex end tasks, a line of research in probing attempts to interpret the behavior of NLP models using basic probing tasks. Linguistic corpora are a natural source of such tasks, and linguistic phenomena like part of speech, syntax and role semantics are often used in probing studies.
The goal of probing is to find out what information can be easily extracted from a pre-trained NLP model or representation. To ensure that the information is extracted from the NLP model and not learned during the probing study itself, probing models are kept as simple and transparent as possible, exposing and augmenting conceptual inconsistencies between NLP models and linguistic resources. In this thesis we investigate how linguistic conceptualization can affect probing models, setups and results.
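The "simple and transparent probing model" idea can be sketched in a few lines. This is an illustration, not any probe from the thesis: the frozen "representations" below are hypothetical toy vectors standing in for, e.g., BERT layer activations, and the probe is a bare perceptron so that any success is attributable to the representation rather than the probe:

```python
# A probing classifier should be simple: here a bare perceptron trained on
# frozen, hypothetical "representations" for a binary property (y is 0 or 1).
def train_probe(examples, epochs=20, lr=0.1):
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:                    # standard perceptron update
                sign = 1 if y == 1 else -1
                w = [wi + lr * sign * xi for wi, xi in zip(w, x)]
                b += lr * sign
    return w, b

def probe(wb, x):
    w, b = wb
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy frozen vectors forming two linearly separable clusters.
data = [([1.0, 0.1], 1), ([0.9, 0.0], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
wb = train_probe(data)
print([probe(wb, x) for x, _ in data])
```

If even such a minimal probe recovers the property, the information was plausibly present in the representation; a powerful probe would blur exactly this inference.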
In Chapter 2 we investigate the gap between the targets of classical type-level word embedding models like word2vec, and the items of lexical resources and similarity benchmarks. We show that the lack of conceptual alignment between word embedding vocabularies and lexical resources penalizes the word embedding models in both benchmark-based and our novel resource-based evaluation scenario. We demonstrate that simple preprocessing techniques like lemmatization and POS tagging can partially mitigate the issue, leading to a better match between word embeddings and lexicons.
Linguistics often has more than one way of describing a certain phenomenon. In Chapter 3 we conduct an extensive study of the effects of linguistic formalism on probing modern pre-trained contextualized encoders like BERT. We use role semantics as an excellent example of a data-rich, multi-framework phenomenon. We show that the choice of linguistic formalism can affect the results of probing studies, and deliver additional insights on the impact of dataset size, domain, and task architecture on probing.
Apart from mere labeling choices, linguistic theories might differ in the very way they conceptualize the task. Whereas mainstream NLP has treated semantic roles as a categorical phenomenon, an alternative, prominence-based view opens new opportunities for probing. In Chapter 4 we investigate prominence-based probing models for role semantics, including semantic proto-roles and our novel regression-based role probe. Our results indicate that pre-trained language models like BERT might encode argument prominence. Finally, we propose an operationalization of the thematic role hierarchy, a widely used linguistic tool for describing the syntactic behavior of verbs, and show that thematic role hierarchies can be extracted from text corpora and transfer cross-lingually.
The results of our work demonstrate the importance of linguistic conceptualization for probing studies, and highlight the dangers and the opportunities associated with using linguistics as a meta-language for NLP model interpretation.
Short Answer Assessment in Context: The Role of Information Structure
Short Answer Assessment (SAA), the computational task of judging the
appropriateness of an answer to a question, has received much attention in recent
years (cf., e.g., Dzikovska et al. 2013; Burrows et al. 2015). Most researchers
have approached the problem as one similar to paraphrase recognition (cf.,
e.g., Brockett & Dolan 2005) or textual entailment (Dagan et al., 2006), where
the answer to be evaluated is aligned to another available utterance, such as a
target answer, in a sufficiently abstract way to capture form variation. While
this is a reasonable strategy, it fails to take the explicit context of an answer
into account: the question.
In this thesis, we present an attempt to change this situation by investigating
the role of Information Structure (IS, cf., e.g., Krifka 2007) in SAA. The basic
assumption adapted from IS here will be that the content of a linguistic
expression is structured in a non-arbitrary way depending on its context (here:
the question), and thus it is possible to predetermine to some extent which
part of the expression’s content is relevant. In particular, we will adopt the
Question Under Discussion (QUD) approach advanced by Roberts (2012) where
the information structure of an answer is determined by an explicit or implicit
question in the discourse.
We proceed by first introducing the reader to the necessary prerequisites
in chapters 2 and 3. Since this is a computational linguistics thesis which
is inspired by theoretical linguistic research, we will provide an overview of
relevant work in both areas, discussing SAA and Information Structure (IS) in
sufficient detail, as well as existing attempts at annotating Information Structure
in corpora. After providing the reader with enough background to understand
the remainder of the thesis, we launch into a discussion of which IS notions and
dimensions are most relevant to our goal. We compare the given/new distinction
(information status) to the focus/background distinction and conclude that the
latter is better suited to our needs, as it captures requested information, which
can be either given or new in the context.
In chapter 4, we introduce the empirical basis of this work, the Corpus of
Reading Comprehension Exercises in German (CREG, Ott, Ziai & Meurers
2012). We outline how as a task-based corpus, CREG is particularly suited to
the analysis of language in context, and how it thus forms the basis of our
efforts in SAA and focus detection. Complementing this empirical basis, we
present the SAA system CoMiC in chapter 5, which is used to integrate focus
into SAA in chapter 8.
Chapter 6 then delves into the creation of a gold standard for automatic
focus detection. We describe what the desiderata for such a gold standard are
and how a subset of the CREG corpus is chosen for manual focus annotation.
Having determined these prerequisites, we proceed in detail to our novel
annotation scheme for focus, and its intrinsic evaluation in terms of
inter-annotator agreement. We also discuss explorations of using crowd-sourcing for
focus annotation.
After establishing the data basis, we turn to the task of automatic focus
detection in short answers in chapter 7. We first define the computational
task as classifying whether a given word of an answer is focused or not. We
experiment with several groups of features and explain in detail the motivation
for each: syntax and lexis of the question and the answer, positional
features and givenness features, taking into account both question and answer
properties. Using the adjudicated gold standard we established in chapter 6, we
show that focus can be detected robustly using these features in a word-based
classifier in comparison to several baselines.
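The word-level classification setup can be illustrated with a small feature-extraction sketch. This is not the thesis's actual feature set or classifier: the question-answer pair is invented, and real features would use lemmatization, syntax, and richer givenness modelling rather than raw token overlap:

```python
def focus_features(question: str, answer: str):
    """Per-word features of the kind described: position and givenness."""
    q_toks = question.lower().strip("?").split()
    a_toks = answer.split()
    feats = []
    for i, tok in enumerate(a_toks):
        feats.append({
            "token": tok,
            "rel_position": i / max(len(a_toks) - 1, 1),  # 0 = first, 1 = last
            "given": tok.lower() in q_toks,  # repeated from the question?
        })
    return feats

feats = focus_features("Where does Peter live?", "Peter lives in Hamburg")
# Material repeated from the question is given; new material ("Hamburg")
# is a focus candidate.
print([(f["token"], f["given"]) for f in feats])
```

A word-based classifier then labels each token as focused or not from such features, which is exactly the binary formulation defined above.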
In chapter 8, we describe the integration of focus information into SAA,
which is both an extrinsic testbed for focus annotation and detection per se and
the computational task we originally set out to advance. We show that there
are several possible ways of integrating focus information into an
alignment-based SAA system, and discuss each one’s advantages and disadvantages.
We also experiment with using focus vs. using givenness in alignment before
concluding that a combination of both yields superior overall performance.
Finally, chapter 9 presents a summary of our main research findings along
with the contributions of this thesis. We conclude that analyzing focus in
authentic data is not only possible but necessary for a) developing
context-aware SAA approaches and b) grounding and testing linguistic theory. We give
an outlook on where future research needs to go and what particular avenues
could be explored.