"Who does what to whom?" The goal of a graph-based meaning representation (in short: MR) is to represent the meaning of a text in a structured format. With an MR, we can explicate the meaning of a text, describe occurring events and entities, and their semantic relations. Thus, a metric of MRs would measure a distance (or similarity) between MRs. We believe that such a meaning-focused similarity measurement can be useful for several important AI tasks, for instance, testing the capability of systems to produce meaningful output (system evaluation), or when searching for similar texts (information retrieval). Moreover, due to the natural explicitness of MRs, we hypothesize that MR metrics could provide us with valuable explainability of their similarity measurement. Indeed, if texts reside in a space where their meaning has been isolated and structured, we might directly see in which aspects two texts are actually similar (or dissimilar).
However, we find that there is not much previous work on MR metrics, and thus we lack fundamental knowledge about them and their potential applications. Therefore, we make first steps to explore MR metrics and MR spaces, focusing on two key goals: 1. Develop novel and generally applicable methods for conducting similarity measurements in the space of MRs; 2. Explore potential applications that can profit from similarity assessments in MR spaces, including, but (by far) not limited to, their "classic" purpose of evaluating the quality of a text-to-MR system against a reference (aka parsing evaluation).
We start by analyzing contributions from previous works that have proposed MR metrics for parsing evaluation. Then, we move beyond this restricted setup and start to develop novel and more general MR metrics based on i) insights from our analysis of the previous parsing evaluation metrics and ii) our motivation to extend MR metrics to similarity assessment of natural language texts. To empirically evaluate and assess our generalized MR metrics, and to open the door for future improvements, we propose the first benchmark of MR metrics. With our benchmark, we can study MR metrics through the lens of multiple metric-objectives such as sentence similarity and robustness.
Then, we investigate novel applications of MR metrics. First, we explore new ways of applying MR metrics to evaluate systems that produce i) text from MRs (MR-to-text evaluation) and ii) MRs from text (MR parsing). We call our new setting MR projection-based, since we presume that one MR (at least) is unobserved and needs to be approximated. An advantage of such projection-based MR metric methods is that we can ablate a costly human reference. Notably, when visiting the MR-to-text scenario, we touch on a much broader application scenario for MR metrics: explainable MR-grounded evaluation of text generation systems.
Moving steadily towards the application of MR metrics to general text similarity, we study MR metrics for measuring the meaning similarity of natural language arguments, which is an important task in argument mining, a new and surging area of natural language processing (NLP). In particular, we show that MRs and MR metrics can support an explainable and unsupervised argument similarity analysis and inform us about the quality of argumentative conclusions.
Ultimately, we seek even more generality and are also interested in practical aspects such as efficiency. To this aim, we distill our insights from our hitherto explorations into MR metric spaces into an explainable state-of-the-art machine learning model for semantic search, a task for which we would like to achieve high accuracy and great efficiency. To this aim, we develop a controllable metric distillation approach that can explain how the similarity decisions in the neural text embedding space are modulated through interpretable features, while maintaining all efficiency and accuracy (sometimes improving it) of a high-performance neural semantic search method. This is an important contribution, since it shows i) that we can alleviate the efficiency bottleneck of computationally costly MR graph metrics and, vice versa, ii) that MR metrics can help mitigate a crucial limitation of large "black box" neural methods by eliciting explanations for decisions