Unsupervised induction of semantic roles
In recent years, a considerable amount of work has been devoted to the task of automatic
frame-semantic analysis. Given the relative maturity of syntactic parsing technology,
which is an important prerequisite, frame-semantic analysis represents a realistic
next step towards broad-coverage natural language understanding and has been
shown to benefit a range of natural language processing applications such as information
extraction and question answering.
Due to the complexity which arises from variations in syntactic realization, data-driven
models based on supervised learning have become the method of choice for this task.
However, the reliance on large amounts of semantically labeled data, which is costly
to produce for every language, genre, and domain, presents a major barrier to the
widespread application of the supervised approach.
This thesis therefore develops unsupervised machine learning methods, which automatically
induce frame-semantic representations without making use of semantically
labeled data. If successful, unsupervised methods would render manual data annotation
unnecessary and thereby greatly improve the applicability of automatic frame-semantic
analysis.
We focus on the problem of semantic role induction, in which all the argument instances
occurring together with a specific predicate in a corpus are grouped into clusters
according to their semantic role. Our hypothesis is that semantic roles can be induced
without human supervision from a corpus of syntactically parsed sentences, by
leveraging the syntactic relations conveyed through parse trees with lexical-semantic
information.
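To make the setup concrete, argument instances extracted from parse trees can be represented as simple tuples and grouped per predicate before any clustering takes place. The predicate, position names, and head lemmas below are hypothetical illustrations, not data from the thesis:

```python
from collections import defaultdict

# Hypothetical representation: each argument instance records the predicate
# it occurs with, its syntactic position (e.g. a dependency relation), and
# its lexical head.
instances = [
    ("increase", "subj",  "profit"),
    ("increase", "obj",   "risk"),
    ("increase", "subj",  "revenue"),
    ("increase", "pp_by", "percent"),
]

# Group all argument instances of a predicate; role induction then operates
# on each predicate's instance set separately.
by_predicate = defaultdict(list)
for pred, position, head in instances:
    by_predicate[pred].append((position, head))

print(dict(by_predicate))
```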
We argue that semantic role induction can be guided by three linguistic principles. The
first is the well-known constraint that semantic roles are unique within a particular
frame. The second is that the arguments occurring in a specific syntactic position
within a specific linking (a mapping between syntactic positions and semantic roles) all bear the same semantic role. The third principle is that
the (asymptotic) distribution over argument heads is the same for two clusters which
represent the same semantic role.

We consider two approaches to semantic role induction based on two fundamentally
different perspectives on the problem. Firstly, we develop feature-based probabilistic
latent structure models which capture the statistical relationships that hold between the
semantic role and other features of an argument instance. Secondly, we conceptualize
role induction as the problem of partitioning a graph whose vertices represent argument
instances and whose edges express similarities between these instances. The graph
thus represents all the argument instances for a particular predicate occurring in the
corpus. The similarities with respect to different features are represented on different
edge layers and accordingly we develop algorithms for partitioning such multi-layer
graphs.
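A toy sketch of this multi-layer formulation, where each feature contributes one layer of edge weights and the layers are combined before partitioning. The feature layers, weights, threshold, and the greedy agglomerative heuristic below are illustrative assumptions, not the thesis's actual algorithms:

```python
# Argument instances of one predicate; vertices of the multi-layer graph.
instances = [
    {"position": "subj", "head": "company"},
    {"position": "subj", "head": "firm"},
    {"position": "obj",  "head": "price"},
    {"position": "obj",  "head": "price"},
]

# Each layer scores similarity with respect to one feature.
def layer_position(a, b):
    return 1.0 if a["position"] == b["position"] else 0.0

def layer_head(a, b):
    return 1.0 if a["head"] == b["head"] else 0.0

LAYERS = [(0.6, layer_position), (0.4, layer_head)]

def similarity(a, b):
    # Combine the edge layers by a weighted sum.
    return sum(w * layer(a, b) for w, layer in LAYERS)

def partition(items, threshold=0.5):
    """Greedy agglomerative partitioning: start from singleton clusters and
    merge the most similar pair while its average similarity clears the
    threshold."""
    clusters = [[i] for i in range(len(items))]

    def avg_sim(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(similarity(items[i], items[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: avg_sim(clusters[xy[0]], clusters[xy[1]]),
        )
        if avg_sim(clusters[a], clusters[b]) < threshold:
            break
        clusters[a].extend(clusters.pop(b))
    return clusters

print(partition(instances))
```

With these toy weights, the subject instances and the object instances end up in separate clusters, mirroring a position-plus-lexical-similarity grouping.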
We empirically validate our models and the principles they are based on and show that
our graph partitioning models have several advantages over the feature-based models.
In a series of experiments on both English and German the graph partitioning models
outperform the feature-based models and achieve significantly better scores than a strong
baseline which directly identifies semantic roles with syntactic positions.
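The baseline mentioned above identifies semantic roles with syntactic positions and can be stated in a few lines. The field name and example positions are hypothetical:

```python
from collections import defaultdict

def syntactic_baseline(instances):
    """Baseline clustering: every distinct syntactic position of a predicate
    forms its own role cluster, i.e. roles are identified with positions."""
    clusters = defaultdict(list)
    for idx, inst in enumerate(instances):
        clusters[inst["position"]].append(idx)
    return list(clusters.values())

instances = [
    {"position": "subj"}, {"position": "obj"},
    {"position": "subj"}, {"position": "pp_by"},
]
print(syntactic_baseline(instances))  # [[0, 2], [1], [3]]
```

This baseline is strong precisely because syntactic position is highly predictive of semantic role; unsupervised models must recover the remaining cases, such as alternations, to beat it.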
In sum, we demonstrate that relatively high-quality shallow semantic representations
can be induced without human supervision and foreground a promising direction of
future research aimed at overcoming the problem of acquiring large amounts of lexical-semantic
knowledge.
Unsupervised Induction of Semantic Roles
Datasets annotated with semantic roles are an important prerequisite to developing high-performance role labeling systems. Unfortunately, the reliance on manual annotations, which are both difficult and highly expensive to produce, presents a major obstacle to the widespread application of these systems across different languages and text genres. In this paper we describe a method for inducing the semantic roles of verbal arguments directly from unannotated text. We formulate the role induction problem as one of detecting alternations and finding a canonical syntactic form for them. Both steps are implemented in a novel probabilistic model, a latent-variable variant of the logistic classifier. Our method increases the purity of the induced role clusters by a wide margin over a strong baseline.
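Cluster purity, the evaluation measure cited in this abstract, has a standard definition: the fraction of instances that fall under the majority gold role of their induced cluster. A minimal sketch of that standard measure (not the paper's exact evaluation code):

```python
from collections import Counter

def purity(clusters, gold):
    """clusters: lists of instance indices; gold: index -> gold role label.
    Returns the fraction of instances assigned to the majority gold role
    of their induced cluster."""
    total = sum(len(c) for c in clusters)
    correct = sum(
        Counter(gold[i] for i in c).most_common(1)[0][1] for c in clusters
    )
    return correct / total

gold = {0: "Agent", 1: "Agent", 2: "Patient", 3: "Patient"}
print(purity([[0, 1], [2, 3]], gold))  # 1.0 (perfect clustering)
print(purity([[0, 2], [1, 3]], gold))  # 0.5 (maximally mixed)
```

Note that purity alone rewards many small clusters, which is why it is usually reported alongside collocation or an F1-style combination.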
Unsupervised frame based Semantic Role Induction: application to French and English
This paper introduces a novel unsupervised approach to semantic role induction that uses a generative Bayesian model. To the best of our knowledge, it is the first model that jointly clusters the syntactic arguments of verbs into semantic roles and also creates verb classes according to the syntactic frames the verbs accept. The model is evaluated on French and English, outperforming a strong baseline in both cases. On English, it achieves results comparable to state-of-the-art unsupervised approaches to semantic role induction.
Unsupervised Semantic Role Induction via Split-Merge Clustering
In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows linguistic knowledge to be integrated transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
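The split-and-merge loop can be sketched in miniature. The split and merge criteria below (split on mixed syntactic positions, merge on overlapping argument-head vocabularies) are illustrative stand-ins for the paper's linguistically motivated ones, and all data is invented:

```python
def split_phase(clusters, position):
    """Split any cluster that mixes syntactic positions."""
    out = []
    for c in clusters:
        by_pos = {}
        for i in c:
            by_pos.setdefault(position[i], []).append(i)
        out.extend(by_pos.values())
    return out

def head_overlap(c1, c2, head):
    """Jaccard overlap of the argument-head vocabularies of two clusters."""
    h1, h2 = {head[i] for i in c1}, {head[i] for i in c2}
    return len(h1 & h2) / len(h1 | h2)

def merge_phase(clusters, head, threshold=0.5):
    """Repeatedly merge cluster pairs whose head distributions overlap."""
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if head_overlap(clusters[a], clusters[b], head) >= threshold:
                    clusters[a].extend(clusters.pop(b))
                    merged = True
                    break
            if merged:
                break
    return clusters

position = ["subj", "subj", "obj", "pp_by", "pp_by"]
head = ["company", "firm", "price", "company", "firm"]
clusters = split_phase([[0, 1, 2, 3, 4]], position)  # [[0, 1], [2], [3, 4]]
clusters = merge_phase(clusters, head)               # [[0, 1, 3, 4], [2]]
print(clusters)
```

The toy run reflects the intuition behind alternation handling: active subjects and passive by-phrases share head vocabulary, so the merge phase reunites them into one agent-like cluster.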
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
We present our system for semantic frame induction that showed the best
performance in Subtask B.1 and finished as the runner-up in Subtask A of the
SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et
al., 2019). Our approach separates this task into two independent steps: verb
clustering using words and their context embeddings, and role labeling by
combining these embeddings with syntactic features. A simple combination of
these steps shows very competitive results and can be extended to process other
datasets and languages. (Comment: 5 pages, 3 tables, accepted at SemEval 2019.)
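The two-step pipeline described above might be sketched with toy vectors. The embeddings, the greedy similarity clustering, and the one-hot syntactic encoding below are stand-ins for the system's actual contextualized embeddings and clustering setup:

```python
import numpy as np

# Toy stand-ins for contextualized embeddings of verb occurrences.
verb_vecs = {
    "buy_1":   np.array([0.9, 0.1]),
    "buy_2":   np.array([0.8, 0.2]),
    "sleep_1": np.array([0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster(vectors, threshold=0.9):
    """Step 1: greedily cluster occurrences by cosine similarity to the
    first member of each existing cluster."""
    clusters = []
    for name, vec in vectors.items():
        for c in clusters:
            if cosine(vec, vectors[c[0]]) >= threshold:
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Step 2: for role labeling, combine each argument's embedding with a
# one-hot encoding of its syntactic feature before clustering the same way.
POSITIONS = ["subj", "obj", "pp_by"]

def combine(arg_vec, position):
    one_hot = np.eye(len(POSITIONS))[POSITIONS.index(position)]
    return np.concatenate([arg_vec, one_hot])

print(cluster(verb_vecs))  # [['buy_1', 'buy_2'], ['sleep_1']]
```

Concatenating the one-hot syntactic feature is one simple way to let the same similarity-based clustering respect syntax; the real system's combination scheme may differ.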
Natural language understanding: instructions for (Present and Future) use
In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine to be able to understand? And what are the key dimensions that require the attention of researchers to make this dream come true?