VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling
We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments.
We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org
A Matter of Framing: The Impact of Linguistic Formalism on Probing Results
Deep pre-trained contextualized encoders like BERT (Devlin et al., 2019)
demonstrate remarkable performance on a range of downstream tasks. A recent
line of research in probing investigates the linguistic knowledge implicitly
learned by these models during pre-training. While most work in probing
operates on the task level, linguistic tasks are rarely uniform and can be
represented in a variety of formalisms. Any linguistics-based probing study
thereby inevitably commits to the formalism used to annotate the underlying
data. Can the choice of formalism affect probing results? To investigate, we
conduct an in-depth cross-formalism layer probing study in role semantics. We
find linguistically meaningful differences in the encoding of semantic role-
and proto-role information by BERT depending on the formalism and demonstrate
that layer probing can detect subtle differences between the implementations of
the same linguistic formalism. Our results suggest that linguistic formalism is
an important dimension in probing studies, along with the commonly used
cross-task and cross-lingual experimental settings.
Mapping WordNet Instances to Wikipedia
Lexical resources differ from encyclopaedic resources: the two are distinct types of resource, covering general language and named entities respectively. However, many lexical resources, including Princeton WordNet (PWN), contain many proper nouns referring to named entities in the world, yet it is neither possible nor desirable for a lexical resource to cover all named entities that may reasonably occur in a text. In this paper, we propose that, instead of including synsets for instance concepts, PWN should provide links to Wikipedia articles describing the concept. To enable this, we have created a gold-quality mapping between all of the 7,742 instances in PWN and Wikipedia (where such a mapping is possible). As such, this resource aims to provide a gold standard for link discovery, while also allowing PWN to distinguish itself from other resources such as DBpedia or BabelNet. Moreover, this linking connects PWN to the Linguistic Linked Open Data cloud, thus creating a richer, more usable resource for natural language processing.
Probabilistic Modeling of Verbnet Clusters
The objective of this research is to build automated models that emulate VerbNet, a semantic resource for English verbs. VerbNet has been built and expanded by linguists, forming a hierarchical clustering of verbs with common semantic and syntactic expressions, and is useful in semantic tasks. A major drawback is the difficulty of extending a manually-curated resource, which leads to gaps in coverage. After over a decade of development, VerbNet has missing verbs, missing senses of common verbs, and is missing appropriate classes to contain at least some of them. Although there have been efforts to build VerbNet resources in other languages, none have received as much attention, so these coverage issues are often more glaring in resource-poor languages. Probabilistic models can emulate VerbNet by learning distributions from large corpora, addressing coverage by providing both a complete clustering of the observed data and a model to assign unseen sentences to clusters. The output of these models can aid the creation and expansion of VerbNet in English and other languages, especially if they align strongly with known VerbNet classes.
This work develops several improvements to the state-of-the-art system for verb sense induction and VerbNet-like clustering. The baseline is a two-step process for automatically inducing verb senses and producing a polysemy-aware clustering, which matched VerbNet more closely than any previous method. First, we will see that a single-step process can produce better automatic senses and clusters. Second, we explore an alternative probabilistic model, which is successful on the verb clustering task. This model does not perform well on sense induction, so we analyze the limitations on its applicability. Third, we explore methods of supervising these probabilistic models with limited labeled data, which dramatically improves the recovery of correct clusters.
Together, these improvements suggest a line of research for practitioners to take advantage of probabilistic models in VerbNet annotation efforts.
Graded Decompositional Semantic Prediction
Compared to traditional approaches, decompositional semantic labeling (DSL) is compelling but introduces complexities for data collection, quality assessment, and modeling. To shed light on these issues and lower barriers to the adoption of DSL or related approaches, I bring existing models and novel variations into a shared, familiar framework, facilitating empirical investigation.