1,880 research outputs found

    Unsupervised induction of semantic roles

    Get PDF
    In recent years, a considerable amount of work has been devoted to the task of automatic frame-semantic analysis. Given the relative maturity of syntactic parsing technology, which is an important prerequisite, frame-semantic analysis represents a realistic next step towards broad-coverage natural language understanding and has been shown to benefit a range of natural language processing applications such as information extraction and question answering. Due to the complexity which arises from variations in syntactic realization, data-driven models based on supervised learning have become the method of choice for this task. However, the reliance on large amounts of semantically labeled data which is costly to produce for every language, genre and domain, presents a major barrier to the widespread application of the supervised approach. This thesis therefore develops unsupervised machine learning methods, which automatically induce frame-semantic representations without making use of semantically labeled data. If successful, unsupervised methods would render manual data annotation unnecessary and therefore greatly benefit the applicability of automatic framesemantic analysis. We focus on the problem of semantic role induction, in which all the argument instances occurring together with a specific predicate in a corpus are grouped into clusters according to their semantic role. Our hypothesis is that semantic roles can be induced without human supervision from a corpus of syntactically parsed sentences, by leveraging the syntactic relations conveyed through parse trees with lexical-semantic information. We argue that semantic role induction can be guided by three linguistic principles. The first is the well-known constraint that semantic roles are unique within a particular frame. The second is that the arguments occurring in a specific syntactic position within a specific linking all bear the same semantic role. The third principle is that the (asymptotic) distribution over argument heads is the same for two clusters which represent the same semantic role. We consider two approaches to semantic role induction based on two fundamentally different perspectives on the problem. Firstly, we develop feature-based probabilistic latent structure models which capture the statistical relationships that hold between the semantic role and other features of an argument instance. Secondly, we conceptualize role induction as the problem of partitioning a graph whose vertices represent argument instances and whose edges express similarities between these instances. The graph thus represents all the argument instances for a particular predicate occurring in the corpus. The similarities with respect to different features are represented on different edge layers and accordingly we develop algorithms for partitioning such multi-layer graphs. We empirically validate our models and the principles they are based on and show that our graph partitioning models have several advantages over the feature-based models. In a series of experiments on both English and German the graph partitioning models outperform the feature-based models and yield significantly better scores over a strong baseline which directly identifies semantic roles with syntactic positions. In sum, we demonstrate that relatively high-quality shallow semantic representations can be induced without human supervision and foreground a promising direction of future research aimed at overcoming the problem of acquiring large amounts of lexicalsemantic knowledge

    Learning Language from a Large (Unannotated) Corpus

    Full text link
    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Get PDF
    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Researc

    Semantic frame induction through the detection of communities of verbs and their arguments

    Get PDF
    Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that maps into the evoked frames, are important for several NLP tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame and verb arguments that play the same semantic role. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances or arguments as nodes connected by edges if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of SemEval 2019, we outperformed all of the previous approaches to the task, achieving the current state-of-the-art performance.info:eu-repo/semantics/publishedVersio

    L2F/INESC-ID at SemEval-2019 Task 2: unsupervised lexical semantic frame induction using contextualized word representations

    Get PDF
    Building large datasets annotated with semantic information, such as FrameNet, is an expensive process. Consequently, such resources are unavailable for many languages and specific domains. This problem can be alleviated by using unsupervised approaches to induce the frames evoked by a collection of documents. That is the objective of the second task of SemEval 2019, which comprises three subtasks: clustering of verbs that evoke the same frame and clustering of arguments into both frame-specific slots and semantic roles. We approach all the subtasks by applying a graph clustering algorithm on contextualized embedding representations of the verbs and arguments. Using such representations is appropriate in the context of this task, since they provide cues for word-sense disambiguation. Thus, they can be used to identify different frames evoked by the same words. Using this approach we were able to outperform all of the baselines reported for the task on the test set in terms of Purity F1, as well as in terms of BCubed F1 in most cases.info:eu-repo/semantics/publishedVersio
    • …
    corecore