15 research outputs found
Unsupervised semantic frame induction using triclustering
We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction. We cast the frame induction problem as a triclustering problem that is a generalization of clustering for triadic data. Our replicable benchmarks demonstrate that the proposed graph-based approach, Triframes, shows state-of-the art results on this task on a FrameNet-derived dataset and performing on par with competitive methods on a verb class clustering task
Unsupervised Semantic Frame Induction using Triclustering
We use dependency triples automatically extracted from a Web-scale corpus to
perform unsupervised semantic frame induction. We cast the frame induction
problem as a triclustering problem that is a generalization of clustering for
triadic data. Our replicable benchmarks demonstrate that the proposed
graph-based approach, Triframes, shows state-of-the art results on this task on
a FrameNet-derived dataset and performing on par with competitive methods on a
verb class clustering task.Comment: 8 pages, 1 figure, 4 tables, accepted at ACL 201
L2F/INESC-ID at SemEval-2019 Task 2: unsupervised lexical semantic frame induction using contextualized word representations
Building large datasets annotated with semantic information, such as FrameNet, is an expensive process. Consequently, such resources are unavailable for many languages and specific domains. This problem can be alleviated by using unsupervised approaches to induce the frames evoked by a collection of documents. That is the objective of the second task of SemEval 2019, which comprises three subtasks: clustering of verbs that evoke the same frame and clustering of arguments into both frame-specific slots and semantic roles. We approach all the subtasks by applying a graph clustering algorithm on contextualized embedding representations of the verbs and arguments. Using such representations is appropriate in the context of this task, since they provide cues for word-sense disambiguation. Thus, they can be used to identify different frames evoked by the same words. Using this approach we were able to outperform all of the baselines reported for the task on the test set in terms of Purity F1, as well as in terms of BCubed F1 in most cases.info:eu-repo/semantics/publishedVersio
HHMM at SemEval-2019 Task 2: Unsupervised Frame Induction using Contextualized Word Embeddings
We present our system for semantic frame induction that showed the best
performance in Subtask B.1 and finished as the runner-up in Subtask A of the
SemEval 2019 Task 2 on unsupervised semantic frame induction (QasemiZadeh et
al., 2019). Our approach separates this task into two independent steps: verb
clustering using word and their context embeddings and role labeling by
combining these embeddings with syntactical features. A simple combination of
these steps shows very competitive results and can be extended to process other
datasets and languages.Comment: 5 pages, 3 tables, accepted at SemEval 201
Semantic frame induction as a community detection problem
Resources such as FrameNet provide semantic information that is important for multiple tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances as nodes connected by an edge if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of the SemEval shared task we outperformed all the previous approaches to the task.info:eu-repo/semantics/acceptedVersio
Semantic frame induction through the detection of communities of verbs and their arguments
Resources such as FrameNet, which provide sets of semantic frame definitions and annotated textual data that maps into the evoked frames, are important for several NLP tasks. However, they are expensive to build and, consequently, are unavailable for many languages and domains. Thus, approaches able to induce semantic frames in an unsupervised manner are highly valuable. In this paper we approach that task from a network perspective as a community detection problem that targets the identification of groups of verb instances that evoke the same semantic frame and verb arguments that play the same semantic role. To do so, we apply a graph-clustering algorithm to a graph with contextualized representations of verb instances or arguments as nodes connected by edges if the distance between them is below a threshold that defines the granularity of the induced frames. By applying this approach to the benchmark dataset defined in the context of SemEval 2019, we outperformed all of the previous approaches to the task, achieving the current state-of-the-art performance.info:eu-repo/semantics/publishedVersio
Automatic induction of framenet lexical units in Italian
In this paper we investigate the applicability of automatic methods for frame induction to improve the coverage of IFrameNet, a novel lexical resource based on Frame Semantics in Italian. The experimental evaluations show that the adopted methods based on neural word embeddings pave the way for the assisted development of a large scale lexical resource for our language
Graph clustering for natural language processing
Graph-based representations are proven to be an effective approach for a variety of Natural Language Processing (NLP) tasks. Graph clustering makes it possible to extract useful knowledge by exploiting the implicit structure of the data. In this tutorial, we will present several efficient graph clustering algorithms, show their strengths and weaknesses as well as their implementations and applications. Then, the evaluation methodology in unsupervised NLP tasks will be discussed
LearnFCA: A Fuzzy FCA and Probability Based Approach for Learning and Classification
Formal concept analysis(FCA) is a mathematical theory based on lattice and order theory used for data analysis and knowledge representation. Over the past several years, many of its extensions have been proposed and applied in several domains including data mining, machine learning, knowledge management, semantic web, software development, chemistry ,biology, medicine, data analytics, biology and ontology engineering.
This thesis reviews the state-of-the-art of theory of Formal Concept Analysis(FCA) and its various extensions that have been developed and well-studied in the past several years. We discuss their historical roots, reproduce the original definitions and derivations with illustrative examples. Further, we provide a literature review of it’s applications and various approaches adopted by researchers in the areas of dataanalysis, knowledge management with emphasis to data-learning and classification problems.
We propose LearnFCA, a novel approach based on FuzzyFCA and probability theory for learning and classification problems. LearnFCA uses an enhanced version of FuzzyLattice which has been developed to store class labels and probability vectors and has the capability to be used for classifying instances with encoded and unlabelled features. We evaluate LearnFCA on encodings from three datasets - mnist, omniglot and cancer images with interesting results and varying degrees of success.
Adviser: Dr Jitender Deogu