Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approaches
have been empirically proven to manifest noise-robustness in the input
vocabulary.
Learning Semantic Text Similarity to rank Hypernyms of Financial Terms
Over the years, there has been a paradigm shift in how users access financial
services. With the advancement of digitalization, more users prefer to perform
financial activities online. This has led to the generation of a huge volume of
financial content. Most investors prefer to go through this content before
making decisions. Every industry has terms that are specific to the domain it
operates in, and Banking and Financial Services are no exception. To fully
comprehend this content, one needs a thorough understanding of the financial
terms. Getting a basic idea about a term becomes easy when it is explained with
the help of the broad category to which it belongs. This broad category is
referred to as a hypernym.
For example, "bond" is a hypernym of the financial term "alternative
debenture". In this paper, we propose a system capable of extracting and
ranking hypernyms for a given financial term. The system has been trained with
financial text corpora obtained from various sources such as DBpedia [4],
Investopedia, the Financial Industry Business Ontology (FIBO), and prospectuses.
Embeddings of these terms have been extracted using FinBERT [3], FinISH [1]
and fine-tuned using SentenceBERT [54]. A novel approach has been used to
augment the training set with negative samples. It uses the hierarchy present
in FIBO. Finally, we benchmark the system's performance against that of
existing systems. We establish that it performs better than them and is also
scalable.
Comment: Our code base:
https://github.com/sohomghosh/FinSim_Financial_Hypernym_detectio
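The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes that each term and each candidate hypernym already has an embedding vector, and it ranks candidates by cosine similarity. The three-dimensional vectors, the candidate labels other than "bond", and the function names `cosine` and `rank_hypernyms` are hypothetical and chosen only for illustration; the actual system relies on FinBERT and SentenceBERT embeddings and may score candidates differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_hypernyms(term_vector, candidate_vectors):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(term_vector, vec))
              for label, vec in candidate_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-made vectors (assumption): real embeddings would come from a
# sentence encoder and have hundreds of dimensions.
TOY_CANDIDATES = {
    "bond": [0.9, 0.1, 0.0],
    "equity index": [0.1, 0.9, 0.1],
    "swap": [0.0, 0.2, 0.9],
}
TOY_TERM = [0.8, 0.2, 0.1]  # stands in for "alternative debenture"

ranking = rank_hypernyms(TOY_TERM, TOY_CANDIDATES)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

With these toy vectors, "bond" receives the highest score, mirroring the example given in the abstract.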
A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection
Compositionality in language refers to how much the meaning of some phrase
can be decomposed into the meaning of its constituents and the way these
constituents are combined. Based on the premise that substitution by synonyms
is meaning-preserving, compositionality can be approximated as the semantic
similarity between a phrase and a version of that phrase where words have been
replaced by their synonyms. Different ways of representing such phrases exist
(e.g., vectors [1] or language models [2]), and the choice of representation
affects the measurement of semantic similarity.
We propose a new compositionality detection method that represents phrases as
ranked lists of term weights. Our method approximates the semantic similarity
between two ranked list representations using a range of well-known distance
and correlation metrics. In contrast to most state-of-the-art approaches in
compositionality detection, our method is completely unsupervised. Experiments
with a publicly available dataset of 1048 human-annotated phrases show that,
compared to strong supervised baselines, our approach provides superior
measurement of compositionality using any of the distance and correlation
metrics considered.
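To make the idea of comparing ranked-list representations concrete, the sketch below computes two well-known rank correlation measures, Spearman's rho and Kendall's tau, between two ranked term lists. It is an illustration under assumptions, not the authors' implementation: the term lists are invented, only rank positions are used (term weights are ignored), ties and duplicate terms are not handled, and only terms shared by both lists are compared.

```python
from itertools import combinations


def _positions(ranked_terms):
    """Map each term to its 0-based position in the ranked list."""
    return {term: pos for pos, term in enumerate(ranked_terms)}


def spearman_rho(list_a, list_b):
    """Spearman rank correlation over the terms shared by both lists."""
    pos_a, pos_b = _positions(list_a), _positions(list_b)
    shared = [t for t in list_a if t in pos_b]
    n = len(shared)
    if n < 2:
        return 0.0
    # Re-rank the shared terms so that ranks run 0..n-1 in each list.
    rank_a = {t: r for r, t in enumerate(sorted(shared, key=pos_a.get))}
    rank_b = {t: r for r, t in enumerate(sorted(shared, key=pos_b.get))}
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in shared)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_tau(list_a, list_b):
    """Kendall tau: (concordant - discordant) pairs over all shared pairs."""
    pos_a, pos_b = _positions(list_a), _positions(list_b)
    shared = [t for t in list_a if t in pos_b]
    n = len(shared)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical ranked lists for a phrase and its synonym-substituted variant.
original = ["heavy", "rain", "storm", "weather"]
substituted = ["heavy", "storm", "rain", "weather"]
print(spearman_rho(original, substituted), kendall_tau(original, substituted))
```

Under the premise stated in the abstract, a higher correlation between the two lists would suggest that synonym substitution preserved the meaning, which is read as higher compositionality.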
ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies
We introduce EXTASEM!, a novel approach for the automatic learning of lexical taxonomies from domain terminologies. First, we exploit a very large semantic network to collect thousands of in-domain textual definitions. Second, we extract (hyponym, hypernym) pairs from each definition with a CRF-based algorithm trained on manually validated data. Finally, we introduce a graph induction procedure which constructs a full-fledged taxonomy where each edge is weighted according to its domain pertinence. EXTASEM! achieves state-of-the-art results in the following taxonomy evaluation experiments: (1) Hypernym discovery, (2) Reconstructing gold standard taxonomies, and (3) Taxonomy quality according to structural measures. We release weighted taxonomies for six domains for the use and scrutiny of the community.
A Dependency-Based Neural Network for Relation Classification
Previous research on relation classification has verified the effectiveness
of using dependency shortest paths or subtrees. In this paper, we further
explore how to make full use of the combination of these two kinds of dependency
information. We first propose a new structure, termed augmented dependency path
(ADP), which is composed of the shortest dependency path between two entities
and the subtrees attached to the shortest path. To exploit the semantic
representation behind the ADP structure, we develop dependency-based neural
networks (DepNN): a recursive neural network designed to model the subtrees,
and a convolutional neural network to capture the most important features on
the shortest path. Experiments on the SemEval-2010 dataset show that our
proposed method achieves state-of-the-art results.
Comment: This preprint is the full version of a short paper accepted at the
annual meeting of the Association for Computational Linguistics (ACL) 2015
(Beijing, China)
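The augmented dependency path described in this abstract combines the shortest dependency path between two entities with the subtrees hanging off that path. The sketch below shows one plausible way to extract both pieces from a list of (head, dependent) edges using breadth-first search. The toy parse, the function names, and the use of plain word strings as node identifiers (which assumes every word in the sentence is unique) are assumptions made for illustration; the neural components (recursive and convolutional networks) are not modelled here.

```python
from collections import defaultdict, deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over the dependency tree treated as undirected."""
    neighbours = defaultdict(list)
    for head, dependent in edges:
        neighbours[head].append(dependent)
        neighbours[dependent].append(head)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbours[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if target not in parent:
        return []  # the two tokens are not connected
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def attached_subtrees(edges, path):
    """For each path node, list the descendants that are not on the path."""
    children = defaultdict(list)
    for head, dependent in edges:
        children[head].append(dependent)
    on_path = set(path)

    def collect(node):
        found = []
        for child in children[node]:
            if child not in on_path:
                found.append(child)
                found.extend(collect(child))
        return found

    return {node: collect(node) for node in path}


# Hand-written toy parse (assumption) of:
# "The burst has been caused by water hammer pressure"
EDGES = [
    ("caused", "burst"), ("burst", "The"), ("caused", "has"),
    ("caused", "been"), ("caused", "pressure"), ("pressure", "by"),
    ("pressure", "water"), ("pressure", "hammer"),
]

path = shortest_dependency_path(EDGES, "burst", "pressure")
subtrees = attached_subtrees(EDGES, path)
print(path)
print(subtrees)
```

In a real pipeline, tokens would be identified by their index in the sentence rather than by their surface form, and the edges would come from a dependency parser.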