Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approach
has been empirically shown to be robust to noise in the input
vocabulary.
In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora
Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a large need for well-documented, manually validated datasets, especially in the rising field of multilingual term extraction from comparable corpora, which presents a unique new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not just suited for evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation.
LT3: a multi-modular approach to automatic taxonomy construction
This paper describes our contribution to the SemEval-2015 task 17 on “Taxonomy Extraction Evaluation”. We propose a hypernym detection system combining three modules: a lexico-syntactic pattern matcher, a morpho-syntactic analyzer and a module retrieving hypernym relations from structured lexical resources. Our system ranked first in the competition when considering the gold standard and manual evaluation, and second in the overall ranking. In addition, the experimental results show that all modules contribute to finding hypernym relations between terms.
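As a rough illustration of what a lexico-syntactic pattern matcher can look like, the sketch below applies two classic Hearst-style patterns ("X such as A, B and C" and "A, B and other X") to plain text. The patterns, the function name extract_hypernym_pairs and the single-word term restriction are assumptions made for this example; the abstract does not specify which patterns the LT3 system uses.

```python
import re

# Separators between listed hyponyms (longest alternatives first).
SEP = r"(?:, and |, or |, | and | or )"

# Assumed pattern 1: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>"
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+" + SEP + r"?)+)")

# Assumed pattern 2: "<hyponym>, <hyponym> and other <hypernym>"
AND_OTHER = re.compile(r"((?:\w+, )*\w+),? (?:and|or) other (\w+)")


def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the two patterns above.

    Terms are limited to single words and are not lemmatized; a real
    system would work on tokenized, POS-tagged multiword terms.
    """
    text = text.lower()
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym, listing = match.group(1), match.group(2)
        for hyponym in re.split(SEP, listing):
            if hyponym:  # skip empty strings left by a trailing separator
                pairs.append((hyponym, hypernym))
    for match in AND_OTHER.finditer(text):
        listing, hypernym = match.group(1), match.group(2)
        for hyponym in listing.split(", "):
            pairs.append((hyponym, hypernym))
    return pairs


pairs_a = extract_hypernym_pairs("Fruits such as apples, pears and plums are sold")
pairs_b = extract_hypernym_pairs("Bonds, shares and other securities")
```

Pattern matching of this kind tends to be precise but low in recall, which is one plausible reason to combine it with other modules as the abstract describes.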
Learning Semantic Text Similarity to rank Hypernyms of Financial Terms
Over the years, there has been a paradigm shift in how users access financial
services. With the advancement of digitalization more users have been
preferring the online mode of performing financial activities. This has led to
the generation of a huge volume of financial content. Most investors prefer to
go through these contents before making decisions. Every industry has terms
that are specific to the domain it operates in. Banking and Financial Services
are not an exception to this. In order to fully comprehend these contents, one
needs to have a thorough understanding of the financial terms. Getting a basic
idea about a term becomes easy when it is explained with the help of the broad
category to which it belongs. This broad category is referred to as a hypernym.
For example, "bond" is a hypernym of the financial term "alternative
debenture". In this paper, we propose a system capable of extracting and
ranking hypernyms for a given financial term. The system has been trained with
financial text corpora obtained from various sources like DBpedia [4],
Investopedia, Financial Industry Business Ontology (FIBO), prospectus and so
on. Embeddings of these terms have been extracted using FinBERT [3], FinISH [1]
and fine-tuned using SentenceBERT [54]. A novel approach has been used to
augment the training set with negative samples. It uses the hierarchy present
in FIBO. Finally, we benchmark the system's performance against existing
systems and show that it outperforms them and is also scalable.
Comment: Our code base:
https://github.com/sohomghosh/FinSim_Financial_Hypernym_detectio
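The ranking step described in this abstract can be pictured as scoring each candidate hypernym by the similarity between its embedding and the embedding of the term. The sketch below is a minimal illustration using cosine similarity over hand-made three-dimensional toy vectors; the vectors, the candidate labels other than "bond" and the function names are assumptions for the example, not outputs of FinBERT, FinISH or SentenceBERT, and not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_hypernyms(term_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(term_vec, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embedding standing in for the term "alternative debenture".
term_vec = [0.9, 0.1, 0.0]

# Toy embeddings for hypothetical candidate hypernyms.
candidates = {
    "bond": [1.0, 0.0, 0.0],
    "equity index": [0.0, 1.0, 0.0],
    "swap": [0.0, 0.0, 1.0],
}

ranking = rank_hypernyms(term_vec, candidates)
ranked_labels = [label for label, _ in ranking]
```

With these toy vectors the candidate "bond" is ranked first, mirroring the example given in the abstract; with real embeddings the ranking would depend on the trained model.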
- …