Search CORE

191 research outputs found

Taxonomy Induction using Hypernym Subsequences

Author: Biemann Chris
Cram Damien
Grefenstette Gregory
Gupta Amit
Kozareva Zornitsa
Nastase Vivi
Oakes Michael P
Ponzetto S.
Ponzetto Simone Paolo
Snow Rion
Publication date: 05/05/2017

We propose a novel, semi-supervised approach to domain taxonomy induction from an input vocabulary of seed terms. Unlike previous approaches, which typically extract direct hypernym edges for terms, our approach uses a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from the extracted subsequences is cast as an instance of the minimum-cost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms state-of-the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, no previous approach has been empirically shown to be robust to noise in the input vocabulary.

arXiv.org e-Print Archive
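The abstract casts taxonomy induction as a minimum-cost flow problem. As a rough illustration only, the sketch below solves a generic min-cost flow with successive shortest paths (Bellman-Ford on the residual graph) on a toy graph: each seed term must route one unit of flow to a root node, candidate hypernym edges carry hypothetical integer costs (lower means more plausible), and the candidate edges that end up carrying flow are read off as the taxonomy. The graph construction, the cost values, and the names `MinCostFlow` and `induce_taxonomy` are assumptions for illustration; this is not the paper's directed graph or its subsequence probabilities.

```python
class MinCostFlow:
    """Min-cost flow via successive shortest paths (Bellman-Ford on the residual graph)."""

    def __init__(self):
        # Each edge is [u, v, residual_capacity, cost]; edge i ^ 1 is its reverse twin.
        self.edges = []

    def add_edge(self, u, v, cap, cost):
        self.edges.append([u, v, cap, cost])
        self.edges.append([v, u, 0, -cost])

    def solve(self, s, t, max_flow):
        flow, total_cost = 0, 0
        while flow < max_flow:
            # Bellman-Ford: shortest path from s over edges with residual capacity.
            dist, prev = {s: 0}, {}
            changed = True
            while changed:
                changed = False
                for i, (u, v, cap, cost) in enumerate(self.edges):
                    if cap > 0 and u in dist and dist[u] + cost < dist.get(v, float("inf")):
                        dist[v] = dist[u] + cost
                        prev[v] = i
                        changed = True
            if t not in dist:
                break  # no augmenting path remains
            # Find the bottleneck capacity along the path, then push that much flow.
            push, v = max_flow - flow, t
            while v != s:
                push = min(push, self.edges[prev[v]][2])
                v = self.edges[prev[v]][0]
            v = t
            while v != s:
                i = prev[v]
                self.edges[i][2] -= push
                self.edges[i ^ 1][2] += push
                v = self.edges[i][0]
            flow += push
            total_cost += push * dist[t]
        return flow, total_cost


def induce_taxonomy(seeds, candidates, root):
    """Hypothetical toy setup: every seed term sends one unit of flow to `root`.

    candidates maps (child, parent) -> integer cost; lower cost = more plausible edge.
    Returns (flow, total_cost, set of selected (child, parent) edges).
    """
    source, sink = "<source>", "<sink>"
    mcf = MinCostFlow()
    for term in seeds:
        mcf.add_edge(source, term, 1, 0)
    for (child, parent), cost in candidates.items():
        mcf.add_edge(child, parent, len(seeds), cost)
    mcf.add_edge(root, sink, len(seeds), 0)
    flow, total_cost = mcf.solve(source, sink, len(seeds))
    # Net flow on a forward edge equals the residual capacity of its reverse twin.
    selected = {
        (u, v)
        for i, (u, v, _, _) in enumerate(mcf.edges)
        if i % 2 == 0 and mcf.edges[i + 1][2] > 0 and (u, v) in candidates
    }
    return flow, total_cost, selected


# Hypothetical candidate edges and costs.
seeds = ["apple", "banana", "carrot"]
candidates = {
    ("apple", "fruit"): 1, ("banana", "fruit"): 1, ("carrot", "vegetable"): 1,
    ("apple", "food"): 5, ("carrot", "fruit"): 4,
    ("fruit", "food"): 1, ("vegetable", "food"): 1,
}
flow, cost, edges = induce_taxonomy(seeds, candidates, root="food")
print(flow, cost, sorted(edges))
```

Because no capacity binds in this toy graph, the optimum coincides with each term's cheapest path to the root, so "carrot" attaches to "vegetable" rather than through the costlier "fruit" edge.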

Lexical patterns or dependency patterns: which is better for hypernym extraction?

Author: Hofmann K.
Tjong Kim Sang E.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2009

In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora

Author: Hoste Veronique
Lefever Els
Rigouts Terryn Ayla
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a great need for well-documented, manually validated datasets, especially in the emerging field of multilingual term extraction from comparable corpora, which presents a new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology, and the large volumes annotated in three languages and four domains make this a rich resource. The resulting datasets are suited not only for evaluation but also as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation.
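To make the evaluation use case concrete, the sketch below scores a list of extracted terms against a gold-standard term list with precision, recall, and F1. It is a generic illustration, not the paper's evaluation protocol: matching is assumed to be exact after lowercasing and whitespace normalization, it ignores the dataset's term labels and variants, and the function name `evaluate_terms` and the example terms are hypothetical.

```python
def normalize(term):
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(term.lower().split())


def evaluate_terms(extracted, gold):
    """Precision, recall, and F1 of extracted terms against a gold term list."""
    pred = {normalize(t) for t in extracted}
    ref = {normalize(t) for t in gold}
    tp = len(pred & ref)  # terms found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and system output.
gold = ["heart failure", "blood pressure", "angiography", "stent"]
extracted = ["Heart failure", "blood  pressure", "patient", "stent"]
print(evaluate_terms(extracted, gold))
```

Here three of the four extracted terms match after normalization, and three of the four gold terms are found, so precision and recall are both 0.75.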

LT3: a multi-modular approach to automatic taxonomy construction

Author: Lefever Els
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015

This paper describes our contribution to the SemEval-2015 task 17 on “Taxonomy Extraction Evaluation”. We propose a hypernym detection system combining three modules: a lexico-syntactic pattern matcher, a morpho-syntactic analyzer, and a module retrieving hypernym relations from structured lexical resources. Our system ranked first in the competition when considering the gold standard and manual evaluation, and second in the overall ranking. In addition, the experimental results show that all modules contribute to finding hypernym relations between terms.

CiteSeerX
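The three-module design described above can be sketched as a union of candidate (hyponym, hypernym) pairs from independent detectors. In this illustration, which is not the LT3 system itself, the lexico-syntactic module is a single Hearst-style "X such as Y" regex, the morpho-syntactic module assumes English multiword terms are right-headed (so the last token is proposed as the hypernym), and the structured resource is a hypothetical in-memory dictionary standing in for something like WordNet. The names `LEXICON`, `SUCH_AS`, and `detect_hypernyms` are illustrative.

```python
import re

# Hypothetical stand-in for a structured lexical resource such as WordNet.
LEXICON = {"oak": {"tree"}, "salmon": {"fish"}}

# Hearst-style pattern: "<hypernym> such as <hyponym>(, | and | or <hyponym>)*"
SUCH_AS = re.compile(r"\b(\w+) such as (\w+(?:(?:, | and | or )\w+)*)")


def detect_hypernyms(terms, corpus):
    """Return a set of (hyponym, hypernym) pairs proposed by any of three modules."""
    pairs = set()
    # Module 1: lexico-syntactic pattern matching over raw sentences.
    for sentence in corpus:
        for match in SUCH_AS.finditer(sentence):
            hypernym = match.group(1).lower()
            for hyponym in re.split(r", | and | or ", match.group(2)):
                if hyponym.lower() in terms:
                    pairs.add((hyponym.lower(), hypernym))
    for term in terms:
        # Module 2: head of a multiword term (right-headed assumption).
        tokens = term.split()
        if len(tokens) > 1:
            pairs.add((term, tokens[-1]))
        # Module 3: lookup in the structured resource.
        for hypernym in LEXICON.get(term, ()):
            pairs.add((term, hypernym))
    return pairs


terms = {"salmon", "trout", "wind turbine", "oak"}
corpus = ["We studied fish such as salmon and trout."]
print(sorted(detect_hypernyms(terms, corpus)))
```

Taking the union lets each module contribute pairs the others miss: "trout" is found only by the pattern, "wind turbine" only by the head rule, and "oak" only by the lexicon.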

A gold standard for multilingual automatic term extraction from comparable corpora: term structure and translation equivalents

Author: Hoste Veronique
Lefever Els
Rigouts Terryn Ayla
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018

Learning Semantic Text Similarity to rank Hypernyms of Financial Terms

Author: Chopra Ankush
Ghosh Sohom
Naskar Sudip Kumar
Publication date: 12/08/2023

Over the years, there has been a paradigm shift in how users access financial services. With the advancement of digitalization, more users have come to prefer performing financial activities online. This has led to the generation of a huge volume of financial content, which most investors prefer to go through before making decisions. Every industry has terms that are specific to the domain it operates in, and Banking and Financial Services are no exception. To fully comprehend this content, one needs a thorough understanding of the financial terms. Getting a basic idea about a term becomes easy when it is explained with the help of the broad category to which it belongs. This broad category is referred to as a hypernym. For example, "bond" is a hypernym of the financial term "alternative debenture". In this paper, we propose a system capable of extracting and ranking hypernyms for a given financial term. The system has been trained with financial text corpora obtained from various sources such as DBpedia [4], Investopedia, the Financial Industry Business Ontology (FIBO), prospectuses, and so on. Embeddings of these terms have been extracted using FinBERT [3] and FinISH [1] and fine-tuned using SentenceBERT [54]. A novel approach, which uses the hierarchy present in FIBO, has been used to augment the training set with negative samples. Finally, we benchmark the system's performance against existing systems and establish that it performs better than them and is also scalable.
Comment: Our code base: https://github.com/sohomghosh/FinSim_Financial_Hypernym_detectio

arXiv.org e-Print Archive
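The ranking step above can be illustrated with cosine similarity between a term embedding and candidate-hypernym embeddings, plus one possible way of drawing negatives from a hierarchy. This is a hedged sketch under stated assumptions: the three-dimensional vectors stand in for FinBERT/FinISH/SentenceBERT embeddings and are invented, the labels and hierarchy are hypothetical rather than taken from FIBO, and `hierarchy_negatives` (treating every node that is neither the gold hypernym nor one of its ancestors as a negative) is an assumed rule, not the paper's sampling method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_hypernyms(term_vec, candidates):
    """Sort candidate hypernym labels by descending similarity to the term."""
    return sorted(candidates, key=lambda c: cosine(term_vec, candidates[c]), reverse=True)


def hierarchy_negatives(gold, parent_of):
    """Assumed rule: nodes that are neither the gold hypernym nor its ancestors."""
    ancestors, node = set(), gold
    while node is not None:
        ancestors.add(node)
        node = parent_of.get(node)
    all_nodes = set(parent_of) | {p for p in parent_of.values() if p is not None}
    return sorted(all_nodes - ancestors)


# Invented embeddings for illustration only.
term_vec = [0.8, 0.2, 0.1]
candidates = {"Bond": [0.9, 0.1, 0.0], "Share": [0.1, 0.9, 0.0], "Swap": [0.0, 0.2, 0.9]}
print(rank_hypernyms(term_vec, candidates))

# Hypothetical child -> parent hierarchy.
parent_of = {
    "Bond": "Debt Instrument", "Debt Instrument": "Financial Instrument",
    "Share": "Equity Instrument", "Equity Instrument": "Financial Instrument",
    "Financial Instrument": None,
}
print(hierarchy_negatives("Bond", parent_of))
```

Excluding ancestors keeps more general but still correct categories (here "Debt Instrument" and "Financial Instrument") out of the negative set, so only branches such as "Share" and "Equity Instrument" are used as negatives for "Bond".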