Search CORE

682 research outputs found

MIsA : multilingual 'IsA' extraction from Corpora

Author: Faralli Stefano
Lefever Els
Paolo Ponzetto Simone
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018
Field of study

The horse before the cart: improving the accuracy of taxonomic directions when building tag hierarchies

Author: Almoqhim Fahad
Millard David E.
Shadbolt Nigel
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 15/12/2015
Field of study

Content on the Web is huge and constantly growing, and building taxonomies for such content can help with navigation and organisation, but building taxonomies manually is costly and time-consuming. An alternative is to allow users to construct folksonomies: collective social classifications. Yet, folksonomies are inconsistent and their use for searching and browsing is limited. Approaches have been suggested for acquiring implicit hierarchical structures from folksonomies, however, but these approaches suffer from the ‘popularity-generality’ problem, in that popularity is assumed to be a proxy for generality, i.e. high-level taxonomic terms will occur more often than low-level ones. To tackle this problem, we propose in this paper an improved approach. It is based on the Heymann–Benz algorithm, and works by checking the taxonomic directions against a corpus of text. Our results show that popularity works as a proxy for generality in at most 90.91% of cases, but this can be improved to 95.45% using our approach, which should translate to higher-quality tag hierarchy structure

Southampton (e-Prints Soton)

Crossref

Taxonomy Induction using Hypernym Subsequences

Author: Biemann Chris
Cram Damien
Grefenstette Gregory
Gupta Amit
Kozareva Zornitsa
Nastase Vivi
Oakes Michael P
Ponzetto S.
Ponzetto Simone Paolo
Snow Rion
Publication venue
Publication date: 05/05/2017
Field of study

We propose a novel, semi-supervised approach towards domain taxonomy induction from an input vocabulary of seed terms. Unlike all previous approaches, which typically extract direct hypernym edges for terms, our approach utilizes a novel probabilistic framework to extract hypernym subsequences. Taxonomy induction from extracted subsequences is cast as an instance of the minimumcost flow problem on a carefully designed directed graph. Through experiments, we demonstrate that our approach outperforms stateof- the-art taxonomy induction approaches across four languages. Importantly, we also show that our approach is robust to the presence of noise in the input vocabulary. To the best of our knowledge, no previous approaches have been empirically proven to manifest noise-robustness in the input vocabulary

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

A concept–relationship acquisition and inference approach for hierarchical taxonomy construction from tags

Author: Cheung CFB
Lau SMA
Tsui E
Wang WM
Publication venue: 'Elsevier BV'
Publication date: 11/12/2014
Field of study

Author name used in this publication: W. M. WangAuthor name used in this publication: C. F. CheungAuthor name used in this publication: Adela S. M. Lau2009-2010 > Academic research: refereed > Publication in refereed journalAccepted ManuscriptPublishe

PolyU Institutional Repository

Recommended from our members

Toward the automation of business process ontology generation

Author: De Cesare S
Juric D
Lycett M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2014
Field of study

Semantic Business Process Management (SBPM) utilises semantic technologies (e.g., ontology) to model and query process representations. There are times in which such models must be reconstructed from existing textual documentation. In this scenario the automated generation of ontological models would be preferable, however current methods and technology are still not capable of automatically generating accurate semantic process models from textual descriptions. This research attempts to automate the process as much as possible by proposing a method that drives the transformation through the joint use of a foundational ontology and lexico-semantic analysis. The method is presented, demonstrated and evaluated. The original dataset represents 150 business activities related to the procurement processes of a case study company. As the evaluation shows, the proposed method can accurately map the linguistic patterns of the process descriptions to semantic patterns of the foundational ontology to a high level of accuracy, however further research is required in order to reduce the level of human intervention, expand the method so as to recognise further patterns of the foundational ontology and develop a tool to assist the business process modeller in the semi-automated generation of process models

Brunel University Research Archive

Building WordNet for Afaan Oromoo

Author: Bacha Biru Abera
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/05/2020
Field of study

WordNet is a lexical database which has many relations to disambiguate the sense of words for natural languages. From the WordNet relations synonyms and hyponym has major role for natural language processing and artificial intelligence applications. In this paper, word embedding (Word2Vec) and lexico-syntactic pattern (LSP) are developed to extract automatically synonyms and hyponyms respectively. For this study, the word embedding is evaluated on two specialized domain algorithms such as a continuous bag of words and Skip Gram algorithms and show superior results. Applying word embedding (Word2Vec) algorithms for Afaan Oromo texts has been registered 80.09% and 85.04% for the continuous bag of words and Skip Gram respectively. According to the result achieved in this study, the skip-gram algorithm does a better job for frequent pairs of words than a continuous bag of words. But, a continuous bag of words algorithm is faster while skip-gram is slower. A lexical syntactic pattern with the combination of Word2Vec and without Word2Vec is also evaluated using information retrieval evaluation metrics such as precision, recall and F-measure to extract hyponym relation from Afaan Oromoo texts. The precision, recall and F-measure have been registered by lexical syntactic patterns without the combination of Word2Vec is 66.73%, 72%, and 69.26% respectively and with the combination of Word2Vec 81.14%, 80.8%, and 81.1% have been registered for precision, recall and F-measure respectively. There are factors that could affect the accuracy of results: 1) the style of writer of Afaan Oromoo i.e. they write a noun phrase with many adjective to express the noun for the reader; and, 2) it is possible that some instances of the LSP are missed due to misspellings and other typographical errors. Keywords: Afaan Oromoo WordNet, Word embedding, Lexico syntactic patterns, Extraction of WordNet relations. DOI: 10.7176/CEIS/11-3-01 Publication date:May 31st 202

International Institute for Science, Technology and Education (IISTE): E-Journals

Learning Taxonomic Relations from Heterogeneous Evidence

Author: Cimiano Philipp
Pivk Aleksander
Schmidt-Thieme Lars
Staab Steffen
Publication venue
Publication date: 01/01/2004
Field of study

Cimiano P, Schmidt-Thieme L, Pivk A, Staab S. Learning Taxonomic Relations from Heterogeneous Evidence. In: Proceedings of the ECAI 2004 Ontology Learning and Population Workshop. 2004

Publications at Bielefeld University