Automatic Taxonomy Generation - A Use-Case in the Legal Domain

Author: Buitelaar Paul
O'Neill James
Robin Cécile
Publication date: 04/10/2017

A key challenge in the legal domain is the adaptation and representation of the legal knowledge expressed through texts, so that legal practitioners and researchers can access this information more easily and quickly to help with compliance-related issues. One way to approach this goal is in the form of a taxonomy of legal concepts. While this task usually requires a manual construction of terms and their relations by domain experts, this paper describes a methodology to automatically generate a taxonomy of legal noun concepts. We apply and compare two approaches on a corpus consisting of statutory instruments for UK, Wales, Scotland and Northern Ireland laws. Comment: 9 pages

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway

Concept Extraction and Clustering for Topic Digital Library Construction

Author: Chengzhi Zhang
Dan Wu
Publication date: 01/12/2008

This paper introduces a new approach to building a topic digital library using concept extraction and document clustering. First, documents in a specific domain are automatically obtained using a document classification approach. Then, the keywords of each document are extracted using a machine learning approach. The keywords are used to cluster the document subset. The clustering result is the taxonomy of the subset. Lastly, the taxonomy is manually adjusted into a hierarchical structure for user navigation. The topic digital library is constructed by combining full-text retrieval with the hierarchical navigation function.

E-LIS

Dutch hypernym detection: does decompounding help?

Author: Lefever Els
Macken Lieve
Rigouts Terryn Ayla
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2016

This research presents experiments carried out to improve the precision and recall of Dutch hypernym detection. To do so, we applied a data-driven semantic relation finder that starts from a list of domain-specific terms automatically extracted from technical corpora, and generates a list of hypernym relations between these terms. As Dutch technical terms often consist of compounds written as one orthographic unit, we investigated the impact of a decompounding module on the performance of the hypernym detection system. In addition, we improved the precision of the system by designing filters that take statistical and linguistic information into account. The experimental results show that both the precision and recall of the hypernym detection system improved, and that the decompounding module is especially effective for hypernym detection in Dutch.

Ghent University Academic Bibliography

Context and Keyword Extraction in Plain Text Using a Graph Representation

Author: Chahine Carlo Abi
Chaignaud Nathalie
Kotowicz Jean-Philippe
Pécuchet Jean-Pierre
Publication date: 30/11/2008

Document indexing is an essential task performed by archivists or automatic indexing tools. To retrieve documents relevant to a query, the keywords describing each document have to be carefully chosen. Archivists have to identify the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role, but indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. The system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resource.

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

Evaluation of automatic hypernym extraction from technical corpora in English and Dutch

Author: Hoste Veronique
Lefever Els
Van de Kauter Marjan
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2014

In this research, we evaluate different approaches for the automatic extraction of hypernym relations from English and Dutch technical text. The detected hypernym relations should enable us to semantically structure automatically obtained term lists from domain- and user-specific data. We investigated three different hypernymy extraction approaches for Dutch and English: a lexico-syntactic pattern-based approach, a distributional model and a morpho-syntactic method. To test the performance of the different approaches on domain-specific data, we collected and manually annotated English and Dutch data from two technical domains, viz. the dredging and financial domains. The experimental results show that especially the morpho-syntactic approach obtains good results for automatic hypernym extraction from technical and domain-specific texts.
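
As a rough illustration of what a lexico-syntactic pattern-based approach can look like, the sketch below matches a single "X such as A, B and C" pattern with a regular expression. It is not the system evaluated in the paper: the pattern, the function name `extract_hypernym_pairs` and the example sentence are assumptions made for illustration, and a real system would operate on part-of-speech-tagged noun phrases rather than single words.

```python
import re

# Hypothetical, simplified pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Single words stand in for noun phrases to keep the sketch short.
PATTERN = re.compile(
    r"(\w+) such as (\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        # Split the enumeration on commas and on the coordinating "and"/"or".
        hyponyms = re.split(r",\s*(?:and |or )?|\s+(?:and|or)\s+", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs

if __name__ == "__main__":
    sentence = "They operate vessels such as dredgers, barges and tugs."
    for hyponym, hypernym in extract_hypernym_pairs(sentence):
        print(f"{hyponym} -> {hypernym}")
```

Patterns of this kind tend to be precise but miss relations that are never stated explicitly, which is one reason to compare them with distributional and morpho-syntactic methods.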

Ghent University Academic Bibliography