Automatic Taxonomy Generation - A Use-Case in the Legal Domain
A key challenge in the legal domain is the adaptation and representation of
the legal knowledge expressed through texts, so that legal practitioners
and researchers can access this information more easily and quickly when
dealing with compliance-related issues. One way to approach this goal is a
taxonomy of legal concepts. While this task usually requires domain experts to
construct terms and their relations manually, this paper
describes a methodology to automatically generate a taxonomy of legal noun
concepts. We apply and compare two approaches on a corpus consisting of
statutory instruments for UK, Wales, Scotland and Northern Ireland laws.
Comment: 9 pages
Concept Extraction and Clustering for Topic Digital Library Construction
This paper introduces a new approach to building a
topic digital library using concept extraction and
document clustering. First, documents belonging to a specific
domain are collected automatically using a document
classification approach. Then, the keywords of each
document are extracted using a machine learning
approach. These keywords are used to cluster the
document subset, and the resulting clusters form the taxonomy
of the subset. Finally, the taxonomy is manually adjusted
into a hierarchical structure for user navigation.
The topic digital library is constructed
by combining full-text retrieval with hierarchical
navigation.
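The abstract does not say which machine learning method extracts the keywords or how the documents are clustered. As a minimal sketch, assuming TF-IDF scoring stands in for keyword extraction and single-link grouping on the Jaccard overlap of keyword sets stands in for the clustering step, the extract-then-cluster stage could look like this (the function names, the toy documents and the 0.3 threshold are all hypothetical):

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, keep alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs: list[str], k: int = 3) -> list[set[str]]:
    """Return the k highest-scoring TF-IDF terms of each document (assumed scoring)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    keywords = []
    for toks in tokenized:
        if not toks:
            keywords.append(set())
            continue
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(set(ranked[:k]))
    return keywords


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(keyword_sets: list[set[str]], threshold: float = 0.3) -> list[list[int]]:
    """Single-link grouping: documents whose keyword sets overlap enough share a cluster."""
    parent = list(range(len(keyword_sets)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(keyword_sets)):
        for j in range(i + 1, len(keyword_sets)):
            if jaccard(keyword_sets[i], keyword_sets[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(keyword_sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy documents (hypothetical): two about solar power, two about court cases.
    docs = [
        "solar solar solar panel panel panel inverter",
        "solar solar solar panel panel panel battery",
        "court court court appeal appeal appeal judge",
        "court court court appeal appeal appeal verdict",
    ]
    print(cluster_by_keywords(tfidf_keywords(docs)))  # [[0, 1], [2, 3]]
```

Each resulting cluster would then be one node of the draft taxonomy, which the abstract says is adjusted manually afterwards.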
Dutch hypernym detection: does decompounding help?
This research presents experiments carried out to improve the precision and recall of Dutch hypernym detection. To do so, we applied a data-driven semantic relation finder that starts from a list of automatically extracted domain-specific terms from technical corpora, and generates a list of hypernym relations between these terms. As Dutch technical terms often consist of compounds written in one orthographic unit, we investigated the impact of a decompounding module on the performance of the hypernym detection system.
In addition, we improved the precision of the system by designing filters that take into account statistical and linguistic information.
The experimental results show that both the precision and recall of the hypernym detection system improved, and that the decompounding module is especially effective for hypernym detection in Dutch.
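Dutch compounds are right-headed, so the head of a compound such as zuigerpomp ("piston pump") is usually a hypernym of the whole compound (pomp, "pump"). The sketch below is not the authors' decompounding module; it assumes a hypothetical lexicon of known domain terms, splits a term at its longest known head, and proposes (term, head) as a hypernym candidate. It ignores linking morphemes such as -s- and -en- and applies none of the statistical or linguistic filters mentioned above.

```python
from __future__ import annotations


def split_compound(word: str, lexicon: set[str], min_len: int = 3) -> tuple[str, str] | None:
    """Split `word` into (modifier, head) using the longest known head, if any.

    Dutch compounds are right-headed, so only suffixes are tried as heads.
    Linking morphemes (-s-, -e-, -en-) are simply left attached to the modifier.
    """
    word = word.lower()
    # A small i means a long head, so the longest known head is found first;
    # both parts must be at least min_len characters long.
    for i in range(min_len, len(word) - min_len + 1):
        head = word[i:]
        if head in lexicon:
            return word[:i], head
    return None


def hypernym_candidates(terms: list[str], lexicon: set[str]) -> list[tuple[str, str]]:
    """Propose (hyponym, hypernym) pairs: each decompoundable term is paired with its head."""
    pairs = []
    for term in terms:
        parts = split_compound(term, lexicon)
        if parts is not None:
            pairs.append((term.lower(), parts[1]))
    return pairs


if __name__ == "__main__":
    lexicon = {"pomp", "zuiger", "schip", "bagger"}  # hypothetical toy term list
    print(hypernym_candidates(["zuigerpomp", "pomp", "sleephopperzuiger"], lexicon))
    # [('zuigerpomp', 'pomp'), ('sleephopperzuiger', 'zuiger')]
```

A suffix match alone can produce false splits, which is one reason a real system would add filters such as checking that the modifier is itself a plausible word.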
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. To retrieve documents relevant to a query, the keywords describing
each document have to be chosen carefully. Archivists have to identify the
right topic of a document before they start extracting the keywords. For an
archivist indexing specialized documents, experience plays an important role,
but indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource.
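The abstract does not describe how the ontology and the text are combined. One plausible reading, sketched here purely as an illustration and not as the article's method: match ontology labels in the document, follow category links upwards (as in Wikipedia's category graph), take the category reached by the most matched concepts as the document's context, and return the concepts under it as contextualized keywords. The toy graph, the two-hop limit and the tie-breaking rule (closest category, then alphabetical) are assumptions.

```python
from __future__ import annotations

import re
from collections import Counter, deque


def find_concepts(text: str, labels: set[str]) -> list[str]:
    """Ontology labels that occur in the text as whole words (case-insensitive)."""
    lowered = text.lower()
    return sorted(
        label for label in labels
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered)
    )


def ancestors(node: str, parents: dict[str, list[str]], max_depth: int) -> dict[str, int]:
    """Breadth-first walk up category links; maps each reachable category to its hop count."""
    depth_of: dict[str, int] = {}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for parent in parents.get(current, []):
            if parent not in depth_of:
                depth_of[parent] = depth + 1
                frontier.append((parent, depth + 1))
    return depth_of


def contextualize(text: str, labels: set[str], parents: dict[str, list[str]],
                  max_depth: int = 2) -> tuple[str | None, list[str]]:
    """Pick the category reached by most matched concepts; return it with those concepts."""
    concepts = find_concepts(text, labels)
    reach = {c: ancestors(c, parents, max_depth) for c in concepts}
    count = Counter()
    total_depth = Counter()
    for cats in reach.values():
        for cat, depth in cats.items():
            count[cat] += 1
            total_depth[cat] += depth
    if not count:
        return None, concepts
    # Most concepts first; then the closest (most specific) category; then alphabetical.
    context = min(count, key=lambda cat: (-count[cat], total_depth[cat], cat))
    return context, [c for c in concepts if context in reach[c]]


if __name__ == "__main__":
    # Hypothetical toy category graph in the spirit of Wikipedia category links.
    parents = {
        "hopper dredger": ["Dredgers"],
        "cutter suction dredger": ["Dredgers"],
        "Dredgers": ["Ships"],
        "bond": ["Financial instruments"],
    }
    labels = {"hopper dredger", "cutter suction dredger", "bond"}
    text = "The hopper dredger and the cutter suction dredger were financed by a bond."
    print(contextualize(text, labels, parents))
    # ('Dredgers', ['cutter suction dredger', 'hopper dredger'])
```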
Evaluation of automatic hypernym extraction from technical corpora in English and Dutch
In this research, we evaluate different approaches for the automatic extraction of hypernym relations from English and Dutch technical text. The detected hypernym relations should enable us to semantically structure automatically obtained term lists from domain- and user-specific data. We investigated three different hypernymy extraction approaches for Dutch and English: a lexico-syntactic pattern-based approach, a distributional model and a morpho-syntactic method. To test the performance of the different approaches on domain-specific data, we collected and manually annotated English and Dutch data from two technical domains, viz. the dredging and financial domains. The experimental results show that the morpho-syntactic approach in particular obtains good results for automatic hypernym extraction from technical and domain-specific texts.
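Of the three approaches compared, the lexico-syntactic one is the simplest to illustrate. A minimal sketch in the style of Hearst patterns ("X such as A, B and C", "A and other X"), assuming plain regular expressions instead of a parser: the hypernym is taken to be the single word before the cue phrase, and a such-as list is assumed to run to the end of the clause. The patterns and names are illustrative and not taken from the paper.

```python
import re

# "X such as A, B and C": X is assumed to be the single word before the cue phrase,
# and the list is assumed to run until the next clause-ending punctuation mark.
SUCH_AS = re.compile(r"(\w+),? such as ([^.;:]+)", re.IGNORECASE)
# "A and other X" / "A or other X": A is a hyponym of X.
AND_OTHER = re.compile(r"(\w+),? (?:and|or) other (\w+)", re.IGNORECASE)
# Separators inside a such-as list: commas (optionally followed by and/or) or a bare and/or.
LIST_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def hearst_pairs(sentence: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by two simple lexico-syntactic patterns."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hypernym = m.group(1).lower()
        for item in LIST_SEP.split(m.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    for m in AND_OTHER.finditer(sentence):
        pairs.append((m.group(1).lower(), m.group(2).lower()))
    return pairs


if __name__ == "__main__":
    print(hearst_pairs("Banks offer products such as bonds, shares and derivatives."))
    # [('bonds', 'products'), ('shares', 'products'), ('derivatives', 'products')]
```

Without noun-phrase chunking or lemmatization, multiword hypernyms are truncated to their last word and plural forms are kept as-is; a parser-based pattern matcher would avoid both limitations.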