113 research outputs found
Hierarchical structuring of Cultural Heritage objects within large aggregations
Huge amounts of cultural content have been digitised and are available
through digital libraries and aggregators like Europeana.eu. However, it is not
easy for a user to have an overall picture of what is available nor to find
related objects. We propose a method for hierarchically structuring cultural
objects at different similarity levels. We describe a fast, scalable clustering
algorithm with an automated field selection method for finding semantic
clusters. We report a qualitative evaluation on the cluster categories based on
records from the UK and a quantitative one on the results from the complete
Europeana dataset.
Comment: The paper has been published in the proceedings of the TPDL
conference, see http://tpdl2013.info. For the final version see
http://link.springer.com/chapter/10.1007%2F978-3-642-40501-3_2
One Homonym per Translation
The study of homonymy is vital to resolving fundamental problems in lexical
semantics. In this paper, we propose four hypotheses that characterize the
unique behavior of homonyms in the context of translations, discourses,
collocations, and sense clusters. We present a new annotated homonym resource
that allows us to test our hypotheses on existing WSD resources. The results of
the experiments provide strong empirical evidence for the hypotheses. This
study represents a step towards a computational method for distinguishing
between homonymy and polysemy, and constructing a definitive inventory of
coarse-grained senses.
Comment: 8 pages, including reference
S2vNTM: Semi-supervised vMF Neural Topic Modeling
Language model based methods are powerful techniques for text classification.
However, these models have several shortcomings. (1) It is difficult to integrate
human knowledge such as keywords. (2) They need a lot of resources to train.
(3) They rely on large text data for pretraining. In this paper, we propose
Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these
difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM
leverages the pattern of keywords to identify potential topics, as well as
to optimize the quality of the topics' keyword sets. Across a variety of datasets,
S2vNTM outperforms existing semi-supervised topic modeling methods in
classification accuracy with limited keywords provided. S2vNTM is at least
twice as fast as baselines.
Comment: 17 pages, 9 figures, ICLR Workshop 2023. arXiv admin note: text
overlap with arXiv:2307.0122