One Homonym per Translation
The study of homonymy is vital to resolving fundamental problems in lexical
semantics. In this paper, we propose four hypotheses that characterize the
unique behavior of homonyms in the context of translations, discourses,
collocations, and sense clusters. We present a new annotated homonym resource
that allows us to test our hypotheses on existing WSD resources. The results of
the experiments provide strong empirical evidence for the hypotheses. This
study represents a step towards a computational method for distinguishing
between homonymy and polysemy, and constructing a definitive inventory of
coarse-grained senses. Comment: 8 pages, including references.
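The title names the first hypothesis. One plausible operational reading is that a translation of a word should not be shared by senses belonging to different homonyms. This reading is an assumption, since the abstract does not state the hypotheses formally. A minimal Python sketch of a check under that reading follows. The data layout, the function name `one_homonym_per_translation_violations`, and the toy "bank" data are all hypothetical and are not the annotated resource described above.

```python
from collections import defaultdict


def one_homonym_per_translation_violations(sense_to_homonym, sense_to_translations):
    """Return {translation: homonym IDs} for translations shared across homonyms.

    sense_to_homonym: maps each sense ID of one word to its (hypothetical) homonym ID.
    sense_to_translations: maps each sense ID to the set of its translations.
    An empty result means the data is consistent with the assumed hypothesis.
    """
    homonyms_by_translation = defaultdict(set)
    for sense, translations in sense_to_translations.items():
        for translation in translations:
            homonyms_by_translation[translation].add(sense_to_homonym[sense])
    # Keep only translations that link two or more distinct homonyms.
    return {t: hs for t, hs in homonyms_by_translation.items() if len(hs) > 1}


# Hypothetical toy data for English "bank" with Spanish translations.
senses = {"bank_river": "H1", "bank_finance": "H2", "bank_building": "H2"}
translations = {
    "bank_river": {"orilla"},
    "bank_finance": {"banco"},
    "bank_building": {"banco"},
}
print(one_homonym_per_translation_violations(senses, translations))  # {}
```

Here "banco" is shared only by senses of the same homonym (H2), so no violation is reported.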
A Fast Method for Parallel Document Identification
We present a fast method to identify homogeneous parallel documents. The method
is based on collecting counts of identical low-frequency words between possibly
parallel documents. The candidate with the most shared low-frequency words is
selected as the parallel document. The method achieved 99.96% accuracy when
tested on the EUROPARL corpus of parliamentary proceedings, failing only in
anomalous cases of truncated or otherwise distorted documents. While other work
has shown similar performance on this type of dataset, the approach presented
here is faster and does not require training. Apart from proposing an efficient
method for parallel document identification in a restricted domain, this paper
furnishes evidence that parliamentary proceedings may be inappropriate for
testing parallel document identification systems in general.
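The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The regex tokenizer, the use of target-side document frequency as the notion of "low-frequency", the `max_df` threshold, and the function names are all hypothetical. In the toy example, the rare tokens shared across languages are a year and a surname.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase Unicode word tokens (digits included).
    return re.findall(r"\w+", text.lower())


def shared_low_freq_counts(source, candidates, max_df=1):
    """Count word types shared by `source` and each candidate that are rare
    on the candidate side (document frequency <= max_df, an assumed threshold)."""
    cand_sets = [set(tokenize(c)) for c in candidates]
    # Document frequency of each word across the candidate collection.
    df = Counter(w for s in cand_sets for w in s)
    src = set(tokenize(source))
    return [sum(1 for w in src & s if df[w] <= max_df) for s in cand_sets]


def best_parallel(source, candidates, max_df=1):
    """Index of the candidate with the most shared low-frequency words."""
    scores = shared_low_freq_counts(source, candidates, max_df)
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical example: an English sentence against three French candidates.
source = "The 2004 budget of Kowalski was approved"
candidates = [
    "Le budget 2004 de Kowalski a été approuvé",
    "La séance est ouverte à 15 heures",
    "Le budget 2003 est rejeté",
]
print(shared_low_freq_counts(source, candidates))  # [2, 0, 0]
print(best_parallel(source, candidates))  # 0
```

"budget" is ignored because it occurs in two candidates, so only "2004" and "kowalski" count. Ties go to the earliest candidate, because `max` returns the first maximal index.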
One Sense Per Translation
The idea of using lexical translations to define sense inventories has a long
history in lexical semantics. We propose a theoretical framework which allows
us to answer the question of why this apparently reasonable idea failed to
produce useful results. We formally prove several propositions on how the
translations of a word relate to its senses, as well as on the relationship
between synonymy and polysemy. We empirically validate our theoretical findings
on BabelNet, and demonstrate how they could be used to perform unsupervised
word sense disambiguation of a substantial fraction of the lexicon.
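The following Python sketch illustrates how translations can pick out senses, and why they sometimes cannot. It assigns a sense only when the observed translation belongs to exactly one sense's translation set. This is a simple illustrative rule, not the paper's formal framework. The sense inventory, the function names, and the Spanish translations for "plant" are hypothetical toy data, and no BabelNet API is used.

```python
def disambiguate_by_translation(sense_translations, translation):
    """Return the only sense whose translation set contains `translation`,
    or None if the translation is unseen or shared by several senses."""
    matches = [s for s, ts in sense_translations.items() if translation in ts]
    return matches[0] if len(matches) == 1 else None


def unambiguous_fraction(sense_translations):
    """Share of distinct translations that pick out exactly one sense."""
    all_translations = set().union(*sense_translations.values())
    if not all_translations:
        return 0.0
    unique = [t for t in all_translations
              if disambiguate_by_translation(sense_translations, t) is not None]
    return len(unique) / len(all_translations)


# Hypothetical toy inventory for English "plant" with Spanish translations.
plant = {
    "plant_factory": {"fábrica", "planta"},
    "plant_flora": {"planta", "vegetal"},
}
print(disambiguate_by_translation(plant, "fábrica"))  # plant_factory
print(disambiguate_by_translation(plant, "planta"))   # None
print(unambiguous_fraction(plant))  # 0.6666666666666666
```

Here "planta" covers both senses and leaves the token unresolved, while "fábrica" and "vegetal" each identify a single sense.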
Mathematics Tests and the Language Proficiency and Subject Knowledge of Students of the Studium Języka Polskiego dla Cudzoziemców (School of Polish Language for Foreigners)
The task entitled "Digitization and provision of access in the Digital Repository of the University of Łódź to the collection of scholarly journals published by the University of Łódź", no. 885/P-DUN/2014, was co-financed by the MNiSW (Ministry of Science and Higher Education) as part of its activities to disseminate science.
The Entrance Test as One of the Factors Intensifying Mathematics Teaching at the Studium Języka Polskiego dla Cudzoziemców
The aim of this paper is to discuss selected issues related to the entrance
tests in mathematics administered at the Studium Języka Polskiego dla
Cudzoziemców in the polytechnic (engineering) and economics groups. These tests
play a different role from entrance examinations at institutions of higher
education: their results do not decide whether a candidate is admitted to the
Studium. They are intended solely to provide information about the students'
mathematical knowledge.
The Application of Chordal Graphs to Inferring Phylogenetic Trees of Languages
Phylogenetic methods are used to build evolutionary trees of languages given
character data that may include lexical, phonological, and morphological
information. Such data rarely admits a perfect phylogeny. We explore the use of
the more permissive conservative Dollo phylogeny as an alternative or
complementary approach. We propose a heuristic search algorithm based on the
notion of chordal graphs. We test this approach by generating phylogenetic trees
from three datasets and comparing them to those produced by other researchers.
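The abstract does not detail the heuristic, but the underlying notion is standard: a graph is chordal if every cycle of four or more vertices has a chord. As background only, here is a minimal Python sketch of the classic chordality test. It runs maximum cardinality search, whose reverse visiting order is a perfect elimination ordering exactly when the graph is chordal, and then checks that ordering. It is a generic building block, not the authors' search algorithm or a Dollo phylogeny method. The adjacency-dictionary representation and the function names are assumptions.

```python
def max_cardinality_search(adj):
    """Return vertices in maximum cardinality search (MCS) visiting order.

    adj: undirected graph as {vertex: set of neighbours} (assumed layout).
    """
    weight = {v: 0 for v in adj}
    visited, order = set(), []
    for _ in range(len(adj)):
        # Pick an unvisited vertex with the most already-visited neighbours.
        v = max((u for u in adj if u not in visited), key=weight.__getitem__)
        order.append(v)
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                weight[w] += 1
    return order


def is_chordal(adj):
    """True if the undirected graph `adj` is chordal."""
    # For a chordal graph, the reverse MCS order is a perfect elimination ordering.
    peo = max_cardinality_search(adj)[::-1]
    pos = {v: i for i, v in enumerate(peo)}
    for v in peo:
        later = [w for w in adj[v] if pos[w] > pos[v]]
        if not later:
            continue
        # The earliest later neighbour must be adjacent to all the others.
        u = min(later, key=pos.__getitem__)
        if any(w != u and w not in adj[u] for w in later):
            return False
    return True


square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(is_chordal(square))  # False: the 4-cycle a-b-c-d has no chord
square_with_chord = {k: set(v) for k, v in square.items()}
square_with_chord["a"].add("c")
square_with_chord["c"].add("a")
print(is_chordal(square_with_chord))  # True
```

Both steps take quadratic time in this simple form, which is adequate for illustration. Linear-time versions exist.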