VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling
We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments.
We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org
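The core idea of grouping verbal synsets into frames with a shared, cross-frame role inventory can be sketched in a few lines. This is a hypothetical illustration only: the frame names, synset identifiers, and role labels below are invented stand-ins, not the actual VerbAtlas inventory.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a VerbAtlas-style frame: each frame groups WordNet
# verb synsets and defines a prototypical argument structure whose roles are
# shared across frames (unlike PropBank's per-predicate Arg0/Arg1), with
# selectional preferences expressed as WordNet synsets.

@dataclass
class Frame:
    name: str
    synsets: set = field(default_factory=set)   # WordNet verb synset ids
    roles: dict = field(default_factory=dict)   # role -> selectional preference

EAT = Frame(
    name="EAT_BITE",
    synsets={"eat.v.01", "devour.v.02"},
    roles={"Agent": "animate_being.n.01", "Patient": "food.n.01"},
)
DRINK = Frame(
    name="DRINK",
    synsets={"drink.v.01", "sip.v.01"},
    roles={"Agent": "animate_being.n.01", "Patient": "liquid.n.01"},
)

def frame_of(synset_id, frames):
    """Look up the frame a verbal synset belongs to."""
    for frame in frames:
        if synset_id in frame.synsets:
            return frame
    return None

frame = frame_of("sip.v.01", [EAT, DRINK])
print(frame.name, frame.roles["Agent"])  # prints: DRINK animate_being.n.01
```

Note how `Agent` means the same thing in both frames; this is the cross-frame property that distinguishes the design from enumerative, per-predicate role sets.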
Terms interrelationship query expansion to improve accuracy of Quran search
Quran retrieval systems have become an important instrument for users searching for information, and search engines are widely used to retrieve verses relevant to a query. A major challenge for Quran search engines, however, is word ambiguity, particularly lexical ambiguity. Despite the advent of query expansion techniques, Quran retrieval systems still struggle to return the information users need: current semantic techniques achieve low precision because they do not draw on multiple semantic dictionaries. This study therefore proposes a stemmed terms interrelationship query expansion approach to improve Quran search results. More specifically, related terms are collected from different semantic dictionaries and then reduced to their roots using a stemming algorithm. To assess the performance of stemmed terms interrelationship query expansion, experiments were conducted on eight Quran datasets from the Tanzil website. Overall, the results indicate that stemmed terms interrelationship query expansion is superior to unstemmed terms interrelationship query expansion in Mean Average Precision: Yusuf Ali 68%, Sarawar 67%, Arberry 72%, Malay 65%, Hausa 62%, Urdu 62%, Modern Arabic 60%, and Classical Arabic 59%.
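The two-step pipeline the abstract describes, collecting related terms from several dictionaries and then stemming them, can be sketched as follows. The synonym dictionaries and the suffix-stripping stemmer below are toy stand-ins for illustration, not the resources or the stemming algorithm used in the study.

```python
# Illustrative sketch of stemmed terms interrelationship query expansion:
# gather related terms from multiple semantic dictionaries, then reduce
# every term to its root before building the expanded query.

SYNONYM_DICTS = [  # toy "semantic dictionaries"
    {"mercy": ["compassion", "forgiveness"]},
    {"mercy": ["kindness"], "light": ["illumination"]},
]

def stem(word):
    """Naive suffix-stripping stemmer (placeholder for a real algorithm)."""
    for suffix in ("ness", "ing", "ion", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def expand_query(terms):
    expanded = set()
    for term in terms:
        related = {term}
        for dictionary in SYNONYM_DICTS:       # collect interrelated terms
            related.update(dictionary.get(term, []))
        expanded.update(stem(w) for w in related)  # reduce to roots
    return sorted(expanded)

print(expand_query(["mercy"]))  # ['compass', 'forgive', 'kind', 'mercy']
```

Stemming after expansion lets morphological variants of the related terms match the same index entries, which is the intuition behind the reported gain over the unstemmed variant.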
Extracting Synonyms from Bilingual Dictionaries
We present our progress in developing a novel algorithm to extract synonyms
from bilingual dictionaries. Identification and usage of synonyms play a
significant role in improving the performance of information access
applications. The idea is to construct a translation graph from translation
pairs, then to extract and consolidate cyclic paths to form bilingual sets of
synonyms. The initial evaluation of this algorithm illustrates promising
results in extracting Arabic-English bilingual synonyms. In the evaluation, we
first converted the synsets in the Arabic WordNet into translation pairs (i.e.,
losing word-sense memberships). Next, we applied our algorithm to rebuild these
synsets. We compared the original and extracted synsets obtaining an F-Measure
of 82.3% and 82.1% for Arabic and English synsets extraction, respectively.Comment: In Proceedings - 11th International Global Wordnet Conference
(GWC2021). Global Wordnet Association (2021
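The translation-graph idea can be sketched compactly: build a graph from translation pairs and treat the words on a cyclic path as a candidate bilingual synonym set. The toy dictionary and the simple DFS below are illustrative only; the paper's consolidation algorithm is more involved.

```python
from collections import defaultdict

# Sketch: words connected by translation pairs form a graph; a cycle
# through both languages suggests the words on it share a sense.

PAIRS = [  # (Arabic transliteration, English) translation pairs, toy data
    ("kitab", "book"), ("kitab", "volume"),
    ("mujallad", "volume"), ("mujallad", "book"),
]

graph = defaultdict(set)
for src, tgt in PAIRS:          # undirected translation graph
    graph[src].add(tgt)
    graph[tgt].add(src)

def cycles_through(start, max_len=4):
    """Return cyclic paths (as frozensets of words) passing through `start`."""
    found = set()
    def dfs(node, path):
        for nxt in graph[node]:
            if nxt == start and len(path) >= 3:   # ignore trivial 2-cycles
                found.add(frozenset(path))
            elif nxt not in path and len(path) < max_len:
                dfs(nxt, path + [nxt])
    dfs(start, [start])
    return found

# kitab -> book -> mujallad -> volume -> kitab closes a 4-cycle, suggesting
# {kitab, mujallad} and {book, volume} are synonym sets in their languages.
for cycle in cycles_through("kitab"):
    print(sorted(cycle))
```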
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be greatly improved with new technologies. In this survey, we analyse five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.
Speakers Fill Lexical Semantic Gaps with Context
Lexical ambiguity is widespread in language, allowing for the reuse of
economical word forms and therefore making language more efficient. If
ambiguous words cannot be disambiguated from context, however, this gain in
efficiency might make language less clear -- resulting in frequent
miscommunication. For a language to be clear and efficiently encoded, we posit
that the lexical ambiguity of a word type should correlate with how much
information context provides about it, on average. To investigate whether this
is the case, we operationalise the lexical ambiguity of a word as the entropy
of meanings it can take, and provide two ways to estimate this -- one which
requires human annotation (using WordNet), and one which does not (using BERT),
making it readily applicable to a large number of languages. We validate these
measures by showing that, on six high-resource languages, there are significant
Pearson correlations between our BERT-based estimate of ambiguity and the
number of synonyms a word has in WordNet (e.g. in English). We
then test our main hypothesis -- that a word's lexical ambiguity should
negatively correlate with its contextual uncertainty -- and find significant
correlations on all 18 typologically diverse languages we analyse. This
suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
Comment: Camera-ready version of EMNLP 2020 publication. Code is available at https://github.com/tpimentelms/lexical-ambiguity-in-contex
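The operationalisation of lexical ambiguity as the entropy of a word's meaning distribution is a one-line formula. The sense probabilities below are invented for illustration; the paper estimates them from WordNet annotations or from BERT.

```python
import math

# Minimal sketch: lexical ambiguity as the Shannon entropy of the
# distribution over a word's senses, H = -sum(p * log2(p)).

def ambiguity(sense_probs):
    """Entropy (in bits) of a word's sense distribution."""
    return -sum(p * math.log2(p) for p in sense_probs if p > 0)

# A word with two equally likely senses carries one full bit of ambiguity:
print(ambiguity([0.5, 0.5]))    # 1.0
# A word dominated by one sense carries little:
print(ambiguity([0.95, 0.05]))  # ~0.29
```

Under the paper's hypothesis, words with higher entropy here should, on average, occur in contexts that supply more information to resolve them.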
Contribution to improving information retrieval through semantic methods: application to the Arabic language
An information retrieval system is a set of programs and modules that interfaces with the user to take and interpret a query, search the index, and return a ranking of the selected documents to that user. Its greatest challenge, however, is coping with the large volume of multimodal and multilingual information available in document collections and on the web in order to find the items that best match users' needs. In this work, we present two contributions. In the first, we propose a new approach to query reformulation in the context of Arabic information retrieval. The idea is to represent the query as a weighted semantic tree, whose nodes are concepts (synsets) linked by semantic relations, in order to better identify the user's information need. The tree is built using Pseudo-Relevance Feedback combined with the Arabic WordNet semantic resource. Experimental results show a clear improvement in the performance of the information retrieval system. In the second contribution, we propose a new approach to building an Arabic information retrieval test collection. The approach combines a pooling strategy based on search engines with the Naïve Bayes machine-learning classification algorithm. For the experiments, we created a new test collection consisting of a document base of 632 documents and 165 queries with their relevance judgments across several topics. The experiments also showed the effectiveness of the Bayesian classifier for retrieving relevant documents; moreover, it performed even better after semantic enrichment of the document base with the word2vec model.
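The Pseudo-Relevance Feedback step underlying the first contribution can be sketched simply: assume the top-ranked documents are relevant and add their most frequent terms to the query. The documents, stopword list, and parameters below are toy data; the thesis combines this feedback with Arabic WordNet to build the weighted semantic tree.

```python
from collections import Counter

# Illustrative Pseudo-Relevance Feedback (PRF): expand the query with the
# most frequent non-stopword terms from the top-k retrieved documents.

STOPWORDS = {"the", "of", "a", "and", "in"}

def prf_expand(query_terms, top_k_docs, n_terms=2):
    counts = Counter()
    for doc in top_k_docs:
        for token in doc.lower().split():
            if token not in STOPWORDS and token not in query_terms:
                counts[token] += 1
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion

docs = [  # pretend these are the top-2 results for the initial query
    "semantic retrieval of arabic documents",
    "semantic indexing improves retrieval",
]
print(prf_expand(["arabic"], docs))  # ['arabic', 'semantic', 'retrieval']
```

In the thesis the expansion terms would additionally be mapped to Arabic WordNet synsets and weighted as nodes of the semantic query tree, rather than appended as a flat list.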