    Terms interrelationship query expansion to improve accuracy of Quran search

    Quran retrieval systems are becoming an instrument for users to search for the information they need. Search engines are among the most popular tools that have been successfully implemented for retrieving verses relevant to a query. However, a major challenge for Quran search engines is word ambiguity, specifically lexical ambiguity. Even with the advent of query expansion techniques, Quran retrieval systems still have problems retrieving the information users need. Current semantic techniques still lack precision because they do not draw on several semantic dictionaries. Therefore, this study proposes a stemmed terms interrelationship query expansion approach to improve Quran search results. More specifically, related terms were collected from different semantic dictionaries and then reduced to their roots using a stemming algorithm. To assess the performance of the stemmed terms interrelationship query expansion, experiments were conducted using eight Quran datasets from the Tanzil website. Overall, the results indicate that the stemmed terms interrelationship query expansion is superior to the unstemmed terms interrelationship query expansion in Mean Average Precision, with Yusuf Ali at 68%, Sarawar at 67%, Arberry at 72%, Malay at 65%, Hausa at 62%, Urdu at 62%, Modern Arabic at 60% and Classical Arabic at 59%.
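
    As a rough illustration of the two ideas in this abstract, the sketch below expands a query with related terms from several dictionaries, stems them, and scores ranked results with Mean Average Precision. The dictionaries, the `toy_stem` suffix stripper, and all function names are hypothetical stand-ins chosen for this example; they are not the dictionaries or the stemming algorithm used in the study.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy dictionaries (assumption: the study's real dictionaries differ).
DICTIONARIES: List[Dict[str, List[str]]] = [
    {"mercy": ["compassion", "forgiveness"]},
    {"mercy": ["forgiving", "kindness", "compassion"]},
]


def toy_stem(word: str) -> str:
    """Strip a few English suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("eness", "ing", "ion", "ness"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(term: str) -> List[str]:
    """Return the stemmed query term followed by stemmed related terms, deduplicated."""
    expanded = [toy_stem(term)]
    for dictionary in DICTIONARIES:
        for related in dictionary.get(term, []):
            stem = toy_stem(related)
            if stem not in expanded:
                expanded.append(stem)
    return expanded


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    print(expand_query("mercy"))
    runs = [(["v1", "v2", "v3", "v4"], {"v1", "v3"}), (["v5", "v6"], {"v6"})]
    print(round(mean_average_precision(runs), 4))
```

    In this toy setup, stemming collapses "forgiveness" and "forgiving" into one expansion term, which is the kind of merging a stemmed expansion relies on; whether that helps on real Quran translations is what the reported experiments measure.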

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research

    Natural Language Processing and Language Technologies for the Basque Language

    The presence of a language in the digital domain is crucial for its survival, as online communication and digital language resources have become the standard in recent decades and will gain more importance in the coming years. In order to develop the advanced systems that are considered the basis for efficient digital communication (e.g. machine translation systems, text-to-speech and speech-to-text converters and digital assistants), it is necessary to digitalise linguistic resources and create tools. In the case of Basque, scholars have spent the last forty years studying the creation of digital linguistic resources and of the tools that allow the development of those systems. In this paper, we present an overview of the natural language processing and language technology resources developed for Basque, their impact on the process of making Basque a “digital language”, and the applications and challenges in multilingual communication. More precisely, we present the well-known products for Basque, the basic tools, and the resources that are behind the products we use every day. Likewise, we hope this survey will serve as a guide for other minority languages that are making their way to digitalisation. Received: 05 April 2022 Accepted: 20 May 202

    Word similarity: contributions of knowledge-base-based techniques (Hitzen arteko antzekotasuna: ezagutza-baseetan oinarritutako tekniken ekarpenak)

    146 pp. Semantic word representations created with computational models are key to many language processing tasks, and word similarity is used to evaluate the quality of those representations. The similarity task belongs to the field of language processing, within lexical semantics, and has the following steps: first, the similarity between words is computed from the word representations; then, that similarity is compared with human similarity judgments. The closer the results of the computational model are to human judgments, the better the quality of the word representations. In this work we have also dealt with the more general case of similarity, namely relatedness. For word representation there are methods based on text corpora and methods based on knowledge bases. The first family contains several models, but in this work we have used those based on neural networks. These methods infer word meanings from word-context co-occurrences in text and encode them in a dense vector space. From the second family, we have used methods that treat knowledge bases as graphs, exploiting their structural information in full. On the one hand, dense representations extracted from text corpora have been very successful in many tasks, but similarity and relatedness relations are conflated in the word representations. On the other hand, computing representations from knowledge bases is computationally expensive, but similarity and relatedness relations are explicit in knowledge bases. The aim of this thesis is to improve results on the similarity task, which we do by means of techniques for obtaining better semantic word representations. Our main hypothesis is that the information in text corpora and the information in knowledge bases are different and complementary. In our view, combining these two sources improves the similarity results obtained from word representations and, consequently, yields better representations. We have also extended this hypothesis to cross-lingual relations, exploring cross-lingual similarity and relatedness as well. Indeed, the two resources capture different nuances of similarity and relatedness and, when combined, model similarity and relatedness better. Through this thesis we have confirmed the hypotheses above, and our contributions are the following three: (1) encoding the structural information of knowledge bases in a corpus by means of a random-walk method, and computing semantic word representations from that corpus; (2) proposing several methods and hybrid representations for combining the semantic information of text and of knowledge bases; (3) applying all of the above to cross-lingual relations. We have evaluated all of the methods and combinations mentioned on the similarity task, comparing their results with equivalent state-of-the-art methods. Our proposals match or surpass the state of the art on the similarity task, and we conclude that our hypotheses hold.
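
    The two steps of the similarity task described in this abstract can be sketched as follows: compute a similarity score for each word pair from the word vectors, then compare those scores with human judgments. The sketch assumes cosine similarity and Spearman rank correlation, which are common choices for this task but are not stated in the abstract; the vectors, gold scores, and function names are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined when one side is constant
    return cov / math.sqrt(var_x * var_y)


def evaluate(vectors: Dict[str, Sequence[float]],
             gold: Sequence[Tuple[str, str, float]]) -> float:
    """Correlate model similarities with human scores over the gold word pairs."""
    model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in gold]
    human_scores = [score for _, _, score in gold]
    return spearman(model_scores, human_scores)


if __name__ == "__main__":
    # Hypothetical 2-dimensional vectors and human ratings, for illustration only.
    vectors = {"car": [1.0, 0.0], "automobile": [0.9, 0.1],
               "road": [0.5, 0.5], "banana": [0.0, 1.0]}
    gold = [("car", "automobile", 9.5), ("car", "road", 6.0), ("car", "banana", 1.0)]
    print(evaluate(vectors, gold))
```

    A higher correlation means the model orders the word pairs more like human annotators do, which is the sense in which the abstract speaks of better-quality representations.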