Word Sense Disambiguation on English Translation of Holy Quran
This article proposes a system for interpreting the English translation of the Quranic text using word sense disambiguation. The system combines three traditional semantic similarity measures, Wu-Palmer (WUP), Lin (LIN), and Jiang-Conrath (JCN), for word sense disambiguation on the English Al-Quran. Experiments were performed to obtain the best overall similarity score. The empirical results demonstrate that combining the three semantic similarity measures yields competitive results compared with using the individual measures.
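To make the three measures concrete, the sketch below computes Wu-Palmer (depth of the lowest common subsumer relative to the two concept depths), Lin (information content of the subsumer relative to the two concepts) and Jiang-Conrath (a distance built from the same information-content terms) on a tiny taxonomy, then picks the sense with the highest combined score against the context. It is a minimal sketch under stated assumptions: the taxonomy, the concept probabilities, the unweighted mean used to combine the scores, and the mapping of the JCN distance to 1/(1 + distance) are all hypothetical choices, since the abstract does not say how the three measures are combined.

```python
import math

# Hypothetical toy taxonomy (child -> parent). A real system would use WordNet.
PARENT = {
    "entity": None,
    "living_thing": "entity",
    "animal": "living_thing",
    "plant": "living_thing",
    "camel": "animal",
    "horse": "animal",
    "tree": "plant",
}

# Hypothetical concept probabilities p(c); each includes the mass of its descendants.
PROB = {
    "entity": 1.0, "living_thing": 0.5, "animal": 0.3, "plant": 0.2,
    "camel": 0.05, "horse": 0.1, "tree": 0.15,
}


def ancestors(c):
    """Return the path from concept c up to the root, c included."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path


def depth(c):
    return len(ancestors(c))  # the root has depth 1


def ic(c):
    return -math.log(PROB[c])  # information content, IC(c) = -log p(c)


def lcs(c1, c2):
    """Lowest common subsumer: the deepest ancestor shared by c1 and c2."""
    shared = set(ancestors(c2))
    for a in ancestors(c1):
        if a in shared:
            return a
    raise ValueError("concepts share no ancestor")


def wup(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def lin(c1, c2):
    denom = ic(c1) + ic(c2)
    return 2 * ic(lcs(c1, c2)) / denom if denom > 0 else 1.0


def jcn(c1, c2):
    dist = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    return 1.0 / (1.0 + dist)  # assumption: distance mapped into (0, 1]


def combined(c1, c2):
    # Assumption: an unweighted mean of the three scores.
    return (wup(c1, c2) + lin(c1, c2) + jcn(c1, c2)) / 3


def disambiguate(candidate_senses, context_concepts):
    """Pick the sense with the highest total combined similarity to the context."""
    return max(candidate_senses,
               key=lambda s: sum(combined(s, c) for c in context_concepts))


print(disambiguate(["camel", "tree"], ["horse"]))
```

With real data, NLTK's WordNet interface provides `wup_similarity`, `lin_similarity` and `jcn_similarity` on synsets (the latter two take an information-content dictionary), which could replace the hand-written measures while keeping the same combination step.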
Evaluation on knowledge extraction and machine learning in resolving Malay word ambiguity
The involvement of linguistic professionals in resolving the ambiguity of a word within a particular context produces a concise meaning for the words found in the lexical knowledge base collection. Motivated by this issue, we employed a lexical knowledge and machine learning approach that integrates data and/or information from the lexical knowledge base, that is, the Malay collections linked to the ambiguous words. We used the open-class words and removed the stop words from the target sentences. Experiments were conducted with and without lexical knowledge on 50 ambiguous words. The Word Sense Disambiguation (WSD) method is based on machine learning and corpus-based approaches, namely a Malay-Malay corpus and an English-Malay corpus. The results show that the proposed method improved precision in resolving ambiguity.
Keywords: ambiguity; lexical knowledge; machine learning; Malay wor
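The abstract does not name the learner, so the sketch below swaps in a multinomial Naive Bayes classifier, a common corpus-based WSD baseline, to show the pipeline it describes: strip stop words from each sentence, then learn which context words indicate which sense. Everything here is an assumption: the stop list, the sense labels, and the training sentences are hypothetical English placeholders for the Malay data, and open-class filtering (which would need a part-of-speech tagger) is approximated by stop-word removal alone.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop list; the paper's Malay stop list is not given.
STOP_WORDS = {"the", "a", "of", "in", "on", "is", "to", "and"}


def content_words(sentence):
    """Lower-case, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context words, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for sentence, sense in examples:
            self.sense_counts[sense] += 1
            for w in content_words(sentence):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, sentence):
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[sense].values()) + self.alpha * v
            for w in content_words(sentence):
                score += math.log((self.word_counts[sense][w] + self.alpha) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


# Hypothetical sense-labelled training sentences for an ambiguous word "bank".
TRAIN = [
    ("deposit money in the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("fishing on the river bank", "river"),
    ("the bank of the river flooded", "river"),
]

model = NaiveBayesWSD().fit(TRAIN)
print(model.predict("a loan from the bank"))
```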
A Unified multilingual semantic representation of concepts
Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot be used effectively for the representation of individual word senses. We put forward a novel multilingual concept representation, called MUFFIN, which not only enables accurate representation of word senses in different languages, but also provides multiple advantages over existing approaches. MUFFIN represents a given concept in a unified semantic space irrespective of the language of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach on two different evaluation benchmarks, semantic similarity and Word Sense Disambiguation, reporting state-of-the-art performance on several standard datasets.
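The practical benefit of a unified space is that concepts from different languages can be compared with a single vector operation. The sketch below assumes cosine similarity as that operation and uses hypothetical three-dimensional concept vectors; it does not reproduce how MUFFIN actually builds its vectors. It shows both uses named in the abstract: a cross-lingual similarity score and a simple WSD step that picks the sense vector closest to a context vector.

```python
import math

# Hypothetical concept vectors living in one shared, language-independent space.
CONCEPTS = {
    "en:bank/finance": [0.9, 0.1, 0.0],
    "en:bank/river": [0.0, 0.2, 0.9],
    "es:banco/finance": [0.85, 0.15, 0.05],
    "es:orilla/river": [0.05, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_sense(context_vec, sense_keys):
    """WSD step: choose the candidate sense whose vector best matches the context."""
    return max(sense_keys, key=lambda k: cosine(CONCEPTS[k], context_vec))


# Cross-lingual comparison: Spanish "banco" against both English senses of "bank".
for key in ("en:bank/finance", "en:bank/river"):
    print(key, round(cosine(CONCEPTS["es:banco/finance"], CONCEPTS[key]), 3))
```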
Meaning refinement to improve cross-lingual information retrieval
Magdeburg, University, Faculty of Computer Science, dissertation, 2012, by Farag Ahme
The Effectiveness of Concept Based Search for Video Retrieval
In this paper we investigate how a small number of high-level concepts derived for video shots, such as Sport, Face, Indoor, etc., can be used effectively for ad hoc search in video material. We answer the following questions: 1) Can we automatically construct concept queries from ordinary text queries? 2) What is the best way to combine evidence from single concept detectors into final search results? We evaluated algorithms for automatic concept query formulation using WordNet-based concept extraction, as well as algorithms for fast, on-line combination of concepts. Experimental results on data from the TREC Video 2005 workshop and 25 test users show that 1) automatic query formulation through WordNet-based concept extraction can achieve results comparable to user-created query concepts, and 2) combination methods that take neighboring shots into account outperform simpler combination methods.
Word Sense Disambiguation within a Multilingual Framework
Word Sense Disambiguation (WSD) is the process of resolving the meaning of a word unambiguously in a given natural language context. Within the scope of this thesis, it is the process of marking text with explicit sense labels.
What constitutes a sense is a subject of great debate. An appealing perspective aims to define senses in terms of their multilingual correspondences, an idea explored by several researchers, such as Dyvik (1998), Ide (1999), Resnik & Yarowsky (1999), and Chugur, Gonzalo & Verdejo (2002), but to date it has not been given any practical demonstration. This thesis is an empirical validation of these ideas of characterizing word meaning using cross-linguistic correspondences. The idea is that word meaning, or word sense, is quantifiable to the extent that it is uniquely translated in some language or set of languages.
Consequently, we address the problem of WSD from a multilingual perspective, expanding the notion of context to encompass multilingual evidence. We devise a new approach to resolving word sense ambiguity in natural language, using a source of information that had never before been exploited on a large scale for WSD.
The core of the work builds on exploiting word correspondences across languages for sense distinction. In essence, it is a practical and functional implementation of a basic idea shared by research on defining word meanings in cross-linguistic terms.
We devise an algorithm, SALAAM (Sense Assignment Leveraging Alignment And Multilinguality), that empirically investigates the feasibility and validity of using translations for WSD. SALAAM is an unsupervised approach for sense tagging large amounts of text, given a parallel corpus (texts in translation) and a sense inventory for one of the languages in the corpus. Using SALAAM, we obtain large amounts of sense-annotated data in both languages of the parallel corpus simultaneously. The quality of the tagging is rigorously evaluated for both languages of the corpus.
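The sketch below is a simplified illustration of the idea described above, not the SALAAM algorithm itself: source-language words that align to the same target-language word form a group, each word in a group receives the sense that is most similar to the senses of the other group members, and the chosen sense is then projected onto the aligned target word. The sense inventory, the alignments, and the feature-set Jaccard similarity (a stand-in for a WordNet-based similarity measure) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical English sense inventory: word -> {sense id: semantic features}.
SENSES = {
    "bank": {"bank%finance": frozenset({"institution", "money"}),
             "bank%river": frozenset({"land", "water"})},
    "shore": {"shore%coast": frozenset({"land", "water", "sea"})},
}

# Hypothetical word alignments (English source word, French target word).
PAIRS = [("bank", "rive"), ("shore", "rive"), ("bank", "banque")]


def similarity(f1, f2):
    """Stand-in sense similarity: Jaccard overlap of feature sets."""
    return len(f1 & f2) / len(f1 | f2)


def support(word, sense_id, others):
    """How well a sense of `word` fits the best senses of the other group members."""
    feats = SENSES[word][sense_id]
    return sum(max(similarity(feats, f) for f in SENSES[o].values()) for o in others)


def tag_groups(aligned_pairs):
    """Return {(source word, target word): chosen sense}, shared by both sides."""
    groups = defaultdict(set)
    for src, tgt in aligned_pairs:
        if src in SENSES:
            groups[tgt].add(src)
    tags = {}
    for tgt, words in groups.items():
        if len(words) < 2:
            continue  # a lone source word offers no cross-word evidence
        for w in words:
            others = words - {w}
            tags[(w, tgt)] = max(SENSES[w], key=lambda s: support(w, s, others))
    return tags


print(tag_groups(PAIRS))
```

Because the target word shares the tag chosen for its aligned source words, one pass yields sense-annotated tokens on both sides of the parallel corpus.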
The data tagged automatically and without supervision by SALAAM is further used to bootstrap a supervised WSD system, combining supervised and unsupervised approaches to alleviate the resource-acquisition bottleneck of supervised methods. In effect, SALAAM is extended into an unsupervised approach for WSD within a learning framework; for many of the words disambiguated, SALAAM coupled with the machine learning system rivals the performance of a canonical supervised WSD system trained on human-tagged data.
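The bootstrapping step means that a supervised learner is trained on automatically produced labels instead of human-tagged ones. The sketch below shows that flow with a decision list (context word rules ranked by a smoothed log-likelihood ratio, falling back to the most frequent sense), which is a swapped-in learner chosen for brevity; the abstract does not say which learner the thesis uses. The auto-tagged examples are hypothetical stand-ins for SALAAM output, and the smoothing constant `alpha` is an assumption.

```python
import math
from collections import Counter, defaultdict

# Hypothetical automatically tagged examples: (context words, sense).
AUTO_TAGGED = [
    (["money", "loan"], "finance"),
    (["river", "water"], "river"),
    (["money", "deposit"], "finance"),
    (["water", "fishing"], "river"),
]


def train_decision_list(examples, alpha=0.1):
    """Rank (word -> sense) rules by a smoothed log-likelihood ratio."""
    feat_sense = defaultdict(Counter)
    sense_freq = Counter()
    for words, sense in examples:
        sense_freq[sense] += 1
        for w in set(words):
            feat_sense[w][sense] += 1
    rules = []
    for w, counts in feat_sense.items():
        total = sum(counts.values())
        for sense, c in counts.items():
            score = math.log((c + alpha) / (total - c + alpha))
            rules.append((score, w, sense))
    rules.sort(key=lambda r: (-r[0], r[1], r[2]))  # strongest evidence first
    default = sense_freq.most_common(1)[0][0]  # fallback: most frequent sense
    return rules, default


def classify(rules, default, words):
    """Apply the first (strongest) rule whose word occurs in the context."""
    present = set(words)
    for _score, w, sense in rules:
        if w in present:
            return sense
    return default


rules, default = train_decision_list(AUTO_TAGGED)
print(classify(rules, default, ["loan", "interest"]))
```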
Given the fundamental role of similarity in SALAAM, we investigate different dimensions of semantic similarity as it applies to verbs, since verbs are relatively more complex than nouns, which were the focus of the previous evaluations. We design a human judgment experiment to obtain human ratings of the semantic similarity of verbs. These ratings serve as a reference point for comparing different automated similarity measures that rely crucially on various sources of information. Finally, a cognitively salient model that integrates human judgments into SALAAM is proposed as a means of improving its performance on sense disambiguation for verbs in particular and for other word types in general.
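A common way to use human ratings as a reference point is to rank-correlate them with each automated measure's scores over the same word pairs. The sketch below assumes Spearman's rank correlation for that comparison, computed as the Pearson correlation of average ranks so that tied ratings are handled; the abstract does not state which statistic the thesis uses, and the verb-pair ratings and measure scores in the example are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human ratings and one automated measure's scores for five verb pairs.
human = [4.5, 3.0, 1.0, 3.0, 2.0]
measure = [0.90, 0.55, 0.20, 0.60, 0.30]
print(round(spearman(human, measure), 3))
```

Computing this coefficient for each automated measure against the same human ratings gives a single number per measure, so the measures can be ordered by how closely they track human judgments.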