9 research outputs found

    Organizing Contextual Knowledge for Arabic Text Disambiguation and Terminology Extraction.

    Ontologies play an important role in knowledge organization and information retrieval. Domain ontologies are composed of concepts represented by domain-relevant terms. Existing approaches to ontology construction use statistical and linguistic information to extract domain-relevant terms. The quality and quantity of this information influence the accuracy of terminology extraction approaches and of other steps in knowledge extraction and information retrieval. This paper proposes an approach for handling domain-relevant terms from Arabic non-diacriticised semi-structured corpora. On input, the structure of documents is exploited to organize knowledge in a contextual graph, which is then used to extract relevant terms. This network contains simple and compound nouns handled by a morphosyntactic shallow parser. The noun phrases are evaluated in terms of termhood and unithood by means of possibilistic measures. We apply a qualitative approach, which weighs terms according to their positions in the structure of the document. On output, the extracted knowledge is organized as a network modeling dependencies between terms, which can be exploited to infer semantic relations. We test our approach on three domain-specific corpora. The goal of this evaluation is to check whether our model for organizing and exploiting contextual knowledge improves the accuracy of extraction of simple and compound nouns. We also investigate the role of compound nouns in improving information retrieval results.

    Arabic Query Expansion Using WordNet and Association Rules

    Query expansion is the process of adding relevant terms to the original queries to improve the performance of information retrieval systems. However, previous studies showed that automatic query expansion using WordNet does not lead to an improvement in performance. One of the main challenges of query expansion is the selection of appropriate terms. In this paper, we review this problem using Arabic WordNet and association rules within the context of the Arabic language. The results confirm that, with an appropriate selection method, we are able to exploit Arabic WordNet to improve retrieval performance. Our empirical results on a sub-corpus of the Xinhua collection show that our automatic selection method achieves a significant improvement in MAP and recall, and better precision among the top retrieved documents.

    Semantic and Contextual Knowledge Representation for Lexical Disambiguation: Case of Arabic-French Query Translation

    In this paper, we present an automatic query translation system for cross-language information retrieval (Arabic-French). For lexical disambiguation, our system combines two resources: a bilingual dictionary and a parallel corpus. To select the best translation, our method relies on a correspondence measure between two semantic networks. The first represents the senses of the ambiguous terms of the query. The second is a contextually enriched semantic network representing the collection of sentences responding to the query. This collection forms the knowledge base of our disambiguation method and is obtained by alignment with the relevant sentences in Arabic. The evaluation of the proposed system shows the advantage of contextual enrichment for the quality of the translation. We obtained a high precision, relatively proportional to the precision provided by the alignment used. Finally, our translation demonstrates its potential when its BLEU score is compared with that of Google Translate.

    Ontology model for zakat hadith knowledge based on causal relationship, semantic relatedness and suggestion extraction

    Hadith is the second most important source used by all Muslims. However, semantic ambiguity in the hadith raises issues such as misinterpretation, misunderstanding, and misjudgement of the hadith’s content. How to tackle this semantic ambiguity is the research question (RQ) of this study. The Zakat hadith data should be expressed semantically by moving from surface-level semantics to a deeper sense of the intended meaning. This can be achieved using an ontology model covering three main aspects (i.e., semantic relationship extraction, causal relationship representation, and suggestion extraction). This study aims to resolve the semantic ambiguity in hadith, particularly in the Zakat topic, by proposing a semantic approach to resolve semantic ambiguity, representing causal relationships in the Zakat ontology model, proposing methods to extract suggestion polarity in hadith, and building the ontology model for the Zakat topic. The Zakat topic was selected based on survey findings that respondents still lack knowledge and understanding of the Zakat process. Four hadith books (i.e., Sahih Bukhari, Sahih Muslim, Sunan Abu Dawud, and Sunan Ibn Majah), covering 334 concept words and 247 hadiths, were analysed. The Zakat ontology modelling covers three phases: preliminary study, source selection and data collection, data pre-processing and analysis, and development and evaluation of ontology models. Domain experts in language, Zakat hadith, and ontology evaluated the Zakat ontology and found that 85% of the Zakat concepts were defined correctly. The Ontology Usability Scale was used to evaluate the final ontology model. An expert in ontology development evaluated the ontology developed in Protégé OWL, while 80 respondents evaluated the ontology concepts developed in PHP systems. The evaluation results show that the Zakat ontology has resolved the issue of ambiguity and misunderstanding of the Zakat process in the Zakat hadith. The Zakat ontology model also allows practitioners in natural language processing (NLP), hadith, and ontology to extract Zakat hadith based on the representation of a reusable formal model, as well as causal relationships and the suggestion polarity of the Zakat hadith.

    Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering

    This paper presents a comprehensive survey of meta-heuristic optimization algorithms for text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods due to their ability to solve machine learning problems, especially text clustering problems. This paper reviews the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. The main procedures of text clustering and critical discussions are also given. This review reports the advantages and disadvantages of these methods and recommends potential future research paths. The main keywords considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.

    The development of a fuzzy semantic sentence similarity measure

    A problem in the field of semantic sentence similarity is the inability of sentence similarity measures to accurately represent the effect that perception-based (fuzzy) words, which are commonly used in natural language, have on sentence similarity. This research project developed a new sentence similarity measure to solve this problem. The new measure, Fuzzy Algorithm for Similarity Testing (FAST), is a novel ontology-based similarity measure that uses concepts of fuzzy sets and computing with words to allow for the accurate representation of fuzzy words. Through human experimentation, fuzzy sets were created for six categories of words based on their levels of association with particular concepts. These fuzzy sets were then defuzzified, and the results were used to create new ontological relations between the fuzzy words contained within them; from these relations a new fuzzy ontology was created. Using these relationships allows for the creation of a new ontology-based fuzzy semantic text similarity algorithm that is able to show the effect of fuzzy words on computing sentence similarity, as well as the effect that fuzzy words have on non-fuzzy words within a sentence. In order to evaluate FAST, two new test datasets were created through questionnaire-based human experimentation. This involved the generation of a robust methodology for creating usable fuzzy datasets (including an automated method that was used to create one of the two fuzzy datasets). FAST was evaluated through experiments conducted using the new fuzzy datasets. The results of the evaluation showed an improved level of correlation between FAST and human test results over two existing sentence similarity measures, demonstrating its success in representing the similarity between pairs of sentences containing fuzzy words.

    31st International Conference on Information Modelling and Knowledge Bases

    Information modelling is becoming an increasingly important topic for researchers, designers, and users of information systems. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who have a common interest in understanding and solving problems in information modelling and knowledge bases, as well as in applying the results of research to practice. We also aim to recognize and study new areas of modelling and knowledge bases to which more attention should be paid. Therefore, philosophy and logic, cognitive science, knowledge management, linguistics, and management science are relevant areas, too. The conference will have three categories of presentations: full papers, short papers, and position papers.