4 research outputs found

    Organizing Contextual Knowledge for Arabic Text Disambiguation and Terminology Extraction.

    Ontologies have an important role in knowledge organization and information retrieval. Domain ontologies are composed of concepts represented by domain-relevant terms. Existing approaches to ontology construction make use of statistical and linguistic information to extract domain-relevant terms. The quality and the quantity of this information influence the accuracy of terminology extraction approaches and other steps in knowledge extraction and information retrieval. This paper proposes an approach for handling domain-relevant terms from Arabic non-diacriticised semi-structured corpora. As input, the structure of documents is exploited to organize knowledge in a contextual graph, which is then used to extract relevant terms. This network contains simple and compound nouns handled by a morphosyntactic shallow parser. The noun phrases are evaluated in terms of termhood and unithood by means of possibilistic measures. We apply a qualitative approach, which weighs terms according to their positions in the structure of the document. As output, the extracted knowledge is organized as a network modeling dependencies between terms, which can be exploited to infer semantic relations. We test our approach on three domain-specific corpora. The goal of this evaluation is to check whether our model for organizing and exploiting contextual knowledge improves the accuracy of extraction of simple and compound nouns. We also investigate the role of compound nouns in improving information retrieval results.

    Computational Reading of Arabic Biographical Collections with Special Reference to Preaching in the Sunni World (661--1300 CE).

    A project in the digital humanities, the dissertation explores methods of computational text analysis. Relying on text-mining techniques to extract meaningful data from unstructured text, the study offers an effective and flexible method for the analysis of Arabic biographical collections, the most valuable source for the social history of the pre-modern Islamic world. It uses the largest collection, "The History of Islam" of al-Dhahabi (d. 1348), as a case-study of applying the new method and shows how almost 30,000 biographies can be studied as a whole. A step toward finding a viable solution for studying the entire digital corpus of classical Islamic texts (400 mln. words), Chapter I offers a detailed explanation of "computational reading" that was built upon existing digital approaches from a variety of disciplines. Chapter II models big data extracted from the main source to further our understanding of the social geography of the Islamic world and its major social transformations, simultaneously providing an important background for the next chapter. Chapter III applies the devised method to the study of Islamic preaching from chronological, geographical and social perspectives that have been overlooked in the academic treatment of this subject. Largely an exploratory overview, it traces long-term changes in preaching practices as well as statuses of preachers within the Islamic elites. This chapter demonstrates how exactly computational reading can contribute to the studies of specific phenomena and practices. The final section overviews broad prospects of the further application of "computational reading" to a variety of genres of pre-modern Arabic literature. The dissertation heavily relies on the visual display of information in the form of graphs, charts, maps, and tables that are used in the main body and supplied in Appendices.

    PhD, Near Eastern Studies, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/102300/1/romanov_1.pd