
    Malay-language stemmer

    Stemming is the removal of affixes (prefixes and suffixes) from a word to obtain its root word. The objectives of this research were to build a software stemmer that can stem any given Malay word and to develop a standard stemming algorithm for the Malay language. Malay was chosen because no complete stemmer for the language is available. Stemmers have a wide variety of applications, such as information retrieval and machine translation. Once the system is fully developed, it is expected to benefit users and customers greatly.
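    The abstract does not describe the authors' algorithm, so the following is not their method. It is a minimal sketch of a generic dictionary-backed affix-removal stemmer for Malay, assuming a root lexicon is available. The lexicon `ROOT_WORDS`, the rule tables `PREFIXES`, `PARTICLES` and `SUFFIXES`, and the function `stem` are all hypothetical and deliberately incomplete (no reduplication, no irregular forms such as pel- or per-).

```python
# Hypothetical, illustrative Malay affix-removal stemmer (not the paper's algorithm).
from __future__ import annotations

# Tiny hypothetical root lexicon; a usable stemmer needs a full root dictionary.
ROOT_WORDS = {"ajar", "main", "tulis", "pukul", "sapu", "kira", "baca", "makan", "duduk"}

# Prefixes, longest first, paired with the initial letter that nasal assimilation
# may have removed from the root ("" means nothing was removed),
# e.g. "men" + "tulis" -> "menulis", "meny" + "sapu" -> "menyapu".
PREFIXES = [
    ("meny", ["s"]), ("meng", ["k", ""]), ("mem", ["p", ""]), ("men", ["t", ""]),
    ("me", [""]), ("peny", ["s"]), ("peng", ["k", ""]), ("pem", ["p", ""]),
    ("pen", ["t", ""]), ("pe", [""]), ("ber", [""]), ("ter", [""]),
    ("di", [""]), ("ke", [""]), ("se", [""]),
]
PARTICLES = ["lah", "kah", "nya"]  # removed first, e.g. "duduklah" -> "duduk"
SUFFIXES = ["kan", "an", "i"]      # then one derivational suffix


def _suffix_variants(word: str) -> list[str]:
    """The word itself plus variants with at most one particle and one suffix removed."""
    variants = [word]
    for group in (PARTICLES, SUFFIXES):
        for base in list(variants):
            for suffix in group:
                if base.endswith(suffix) and len(base) - len(suffix) >= 2:
                    variants.append(base[: -len(suffix)])
    return variants


def _prefix_variants(word: str) -> list[str]:
    """Candidate roots from removing one prefix and restoring a dropped letter."""
    variants = []
    for prefix, restored in PREFIXES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            variants.extend(letter + rest for letter in restored)
    return variants


def stem(word: str) -> str:
    """Return the first candidate found in the root lexicon, else the word unchanged."""
    word = word.lower()
    for base in _suffix_variants(word):
        if base in ROOT_WORDS:
            return base
        for candidate in _prefix_variants(base):
            if candidate in ROOT_WORDS:
                return candidate
    return word
```

    For example, `stem("membacakan")` returns `"baca"` and `stem("menyapu")` returns `"sapu"`. Checking the lexicon before each removal keeps roots that merely look affixed, such as "makan" (ending in -an), from being over-stemmed; unknown words are returned unchanged rather than guessed.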

    A Case Study of Using Domain Analysis for the Conflation Algorithms Domain

    This paper documents the domain engineering process for much of the conflation algorithms domain. Empirical data on the process and products of domain engineering were collected. Six conflation algorithms of four different types were analyzed: three affix removal, one successor variety, one table lookup, and one n-gram. Products of the analysis include a generic architecture, reusable components, a little language, and an application generator that extends the scope of the domain analysis beyond previous generators. The application generator produces source code not only for affix removal stemmers but also for successor variety, table lookup, and n-gram stemmers. The performance of the automatically generated stemmers was compared with that of the manually developed stemmers in terms of stem similarity, source and executable sizes, and development and execution times. All five stemmers generated by the application generator produced more than 99.9% identical stems to their manually developed counterparts. Some of the generated stemmers were as efficient as their manual equivalents; others were not.
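    To illustrate one of the four algorithm types named above, here is a minimal sketch of n-gram conflation, not the code produced by the paper's application generator. It assumes the common shared-digram approach: words are compared by the Dice coefficient 2C / (A + B), where A and B are the numbers of unique digrams in each word and C the number they share, and words are then grouped by single-link clustering. The function names and the 0.6 threshold are hypothetical choices.

```python
# Hypothetical sketch of n-gram (digram) conflation: Dice similarity + single-link clustering.
from __future__ import annotations


def digrams(word: str) -> set[str]:
    """Unique adjacent character pairs (digrams) of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient 2C / (A + B) over the unique digrams of two words."""
    da, db = digrams(a.lower()), digrams(b.lower())
    if not da and not db:  # both words shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(da & db) / (len(da) + len(db))


def conflation_classes(words: list[str], threshold: float = 0.6) -> list[list[str]]:
    """Single-link clustering: two words are linked when their similarity >= threshold."""
    parent = list(range(len(words)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if dice_similarity(words[i], words[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)
    return list(groups.values())
```

    For "statistics" (7 unique digrams) and "statistical" (8 unique digrams), 6 digrams are shared, so `dice_similarity` gives 2 × 6 / (7 + 8) = 0.8 and the two words fall into one conflation class, while "retrieval" stays on its own. Unlike affix removal, this approach needs no language-specific rules, at the cost of an all-pairs comparison.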

    The Information-seeking Strategies of Humanities Scholars Using Resources in Languages Other Than English

    Carol Sabbar, The University of Wisconsin-Milwaukee, 2016. Under the supervision of Dr. Iris Xie. This dissertation explores the information-seeking strategies used by humanities scholars who rely on resources in languages other than English. It investigates not only the strategies they choose but also the shifts they make among strategies and the roles that language, culture, and geography play in the information-seeking context. The study used purposive sampling to recruit 40 participants, all post-doctoral humanities scholars based in the United States who conduct research in a variety of languages. Data were collected through semi-structured interviews and research diaries to answer three research questions: What information-seeking strategies are used by scholars conducting research in languages other than English? What shifts do scholars make among strategies in routine, disruptive, and/or problematic situations? In what ways do language, culture, and geography play a role in the information-seeking context, especially in problematic situations? The data were analyzed using grounded theory and the constant comparative method. A new conceptual model, the information triangle, is presented in this dissertation and was used to categorize and visually map the strategies and shifts. Based on the data collected, thirty distinct strategies were identified and divided into four categories: formal system, informal resource, interactive human, and hybrid strategies. Three types of shifts were considered: planned, opportunistic, and alternative. Finally, factors related to language, culture, and geography were identified and analyzed according to their roles in the information-seeking context.
This study is the first of its kind to combine the study of information-seeking behaviors with the factors of language, culture, and geography; as such, it presents numerous methodological and practical implications along with many opportunities for future research.