Malay-language stemmer
Stemming is the removal of affixes (prefixes and suffixes) from a word in order to recover its root word. The objectives of this research were to build a software stemmer that can stem any given Malay word and to develop a standard stemming algorithm for the Malay language. The Malay language was chosen because no complete stemmer for it is available. Stemmers have a wide variety of applications, such as information retrieval and machine translation. Once the system is fully developed, it is expected to benefit its users considerably.
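To make the idea of affix removal concrete, the following is a minimal sketch in Python. It is not the algorithm developed in this research: the affix lists, the minimum stem length, and the function name strip_affixes are all assumptions chosen for illustration, and the lists cover only a handful of common Malay affixes.

```python
# Illustrative affix lists only; NOT the rule set from the research above.
# Longer prefixes come first so that "mem" is tried before "me".
PREFIXES = ("meng", "mem", "men", "me", "ber", "ter", "di", "ke", "pe", "se")
SUFFIXES = ("kan", "nya", "lah", "kah", "an", "i")
MIN_STEM = 3  # assumed lower bound on the length of a remaining stem


def strip_affixes(word):
    """Remove at most one suffix and one prefix from a word (hypothetical helper)."""
    stem = word.lower()
    # Strip the first matching suffix, if enough of the word remains.
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
            stem = stem[:-len(suffix)]
            break
    # Then strip the first matching prefix under the same length constraint.
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM:
            stem = stem[len(prefix):]
            break
    return stem


for w in ("membaca", "dibaca", "makanan", "minuman", "berjalan"):
    print(w, "->", strip_affixes(w))
```

The last word shows the weakness of pure rule stripping: the sketch reduces "berjalan" to "jal" rather than the root "jalan", because the final "an" is part of the root and not a suffix. Handling such cases requires more than a fixed affix list, for example a check against a list of known root words.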
A Case Study of Using Domain Analysis for the Conflation Algorithms Domain
This paper documents the domain engineering process for much
of the conflation algorithms domain. Empirical data on the process and
products of domain engineering were collected. Six conflation
algorithms of four different types were analyzed: three affix removal,
one successor variety, one table lookup, and one n-gram. Products of
the analysis include a generic architecture, reusable components, a
little language and an application generator that extends the scope of
the domain analysis beyond previous generators. The application
generator produces source code not only for affix removal stemmers but
also for successor variety, table lookup, and n-gram stemmers. The
performance of the stemmers generated automatically was compared with
the stemmers developed manually in terms of stem similarity, source
and executable sizes, and development and execution times. All five
stemmers generated by the application generator produced stems that
were more than 99.9% identical to those of the manually developed
stemmers. Some of the generated stemmers were as efficient as their
manual equivalents and some were not.
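Of the four algorithm types named above, the n-gram approach is the least like conventional stemming, so a small sketch may help. The version below assumes the common bigram formulation, in which two words are compared by the Dice coefficient over their sets of unique letter pairs. The function names bigrams and dice are hypothetical, and nothing here is taken from the generated or manually developed stemmers in the paper.

```python
def bigrams(word):
    """Return the set of unique adjacent letter pairs in a word."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice(a, b):
    """Dice coefficient over unique bigrams: 2 * shared / (unique in a + unique in b)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # neither word has a bigram (both shorter than two letters)
    return 2 * len(x & y) / (len(x) + len(y))


# "statistics" has 7 unique bigrams, "statistical" has 8, and they share 6.
print(round(dice("statistics", "statistical"), 2))  # 0.8
```

Words whose similarity exceeds a chosen threshold are then grouped together. No stem is produced at all, which is why this approach is usually described as conflation rather than stemming in the narrow sense.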
The Information-seeking Strategies of Humanities Scholars Using Resources in Languages Other Than English
Carol Sabbar, The University of Wisconsin-Milwaukee, 2016. Under the supervision of Dr. Iris Xie.
This dissertation explores the information-seeking strategies used by scholars in the humanities who rely on resources in languages other than English. It investigates not only the strategies they choose but also the shifts that they make among strategies and the role that language, culture, and geography play in the information-seeking context. The study used purposive sampling to recruit 40 participants, all of whom are post-doctoral humanities scholars based in the United States who conduct research in a variety of languages. Data were collected through semi-structured interviews and research diaries in order to answer three research questions: What information-seeking strategies are used by scholars conducting research in languages other than English? What shifts do scholars make among strategies in routine, disruptive, and/or problematic situations? In what ways do language, culture, and geography play a role in the information-seeking context, especially in problematic situations? The data were then analyzed using grounded theory and the constant comparative method. A new conceptual model, the information triangle, is presented in this dissertation and was used to categorize and visually map the strategies and shifts. Based on the data collected, thirty distinct strategies were identified and divided into four categories: formal system, informal resource, interactive human, and hybrid strategies. Three types of shifts were considered: planned, opportunistic, and alternative. Finally, factors related to language, culture, and geography were identified and analyzed according to their roles in the information-seeking context. This study is the first of its kind to combine the study of information-seeking behaviors with the factors of language, culture, and geography, and as such, it presents numerous methodological and practical implications along with many opportunities for future research.