570 research outputs found

    The Information-seeking Strategies of Humanities Scholars Using Resources in Languages Other Than English

    By Carol Sabbar, The University of Wisconsin-Milwaukee, 2016, under the supervision of Dr. Iris Xie. This dissertation explores the information-seeking strategies used by scholars in the humanities who rely on resources in languages other than English. It investigates not only the strategies they choose but also the shifts that they make among strategies and the role that language, culture, and geography play in the information-seeking context. The study used purposive sampling to engage 40 human subjects, all of whom are post-doctoral humanities scholars based in the United States who conduct research in a variety of languages. Data were collected through semi-structured interviews and research diaries in order to answer three research questions: What information-seeking strategies are used by scholars conducting research in languages other than English? What shifts do scholars make among strategies in routine, disruptive, and/or problematic situations? And in what ways do language, culture, and geography play a role in the information-seeking context, especially in problematic situations? The data were then analyzed using grounded theory and the constant comparative method. A new conceptual model, the information triangle, was used and is presented in this dissertation to categorize and visually map the strategies and shifts. Based on the data collected, thirty distinct strategies were identified and divided into four categories: formal system, informal resource, interactive human, and hybrid strategies. Three types of shifts were considered: planned, opportunistic, and alternative. Finally, factors related to language, culture, and geography were identified and analyzed according to their roles in the information-seeking context.
This study is the first of its kind to combine the study of information-seeking behaviors with the factors of language, culture, and geography, and as such, it presents numerous methodological and practical implications along with many opportunities for future research.

    Universal Word Segmentation: Implementation and Interpretation

    Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.
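
To illustrate how word segmentation can be cast as sequence tagging, the sketch below converts a segmented sentence into per-character boundary tags and back. The B/I/E/S tag set and the function names `words_to_tags` and `tags_to_words` are assumptions made for illustration; the abstract does not state which tag set or model the authors use, and no tagger is trained here.

```python
def words_to_tags(words):
    """Map a list of words to one boundary tag per character.

    Assumed tag set: B (begin), I (inside), E (end), S (single-character word).
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted boundary tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # these tags close a word
            words.append(current)
            current = ""
    if current:  # keep any trailing characters left open by the tags
        words.append(current)
    return words


gold_words = ["我", "喜欢", "自然语言"]
gold_tags = words_to_tags(gold_words)
restored = tags_to_words("".join(gold_words), gold_tags)
```

A tagger trained on such character and tag pairs predicts the tags for unsegmented text, and `tags_to_words` then recovers the word boundaries.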

    Key Expressions of Posttraumatic Distress in Cambodian Children: A Step Toward Culturally-Sensitive Trauma Assessment and Intervention

    More than half of all children in Cambodia experience direct abuse and over 70% experience other traumatic events, which significantly increase risk for a range of physical and mental health problems. Additionally, Cambodian children face longstanding sociopolitical, intergenerational, and cultural factors that compound the impact of direct victimization. As a result, rates of posttraumatic stress symptoms among Cambodian youth are high. However, care providers often rely on Western-based nosology that does not account for culturally specific expressions of trauma. Lack of knowledge surrounding the expressions of distress that best represent the experience of traumatized Cambodian children hinders diagnostic accuracy and treatment effectiveness. To address this problem, the current study utilized a qualitative design to interview 30 Cambodian caregivers of children with trauma experiences and 30 Cambodian children (ages 10–13 years) with trauma experiences to identify key local expressions of trauma. Findings reveal certain PTSD symptoms and culturally specific posttraumatic problems that are frequent and severe for Cambodian children, as well as domains of functioning impacted by trauma. Certain symptoms seem particularly important to evaluate in this group, such as anger, physical complaints (e.g., headache and palpitations), and cognitive-focused complaints (in particular, “thinking too much”). All caregivers and children reported physical health as impacted by posttraumatic problems, highlighting a particularly salient domain of functioning for this population. Expressions of distress explored in the current study are discussed in the context of assessment and intervention development to inform diagnostic and clinical efforts for those working with trauma-exposed Cambodian children.

    Book Reviews


    Doctor of Philosophy

    Kơho, a Mon-Khmer (Austroasiatic) language, is spoken by an indigenous population of more than 207,000 people located in Lâm Đồng province in the highland region of Vietnam. There are also several thousand additional members of this ethnic group who live in France and the United States (primarily North Carolina). The goal of this dissertation is to describe the Kơho-Sre language in such a manner that it is accessible both to linguists and to those in the Kơho-speaking community interested in their own language. This grammar, based on a linguistic analysis that is informed by current linguistic theory and best practices in the field, includes phonological, morphological, and syntactic data. A grammatical description of Kơho is needed, in spite of the fact that a literature of the language does exist. This is because (1) the extant literature does not provide adequate documentation; (2) materials are dated and do not reflect recent advances in typology and linguistic analysis; (3) many materials are published in Russian and Vietnamese or are not readily available to most researchers; and (4) earlier descriptions are cast in frameworks that are not amenable to contemporary documentary linguistic analysis. This dissertation, based on data collected during fieldwork in Vietnam and North Carolina, supplemented with previously published syntactic and lexicographic materials, provides an overview of the grammatical structure of Sre. Sre is a polysyllabic (usually disyllabic) language with a synchronic tendency towards reduction of the presyllable (the weaker or minor syllable) and development in the remaining (main or major) syllable of contrastive pitch characteristics associated with vowel length. Vowel length, in turn, is influenced by the main syllable coda. A formerly complex system of nominal classifiers (operating in the pattern: numeral + classifier + noun) has been reduced to three generally used classifiers.
Sentence structure is subject + verb + object with a fairly rigid word order, with some phrase or clause movement to indicate certain syntactic functions.

    Using Comparable Corpora to Augment Statistical Machine Translation Models in Low Resource Settings

    Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, or pairs of translated sentences. In this thesis, we directly incorporate comparable corpora into the estimation of end-to-end SMT models. In contrast to parallel corpora, comparable corpora are pairs of monolingual corpora that have some cross-lingual similarities, for example topic or publication date, but that do not necessarily contain any direct translations. Comparable corpora are more readily available in large quantities than parallel corpora, which require significant human effort to compile. We use comparable corpora to estimate machine translation model parameters and show that doing so improves performance in settings where a limited amount of parallel data is available for training. The major contributions of this thesis are the following:
    * We release ‘language packs’ for 151 human languages, which include bilingual dictionaries, comparable corpora of Wikipedia document pairs, comparable corpora of time-stamped news text that we harvested from the web, and, for non-roman script languages, dictionaries of name pairs, which are likely to be transliterations.
    * We present a novel technique for using a small number of example word translations to learn a supervised model for bilingual lexicon induction which takes advantage of a wide variety of signals of translation equivalence that can be estimated over comparable corpora.
    * We show that using comparable corpora to induce new translations and estimate new phrase table feature functions improves end-to-end statistical machine translation performance for low resource language pairs as well as domains.
    * We present a novel algorithm for composing multiword phrase translations from multiple unigram translations and then use comparable corpora to prune the large space of hypothesis translations. We show that these induced phrase translations improve machine translation performance beyond that of component unigrams.
    This thesis focuses on critical low resource machine translation settings, where insufficient parallel corpora exist for training statistical models. We experiment with both low resource language pairs and low resource domains of text. We present results from our novel error analysis methodology, which show that most translation errors in low resource settings are due to unseen source language words and phrases and unseen target language translations. We also find room for fixing errors due to how different translations are weighted, or scored, in the models. We target both error types; we use comparable corpora to induce new word and phrase translations and estimate novel translation feature scores. Our experiments show that augmenting baseline SMT systems with new translations and features estimated over comparable corpora improves translation performance significantly. Additionally, our techniques expand the applicability of statistical machine translation to those language pairs for which zero parallel text is available.
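
The idea of composing phrase translations from unigram translations and then pruning the hypothesis space can be sketched roughly as follows. This is a toy illustration, not the thesis's algorithm: the function `compose_phrase_translations`, the `unigram_table` dictionary, and the `score` callable (a stand-in for whatever signal is estimated over comparable corpora) are all hypothetical, and target word order is assumed to follow source order for simplicity.

```python
from itertools import product


def compose_phrase_translations(source_phrase, unigram_table, score, top_k=3):
    """Build candidate phrase translations word by word, then prune by score.

    unigram_table: hypothetical dict mapping a source word to candidate translations.
    score: hypothetical callable returning a plausibility score for a target phrase.
    """
    candidates_per_word = [unigram_table.get(w, []) for w in source_phrase.split()]
    if any(not candidates for candidates in candidates_per_word):
        return []  # some source word has no known translation
    # Cartesian product of per-word candidates gives the hypothesis space.
    hypotheses = [" ".join(combo) for combo in product(*candidates_per_word)]
    # Prune: keep only the top_k highest-scoring hypotheses.
    return sorted(hypotheses, key=score, reverse=True)[:top_k]


# Toy data, invented for this example.
toy_table = {"casa": ["house", "home"], "blanca": ["white"]}
toy_scores = {"house white": 0.9, "home white": 0.2}
best = compose_phrase_translations(
    "casa blanca", toy_table, lambda p: toy_scores.get(p, 0.0), top_k=1
)
```

The hypothesis space grows multiplicatively with phrase length, which is why a pruning step of some kind is needed.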

    English/Russian lexical cognates detection using NLP Machine Learning with Python

    Language learning is a remarkable endeavor that expands our horizons and allows us to connect with diverse cultures and people around the world. Traditionally, language education has relied on conventional methods such as textbooks, vocabulary drills, and language exchanges. However, with the advent of machine learning, a new era has dawned upon language instruction, offering innovative and efficient ways to accelerate language acquisition. One intriguing application of machine learning in language learning is the utilization of cognates, words that share similar meanings and spellings across different languages. To address this subject, this research paper proposes to facilitate the process of acquiring a second language with the help of artificial intelligence, particularly neural networks, which can identify and use words that are similar or identical in both the learner's first language and the target language. These words, known as lexical cognates, can facilitate language learning by providing a familiar point of reference for the learner and enabling them to associate new vocabulary with words they already know. By leveraging the power of neural networks to detect and utilize these cognates, learners will be able to accelerate their progress in acquiring a second language.
Although the study of semantic similarity across different languages is not a new topic, our objective is to adopt a different approach for identifying Russian-English lexical cognates and to present the obtained results as a language learning tool, using a data sample of lexical and semantic similarity across languages to build a model for lexical cognate detection and word association. Subsequently, depending on our analysis and results, we will present a word association application that can be utilized by end users. Given that Russian and English are among the most widely spoken languages globally and that Russia is a popular destination for international students from around the world, this served as significant motivation to develop an AI tool to assist individuals learning Russian as English speakers or learning English as Russian speakers.
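
As a rough illustration of spelling-based cognate detection, the sketch below transliterates Russian words into Latin script and compares them with English words by string similarity. This is a simple baseline chosen for illustration, not the neural approach the abstract describes; the transliteration table and the names `transliterate`, `cognate_score`, and `find_cognates` are assumptions, and spelling similarity alone cannot separate true cognates from false friends.

```python
from difflib import SequenceMatcher

# Simplified, assumed Cyrillic-to-Latin mapping (not an official standard).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(word):
    """Replace each Cyrillic letter with its Latin approximation."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in word.lower())


def cognate_score(russian_word, english_word):
    """Return a similarity ratio in [0, 1] between the two spellings."""
    return SequenceMatcher(
        None, transliterate(russian_word), english_word.lower()
    ).ratio()


def find_cognates(russian_words, english_words, threshold=0.8):
    """List (russian, english, score) pairs whose similarity meets the threshold."""
    pairs = []
    for ru in russian_words:
        for en in english_words:
            score = cognate_score(ru, en)
            if score >= threshold:
                pairs.append((ru, en, score))
    return pairs


candidates = find_cognates(["студент", "собака"], ["student", "dog"])
```

The threshold of 0.8 is arbitrary; a learned model would combine spelling similarity with semantic evidence rather than rely on a fixed cutoff.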