105 research outputs found

    Proceedings of the 1st Conference on Central Asian Languages and Linguistics (ConCALL)

    The Conference on Central Asian Languages and Linguistics (ConCALL) was founded in 2014 at Indiana University by Dr. Öner Özçelik, the residing director of the Center for Languages of the Central Asian Region (CeLCAR). CeLCAR is the nation’s sole U.S. Department of Education-funded Language Resource Center focusing on the languages of the Central Asian Region; its main mission is to strengthen and improve the nation’s capacity for teaching and learning Central Asian languages through teacher training, research, materials development projects, and dissemination. As part of this mission, CeLCAR’s ultimate goal is to unify and fortify the Central Asian language learning community by facilitating networking between linguists and language educators, encouraging research projects that will inform language instruction, and providing opportunities for professionals in the field to both showcase their work and receive feedback from their peers. Thus ConCALL was established to be the first international academic conference to bring together linguists and language educators in the languages of the Central Asian region, including both the Altaic and Eastern Indo-European languages spoken in the region, to focus on research into how these specific languages are represented formally and how they are acquired by second/foreign language learners, and also to present research-driven teaching methods. 
Languages served by ConCALL include, but are not limited to: Azerbaijani, Dari, Karakalpak, Kazakh, Kyrgyz, Lokaabharan, Mari, Mongolian, Pamiri, Pashto, Persian, Russian, Shughnani, Tajiki, Tibetan, Tofalar, Tungusic, Turkish, Tuvan, Uyghur, Uzbek, Wakhi and more! The Conference on Central Asian Languages and Linguistics held at Indiana University on 16-17 May 2014 was made possible through the generosity of our sponsors: Center for Languages of the Central Asian Region (CeLCAR), Ostrom Grant Programs, IU's College of Arts and Humanities Center (CAHI), Inner Asian and Uralic National Resource Center (IAUNRC), IU's School of Global and International Studies (SGIS), IU's College of Arts and Sciences, Sinor Research Institute for Inner Asian Studies (SRIFIAS), IU's Department of Central Eurasian Studies (CEUS), and IU's Department of Linguistics

    Examining Narratives of Place: Representations of Xinjiang in Tourism and Geography Education

    This thesis examines how Xinjiang Uyghur Autonomous Region, located in northwest China, is represented in tourism and geographic education literature. The research demonstrates the limited and distorted place narratives of Xinjiang that are promoted by the government-backed tourist enterprise in China for consumption by English-language speakers, as well as the inadequate and uncritical representations of the region currently available to students in the United States. Qualitative content analysis methodology is employed to investigate the narrative representations of Xinjiang contained within tourist brochures, geography textbooks, and regionally appropriate curricular guides. The thesis includes a body of geographic lesson plans pertaining to Xinjiang that I created, informed by the research results. The purpose of this thesis is to move toward a more nuanced understanding of Xinjiang as a dynamic region of global significance, challenge prevailing stereotypes of the region, and strengthen geography literacy, particularly among school-aged students

    Multiethnic Societies of Central Asia and Siberia Represented in Indigenous Oral and Written Literature

    Central Asia and Siberia are characterized by multiethnic societies formed by a patchwork of often small ethnic groups. At the same time, large parts of these regions have been dominated by state languages, especially Russian and Chinese. On a local level, the languages of the autochthonous people often play a role parallel to the central national language. The contributions to these conference proceedings follow up on questions such as: What was or is collected, and how can it be used under changed conditions in the research landscape? How does it help local ethnic communities to understand and preserve their own culture and language? Do the spatially dispersed but often networked collections support research on the ground? What contribution do these collections make to the local languages and cultures against the backdrop of dwindling attention to endangered groups? These and other questions are discussed against the background of the important role libraries and private collections play for multiethnic societies in often remote regions that are difficult to reach

    Cold-start universal information extraction

    Who? What? When? Where? Why? are fundamental questions asked when gathering knowledge about and understanding a concept, topic, or event. The answers to these questions underpin the key information conveyed in the overwhelming majority, if not all, of language-based communication. At the core of my research in Information Extraction (IE) is the desire to endow machines with the ability to automatically extract, assess, and understand text in order to answer these fundamental questions. IE has been serving as one of the most important components for many downstream natural language processing (NLP) tasks, such as knowledge base completion, machine reading comprehension, machine translation and so on. The proliferation of the Web also intensifies the need to deal with enormous amounts of unstructured data from various sources, spanning different languages, genres and domains. When building an IE system, the conventional pipeline is to (1) ask expert linguists to rigorously define a target set of knowledge types we wish to extract by examining a large data set, (2) collect resources and human annotations for each type, and (3) design features and train machine learning models to extract knowledge elements. In practice, this process is very expensive because each step involves extensive human effort, which is not always available. For example, to specify the knowledge types for a particular scenario, both consumers and expert linguists need to examine a lot of data from that domain and write detailed annotation guidelines for each type. Hand-crafted schemas, which define the types and complex templates of the expected knowledge elements, often provide low coverage and fail to generalize to new domains. For example, none of the traditional event extraction programs, such as ACE (Automatic Content Extraction) and TAC-KBP, include "donation" and "evacuation" in their schemas in spite of their potential relevance to natural disaster management users. 
Additionally, these approaches are highly dependent on linguistic resources and human-labeled data tuned to pre-defined types, so they suffer from poor scalability and portability when moving to a new language, domain, or genre. The focus of this thesis is to develop effective theories and algorithms for IE which not only yield satisfactory quality by incorporating prior linguistic and semantic knowledge, but also achieve greater portability and scalability by moving away from the high cost and narrow focus of large-scale manual annotation. This thesis opens up a new research direction called Cold-Start Universal Information Extraction, where the full extraction and analysis starts from scratch and requires little or no prior manual annotation or pre-defined type schema. In addition to this new research paradigm, we also contribute effective algorithms and models towards resolving the following three challenges: How can machines extract knowledge without any pre-defined types or any human-annotated data? We develop an effective bottom-up and unsupervised Liberal Information Extraction framework based on the hypothesis that the meaning and underlying knowledge conveyed by linguistic expressions is usually embodied by their usages in language, which makes it possible to automatically induce a type schema based on rich contextual representations of all knowledge elements by combining their symbolic and distributional semantics using unsupervised hierarchical clustering. How can machines benefit from available resources, e.g., large-scale ontologies or existing human annotations? My research has shown that pre-defined types can also be encoded by rich contextual or structured representations, through which knowledge elements can be mapped to their appropriate types. 
Therefore, we design a weakly supervised Zero-shot Learning and a Semi-Supervised Vector Quantized Variational Auto-Encoder approach that frames IE as a grounding problem instead of classification, where knowledge elements are grounded into any types from an extensible and large-scale target ontology or induced from the corpora, with available annotations for a few types. How can IE approaches be extended to low-resource languages without any extra human effort? There are more than 6000 living languages in the real world while public gold-standard annotations are only available for a few dominant languages. To facilitate the adaptation of these IE frameworks to other languages, especially low-resource languages, a Multilingual Common Semantic Space is further proposed to serve as a bridge for transferring existing resources and annotated data from dominant languages to more than 300 low-resource languages. Moreover, a Multi-Level Adversarial Transfer framework is also designed to learn language-agnostic features across various languages

    The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

    Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009

    Buddhism in Central Asia I

    Buddhism in Central Asia (Part I): Patronage, Legitimation, Sacred Space, and Pilgrimage, 6-14th Centuries deals with the various strategies of legitimation and the establishment of sacred space and pilgrimage among both trans-regional (Chinese, Indian, Tibetan) and local (Khotanese, Uyghur, Tangut, Kitan) Buddhist traditions. Readership: All interested in dynamics of inter-cultural encounter and Buddhist transfer in pre-modern Eastern Central Asia

    Resource Generation from Structured Documents for Low-density Languages

    The availability and use of electronic resources for both manual and automated language-related processing has increased tremendously in recent years. Nevertheless, many resources still exist only in printed form, restricting their availability and use. This especially holds true for low-density languages or languages with limited electronic resources. For these documents, automated conversion into electronic resources is highly desirable. This thesis focuses on the semi-automated conversion of printed structured documents (dictionaries in particular) to usable electronic representations. In the first part we present an entry tagging system that recognizes, parses, and tags the entries of a printed dictionary to reproduce the representation. The system uses the consistent layout and structure of the dictionaries, and the features that impose this structure, to capture and recover lexicographic information. We accomplish this by adapting two methods: rule-based and HMM-based. The system is designed to produce results quickly with minimal human assistance and reasonable accuracy. The use of adaptive transformation-based learning as a post-processor at two points in the system yields significant improvements, even with an extremely small amount of user-provided training data. The second part of this thesis presents Morphology Induction from Noisy Data (MIND), a natural language morphology discovery framework that operates on information from limited, noisy data obtained from the conversion process. To use the resulting resources effectively, however, users must be able to search for them using the root form of a morphologically deformed variant found in the text. Stemming and data-driven methods are not suitable when data are sparse. The approach is based on the novel application of string searching algorithms. 
The evaluations show that MIND can segment words into roots and affixes from the noisy, limited data contained in a dictionary, and it can extract prefixes, suffixes, circumfixes, and infixes. MIND can also identify morphophonemic changes, i.e., phonemic variations between allomorphs of a morpheme, specifically point-of-affixation stem changes. This, in turn, allows non-native speakers to perform multilingual tasks for applications where response must be rapid, and they have limited knowledge. In addition, this analysis can feed other natural language processing tools requiring lexicons

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data. We believe that these data contain useful knowledge. Since the late 1990s, text mining techniques have been studied intensively in order to extract this knowledge from the data. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. The techniques covered range from relation extraction to under- or less-resourced languages. I believe that this book will contribute new knowledge to the text mining field and help many readers open up new research fields

    The Racial State and Race Formation: A Comparative Case Study of the Use of Racial Narratives and Government Coercion for Racial Nation-State Building in Chile, China, and Myanmar

    This is a study about what David Theo Goldberg (2002) describes as “the racial state”: a modern nation-state where rule and constructions of race are deeply intertwined (2002: 7). Racial difference, he posits, is one of the easiest and most potent ways a governing body in modern times can establish power, social order, and dominance (130). Racial projects are a robust means of establishing state formation and legitimacy, as they create two intertwined identities: a national, racial state identity and personal identities that depend on the very racial narratives governing bodies create. Yet such narratives do not gain support without arduous application. This study is also about coercion, symbolic and literal, and how governing bodies deploy violence to enforce racial narratives and further establish legitimate governance. As Goldberg (2002) states, “Power is to the state and the state to power as blood is to the human body” (9). Guided by Goldberg’s (2002) theory, I set out to explore how the racial nation-state materializes across three distinct countries: China, Chile, and Myanmar. While these countries are highly distinct in many aspects, governing bodies in all three are persecuting an indigenous, religious, or ethnic minority group and are then implementing racial narratives and government coercion to justify such suppression. Conducting a comparative analysis of secondary sources, I have focused on four themes, each treated in its own chapter (racial narratives, symbolic and real peripheries, rhetoric of terrorism, and methods of oppression), to argue that governing bodies in each of these three countries are coercively enforcing racial narratives to achieve government legitimacy. I argue that the type of government affects how racial narratives and government coercion manifest, as well as the threat of minority group separatism, but that ultimately, racial narratives are how governing bodies retain authority across all three countries. 
I conclude by predicting that a globally racist society is emerging in which methods of oppression and racial narratives are converging globally