
    Development of Multilingual Resource Management Mechanisms for Libraries

    Multilingual support is an important concept for any library. This study was designed on the basis of global recommendations and the local requirements of individual libraries. It selects the multilingual components needed to set up a multilingual environment in different libraries, so that users as well as library professionals can access and retrieve library resources. The methodology for integrating Google Indic Transliteration into libraries follows six steps: (i) selection of transliteration tools, (ii) comparison of the tools, (iii) integration methods in Koha, (iv) development of Google Indic Transliteration in Koha for users, (v) testing, and (vi) results. Developing a multilingual framework is also an important task in an integrated library system, and it follows several steps: (i) Bengali language installation in Koha, (ii) setting multilingual system preferences in Koha, (iii) translating the modules, and (iv) the Bengali interface in Koha. The study also demonstrates the Bengali data-entry process in Koha, both through IBus Avro phonetics and through a virtual keyboard. Multilingual digital resource management for libraries is developed using DSpace and Greenstone. Multilingual management is further addressed in areas such as federated searching (the VuFind multilingual discovery tool, multilingual retrieval via OAI-PMH, and multilingual data import through a Z39.50 server), and multilingual bibliographic data are edited with MarcEdit for better management of the integrated library management system. Finally, content is created and edited with a content management system tool for efficient and effective retrieval of multilingual digital content resources by users.
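    As a small, hedged illustration of the OAI-PMH retrieval component mentioned above, the sketch below harvests Dublin Core records from a Koha OAI-PMH endpoint and prints their titles, which may be in Bengali or any other script. The endpoint URL and the restriction to the standard oai_dc format are assumptions made for the example, not details taken from the study.

```python
# Hedged sketch: harvest multilingual Dublin Core records from a Koha
# OAI-PMH endpoint. The URL below is a placeholder, not the one used
# in the study.
import requests
import xml.etree.ElementTree as ET

OAI_URL = "http://koha.example.org/cgi-bin/koha/oai.pl"  # hypothetical endpoint
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest_titles(base_url):
    """Yield record titles (in any script, e.g. Bengali) from one ListRecords page."""
    resp = requests.get(base_url,
                        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
                        timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    for record in root.iterfind(".//oai:record", NS):
        for title in record.iterfind(".//dc:title", NS):
            if title.text:
                yield title.text

if __name__ == "__main__":
    for t in harvest_titles(OAI_URL):
        print(t)  # Unicode output, so Bengali titles print correctly
```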

    Sinhala and Tamil: a case of contact-induced restructuring

    PhD Thesis. The dissertation presents a comparative synchronic study of the morphosyntactic features of modern spoken Sinhala and Tamil, the two main languages of Sri Lanka. The main motivation of the research is that Sinhala and Tamil, two languages of diverse origins—the New Indo-Aryan (NIA) and Dravidian families respectively—share a wide spectrum of morphosyntactic features. Sinhala has long been isolated from the other NIA languages and co-existed with Tamil in Sri Lanka ever since both reached Sri Lanka from India. This coexistence, it is believed, led to what is known as the contact-induced restructuring that Sinhala morphosyntax has undergone on the model of Tamil, while retaining its NIA lexicon. Moreover, as languages of South Asia, the two languages share the areal features of this region. The research seeks to address the following questions: (i) What features do the two languages share and what features do they not share?; (ii) Are the features that they share areal features of the region or those diffused into one another owing to contact?; (iii) If the features that they share are due to contact, has diffusion taken place unidirectionally or bidirectionally?; and (iv) Does contact have any role to play with respect to features that they do not share? The claim that this research intends to substantiate is that Sinhala has undergone morphosyntactic restructuring on the model of Tamil. The research, therefore, attempts to answer another question: (v) Can the morphosyntactic restructuring that Sinhala has undergone be explained in syntactic terms? The morphosyntactic features of the two languages are analyzed at macro- and micro-levels. At the macro-level, a wide range of morphosyntactic features of Tamil and Sinhala, and those of seven other languages of the region, are compared with a view to determining the origins of these features and showing the large-scale morphosyntactic convergence between Sinhala and Tamil and the divergence between Sinhala and other NIA languages. At the micro-level, the dissertation analyzes in detail two morphosyntactic phenomena, namely null arguments and focus constructions. It examines whether subject/verb agreement, which is different across the two languages, plays a role in the licensing of null arguments in each language. It also examines the nature of the changes Sinhala morphosyntax has undergone because of the two kinds of Tamil focus constructions that Sinhala has replicated. It is hoped that this dissertation will make a significant contribution to the knowledge and understanding of the morphosyntax of the two languages, the effects of language contact on morphosyntax, and, more generally, the nature of linguistic variation. Scholarship Programme of the Higher Education for the Twenty First Century (HETC) Project, Ministry of Higher Education, Sri Lanka.

    Unity and diversity in grammaticalization scenarios

    The volume contains a selection of papers originally presented at the symposium on “Areal patterns of grammaticalization and cross-linguistic variation in grammaticalization scenarios”, held on 12-14 March 2015 at Johannes Gutenberg University of Mainz. The papers, written by leading scholars combining expertise in historical linguistics and grammaticalization research, study variation in grammaticalization scenarios in a variety of language families (Slavic, Indo-Aryan, Tibeto-Burman, Bantu, Mande, "Khoisan", Siouan, and Mayan). The volume stands out in the vast literature on grammaticalization by focusing on variation in grammaticalization scenarios and areal patterns in grammaticalization. Apart from documenting new grammaticalization paths, the volume makes a methodological contribution, as it addresses the important question of how to reconcile the universal outcomes of grammaticalization processes with the fact that the input to these processes is language-specific and construction-specific.

    A Hybrid Machine Translation Framework for an Improved Translation Workflow

    Over the past few decades, due to a continuing surge in the amount of content being translated and ever-increasing pressure to deliver high-quality, high-throughput translation, the translation industry has focused on adopting advanced technologies such as machine translation (MT) and automatic post-editing (APE) in its translation workflows. Despite the progress of the technology, the roles of humans and machines essentially remain intact, as MT/APE moves from the periphery of the translation field towards collaborative human-machine MT/APE in modern translation workflows. Professional translators increasingly become post-editors who correct raw MT/APE output instead of translating from scratch, which in turn increases productivity in terms of translation speed. The last decade has seen substantial growth in research and development activities on improving MT, usually concentrating on selected aspects of the workflow, from training-data pre-processing techniques through core MT processes to post-editing methods. To date, however, complete MT workflows are less investigated than the core MT processes. In the research presented in this thesis, we investigate avenues towards achieving improved MT workflows. We study how different MT paradigms can be utilized and integrated to best effect. We also investigate how different upstream and downstream component technologies can be hybridized to achieve overall improved MT. Finally, we include an investigation into human-machine collaborative MT by taking humans in the loop. In many (but not all) of the experiments presented in this thesis, we focus on data scenarios provided by low-resource language settings.
    German abstract: Owing to the steadily increasing translation volume over recent decades and the simultaneously growing pressure to deliver high quality within the shortest possible time, translation service providers depend on integrating modern technologies such as machine translation (MT) and automatic post-editing (APE) into their translation workflows. Despite considerable progress in these technologies, the roles of human and machine have hardly changed. However, MT/APE is no longer a mere peripheral phenomenon; in modern translation workflows it is increasingly deployed in a collaboration between human and machine. Professional translators increasingly become post-editors and correct MT/APE output instead of producing translations entirely from scratch as before. In this way, productivity in terms of translation speed can be increased. Over the last decade, a great deal has happened in research and development on improving MT, covering the complete translation workflow from training-data preparation through the core MT process to post-editing methods. From a data perspective, however, the complete translation workflow receives far less attention than the core MT process itself. This dissertation investigates paths towards an ideal, or at least improved, MT workflow. The experiments pay particular attention to the specific requirements of low-resource languages. It is examined how different MT paradigms can be used and optimally integrated. Furthermore, it is shown how different upstream and downstream technology components can be adapted to generate better MT output overall. Finally, it is shown how humans can be integrated into the MT workflow. The goal of this work is to integrate various technology components into the MT workflow so as to create an improved overall workflow; hybridization approaches are mainly used for this purpose. This work also investigates ways of involving humans effectively as post-editors.
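    To make the shape of such a hybrid, human-in-the-loop workflow concrete, the sketch below chains a machine translation step, an automatic post-editing step, and a human post-editing fallback triggered by a quality check. All component functions (translate_mt, auto_post_edit, needs_human_review) are hypothetical stand-ins, not the systems developed in the thesis.

```python
# Hedged sketch of a hybrid MT workflow: MT -> APE -> optional human post-editing.
# The component functions are hypothetical placeholders, not the thesis systems.
from typing import Callable, List

def translate_mt(source: str) -> str:
    """Placeholder for any MT engine (statistical, neural, or rule-based)."""
    return source  # identity stub, for illustration only

def auto_post_edit(source: str, mt_output: str) -> str:
    """Placeholder for an automatic post-editing (APE) component."""
    return mt_output  # identity stub

def needs_human_review(source: str, candidate: str) -> bool:
    """Placeholder quality-estimation check deciding whether a human post-edits."""
    return len(candidate.split()) < 3  # toy heuristic

def hybrid_workflow(segments: List[str],
                    human_post_edit: Callable[[str, str], str]) -> List[str]:
    """Run each segment through MT and APE, escalating to a human when needed."""
    results = []
    for src in segments:
        mt_out = translate_mt(src)
        ape_out = auto_post_edit(src, mt_out)
        if needs_human_review(src, ape_out):
            ape_out = human_post_edit(src, ape_out)  # human in the loop
        results.append(ape_out)
    return results

if __name__ == "__main__":
    demo = hybrid_workflow(["Ein kurzer Satz ."],
                           human_post_edit=lambda src, cand: cand)
    print(demo)
```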

    Studying the Effect and Treatment of Misspelled Queries in Cross-Language Information Retrieval

    The performance of Information Retrieval systems is limited by the linguistic variation present in natural language texts. Word-level Natural Language Processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, whether queries or documents, has also been studied, as has restricting the dependencies to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions in storage and management costs can be achieved, albeit at the expense of a slight reduction in performance. Ministerio de Economía y Competitividad, FFI2014-51978-C2-1-R; Rede Galega de Procesamento da Linguaxe e Recuperación de Información, CN2014/034; Ministerio de Economía y Competitividad, BES-2015-073768; Ministerio de Economía y Competitividad, FFI2014-51978-C2-2-
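    As a rough illustration of dependency pairs used as complex index terms, the sketch below extracts head-modifier pairs from Spanish noun phrases with spaCy. spaCy's statistical parser stands in here for the article's cascade of finite-state transducers, so this is only an approximation of the described approach, and it assumes the standard small Spanish model (es_core_news_sm) is installed.

```python
# Hedged sketch: extract head-modifier dependency pairs from Spanish noun
# phrases as complex index terms. spaCy replaces the article's finite-state
# shallow parser, so this only approximates the described approach.
import spacy

# Assumes the model has been installed: python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

def noun_phrase_index_terms(text):
    """Return (head_lemma, modifier_lemma) pairs restricted to noun phrases."""
    doc = nlp(text)
    terms = set()
    for chunk in doc.noun_chunks:  # restrict to noun phrases, as in the article
        head = chunk.root
        for tok in chunk:
            if tok is not head and not tok.is_stop and not tok.is_punct:
                terms.add((head.lemma_.lower(), tok.lemma_.lower()))
    return terms

if __name__ == "__main__":
    print(noun_phrase_index_terms(
        "La variación lingüística de los textos en lenguaje natural"))
```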

    Spoken Term Detection on Low Resource Languages

    Developing efficient speech processing systems for low-resource languages is an immensely challenging problem. One potentially effective approach to addressing the lack of resources for any particular language is to employ data from multiple languages when building speech processing sub-systems. This thesis investigates possible methodologies for Spoken Term Detection (STD) in low-resource Indian languages. The task of STD is to search for a query keyword, given in text form, in a considerably large speech database. This is usually done by matching templates of feature vectors representing the sequence of phonemes in the query word against the continuous speech in the database. The typical features used to represent speech signals in most speech processing systems are the mel-frequency cepstral coefficients (MFCCs). As speech is a very complex signal, carrying information about the textual message, the speaker's identity, and the speaker's emotional and health state, the MFCC features derived from it also contain information about all of these factors. For efficient template matching, we therefore need to neutralize the speaker variability in the features and stabilize them so that they represent the speech variability alone.
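    One common way to realise such template matching, sketched below under stated assumptions, is to extract MFCCs with librosa, apply per-utterance cepstral mean and variance normalisation to reduce speaker variability, and align a query template against a search utterance with subsequence dynamic time warping. CMVN and DTW are standard choices rather than the specific method of this thesis, the audio file names are placeholders, and the sketch simplifies the text query to a spoken example of the keyword (query-by-example) rather than a phoneme-sequence template.

```python
# Hedged sketch: MFCC-based spoken term detection by template matching.
# CMVN and subsequence DTW are standard choices, not necessarily the thesis
# method; the audio file names are placeholders.
import librosa
import numpy as np

def cmvn_mfcc(path, n_mfcc=13):
    """Load audio, extract MFCCs, and apply cepstral mean/variance normalisation."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)

def detection_score(query_path, utterance_path):
    """Lower DTW cost per query frame suggests the query occurs in the utterance."""
    q = cmvn_mfcc(query_path)
    u = cmvn_mfcc(utterance_path)
    # Subsequence DTW lets the query match anywhere inside the long utterance.
    D, wp = librosa.sequence.dtw(X=q, Y=u, subseq=True)
    return D[-1, :].min() / q.shape[1]  # best end-point cost, length-normalised

if __name__ == "__main__":
    score = detection_score("query_keyword.wav", "long_utterance.wav")
    print(f"DTW detection score: {score:.3f}")
```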