1,541 research outputs found

Sanskrit Lexical Sources: Digital Synthesis and Revision

Author: Peter M. Scharf
Thomas Malten
Publication venue: 'Modern Language Association'
Publication date: 01/01/2014

The proposed project aims to synthesize, extend, revise, and improve the principal lexical reference works of Sanskrit, one of the world's richest culture-bearing languages, and to provide wide public access to them in the digital Sanskrit library

Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

Author: Sandhan Jivnesh
Publication date: 17/08/2023

The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks that are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing task for any downstream application. However, it is challenging due to the sandhi phenomenon, which modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications like question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. While addressing these challenges, this thesis makes various contributions: (1) The thesis proposes linguistically-informed neural architectures for these tasks. (2) We showcase the interpretability and multilingual extension of the proposed systems. (3) Our proposed systems report state-of-the-art performance. (4) Finally, we present a neural toolkit named SanskritShala, a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit. Comment: Ph.D. dissertatio
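
To illustrate why sandhi makes word segmentation harder than splitting on spaces, the following is a minimal rule-based sketch. It is not the neural approach proposed in the thesis. The rule table, the vocabulary, and the function name `split_with_sandhi` are hypothetical and cover only two vowel-sandhi rules (a + i → e, a + u → o) on unaccented romanized text.

```python
# Toy vowel-sandhi table: fused vowel -> possible (left ending, right beginning).
# Hypothetical and deliberately incomplete; real sandhi has many more rules.
SANDHI_RULES = {
    "e": [("a", "i")],  # a + i -> e, e.g. na + iti -> neti
    "o": [("a", "u")],  # a + u -> o
}

# Hypothetical mini-vocabulary used to validate candidate splits.
VOCAB = {"na", "iti"}


def split_with_sandhi(text, vocab=VOCAB):
    """Return all two-word splits of `text` that undo one sandhi rule."""
    splits = []
    for i, ch in enumerate(text):
        for left_end, right_start in SANDHI_RULES.get(ch, []):
            # Restore the original vowels on each side of the boundary.
            left = text[:i] + left_end
            right = right_start + text[i + 1:]
            if left in vocab and right in vocab:
                splits.append((left, right))
    return splits


print(split_with_sandhi("neti"))
```

Neither "na" nor "iti" appears as a substring boundary in "neti", which is why a segmenter has to reason about the fused character rather than search for word edges.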

arXiv.org e-Print Archive

Proposing a Multi-lingual Translation Scheme Utilizing the Extensible Markup Language XML

Author: R. Neupane Bhooshan
Yajima Shuzo
矢島脩三
Publication venue: 関西大学
Publication date: 21/12/2000

The paper proposes a new multi-lingual translation scheme that uses the newly evolving Internet tag language, the Extensible Markup Language (XML). The data description property of XML can be used to create an effective system for translating documents. First, an XML-tagged document in the source language is prepared manually. The tags mark not only the words but also the grammatical structure, for example the phrase structure grammar, so that the analytical process of the translation can be reduced to a level suitable for many Internet applications. Next, XML document type definitions (DTDs) for the grammatical structures of different languages are created. In the translation process, the source sentences are broken down into pieces and then assigned to the DTDs to which they belong. The sentence structure of each language serves as a main structure tree. The broken-down elements are mapped to the elements of the DTD, which in turn maps them to the corresponding elements of the target language structure tree, and the appropriate transformations are made. Among the possible transformation patterns, the most appropriate one is selected as the output.
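
The mapping step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' system: the tag names (`subject`, `verb`, `object`), the `TARGET_ORDER` table (a plain list standing in for a target-language DTD), the toy `LEXICON`, and the functions `translate` and `render` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical manually tagged source sentence (English: subject-verb-object).
SOURCE_XML = """
<sentence lang="en">
  <subject>I</subject>
  <verb>read</verb>
  <object>books</object>
</sentence>
""".strip()

# Stand-in for a target-language DTD: the order of constituents only.
TARGET_ORDER = {"ja": ["subject", "object", "verb"]}

# Hypothetical toy lexicon (romanized Japanese).
LEXICON = {"ja": {"I": "watashi-wa", "read": "yomimasu", "books": "hon-o"}}


def translate(source_xml, target_lang):
    """Map tagged source elements onto the target structure tree."""
    root = ET.fromstring(source_xml)
    # Index source constituents by their grammatical role (the tag name).
    by_role = {child.tag: (child.text or "").strip() for child in root}
    target = ET.Element("sentence", {"lang": target_lang})
    for role in TARGET_ORDER[target_lang]:
        if role in by_role:
            elem = ET.SubElement(target, role)
            word = by_role[role]
            # Fall back to the source word if the lexicon has no entry.
            elem.text = LEXICON[target_lang].get(word, word)
    return target


def render(sentence):
    """Join the words of a structure tree into a plain string."""
    return " ".join(child.text for child in sentence)


result = translate(SOURCE_XML, "ja")
print(render(result))
```

Because the grammatical roles are already present as tags, the reordering needs no parsing step, which is the reduction in analysis the abstract refers to.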

Enabling High-Level Application Development for the Internet of Things

Author: Cassou Damien
Patel Pankesh
Publication date: 01/01/2015

Application development in the Internet of Things (IoT) is challenging because it involves dealing with a wide range of related issues, such as a lack of separation of concerns and a lack of high-level abstractions to address both the large scale and the heterogeneity. Moreover, stakeholders involved in application development have to address issues that can be attributed to different life-cycle phases when developing applications. First, the application logic has to be analyzed and then separated into a set of distributed tasks for an underlying network. Then, the tasks have to be implemented for the specific hardware. Apart from handling these issues, stakeholders have to deal with other aspects of the life cycle, such as changes in application requirements and deployed devices. Several approaches have been proposed in the closely related fields of wireless sensor networks, ubiquitous and pervasive computing, and software engineering in general to address the above challenges. However, existing approaches cover only limited subsets of these challenges when applied to the IoT. This paper proposes an integrated approach for addressing them. The main contributions of this paper are: (1) a development methodology that separates IoT application development into different concerns and provides a conceptual framework to develop an application, (2) a development framework that implements the development methodology to support the actions of stakeholders. The development framework provides a set of modeling languages to specify each development concern and abstracts the complexity related to scale and heterogeneity. It integrates code generation, task-mapping, and linking techniques to provide automation.
Code generation supports the application development phase by producing a programming framework that allows stakeholders to focus on the application logic, while our mapping and linking techniques together support the deployment phase by producing device-specific code that results in a distributed system collaboratively hosted by individual devices. Our evaluation, based on two realistic scenarios, shows that our approach improves the productivity of the stakeholders involved in application development.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Development of Multilingual Resource Management Mechanisms for Libraries

Author: Mandal Sukumar
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 03/04/2018

Multilingual support is an important concept in any library. This study is based on global recommendations and the local requirements of individual libraries. It selects the multilingual components needed to set up a multilingual cluster in different libraries and develops a multilingual environment in which users and library professionals can access and retrieve library resources. The methodology for integrating Google Indic Transliteration follows these steps: (i) selection of transliteration tools, (ii) comparison of the tools, (iii) integration methods in Koha, (iv) development of Google Indic Transliteration in Koha for users, (v) testing, and (vi) results. Developing a multilingual framework is also an important task in an integrated library system, and it follows these steps: (i) Bengali language installation in Koha, (ii) setting the multilingual system preferences in Koha, (iii) translating the modules, and (iv) the Bengali interface in Koha. The study also shows the Bengali data entry process in Koha, through Ibus Avro Phonetics and through a virtual keyboard. Multilingual digital resource management is developed using DSpace and Greenstone. Multilingual management is covered in different areas, such as federated searching (the VuFind multilingual discovery tool, multilingual retrieval with an OAI-PMH tool, and multilingual data import through a Z39.50 server). Multilingual bibliographic data is edited through MarcEditor for better management of the integrated library management system.
The study also creates and edits content using a content management system tool for efficient and effective retrieval of multilingual digital content resources among users.