42 research outputs found

    Constraint Based Hybrid Approach to Parsing Indian Languages

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Assessing Translation capabilities of Large Language Models involving English and Indian Languages

    Generative Large Language Models (LLMs) have made remarkable advances in various NLP tasks. In this work, our aim is to explore the multilingual capabilities of large language models, using machine translation between English and 22 Indian languages as the task. We first investigate the translation capabilities of raw large language models, then explore the in-context learning capabilities of the same raw models. We fine-tune these large language models using parameter-efficient fine-tuning methods such as LoRA, and additionally with full fine-tuning. Through our study, we have identified the best-performing large language model for the translation task, which is based on LLaMA. Our results demonstrate significant progress, with average BLEU scores of 13.42, 15.93, 12.13, 12.30, and 12.07, as well as chrF scores of 43.98, 46.99, 42.55, 42.42, and 45.39, respectively, using 2-stage fine-tuned LLaMA-13b for English to Indian languages on the IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019 test sets. Similarly, for Indian languages to English, we achieved average BLEU scores of 14.03, 16.65, 16.17, 15.35, and 12.55, along with chrF scores of 36.71, 40.44, 40.26, 39.51, and 36.20, respectively, using fine-tuned LLaMA-13b on the IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019 test sets. Overall, our findings highlight the potential and strength of large language models for machine translation, including for languages that are currently underrepresented in LLMs.

    A Preliminary Work on Causative Verbs in Hindi

    This paper presents preliminary work on Hindi causative verbs: their classification, a linguistic model for that classification, and their verb frames. The main objective of this work is to arrive at a classification of the Hindi causative verbs. In the classification we show how different types of Hindi verbs have different types of causative forms. The result is a linguistic resource for Hindi causative verbs that can be used in various NLP applications. This resource enriches the already available linguistic resource on Hindi verb frames (Begum et al., 2008b) and will help in gaining proper insight into Hindi verbs. In this paper, we present the morphology, semantics, and syntax of the causative verbs. The morphology is captured by the word generation process, the semantics by the linguistic model followed for classifying the verbs, and the syntax by the verb frames, using relations given by Panini.

    CREATING LANGUAGE RESOURCES FOR NLP IN INDIAN LANGUAGES

    1. BACKGROUND
    Non-availability of lexical resources in electronic form is a major bottleneck for anyone working in the field of NLP on Indian languages. Some measures were taken to alleviate this bottleneck in a quick and efficient way. It was felt that if the development of these resources is linked with an example application, the application can act as a test bed for the resources under development and provide constant feedback. Moreover, immediate results in the form of a working system also motivate the developers for such time-consuming jobs. It was decided to take up the building of a machine translation system as an example application, which would also serve as a vehicle for building lexical resources.
    2. DEVELOPING LEXICAL RESOURCES
    The following lexical resources were built or are being built as part of a planned effort:
    a) Electronic dictionary (Shabdanjali English-Hindi dictionary)
    b) Transfer lexicon and grammar (TransLexGram)
    c) Part-of-Speech tagged corpora
    These are described below.
    2.1 SHABDANJALI ELECTRONIC DICTIONARY
    As a first step in this direction, a collaborative effort was undertaken to develop a bilingual electronic dictionary in the free software model. The interesting aspect of this effort was that the work was carried out by school children, teachers and others. People in about 8 cities were involved in the exercise. The school teachers participated, to some extent, in correcting and refining the work. The development of the dictionary resource took advantage of the bilingual ability of the contributors. The contributors provided the basic data:
    a) A number of Hindi equivalents required to cover various senses of the English lexical item in various contexts.
    b) An English example sentence for every Hindi equivalent.
    (The developed resource is now available as an "open resource" under General Public License