Search CORE

163 research outputs found

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

Author: AI4Bharat
AK Raghavan
Chitale Pranjal A.
Dabre Raj
Doddapaneni Sumanth
Gala Jay
Gumma Varun
Khapra Mitesh M.
Kumar Aswanth
Kumar Pratyush
Kunchukuttan Anoop
Nawale Janki
Puduppully Ratish
Raghavan Vivek
Sujatha Anupama
Publication date: 17/06/2023

India has a rich linguistic landscape with languages from 4 major language families spoken by over a billion people. The 22 languages listed in the Constitution of India (referred to as scheduled languages) are the focus of this work. Given the linguistic diversity, high-quality and accessible Machine Translation (MT) systems are essential in a country like India. Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmark covering all these languages and containing content relevant to India, and (iii) no existing translation model supporting all 22 scheduled languages of India. In this work, we aim to address this gap by focusing on the missing pieces required for enabling wide, easy, and open access to good machine translation systems for all 22 scheduled Indian languages. We identify four key areas of improvement: curating and creating larger training datasets, creating diverse and high-quality benchmarks, training multilingual models, and releasing models with open access. Our first contribution is the release of the Bharat Parallel Corpus Collection (BPCC), the largest publicly available parallel corpora for Indic languages. BPCC contains a total of 230M bitext pairs, of which 126M were newly added, including 644K manually translated sentence pairs created as part of this work. Our second contribution is the release of the first n-way parallel benchmark covering all 22 Indian languages, featuring diverse domains, Indian-origin content, and source-original test sets. Next, we present IndicTrans2, the first model to support all 22 languages, surpassing existing models on multiple existing benchmarks and on new benchmarks created as part of this work. Lastly, to promote accessibility and collaboration, we release our models and associated data with permissive licenses at https://github.com/ai4bharat/IndicTrans2

arXiv.org e-Print Archive

Publications from NIAS: January 1988-June 2013 (NIAS Report No. R23-2014)

Publication venue: NIAS
Publication date: 01/01/2014

This report is a bibliographic listing of all NIAS publications from inception until June 2013.

NIAS Repository

CMFRI Annual Report 2022 केंद्रीय समुद्री मात्स्यिकी अनुसंधान संस्थान वार्षिक प्रतिवेदन 2022

Author: CMFRI Kochi
Publication venue: ICAR-Central Marine Fisheries Research Institute
Publication date: 01/01/2023

In 2022, the total marine fish landings along the coast of India’s mainland were approximately 3.49 million tonnes, a 14.53% increase compared to 2021. Landings rose by 28.02% compared to the pandemic-affected year of 2020. Despite these improvements, the 2022 estimate was 2.0% lower than in the pre-COVID year of 2019. Among the coastal states, Tamil Nadu ranked first with 7.22 lakh tonnes of fish landings, followed by Karnataka with 6.95 lakh tonnes and Kerala with 6.87 lakh tonnes. Gujarat, which had previously held the top ranking, dropped to fourth place with 5.03 lakh tonnes. These four states, namely Tamil Nadu, Karnataka, Kerala, and Gujarat, accounted for 20.69%, 19.90%, 19.68%, and 14.40% of the national total, respectively. Except for Odisha and Gujarat, all states recorded an increase in fish landings compared to 2021.

CMFRI Digital Repository

IEOM Society International

Author: Wijoyo Hadion
Publication venue: 'Science Repository OU'
Publication date: 31/03/2023

IEOM Society International

E-Journal STMIK Dharmapala Riau

A Hybrid Machine Translation Framework for an Improved Translation Workflow

Author: Pal Santanu
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2018

Over the past few decades, due to a continuing surge in the amount of content being translated and ever-increasing pressure to deliver high-quality, high-throughput translation, translation industries have focused on adopting advanced technologies such as machine translation (MT) and automatic post-editing (APE) in their translation workflows. Despite the progress of the technology, the roles of humans and machines essentially remain intact as MT/APE moves from the periphery of the translation field towards collaborative human-machine MT/APE in modern translation workflows. Professional translators increasingly become post-editors who correct raw MT/APE output instead of translating from scratch, which in turn increases productivity in terms of translation speed. The last decade has seen substantial growth in research and development on improving MT, usually concentrating on selected aspects of workflows, from training data pre-processing techniques to core MT processes to post-editing methods. To date, however, complete MT workflows have been investigated less than the core MT processes. In the research presented in this thesis, we investigate avenues towards achieving improved MT workflows. We study how different MT paradigms can be utilized and integrated to best effect. We also investigate how different upstream and downstream component technologies can be hybridized to achieve overall improved MT. Finally, we include an investigation into human-machine collaborative MT by bringing humans into the loop.
In many (but not all) of the experiments presented in this thesis, we focus on data scenarios provided by low-resource language settings.
The aim of this work is to integrate different technology components into the MT workflow in order to create an improved overall workflow, relying mainly on hybridization approaches. The thesis also examines ways to involve humans effectively as post-editors.

Universaar

Acronym

NIAS Annual Report 2018-2019

Author: NIAS *
Publication venue: NIAS Press
Publication date: 01/01/2019

NIAS Repository

Bridges and Boundaries

Publication date: 01/01/2021

Foreign relations law and public international law are two closely related academic fields that tend to speak past each other. As this innovative volume shows, the two are interrelated and depend on each other for their mutual construction and identity. A better understanding of this relationship is of vital importance for upholding important constitutional values like democracy, the rule of law and the protection of human rights, while enabling states to engage in meaningful forms of international cooperation. The book takes a close look at the encounters between the two fields and offers perspectives for a constructive engagement between them. Collectively, the contributions argue that the delimitation between the two fields occurs in a hybrid zone of interaction which requires both bridges and boundaries: bridges for the construction of the relationship between the two fields, and boundaries for preserving key normative expectations of both domestic and international law.

Institutional Repository of the Freie Universität Berlin

Encounters between Foreign Relations Law and International Law

Publication venue: 'Cambridge University Press (CUP)'
Publication date: 09/08/2022

This book offers fresh perspectives on the encounters between foreign relations law and public international law. These can occur in a hybrid zone of interaction which requires both bridges and boundaries. It is a timely book with crucial relevance for scholars, students and practitioners in both foreign relations law and international law.

Directory of Open Access Books (DOAB)

Fine Art Pattern Extraction and Recognition

Author: Bellavia Fabio
Castellano Giovanna
Vessio Gennaro
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

This is a reprint of articles from the Special Issue published online in the open access journal Journal of Imaging (ISSN 2313-433X) (available at: https://www.mdpi.com/journal/jimaging/special_issues/faper2020).

Archivio istituzionale della ricerca - Università di Bari

The Power of the Local Site: A Comparative Approach to Colonial Black Christs and Medieval Black Madonnas

Author: Preisinger Raphaèle
Publication venue: Comite International d'Histoire de l'Art
Publication date: 01/01/2017

ZORA