IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
India has a rich linguistic landscape with languages from 4 major language
families spoken by over a billion people. The 22 languages listed in
the Constitution of India (referred to as scheduled languages) are the focus of
this work. Given the linguistic diversity, high-quality and accessible Machine
Translation (MT) systems are essential in a country like India. Prior to this
work, there was (i) no parallel training data spanning all the 22 languages,
(ii) no robust benchmarks covering all these languages and containing content
relevant to India, and (iii) no existing translation models which support all
the 22 scheduled languages of India. In this work, we aim to address this gap
by focusing on the missing pieces required for enabling wide, easy, and open
access to good machine translation systems for all 22 scheduled Indian
languages. We identify four key areas of improvement: curating and creating
larger training datasets, creating diverse and high-quality benchmarks,
training multilingual models, and releasing models with open access. Our first
contribution is the release of the Bharat Parallel Corpus Collection (BPCC),
the largest publicly available parallel corpora for Indic languages. BPCC
contains a total of 230M bitext pairs, of which a total of 126M were newly
added, including 644K manually translated sentence pairs created as part of
this work. Our second contribution is the release of the first n-way parallel
benchmark covering all 22 Indian languages, featuring diverse domains,
Indian-origin content, and source-original test sets. Next, we present
IndicTrans2, the first model to support all 22 languages, surpassing existing
models on multiple existing and new benchmarks created as a part of this work.
Lastly, to promote accessibility and collaboration, we release our models and
associated data with permissive licenses at
https://github.com/ai4bharat/IndicTrans2
Publications from NIAS: January 1988-June 2013 (NIAS Report No. R23-2014)
This report has a bibliographic listing of all the publications from NIAS since inception till June 201
CMFRI Annual Report 2022 (Central Marine Fisheries Research Institute Annual Report 2022)
In 2022, the total marine fish landings along the coast of India’s mainland were
approximately 3.49 million tonnes, a 14.53% increase compared to 2021. The year
saw a significant rise of 28.02% in fish landings compared to the
pandemic-affected year of 2020. Despite these improvements, the 2022 estimate
was 2.0% lower than that of the pre-COVID year of 2019. Among the coastal
states, Tamil Nadu ranked first with 7.22 lakh tonnes of fish landings, followed
by Karnataka with 6.95 lakh tonnes and Kerala with 6.87 lakh tonnes. Gujarat,
which had previously held the top ranking, dropped to fourth place with 5.03
lakh tonnes. These four states (Tamil Nadu, Karnataka, Kerala, and Gujarat)
accounted for 20.69%, 19.90%, 19.68%, and 14.40% of the national total,
respectively. Except for Odisha and Gujarat, all states witnessed an increase in
fish landings compared to 2021.
A Hybrid Machine Translation Framework for an Improved Translation Workflow
Over the past few decades, due to a continuing surge in the amount of content being translated and ever-increasing pressure to deliver high-quality, high-throughput translation, translation industries have focused their interest on adopting advanced technologies such as machine translation (MT) and automatic post-editing (APE) in their translation workflows. Despite the progress of the technology, the roles of humans and machines essentially remain intact as MT/APE move from the peripheries of the translation field towards collaborative human-machine MT/APE in modern translation workflows. Professional translators increasingly become post-editors, correcting raw MT/APE output instead of translating from scratch, which in turn increases productivity in terms of translation speed. The last decade has seen substantial growth in research and development activities on improving MT, usually concentrating on selected aspects of workflows, from training data pre-processing techniques to core MT processes to post-editing methods. To date, however, complete MT workflows have been investigated less than the core MT processes. In the research presented in this thesis, we investigate avenues towards achieving improved MT workflows. We study how different MT paradigms can be utilized and integrated to best effect. We also investigate how different upstream and downstream component technologies can be hybridized to achieve overall improved MT. Finally, we include an investigation into human-machine collaborative MT by taking humans in the loop. In many (but not all) of the experiments presented in this thesis, we focus on data scenarios provided by low-resource language settings.
Owing to the steadily rising volume of translation over recent decades and the
simultaneously growing pressure to deliver high quality in the shortest possible
time, translation service providers depend on integrating modern technologies
such as machine translation (MT) and automatic post-editing (APE) into the
translation workflow. Despite considerable progress in these technologies, the
roles of humans and machines have hardly changed. MT/APE is, however, no longer
merely a marginal phenomenon; in the modern translation workflow it is
increasingly used in collaboration between humans and machines. Specialist
translators are more and more becoming post-editors who correct MT/APE output
rather than producing translations from scratch as before. This increases
productivity in terms of translation speed. Over the past decade, much has
happened in research and development on improving MT: incorporation of the
complete translation workflow, from the preparation of training data through the
actual MT process to post-editing methods. From a data perspective, however, the
complete translation workflow is considered far less than the actual MT process.
This dissertation investigates paths towards an ideal, or at least improved, MT
workflow. The experiments pay particular attention to the specific needs of
low-resource languages. The dissertation investigates how different MT paradigms
can be used and optimally integrated. It also shows how different upstream and
downstream technology components can be adapted to generate better MT output
overall. Finally, it shows how humans can be integrated into the MT workflow.
The aim of this work is to integrate various technology components into the MT
workflow so as to create an improved overall workflow. Hybridization approaches
are mainly used for this purpose. This work also investigates ways to involve
humans effectively as post-editors.
Bridges and Boundaries
Foreign relations law and public international law are two closely related academic fields that tend to speak past each other. As this innovative volume shows, the two are closely interrelated and depend on each other for their mutual construction and identity. A better understanding of this relationship is of vital importance for upholding important constitutional values like democracy, the rule of law and the protection of human rights, while enabling states to engage in meaningful forms of international cooperation. The book takes a close look at the encounters between the two fields and offers perspectives for a constructive engagement between the two. Collectively, the contributions argue that the delimitation between the two fields occurs in a hybrid zone of interaction which requires both bridges and boundaries: bridges for the construction of the relationship between the two fields, and boundaries for preserving key normative expectations of both domestic and international law
Encounters between Foreign Relations Law and International Law
This book offers fresh perspectives on the encounters between foreign relations law and public international law. These can occur in a hybrid zone of interaction which requires both bridges and boundaries. A timely book with crucial relevance for scholars, students and practitioners in both foreign relations law and international law
Fine Art Pattern Extraction and Recognition
This is a reprint of articles from the Special Issue published online in the open access journal Journal of Imaging (ISSN 2313-433X) (available at: https://www.mdpi.com/journal/jimaging/special_issues/faper2020)