1,621 research outputs found
CLARIN. The infrastructure for language resources
CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future.
The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium
Automatic Extraction of Lithuanian Cybersecurity Terms Using Deep Learning Approaches
The paper presents the results of research on deep learning methods
aiming to determine the most effective one for automatic extraction of Lithuanian
terms from a specialized domain (cybersecurity) with very restricted resources. A
semi-supervised approach to deep learning was chosen for the research as
Lithuanian is a less resourced language and large amounts of data, necessary for
unsupervised methods, are not available in the selected domain. The findings of the
research show that Bi-LSTM network with Bidirectional Encoder Representations
from Transformers (BERT) can achieve close to state-of-the-art results
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium
Terminology Integration in Statistical Machine Translation
Elektroniskā versija nesatur pielikumusPromocijas darbs apraksta autora izpētītas metodes un izstrādātus rīkus divvalodu terminoloģijas integrācijai statistiskās mašīntulkošanas sistēmās. Autors darbā piedāvā inovatīvas metodes terminu integrācijai SMT sistēmu trenēšanas fāzē (ar statiskas integrācijas palīdzību) un tulkošanas fāzē (ar dinamiskas integrācijas palīdzību). Darbā uzmanība pievērsta ne tikai metodēm terminu integrācijai SMT, bet arī metodēm valodas resursu, kas nepieciešami dažādu uzdevumu veikšanai terminu integrācijas SMT darbplūsmās, ieguvei. Piedāvātās metodes ir novērtētas automātiskas un manuālas novērtēšanas eksperimentos. Iegūtie rezultāti parāda, ka statiskās un dinamiskās integrācijas metodes ļauj būtiski uzlabot tulkošanas kvalitāti. Darbā aprakstītie rezultāti ir aprobēti vairākos pētniecības projektos un ieviesti praktiskos risinājumos. Atslēgvārdi: statistiskā mašīntulkošana, terminoloģija, starpvalodu informācijas izvilkšanaThe doctoral thesis describes methods and tools researched and developed by the author for bilingual terminology integration into statistical machine translation systems. The author presents novel methods for terminology integration in SMT systems during training (through static integration) and during translation (through dynamic integration). The work focusses not only on the SMT integration techniques, but also on methods for acquisition of linguistic resources that are necessary for different tasks involved in workflows for terminology integration in SMT systems. The proposed methods have been evaluated using automatic and manual evaluation methods. The results show that both static and dynamic integration methods allow increasing translation quality. The thesis describes also areas where the methods have been approbated in practice. Keywords: statistical machine translation, terminology, cross-lingual information extractio
Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review
Text mining and related analytics emerge as a technological approach to support human activities in extracting useful knowledge through texts in several formats. From a managerial point of view, it can help organizations in planning and decision-making processes, providing information that was not previously evident through textual materials produced internally or even externally. In this context, within the public/governmental scope, public security agencies are great beneficiaries of the tools associated with text mining, in several aspects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article bases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 materials published between 2014 and the first half of 2021, among journals, conferences, and book chapters. There were several findings concerning the targets of the literature review, as presented in the results of this article
Cross-Lingual Link Discovery for Under-Resourced Languages
CC BY-NC 4.0In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges,
experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual
linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied
to language data can play in this context. We define under-resourced languages with a specific focus on languages actively
used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language
technology. We argue that languages for which considerable amounts of textual data and (at least) a bilingual word list are
available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream
applications for under-resourced languages via the localisation and adaptation of existing technologies and resources
HiPHET: A Hybrid Approach to Translate Code Mixed Language (Hinglish) to Pure Languages (Hindi and English)
Bilingual code mixed (hybrid) languages has become very popular in India as a result of the spread of Western technology in the form of the television, the Internet and social media. Due to this increase in usage of code-mixed languages in day-to-day communication, the need for maintaining the integrity of Indian languages has arisen. As a result of this need the tool named Hinglish to Pure Hindi and English Translator was developed. The tool translated in three ways, namely, Hinglish to Pure Hindi and Pure English, Pure Hindi to Pure English and vice versa. The tool has achieved accuracy of 91% in giving Hindi sentences as output and of 84% in giving English sentences as output, where the input sentences were in Hinglish. The tool has also been compared with another similar tool in the paper
- …