988 research outputs found

Online Dictionary - Tool for Preservation of Language Heritage

Author: Dutsova Ralitsa
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/2012

The paper presents a bilingual online dictionary as a useful tool for the preservation of natural languages. The author focuses on the approach taken to develop a compatible bilingual lexical database for the Bulgarian-Polish online dictionary. A formal model for the dictionary encoding is developed in accordance with the complex structures of the dictionary entries. These structures vary depending on the grammatical characteristics of the Bulgarian headwords. The web application for presenting the bilingual dictionary is also described

Bulgarian Digital Mathematics Library at IMI-BAS

Information Technologies for the Preservation of Language Heritage

Author: Dimitrova Ludmila
Dutsova Ralitsa
Panova Rumiana
Publication venue: Institute of Mathematics and Informatics Bulgarian Academy of Sciences
Publication date: 01/01/2011

In this paper we show how information technologies, as tools for creating digital bilingual dictionaries, can help preserve natural languages. Natural languages are an outstanding part of human cultural values, and for that reason they should be preserved as part of the world cultural heritage. We describe our work on the bilingual lexical database supporting the Bulgarian-Polish online dictionary. The main software tools for the web presentation of the dictionary are briefly described. We pay special attention to the presentation of verbs, the linguistic category in Bulgarian that is richest in specific characteristics

Bulgarian Digital Mathematics Library at IMI-BAS

MONDILEX – towards the research infrastructure for digital resources in Slavic lexicography

Publication venue: 'Institute of Slavic Studies Polish Academy of Sciences'

Crossref

Low hanging fruit and the Boasian trilogy in digital lexicography of morphologically rich languages: Lessons from a survey of Indigenous language resources in Canada

Author: Arppe Antti
Lachler Jordan
Pankratz Elizabeth
Publication venue: 'UiT The Arctic University of Norway'
Publication date: 30/08/2022

Online lexicographical resources for the morphologically rich Indigenous languages in Canada use a wide range of strategies for conveying their language’s morphological system, i.e. how words are inflected and derived, which this paper illustrates in a survey of seventeen bilingual online resources. The strategies these resources employ boil down to two basic approaches to the underlying structure of the resource: 1) a lexical database, or 2) a computational model. Most resources we surveyed are constructed around lexical databases. These assume the word(form) as the basic unit, an assumption that makes it difficult to incorporate the language’s sub-word, morphological structure in full detail. However, one resource uses a computational morphological model to bring the language’s morphology into the core of the lexicon – this proved to be a “low-hanging fruit” in the application of language technology that had been accomplished within a reasonable time-frame, as has been advocated by Trond Trosterud. We discuss the value created and questions raised by this approach and argue that it successfully overcomes the traditional Boasian three-way partition of dictionary, grammar, and text, creating integrated language resources that meet the modern needs of low-resource endangered languages and their communities

Septentrio Academic Publishing

Multilingual digital resources with Bulgarian language

Author: Dimitrova Ludmila
Publication venue: 'Institute of Slavic Studies Polish Academy of Sciences'
Publication date: 01/11/2015

The paper briefly presents Bulgarian language resources as part of the multilingual digital resources developed within several international projects, among them parallel annotated and aligned corpora, comparable corpora, morpho-syntactic specifications for corpus annotation and dictionary encoding, lexicons, lexical databases, and electronic dictionaries

Directory of Open Access Journals

Multilingual digital resources with Bulgarian language

Publication venue: 'Institute of Slavic Studies Polish Academy of Sciences'

Crossref

Simulating the Machine Translation of Low-Resource Languages by Designing a Translator Between English and an Artificially Constructed Language

Author: Snyder Michaela
Publication venue: TopSCHOLAR®
Publication date: 01/01/2023

Natural language processing (NLP), or the use of computers to analyze natural language, is a field that relies heavily on syntax. It would seem intuitive that computers would thrive in this area due to their strict syntax requirements, but the syntax of natural languages leaves them unable to properly parse and generate sentences that seem normal to the average speaker. A subfield of NLP, machine translation, works mainly to computerize translation between different languages. Unfortunately, such translation is not without its weaknesses; language documentation is not created equal, and many low-resource languages (languages with relatively few kinds of documentation, most often written) are left with no way to effectively benefit from machine translation. As a step toward better translation processors for low-resource languages, this thesis examined the possibility of machine translation between high-resource languages and low-resource languages through an analysis of different machine learning techniques, and ultimately constructed a simple translator between English and an artificially constructed language using a context-free grammar (CFG)

TopSCHOLAR

Vector Search with OpenAI Embeddings: Lucene Is All You Need

Author: Lin Jimmy
Pradeep Ronak
Teofili Tommaso
Xian Jasper
Publication date: 28/08/2023

We provide a reproducible, end-to-end demonstration of vector search with OpenAI embeddings using Lucene on the popular MS MARCO passage ranking test collection. The main goal of our work is to challenge the prevailing narrative that a dedicated vector store is necessary to take advantage of recent advances in deep neural networks as applied to search. Quite the contrary, we show that hierarchical navigable small-world network (HNSW) indexes in Lucene are adequate to provide vector search capabilities in a standard bi-encoder architecture. This suggests that, from a simple cost-benefit analysis, there does not appear to be a compelling reason to introduce a dedicated vector store into a modern "AI stack" for search, since such applications have already received substantial investments in existing, widely deployed infrastructure

arXiv.org e-Print Archive

Bulgarian-Polish Language Resources (Current State and Future Development)

Author: Dimitrova Ludmila
Koseska-Toszewa Violetta
Publication venue: 'Institute of Slavic Studies Polish Academy of Sciences'
Publication date: 01/06/2015

The paper briefly reviews the first Bulgarian-Polish digital bilingual resources, corpora and dictionaries, which are currently being developed under bilateral collaboration between IMI-BAS and ISS-PAS in the joint research project “Semantics and contrastive linguistics with a focus on a bilingual electronic dictionary”, coordinated by L. Dimitrova (IMI-BAS) and V. Koseska (ISS-PAS)

Directory of Open Access Journals