29 research outputs found

    Pomen in obseg oznak za jezikovne različice v okviru informacijske tehnologije

    This article first discusses the various criteria for and scopes of the language codes registered as ISO 639-1, 639-2, 639-3, and 639-6. It then introduces RFC 5646 and RFC 4647 in order to explain how well-formed language tags can be created. Finally, the case of applying for a language code for Resian is summarized.

    When linguistics meets web technologies. Recent advances in modelling linguistic linked data

    This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital humanities with LLD. Next, we give an overview of some of the best-known vocabularies and models in LLD. After this we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work which has been carried out in corpora and annotation and LLD, including a discussion of the LLD metadata vocabularies META-SHARE and lime and of language identifiers. In the following part of the paper we look at work which has been realised in a number of recent projects and which has a significant impact on LLD vocabularies and models.

    Syväoppiminen puhutun kielen tunnistamisessa

    This thesis applies deep learning based classification techniques to identify natural languages from speech. The primary motivation behind this thesis is to implement accurate techniques for segmenting multimedia materials by the languages spoken in them. Several existing state-of-the-art, deep learning based approaches are discussed and a subset of the discussed approaches is selected for quantitative experimentation. The selected model architectures are trained on several well-known spoken language identification datasets containing several different languages. Segmentation granularity varies between models, some supporting input audio lengths of 0.2 seconds, while others require 10 seconds of input to make a language decision. Results from the thesis experiments show that an unsupervised representation of acoustic units, produced by a deep sequence-to-sequence autoencoder, cannot reach the language identification performance of a supervised representation, produced by a multilingual phoneme recognizer. Contrary to most existing results, in this thesis, acoustic-phonetic language classifiers trained on labeled spectral representations outperform phonotactic classifiers trained on bottleneck features of a multilingual phoneme recognizer. More work is required, using transcribed datasets and automatic speech recognition techniques, to investigate why phoneme embeddings did not outperform simple, labeled spectral features. While an accurate online language segmentation tool for multimedia materials could not be constructed, the work completed in this thesis provides several insights for building feasible, modern spoken language identification systems. As a side-product of the experiments performed during this thesis, a free open source spoken language identification software library called "lidbox" was developed, allowing future experiments to begin where the experiments of this thesis end.

    The construction of a linguistic linked data framework for bilingual lexicographic resources

    Little-known lexicographic resources can be of tremendous value to users once digitised. By extending the digitisation efforts for a lexicographic resource, converting the human-readable digital object to a state that is also machine-readable, structured data can be created that is semantically interoperable, thereby enabling the lexicographic resource to access, and be accessed by, other semantically interoperable resources. The purpose of this study is to formulate a process for converting a lexicographic resource in print form to a machine-readable bilingual lexicographic resource applying linguistic linked data principles, using the English-Xhosa Dictionary for Nurses as a case study. This is accomplished by creating a linked data framework, in which data are expressed in the form of RDF triples and URIs, in a manner which allows for extensibility to a multilingual resource. Click languages with characters not typically represented by the Roman alphabet are also considered. The purpose of this linked data framework is to define each lexical entry as “historically dynamic”, instead of “ontologically static” (Rafferty, 2016:5). For a framework whose instances are in constant evolution, focus is thus given to the management of provenance and the generation of linked data from it. The output is an implementation framework which provides methodological guidelines for similar language resources in the interdisciplinary field of Library and Information Science.

    Deep encoding of etymological information in TEI

    This paper aims to provide a comprehensive modeling and representation of etymological data in digital dictionaries. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries, and also born-digital lexical databases that are constructed manually or semi-automatically. We want to propose a systematic and coherent set of modeling principles for a variety of etymological phenomena that may contribute to the creation of a continuum between existing and future lexical constructs, where anyone interested in tracing the history of words and their meanings will be able to seamlessly query lexical resources. Instead of designing an ad hoc model and representation language for digital etymological data, we will focus on identifying all the possibilities offered by the TEI guidelines for the representation of lexical information.

    UNICORE/X MANUAL

    The UNICORE/X server is the central component of a UNICORE site. It hosts services such as job submission, job management, and storage access, and provides the bridge to the functionality of the target resources, e.g. batch systems or file systems.

    25. mednarodni kongres Društva mladih jezikoslovcev v Valladolidu

    This article reports on the twenty-fifth international congress of the Young Linguists’ Society held in Valladolid, Spain. It took place from 10 to 12 March 2010 and the participants included approximately 150 young scholars from various countries and continents. In addition to young linguists who presented their research in all areas of linguistics, there were invited talks by three established linguists: Teun A. van Dijk, Manuel García Teijeiro, and José Antonio Pascual.

    Adoption of ISO 20022 for payments

    The payments landscape has changed significantly in the past decade, and more changes are on the horizon for the coming years. The global adoption of ISO 20022 is a far-reaching and impactful initiative. Both financial institutions and their corporate customers are prompted to adapt proactively to the ever-changing and ever-evolving payments landscape and reap the benefits of an effective, timely, and sustainable adoption. Despite the impact of this initiative, apart from technical instructions, there is a lack of reference models or bodies of research that can assist in the process of effectively adopting ISO 20022 for payments. To solve this problem, an adoption model is proposed, composed of a set of guidelines, best practices and templates to collect and map requirements, and technical objects that enable flexibility and the transformation of data into its final output. This adoption model is applied in a real-life context, in an organization in the process of adopting ISO 20022. Developing such a model gives adopters not only the tools to adhere successfully to ISO 20022 but also the means to set up an efficient and sustainable payment factory.