Multi-Module G2P Converter for Persian Focusing on Relations between Words
In this paper, we investigate the application of end-to-end and multi-module
frameworks for G2P conversion for the Persian language. The results demonstrate
that our proposed multi-module G2P system outperforms our end-to-end systems in
terms of accuracy and speed. The system consists of a pronunciation dictionary
as our look-up table, along with separate models to handle homographs, OOVs and
ezafe in Persian created using GRU and Transformer architectures. The system is
sequence-level rather than word-level, which allows it to effectively capture
the unwritten relations between words (cross-word information) necessary for
homograph disambiguation and ezafe recognition without the need for any
pre-processing. After evaluation, our system achieved a 94.48% word-level
accuracy, outperforming the previous G2P systems for Persian.
Comment: 10 pages, 4 figures
Multiple alignments of inflectional paradigms
Most models of inflectional morphology rely at their core on the identification of recurrent and diverging material across inflected forms. Across theoretical frameworks, this can be expressed in terms of morpheme segmentation, rules, processes, patterns or analogies.
Finding these recurrences in large structured lexicons is an important step in empirical computational morphology, where analyses are induced bottom-up from inflected forms. This can be done by aligning all the forms in each paradigm, a task of Multiple Sequence Alignments which is well known in other fields such as evolutionary biology and historical linguistics.
In this paper, we present the specific problems which arise when aligning inflected forms, provide a simple alignment format, define evaluation measures and compare two implemented methods on 13 inflectional lexicons. Our intent is to provide the conditions for the interoperability of future systems, and for incremental improvements in this fundamental step for quantitative morphology.
Identification of fluency and word-finding difficulty in samples of children with diverse language backgrounds
BACKGROUND: Stuttering and word-finding difficulty (WFD) are two types of communication difficulty that occur frequently in children who learn English as an additional language (EAL), as well as those who only speak English. The two disorders require different, specific forms of intervention. Prior research has described the symptoms of each type of difficulty. This paper describes the development of a non-word repetition test (UNWR), applicable across languages, that was validated by comparing groups of children identified by their speech and language symptoms as having either stuttering or WFD.
AIMS:
To evaluate whether non-word repetition scores using the UNWR test distinguished between children who stutter and those who have a WFD, irrespective of the children's first language.
METHODS AND PROCEDURES:
UNWR was administered to ninety-six 4–5-year-old children attending UK schools (20.83% of whom had EAL). The children's speech samples in English were assessed for symptoms of stuttering and WFD. UNWR scores were calculated.
OUTCOMES AND RESULTS:
Regression models were fitted to establish whether language group (English only/EAL) and symptoms of (1) stuttering and (2) WFD predicted UNWR scores. Stuttering symptoms predicted UNWR, whereas WFD did not. These two findings suggest that UNWR scores dissociate stuttering from WFD. There were no differences between monolingual English-speakers and children who had EAL.
CONCLUSIONS AND IMPLICATIONS:
UNWR scores distinguish between stuttering and WFD irrespective of language(s) spoken, allowing future evaluation of a range of languages in clinics or schools.
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates
from the need to coordinate and categorize communications on CIP at the European level. These communications can take the form of
official documents, reports on incidents, informal communications and plain e-mail. We explore the application of
traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an
analogous effort from the 1960s in the field of nuclear science, known as the Euratom Thesaurus.
JRC.G.6-Security technology assessment
IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech
IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, project presentations, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, Slovenia). Furthermore, it has been confirmed that extended versions of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session.
Red Española de Tecnologías del Habla. Universidad de Valladolid
Approaches to English for specific and academic purposes. Perspectives on teaching and assessing in tertiary and adult education
This volume presents a selection of eight papers presented at three symposia on English for Specific Purposes (ESP) and English for Academic Purposes (EAP) that were held at the Free University of Bozen-Bolzano, Italy. The experiences detailed in the chapters offer a representative sample of the diversity of approaches to teaching and assessing ESP and EAP that were shared on those occasions. The contributions vary markedly by teaching and research context: whereas some report the results of meticulously planned research projects, others describe in detail cases embedded in specific contexts in Italy and in the US; others analyse the specialised language of particular discourses or domains, or reflect upon teaching methods and materials. (DIPF/Orig.)
Approaches to English for Specific and Academic Purposes
This volume presents a selection of eight papers presented at three symposia on English for Specific Purposes (ESP) and English for Academic Purposes (EAP) that were held at the Free University of Bozen-Bolzano, Italy. The experiences detailed in the chapters offer a representative sample of the diversity of approaches to teaching and assessing ESP and EAP that were shared on those occasions. The contributions vary markedly by teaching and research context: whereas some report the results of meticulously planned research projects, others describe in detail cases embedded in specific contexts in Italy and in the US; others analyse the specialised language of particular discourses or domains, or reflect upon teaching methods and materials.
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and Code-Switching speech.
The Handbook to English as a Lingua Franca Practices for Inclusive Multilingual Classrooms
This handbook is an important companion for future users of the ENRICH CPD Course, including, but not limited to: (a) pre- or in-service English language teachers who may wish to engage with the CPD materials and activities at their own pace; (b) teacher educators who would like to employ the CPD materials and activities with their own trainees; (c) researchers in the fields which ENRICH revolves around (e.g., English as a Lingua Franca, multilingualism, English language pedagogy) who may be interested in finding out whether, and how, information gathered through ENRICH could inform their research studies; and (d) members of educational policy-making organisations and institutions which may want to explore the relevance of ENRICH to their own professional endeavours. It is divided into five main chapters: the ENRICH project is first introduced, followed by an explanation of the needs analysis for the development of the CPD Course, a rationale for the target audience, a detailed description of each of the CPD Course sections, and a final reflection on the evaluation of the Course and lessons learnt.
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed for only a few of the 4,000-7,000 existing languages.
In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.