A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION
Effective decision-making under industrial conditions requires access to, and proper presentation of, manufacturing data on the realised manufacturing process. Although the frequently applied ERP systems allow economic events to be recorded, their potential for decision support is limited. The article presents an original system for reporting manufacturing data, based on Business Intelligence technology, as a support for junior and middle management. As an example, the possibility of utilising data from ERP systems to support decision-making in the field of purchases and logistics in small and medium enterprises is discussed.
Speech and Speech-Related Resources at BAS
The Bavarian Archive for Speech Signals (BAS), located at the Ludwig-Maximilians-Universität München, Germany, collects, evaluates, produces and disseminates German speech resources for the scientific community. Our focus is the German language, which covers a large geographical part of central Europe. Speech and speech-related resources are usually produced for particular tasks or projects. It is therefore not easy for scientists or engineers starting a new project or application to decide whether existing resources may be re-used for their special purpose, or whether it is necessary to finance a new specialised data collection (which is usually very expensive). With this contribution we try to facilitate this decision by giving detailed information about existing resources as well as about possibilities for producing new resources. This paper has two major parts. The first part deals with our experience over the last three years in producing new, highly re-usable speech resources in close cooper…
Improving ASR performance using context-dependent phoneme model
Purpose – The purpose of this paper is to present evidence of the need for a carefully designed lexical model for speech recognition for dyslexic children reading in Bahasa Melayu (BM).
Design/methodology/approach – Data collection is performed to obtain the most frequent reading error patterns and the reading recordings. The design and development of the lexical model take these errors into account for better recognition accuracy.
Findings – It is found that recognition accuracy increases to 75 percent when using the context-dependent (CD) phoneme model and a phoneme refinement rule. A comparison between context-independent phoneme models and the CD phoneme model is also presented.
Research limitations/implications – The most frequent errors recognised and obtained from data collection and analysis illustrate and support that phonological deficit is the major factor in dyslexics' reading disabilities.
Practical implications – This paper provides the first step towards materialising an automated speech recognition (ASR)-based application to support reading in BM, which is the first language of Malaysia.
Originality/value – The paper contributes to knowledge of the most frequent error patterns in dyslexic children's reading in BM, and shows that a CD phoneme model together with the phoneme refinement rule can build up a more finely tuned lexical model for an ASR system targeting dyslexic children reading isolated words in BM.
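A context-dependent (CD) phoneme model typically conditions each phoneme on its immediate neighbours (triphones). The abstract does not give the paper's actual implementation, but as a minimal illustrative sketch, the common convention of expanding a phoneme sequence into left-centre+right triphone labels (with an assumed `sil` boundary marker) might look like:

```python
def to_triphones(phones):
    """Expand a phoneme sequence into context-dependent triphone labels.

    Each label has the form left-centre+right; utterance boundaries are
    marked with 'sil' (a common convention, assumed here).
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# A hypothetical BM word with phonemes b a ch a (illustrative phone set,
# not the paper's lexicon):
labels = to_triphones(["b", "a", "ch", "a"])
# -> ['sil-b+a', 'b-a+ch', 'a-ch+a', 'ch-a+sil']
```

Because each label now encodes its neighbours, the model can learn separate acoustics for the same phoneme in different contexts, which is what distinguishes the CD model from the context-independent baseline compared above.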
Symbolic inductive bias for visually grounded learning of spoken language
A widespread approach to processing spoken language is to first automatically
transcribe it into text. An alternative is to use an end-to-end approach:
recent works have proposed to learn semantic embeddings of spoken language from
images with spoken captions, without an intermediate transcription step. We
propose to use multitask learning to exploit existing transcribed speech within
the end-to-end setting. We describe a three-task architecture which combines
the objectives of matching spoken captions with corresponding images, speech
with text, and text with images. We show that the addition of the speech/text
task leads to substantial performance improvements on image retrieval when
compared to training the speech/image task in isolation. We conjecture that
this is due to the strong inductive bias that transcribed speech provides to
the model, and offer supporting evidence for this.
Comment: ACL 201
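The three matching objectives described above are commonly implemented as symmetric max-margin (triplet-style) contrastive losses over paired embeddings. The abstract does not specify the exact loss, so the following NumPy sketch is only a plausible illustration, assuming pre-computed unit-norm embedding batches where row i of one modality matches row i of the other, and a hypothetical `margin` hyper-parameter:

```python
import numpy as np

def matching_loss(a, b, margin=0.2):
    """Symmetric max-margin matching loss between two batches of
    unit-norm embeddings; a[i] is the positive match of b[i]."""
    sims = a @ b.T                 # cosine similarities of all pairs
    pos = np.diag(sims)           # similarity of each matched pair
    # Hinge on every mismatched pair, in both retrieval directions.
    cost_a = np.maximum(0.0, margin + sims - pos[:, None])
    cost_b = np.maximum(0.0, margin + sims - pos[None, :])
    np.fill_diagonal(cost_a, 0.0)  # matched pairs incur no cost
    np.fill_diagonal(cost_b, 0.0)
    return cost_a.sum() + cost_b.sum()

def three_task_loss(speech, image, text, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the speech/image, speech/text and text/image
    matching objectives (task weights are an assumption)."""
    w1, w2, w3 = weights
    return (w1 * matching_loss(speech, image)
            + w2 * matching_loss(speech, text)
            + w3 * matching_loss(text, image))
```

Sharing one speech encoder across the speech/image and speech/text terms is what lets the transcription signal act as an inductive bias on the grounded representation.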