A ROBUST ENSEMBLE MODEL FOR SPOKEN LANGUAGE RECOGNITION
Effective decision-making under industrial conditions requires access to, and proper presentation of, manufacturing data on the realised manufacturing process. Although the frequently applied ERP systems allow economic events to be recorded, their potential for decision support is limited. The article presents an original system for reporting manufacturing data, based on Business Intelligence technology, as a support for junior and middle management. As an example, the possibility of utilising data from ERP systems to support decision-making in the field of purchases and logistics in small and medium enterprises is discussed.
Speech and Speech-Related Resources at BAS
The Bavarian Archive for Speech Signals (BAS), located at the Ludwig-Maximilians-Universität München, Germany, collects, evaluates, produces and disseminates German speech resources for the scientific community. Our focus is the German language, which covers a large geographical part of central Europe. Speech and speech-related resources are usually produced for particular tasks or projects. It is therefore not easy for scientists or engineers starting a new project or application to decide whether existing resources may be re-used for their special purpose, or whether it is necessary to finance a new specialised data collection (which is usually very expensive). With this contribution we try to facilitate this decision by giving detailed information about existing resources as well as about possibilities for producing new resources. This paper has two major parts. The first part deals with our experience over the last three years in producing new, highly re-usable speech resources in close cooper…
Improving ASR performance using context-dependent phoneme model
Purpose – The purpose of this paper is to present evidence of the need for a carefully designed lexical model for speech recognition for dyslexic children reading in Bahasa Melayu (BM).
Design/methodology/approach – Data collection is performed to obtain the most frequent reading error patterns and the reading recordings. The design and development of the lexical model take these errors into account for better recognition accuracy.
Findings – It is found that recognition accuracy increases to 75 percent when using the context-dependent (CD) phoneme model and a phoneme refinement rule. A comparison between context-independent phoneme models and the CD phoneme model is also presented.
Research limitations/implications – The most frequent errors recognised and obtained from data collection and analysis illustrate and support that phonological deficit is the major factor in dyslexics' reading disabilities.
Practical implications – This paper provides the first step towards materialising an automated speech recognition (ASR)-based application to support reading in BM, which is the first language of Malaysia.
Originality/value – The paper contributes to knowledge of the most frequent error patterns in dyslexic children's reading in BM, and shows that a CD phoneme model together with the phoneme refinement rule can build up a more finely tuned lexical model for an ASR system targeting dyslexic children reading isolated words in BM.
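A context-dependent (CD) phoneme model typically conditions each phoneme on its immediate neighbours (triphones). The abstract does not give the paper's actual implementation, but as a minimal illustrative sketch, the common convention of expanding a phoneme sequence into left-centre+right triphone labels (with an assumed `sil` boundary marker) might look like:

```python
def to_triphones(phones):
    """Expand a phoneme sequence into context-dependent triphone labels.

    Each label has the form left-centre+right; utterance boundaries are
    marked with 'sil' (a common convention, assumed here).
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# A hypothetical BM word with phonemes b a ch a (illustrative phone set,
# not the paper's lexicon):
labels = to_triphones(["b", "a", "ch", "a"])
# -> ['sil-b+a', 'b-a+ch', 'a-ch+a', 'ch-a+sil']
```

Because each label now encodes its neighbours, the model can learn separate acoustics for the same phoneme in different contexts, which is what distinguishes the CD model from the context-independent baseline compared above.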
Symbolic inductive bias for visually grounded learning of spoken language
A widespread approach to processing spoken language is to first automatically
transcribe it into text. An alternative is to use an end-to-end approach:
recent works have proposed to learn semantic embeddings of spoken language from
images with spoken captions, without an intermediate transcription step. We
propose to use multitask learning to exploit existing transcribed speech within
the end-to-end setting. We describe a three-task architecture which combines
the objectives of matching spoken captions with corresponding images, speech
with text, and text with images. We show that the addition of the speech/text
task leads to substantial performance improvements on image retrieval when
compared to training the speech/image task in isolation. We conjecture that
this is due to the strong inductive bias that transcribed speech provides to
the model, and offer supporting evidence for this.
Comment: ACL 201
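The three matching objectives described above are commonly implemented as symmetric max-margin (triplet-style) contrastive losses over paired embeddings. The abstract does not specify the exact loss, so the following NumPy sketch is only a plausible illustration, assuming pre-computed unit-norm embedding batches where row i of one modality matches row i of the other, and a hypothetical `margin` hyper-parameter:

```python
import numpy as np

def matching_loss(a, b, margin=0.2):
    """Symmetric max-margin matching loss between two batches of
    unit-norm embeddings; a[i] is the positive match of b[i]."""
    sims = a @ b.T                 # cosine similarities of all pairs
    pos = np.diag(sims)           # similarity of each matched pair
    # Hinge on every mismatched pair, in both retrieval directions.
    cost_a = np.maximum(0.0, margin + sims - pos[:, None])
    cost_b = np.maximum(0.0, margin + sims - pos[None, :])
    np.fill_diagonal(cost_a, 0.0)  # matched pairs incur no cost
    np.fill_diagonal(cost_b, 0.0)
    return cost_a.sum() + cost_b.sum()

def three_task_loss(speech, image, text, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the speech/image, speech/text and text/image
    matching objectives (task weights are an assumption)."""
    w1, w2, w3 = weights
    return (w1 * matching_loss(speech, image)
            + w2 * matching_loss(speech, text)
            + w3 * matching_loss(text, image))
```

Sharing one speech encoder across the speech/image and speech/text terms is what lets the transcription signal act as an inductive bias on the grounded representation.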