BEA - A multifunctional Hungarian spoken language database
In diverse areas of linguistics, the demand for studying actual language use is on
the increase. The aim of developing a phonetically-based multi-purpose database of
Hungarian spontaneous speech, dubbed BEA, is to accumulate a large amount of
spontaneous speech of various types together with sentence repetition and reading.
Presently, the recorded material of BEA amounts to 260 hours produced by 280
present-day Budapest speakers (ages between 20 and 90, 168 females and 112
males), also providing annotated materials for various types of research and practical
applications.
Automatic Phonetic Transcription of Non-Prompted Speech
A reliable method for automatic phonetic transcription of non-prompted German speech has been developed at th
Data-driven Natural Language Generation: Paving the Road to Success
We argue that there are currently two major bottlenecks to the commercial use
of statistical machine learning approaches for natural language generation
(NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b)
The scarcity of high quality in-domain corpora. We address the first problem by
thoroughly analysing current evaluation metrics and motivating the need for a
new, more reliable metric. The second problem is addressed by presenting a
novel framework for developing and evaluating a high quality corpus for NLG
training. Comment: WiNLP workshop at ACL 201
Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation
We investigate whether infant-directed speech (IDS) could facilitate word
form learning when compared to adult-directed speech (ADS). To study this, we
examine the distribution of word forms at two levels, acoustic and
phonological, using a large database of spontaneous speech in Japanese. At the
acoustic level we show that, as has been documented before for phonemes, the
realizations of words are more variable and less discriminable in IDS than in
ADS. At the phonological level, we find an effect in the opposite direction:
the IDS lexicon contains more distinctive words (such as onomatopoeias) than
the ADS counterpart. Combining the acoustic and phonological metrics together
in a global discriminability score reveals that the bigger separation of
lexical categories in the phonological space does not compensate for the
opposite effect observed at the acoustic level. As a result, IDS word forms are
still globally less discriminable than ADS word forms, even though the effect
is numerically small. We discuss the implication of these findings for the view
that the functional role of IDS is to improve language learnability. Comment: Draf
Speech and Speech-Related Resources at BAS
The Bavarian Archive for Speech Signals (BAS), located at the Ludwig-Maximilians-Universität München, Germany, collects, evaluates, produces and disseminates German speech resources to the scientific community. Our focus is the German language, which covers a large geographical part of central Europe. Speech and speech-related resources are usually produced for certain tasks or projects. Therefore, it is not easy for scientists or engineers starting a new project or application to decide whether existing resources may be re-used for their special purpose, or whether it is necessary to finance a new specialised data collection (which is usually very expensive). With this contribution we try to facilitate this decision by giving detailed information about existing resources as well as possibilities to produce new resources. This paper has two major parts. The first part deals with our experiences during the last three years in producing new, highly re-usable speech resources in close cooper..