3,782 research outputs found

    You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

    Full text link
    Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition model. We argue that, when the training data amount is relatively low, this approach can allow an end-to-end model to reach hybrid systems' quality. For an artificial low-to-medium-resource setup, we compare the proposed augmentation with the semi-supervised learning technique. We also investigate the influence of vocoder usage on final ASR performance by comparing Griffin-Lim algorithm with our modified LPCNet. When applied with an external language model, our approach outperforms a semi-supervised setup for LibriSpeech test-clean and only 33% worse than a comparable supervised setup. Our system establishes a competitive result for end-to-end ASR trained on LibriSpeech train-clean-100 set with WER 4.3% for test-clean and 13.5% for test-other

    Perception of linguistic rhythm by newborn infants

    Get PDF
    Previous studies have shown that newborn infants are able to discriminate between certain languages, and it has been suggested that they do so by categorizing varieties of speech rhythm. However, in order to confirm this hypothesis, it is necessary to show that language discrimination is still performed by newborns when all speech cues other than rhythm are removed. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic structure, newborns are still able to discriminate the languages. We conclude that new-borns are able to classify languages according to their type of rhythm, and that this ability may help them bootstrap other phonological properties of their native language

    Language discrimination by newborns: Teasing apart phonotactic, rhythmic, and intonational cues

    Get PDF
    Speech rhythm has long been claimed to be a useful bootstrapping cue in the very first steps of language acquisition. Previous studies have suggested that newborn infants do categorize varieties of speech rhythm, as demonstrated by their ability to discriminate between certain languages. However, the existing evidence is not unequivocal: in previous studies, stimuli discriminated by newborns always contained additional speech cues on top of rhythm. Here, we conducted a series of experiments assessing discrimination between Dutch and Japanese by newborn infants, using a speech resynthesis technique to progressively degrade non-rhythmical properties of the sentences. When the stimuli are resynthesized using identical phonemes and artificial intonation contours for the two languages, thereby preserving only their rhythmic and broad phonotactic structure, newborns still seem to be able to discriminate between the two languages, but the effect is weaker than when intonation is present. This leaves open the possibility that the temporal correlation between intonational and rhythmic cues might actually facilitate the processing of speech rhythm

    Affective Medicine: a review of Affective Computing efforts in Medical Informatics

    Get PDF
    Background: Affective computing (AC) is concerned with emotional interactions performed with and through computers. It is defined as “computing that relates to, arises from, or deliberately influences emotions”. AC enables investigation and understanding of the relation between human emotions and health as well as application of assistive and useful technologies in the medical domain. Objectives: 1) To review the general state of the art in AC and its applications in medicine, and 2) to establish synergies between the research communities of AC and medical informatics. Methods: Aspects related to the human affective state as a determinant of the human health are discussed, coupled with an illustration of significant AC research and related literature output. Moreover, affective communication channels are described and their range of application fields is explored through illustrative examples. Results: The presented conferences, European research projects and research publications illustrate the recent increase of interest in the AC area by the medical community. Tele-home healthcare, AmI, ubiquitous monitoring, e-learning and virtual communities with emotionally expressive characters for elderly or impaired people are few areas where the potential of AC has been realized and applications have emerged. Conclusions: A number of gaps can potentially be overcome through the synergy of AC and medical informatics. The application of AC technologies parallels the advancement of the existing state of the art and the introduction of new methods. The amount of work and projects reviewed in this paper witness an ambitious and optimistic synergetic future of the affective medicine field
    • …
    corecore