5,444 research outputs found
Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information
In conversational speech, the acoustic signal provides cues that help
listeners disambiguate difficult parses. For automatically parsing spoken
utterances, we introduce a model that integrates transcribed text and
acoustic-prosodic features using a convolutional neural network over energy and
pitch trajectories coupled with an attention-based recurrent neural network
that accepts text and prosodic features. We find that different types of
acoustic-prosodic features are individually helpful, and together give
statistically significant improvements in parse and disfluency detection F1
scores over a strong text-only baseline. For this study with known sentence
boundaries, error analyses show that the main benefit of acoustic-prosodic
features is in sentences with disfluencies, attachment decisions are most
improved, and transcription errors obscure gains from prosody.Comment: Accepted in NAACL HLT 201
An infrastructure for Turkish prosody generation in text-to-speech synthesis
Text-to-speech engines benefit from natural language processing while generating the appropriate prosody. In this study, we investigate the natural language processing infrastructure for Turkish prosody generation in three steps as pronunciation disambiguation, phonological phrase detection and intonation level assignment. We focus on phrase boundary detection and intonation assignment. We propose a phonological phrase detection scheme based on syntactic analysis for Turkish and assign one of three intonation levels to words in detected phrases. Empirical observations on 100 sentences show that the proposed scheme works with approximately 85% accuracy
Research on Architectures for Integrated Speech/Language Systems in Verbmobil
The German joint research project Verbmobil (VM) aims at the development of a
speech to speech translation system. This paper reports on research done in our
group which belongs to Verbmobil's subproject on system architectures (TP15).
Our specific research areas are the construction of parsers for spontaneous
speech, investigations in the parallelization of parsing and to contribute to
the development of a flexible communication architecture with distributed
control.Comment: 6 pages, 2 Postscript figure
Morphological word structure in English and Swedish : the evidence from prosody
Trubetzkoy's recognition of a delimitative function of phonology, serving to signal boundaries between morphological units, is expressed in terms of alignment constraints in Optimality Theory, where the relevant constraints require specific morphological boundaries to coincide with phonological structure (Trubetzkoy 1936, 1939, McCarthy & Prince 1993). The approach pursued in the present article is to investigate the distribution of phonological boundary signals to gain insight into the criteria underlying morphological analysis. The evidence from English and Swedish suggests that necessary and sufficient conditions for word-internal morphological analysis concern the recognizability of head constituents, which include the rightmost members of compounds and head affixes. The claim is that the stability of word-internal boundary effects in historical perspective cannot in general be sufficiently explained in terms of memorization and imitation of phonological word form. Rather, these effects indicate a morphological parsing mechanism based on the recognition of word-internal head constituents. Head affixes can be shown to contrast systematically with modifying affixes with respect to syntactic function, semantic content, and prosodic properties. That is, head affixes, which cannot be omitted, often lack inherent meaning and have relatively unmarked boundaries, which can be obscured entirely under specific phonological conditions. By contrast, modifying affixes, which can be omitted, consistently have inherent meaning and have stronger boundaries, which resist prosodic fusion in all phonological contexts. While these correlations are hardly specific to English and Swedish it remains to be investigated to which extent they hold cross-linguistically. The observation that some of the constituents identified on the basis of prosodic evidence lack inherent meaning raises the issue of compositionality. I will argue that certain systematic aspects of word meaning cannot be captured with reference to the syntagmatic level, but require reference to the paradigmatic level instead. The assumption is then that there are two dimensions of morphological analysis: syntagmatic analysis, which centers on the criteria for decomposing words in terms of labelled constituents, and paradigmatic analysis, which centers on the criteria for establishing relations among (whole) words in the mental lexicon. While meaning is intrinsically connected with paradigmatic analysis (e.g. base relations, oppositeness) it is not essential to syntagmatic analysis
Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping
Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates to prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.10 page(s
- âŠ