Multigranular scale speech recognition: technological and cognitive view
We propose a Multigranular Automatic Speech Recognizer. The hypothesis is that
the speech signal contains information distributed over several different time
scales. Many works from scientific fields ranging from neurobiology to speech
technology seem to concur on this assumption. In a broad sense, it seems that
speech recognition in humans is optimal because of a partial parallelization
process in which the left-to-right stream of speech is captured in a multilevel
grid where several linguistic analyses take place simultaneously. In this view,
our investigation aims to apply these ideas to the design of more robust and
efficient recognizers.
Syllable classification using static matrices and prosodic features
In this paper we explore the usefulness of prosodic features for
syllable classification. In order to do this, we represent the
syllable as a static analysis unit such that its acoustic-temporal
dynamics can be merged into a set of features that the SVM
classifier considers as a whole. In the first part of our
experiment we used MFCCs as features for classification,
obtaining a maximum accuracy of 86.66%. The second part of
our study tests whether the prosodic information is
complementary to the cepstral information for syllable
classification. The results obtained show that combining the
two types of information does improve the classification, but
further analysis is necessary for a more successful
combination of the two types of features.
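As a rough illustration of this kind of setup, the sketch below trains an SVM on fixed-length per-syllable vectors built from MFCC statistics plus simple prosodic descriptors (pitch, energy, and duration). The corpus loader, the specific prosodic features, and all parameter values are assumptions made for illustration, not the authors' actual configuration or results.

```python
# Minimal sketch: syllable classification with an SVM over fixed-length
# ("static") feature vectors combining cepstral and prosodic information.
# Corpus loading, feature choices, and hyperparameters are assumptions.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def static_features(wav_path, sr=16000, n_mfcc=13):
    """Collapse a variable-length syllable clip into one fixed-length vector."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # cepstral part
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)            # prosodic part: pitch track
    rms = librosa.feature.rms(y=y)[0]                        # prosodic part: energy
    # Mean/std over time turn the trajectories into a static representation.
    parts = [mfcc.mean(axis=1), mfcc.std(axis=1),
             [np.nanmean(f0), np.nanstd(f0), rms.mean(), rms.std(), len(y) / sr]]
    return np.concatenate([np.ravel(p) for p in parts])

def train_svm(syllable_clips):
    """syllable_clips: hypothetical list of (wav_path, syllable_label) pairs."""
    X = np.array([static_features(p) for p, _ in syllable_clips])
    y = np.array([lab for _, lab in syllable_clips])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
    return clf, accuracy_score(y_te, clf.predict(X_te))
```

Collapsing each syllable to one vector is what makes a standard SVM applicable here: the classifier sees the whole acoustic-temporal trajectory as a single point in feature space, and the prosodic statistics are simply concatenated to the cepstral ones.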
Silent pauses as clarification trigger
Among the possible forms of pragmatic feedback an interlocutor can use to signal their degree of understanding of an utterance are clarification requests (CRs). The functional role of CRs can also be expressed via silent pauses, or failed turn-giving moves, which signal an understanding problem that is resolved through a clarification speech act. In this work, we therefore hypothesise that some silent pauses, under specific conditions, may also have an interactional role that the speaker interprets as a clarification need.
Machine Learning of Probabilistic Phonological Pronunciation Rules from the Italian CLIPS Corpus
A blending of phonological concepts and technical analysis is proposed to yield a better modeling and understanding of phonological processes. Based on the manual segmentation and labeling of the Italian CLIPS corpus, we automatically derive a probabilistic set of phonological pronunciation rules: a new alignment technique is used to map the phonological form of spontaneous sentences onto the phonetic surface form. A machine-learning algorithm then calculates a set of phonological replacement rules together with their conditional probabilities. A critical analysis of the resulting probabilistic rule set is presented and discussed with regard to regional Italian accents. The rule set presented here is also applied in the newly published web service WebMAUS, which allows a user to segment and phonetically label Italian speech via a simple web interface.
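As a hedged illustration of the rule-extraction step only, the sketch below estimates conditional probabilities of replacement rules from already-aligned (canonical, realized) phone pairs. The alignment itself, the single-phone context window, and the input format are assumptions; the paper's own alignment technique and learning algorithm are not reproduced here.

```python
# Minimal sketch: conditional probabilities of phonological replacement
# rules, estimated from pre-aligned (canonical, realized) phone pairs.
# Context window and data format are illustrative assumptions.
from collections import Counter

def rule_probabilities(aligned_utterances):
    """aligned_utterances: list of utterances, each a list of
    (canonical_phone, realized_phone) pairs from a prior alignment."""
    context_counts = Counter()
    rule_counts = Counter()
    for utt in aligned_utterances:
        padded = [("#", "#")] + utt + [("#", "#")]      # utterance boundaries
        for i in range(1, len(padded) - 1):
            canon, real = padded[i]
            left, right = padded[i - 1][0], padded[i + 1][0]
            context = (left, canon, right)              # canonical phone in context
            context_counts[context] += 1
            rule_counts[(context, real)] += 1
    # P(realized | left context, canonical phone, right context)
    return {(ctx, real): n / context_counts[ctx]
            for (ctx, real), n in rule_counts.items()}

# Toy example: intervocalic /s/ realized as [z] in one of two utterances,
# yielding a 0.5 probability for the voicing rule in that context.
pairs = [[("a", "a"), ("s", "z"), ("a", "a")],
         [("a", "a"), ("s", "s"), ("a", "a")]]
for rule, p in sorted(rule_probabilities(pairs).items()):
    print(rule, round(p, 2))
```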
- …