6 research outputs found

    Syllable classification using static matrices and prosodic features

    In this paper we explore the usefulness of prosodic features for syllable classification. To do this, we represent the syllable as a static analysis unit so that its acoustic-temporal dynamics can be merged into a set of features that the SVM classifier considers as a whole. In the first part of our experiment we used MFCCs as features for classification, obtaining a maximum accuracy of 86.66%. The second part of our study tests whether the prosodic information is complementary to the cepstral information for syllable classification. The results show that combining the two types of information does improve the classification, but further analysis is necessary for a more successful combination of the two types of features.

    Automatic Segmentation of Indonesian Speech into Syllables using Fuzzy Smoothed Energy Contour with Local Normalization, Splitting, and Assimilation

    This paper discusses the use of the short-term energy contour of a speech signal, smoothed by a fuzzy-based method, to automatically segment the speech into syllabic units. Two additional procedures, local normalization and postprocessing, are proposed to improve the method. Tests on an Indonesian speech dataset show that local normalization significantly improves the accuracy of fuzzy smoothing. In the postprocessing step, splitting missed short syllables reduces the deletion errors but increases the insertion errors. Conversely, assimilating a single consonant segment into its previous or next segment reduces the insertion errors but increases the deletion errors. The sequential combination of splitting followed by assimilation gives a significant improvement in accuracy and a reduction in deletion errors, at the cost of a slight increase in insertion errors.

    From Speech to Personality: Mapping Voice Quality and Intonation into Personality Differences

    From a cognitive point of view, personality perception corresponds to capturing individual differences and can be thought of as positioning the people around us in an ideal personality space. The more similar the personalities of two individuals, the closer their positions in the space. This work shows that the mutual position of two individuals in the personality space can be inferred from prosodic features. The experiments, based on ordinal regression techniques, have been performed over a corpus of 640 speech samples comprising 322 individuals assessed in terms of personality traits by 11 human judges, which is the largest database of this type in the literature. The results show that the mutual position of two individuals can be predicted with up to 80% accuracy.

    The SSPNet-Mobile Corpus: from the detection of non-verbal cues to the inference of social behaviour during mobile phone conversations

    Mobile phones are one of the main channels of communication in contemporary society. However, the effect of the mobile phone on both the process of conversations mediated by this technology and the non-verbal behaviours used during them remains poorly understood. This thesis aims to investigate the role of the phone in the negotiation process, as well as the automatic analysis of non-verbal behavioural cues during conversations using mobile telephones, by following the Social Signal Processing approach. The work in this thesis includes the collection of a corpus of 60 mobile phone conversations involving 120 subjects; the development of methods for the detection of non-verbal behavioural events (laughter, fillers, speech and silence) and the inference of characteristics influencing social interactions (personality traits and conflict handling style) from speech and movements while using the mobile telephone; and the analysis of several factors that influence the outcome of decision-making processes while using mobile phones (gender, age, personality, conflict handling style and caller versus receiver role). The findings show that it is possible to recognise behavioural events at levels well above chance by employing statistical language models, and that personality traits and conflict handling styles can be partially recognised. Among the factors analysed, participant role (caller versus receiver) was the most important in determining the outcome of negotiation processes in the case of disagreement between parties. Finally, the corpus collected for the experiments (the SSPNet-Mobile Corpus) has been used in an international benchmarking campaign and constitutes a valuable resource for future research in Social Signal Processing and, more generally, in the area of human-human communication.