106 research outputs found

    Modeling Long and Short-term prosody for language identification

    This paper addresses the problem of modeling prosody for language identification. The main goal is to validate (or invalidate) some language characteristics proposed by linguists by means of an automatic language identification (ALI) system. In previous papers, we defined a prosodic unit, the pseudo-syllable. Static modeling has proven the relevance of the pseudo-syllable unit for ALI. In this paper, we try to model the prosody dynamics. This is achieved by separating the long-term and short-term components of prosody and proposing suitable models. Experiments are carried out on seven languages and the efficiency of the modeling is discussed.

    Automatic prosodic variations modelling for language and dialect discrimination

    This paper addresses the problem of modelling prosody for language identification. The aim is to create a system that can be used prior to any linguistic work to show whether prosodic differences among languages or dialects can be automatically determined. In previous papers, we defined a prosodic unit, the pseudo-syllable. Rhythmic modelling has proven the relevance of the pseudo-syllable unit for automatic language identification. In this paper, we propose to model prosodic variations, that is to say, to model sequences of prosodic units. This is achieved by separating the phrase and accentual components of intonation. We propose an independent coding of those components on differentiated scales of duration. Short-term and long-term language-dependent sequences of labels are modelled by n-gram models. The performance of the system is demonstrated by experiments on read speech and evaluated by experiments on spontaneous speech. Finally, an experiment is described on the discrimination of Arabic dialects, for which there is a lack of linguistic studies, notably on prosodic comparisons. We show that our system is able to clearly identify the dialectal areas, leading to the hypothesis that those dialects have prosodic differences.
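
    As an illustration of the label-sequence modelling step above, the sketch below trains one smoothed bigram model per language over symbolic prosodic labels and picks the language that gives a test sequence the highest likelihood. The label inventory (rising/falling/static movements), the toy data and the add-one smoothing are assumptions for illustration; they do not reproduce the paper's actual coding of phrase and accentual components or its duration scales.

    from collections import defaultdict
    import math

    def train_bigram(sequences, vocab):
        # Add-one-smoothed bigram model over label sequences; returns a scoring function.
        counts = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
                counts[prev][cur] += 1
        vocab_size = len(vocab) + 1  # labels plus the end-of-sequence symbol

        def log_likelihood(seq):
            score = 0.0
            for prev, cur in zip(["<s>"] + seq, seq + ["</s>"]):
                bigrams = counts.get(prev, {})
                score += math.log((bigrams.get(cur, 0) + 1) / (sum(bigrams.values()) + vocab_size))
            return score

        return log_likelihood

    # Hypothetical prosodic labels: rising (R), falling (F) and static (S) movements.
    labels = ["R", "F", "S"]
    training = {
        "lang_A": [["R", "F", "S", "F"], ["R", "R", "F", "S"]],
        "lang_B": [["S", "S", "F", "F"], ["S", "F", "F", "S"]],
    }
    models = {lang: train_bigram(seqs, labels) for lang, seqs in training.items()}
    print(max(models, key=lambda lang: models[lang](["R", "F", "S"])))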

    Regularized Optimal Transport and the Rot Mover's Distance

    This paper presents a unified framework for smooth convex regularization of discrete optimal transport problems. In this context, the regularized optimal transport turns out to be equivalent to a matrix nearness problem with respect to Bregman divergences. Our framework thus naturally generalizes a previously proposed regularization based on the Boltzmann-Shannon entropy related to the Kullback-Leibler divergence, and solved with the Sinkhorn-Knopp algorithm. We call the regularized optimal transport distance the rot mover's distance, in reference to the classical earth mover's distance. We develop two generic schemes, which we respectively call the alternate scaling algorithm and the non-negative alternate scaling algorithm, to efficiently compute the regularized optimal plans depending on whether the domain of the regularizer lies within the non-negative orthant or not. These schemes are based on Dykstra's algorithm with alternate Bregman projections, and further exploit the Newton-Raphson method when applied to separable divergences. We enhance the separable case with a sparse extension to deal with high data dimensions. We also instantiate our proposed framework and discuss the inherent specificities for well-known regularizers and statistical divergences in the machine learning and information geometry communities. Finally, we demonstrate the merits of our methods with experiments using synthetic data to illustrate the effect of different regularizers and penalties on the solutions, as well as real-world data for a pattern recognition application to audio scene classification.
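
    For reference, the sketch below implements only the entropy-regularized special case mentioned above (the Boltzmann-Shannon/Kullback-Leibler instance solved with the Sinkhorn-Knopp algorithm); the generic alternate scaling schemes, the Dykstra projections and the Newton-Raphson steps of the paper are not reproduced, and the cost matrix, marginals and regularization weight are illustrative choices.

    import numpy as np

    def sinkhorn_knopp(C, r, c, gamma=0.1, n_iterations=500):
        # Entropy-regularized optimal transport plan between marginals r and c
        # for cost matrix C, computed by alternating diagonal scalings.
        K = np.exp(-C / gamma)           # Gibbs kernel
        u = np.ones_like(r)
        for _ in range(n_iterations):
            v = c / (K.T @ u)            # enforce the column marginals
            u = r / (K @ v)              # enforce the row marginals
        return u[:, None] * K * v[None, :]

    # Toy example: two histograms on a 1-D grid with a squared-distance cost.
    x = np.linspace(0.0, 1.0, 50)
    C = (x[:, None] - x[None, :]) ** 2
    r = np.exp(-((x - 0.3) ** 2) / 0.01); r /= r.sum()
    c = np.exp(-((x - 0.7) ** 2) / 0.02); c /= c.sum()
    P = sinkhorn_knopp(C, r, c, gamma=0.05)
    print("regularized transport cost:", float(np.sum(P * C)))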

    Comparison of Spectral Properties of Read, Prepared and Casual Speech in French

    In this paper, we investigate the acoustic properties of phonemes in three speaking styles: read speech, prepared speech and spontaneous speech. Our aim is to better understand why speech recognition systems still fail to achieve good performance on spontaneous speech. This work follows that of Nakamura et al. (2008) on Japanese speaking styles, with the difference that we here focus on French. Using Nakamura's method, we use classical speech recognition features, MFCCs, to represent the effects of the speaking styles on the spectral space. Two measurements are defined in order to represent the spectral space reduction and the spectral variance extension. Experiments are then carried out to investigate whether we indeed find differences between the three speaking styles using these measurements. We finally compare our results to those obtained by Nakamura on Japanese to see if the same phenomenon appears.
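
    The two measurements are easiest to picture on per-phoneme MFCC frames. The sketch below uses generic proxies on synthetic data: the spread of phoneme centroids stands in for the size of the spectral space and the mean within-phoneme deviation for the spectral variance. It is not Nakamura's exact formulation; in a real setup the frames would come from an MFCC front end and a phoneme alignment.

    import numpy as np

    def spectral_measures(frames_by_phoneme):
        # frames_by_phoneme maps a phoneme label to an (n_frames, n_mfcc) array
        # of MFCC vectors taken from one speaking style.
        centroids = np.stack([f.mean(axis=0) for f in frames_by_phoneme.values()])
        grand_mean = centroids.mean(axis=0)
        # Proxy for the size of the spectral space: mean distance of the
        # phoneme centroids to the global mean.
        space_size = np.mean(np.linalg.norm(centroids - grand_mean, axis=1))
        # Proxy for the spectral variance: mean within-phoneme deviation.
        within_deviation = np.mean([f.std(axis=0).mean() for f in frames_by_phoneme.values()])
        return space_size, within_deviation

    # Placeholder data standing in for aligned MFCC frames of two speaking styles.
    rng = np.random.default_rng(0)
    read = {p: rng.normal(loc=i, scale=0.5, size=(100, 13)) for i, p in enumerate("aeiou")}
    spontaneous = {p: rng.normal(loc=0.6 * i, scale=0.8, size=(100, 13)) for i, p in enumerate("aeiou")}
    print("read speech:", spectral_measures(read))
    print("spontaneous speech:", spectral_measures(spontaneous))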

    In search of cues discriminating West-African accents in French

    This study investigates to what extent West-African French accents can be distinguished, based on recordings made in Burkina Faso, Ivory Coast, Mali and Senegal. First, a perceptual experiment was conducted, suggesting that these accents are well identified by West-African listeners (especially the Senegal and Ivory Coast accents). Second, prosodic and segmental cues were studied by using speech processing methods such as automatic phoneme alignment. Results show that the Senegal accent (with a tendency toward word-initial stress followed by a falling pitch movement) and the Ivory Coast accent (with a tendency to delete/vocalise the /R/ consonant) are most distinct from standard French and among the West-African accents under investigation.

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is actually one of the most promising features to consider for language identification, even if its extraction and modelling are not a straightforward issue. Indeed, one of the main problems to address is what to model. In this paper, an algorithm for rhythm extraction is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish) and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are commented on and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
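
    A minimal sketch of the Gaussian-mixture step on the three rhythmic parameters, assuming scikit-learn and synthetic feature values standing in for real vowel-detection output; the segmentation itself and the corpus details are not reproduced.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Each row is one rhythmic unit: [consonantal duration, vowel duration, cluster complexity].
    rng = np.random.default_rng(1)
    training = {
        "syllable_timed": rng.normal([0.08, 0.10, 1.5], [0.02, 0.03, 0.5], size=(500, 3)),
        "stress_timed":   rng.normal([0.14, 0.07, 2.5], [0.04, 0.02, 0.8], size=(500, 3)),
    }

    # One Gaussian mixture per class, trained on its rhythmic units.
    models = {name: GaussianMixture(n_components=4, random_state=0).fit(X)
              for name, X in training.items()}

    def identify(units):
        # Pick the class whose mixture gives the units the highest average log-likelihood.
        return max(models, key=lambda name: models[name].score(units))

    test_utterance = rng.normal([0.09, 0.10, 1.6], [0.02, 0.03, 0.5], size=(40, 3))
    print(identify(test_utterance))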

    Facial Action Units Intensity Estimation by the Fusion of Features with Multi-kernel Support Vector Machine

    Automatic facial expression recognition has developed over the past two decades. The recognition of posed facial expressions and the detection of Action Units (AUs) of facial expression have already made great progress. More recently, the automatic estimation of the variation of facial expression, either in terms of the intensities of AUs or in terms of the values of dimensional emotions, has emerged in the field of facial expression analysis. However, discriminating different intensities of AUs is a far more challenging task than AU detection due to several intractable problems. Aiming to continue standardized evaluation procedures and surpass the limits of current research, the second Facial Expression Recognition and Analysis challenge (FERA2015) is presented. In this context, we propose a method using the fusion of different appearance and geometry features based on a multi-kernel Support Vector Machine (SVM) for the automatic estimation of AU intensities. By taking advantage of different features adapted to a multi-kernel SVM, our approach is shown to outperform conventional methods based on a single feature type with a single-kernel SVM.
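
    A rough sketch of the fusion idea with a precomputed combined kernel, assuming scikit-learn, synthetic appearance and geometry features and a fixed kernel weight; a full multi-kernel learning setup would learn the weights jointly with the predictor, and support vector regression stands in here for the intensity estimator.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.metrics.pairwise import rbf_kernel

    # Synthetic stand-ins for appearance and geometry feature vectors.
    rng = np.random.default_rng(2)
    X_app, X_geo = rng.normal(size=(200, 64)), rng.normal(size=(200, 20))
    y = rng.uniform(0, 5, size=200)          # AU intensity labels on a 0-5 scale

    def fused_kernel(A1, B1, A2, B2, w=0.5):
        # Convex combination of one RBF kernel per feature type.
        return w * rbf_kernel(A1, B1) + (1 - w) * rbf_kernel(A2, B2)

    # Train a support vector regressor on the precomputed fused kernel.
    K_train = fused_kernel(X_app, X_app, X_geo, X_geo, w=0.6)
    regressor = SVR(kernel="precomputed", C=1.0).fit(K_train, y)

    # At test time the kernel is computed between test and training samples.
    X_app_test, X_geo_test = rng.normal(size=(10, 64)), rng.normal(size=(10, 20))
    K_test = fused_kernel(X_app_test, X_app, X_geo_test, X_geo, w=0.6)
    print(regressor.predict(K_test))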

    Going ba-na-nas: Prosodic analysis of spoken Japanese attitudes

    The aim of this paper is to examine cues for the prosodic characterization of attitudes in Japanese. This work is based on previous studies in which 16 communicative social affects were defined. The audio signal parameters (fundamental frequency, amplitude and duration) of previously recorded Japanese attitudes are statistically analyzed. Interesting interactions among the parameters, the gender and the expression of specific attitudes (e.g. politeness) were found, and we report on which parameters most significantly characterize each attitude. Index Terms: speech, prosody, attitude, social affect, emotional speech, Japanese language

    Automatic extraction of prosodic parameters for automatic language identification

    The aim of this study is to propose a new approach to Automatic Language Identification: it is based on rhythmic modelling and fundamental frequency modelling and does not require any hand-labelled data. First, we need to investigate how prosodic or rhythmic information can be taken into account for automatic language identification. A new automatically extracted unit, the pseudo-syllable, is introduced. Rhythmic and intonative features are then automatically extracted from this unit. Elementary decision modules are defined with Gaussian mixture models. These prosodic modellings are combined with a more classical approach, an acoustic modelling of the vocalic system. Experiments are conducted on the five European languages of the MULTEXT corpus: English, French, German, Italian and Spanish. The relevance of the rhythmic parameters and the efficiency of each system (rhythmic model, fundamental frequency model and vowel system model) are evaluated. The influence of these approaches on the performance of the automatic language identification system is addressed. We obtain 91% correct identification with 21 s utterances using all the information sources.
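
    A small sketch of how rhythmic parameters can be read off a pseudo-syllable-like unit, assuming the common convention of grouping consonant segments with the following vowel; the segment labels and durations below are invented, and the actual vowel detection, the intonative features and the Gaussian-mixture fusion of the three subsystems are not shown.

    def pseudo_syllables(segments):
        # Group a (label, duration) sequence into pseudo-syllable-like units.
        # Consonants are gathered with the following vowel; trailing consonants
        # with no vowel are simply dropped in this sketch.
        units, current = [], []
        for label, duration in segments:
            current.append((label, duration))
            if label == "V":             # a vowel closes the unit
                consonants = [d for l, d in current if l == "C"]
                vowels = [d for l, d in current if l == "V"]
                units.append({
                    "consonantal_duration": sum(consonants),
                    "vowel_duration": sum(vowels),
                    "complexity": len(consonants),
                })
                current = []
        return units

    # Toy vowel/consonant segmentation: (label, duration in seconds).
    segments = [("C", 0.06), ("C", 0.05), ("V", 0.11), ("C", 0.07), ("V", 0.09), ("C", 0.04)]
    print(pseudo_syllables(segments))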