    Midbrain areas as candidates for audio-vocal interface in echolocating bats

    Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

    Automatic emotion recognition from speech has recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity involved in defining a gold standard from a pool of raters and the scarcity of training data. In this work, we introduce a novel emotion recognition system based on an ensemble of single-speaker regression models (SSRMs). The emotion estimate is obtained by combining a subset of the initial pool of SSRMs, selected as those most concordant with one another. The proposed approach allows speakers to be added to or removed from the ensemble without rebuilding the entire machine learning system. The simplicity of this aggregation strategy, coupled with the flexibility afforded by the modular architecture and the promising results obtained on the RECOLA database, highlights the potential of the proposed method in real-life scenarios, in particular in web-based applications.
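
    A minimal sketch of the aggregation idea, not the authors' exact implementation: the hypothetical snippet below scores each single-speaker model by its mean concordance correlation coefficient (CCC) with the other models' prediction tracks, keeps the k most concordant ones, and averages them. The data, the value of k, and the use of CCC as the concordance measure are all assumptions.

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two prediction tracks."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

def ensemble_predict(ssrm_outputs, k):
    """Average the k SSRM tracks that agree most with the others.

    ssrm_outputs: (n_models, n_frames) array, one row per speaker model.
    """
    n = len(ssrm_outputs)
    scores = [np.mean([ccc(ssrm_outputs[i], ssrm_outputs[j])
                       for j in range(n) if j != i])
              for i in range(n)]
    top = np.argsort(scores)[-k:]          # indices of the most concordant subset
    return ssrm_outputs[top].mean(axis=0)  # combined emotion estimate

# Adding or removing a speaker only changes the rows of ssrm_outputs;
# nothing else in the system has to be re-trained.
preds = np.random.randn(10, 500)  # 10 SSRMs, 500 time frames (synthetic)
arousal_track = ensemble_predict(preds, k=5)
```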

    Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets

    In this paper, we describe a new database of audio recordings of non-native (L2) speakers of English, together with the perceptual evaluation experiment conducted with native English speakers to assess the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English can be reliably assessed using suprasegmental audio features; prosodic features appear to be the most important ones.
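
    To make the pipeline concrete, here is a hedged Python sketch of one possible gold-standard computation (a plain mean over raters) and of early versus late fusion of the feature groups; all data are synthetic, the feature-group dimensions are invented, and Ridge regression merely stands in for whatever regressor was actually used.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical per-recording rater scores: the gold standard here is the
# plain mean across raters, one of several possible aggregation methods.
rater_scores = np.random.rand(200, 12)        # 200 recordings, 12 raters
gold = rater_scores.mean(axis=1)

# Hypothetical feature groups (prosody, fluency, voice quality, spectral).
groups = {name: np.random.randn(200, d)
          for name, d in [("prosody", 40), ("fluency", 10),
                          ("voice_quality", 15), ("spectral", 60)]}

# Early fusion: concatenate all groups into one feature vector.
early = Ridge().fit(np.hstack(list(groups.values())), gold)

# Late fusion: one regressor per group, predictions averaged.
models = {n: Ridge().fit(X, gold) for n, X in groups.items()}

def late_fusion(feats):
    """Average the per-group predictions for a dict of feature matrices."""
    return np.mean([models[n].predict(feats[n]) for n in feats], axis=0)
```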

    Stellar clusters in the inner Galaxy and their correlation with cold dust emission

    Stars are born within dense clumps of giant molecular clouds, constituting young stellar agglomerates known as embedded clusters, which evolve into bound open clusters only under special conditions. We statistically study all embedded clusters (ECs) and open clusters (OCs) known so far in the inner Galaxy, investigating in particular their interaction with the surrounding molecular environment and the differences in their evolution. We first compiled a merged list of 3904 clusters from optical and infrared cluster catalogs in the literature, including 75 new (mostly embedded) clusters discovered by us in the GLIMPSE survey. From this list, 695 clusters lie within the Galactic range |l| < 60 deg and |b| < 1.5 deg covered by the ATLASGAL survey, which was used to search for correlations with submm dust continuum emission tracing dense molecular gas. We defined an evolutionary sequence of five morphological types: deeply embedded cluster (EC1), partially embedded cluster (EC2), emerging open cluster (OC0), OC still associated with a submm clump in the vicinity (OC1), and OC without correlation with ATLASGAL emission (OC2). In parallel, we performed a thorough literature survey of these 695 clusters, compiling a considerable number of physical and observational properties in a catalog that is publicly available. We found that an OC defined observationally as OC0, OC1, or OC2 and confirmed as a real cluster is equivalent to the physical concept of an OC (a bound exposed cluster) for ages in excess of ~16 Myr; some observed OCs younger than this limit can actually be unbound associations. We found that our OC and EC samples are roughly complete up to ~1 kpc and ~1.8 kpc from the Sun, respectively, beyond which the completeness decays exponentially. Using available age estimates for a few ECs, we derived an upper limit of 3 Myr for the duration of the embedded phase... (Abridged)
    Comment: 39 pages, 9 figures. Accepted for publication in A&A on Sept 16, 2013. The catalog will be available at the CDS after official publication of the article.
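
    The five-type sequence amounts to a decision rule over a cluster's correlation with ATLASGAL submm emission. The Python sketch below is purely illustrative; its boolean criteria are placeholders, not the paper's actual observational thresholds.

```python
from enum import Enum

class MorphType(Enum):
    EC1 = "deeply embedded cluster"
    EC2 = "partially embedded cluster"
    OC0 = "emerging open cluster"
    OC1 = "open cluster with a submm clump in the vicinity"
    OC2 = "open cluster without ATLASGAL emission"

def classify(emission_on_cluster: bool, mostly_embedded: bool,
             residual_emission: bool, clump_nearby: bool) -> MorphType:
    """Toy mapping from submm-emission correlation to morphological type.

    The criteria are hypothetical stand-ins for the observational
    thresholds actually used to define the sequence.
    """
    if emission_on_cluster:
        return MorphType.EC1 if mostly_embedded else MorphType.EC2
    if residual_emission:
        return MorphType.OC0
    return MorphType.OC1 if clump_nearby else MorphType.OC2

print(classify(False, False, False, True))  # MorphType.OC1
```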

    Lubrication and performance of high-speed rolling-element bearings

    Trends in aircraft engine operating speeds have dictated the need for rolling-element bearings capable of speeds up to 3 million DN. A review of state-of-the-art performance and lubrication of high-speed rolling-element bearings is presented. Through the use of under-race lubrication and bearing thermal management, bearing operation can be achieved at speeds up to 3 million DN. Jet-lubricated ball bearings are limited to 2.5 million DN for large bore sizes and to 3 million DN for small bore sizes. Current computer programs are able to predict bearing thermal performance.
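
    For reference, the DN value is the bearing bore diameter in millimetres multiplied by the shaft speed in rev/min. A short worked example in Python (the bore and speed figures are illustrative, not taken from the review):

```python
def dn_value(bore_mm: float, speed_rpm: float) -> float:
    """DN = bore diameter (mm) x shaft speed (rpm)."""
    return bore_mm * speed_rpm

# A 120 mm bore bearing at 25,000 rpm operates in the 3 million DN regime:
print(dn_value(120, 25_000))       # 3000000.0

def max_speed_rpm(bore_mm: float, dn_limit: float) -> float:
    """Highest shaft speed permitted by a given DN limit,
    e.g. 2.5 million DN for jet-lubricated large-bore ball bearings."""
    return dn_limit / bore_mm

print(max_speed_rpm(120, 2.5e6))   # ~20833 rpm
```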

    Shared acoustic codes underlie emotional communication in music and speech—Evidence from deep transfer learning

    Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, such that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities for enlarging the amount of data available to develop music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (arousal and valence) in music and speech, and the transfer learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained on one modality and tested on the other). In the cross-domain context, we evaluate two strategies: direct transfer between domains, and the contribution of transfer-learning techniques (feature-representation transfer based on denoising autoencoders) for reducing the gap between the feature-space distributions. Our results demonstrate excellent cross-domain generalisation performance with and without feature-representation transfer in both directions. For music, cross-domain approaches outperformed intra-domain models for valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
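
    A rough illustration of the feature-representation-transfer idea: train a denoising autoencoder to reconstruct the clean features of both domains from corrupted inputs, then use its hidden layer as a shared encoding for cross-domain training and testing. scikit-learn's MLPRegressor stands in here for the paper's actual DAE, and all feature matrices are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical acoustic feature matrices for the two domains.
X_music  = np.random.randn(1000, 64)   # training domain
X_speech = np.random.randn(1000, 64)   # target domain

# Denoising autoencoder: reconstruct clean features of BOTH domains from
# corrupted inputs, so the hidden layer learns a shared representation.
X_all = np.vstack([X_music, X_speech])
noisy = X_all + 0.1 * np.random.randn(*X_all.shape)
dae = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                   random_state=0).fit(noisy, X_all)

def encode(X, dae):
    """Project features into the DAE's hidden space (first layer only)."""
    h = X @ dae.coefs_[0] + dae.intercepts_[0]
    return np.maximum(h, 0)  # ReLU, MLPRegressor's default activation

# A regressor trained on encode(X_music, dae) can then be tested on
# encode(X_speech, dae), reducing the gap between feature distributions.
```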