
    Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

    Automatic emotion recognition from speech has recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity involved in defining a gold standard from a pool of raters and the scarcity of data for training models. In this work, we introduce a novel emotion recognition system based on an ensemble of single-speaker regression models (SSRMs). The emotion estimate is obtained by combining a subset of the initial pool of SSRMs, selecting those that are most concordant with one another. The proposed approach allows speakers to be added to or removed from the ensemble without rebuilding the entire machine learning system. The simplicity of this aggregation strategy, the flexibility afforded by the modular architecture, and the promising results obtained on the RECOLA database highlight the potential of the proposed method in real-life scenarios, and in particular in web-based applications.
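
    The aggregation step can be illustrated with a minimal Python sketch. It assumes that concordance between SSRMs is measured with Lin's concordance correlation coefficient (CCC), that each model is ranked by its mean pairwise CCC with the rest of the pool, and that the ensemble output is the plain frame-wise mean of the selected models. The function names, the `keep` parameter and this ranking rule are hypothetical choices for illustration, not necessarily the authors' exact procedure.

```python
from statistics import fmean


def ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length traces."""
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def select_concordant(predictions, keep):
    """Return the indices of the `keep` models with the highest mean pairwise CCC.

    `predictions` holds one prediction trace (one value per frame) per SSRM.
    Hypothetical rule: score each model by its average CCC with all other models.
    """
    n = len(predictions)
    scores = []
    for i in range(n):
        others = [ccc(predictions[i], predictions[j]) for j in range(n) if j != i]
        scores.append(fmean(others))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def ensemble_estimate(predictions, keep):
    """Average the selected models frame by frame to obtain the emotion trace."""
    chosen = select_concordant(predictions, keep)
    return [fmean(frame) for frame in zip(*(predictions[i] for i in chosen))]
```

    Because each SSRM is trained on its own speaker, adding or removing a speaker in this sketch only changes the list passed to `ensemble_estimate`; no other model has to be retrained.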

    Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets

    In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers to assess the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups: features modelling prosody in general (excluding speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using suprasegmental audio features; prosodic features seem to be the most important ones.
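
    As an illustration of how a gold standard can be derived from several raters, the following Python sketch contrasts the plain arithmetic mean with a correlation-weighted mean in the spirit of the evaluator weighted estimator (EWE). The weighting rule used here (each rater weighted by the Pearson correlation of their ratings with the mean of the other raters, with negative correlations clipped to zero) is an assumption for illustration; the abstract does not specify which methods were compared.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mean_gold_standard(ratings):
    """Arithmetic mean over raters; ratings[r][i] is rater r's score for recording i."""
    return [fmean(col) for col in zip(*ratings)]


def weighted_gold_standard(ratings):
    """Correlation-weighted mean (EWE-style sketch with an assumed weighting rule)."""
    weights = []
    for r, own in enumerate(ratings):
        others = [row for k, row in enumerate(ratings) if k != r]
        reference = mean_gold_standard(others)
        # Raters who disagree with the rest of the pool get zero weight.
        weights.append(max(pearson(own, reference), 0.0))
    total = sum(weights)
    if total == 0:
        # No rater agrees with the others: fall back to the plain mean.
        return mean_gold_standard(ratings)
    return [
        sum(w * row[i] for w, row in zip(weights, ratings)) / total
        for i in range(len(ratings[0]))
    ]
```

    Clipping negative correlations keeps a rater who systematically disagrees from pulling the gold standard in the opposite direction, which the plain mean cannot prevent.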

    I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and both read and spontaneous speech; the database is made publicly available for research purposes. We start by demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification using both brute-forced low-level acoustic features and higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating among the six types of food and not eating. The early fusion of intelligibility-related features with the brute-forced acoustic feature set improves performance on read speech, reaching 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a coefficient of determination of up to 56.2%.
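
    The evaluation protocol can be sketched in Python. The sketch assumes that "average recall" denotes the unweighted average recall (UAR), i.e. the mean of the per-class recalls, which is the usual convention in computational paralinguistics, and it pools the predictions of all folds before scoring (one common choice). The `train_and_predict` callable is a hypothetical stand-in for the SVM used in the paper; a majority-class baseline fills that role here so the example runs without external libraries.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def leave_one_speaker_out(speakers):
    """Yield (speaker, train_indices, test_indices), holding out one speaker per fold."""
    for s in sorted(set(speakers)):
        test = [i for i, spk in enumerate(speakers) if spk == s]
        train = [i for i, spk in enumerate(speakers) if spk != s]
        yield s, train, test


def majority_baseline(train_x, train_y, test_x):
    """Hypothetical stand-in for the SVM: predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)


def evaluate(features, labels, speakers, train_and_predict=majority_baseline):
    """Run leave-one-speaker-out, pool the fold predictions and score them with UAR."""
    predictions = [None] * len(labels)
    for _, train, test in leave_one_speaker_out(speakers):
        fold_pred = train_and_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        for i, p in zip(test, fold_pred):
            predictions[i] = p
    return unweighted_average_recall(labels, predictions)
```

    Holding out every utterance of one speaker per fold ensures that no speaker appears in both training and test data, so the score reflects generalisation to unseen speakers.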

    Life long learning in rural areas: a report to the Countryside Agency

    Lifelong Learning is a broad umbrella term which includes many different kinds of provision and different forms of learning. At its heart is formal learning, often classroom based, or involving paper and electronic media, undertaken within educational institutions such as colleges and universities. It may or may not lead to an award, and it includes learning undertaken for vocational reasons as well as for general interest. It encompasses what are sometimes also known as adult education, continuing education, continuing professional development (CPD), vocational training and the acquisition of basic skills. It may also include work-based learning, and may overlap with post-compulsory (post-16) education, i.e. with further education and higher education, but normally applies to all ‘adult learning’, i.e. learning by people over the age of 19, in particular those who are returning to study after completing their initial education. From the perspective of the individual learner, however, non-formal learning (organised, systematic study carried on outside the framework of the formal system) is also important. This forms a continuum with informal learning, which occurs frequently in the process of daily living, sometimes coincidentally, for example through information media or through interpretive provision (such as at museums or heritage sites). This report focuses on those aspects of adult learning which are directly affected by government policies, and thus of prime concern for rural proofing.

    Canonical differential geometry of string backgrounds

    String backgrounds and D-branes do not possess the structure of Lorentzian manifolds, but that of manifolds with an area metric. Area metric geometry is a true generalization of metric geometry, which in particular may accommodate a B-field. While an area metric does not determine a connection, we identify the appropriate differential geometric structure that is relevant for the minimal surface equation in such a generalized geometry. In particular, the notion of a derivative action of areas on areas emerges naturally. Area metric geometry provides new tools in differential geometry, which promise to play a role in the description of gravitational dynamics on D-branes. Comment: 20 pages, no figures, improved journal version
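
    To make the claim that area metric geometry generalizes metric geometry concrete, the block below records the standard construction under the conventions commonly used in the area-metric literature (assumed here rather than quoted from the paper). An area metric is a fourth-rank covariant tensor with the symmetries in the first line, invertible when viewed as a map on bivectors. A metric $g$ induces one (second line), and the induced area metric reproduces the squared area of the parallelogram spanned by two vectors $X$ and $Y$ (third line). A general area metric need not be of the induced form, which is what leaves room for additional structure.

```latex
\begin{aligned}
&G_{abcd} = G_{cdab}, \qquad G_{abcd} = -G_{bacd}, \\
&G^{(g)}_{abcd} = g_{ac}\,g_{bd} - g_{ad}\,g_{bc}, \\
&G^{(g)}_{abcd}\,X^a Y^b X^c Y^d = g(X,X)\,g(Y,Y) - g(X,Y)^2 .
\end{aligned}
```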

    Naturalistic Affective Expression Classification by a Multi-Stage Approach Based on Hidden Markov Models

    In naturalistic behaviour, the affective states of a person change at a rate much slower than the typical rate at which video or audio is recorded (e.g., 25 fps for video). Hence, there is a high probability that consecutive recorded instants of expressions represent the same affective content. In this paper, a multi-stage automatic affective expression recognition system is proposed that uses Hidden Markov Models (HMMs) to take this temporal relationship into account and finalize the classification process. The hidden states of the HMMs are associated with the levels of affective dimensions, converting the classification problem into a best-path-finding problem in the HMM. The system was tested on the audio data of the Audio/Visual Emotion Challenge (AVEC) datasets, showing performance significantly above that of a one-stage classification system that does not take the temporal relationship into account, as well as above the baseline set by the Challenge. Owing to the generality of the approach, the system could be applied to other affective modalities.
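
    The second stage, in which quantized levels of an affective dimension act as hidden states and the label sequence is recovered as the best path, can be illustrated with a standard Viterbi decoder. In this Python sketch the per-frame emission scores are assumed to be log-posteriors from a first-stage classifier (a simplification), and the "sticky" transition matrix with a high self-transition probability is a hypothetical choice encoding the slow change of affect described above; neither is taken from the paper.

```python
import math


def viterbi(log_emissions, log_trans, log_init):
    """Most likely state sequence given per-frame emission log-scores.

    log_emissions[t][s]: log score of the observation at frame t under state s
    log_trans[i][j]:     log P(state j at t+1 | state i at t)
    log_init[s]:         log P(state s at t=0)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backptr = []
    for frame in log_emissions[1:]:
        prev, step_ptr, score = score, [], []
        for j in range(n_states):
            # Best predecessor of state j at this frame.
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            step_ptr.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + frame[j])
        backptr.append(step_ptr)
    # Trace the best path backwards from the highest-scoring final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for step_ptr in reversed(backptr):
        state = step_ptr[state]
        path.append(state)
    return path[::-1]


def sticky_transitions(n_states, stay=0.9):
    """Hypothetical transition model: remain at the same level with probability `stay`."""
    move = (1.0 - stay) / (n_states - 1)
    return [[math.log(stay if i == j else move) for j in range(n_states)]
            for i in range(n_states)]
```

    With three levels and frame-wise posteriors whose argmax is `[0, 0, 1, 0, 0]`, the sticky transitions make the decoder return `[0, 0, 0, 0, 0]`: a single noisy frame is not worth two unlikely level changes, which is the smoothing effect a one-stage frame-wise classifier lacks.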

    Multiple episodes of star formation in the CN15/16/17 molecular complex

    We have started a campaign to identify massive star clusters inside bright molecular bubbles towards the Galactic Center. The CN15/16/17 molecular complex is the first target of our study. The region is characterized by the presence of two young clusters, DB10 and DB11, visible in the NIR; an ultra-compact HII region identified in the radio; several young stellar objects visible in the MIR; bright diffuse nebulosity at 8 μm arising from PAHs; and sub-mm continuum emission revealing the presence of cold dust. Given its position on the sky (l = 0.58, b = -0.85) and its kinematic distance of ~7.5 kpc, the region was thought to be a very massive star-forming site close to the CMZ, and the cluster DB11 was estimated to be as massive as 10^4 M_sun. However, the region's properties were known only through photometry, and its kinematic distance was very uncertain given its location at the tangent point. We aimed to better characterize the region and to assess whether it could be a site of massive star formation located close to the Galactic Center. We have obtained NTT/SofI JHKs photometry and long-slit K-band spectroscopy of the brightest members. We have additionally collected data in the radio, sub-mm and mid-infrared, which together yield a quite different picture of the region. We have confirmed the presence of massive early B-type stars and have derived a spectro-photometric distance of ~1.2 kpc, much smaller than the kinematic distance. Adopting this distance, we obtain cluster masses of M(DB10) ~ 170 M_sun and M(DB11) ~ 275 M_sun. This is consistent with the absence of any O star, as confirmed by the excitation/ionization status of the nebula. No diffuse HeI emission is detected at 2.113 μm in our spectroscopic observations, whereas it would be expected if the region hosted more massive stars. Radio continuum measurements are also consistent with the region hosting at most early B stars. Comment: Accepted for publication in Astronomy and Astrophysics. Figs. 1 and 3 presented in reduced resolution.
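
    A spectro-photometric distance follows from the distance modulus: once spectroscopy fixes a star's spectral type, and hence its absolute magnitude $M_K$, the apparent magnitude $m_K$ and the extinction $A_K$ give $d = 10^{(m_K - M_K - A_K + 5)/5}$ pc. The Python sketch below applies this textbook relation; the function names are hypothetical and any input values are illustrative, not measurements from the paper.

```python
import math


def distance_pc(apparent_mag, absolute_mag, extinction_mag):
    """Distance in parsecs from the extinction-corrected distance modulus.

    m - M = 5 * log10(d / 10 pc) + A  =>  d = 10 ** ((m - M - A + 5) / 5)
    """
    return 10 ** ((apparent_mag - absolute_mag - extinction_mag + 5) / 5)


def distance_modulus(distance):
    """Inverse relation, ignoring extinction: m - M = 5 * log10(d / 10 pc)."""
    return 5 * math.log10(distance / 10)
```

    Because this estimate does not depend on a Galactic rotation model, it is unaffected by the weak dependence of radial velocity on distance near the tangent point, which is what makes the kinematic distance so uncertain there.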

    Faster Base64 Encoding and Decoding Using AVX2 Instructions

    Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. This motivates us to improve the efficiency of base64 encoding and decoding. Compared to state-of-the-art implementations, we speed up encoding by roughly 10x and decoding by roughly 7x. We achieve these results by using the single-instruction-multiple-data (SIMD) instructions available on recent Intel processors (AVX2). Our accelerated software abides by the specification and reports errors when encountering characters outside of the base64 set. It is available online as free software under a liberal license. Comment: software at https://github.com/lemire/fastbase6
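
    For reference, the scalar algorithm that SIMD code vectorizes maps every 3 input bytes to four 6-bit indices into the 64-character alphabet, and decoding reverses the mapping while rejecting characters outside that alphabet. The Python sketch below shows only this scalar logic (standard alphabet, `=` padding); it is not the AVX2 implementation described in the paper, and the function names are hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")  # pack up to 24 bits
        indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[k] for k in indices[:len(chunk) + 1]]
        out.append("".join(chars).ljust(4, "="))  # pad a short final group
    return "".join(out)


def decode(text: str) -> bytes:
    if len(text) % 4:
        raise ValueError("length is not a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        group = text[i:i + 4]
        pad = len(group) - len(group.rstrip("="))
        if pad > 2 or (pad and i + 4 != len(text)):
            raise ValueError("misplaced padding")
        n = 0
        for ch in group[:4 - pad]:
            if ch not in DECODE:
                # Report characters outside the base64 set instead of skipping them.
                raise ValueError(f"invalid base64 character {ch!r}")
            n = (n << 6) | DECODE[ch]
        n <<= 6 * pad
        out += n.to_bytes(3, "big")[:3 - pad]
    return bytes(out)
```

    Every 4-character group decodes independently of the others, which is the property that lets a vectorized implementation process many groups per instruction.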

    The ISOGAL field FC-01863+00035: Mid-IR interstellar extinction and stellar populations

    A $0.35^\circ \times 0.29^\circ$ field centered at $l = -18.63^\circ$, $b = 0.35^\circ$ was observed during the ISOGAL survey by ISOCAM imaging at 7 μm and 15 μm. A total of 648 objects were detected and their brightnesses measured. By combining these with the DENIS data in the near-infrared J and $K_S$ bands, one derives the extinction at 7 μm through $A_{K_S} - A_7 = 0.35\,(A_J - A_{K_S})$, which yields $A_7/A_V \sim 0.03$ from the near-IR extinction values of van de Hulst–Glass (Glass 1999). The extinction structure along the line of sight is then determined from the $J - K_S$ or $K_S - [7]$ colors of the ISOGAL sources identified as RGB or early AGB stars with mild mass loss. The values of $A_V$ range from 0 to $\sim 45$, and their distribution reflects the concentration of the extinction in the spiral arms. Based on their locations in color-magnitude diagrams and a few cross-identifications with IRAS and MSX sources, the nature of the objects is discussed in comparison with the case of a low-extinction field in Baade's Window. Most of the objects are either AGB stars with moderate mass-loss rates or luminous RGB stars. Some of them may be AGB stars with high mass-loss rates. In addition, a few young stellar objects (YSOs) are present.
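
    The step from the color relation to $A_7/A_V$ is simple algebra: dividing $A_{K_S} - A_7 = 0.35\,(A_J - A_{K_S})$ by $A_V$ and solving for $A_7$ gives the first line below, into which the van de Hulst–Glass ratios $A_J/A_V$ and $A_{K_S}/A_V$ are substituted. The second line is the usual way a color excess is converted into $A_V$ for a star of known intrinsic color $(J-K_S)_0$; the abstract does not state the intrinsic colors adopted, so $(J-K_S)_0$ is left symbolic here.

```latex
\begin{aligned}
\frac{A_7}{A_V} &= \frac{A_{K_S}}{A_V} - 0.35\left(\frac{A_J}{A_V} - \frac{A_{K_S}}{A_V}\right)
  = 1.35\,\frac{A_{K_S}}{A_V} - 0.35\,\frac{A_J}{A_V}, \\
A_V &= \frac{(J-K_S) - (J-K_S)_0}{A_J/A_V - A_{K_S}/A_V}.
\end{aligned}
```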