    Emotion Recognition from Acted and Spontaneous Speech

    This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition from acted speech, with results reported on two databases of acted emotional speech in different languages. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space. The second part of the thesis is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, which are based on telephone recordings obtained from real call centers. The knowledge gained from the experiments with acted speech was exploited to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker’s emotional state on gender recognition performance and proposes a system for the automatic identification of successful phone calls in call centers by means of dialogue features.

    Parallel task in Subjective Audio Quality and Speech Intelligibility Assessments

    This thesis deals with the subjective testing of both speech quality and speech intelligibility. It investigates the existing methods, identifies their main principles, and compares their advantages and disadvantages. The work also compares different tests in terms of various parameters and provides a modern solution for existing subjective testing methods. The first part of the research deals with the repeatability of subjective speech quality tests conducted in ideal laboratory conditions. Repeatability is assessed using Pearson correlation, pairwise comparison, and other mathematical analyses, which are meant to confirm that the test procedures were carried out correctly. For that reason, four subjective speech quality tests were conducted in three different laboratories. The results confirmed that the tests were highly repeatable and that the test requirements were strictly followed. A further study was conducted to verify the significance of speech quality and speech intelligibility tests in communication systems. To this end, more than 16 million live call records from VoIP telecommunications networks were analyzed. The results confirmed the primary assumption that a better user experience leads to longer call durations. Alongside the main results, other valuable conclusions were drawn. The next step of the thesis was to investigate the parallel task technique, the existing approaches, and their advantages and disadvantages. It turned out that the majority of parallel tasks used in tests were either physically or mentally oriented. Because subjects are in most cases not equally capable physically or mentally, their performance during the tasks is not equal either, so the results cannot be compared correctly. In this thesis, a novel approach is proposed in which the conditions for all subjects are equal. The approach comprises a variety of tasks that combine mental and physical load (a laser-shooting simulator, a car driving simulator, object sorting, and others). These methods were then used in several subjective speech quality and speech intelligibility tests. The results indicate that tests with parallel tasks yield more realistic values than those conducted in laboratory conditions. Based on the research, experience, and achieved results, a new standard was submitted to the European Telecommunications Standards Institute with an overview, examples, and recommendations for conducting subjective speech quality and speech intelligibility tests. The standard was accepted and published under the number ETSI TR 103 503.
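
    The repeatability check described above rests on Pearson correlation between the scores that different laboratories obtain for the same test conditions. A minimal sketch of that computation follows; the function name `pearson` and the per-condition mean opinion scores for the two laboratories are invented for illustration and are not data from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical mean scores for the same five conditions in two laboratories.
lab_a = [1.8, 2.5, 3.1, 3.9, 4.4]
lab_b = [1.9, 2.4, 3.3, 3.8, 4.5]

r = pearson(lab_a, lab_b)
print(f"Pearson r between laboratories: {r:.3f}")
```

    A coefficient close to 1 indicates that both laboratories rank and space the conditions similarly. It says nothing about a constant offset between laboratories, which is one reason such a check is usually paired with other analyses such as the pairwise comparisons mentioned above.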

    A revised speech spectrum for STI calculations

    The ability of the Speech Transmission Index (STI) to predict speech intelligibility under noisy conditions is highly dependent on the assumed spectrum of the speech signal. Examination of the literature showed that the long-term average speech spectrum of male talkers differs substantially from the speech spectrum recommended for STI calculations (IEC 60268-16). To explore these issues, the long-term average speech spectrum of forty male British English talkers was first measured, compared with the available literature, and proposed for STI calculations. Then, using several voice alarm systems, the influence of the measured spectrum on STI calculations was assessed and comparisons were made with the standard speech spectrum. The results showed significant STI differences under noisy conditions and considerable reductions in the required electrical power with the use of the newly proposed male spectrum. This indicates that the current STI method could benefit from a revised speech spectrum.

    A novel method for subjective picture quality assessment and further studies of HDTV formats

    This is the author's accepted manuscript. Copyright © IEEE 2008. This paper proposes a novel method for the assessment of picture quality, called triple stimulus continuous evaluation scale (TSCES), to allow the direct comparison of different HDTV formats. The method uses an upper picture quality anchor and a lower picture quality anchor with defined impairments. The HDTV format under test is evaluated in a subjective comparison with the upper and lower anchors. The method utilizes three displays in a particular vertical arrangement. In an initial series of tests with the novel method, the HDTV formats 1080p/50, 1080i/25, and 720p/50 were compared at various bit-rates and with seven different content types on three identical 1920 × 1080 pixel displays. It was found that the new method provided stable and consistent results. The method was tested with 1080p/50, 1080i/25, and 720p/50 HDTV images that had been coded with H.264/AVC High profile. The result of the assessment was that the assessors rated the progressive HDTV formats higher than the interlaced HDTV format. A system chain proposal is given for future media production and delivery to take advantage of this outcome. Recommendations for future research conclude the paper.

    Cultures of Aspiration and Poverty? Aspirational Inequalities in Northeast and Southern Thailand

    The paper provides micro-level evidence of rising inequality in Thailand, using data from an intensive study of seven communities in Northeast and Southern Thailand. This inequality affects participants’ material and subjective wellbeing, their aspirations, and the extent to which they feel these are realised. The paper argues that adaptation, expressed as reduced aspirations, could explain why the effect of material poverty on people’s satisfaction with their lives is small. The reduction in attainment of aspirations linked to socio-economic status suggests that a small but constant group of people is being excluded from a shift in the societal consensus over what constitutes a good life.

    Subjective listening experiments for annoyance investigation

    Noise limits and guidelines that consider only the sound pressure level or the loudness of noises are not efficient in protecting people from all the adverse effects of noise. Other physical characteristics, e.g., tonality, modulation, and frequency content, should also be considered, especially when the noise level is low and cannot cause hearing risk but might lead to annoyance and disturbance. Annoying noises can comply with regulations despite their adverse effects, because their sound pressure level does not exceed any noise limit. Annoying noises have an impact on health and well-being, but this impact and its relationship with the physical properties have not been sufficiently studied. Subjective annoyance caused by noises like those we experience in living spaces and offices should be further investigated via psychoacoustic laboratory experiments. The primary aim of this work was to develop a systematic, effective, and reliable methodology for performing this type of psychoacoustic test. The secondary aim was to investigate the objective metrics that best predict subjective annoyance in four typical noise conditions: ventilation noise in office spaces, traffic noise in homes, neighbors’ noise in homes, and noises with tonal components in homes. The main result was the development of the methodology, which in turn enabled us to define our own standards and guidelines. Furthermore, we identified the objective metrics that best correlated with subjective annoyance in each of the four studied noise situations. In offices, five metrics predicted subjective ratings reasonably well. Noise with sound energy at higher frequencies was less tolerated. Noise with a slope of -7 dB per octave band increment resulted in the highest satisfaction. In dwellings, for neighbors’ living sounds, four metrics of airborne sound insulation predicted annoyance well. We demonstrated that the 50–80 Hz bands should not be included in the objective rating. In dwellings, for five types of traffic noise transmitted through façade elements, one metric, Rw+C50–3150, performed significantly better than the others. The last experiment showed that tonality is not properly considered in current standards and noise guidelines. The psychoacoustic research performed demonstrated that physical properties other than the sound pressure level should be considered when assessing noise annoyance, and it provided evidence for the objective metrics that would make noise guidelines more efficient with respect to health protection.
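
    To make the "-7 dB per octave band" spectrum slope concrete, the short sketch below generates octave-band levels that fall by a fixed amount from one band to the next. The helper name `sloped_band_levels`, the 60 dB starting level, and the 63 Hz to 8 kHz band range are assumptions chosen for illustration; the abstract gives only the slope.

```python
# Standard octave-band centre frequencies; the range used here is an assumption.
OCTAVE_CENTERS_HZ = [63, 125, 250, 500, 1000, 2000, 4000, 8000]


def sloped_band_levels(start_level_db, slope_db_per_octave,
                       centers=OCTAVE_CENTERS_HZ):
    """Return {centre frequency: level in dB} for a spectrum whose level
    changes by `slope_db_per_octave` at each successive octave band."""
    return {
        fc: start_level_db + slope_db_per_octave * i
        for i, fc in enumerate(centers)
    }


# Hypothetical 60 dB level in the 63 Hz band, falling 7 dB per octave.
levels = sloped_band_levels(60.0, -7.0)
for fc, level in levels.items():
    print(f"{fc:>5} Hz: {level:5.1f} dB")
```

    With these assumed values, the level drops by 49 dB across the seven octave steps from 63 Hz to 8 kHz, which shows how strongly such a slope suppresses the high-frequency energy that listeners in the office experiment tolerated least.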

    Understanding user experience of mobile video: Framework, measurement, and optimization

    Since users have become the focus of product and service design in the last decade, the term User eXperience (UX) has been frequently used in the field of Human-Computer Interaction (HCI). Research on UX facilitates a better understanding of the various aspects of the user’s interaction with a product or service. Mobile video, as a new and promising service and research field, has attracted great attention. Due to the significance of UX in the success of mobile video (Jordan, 2002), many researchers have focused on this area, examining users’ expectations, motivations, requirements, and usage context. As a result, many influencing factors have been explored (Buchinger, Kriglstein, Brandt & Hlavacs, 2011; Buchinger, Kriglstein & Hlavacs, 2009). However, a general framework specific to mobile video services, which could structure such a large number of factors, is lacking. To measure the user experience of multimedia services such as mobile video, quality of experience (QoE) has recently become a prominent concept. In contrast to the traditionally used concept of quality of service (QoS), QoE not only involves objectively measuring the delivered service but also takes into account the user’s needs and desires when using the service, emphasizing the user’s overall acceptance of the service. Many QoE metrics are able to estimate the user-perceived quality or acceptability of mobile video, but may not be accurate enough for overall UX prediction due to the complexity of UX. Only a few QoE frameworks have addressed more aspects of UX for mobile multimedia applications, and these still need to be transformed into practical measures. The challenge in optimizing UX remains adapting to resource constraints (e.g., network conditions, mobile device capabilities, and heterogeneous usage contexts) while meeting complicated user requirements (e.g., usage purposes and personal preferences). In this chapter, we investigate the important existing UX frameworks, compare their similarities, and discuss some important features that fit the mobile video service. Based on the previous research, we propose a simple UX framework for mobile video applications by mapping a variety of factors that influence UX onto a typical mobile video delivery system. Each component and its factors are explored with comprehensive literature reviews. The proposed framework may benefit the user-centred design of mobile video by giving complete consideration to UX influences, and may help improve mobile video service quality by adjusting the values of certain factors to produce a positive user experience. It may also facilitate related research by locating important issues to study, clarifying research scopes, and setting up proper study procedures. We then review a large body of research on UX measurement, including QoE metrics and QoE frameworks for mobile multimedia. Finally, we discuss how to achieve an optimal quality of user experience by focusing on the issues of various aspects of UX of mobile video. In the conclusion, we suggest some open issues for future study.

    Human response to aircraft noise

    The human auditory system and the perception of sound are discussed. The major concentration is on the annoyance response and methods for relating the physical characteristics of sound to those psychosociological attributes associated with human response. Results selected from the extensive laboratory and field research conducted on human response to aircraft noise over the past several decades are presented along with discussions of the methodology commonly used in conducting that research. Finally, some of the more common criteria, regulations, and recommended practices for the control or limitation of aircraft noise are examined in light of the research findings on human response.

    NASA and the challenge of ISDN: The role of satellites in an ISDN world

    To understand what role satellites may play in the Integrated Services Digital Network (ISDN), it is necessary to understand the concept of ISDN, including the key organizations involved, the current status of key standards recommendations, and domestic and international progress in the implementation of ISDN. Each of these areas is explained. A summary of the technical performance criteria for ISDN, current standards for satellites in ISDN, key players in the ISDN environment, and the steps that can be taken to encourage the application of satellites in ISDN are also covered.

    Intergenerational Education for Social Inclusion and Solidarity: The Case Study of the EU Funded Project "Connecting Generations"

    This paper reflects on lessons learned from a validated model of international collaboration based on research and practice. During the European Year for Active Ageing, a partnership of seven organizations from the European Union plus Turkey implemented the Lifelong Learning Programme partnership “Connecting Generations”, which involved universities, non-governmental organizations, third-age universities, and municipalities in collaboration with local communities. Recognizing that Europe has dramatically changed in its demographic composition and is facing brand new challenges regarding intergenerational and intercultural solidarity, each partner formulated and tested innovative and creative practices that could enhance collaboration and mutual understanding between youth and senior citizens, toward a more inclusive Europe for all. Several innovative local practices were tested, carefully systematized, and peer-evaluated among the partners. On the basis of a shared theoretical framework coherent with EU and Europe and Training 2020 Strategy, an action-research approach was adopted throughout the project in order to understand common features that have been replicated and scaled up to date.