
    The wisdom of crowds versus the madness of crowds

    Declining trust in northern liberal democratic institutions poses serious challenges to legislatures (parliaments). That mistrust extends to traditional media at a time when new digital media are fanning ‘fake news’ and a ‘madness of crowds’. Will the ‘wisdom of crowds’ on which liberal democracy critically depends prevail over the ‘madness’? Can parliaments resolve that tension positively? In New Zealand, trust in political institutions is still high, but voter turnout has slid, especially among the young. Parliament has work to do.

    A Comparison Between Convolutional and Transformer Architectures for Speech Emotion Recognition

    © 2022, IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted manuscript version of a conference paper which has been published in final form at https://doi.org/10.1109/IJCNN55064.2022.9891882

    Creating speech emotion recognition models that match the human ability to recognise emotions is a long-standing challenge in the field of speech technology, with many potential commercial applications. As transformer-based architectures have recently become the state of the art for many natural language processing applications, this paper investigates their suitability for acoustic emotion recognition and compares them with the well-known AlexNet convolutional approach. The comparison uses several publicly available speech emotion corpora. Experimental results demonstrate the efficacy of the different architectural approaches for particular emotions. The transformer-based models outperform their convolutional counterparts, yielding F1-scores in the range [70.33%, 75.76%]. The paper further provides insights via dimensionality reduction analysis of output-layer activations in both architectures, revealing significantly improved clustering in the transformer-based models while highlighting the nuances in the separability of different emotion classes.
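
    The F1-scores above combine precision and recall for each emotion class. As a minimal sketch, assuming single-label predictions and macro averaging (the abstract does not say which averaging the paper used), per-class F1 is the harmonic mean of precision and recall, and the macro score is the unweighted mean over classes. The emotion labels in the usage lines are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred was claimed but wrong
            fn[truth] += 1  # truth was missed
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        total = precision + recall
        # Harmonic mean of precision and recall; defined as 0 when both are 0.
        per_class[c] = 2 * precision * recall / total if total else 0.0
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical labels: four utterances, one misclassified.
per_class, macro = f1_scores(["angry", "happy", "sad", "sad"],
                             ["angry", "sad", "sad", "sad"])
print(per_class, macro)
```

    Here "angry" scores 1.0, "sad" 0.8 and the never-predicted "happy" 0.0, so the macro F1 is 0.6. Macro averaging weights every emotion equally, so a single poorly recognised class pulls the score down as much as a frequent one.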

    Active Learning for Auditory Hierarchy

    Much audio content today is rendered as a static stereo mix: fundamentally a fixed single entity. Object-based audio envisages delivering sound content as a collection of individual sound ‘objects’ controlled by accompanying metadata. This offers the potential to deliver audio dynamically, providing enhanced audio for consumers. One example of such treatment is applying varying levels of data compression to sound objects, thereby reducing the volume of data to be transmitted in limited-bandwidth situations. This application motivates the ability to accurately classify objects in terms of their ‘hierarchy’, that is, whether an object is a foreground sound, which should be reproduced at full quality if possible, or a background sound, which can be heavily compressed without degrading the listening experience. Lack of suitably labelled data is an acknowledged problem in the domain. Active Learning can greatly reduce the manual effort required to label a large corpus by identifying the instances that are most effective for training a model to high accuracy. This paper compares a number of Active Learning methods to investigate which is most effective for a hierarchical labelling task on an audio dataset. Results show that the number of manual labels required can be reduced to 1.7% of the total dataset while still retaining high prediction accuracy.
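
    Active Learning methods differ mainly in how they choose the next instance to label. The sketch below shows one common strategy, pool-based uncertainty sampling, and is not necessarily one of the methods compared in the paper. It assumes a hypothetical one-dimensional salience score per sound object and a nearest-centroid classifier; the `oracle` function stands in for the human annotator.

```python
def centroids(labelled):
    """Mean feature value per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Nearest-centroid prediction."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def margin(cents, x):
    """Distance gap between the two nearest centroids; small means uncertain."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0]


def active_learn(pool, oracle, seed_idx, budget):
    """Pool-based uncertainty sampling.

    seed_idx must contain at least one example of every class; oracle(i)
    returns the true label of pool[i] (a stand-in for a human annotator).
    """
    labelled_idx = list(seed_idx)
    while len(labelled_idx) < budget:
        cents = centroids([(pool[i], oracle(i)) for i in labelled_idx])
        unlabelled = [i for i in range(len(pool)) if i not in labelled_idx]
        if not unlabelled:
            break
        # Ask for the label the current model is least sure about.
        labelled_idx.append(min(unlabelled, key=lambda i: margin(cents, pool[i])))
    return centroids([(pool[i], oracle(i)) for i in labelled_idx]), labelled_idx


# Hypothetical pool: one salience score per object; >= 0.5 means foreground.
pool = [i / 100 for i in range(100)]
truth = ["foreground" if x >= 0.5 else "background" for x in pool]
cents, labelled = active_learn(pool, lambda i: truth[i], seed_idx=[0, 99], budget=5)
accuracy = sum(predict(cents, x) == y for x, y in zip(pool, truth)) / len(pool)
```

    With a budget of 5, the sketch labels 5% of this toy pool and still classifies nearly all of it correctly, because its queries cluster around the class boundary, where each new label moves the classifier the most.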

    Sonic Elongation: Creative Audition in Documentary Film

    This paper investigates documentary films in which real-world sound captured during the location shoot has been treated more creatively than the captured image, in particular instances in which real-world noises pass freely between sound and musical composition. I call this process sonic elongation from sound to music: a blurring that allows the soundtrack to keep one foot in the image, so that the film retains a loose grip on the traditional nonfiction aesthetic. With reference to several recent documentary feature films, I argue that such moments rely on a confusion between hearing and listening.

    Guanabara Bay: For all hopes to a new awakening of paradise

    Abstract: Exclusion Territories are geographical areas affected by degenerative environmental phenomena of anthropogenic origin that compromise quality of life in general. One of the most prominent examples of such areas is Guanabara Bay and its surroundings, the scene of some of the worst disasters and of frequent episodes of human misery. This article briefly describes the main characteristics of the region and offers technological suggestions for biogeographic recovery, to be adopted by public policies that aim to align with good practices in ecological economics, sustainability and quality of life. The work falls within the context of macro-engineering cum eco-innovation applied to the preservation and management of water sources and water bodies that serve productive purposes as natural niches and breeding grounds.
    Keywords: Exclusion Territories, Guanabara Bay, waste management, quality of life.

    Conditioning Text-to-Speech synthesis on dialect accent: a case study

    Modern text-to-speech systems are modular in many different ways. In recent years, end-users have gained the ability to control speech attributes such as degree of emotion, rhythm and timbre, along with other suprasegmental features. More ambitious objectives involve modelling combinations of speakers and languages, e.g. to enable cross-speaker language transfer. However, no prior work has addressed the finer-grained analysis of regional accents. To fill this gap, in this thesis we present practical end-to-end solutions for synthesising speech while controlling within-country variations of the same language, and we do so for six dialects of the British Isles. In particular, we first conduct an extensive study of the speaker verification field and modify state-of-the-art embedding models to work with dialect accents. Then, we adapt standard acoustic models and voice conversion systems by conditioning them on dialect accent representations, and finally compare our custom pipelines with a cutting-edge end-to-end architecture from the multilingual world. Results show that the adopted models are suitable and have enough capacity to accomplish regional accent conversion: we are able to produce speech closely resembling the selected speaker and dialect accent, with the most accurate synthesis obtained by carefully fine-tuning the multilingual model to the multi-dialect case. Finally, we delineate the limitations of our multi-stage approach and propose practical mitigations, to be explored in future work.
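
    Conditioning on dialect accent can be pictured in two steps: score an utterance embedding against per-accent reference embeddings, as in speaker verification, and attach the chosen accent vector to the acoustic model's inputs. This is a minimal sketch with hypothetical three-dimensional embeddings and hypothetical accent names; real embedding models produce far larger vectors, and concatenation is only one of several ways to condition a network, not necessarily the one used in the thesis.

```python
import math


def cosine(a, b):
    """Cosine similarity, a common scoring function for verification embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def closest_accent(embedding, accent_refs):
    """Return the accent whose reference embedding is most similar."""
    return max(accent_refs, key=lambda name: cosine(embedding, accent_refs[name]))


def condition(encoder_frames, accent_embedding):
    """Concatenate the same accent vector onto every encoder frame."""
    return [frame + accent_embedding for frame in encoder_frames]


# Hypothetical reference embeddings for two accents.
accent_refs = {"scottish": [1.0, 0.1, 0.0], "welsh": [0.0, 1.0, 0.2]}
accent = closest_accent([0.9, 0.2, 0.05], accent_refs)
frames = condition([[0.1, 0.2], [0.3, 0.4]], accent_refs[accent])
```

    Because the accent vector is repeated on every frame, the downstream model sees the accent identity at each time step without the encoder itself having to change.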

    The Biometric Evolution of Sound and Space

    Auditoria in the late 20th and 21st centuries have evolved into a series of spatial conventions that are an established and accepted norm. The relationship between space and music now exists in a decoupled condition, and music no longer relies on volumetric and material conditions to define its form (Glantz 2000). This thesis looks at a series of novel approaches to investigating how the links between music and space can be reconnected through evolutionary computation, parametric modelling, virtual acoustics and biometric sensing. It describes in detail the experiments undertaken in developing methodologies for linking music, space and the body, and shows how it is possible to develop new form-finding and music-generation tools that allow new room shapes and acoustic measures to inform how new acoustic and musical forms can be developed unconsciously and objectively by a listener, in response to sound and site.
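
    As a hypothetical illustration of combining evolutionary computation with a parametric room model, the sketch below evolves the dimensions of a shoebox room towards a target reverberation time predicted by Sabine's formula, RT60 = 0.161 V / (S α). The target, the single absorption coefficient α, the starting room and the mutation size are all assumptions; the thesis's own fitness measures, which draw on virtual acoustics and biometric sensing, are not reproduced here.

```python
import random


def sabine_rt60(length, width, height, alpha):
    """Sabine reverberation time in seconds for a shoebox room (dimensions in metres)."""
    volume = length * width * height
    surface = 2 * (length * width + length * height + width * height)
    return 0.161 * volume / (surface * alpha)


def evolve_room(target_rt60, alpha, generations=200, seed=0):
    """(1+1) evolution strategy over room dimensions.

    A mutated child replaces the parent only if its RT60 error is no worse.
    """
    rng = random.Random(seed)
    parent = [10.0, 8.0, 4.0]  # hypothetical starting room
    error = initial_error = abs(sabine_rt60(*parent, alpha) - target_rt60)
    for _ in range(generations):
        # Gaussian mutation, with each dimension clamped to at least 2 m.
        child = [max(2.0, d + rng.gauss(0.0, 0.5)) for d in parent]
        child_error = abs(sabine_rt60(*child, alpha) - target_rt60)
        if child_error <= error:
            parent, error = child, child_error
    return parent, error, initial_error


room, error, initial_error = evolve_room(target_rt60=1.8, alpha=0.2)
```

    Because a child replaces the parent only when it is no worse, the error can never increase across generations; a richer version would swap the Sabine estimate for a simulated or listener-derived fitness.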

    Designing for quality in real-world mobile crowdsourcing systems

    PhD Thesis
    Crowdsourcing has emerged as a popular means of collecting and analysing data at scale for problems that require human intelligence to resolve. Its prompt response and low cost have made it attractive to businesses and academic institutions. In response, various online crowdsourcing platforms, such as Amazon MTurk, Figure Eight and Prolific, have successfully emerged to facilitate the entire crowdsourcing process. However, the quality of results has been a major concern in the crowdsourcing literature. Previous work has identified various key factors that contribute to quality issues and need to be addressed in order to produce high-quality results. Crowd task design, in particular, is a key factor that affects the efficiency and effectiveness of crowd workers as well as the entire crowdsourcing process. This research investigates crowdsourcing task designs for collecting and analysing two distinct types of data, and examines the value of creating high-quality crowdwork activities on new crowdsourcing-enabled systems for end-users. The main contributions of this research are 1) a set of guidelines for designing crowdsourcing tasks that support the quality collection, analysis and translation of speech and eye-tracking data in real-world scenarios; and 2) crowdsourcing applications that capture real-world data and coordinate the entire crowdsourcing process to analyse it and feed quality results back. Furthermore, this research proposes a new quality control method based on worker trust and self-verification. To achieve this, the research follows a case study approach focusing on two real-world data collection and analysis case studies. The first case study, Speeching, explores real-world speech data collection, analysis and feedback for people with speech disorders, particularly those with Parkinson’s.
The second case study, CrowdEyes, examines the development and use of a hybrid system combining crowdsourcing and low-cost DIY mobile eye trackers for real-world visual data collection, analysis and feedback. Both case studies established that crowdsourcing can obtain high-quality responses comparable to those of an expert. The Speeching app, and the provision of feedback in particular, were well received by the participants, which opens up new opportunities in digital health and wellbeing. In addition, the proposed crowd-powered eye tracker is fully functional under real-world settings. The results showed that this approach outperforms all current state-of-the-art algorithms under all conditions, which opens up the technology to a wide variety of eye-tracking applications in real-world settings.
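
    The abstract does not detail the proposed trust and self-verification method. As a generic illustration only, a trust-weighted majority vote aggregates crowd answers by weighting each worker, and trust can then be re-estimated from each worker's agreement with the consensus. The worker IDs, the eye-tracking labels and the Laplace smoothing in `update_trust` are assumptions of this sketch.

```python
from collections import defaultdict


def weighted_vote(answers, trust):
    """Pick the label backed by the most total trust.

    answers maps worker -> label; trust maps worker -> weight.
    """
    totals = defaultdict(float)
    for worker, label in answers.items():
        totals[label] += trust.get(worker, 1.0)  # unseen workers get weight 1.0
    return max(totals, key=totals.get)


def update_trust(history, answers, consensus):
    """Record agreement with the consensus and return smoothed trust scores.

    history maps worker -> [agreements, attempts] and is updated in place.
    """
    for worker, label in answers.items():
        record = history.setdefault(worker, [0, 0])
        record[0] += int(label == consensus)
        record[1] += 1
    # Laplace smoothing keeps a new or unlucky worker's trust above zero.
    return {w: (agree + 1) / (total + 2) for w, (agree, total) in history.items()}


# Hypothetical eye-tracking labels from three workers.
trust = {"w1": 0.9, "w2": 0.3, "w3": 0.3}
answers = {"w1": "fixation", "w2": "saccade", "w3": "saccade"}
consensus = weighted_vote(answers, trust)  # 0.9 for "fixation" beats 0.3 + 0.3
history = {}
trust = update_trust(history, answers, consensus)
```

    Weighting lets one well-trusted worker outvote two less-trusted ones, which is also why trust estimates need enough history before they are relied on.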

    The quality of experience of next generation audio: exploring system, context and human influence factors

    PhD Thesis
    The next generation of audio reproduction technology has the potential to deliver immersive and personalised experiences to the user; multichannel with-height loudspeaker arrays and binaural techniques offer 3D audio experiences, whereas object-based techniques offer possibilities of adapting content to suit the system, context and user. A fundamental process in the advancement of such technology is perceptual evaluation. It is crucial to understand how listeners perceive new technology in order to drive future developments. This thesis explores the experience provided by next generation audio technology by taking a quality of experience (QoE) approach to evaluation. System, context and human factors all influence QoE, and in this thesis three case studies are presented to explore the role of these categories of influence factors (IFs) in the context of next generation audio evaluation. Furthermore, these case studies explore suitable methods and approaches for evaluating the QoE of next generation audio with respect to its various IFs. Specific contributions delivered by these individual studies include a subjective comparison between soundbar and discrete surround sound technology, the application of the Open Profiling of Quality method to the field of audio evaluation, an understanding of both how and why environmental noise influences preferred audio object balance, an understanding of how the influence of technical audio quality on overall listening experience is related to a range of psychographic variables, and an assessment of the impact of binaural processing on overall listening experience.
Taken as a whole, the research presented here advances the thesis that, to evaluate the perceived quality of next generation audio effectively, a QoE mindset should be adopted that considers system, context and human IFs.
    Engineering and Physical Sciences Research Council (EPSRC) and the British Broadcasting Corporation Research & Development department (BBC R&D).