42 research outputs found

    On the effectiveness of facial expression recognition for evaluation of urban sound perception

    Sound perception studies mostly depend on questionnaires with fixed indicators, so it is desirable to explore methods with dynamic outputs. The present study explores the effects of sound perception in the urban environment on facial expressions, using the software FaceReader, which is based on facial expression recognition (FER). The experiment involved three typical urban sound recordings, namely traffic noise, natural sound, and community sound. A questionnaire on the evaluation of sound perception was also used for comparison. The results show that, first, FER is an effective tool for sound perception research, since it can detect differences in participants' reactions to different sounds and how their facial expressions change over time in response to those sounds, with mean differences of valence between recordings from 0.019 to 0.059 (p < 0.05 or p < 0.01). In the natural sound environment, for example, facial expression valence increased by 0.04 in the first 15 s and then declined steadily by 0.004 every 20 s. Second, the expression indices happy, sad, and surprised changed significantly under the effect of sound perception. In the traffic sound environment, for example, happy decreased by 0.012, sad increased by 0.032, and surprised decreased by 0.018. Furthermore, social characteristics such as distance from living place to natural environment (r = 0.313), inclination to communicate (r = 0.253), and preference for crowds (r = 0.296) affect facial expression. Finally, the comparison of FER and questionnaire survey results showed that for the traffic noise recording, valence in the first 20 s best represents acoustic comfort and eventfulness; for natural sound, valence in the first 40 s best represents pleasantness; and for community sound, valence in the first 20 s best represents acoustic comfort, subjective loudness, and calmness.
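The study's time-window analysis (mean valence per 20 s interval) can be sketched as a simple aggregation of frame-level valence scores. The trace below is entirely hypothetical, loosely shaped after the reported natural-sound trend; it is not the study's data, and the function name is an illustration.

```python
# Sketch: aggregating frame-level FaceReader-style valence into time windows,
# as in the study's 20 s interval analysis. Data below are hypothetical.
import statistics

def mean_valence_by_window(samples, window_s=20.0):
    """samples: list of (time_s, valence) pairs -> {window_index: mean valence}."""
    windows = {}
    for t, v in samples:
        windows.setdefault(int(t // window_s), []).append(v)
    return {k: statistics.mean(vs) for k, vs in sorted(windows.items())}

# Hypothetical 1 Hz valence trace for a 60 s recording: a rise in the first
# 15 s, then a slow decline every 20 s, echoing the reported pattern.
trace = [(t, 0.04 if t < 15 else 0.04 - 0.004 * ((t - 15) // 20 + 1))
         for t in range(60)]
print(mean_valence_by_window(trace))
```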

    Predictive Modelling of Complex Urban Soundscapes: Enabling an engineering approach to soundscape design

    Conventional noise control methods typically limit their focus to the reduction of unwanted noise, ignoring the benefits of positive sounds and struggling to reflect the totality of noise impacts. Modern approaches to achieve improved health outcomes and public satisfaction aim to incorporate the perception of an acoustic environment, an approach known as ‘soundscape’. When attempting to apply soundscape in practice, it is apparent that new methods of analysing soundscape perception in urban spaces are required; in particular, a predictive model of the users’ perceptual response to the acoustic environment is necessary. This thesis is intended to enable a move towards applying engineering approaches to soundscape design. This is achieved by developing predictive models of soundscape perception through empirical studies examining a large scale soundscape assessment database. The results are presented in three parts: first, the data collection protocol and modelling methods developed for this work are presented; the second part demonstrates an initial development and application of a predictive soundscape model; the final section expands upon this initial model with two empirical studies exploring the potential for additional information to be included in the model. This thesis begins by establishing a protocol for large scale soundscape data collection based on ISO 12913-2 and the creation of a database containing 1,318 responses paired with 693 binaural recordings collected in 13 locations in London and Venice. The first study then presents an initial development and application of a model designed to predict soundscape perception based on psychoacoustic analysis of the binaural recordings.
Through the collection of an additional 571 binaural recordings during the COVID-19 lockdowns, reductions in sound level were seen at every location, ranging from 1.27 dB(A) in Regents Park Japan to 17.33 dB(A) in Piazza San Marco, with an average reduction across all locations of 7.27 dB(A). Multilevel models were developed to predict the overall soundscape pleasantness (R2 = 0.85) and eventfulness (R2 = 0.715) of each location and applied to the lockdown recordings to determine how the soundscape perception likely changed. The results demonstrated that perception shifted toward less eventful soundscapes, and toward more pleasant soundscapes for previously traffic-dominated locations but not for human- and natural-dominated locations. The modelling process also demonstrated that contextual information was important for predicting pleasantness but not for predicting eventfulness. The next stage of the thesis considers a series of expansions to the initial model. The second piece of empirical work makes use of a dataset of recordings collected from a Wireless Acoustic Sensor Network (WASN), which includes sound source labels and annoyance ratings collected from 100 participants in an online listening study. A multilevel model was constructed using a combination of psychoacoustic metrics and sound source labels to predict perceived annoyance, achieving an R2 of 0.64 for predicting individual responses. The sound source information is demonstrated to be a crucial factor, as the relationship between roughness, impulsiveness, and tonality and the predicted annoyance varies as a function of the sound source label. The third piece of empirical work uses multilevel models to examine the extent to which personal factors influence soundscape perception.
The findings suggest that personal factors, including psychological wellbeing, age, gender, and occupational status, account for approximately 1.4% of the variance for pleasantness and 3.9% for eventfulness, while the influence of the locations accounted for approximately 34% and 14%, respectively. Drawing from the experience gained working with urban soundscape data, a new method of analysing and presenting the soundscape perception of urban spaces is developed. This method inherently considers the variety of perceptions within a group and provides an open-source visualisation tool to facilitate a nuanced approach to soundscape assessment and design. Based on this empirical evidence, a framework is established for developing future predictive soundscape models which can be integrated into an engineering approach. At each stage, the results of these studies are discussed in terms of how they can contribute to a generalisable predictive soundscape model.
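The thesis's multilevel models regress perceptual scores on psychoacoustic metrics while accounting for location. The sketch below uses a plain fixed-effects approximation (one intercept per location) in NumPy; the metric names, coefficients, and sample sizes are invented, not values from the SSID database.

```python
# Sketch: predicting soundscape pleasantness from psychoacoustic metrics with
# per-location intercepts -- a simplified stand-in for multilevel modelling.
# All data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, locations = 200, 4
loc = rng.integers(0, locations, n)
loudness = rng.normal(20, 5, n)       # assumed metric, e.g. loudness in sones
sharpness = rng.normal(1.5, 0.3, n)   # assumed metric, acum
pleasant = (1.0 - 0.03 * loudness - 0.2 * sharpness
            + 0.1 * loc + rng.normal(0, 0.1, n))

# Design matrix: one dummy intercept per location plus the two metrics.
X = np.column_stack([np.eye(locations)[loc], loudness, sharpness])
beta, *_ = np.linalg.lstsq(X, pleasant, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((pleasant - pred) ** 2) / np.sum((pleasant - pleasant.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

A true multilevel model would additionally shrink the location intercepts toward a common mean; this fixed-effects version only illustrates the structure of the prediction problem.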

    AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance

    Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) to analyze automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is proposed to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them. (2) The proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks. (3) Correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model. (4) Generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation. Comment: The Journal of the Acoustical Society of America, 154 (5), 314
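The cross-attention fusion idea at the core of DCNN-CaF can be illustrated in miniature: features from one branch act as queries while the other branch supplies keys and values. The NumPy sketch below shows only this fusion step, with random weights and invented shapes; it is not the published architecture.

```python
# Sketch: cross-attention fusion between two feature branches (e.g. a Mel
# branch and a loudness branch). Shapes and weights are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k=16):
    """queries: (Tq, d) from one branch; keys_values: (Tk, d) from the other."""
    rng = np.random.default_rng(1)
    d = queries.shape[1]
    Wq, Wk, Wv = (rng.normal(0, d ** -0.5, (d, d_k)) for _ in range(3))
    Q, K, V = queries @ Wq, keys_values @ Wk, keys_values @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (Tq, Tk) attention weights
    return attn @ V                          # fused features, (Tq, d_k)

mel_feats = np.random.default_rng(2).normal(size=(10, 32))   # Mel branch
loud_feats = np.random.default_rng(3).normal(size=(8, 32))   # loudness branch
fused = cross_attention(mel_feats, loud_feats)
print(fused.shape)  # (10, 16)
```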

    A new methodology for modelling urban soundscapes: a psychometric revisitation of the current standard and a Bayesian approach for individual response prediction

    Measuring how the urban sound environment is perceived by public space users, usually referred to as urban soundscape, is a research field of particular interest for a broad and multidisciplinary scientific community as well as for private and public agencies. A tool to quantify soundscapes would provide much support to urban planning and design, and hence to public healthcare. The soundscape literature still does not show a unique strategy for addressing this topic. Soundscape definition, data collection, and analysis tools have recently been standardised and published in three respective ISO (International Organisation for Standardization) items. In particular, the third item of the ISO series defines the calculation of the soundscape experience of public space users by means of multiple Likert scales. In this thesis, with regard to the third item of the soundscape ISO series, the standard method of soundscape data analysis is questioned and a correction paradigm is proposed. This thesis questions the assumption of a point-wise superimposition match across the Likert scales used during the soundscape assessment task. To do so, the thesis presents a new method which introduces correction values, or metrics, for adjusting the scales in accordance with common scaling behaviours found across the investigated locations. To validate the results, the outcome of the new metric is used as a target to predict the individual experience of soundscapes from the participants. In comparison to the current ISO output, the new correction values achieve better predictability in both linear and non-linear modelling, increasing the accuracy of prediction of individual responses up to 52.6% (8.3% higher than the accuracy obtained with the standard method). Finally, the new metric is used to validate the collection of data samples across several locations on individual questionnaire responses. Models are trained, in an iterative way, on all the locations except the one used during the validation. This procedure provides a strong validating framework for predicting individual subject assessments belonging to locations totally unseen during the model training. The results show that the combination of the new metrics with the proposed modelling structure achieves good performance on individual responses across the dataset, with an average accuracy above 54%. A new index for measuring the soundscape is finally introduced, based on the percentage of people agreeing on soundscape pleasantness calculated from the new proposed metric, achieving an r-squared value of 0.87. The framework introduced is limited by cultural and linguistic factors. Indeed, different corrected metric spaces are expected to be found when data is collected from different countries or urban contexts. The values found in this thesis are therefore expected to be valid in large British cities and possibly in international hub and capital cities. In these scenarios the corrected metric would provide a more realistic and direction-invariant representation of how the urban soundscape is perceived compared to the current ISO tool, showing that some components in the circumplex model are perceived softer or stronger according to the dimension. Future research will need to better understand the limitations of this new framework and to extend and compare it across different urban, cultural, and linguistic contexts.
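The Likert-scale responses that the proposed correction metric adjusts are conventionally projected onto the two circumplex coordinates (pleasantness, eventfulness). A minimal sketch of that projection, following the commonly used ISO/TS 12913-3 formulation; the attribute values below are invented, and any corrected metric would modify this mapping rather than use it as-is.

```python
# Sketch: projecting the eight ISO 12913-2 Likert attributes onto circumplex
# coordinates, per the widely used ISO/TS 12913-3 formulation. Responses are
# made-up example values on a 1-5 scale.
import math

def iso_coordinates(r):
    """r: dict of the eight attributes, each rated 1-5."""
    c = math.cos(math.radians(45))
    scale = 4 + math.sqrt(32)   # normalises both axes to [-1, 1]
    pleasant = ((r["pleasant"] - r["annoying"])
                + c * (r["calm"] - r["chaotic"])
                + c * (r["vibrant"] - r["monotonous"])) / scale
    eventful = ((r["eventful"] - r["uneventful"])
                + c * (r["chaotic"] - r["calm"])
                + c * (r["vibrant"] - r["monotonous"])) / scale
    return pleasant, eventful

resp = {"pleasant": 4, "annoying": 2, "calm": 4, "chaotic": 1,
        "vibrant": 3, "monotonous": 2, "eventful": 3, "uneventful": 3}
print(iso_coordinates(resp))
```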

    VR-based Soundscape Evaluation: Auralising the Sound from Audio Rendering, Reflection Modelling to Source Synthesis in the Acoustic Environment

    Soundscape has been growing as a research field associated with acoustics, urban planning, environmental psychology and other disciplines since it was first introduced in the 1960s. To assess soundscapes, subjective validation is frequently integrated with soundscape reproduction. However, the existing soundscape standards do not give clear reproduction specifications to recreate a virtual sound environment. Selecting appropriate audio rendering methods, simulating sound propagation, and synthesising non-point sound sources remain major challenges for researchers. This thesis therefore attempts to give alternative or simplified strategies to reproduce a virtual sound environment by suggesting binaural or monaural audio renderings, reflection modelling during sound propagation, and fewer synthesis points for non-point sources. To address these open issues, a systematic review of original studies first examines the ecological validity of immersive virtual reality in soundscape evaluation. Through recording and reproducing audio-visual stimuli of sound environments, participants give their subjective responses according to structured questionnaires. Thus, different audio rendering, reflection modelling, and source synthesis methods are validated by subjective evaluation. The results of this thesis reveal that a rational setup of VR techniques and evaluation methods will be a solid foundation for soundscape evaluation with reliable ecological validity. For soundscape audio rendering, binaural rendering still dominates soundscape evaluation compared with monaural. For sound propagation with consideration of different reflection conditions, fewer orders can be employed during sound reflection to assess different kinds of sounds in outdoor sound environments through VR experiences. A VR experience combining both HMDs and Ambisonics will significantly strengthen immersion even at low orders.
For non-point source synthesis, especially line sources, when there are enough synthesis points for their angular spacing to reach the threshold of the minimum audible angle, human ears cannot distinguish the location of the synthesised sound sources in the horizontal plane, thus increasing immersion significantly. These minimum specifications and simplifications refine the understanding of soundscape reproduction, and the findings will be beneficial for researchers and engineers in determining appropriate audio rendering, sound propagation modelling, and non-point source synthesis strategies.
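The minimum-audible-angle criterion for line-source synthesis can be illustrated with a small geometric check: given a source span and listening distance, how many evenly spaced synthesis points keep the adjacent angular spacing below the threshold. The 1-degree default MAA below is an assumed frontal-plane value for illustration, not a figure from the thesis.

```python
# Sketch: smallest number of evenly spaced synthesis points along a line
# source such that adjacent points are separated by less than one minimum
# audible angle (MAA), as seen from the listener. The 1-degree MAA is assumed.
import math

def min_points_below_maa(span_m, distance_m, maa_deg=1.0):
    """Listener faces the midpoint of a line source of length span_m."""
    total_angle = math.degrees(2 * math.atan(span_m / (2 * distance_m)))
    return math.ceil(total_angle / maa_deg) + 1

# A 20 m line source (e.g. a road segment) heard from 10 m away.
print(min_points_below_maa(20, 10))  # 91
```

As expected, moving the listener farther away shrinks the subtended angle, so fewer synthesis points are needed.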

    A model for implementing soundscape maps in smart cities

    Smart cities are required to engage with local communities by promoting a user-centred approach to deal with urban life issues and ultimately enhance people's quality of life. Soundscape promotes a similar approach, based on individuals' perception of acoustic environments. This paper aims to establish a model to implement soundscape maps for the monitoring and management of the acoustic environment and to demonstrate its feasibility. The final objective of the model is to generate visual maps related to perceptual attributes (e.g. 'calm', 'pleasant'), starting from audio recordings of everyday acoustic environments. The proposed model relies on three main stages: (1) sound source recognition and profiling, (2) prediction of the soundscape's perceptual attributes and (3) implementation of soundscape maps. This research particularly explores the two latter phases, for which a set of sub-processes and methods is proposed and discussed. An accuracy analysis was performed with satisfactory results: the prediction models of the second stage explained up to 57.5% of the attributes' variance, and the cross-validation errors of the model were close to zero. These findings show that the proposed model is likely to produce representative maps of an individual's sonic perception in a given environment.
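The third stage, rendering soundscape maps from point predictions, amounts to spatial interpolation of predicted attribute values over the mapped area. A minimal sketch using inverse-distance weighting, with invented measurement points and 'calm' scores; the paper does not specify this particular interpolator.

```python
# Sketch: interpolating predicted perceptual-attribute values from a few
# assessed points onto a grid with inverse-distance weighting (IDW).
# Points and 'calm' scores are invented.
def idw(x, y, samples, power=2):
    """samples: list of (sx, sy, value); returns interpolated value at (x, y)."""
    num = den = 0.0
    for sx, sy, v in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return v   # exactly on a sample point
        w = d2 ** (-power / 2)
        num += w * v
        den += w
    return num / den

# Predicted 'calm' scores (0-1) at three hypothetical measurement points.
points = [(0, 0, 0.8), (10, 0, 0.3), (5, 8, 0.6)]
grid = [[round(idw(x, y, points), 2) for x in range(0, 11, 5)]
        for y in range(0, 9, 4)]
print(grid)
```

IDW keeps every interpolated value inside the range of the sampled scores, which is a reasonable property for bounded perceptual attributes.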