
    Conference Proceedings of the Euroregio / BNAM 2022 Joint Acoustic Conference


    Perception of Reverberation in Domestic and Automotive Environments


    Subjective evaluation and electroacoustic theoretical validation of a new approach to audio upmixing

    Get PDF
    Audio signal processing systems for converting two-channel (stereo) recordings to four or five channels are increasingly relevant. These audio upmixers can be used with conventional stereo recordings and reproduced over multichannel home-theatre or automotive loudspeaker systems to create a more engaging and natural-sounding listening experience. This dissertation discusses existing approaches to audio upmixing for recordings of musical performances and presents specific design criteria for a system to enhance spatial sound quality. A new upmixing system is proposed and evaluated according to these criteria, and a theoretical model of its behavior is validated using empirical measurements.

The new system removes short-term correlated components from two electronic audio signals using a pair of adaptive filters, updated according to a frequency-domain implementation of the normalized least-mean-square (NLMS) algorithm. The major difference between the new system and all extant audio upmixers is that unsupervised time alignment of the input signals (typically by up to +/-10 ms) as a function of frequency (typically using a 1024-band equalizer) is accomplished by the non-minimum-phase adaptive filter. Two new signals are created from the weighted difference of the inputs and are then radiated from two loudspeakers behind the listener. According to the consensus in the literature on the effect of interaural correlation on auditory image formation, the self-orthogonalizing properties of the algorithm ensure minimal distortion of the frontal source imagery and natural-sounding, enveloping reverberance (ambience) imagery.

Performance of the new upmix system was evaluated in two ways: first, using empirical electroacoustic measurements which validate a theoretical model of the system; and second, with formal listening tests which investigated auditory spatial imagery with a graphical mapping tool and a preference experiment. Both the electroacoustic and the subjective methods investigated system performance with a variety of test stimuli: solo musical performances reproduced using a loudspeaker in an orchestral concert hall and recorded using different microphone techniques. The objective and subjective evaluations, combined with a comparative study against two commercial systems, demonstrate that the proposed system provides a new, computationally practical, high-sound-quality solution to upmixing.
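The per-bin frequency-domain NLMS update at the core of such an upmixer can be sketched as follows. This is a minimal illustration with hypothetical names and block processing without overlap-save or windowing, not the dissertation's implementation:

```python
import numpy as np

def fd_nlms_upmix(left, right, block=1024, mu=0.5, eps=1e-12):
    """Per-bin frequency-domain NLMS sketch: adaptively predicts `right`
    from `left` and returns the residual (short-term uncorrelated)
    "ambience" signal.  Uses non-overlapping blocks; a real system would
    add overlap-save processing and windowing."""
    n_blocks = len(left) // block
    W = np.zeros(block, dtype=complex)          # one complex weight per bin
    amb = np.zeros(n_blocks * block)
    for b in range(n_blocks):
        sl = slice(b * block, (b + 1) * block)
        X = np.fft.fft(left[sl])
        Y = np.fft.fft(right[sl])
        E = Y - W * X                           # residual = decorrelated part
        W += mu * np.conj(X) * E / (np.abs(X) ** 2 + eps)  # NLMS update
        amb[sl] = np.real(np.fft.ifft(E))
    return amb
```

With identical inputs the residual decays toward zero as the filter converges; in an upmixer, signals derived from this residual would feed the rear loudspeakers.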

    Singing in Space(s): Singing performance in real and virtual acoustic environments - Singers' evaluation, performance analysis and listeners' perception

    The Virtual Singing Studio (VSS), a loudspeaker-based room acoustic simulation, was developed in order to facilitate investigations into the correlations and interactions between room acoustic characteristics and vocal performance parameters. To this end, the VSS provides a virtual performance space with real-time interactivity for an active sound source, meaning that singers can hear themselves sing as if in a real performance space. An objective evaluation of the simulation was carried out through measurement and comparison of room acoustic parameters of the simulation and the real performance space. Furthermore, a subjective evaluation involved a number of professional singers who sang in the virtual and real performance spaces and reported their impressions of the experience. Singing performances recorded in the real and virtual spaces were compared via analysis of tempo, vibrato rate, vibrato extent, and measures of intonation accuracy and precision. A stimulus sorting task evaluated listeners' perception of the similarity between singing performances recorded in the real and simulated spaces. A multidimensional scaling analysis was undertaken on the data obtained, and dimensions of the common perceptual space were identified using property-fitting techniques in order to assess the relationship between performance attributes and the perceived similarities. In general, significant proportions of the perceived similarity between recordings could be explained by differences in global tempo, vibrato extent and intonation precision. Although there were few statistically significant effects of room acoustic condition, all singers self-reported changes to their singing according to the different room acoustic configurations, and listeners perceived these differences, especially in vibrato extent and global tempo.

The present VSS has been shown not to be "realistic" enough to elicit variations in singing performance according to room acoustic conditions. Therefore, further improvements are suggested, including the incorporation of a visual aspect into the simulation. Nonetheless, the VSS is already able to provide a "plausible" interactive room acoustic simulation for singers to hear themselves in real time as if in a real performance venue.
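A loudspeaker-based simulation like the VSS must convolve the singer's microphone signal with a room impulse response with low latency. A minimal overlap-add block convolver, assuming (hypothetically) an impulse response no longer than the block size, might look like:

```python
import numpy as np

def block_convolver(rir, block):
    """Return a callable that convolves an audio stream with `rir` one
    block at a time (overlap-add FFT convolution).  Assumes
    len(rir) <= block so the tail never spans more than one block."""
    assert len(rir) <= block
    n = 1
    while n < block + len(rir) - 1:             # FFT size: next power of two
        n *= 2
    H = np.fft.rfft(rir, n)                     # RIR spectrum, computed once
    tail = np.zeros(n - block)                  # overlap carried forward
    def process(x):
        nonlocal tail
        y = np.fft.irfft(np.fft.rfft(x, n) * H, n)
        y[:n - block] += tail                   # add previous block's tail
        tail = y[block:].copy()                 # save new tail
        return y[:block]
    return process
```

A real auralization engine would use partitioned convolution so that long measured responses can be applied at short block sizes, but the overlap-add bookkeeping is the same.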

    Spatial impression in multichannel surround sound systems

    Spatial impression in both concert halls and reproduced sound has been identified as an important attribute of the listening experience. In this study, the synthesis and objective measurement of spatial impression in reproduced sound are examined. A novel multichannel spatializing technique for musical synthesis has been developed that entailed separating the individual harmonics of a musical note, which were spatially distributed over multichannel surround systems. Subjective testing of the technique revealed that the perceived degree of spatial impression significantly increased as the angular spread of harmonics increased; however, extending the spatial spread beyond 90° did not significantly increase the perception of spatial impression.

The concert-hall measure of spatial impression, the interaural cross-correlation coefficient (IACC), was used to objectively measure the effects of the spatializing techniques. The IACC measurements displayed a strong correlation with the subjective results. Further examination of the IACC measurement indicated the possibility of its adaptation to multichannel surround sound in general. A method of adapting IACC to reproduced sound was developed that involved comparing IACC measurements taken in a concert hall to IACC measurements taken in reproduced versions of the same concert hall. The method was first conducted as a simulation using basic auralisation techniques. Real concert-hall measurements and reproduction systems were then employed. Results showed that the method was able to discriminate between the spatial capabilities of a number of different surround sound systems and rank them in a predictable order. The results were further validated by means of a subjective test.

In an attempt to sensitise the IACC measurement, the frequency dependency of IACC was investigated by means of a subjective test. The results indicated that a perceptually more accurate indication of spatial impression may be gained by applying a frequency-dependent weighting to IACC measurements. This may be useful in the spatial measurement of both reproduced sound and concert halls.
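The IACC measure can be sketched directly from its standard definition: the maximum of the normalized cross-correlation between the two ear signals over lags of ±1 ms. A minimal version, with assumed argument names (equal-length left/right ear signals):

```python
import numpy as np

def iacc(pl, pr, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient: maximum of the
    normalized cross-correlation between left/right ear signals (same
    length) over lags of +/- `max_lag_ms`, as in the usual ISO
    3382-style definition."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    full = np.correlate(pl, pr, mode="full")    # all lags, direct form
    mid = len(pl) - 1                           # index of zero lag
    norm = np.sqrt(np.sum(pl ** 2) * np.sum(pr ** 2))
    window = full[mid - max_lag: mid + max_lag + 1] / norm
    return np.max(np.abs(window))
```

Identical ear signals give an IACC of 1, while uncorrelated signals give values near 0; a frequency-dependent weighting, as proposed above, would apply this per band and combine the results.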

    Investigations into the Perception of Vertical Interchannel Decorrelation in 3D Surround Sound Reproduction

    The use of three-dimensional (3D) surround sound systems has seen a rapid increase over recent years. In two-dimensional (2D) loudspeaker formats (i.e. two-channel stereophony (stereo) and 5.1 Surround), horizontal interchannel decorrelation is a well-established technique for controlling the horizontal spread of a phantom image. Use of interchannel decorrelation can also be found within established two-to-five-channel upmixing methods (stereo to 5.1). More recently, proprietary algorithms have been developed that perform 2D-to-3D upmixing, which presumably make use of interchannel decorrelation as well; however, it is not currently known how interchannel decorrelation is perceived in the vertical domain. Formal investigations into the perception of vertical interchannel decorrelation are therefore necessary. Findings from such experiments may contribute to improved control of a sound source within 3D surround systems (i.e. its vertical spread), in addition to aiding the optimisation of 2D-to-3D upmixing algorithms.

The current thesis presents a series of experiments that systematically assess vertical interchannel decorrelation under various conditions. Firstly, a comparison is made between horizontal and vertical interchannel decorrelation, where it is found that vertical decorrelation is weaker than horizontal decorrelation. However, it is also seen that vertical decorrelation can generate a significant increase of vertical image spread (VIS) for some conditions. Following this, vertical decorrelation is assessed for octave-band pink noise stimuli at various azimuth angles to the listener. The results demonstrate that vertical decorrelation is dependent on both frequency and presentation angle: a general relationship between the interchannel cross-correlation (ICC) and VIS is observed for the 500 Hz octave-band and above, and is strongest for the 8 kHz octave-band.

Objective analysis of these stimulus signals determined that spectral changes at higher frequencies appear to be associated with VIS perception: at 0° azimuth, the 8 and 16 kHz octave-bands demonstrate potential spectral cues; at ±30°, similar cues are seen in the 4, 8 and 16 kHz bands; and from ±110°, cues are featured in the 2, 4, 8 and 16 kHz bands. In the case of the 8 kHz octave-band, it seems that vertical decorrelation causes a ‘filling in’ of vertical localisation notch cues, potentially resulting in ambiguous perception of vertical extent. In contrast, the objective analysis suggests that VIS perception of the 500 Hz and 1 kHz bands may have been related to early reflections in the listening room. From the experiments above, it is demonstrated that the perception of VIS from vertical interchannel decorrelation is frequency-dependent, with high frequencies playing a particularly important role.

A following experiment explores the vertical decorrelation of high frequencies only, where it is seen that decorrelation of the 500 Hz octave-band and above produces a similar perception of VIS to broadband decorrelation, whilst improving tonal quality. The results also indicate that decorrelation of the 8 kHz octave-band and above alone can significantly increase VIS, provided the source signal has sufficient high-frequency energy. The final experimental chapter of the present thesis aims to provide a controlled assessment of 2D-to-3D upmixing, taking into account the findings of the previous experiments. In general, 2D-to-3D upmixing by vertical interchannel decorrelation had little impact on listener envelopment (LEV) when compared against a level-matched 2D 5.1 reference. Furthermore, amplitude-based decorrelation appeared to be marginally more effective, and ‘high-pass decorrelation’ resulted in slightly better tonal quality for sources that featured greater low-frequency energy.
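One way to realise the ‘high-pass decorrelation’ idea (decorrelating only above some cutoff while leaving low frequencies intact, which preserves tonal quality) is spectral phase randomisation. This is an illustrative stand-in with assumed parameter names, not the thesis's decorrelation method:

```python
import numpy as np

def highpass_decorrelate(x, fs, cutoff=4000.0, seed=0):
    """Illustrative 'high-pass decorrelation': randomise spectral phase
    above `cutoff` while leaving lower frequencies untouched, so the
    output stays correlated with the input only in the low band."""
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    phase = np.exp(1j * rng.uniform(-np.pi, np.pi, len(X)))
    X = np.where(freqs >= cutoff, X * phase, X)  # all-pass above cutoff only
    return np.fft.irfft(X, n=len(x))

def icc(a, b):
    """Zero-lag interchannel cross-correlation coefficient."""
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
```

Feeding the original signal to one loudspeaker layer and the phase-randomised copy to the other yields a pair whose ICC is low only in the decorrelated band.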

    The role that sound spatialization plays in improving performance in an interactive installation: study of the correlation between gesture and localization of sound sources in space

    The main objective of this research work is to study the correlation between gesture and the localization of sound sources in space within the framework of interactive installations, based on theories of hearing and gesture. We therefore chose the experimental method, developing an interactive installation with which we carried out three different experiments, in which a subject's hand is tracked by a Microsoft Kinect depth camera (motion capture) and a deictic gesture is used to trigger recorded music sounds and to identify their localization in the horizontal plane. Thus, we manipulate the direction of the sound and measure the percentage of correct perceptual sound source localizations, obtained from the participants' responses to an Inquiry Mode Questionnaire, against the actual directions of the gesture and of the perceptual sound sources provided by the software. Descriptive and inferential statistics are applied to the collected data. The main results show that it is easier to define the origin of a sound, and that auditory perception is more accurate, when its incidence is frontal in the horizontal plane, just as sound source localization theory predicts. Whereas 86.1% of all volunteers consider that their gesture coincides with the origin of the sound in experiment 1, in which using the gesture in a certain direction produces a sound from that direction, only 58.1% admit the same in experiment 3, in which the same gesture is used to identify the system-predetermined localization of a perceptual sound source within an angle of 260° around the subject.
At least 55.9% of all participants did not perceive that their gesture could not coincide with the origin of the sound in experiment 2, in which the sound was produced from the opposite surround direction. This seems to demonstrate that, when sounds are produced from the front or the back while a person is simultaneously controlling their motion with a deictic gesture, his or her ability to identify the origin of the sound generally diminishes, in addition to the already well-known reduced ability to identify it in the median plane if the head is not rotated. We therefore conclude that there is a relatively high correlation between gesture and the localization of sound sources in space, but that it is not as strong as it could be, owing to the limitations of the human auditory system and to the natural dependence of head movement on gesture.

The main objective of this research work is to study the correlation between gesture and the localization of sound sources in space within the framework of interactive installations, based on theories of hearing and gesture. When we began our investigation we found several studies that addressed the subjects of "gesture" and "sound source localization" in various ways: 1) independently of one another and/or in contexts other than interactive installations, as for example in Blauert (1997), Pulkki (1999), Pulkki & Karjalainen (2001), Pulkki (2001a), Bates et al. (2007), Hammershøi (2009), McNeill (1992), Coutaz & Crowley (1995), Choi (2000), Cadoz & Wanderley (2000), Nehaniv (2005), Campbell (2005), or Godøy & Leman (2010); 2) from a more technical point of view, as for example in Harada et al. (1992), Jensenius et al. (2006), Marshall et al. (2006), Schacher (2007), Neukom & Schacher (2008), Zelli (2009), Marshall et al. (2009), Bhuiyan & Picking (2009), or Schumacher & Bresson (2010); or 3) from a more artistic point of view, as in Bencina et al. (2008) or Grigoriou & Floros (2010).

There were, however, very few studies involving or addressing both subjects and analysing their relations jointly from a more perceptual point of view, as for example in Gröhn (2002), de Götzen (2004) or Marentakis et al. (2008). It was this last perspective that we decided to follow and that we explore here. We therefore opted for the experimental method, applying a repeated-measures design and developing an interactive installation with which we carried out three different experiments, in which a subject's hand is tracked by a Microsoft Kinect depth camera (motion capture) and a deictic gesture is used to trigger recorded music sounds and to identify their localizations in the horizontal listening plane. Thus, we manipulate the direction of the sound and measure the percentage of correct perceptual sound source localizations, obtained from the participants' responses to an Inquiry Mode Questionnaire, against the actual directions of the deictic gesture and of the perceptual sound sources provided by the software used in our work. As a target population we considered people with musical training and people with little or no musical training, which led us to ask a large number of people for their voluntary, anonymous and unconstrained participation in our study. This was done mainly by e-mailing friends, students of different areas attending, and colleagues working at, the Escola de Artes da Universidade Católica Portuguesa (EA-UCP), the Escola Superior de Música e Artes do Espetáculo do Instituto Politécnico do Porto, and the Academia de Música de Espinho.

In addition, it was also crucial to talk to friends and family and to inform as many people as possible about our research by placing informative posters on the corridor walls of the Universidade Católica a few days before the experiments were carried out in the Motion Capture Laboratory of the EA-UCP. Finally, a descriptive and inferential statistical analysis of the collected data was performed. The main results indicate that it is easier to define the origin of a sound when its incidence is frontal in the horizontal listening plane, and that auditory perception is more accurate in that direction, just as sound source localization theory predicts. Whereas 86.1% of all participants consider that their deictic gesture coincides with the origin of the sound in experiment 1, in which using that gesture in a given direction triggers a sound from that direction, only 58.1% admit the same in experiment 3, in which the same gesture is used to identify the system-predetermined localization of a perceptual sound source within an angle of 260° around the subject. This lower percentage seems to be due to the fact that most of the sounds in experiment 3 were produced from lateral directions, taking the position of the head facing the camera as reference. At least 55.9% of all volunteers did not perceive that their gesture could not have coincided with the origin of the sound in experiment 2, since the sound was produced from the opposite surround direction. This seems to demonstrate that, when sounds are produced from the front or the back while a person is simultaneously controlling their movements with a deictic gesture, their ability to identify the origin of the sound is in general even lower, in addition to the already known reduced ability to identify it when the sound lies in the median plane if the head is not rotated.

Most participants felt immediate control over the sound in experiments 1 and 2, but their self-estimated times are considerably longer than the approximately 650 milliseconds a person needs to hear and react to a sound in our interactive installation. We also found that the mean time needed to localize sounds using a deictic gesture in experiment 3 is about 10 seconds, considerably longer than the 3 seconds we had assumed. Moreover, volunteers made on average 2 attempts to localize sounds with their deictic gestures, needing to hear each sound in full only once, on average, in order to localize it. The leftward and rightward deviations from the true sound directions made by most participants, when trying to identify the system-predetermined localizations of the perceptual sound sources with their deictic gestures in the peripheral zone of the body, average 7.97° and -7.19° respectively, giving a mean absolute deviation of 7.76°. Comparing these deviations with those made by participants using the left hand (deviations of 6.86° to the left and -6.35° to the right of the true sound directions) and with those using the right hand (deviations of 8.46° to the left and -7.38° to the right), we conclude that the results are quite similar.

We found that most volunteers estimated a much longer time than the 2 seconds we had experimentally assumed for understanding each of the three experiments. Moreover, this estimated time decreases from the first to the last experiment, apparently owing to familiarization with our interactive system, deliberately induced by imposing the same sequence of experiments on every participant, although they consider that they understood each of the three experiments quickly. Furthermore, most volunteers interacted easily with our installation and agree that the gesture we suggested was adequately selected for each of the three experiments. We also found that participants consider the system's response to the gesture immediate in all three experiments, estimating it at about 1 second, which is consistent with the measured system latency of about 470 milliseconds. In addition, most volunteers felt enveloped by the sound in our interactive installation using Ambisonics Equivalent Panning. We therefore conclude that, using an interactive installation such as ours with a similar target audience, there is a relatively high correlation between gesture and the localization of sound sources in space, but that it is not as strong as it could be owing to the limitations of our auditory system and, apparently, to the natural dependence of head movement on gesture. Thus, sound spatialization seems able to improve performance in an interactive installation, but only moderately. Even so, we argue that a system such as ours could be applied with advantage in domains as diverse as those we present as examples.
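The gesture-versus-source deviations reported above reduce to computing horizontal-plane azimuths and wrapped angular differences. A minimal sketch with hypothetical names and a hypothetical depth-camera coordinate convention (not the study's software):

```python
import math

def azimuth_deg(x, z):
    """Horizontal-plane azimuth, in degrees, of a tracked hand at
    depth-camera coordinates (x, z); 0 deg is straight ahead and
    positive is to the right (assumed convention)."""
    return math.degrees(math.atan2(x, z))

def signed_deviation(gesture_az, source_az):
    """Signed gesture-minus-source angle, wrapped to (-180, 180] deg,
    so leftward and rightward deviations keep their sign when
    averaged (as in the 7.97 deg / -7.19 deg figures above)."""
    d = (gesture_az - source_az + 180.0) % 360.0 - 180.0
    return 180.0 if d == -180.0 else d
```

Averaging `signed_deviation` over trials gives the per-direction bias, while averaging its absolute value gives the mean absolute deviation.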