22 research outputs found

    Computational Methods for Assisting Radio Drama Production.

    PhD Theses. Radio drama is a theatrical art form that usually exists solely in the acoustic domain, consisting of music, speech, and sound effects, and is most often consumed through broadcast radio. This thesis proposes methods for assisting a human creator in producing radio dramas. Much research has been done on aiding creativity using artificial intelligence techniques in storytelling, music composition, the visual arts, and film. Despite that, radio drama is under-represented in such research. Radio drama comprises both literary aspects, such as plot, story characters, and environments, and production aspects, such as speech, music, and sound effects. While plenty of research has examined each of those aspects individually, there is currently no research that combines such studies in the context of radio drama production. In this thesis, an interdisciplinary approach to assisting a human creator in radio drama production is developed. The task is explored through the joint prism of natural language processing, music information retrieval, and automatic mixing. We show that individual literary aspects of radio drama can be automatically extracted from a story draft provided by a human creator by using natural language processing methods. Formal rules can be used to express these elements in the form of a script that can be read and altered by both the human creator and the computer. We devise recommender systems for sound, music, and audio effects to retrieve the assets required for production. Rules derived from radio drama literature can then use those recorded assets to produce a radio drama mix in a semi-automatic way. Furthermore, an adaptive reverberation effect suggests reverberation settings for each track based on track content and past user choices.
The degree of success for individual tasks in aiding production is demonstrated using examples of radio drama production from raw stories and validated through objective evaluation metrics and listening tests.

    About Edge Detection in Digital Images

    Edge detection is one of the most commonly used procedures in digital image processing. In the last 30-40 years, many methods and algorithms for edge detection have been proposed. This article presents an overview of edge detection methods, divided according to the basic principles they apply. Next, the measures and image database used to quantify edge detector performance are described. Ordinary users, as well as authors proposing new edge detectors, often use the Matlab function without understanding it in detail. Therefore, one chapter is devoted to some of the Matlab function parameters that affect the final result. Finally, the latest trends in edge detection are listed. The Lena picture and two images from the Berkeley segmentation data set (BSDS500) are used to compare the edge detection methods.

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, head-related transfer functions (HRTFs) are used for binaural synthesis over headphones, since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create reliable sound movement. Binaural reproduction can also be performed over loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure, and experimental results show that a higher number of bands improves performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect performance because of their non-ideal frequency response.
Audio equalization is used to adjust the balance of different audio frequencies in order to achieve the desired sound characteristics. Equalization can be manual, as in graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is calculated automatically after the room impulse response has been measured. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear-phase and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multiple-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing very good results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by creating a sound wave with opposite phase to that of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction.
The proposed approach implements an online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation, which aim to improve the performance of the whole system. The proposed structure allows for a better convergence rate in comparison with a reference algorithm.
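The anti-phase principle mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's adaptive algorithm: it only sums a pure tone with a phase-inverted copy and reports the residual level. The sample rate, tone frequency, and the helper names `rms` and `residual_rms` are assumptions chosen for illustration.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz, assumed for illustration
FREQ = 100.0        # Hz, assumed frequency of the unwanted tone


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def residual_rms(phase_error_rad):
    """RMS of noise plus anti-noise when the anti-noise has a phase error."""
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of samples
    noise = np.sin(2 * np.pi * FREQ * t)
    # Ideal anti-noise is the inverted tone; a phase error models imperfect control.
    anti_noise = -np.sin(2 * np.pi * FREQ * t + phase_error_rad)
    return rms(noise + anti_noise)


perfect = residual_rms(0.0)                 # exact opposite phase
imperfect = residual_rms(np.deg2rad(10.0))  # 10 degrees of phase error

print(f"residual RMS, exact anti-phase: {perfect:.4f}")
print(f"residual RMS, 10 degree error:  {imperfect:.4f}")
```

Even a small phase error leaves an audible residual, which is one reason practical systems estimate the acoustic paths adaptively rather than applying a fixed inversion.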

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    International audienc

    Diskretfrequente Synthese von Nachhall-Prozessen

    The thesis pursues a novel approach to digital reverberation synthesis. It starts from the following observation: shouting into a piano whose strings are undamped causes the instrument to respond with a resonance of discrete tones that sounds similar to room reverberation. If, in a discrete-frequency synthesis, the spectral width of each "piano tone" is widened to a narrow bandpass, so that twelve equidistant bandpasses rather than twelve tones are available per octave, the tonality of the signal vanishes in favour of a natural reverberation impression. Parameters such as the volume and envelope of each individual bandpass can be extracted from a natural room impulse response. In the present model, each bandpass used for the synthesis is computed from two mutually detuned, frequency-modulated sine generators. With the reverberating "piano" spanning nine octaves, this yields 216 sine generators to be computed (9 octaves × 12 bandpasses × 2 generators). In a listening test, the diffuse reverberation component synthesized with the "piano reverberation" was compared with that of high-quality reverberation units based on IIR filters (Lexicon 480, TC3000). On average, the "piano reverberation" was rated just as "natural" as these units, although real reverberation, generated by convolution with a room impulse response, was rated "natural" significantly more often by the test subjects.
Owing to the extensive parameterizability of the algorithm, applications lie not only in the synthesis of natural room sound, but also in the artistic, creative use of room-like decay processes, with a smooth transition to sound synthesis.
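The generator count quoted in the abstract above can be checked with a short sketch. It assumes that "equidistant" means equal spacing on a logarithmic frequency axis (semitone steps) and that the lowest band sits at 27.5 Hz; neither value is stated in the abstract, and `band_centres` is a hypothetical helper.

```python
OCTAVES = 9               # range of the reverberating "piano" (from the abstract)
BANDS_PER_OCTAVE = 12     # bandpasses per octave (from the abstract)
GENERATORS_PER_BAND = 2   # detuned sine generators per bandpass (from the abstract)
LOWEST_CENTRE_HZ = 27.5   # assumed lowest centre frequency, not from the abstract


def band_centres(lowest_hz=LOWEST_CENTRE_HZ,
                 octaves=OCTAVES,
                 bands_per_octave=BANDS_PER_OCTAVE):
    """Centre frequencies spaced equally on a log-frequency axis (an assumption)."""
    total_bands = octaves * bands_per_octave
    return [lowest_hz * 2 ** (k / bands_per_octave) for k in range(total_bands)]


centres = band_centres()
generator_count = len(centres) * GENERATORS_PER_BAND

print(f"bandpasses: {len(centres)}")
print(f"sine generators: {generator_count}")
print(f"first band: {centres[0]:.1f} Hz, last band: {centres[-1]:.1f} Hz")
```

Under these assumptions the layout gives 108 bandpasses and therefore 216 sine generators, matching the figure in the abstract.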

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Preparation and characterisation of ceramic and thin film Zn₂SnO₄

    Ceramic zinc stannate, Zn₂SnO₄, was prepared from a 1SnO₂:2ZnO mixture using powders of the highest commercially available purity. The solid state reaction between the ZnO and the SnO₂, thought to be an evaporation-recondensation mechanism, was found to start at ~900 °C (12 hours, heating rate 5 °C min⁻¹). However, the reaction did not go to completion in the timescale of the experiment unless the temperature was raised to ~1300 °C. In this case mono-phase, polycrystalline Zn₂SnO₄ was produced, as confirmed by X-ray diffraction (XRD), scanning electron microscopy (SEM) and energy dispersive X-ray analysis (EDAX). Further evidence for these reaction temperatures was obtained from thermal analysis experiments. As-sintered Zn₂SnO₄ was insulating (σ ~ 10¹⁹ Ω⁻¹ cm⁻¹), although it could be made conductive by a reduction heat-treatment. This entailed refiring the sintered pellets of Zn₂SnO₄ in a gas mixture (25% H₂ + 75% N₂) at ~450 °C for 14 hours (heating rate 10 °C min⁻¹). This brought the conductivity to values of σ ~ 1 × 10⁻² Ω⁻¹ cm⁻¹. XRD failed to reveal any changes in the phase of the material after the reduction treatment. Several dopants were investigated, the most successful of which was In (indium), introduced using a vapour phase method. Doping with In in this way gave a significant change in colour from white to dark grey, together with a reduction in electrical resistivity, without recourse to further heat treatments. No change in the usual phase of the Zn₂SnO₄ was detected. Doping with group V oxides, such as Nb₂O₅ and V₂O₅, produced changes in colour from white to dark grey but no reduction in resistivity unless further heat treatments were carried out in reducing ambients. When high concentrations of Nb were introduced, an additional phase, possibly Nb₂Sn₂O₇, was observed by XRD.
Thin film Zn₂SnO₄ was prepared by electron beam evaporation using sintered Zn₂SnO₄ powder as the evaporant material. The thin films were deposited onto glass substrates at a range of substrate temperatures between room temperature and 250 °C. XRD was used to confirm the formation of Zn₂SnO₄ and to provide estimates of the grain size, which varied from 20 to 25 nm. RHEED studies indicated that the grain size increased as the substrate temperature was increased. SEM revealed that the thin films were flat and uniform, with no cracks. The optical transmission of the thin films was about 88% for films deposited at 200 °C, but decreased significantly as the substrate temperature was decreased. The spectral dependence of the complex refractive index (n and k) suggested that true thin film formation did not take place until the substrate temperature exceeded ~150 °C, and that the material was apparently a direct gap semiconductor with a band gap energy of ~1.95 eV. It was found that the main carrier transport mechanism for doped, undoped, and thin film Zn₂SnO₄ was variable range hopping, with a temperature dependence of the form exp[(T₀/T)^(1/4)]. This result was consistent with Hall effect measurements, where high, temperature-independent carrier concentrations of about 10¹⁷ cm⁻³ were obtained, along with low values of carrier mobility (~1 cm² V⁻¹ s⁻¹) that obeyed the same temperature dependence as the conductivity, exp[(T₀/T)^(1/4)].