
939 research outputs found

Sound Event Localization, Detection, and Tracking by Deep Neural Networks

Author: Adavanne Sharath
Publication venue: Tampere University
Publication date: 04/03/2020

In this thesis, we present novel sound representations and classification methods for the task of sound event localization, detection, and tracking (SELDT). The human auditory system has evolved to localize multiple sound events, recognize them, and track their motion individually in an acoustic environment. This ability makes humans context-aware and enables them to interact with their surroundings naturally. Developing similar methods for machines will provide an automatic description of the social and human activities around them and enable machines to be context-aware in a similar way. Such methods can be employed to help the hearing impaired visualize sounds, for robot navigation, and to monitor biodiversity, the home, and cities. A real-life acoustic scene is complex, with multiple sound events that overlap temporally and spatially, including stationary and moving events with varying angular velocities. Additionally, each individual sound event class, for example a car horn, can vary widely: different cars have different horns, and even within the same car model the duration and temporal structure of the horn sound depend on the driver. Performing SELDT robustly in such overlapping and dynamic sound scenes is challenging for machines. Hence, we investigate the SELDT task in this thesis using a data-driven approach based on deep neural networks (DNNs). The sound event detection (SED) task requires detecting the onset and offset times of individual sound events and their corresponding labels. In this regard, we propose to use spatial and perceptual features extracted from multichannel audio for SED with two different DNNs: recurrent neural networks (RNNs) and convolutional recurrent neural networks (CRNNs). We show that using multichannel audio features improves SED performance for overlapping sound events in comparison to traditional single-channel audio features. The proposed novel features and methods produced state-of-the-art performance for the real-life SED task and won the IEEE AASP DCASE challenge consecutively in 2016 and 2017. Sound event localization is the task of spatially locating individual sound events. Traditionally, this has been approached using parametric methods. In this thesis, we propose a CRNN for detecting the azimuth and elevation angles of multiple temporally overlapping sound events. This is the first DNN-based method to perform localization in the complete azimuth and elevation space. Unlike parametric methods, which require the number of active sources to be known, the proposed method learns this information directly from the input data and estimates the respective spatial locations. Further, the proposed CRNN is shown to be more robust than parametric methods in reverberant scenarios. Finally, the detection and localization tasks are performed jointly using a CRNN. This method additionally tracks the spatial location over time, thus producing the SELDT results. This is the first DNN-based SELDT method, and it is shown to perform on par with stand-alone baselines for SED, localization, and tracking. The proposed SELDT method is evaluated on nine datasets that represent anechoic and reverberant sound scenes, stationary and moving sources with varying velocities, different numbers of overlapping sound events, and different microphone array formats. The results show that the SELDT method can track multiple overlapping sound events that are both spatially stationary and moving.

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Biomedical Photoacoustic Imaging and Sensing Using Affordable Resources

Publication venue: MDPI AG
Publication date: 11/01/2022

The overarching goal of this book is to provide a current picture of the latest developments in the capabilities of biomedical photoacoustic imaging and sensing in an affordable setting, such as advances in the technology involving light sources and delivery, acoustic detection, and image reconstruction and processing algorithms. This book includes 14 chapters from globally prominent researchers, covering a comprehensive spectrum of photoacoustic imaging topics from technology developments and novel imaging methods to preclinical and clinical studies, predominantly in a cost-effective setting. Affordability is undoubtedly an important factor to be considered in the coming years to help translate photoacoustic imaging to clinics around the globe. This first-ever book focused on biomedical photoacoustic imaging and sensing using affordable resources is thus timely, especially considering that the technique is facing an exciting transition from benchtop to bedside. Given its scope, the book will appeal to scientists and engineers in academia and industry, as well as medical experts interested in the clinical applications of photoacoustic imaging.

Directory of Open Access Books (DOAB)

Estudo da energia de ultra-som para a estimativa de retro-espalhamento invasiva em tecidos mediante temperatura

Author: Rahmati Javid J.
Publication date: 01/01/2012

Master's dissertation, Electronic and Telecommunications Engineering, Faculdade de Ciências e Tecnologia, Univ. do Algarve, 2012. This experimental work is part of the application of ultrasound to hyperthermia (thermal therapy) aimed at the treatment of cancer cells. Analysis of the back-scattered ultrasound energy enables the study of the temperature behavior induced in the tissues by the ultrasonic signal, which is the primary goal of this case study. To carry out the experiments, we developed gel-based (agar-agar) phantoms that mimic the behavior of human tissues under ultrasound signals. Subsequently, in order to obtain a more human-like phantom, the experiments were repeated with ex-vivo pork loin. The experiments involved ultrasonic therapeutic and imaging instrumentation connected to a function generator and a signal acquisition system. Experiments were performed considering different energies (0.5, 1, 1.5 and 2 W/cm3) of the therapeutic transducer and two emission frequencies of the imaging transducer (5 and 7 MHz). Five temperature sensors were used to measure temperature invasively in the gel-based phantoms, and two sensors in the experiments with pork loin. By analyzing the time delays in the echoes of the back-scattered ultrasonic signals for both types of phantoms, we verified the relationship between temperature rise and the increase in the propagation speed of the echoes. The assessment of variations in the back-scattered energy showed that it depends on the temperature applied in pork loin tissue, but no conclusion could be drawn in the case of the gel-based phantoms.

Sapientia

Developement of a compressible solver for the simulation of explosions in electric transformers

Author: Márquez Martín Pau
Publication venue: Universitat Politècnica de Catalunya
Publication date: 14/07/2021

The transformer manufacturing sector often suffers explosions originating in the oil-filled tanks that insulate the electrical components. These are caused by arcing faults that generate flammable gas as the oil surrounding the electrical arc vaporizes. In a matter of milliseconds, the volume containing the gas is pressurized, because the inertia of the surrounding oil prevents the bubble from growing and reducing its pressure. This phenomenon leads to the formation of pressure waves due to the strong pressure gradient at the oil-gas interface, which propagate and interact with the transformer structure. The tank walls may withstand the first pressure peak, which is dynamic in nature and tends to be the highest in magnitude but has a short period. By contrast, the static pressure that builds up in the tank as the reflected pressure waves interact with the incoming waves may pose the greatest hazard and lead to explosions, tank rupture, pollution, and high human and material costs. In this work, a weakly-compressible multi-fluid flow formulation, based on a Finite Element implementation of the Navier-Stokes equations, simulates the phenomena detailed above in order to gain a proper understanding of the physical conditions in the tank and devise adequate depressurization strategies.

UPCommons. Portal del coneixement obert de la UPC

Audio-Based Retrieval of Musical Score Data

Author: Subedi Bishwa Prasad
Publication date: 13/08/2014

Given an audio query, such as a polyphonic musical piece, this thesis addresses the problem of retrieving matching (similar) musical score data from a collection of musical scores. There are different techniques for measuring similarity between musical pieces, such as metadata-based similarity measures, collaborative filtering, and content-based similarity measures. In this thesis, we use the information in the digital music itself to measure similarity, a technique known as content-based similarity measurement. First, we extract chroma features to represent musical segments. Chroma features capture both melodic and harmonic information and are robust to timbre variation. Tempo variation between performances of the same song may make them appear dissimilar. To address this issue, we extract beat sequences and combine them with chroma features to obtain beat-synchronous chroma features. Next, we use the Dynamic Time Warping (DTW) algorithm. This algorithm first computes the DTW matrix between two feature sequences and then calculates the cost of traversing the matrix from its starting point to its end point. The lower the cost, the more similar the musical segments are. The performance of DTW is improved by choosing suitable path constraints and path weights. Then, we implement the locality-sensitive hashing (LSH) algorithm, which first indexes the data and then searches for similar items. The processing time of LSH is shorter than that of DTW. For shorter query fragments, say 30 seconds, LSH outperformed DTW. The performance of LSH depends on the number of hash tables, the number of projections per table, and the width of the projection. Both algorithms were applied to two types of data sets: RWC (where audio and MIDI are from the same source) and TUT (where audio and MIDI are from different sources). The contribution of this thesis is twofold. First, we propose a suitable feature representation of a musical segment for melodic similarity. Second, we apply two different similarity measure algorithms and enhance their performance. This thesis work also includes the development of a mobile application capable of recording audio from the surroundings and displaying its acoustic features in real time.
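
The DTW step described in the abstract above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `dtw_cost` and `cosine_distance`, the cosine distance between chroma vectors, and the basic three-neighbor step pattern with unit weights are all assumptions chosen for illustration. The abstract mentions path constraints and path weights but does not specify them.

```python
import math


def cosine_distance(u, v):
    """Distance between two feature vectors (assumed metric, not from the thesis)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def dtw_cost(a, b, dist=cosine_distance):
    """Accumulated cost of the cheapest warping path between sequences a and b.

    D[i][j] holds the minimal cost of aligning the first i frames of a
    with the first j frames of b. The answer is the value at the end
    point of the matrix, D[len(a)][len(b)].
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            # Basic step pattern: vertical, horizontal, or diagonal move.
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Toy 2-dimensional "chroma" frames; real chroma vectors have 12 bins.
query = [[1.0, 0.0], [0.0, 1.0]]
slower = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # same content, stretched in time
other = [[0.0, 1.0], [1.0, 0.0]]

print(dtw_cost(query, slower))
print(dtw_cost(query, other))
```

In this toy example, the time-stretched sequence aligns with the query at a lower cost than the unrelated one, which mirrors the rule that a lower cost means more similar segments.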

Trepo - Institutional Repository of Tampere University

Altered Auditory BOLD Response to Conspecific Birdsong in Zebra Finches with Stuttered Syllables

Author: Helekar Santosh A., Salgado-Commissariat Delanthi, Voss Henning U.
Publication venue: Public Library of Science
Publication date: 01/12/2010

How well a songbird learns a song appears to depend on the formation of a robust auditory template of its tutor's song. Using functional magnetic resonance neuroimaging, we examine auditory responses in two groups of zebra finches that differ in the type of song they sing after being tutored by birds producing stuttering-like syllable repetitions in their songs. We find that birds that learn to produce the stuttered syntax show attenuated blood oxygenation level-dependent (BOLD) responses to the tutor's song, and more pronounced responses to conspecific song, primarily in the auditory area field L of the avian forebrain, when compared to birds that produce normal song. These findings are consistent with the presence of a sensory song template critical for song learning in auditory areas of the zebra finch forebrain. In addition, they suggest a relationship between an altered response related to familiarity and/or saliency of song stimuli and the production of variant songs with stuttered syllables.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central