
    Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network

    This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when the features are presented as separate layers of a volume. Using the proposed spatial features instead of monaural features on the same network gives an absolute F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and 2.7% on the TUT-SED 2009 dataset, which is fifteen times larger. Accepted for the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017).
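As a rough illustration of the feature layout described above, the sketch below contrasts concatenating per-channel features into a single vector with stacking them as separate layers of an input volume. The feature types, shapes, and array names are hypothetical; the paper's actual features and network are not reproduced here.

```python
import numpy as np

# Hypothetical setup: two multichannel feature types (e.g. log-mel
# energies and a spatial feature) for C=2 channels, T frames, F bins.
T, F, C = 100, 40, 2
logmel = np.random.randn(T, F, C)    # one feature map per channel
spatial = np.random.randn(T, F, C)

# Concatenation baseline: all channels flattened into one long vector
# per frame, discarding the 2-D time-frequency structure per channel.
concat = np.concatenate([logmel.reshape(T, -1), spatial.reshape(T, -1)],
                        axis=1)
assert concat.shape == (T, 2 * F * C)

# Volume layout: each channel kept as a separate layer of the input,
# so a convolutional front end can learn inter-channel cues directly.
volume = np.concatenate([logmel, spatial], axis=-1)  # (T, F, 2C)
assert volume.shape == (T, F, 2 * C)
```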

    The activization of passive forms in Finnish (Passiivimuotojen aktiivistuminen suomen kielessä)


    The automatic analysis of classroom talk

    The SMART SPEECH Project is a joint venture between three Finnish universities and a Chilean university. The aim is to develop a mobile application that can be used to record classroom talk and enable observations to be made of classroom interactions. We recorded Finnish and Chilean physics teachers’ speech using both a conventional microphone/dictaphone setup and a microphone/mobile application setup. The recordings were analysed via automatic speech recognition (ASR). The average word error rate achieved for the Finnish teachers’ speech was under 40%. The ASR approach also enabled us to determine the key topics discussed within the Finnish physics lessons under scrutiny. The results here were promising, as the recognition accuracy rate was about 85% on average.

    Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments

    Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants. For example, an automatic audio focus for video capture on a mobile phone requires robust detection of relevant acoustic events around the device and their direction. Existing SELD approaches have been evaluated using material produced in controlled indoor environments, or the audio is simulated by mixing isolated sounds into different spatial locations. This paper studies SELD of speech in diverse everyday environments, where the audio corresponds to typical usage scenarios of handheld mobile devices. In order to allow weighting the relative importance of localization vs. detection, we propose a two-stage hierarchical system, where the first stage detects the target events and the second stage localizes them. The proposed method utilizes a convolutional recurrent neural network (CRNN) and is evaluated on a database of manually annotated microphone array recordings from various acoustic conditions. The array is embedded in a contemporary mobile phone form factor. The obtained results show good speech detection and localization accuracy of the proposed method in contrast to a non-hierarchical flat classification model.
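The detect-then-localize structure can be sketched as below. The per-frame network outputs, the number of direction classes, and the detection threshold are all invented for illustration; this is not the authors' CRNN.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-frame network outputs: a speech probability
# (stage 1, detection) and direction logits (stage 2, localization).
n_frames, n_directions = 8, 36               # 36 azimuth classes of 10 deg
p_speech = rng.random(n_frames)              # stage-1 detection scores
doa_logits = rng.standard_normal((n_frames, n_directions))

threshold = 0.5
active = p_speech >= threshold               # detect first ...
doa = np.full(n_frames, -1)                  # -1 marks "no active source"
doa[active] = doa_logits[active].argmax(axis=1) * 10  # ... then localize

# Localization is only reported for frames the detector marked active.
assert np.all(doa[~active] == -1)
```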

    Audio source separation into the wild

    This review chapter is dedicated to multichannel audio source separation in real-life environments. We explore some of the major achievements in the field and discuss some of the remaining challenges. We examine several important practical scenarios, e.g. moving sources and/or microphones, varying numbers of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications, such as smart assistants, cellular phones, hearing aids, and robots, are discussed. Our perspectives on the future of the field are given as concluding remarks of this chapter.

    Acoustic Source Localization in a Room Environment and at Moderate Distances

    The pressure changes of an acoustic wavefront are sensed with a microphone that acts as a transducer, converting sound pressure into voltage. The voltage is then converted into digital form with an analog-to-digital (AD) converter to provide a discrete-time quantized digital signal. This thesis discusses methods to estimate the location of a sound source from the signals of multiple microphones. Acoustic source localization (ASL) can be used to locate talkers, which is useful for speech communication systems such as teleconferencing and hearing aids. Active localization methods receive and send energy, whereas passive methods only receive energy. The discussed ASL methods are passive, which makes them attractive for surveillance applications such as localization of vehicles and monitoring of areas. This thesis focuses on ASL in a room environment and at the moderate distances that are often present in outdoor applications.

    The frequency range of many commonly occurring sounds, such as speech, vehicles, and jet aircraft, is large. Time delay estimation (TDE) methods are suitable for estimating properties of such wideband signals. Since TDE methods have been extensively studied, the theory is attractive to apply in localization. Time difference of arrival (TDOA) based methods estimate the source location from measured TDOA values between microphones. These methods are computationally attractive but deteriorate rapidly when the TDOA estimates are no longer directly related to the source position. In a room environment, such conditions can arise when reverberation or noise starts to dominate TDOA estimation. The combination of microphone-pairwise TDE measurements is studied as a more robust localization solution. TDE measurements are combined into a spatial likelihood function (SLF) of source position. A sequential Bayesian method known as particle filtering (PF) is used to estimate the source position. The PF-based localization accuracy increases when the variance of the SLF decreases. Results from simulations and real data show that multiplication (an intersection operation) results in an SLF with smaller variance than the typically applied summation (a union operation).

    The above localization methods assume that the source is located in the near-field of the microphone array, i.e., the curvature of the source-emitted wavefront is observable. In the far-field, the source wavefront is assumed planar, and localization is considered using spatially separated direction observations. The direction of arrival (DOA) of a source-emitted wavefront impinging on a microphone array is traditionally estimated by steering the array to the direction that maximizes the steered response power. Such estimates can be deteriorated by noise and reverberation. Therefore, talker localization is considered using DOA discrimination. The sound propagation delay from the source to the microphone array becomes significant at moderate distances. As a result, the directional observations of a moving sound source point behind the true source position. Omitting the propagation delay results in a biased location estimate of a moving or discontinuously emitting source. To solve this problem, the propagation delay is modeled in the estimation process. Motivated by the robustness of localization using combined TDE measurements, source localization by directly combining the TDE-based array steered responses is considered. This extends the near-field talker localization methods to far-field source localization. The presented propagation delay modeling is then proposed for steered response localization. The improvement in localization accuracy from including the propagation delay is studied using a simulated moving sound source in the atmosphere.

    The presented indoor localization methods have been evaluated in the Classification of Events, Activities and Relationships (CLEAR) 2006 and CLEAR'07 technology evaluations, where the performance of the proposed ASL methods was assessed by a third party on several hours of annotated data gathered from meetings held in multiple smart rooms. According to the results from the CLEAR'07 development dataset (166 min) presented in this thesis, 92% of speech activity in a meeting situation was located within 17 cm accuracy.
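The claim that multiplying pairwise likelihoods (intersection) concentrates the spatial likelihood function more than summing them (union) can be illustrated with a toy one-dimensional example. The grid, the Gaussian likelihood model, and the measurement values below are all made up for illustration:

```python
import numpy as np

# 1-D position grid (metres) and two noisy pairwise likelihoods
# centred near a hypothetical true source position of 2.0 m.
x = np.linspace(0.0, 5.0, 501)

def pair_likelihood(center, width):
    """Toy per-microphone-pair likelihood over the grid."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

l1 = pair_likelihood(1.9, 0.5)
l2 = pair_likelihood(2.1, 0.5)

union = l1 + l2          # summation (union) combination
intersection = l1 * l2   # multiplication (intersection) combination

def spread(slf):
    """Variance of the grid position under the normalized SLF."""
    p = slf / slf.sum()
    mean = (p * x).sum()
    return (p * (x - mean) ** 2).sum()

# Multiplication concentrates the SLF around the source position,
# giving a smaller variance than summation.
assert spread(intersection) < spread(union)
```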

    Supplementary selling as part of the customer service process at Sokos Hotel Ilves (Lisämyynti osana asiakaspalveluprosessia Sokos Hotel Ilveksessä)

    The commissioner of the thesis was Sokos Hotel Ilves, and the objective was to develop the supplementary selling carried out at Sokos Hotel Ilves from the customer's point of view. Another objective was to create, based on the research results, a supplementary selling model describing the supplementary selling opportunities in each phase of the customer service process at Sokos Hotel Ilves. To reach these objectives, customers' opinions and experiences of the recommendations made by the receptionist were surveyed, and the advertising at the hotel was also studied. The research was carried out using theme interviews, a qualitative research method. At the end of 2009, 32 customers were interviewed by telephone. The results were analyzed by themes, which emerged as recommendation as a form of service and concretizing supplementary selling. The material was also analyzed through the phases of the customer service process. The results showed that the customers considered recommendation part of good customer service. Recommendations made by the receptionist positively affected customer satisfaction as well as purchase decisions. Customers who stayed at Sokos Hotel Ilves for the first time considered supplementary selling especially important, whereas business travelers who stayed there frequently valued recommendations the least. The research showed that Sokos Hotel Ilves should particularly develop its in-house advertising to meet customer needs. The resulting supplementary selling model shows, through practical examples, how the receptionist can recommend services and products that meet the customer's needs at the right moments. In the future, the supplementary selling model can be utilized not only at Sokos Hotel Ilves but also in other hotels of the Sokos Hotels chain.

    Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement

    The time-frequency mask and the magnitude spectrum are two common targets for deep learning-based speech enhancement. Both the ensemble and the neural network fusion of magnitude spectra obtained with these approaches have been shown to improve objective perceptual quality on synthetic mixtures of data. This work generalizes the ensemble approach by proposing neural network layers that predict time-frequency varying weights for the combination of the two magnitude spectra. In order to combine the best individual magnitude spectrum estimates, the weight prediction network is trained after the time-frequency mask and magnitude spectrum sub-networks have been separately trained for their corresponding objectives and their weights have been frozen. Using the publicly available CHiME-3 challenge data, which consists of both simulated and real speech recordings in everyday environments with noise and interference, the proposed approach leads to significantly higher noise suppression in terms of segmental source-to-distortion ratio than the alternative approaches. In addition, the approach achieves similar improvements in average objective instrumentally measured intelligibility scores with respect to the best achieved scores.
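A minimal sketch of the time-frequency varying combination described above, assuming the weight network outputs one sigmoid weight per TF bin. The shapes and the stand-in values for the sub-network outputs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical magnitude-spectrum estimates from the two sub-networks
# (time-frequency mask branch and direct magnitude branch), T x F bins.
T, F = 50, 257
mag_from_mask = rng.random((T, F))
mag_direct = rng.random((T, F))

# The weight-prediction layers output a TF-varying weight in (0, 1);
# here a random sigmoid output stands in for the trained network.
w = 1.0 / (1.0 + np.exp(-rng.standard_normal((T, F))))

# Convex per-bin combination of the two magnitude estimates.
combined = w * mag_from_mask + (1.0 - w) * mag_direct

# Each combined bin lies between the two individual estimates.
lo = np.minimum(mag_from_mask, mag_direct)
hi = np.maximum(mag_from_mask, mag_direct)
assert np.all(combined >= lo - 1e-12) and np.all(combined <= hi + 1e-12)
```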

    Microphone-Array-Based Speech Enhancement Using Neural Networks

    This chapter analyses the use of artificial neural networks (ANNs) in learning to predict time-frequency (TF) masks from noisy input data. Artificial neural networks are inspired by the operation of biological neural networks, where individual neurons receive inputs from other connected neurons. The chapter focuses on TF mask prediction for speech enhancement in dynamic noise environments using artificial neural networks. It reviews the enhancement framework of microphone array signals using beamforming with post-filtering. The chapter presents an overview of the supervised learning framework used for TF mask-based speech enhancement. It explores the effectiveness of feed-forward neural networks for a real-world enhancement application using recordings from everyday noisy environments, where a microphone array is used to capture the signals. Estimated instrumental intelligibility and signal-to-noise ratio (SNR) scores are evaluated to measure how well the predicted masks improve speech quality, using networks trained on different input features.
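The TF mask application step itself can be sketched as follows. A random mask stands in for the network prediction here, and the noisy phase is reused, a common simplifying assumption in mask-based enhancement; the shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy example of TF mask-based enhancement: in practice an ANN predicts
# the mask from features of the noisy (possibly beamformed) signal.
T, F = 20, 129
noisy_mag = rng.random((T, F)) + 0.1            # noisy magnitude spectrogram
noisy_phase = rng.uniform(-np.pi, np.pi, (T, F))
mask = rng.random((T, F))                       # ratio mask in [0, 1]

# Enhancement: attenuate each TF bin by the mask, reuse the noisy phase.
enhanced = mask * noisy_mag * np.exp(1j * noisy_phase)

# The mask can only attenuate, never amplify, a bin's magnitude.
assert np.all(np.abs(enhanced) <= noisy_mag + 1e-12)
```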