
    Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

    End-to-end automatic speech recognition (E2E-ASR) has the potential to improve performance, but one specific issue it must address is its difficulty in handling enharmonic words: named entities (NEs) that share the same pronunciation and part of speech but are spelled differently. This often occurs with Japanese personal names that sound identical but are written with different Kanji characters. Since such NE words tend to be important keywords, an ASR system quickly loses user trust if it misrecognizes them. To solve this problem, this paper proposes a novel retraining-free customization method for E2E-ASR based on a named-entity-aware E2E-ASR model and phoneme similarity estimation. Experimental results show that the proposed method improves the target NE character error rate by 35.7% on average, relative to a conventional E2E-ASR model, when personal names are selected as the target NE.
    Comment: accepted by INTERSPEECH202
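    To illustrate the phoneme-similarity idea, here is a minimal sketch, not the paper's actual method: an ASR hypothesis is mapped onto the registered named entity whose phoneme sequence is most similar, with no retraining involved. The NE lexicon, phoneme notation, threshold, and function names are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical registered named entities: surface form -> phonemes.
NE_LEXICON = {
    "斎藤": ["s", "a", "i", "t", "o", ":"],  # Saito
    "佐藤": ["s", "a", "t", "o", ":"],       # Sato
}

def phoneme_similarity(p1, p2):
    """Similarity in [0, 1] between two phoneme sequences."""
    return SequenceMatcher(None, p1, p2).ratio()

def rescore_to_ne(hyp_phonemes, threshold=0.8):
    """Map an ASR hypothesis onto the closest registered NE if the
    phoneme similarity clears a threshold; otherwise return None.
    The E2E-ASR model itself is left untouched (retraining-free)."""
    best = max(NE_LEXICON,
               key=lambda ne: phoneme_similarity(hyp_phonemes, NE_LEXICON[ne]))
    if phoneme_similarity(hyp_phonemes, NE_LEXICON[best]) >= threshold:
        return best
    return None

# A slightly misrecognized phoneme sequence still maps to 斎藤.
print(rescore_to_ne(["s", "a", "i", "t", "o"]))
```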

    Improvement of DOA Estimation by using Quaternion Output in Sound Event Localization and Detection

    This paper describes an improvement of Direction of Arrival (DOA) estimation performance using quaternion output in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 3. DCASE 2019 Task 3 focuses on sound event localization and detection (SELD), a task that estimates the sound source direction in addition to performing conventional sound event detection (SED). In the baseline method, the sound source direction angle is regressed directly. However, the angle is a periodic quantity with discontinuities, which can make learning unstable. Specifically, even though -180 deg and 180 deg are the same direction, a large loss is calculated between them. Estimating DOA angles with a classification approach instead of regression can avoid this instability, but it limits the angular resolution. In this paper, we propose to introduce the quaternion, which is a continuous representation, into the output layer of the neural network instead of directly estimating the sound source direction angle. This method can be implemented simply by changing the output of an existing neural network, and thus does not significantly increase the number of parameters in the middle layers. Experimental results show that the proposed method improves DOA estimation without significantly increasing the number of parameters.
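    To illustrate why a quaternion output avoids the ±180 deg discontinuity, here is a minimal sketch that encodes an azimuth as a rotation about the vertical axis and decodes it back. The sign-ambiguity handling in the loss (q and -q denote the same rotation) is an assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def azimuth_to_quaternion(theta_deg):
    """Encode an azimuth as a unit quaternion (w, x, y, z) for a
    rotation about the vertical (z) axis."""
    half = np.deg2rad(theta_deg) / 2.0
    return np.array([np.cos(half), 0.0, 0.0, np.sin(half)])

def quaternion_to_azimuth(q):
    """Decode the azimuth from the (possibly unnormalized) output."""
    q = q / np.linalg.norm(q)
    return np.rad2deg(2.0 * np.arctan2(q[3], q[0]))

def quaternion_loss(q_pred, q_true):
    """q and -q represent the same rotation, so take the smaller of
    the two squared distances (assumed handling of the ambiguity)."""
    return min(np.sum((q_pred - q_true) ** 2),
               np.sum((q_pred + q_true) ** 2))

# -180 deg and 180 deg are the same direction: the loss is ~0,
# even though the raw angles differ by 360 deg.
qa, qb = azimuth_to_quaternion(-180.0), azimuth_to_quaternion(180.0)
print(quaternion_loss(qa, qb))  # ~0.0
```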

    Footstep Detection and Classification Using Distributed Microphones

    This paper addresses footstep detection and classification with multiple microphones distributed on the floor. We propose introducing geometrical features, such as the position and velocity of a sound source estimated by amplitude-based localization, for classification. Unlike a conventional microphone array technique, this does not require precise inter-microphone time synchronization. To classify various types of sound events, we introduce four types of features: time-domain, spectral, and cepstral features in addition to the geometrical features. We constructed a prototype footstep detection and classification system based on the proposed ideas, with eight microphones aligned in a 2-by-4 grid. Preliminary classification experiments showed that classification accuracy for four types of sound sources (a walking footstep, a running footstep, a handclap, and an utterance) remains above 70% even at a low signal-to-noise ratio such as 0 dB. We also confirmed two advantages of the proposed footstep detection and classification. One is that the proposed features can be applied to the classification of sound sources other than footsteps. The other is that the multichannel approach further improves noise robustness by selecting the best microphone among those available and by providing geometrical information on a sound source.
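    A minimal sketch of amplitude-based localization on a distributed grid follows. The 2-by-4 geometry comes from the paper, but the energy-weighted-centroid estimator, the frame handling, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical 2-by-4 grid of floor microphones (x, y) in meters.
MIC_POSITIONS = np.array([[x, y] for y in (0.0, 1.0)
                                 for x in (0.0, 1.0, 2.0, 3.0)])

def localize_by_amplitude(frames):
    """Amplitude-based localization: weight each microphone's
    position by its short-time energy. No inter-microphone time
    synchronization is needed, unlike TDOA-based array techniques.
    frames: (n_mics, n_samples) array holding one analysis frame."""
    energy = np.sum(frames.astype(float) ** 2, axis=1)
    weights = energy / np.sum(energy)
    return weights @ MIC_POSITIONS  # weighted centroid (x, y)

def geometrical_features(positions, frame_rate):
    """Position and velocity features over consecutive frames."""
    positions = np.asarray(positions)
    velocity = np.diff(positions, axis=0) * frame_rate
    return positions[1:], velocity

# Example: a source near microphone 2 dominates the energy.
rng = np.random.default_rng(0)
frames = rng.normal(0.0, 0.05, size=(8, 512))
frames[2] += rng.normal(0.0, 1.0, size=512)  # loud near mic 2
print(localize_by_amplitude(frames))  # close to MIC_POSITIONS[2]
```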

    Noise robust 2D bird localization via sound using microphone arrays

    Birds in the wild are difficult to localize because they tend to be small, move swiftly, and are often visually occluded. However, their location information is crucial for ethological studies of bird behaviour. Automating the localization process has recently become an active research topic, commonly using spatial sensors and sensor networks. To avoid the visual occlusion problem, many studies focus on acoustic signal processing, applying microphone arrays to perform 1D azimuth localization from bird songs. In this study, we perform 2D sound source localization in Cartesian coordinates using azimuths from multiple microphone arrays, estimating a bird's location from the intersection points of the azimuth lines. Although this approach is simple and easy to implement, it has two main issues. One is that even small noise in the azimuth values corrupts the localization: the azimuth lines for a single bird no longer intersect at one point but at several, making it difficult to estimate the bird's exact location. Especially in far-field applications, even small noise leads to large localization errors. The other issue is that in a bird's natural habitat, elements such as leaves, grass, and rivers act as noise sources, making it difficult to extract bird songs. We propose an algorithm combining statistical methods, sound feature analysis, and machine learning, and use it to build a noise-robust bird localization system. We performed numerous simulations to understand the system's limitations and, based on the results, derived design guidelines describing how the results change with the number of microphone arrays, the signal-to-noise ratio, the bird's distance from the devices, the array's transfer function, the species of the singing bird, and the specific parameter settings used in the algorithms. Such detailed guidelines can help interested researchers create a similar system, which can contribute to ethological research.
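    The intersection-of-azimuth-lines step can be illustrated with standard least-squares triangulation: when noisy bearing lines do not meet at one point, minimize the sum of squared perpendicular distances to all lines instead. The array layout and azimuth values below are made up, and the paper's statistical noise handling is omitted.

```python
import numpy as np

def intersect_azimuth_lines(array_positions, azimuths_deg):
    """Least-squares intersection of 2D bearing lines. Each line
    starts at a microphone array position and points along the
    estimated azimuth; solve for the point minimizing the sum of
    squared perpendicular distances to all lines."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, az in zip(np.asarray(array_positions, float),
                     np.deg2rad(azimuths_deg)):
        d = np.array([np.cos(az), np.sin(az)])  # unit direction
        P = np.eye(2) - np.outer(d, d)          # projector onto the normal
        A += P
        b += P @ p
    return np.linalg.solve(A, b)

# Three arrays observing a bird near (5, 5), with noisy azimuths.
arrays = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
azimuths = [44.0, 136.0, -44.0]  # degrees, slightly perturbed
print(intersect_azimuth_lines(arrays, azimuths))  # ~ (5, 5)
```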

    A robot uses its own microphone to synchronize its steps to musical beats while scatting and singing

    Musical beat tracking is an effective technology for human-robot interaction such as musical sessions. Since such interaction should take place naturally in various environments, musical beat tracking for a robot should cope, using its own microphone, with noise sources such as environmental noise, its own motor noise, and its own voice. This paper presents a musical beat tracking robot that can step, scat, and sing in time with musical beats using its own microphone. To realize such a robot, we propose a robust beat tracking method that introduces two key techniques: spectro-temporal pattern matching and echo cancellation. The former realizes robust tempo estimation with a shorter window length and can therefore adapt quickly to tempo changes. The latter is effective for cancelling self-noise such as stepping, scatting, and singing. We implemented the proposed beat tracking method on Honda ASIMO. Experimental results showed ten times faster adaptation to tempo changes and high robustness of the beat tracking against stepping, scatting, and singing noise. We also demonstrated that the robot times its steps to musical beats while scatting or singing.
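    A minimal sketch of the spectro-temporal pattern matching idea: estimate the tempo as the time lag at which a spectrogram segment best matches a shifted copy of itself. The framing, normalization, and BPM search range here are assumptions, not the robot's actual implementation.

```python
import numpy as np

def estimate_beat_interval(spec, frame_rate, bpm_range=(60, 180)):
    """Find the time lag at which the spectrogram best matches a
    shifted copy of itself (normalized correlation over lags).
    spec: (n_freq_bins, n_frames) magnitude spectrogram.
    Returns the inter-beat interval in seconds."""
    n_frames = spec.shape[1]
    min_lag = int(frame_rate * 60.0 / bpm_range[1])
    max_lag = int(frame_rate * 60.0 / bpm_range[0])
    scores = []
    for lag in range(min_lag, max_lag + 1):
        a, b = spec[:, lag:], spec[:, :n_frames - lag]
        num = np.sum(a * b)
        den = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-12
        scores.append(num / den)
    best_lag = min_lag + int(np.argmax(scores))
    return best_lag / frame_rate

# 120 BPM toy example: impulses every 0.5 s in a sparse spectrogram.
frame_rate = 100.0
spec = np.zeros((16, 300))
spec[:, ::50] = 1.0
print(60.0 / estimate_beat_interval(spec, frame_rate))  # ~120 BPM
```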