Automatic Estimation of Intelligibility Measure for Consonants in Speech
In this article, we provide a model to estimate a real-valued measure of the
intelligibility of individual speech segments. We trained regression models
based on Convolutional Neural Networks (CNN) for stop consonants
/p, t, k, b, d, g/ associated with the vowel /ɑ/, to estimate the
corresponding Signal to Noise Ratio (SNR) at which the Consonant-Vowel (CV)
sound becomes intelligible for Normal Hearing (NH) ears. The intelligibility
measure for each sound is called SNR90, and is defined to be the SNR level
at which human participants are able to recognize the consonant at least 90%
correctly, on average, as determined in prior experiments with NH subjects.
Performance of the CNN is compared to a baseline prediction based on automatic
speech recognition (ASR), specifically, a constant offset subtracted from the
SNR at which the ASR becomes capable of correctly labeling the consonant.
Compared to the baseline, our models were able to accurately estimate the
SNR90 intelligibility measure with less than 2 dB Mean Squared Error (MSE)
on average, while the baseline ASR-defined measure computes SNR90 with a
variance of 5.2 to 26.6 dB, depending on the consonant.
Comment: 5 pages, 1 figure, 7 tables; submitted to the Interspeech 2020 conference.
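As an illustration of how such a threshold measure can be read off listening-test data, the following is a minimal sketch, assuming recognition accuracy has already been measured at a few SNR levels for one CV token. The function name and the example numbers are hypothetical, not values from the paper.

```python
# Hypothetical sketch: estimating an SNR90-style intelligibility measure from
# recognition-accuracy data, assuming accuracy was measured at a grid of SNR
# levels (as in the NH listening experiments described above).
import numpy as np

def estimate_snr90(snrs, accuracies, threshold=0.90):
    """Return the lowest SNR at which accuracy first reaches `threshold`,
    linearly interpolating between measured points."""
    snrs = np.asarray(snrs, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    order = np.argsort(snrs)
    snrs, accuracies = snrs[order], accuracies[order]
    for i in range(1, len(snrs)):
        if accuracies[i - 1] < threshold <= accuracies[i]:
            # Linear interpolation between the two bracketing points.
            frac = (threshold - accuracies[i - 1]) / (accuracies[i] - accuracies[i - 1])
            return snrs[i - 1] + frac * (snrs[i] - snrs[i - 1])
    raise ValueError("accuracy never reaches the threshold")

# Toy example: a CV token recognized 60% at -12 dB, 85% at -6 dB, 95% at 0 dB.
print(estimate_snr90([-12, -6, 0], [0.60, 0.85, 0.95]))  # -> -3.0 dB
```

A CNN regressor as described above would be trained to predict this scalar directly from the noisy waveform, rather than requiring a full accuracy-vs-SNR curve per token.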
Two vs. Four-Channel Sound Event Localization and Detection
Sound event localization and detection (SELD) systems estimate both the
direction-of-arrival (DOA) and class of sound sources over time. In the DCASE
2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel
setting. While a multichannel recording setup such as first-order Ambisonics
(FOA) is beneficial for furthering the development of SELD systems, most
consumer electronics devices are able to record no more than two channels. For
this reason, in this work we investigate the performance of the
DCASE 2022 SELD baseline model using three audio input representations: FOA,
binaural, and stereo. We perform a novel comparative analysis illustrating the
effect of these audio input representations on SELD performance. Crucially, we
show that binaural and stereo (i.e., 2-channel) audio-based SELD models are
still able to localize and detect sound sources laterally quite well, despite
overall performance degrading as less audio information is provided. Further,
we segment our analysis by scenes containing varying degrees of sound source
polyphony to better understand the effect of audio input representation on
localization and detection performance as scene conditions become increasingly
complex.
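To make the relationship between the representations concrete, here is a minimal sketch of one way to render a 2-channel signal from a 4-channel FOA recording via virtual cardioid microphones. The FuMa channel ordering and the ±90° look directions are assumptions for illustration, not the decoding used in the DCASE challenge.

```python
# Hypothetical sketch: deriving 2-channel inputs from a 4-channel FOA
# recording so the same SELD data can be fed to a stereo model.
# Assumes FuMa channel ordering (W, X, Y, Z); this is a simple
# virtual-microphone decode, not the challenge's official pipeline.
import numpy as np

def foa_to_stereo(foa, azimuth_deg=90.0):
    """Render left/right virtual cardioid microphones at +/-azimuth
    from a (4, T) first-order Ambisonics signal."""
    w, x, y, _z = foa  # omni, front-back, left-right, up-down components
    az = np.deg2rad(azimuth_deg)
    # Cardioid aimed at angle theta in the horizontal plane:
    # 0.5 * (W + cos(theta) * X + sin(theta) * Y)
    left = 0.5 * (w + np.cos(az) * x + np.sin(az) * y)
    right = 0.5 * (w + np.cos(-az) * x + np.sin(-az) * y)
    return np.stack([left, right])

# Example: one second of 4-channel noise at 24 kHz -> (2, 24000) stereo.
stereo = foa_to_stereo(np.random.randn(4, 24000))
print(stereo.shape)
```

Because the two virtual capsules differ only in their left-right (Y) weighting, lateral direction cues survive the reduction, which is consistent with the finding above that 2-channel models still localize laterally quite well.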
How Phonotactics Affect Multilingual and Zero-shot ASR Performance
The idea of combining multiple languages' recordings to train a single
automatic speech recognition (ASR) model brings the promise of the emergence of
universal speech representation. Recently, a Transformer encoder-decoder model
has been shown to leverage multilingual data well in IPA transcriptions of
languages presented during training. However, the representations it learned
were not successful in zero-shot transfer to unseen languages. Because that
model lacks an explicit factorization of the acoustic model (AM) and language
model (LM), it is unclear to what degree the performance suffered from
differences in pronunciation or the mismatch in phonotactics. To gain more
insight into the factors limiting zero-shot ASR transfer, we replace the
encoder-decoder with a hybrid ASR system consisting of a separate AM and LM.
Then, we perform an extensive evaluation of monolingual, multilingual, and
crosslingual (zero-shot) acoustic and language models on a set of 13
phonetically diverse languages. We show that the gain from modeling
crosslingual phonotactics is limited, and that imposing too strong a model can
hurt zero-shot transfer. Furthermore, we find that a multilingual LM hurts a
multilingual ASR system's performance, and that retaining only the target
language's phonotactic data in LM training is preferable.
Comment: Accepted for publication in IEEE ICASSP 2021. The first two authors contributed equally to this work.
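As a concrete illustration of a phonotactic LM of the kind evaluated here, the sketch below trains a smoothed bigram model over IPA phone tokens from a single language's transcriptions, mirroring the finding that keeping only the target language's phonotactic data is preferable. The toy data and add-alpha smoothing are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: a bigram phonotactic language model over IPA phone
# tokens, trained on one language's transcriptions only.
from collections import Counter, defaultdict

def train_phone_bigram(transcriptions):
    """transcriptions: list of phone-token lists, e.g. [['p', 'a'], ...]."""
    counts = defaultdict(Counter)
    for phones in transcriptions:
        # Pad with start/end markers so word-boundary phonotactics count too.
        for prev, cur in zip(['<s>'] + phones, phones + ['</s>']):
            counts[prev][cur] += 1
    return counts

def bigram_prob(counts, prev, cur, vocab_size, alpha=1.0):
    # Add-alpha smoothing keeps unseen phone pairs at nonzero probability.
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * vocab_size)

data = [['p', 'a', 't'], ['t', 'a', 'p'], ['k', 'a']]  # toy transcriptions
counts = train_phone_bigram(data)
print(bigram_prob(counts, 'a', 't', vocab_size=5))  # -> 0.25
```

In a hybrid system, such an LM constrains the acoustic model's phone hypotheses; the result above suggests that making this constraint too strong, or pooling it across languages, can hurt zero-shot decoding.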
Optimum cost design of frames using genetic algorithms
The optimum cost of reinforced concrete plane and space frames has been found using the Genetic Algorithm (GA) method. The design procedure is subject to many constraints controlling the designed sections (beams and columns), based on the standard specifications of the American Concrete Institute ACI Code 2011. The design variables comprise the dimensions of the designed sections, the reinforcing steel, and the topology through the section, obtained from a predetermined database containing all the singly reinforced design sections for beams and for columns subjected to axial load and uniaxial or biaxial moments. The optimum beam sections designed by the GA have been unified through MATLAB to satisfy the axial, flexural, shear, and torsion requirements of the design code. The frames' cost function contains the cost of concrete and reinforcing steel, in addition to the cost of the frames' formwork. The results show that limiting the dimensions of the frame's beams to those of the frame's columns increases the optimum cost of the structure by 2%, while avoiding re-analysis of the optimum designed structures through the GA.
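To illustrate the GA loop described above, the following is a minimal, hypothetical version in which chromosomes index into a section database and a penalty term enforces a stand-in strength constraint. The section costs, capacities, and member demands are placeholders, not ACI-based designs, and the actual work uses MATLAB rather than Python.

```python
# Hypothetical sketch: GA selection of frame-member sections from a
# predesigned database to minimize cost under a toy strength constraint.
import random

sections = [  # (cost per metre, capacity) -- placeholder database entries
    (55.0, 120.0), (68.0, 160.0), (82.0, 210.0), (97.0, 260.0), (120.0, 330.0),
]
demands = [150.0, 90.0, 240.0]  # required capacity of each frame member

def fitness(chromosome):
    cost = sum(sections[g][0] for g in chromosome)
    # Penalize capacity shortfalls heavily so infeasible designs lose out.
    penalty = sum(max(0.0, d - sections[g][1]) for g, d in zip(chromosome, demands))
    return cost + 1000.0 * penalty

def evolve(pop_size=30, generations=100, mut_rate=0.1):
    pop = [[random.randrange(len(sections)) for _ in demands] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                  # elitist selection: keep best half
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(demands))
            child = a[:cut] + b[cut:]          # one-point crossover
            if random.random() < mut_rate:     # mutation: reassign one gene
                child[random.randrange(len(demands))] = random.randrange(len(sections))
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

The real design problem works the same way at a larger scale: the "capacity" check is replaced by the full set of ACI axial, flexural, shear, and torsion checks on each candidate section.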
Discovering phonetic inventories with crosslingual automatic speech recognition
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper, we (1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; (2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and (3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages and several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well-recognized cross-linguistically. Through a detailed analysis of results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
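As a rough illustration of unsupervised inventory creation, the sketch below thresholds the relative frequency of IPA tokens decoded by a crosslingual recognizer over untranscribed audio. The threshold value, the helper name, and the toy decodings are assumptions for illustration, not the specific methods evaluated in the paper.

```python
# Hypothetical sketch: building a candidate phone inventory for an unseen
# language by thresholding IPA-token frequencies in crosslingual ASR output.
from collections import Counter

def build_inventory(decoded_utterances, min_rel_freq=0.005):
    """decoded_utterances: list of IPA-token lists from a crosslingual ASR.
    Keep tokens frequent enough to look like real phones of the language
    rather than sporadic recognition errors."""
    counts = Counter(tok for utt in decoded_utterances for tok in utt)
    total = sum(counts.values())
    return sorted(tok for tok, c in counts.items() if c / total >= min_rel_freq)

decoded = [['a', 'm', 'a'], ['t', 'a', 'ʃ'], ['a', 'm', 'i'], ['ʃ', 'i']]
print(build_inventory(decoded))  # every token here clears the toy threshold
```

The analysis above suggests where such a frequency-based approach breaks down: unique sounds absent from the training languages, confusable similar sounds, and tone distinctions are unlikely to surface reliably in the decoded token stream.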