13 research outputs found
Speech Enhancement Using HMM and SNMF
Speech enhancement is the process of improving a speech signal's quality by reducing the noise it contains. It draws on a range of techniques for removing noise and repairing damaged signal segments. In this paper, we propose a new speech enhancement model that combines several speech processing techniques. The proposed model couples supervised sparse non-negative matrix factorization (S-SNMF) with a hidden Markov model (HMM) and a noise-reducing filter: the HMM detects weak points in the signal, and the model reduces missing values and enhances the signal at those points. Experimental results demonstrate the efficiency of the proposed model over the existing model, with improvements of nearly 50% in metrics such as peak signal-to-noise ratio (PSNR), mean squared error (MSE), and signal-to-noise ratio (SNR).
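The abstract evaluates enhancement quality with PSNR, MSE, and SNR. As a minimal sketch of how these three metrics relate (the definitions below are standard textbook ones, not taken from the paper; the signals are synthetic):

```python
import numpy as np

def mse(clean, enhanced):
    """Mean squared error between the clean reference and the enhanced signal."""
    return float(np.mean((clean - enhanced) ** 2))

def snr_db(clean, enhanced):
    """SNR in dB, treating the residual (clean - enhanced) as noise."""
    return float(10 * np.log10(np.mean(clean ** 2) / mse(clean, enhanced)))

def psnr_db(clean, enhanced):
    """PSNR in dB, using the clean signal's peak amplitude as the reference."""
    peak = np.max(np.abs(clean))
    return float(10 * np.log10(peak ** 2 / mse(clean, enhanced)))

# Toy example: a 440 Hz tone as the "clean" signal, plus a small residual error.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
enhanced = clean + 0.01 * rng.standard_normal(clean.shape)
```

With this residual level the SNR comes out near 37 dB and the PSNR slightly higher, since the tone's peak power exceeds its mean power.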
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016
The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16, the result of collaboration and active exchange of information among researchers from sixteen institutes and universities across four continents. The joint submission and several of its 32 sub-systems were among the top-performing systems. Much effort was devoted to two major challenges, namely unlabeled training data and the dataset shift from Switchboard-Mixer to the new Call My Net dataset. This paper summarizes the lessons learned and presents the sixteen research groups' shared view of the recent advances, major paradigm shifts, and common tool chains in speaker recognition witnessed in SRE'16. More importantly, we look into the intriguing question of fusing a large ensemble of sub-systems and the potential benefit of large-scale collaboration.
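Fusing an ensemble of sub-systems typically starts from per-trial score normalization followed by weighted combination. The sketch below is a generic linear score fusion, not the I4U recipe; the scores, weights, and trial layout are invented for illustration:

```python
import numpy as np

def fuse_scores(score_matrix, weights=None):
    """Linearly fuse sub-system scores (rows = trials, cols = sub-systems).
    Each sub-system's scores are z-normalized first so that systems with
    different score ranges contribute comparably; equal weights by default."""
    S = np.asarray(score_matrix, dtype=float)
    mu, sd = S.mean(axis=0), S.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Z = (S - mu) / sd
    if weights is None:
        weights = np.full(S.shape[1], 1.0 / S.shape[1])
    return Z @ weights

# Hypothetical scores from two sub-systems on four trials
# (first two rows are target trials, last two non-target).
scores = np.array([[2.0, 10.0],
                   [1.8,  9.0],
                   [-1.0, 1.0],
                   [-1.2, 0.5]])
fused = fuse_scores(scores)
```

In practice the weights would be trained (e.g. by logistic regression on a development set) rather than fixed to be equal.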
NEGATIVE CLASSIFICATION DIGITAL INFORMATION OF VOICE TONE
In a technical sense, human-level recognition of a person by voice tone does not yet exist. Voice recognition and speech recognition are concepts that are entirely different in content: the purpose of speech recognition is to recognize the speaker's words, whereas voice recognition identifies a known or unknown person. A person's voice tone [1] is a key pattern in the speaker's voice. It can be viewed as the subjective sensation of the loudness, pitch, and timbre of a sound and of its perception in space. The aim of this research is to find, in a speaker's audio recording, characteristic features from which the digital information of voice tone can be determined. Using the digital information of voice tone together with information and communication technology [2], we can recognize a person's gender, a target speaker, or a group of speakers. The intention is to use the hitherto unexploited Walsh-Hadamard coefficients to determine these characteristic features. In the preprocessing stage, we extract the characteristic features for the digital information of voice tone: cepstral analysis determines the fundamental frequency F0 of each recording, from which the corresponding harmonics are computed. The resulting samples are classified by gender. In the next step, Walsh-Hadamard coefficients are extracted from the recordings and classified by gender, and the results of the two classifications are compared. In practice, the model can be used to recognize a predefined class of speakers, or to quickly eliminate a target group of data.
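The abstract's preprocessing combines two transforms: cepstral F0 estimation and the Walsh-Hadamard transform. The sketch below shows minimal textbook versions of both (frame length, sampling rate, and the synthetic harmonic signal are assumptions, not details from the paper):

```python
import numpy as np

def f0_cepstrum(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 of one frame by cepstral analysis: for voiced speech the
    real cepstrum peaks at the quefrency 1/F0 (in samples, fs/F0)."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # plausible pitch range
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / peak

def fwht(x):
    """Fast Walsh-Hadamard transform (input length must be a power of two)."""
    a = np.array(x, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            u, v = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = u + v, u - v
        h *= 2
    return a

# Synthetic voiced frame: 200 Hz fundamental with decaying harmonics.
fs = 8000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))
f0 = f0_cepstrum(frame, fs)
```

Here `f0` lands near 200 Hz; the harmonics mentioned in the abstract would then be taken at integer multiples of this estimate.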
Pattern mining approaches used in sensor-based biometric recognition: a review
Sensing technologies have generated significant interest in the use of biometrics for recognizing and assessing individuals. Pattern mining techniques constitute a critical step in the development of sensor-based biometric systems capable of perceiving, recognizing, and computing on sensor data: they search for high-level pattern information in low-level sensor readings in order to construct an artificial substitute for human recognition. Designing a successful sensor-based biometric recognition system requires attention to the different stages involved in processing variable data: acquisition of biometric data from a sensor, data pre-processing, feature extraction, recognition and/or classification, clustering, and validation. A significant number of approaches from image processing, pattern recognition, and machine learning have been used to process sensor data. This paper aims to deliver a state-of-the-art summary and present strategies for applying the most widely used pattern mining methods, in order to identify the challenges as well as future research directions of sensor-based biometric systems.
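The processing stages listed above (acquisition, pre-processing, feature extraction, classification) can be sketched as a toy pipeline. Everything below is illustrative: the feature set, the synthetic "sensor" signals, and the nearest-centroid recognizer are simple stand-ins, not methods from the review:

```python
import numpy as np

def preprocess(x):
    """Zero-mean, unit-variance normalization of one raw sensor reading."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)

def extract_features(x):
    """Basic statistical features often used as a pattern-mining baseline:
    mean, spread, average sample-to-sample change, and dynamic range."""
    return np.array([x.mean(), x.std(), np.abs(np.diff(x)).mean(), x.max() - x.min()])

class NearestCentroid:
    """Minimal template-matching recognizer: one centroid per enrolled subject."""
    def fit(self, X, y):
        self.centroids_ = {c: np.mean([x for x, t in zip(X, y) if t == c], axis=0)
                           for c in sorted(set(y))}
        return self
    def predict(self, X):
        return [min(self.centroids_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))
                for x in X]

# Two hypothetical "subjects": one smooth periodic signal, one erratic one.
rng = np.random.default_rng(1)
smooth = [np.sin(np.linspace(0, 10, 200)) + 0.1 * rng.standard_normal(200) for _ in range(5)]
noisy = [rng.standard_normal(200) for _ in range(5)]
X = [extract_features(preprocess(s)) for s in smooth + noisy]
y = ["subject_A"] * 5 + ["subject_B"] * 5
model = NearestCentroid().fit(X, y)
pred = model.predict(X)
```

A real system would add the clustering and validation stages the review lists, and far richer features.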
Speech Analysis by Natural Language Processing Techniques: A Possible Tool for Very Early Detection of Cognitive Decline?
Background: The discovery of early, non-invasive biomarkers for the identification of "preclinical" or "pre-symptomatic" Alzheimer's disease and other dementias is a key issue in the field, especially for research purposes, the design of preventive clinical trials, and the drafting of population-based health care policies. Complex behaviors are natural candidates for such biomarkers. In particular, recent studies have suggested that speech alterations might be among the earliest signs of cognitive decline, frequently noticeable years before other cognitive deficits become apparent. Traditional neuropsychological language tests provide ambiguous results in this context. In contrast, the analysis of spoken language productions by Natural Language Processing (NLP) techniques can pinpoint language modifications in potential patients. This interdisciplinary study aimed to use NLP to identify early linguistic signs of cognitive decline in a population of elderly individuals.
Methods: We enrolled 96 participants (age range 50–75): 48 healthy controls (CG) and 48 cognitively impaired participants, of whom 16 had single-domain amnestic Mild Cognitive Impairment (aMCI), 16 multiple-domain MCI (mdMCI), and 16 early Dementia (eD). Each subject underwent a brief neuropsychological screening composed of the MMSE, MoCA, GPCog, CDT, and verbal fluency tests (phonemic and semantic). Spontaneous speech during three tasks (describing a complex picture, describing a typical working day, and recalling the last remembered dream) was then recorded, transcribed, and annotated at various linguistic levels. A multidimensional parameter computation was performed by quantitative analysis of the spoken texts, computing rhythmic, acoustic, lexical, morpho-syntactic, and syntactic features.
Results: Neuropsychological tests showed significant differences between controls and mdMCI participants, and between controls and eD participants; GPCog, MoCA, and phonemic and semantic fluency also discriminated between controls and aMCI. In the linguistic experiments, a number of lexical, acoustic, and syntactic features were significant in differentiating between mdMCI, eD, and CG (non-parametric statistical analysis). Some features, mainly in the acoustic domain, also discriminated between CG and aMCI.
Conclusions: Linguistic features of spontaneous speech transcribed and analyzed by NLP techniques show significant differences between controls and pathological states (not only eD but also MCI) and seem to be a promising approach for the identification of preclinical stages of dementia. Long-duration follow-up studies are needed to confirm this assumption.
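Lexical features of the kind this study computes from transcripts can be illustrated with a tiny function. The specific feature set below (type-token ratio, mean sentence length, mean word length) is a common baseline chosen for illustration, not the paper's actual feature inventory:

```python
import re

def lexical_features(text):
    """A few lexical measures computable from a speech transcript."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    types = set(tokens)
    return {
        "n_tokens": len(tokens),
        # Type-token ratio: vocabulary richness, lower in repetitive speech.
        "type_token_ratio": len(types) / max(len(tokens), 1),
        "mean_sentence_len": len(tokens) / max(len(sentences), 1),
        "mean_word_len": sum(map(len, tokens)) / max(len(tokens), 1),
    }

feats = lexical_features("The cat sat. The cat sat again.")
```

The study's full analysis additionally covers rhythmic, acoustic, morpho-syntactic, and syntactic dimensions, which require audio and parsed transcripts rather than raw text.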
A Study of Voice Activity Detection Techniques for NIST Speaker Recognition Evaluations
Since 2008, interview-style speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has a lower signal-to-noise ratio, which necessitates robust voice activity detectors (VADs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/non-speech segmentation on these files. To overcome these difficulties, this paper proposes using speech enhancement techniques as a preprocessing step for enhancing the reliability of energy-based and statistical-model-based VADs. A decision strategy is also proposed to overcome the undesirable effects caused by impulsive signals and sinusoidal background signals. The proposed VAD is compared with the ASR transcripts provided by NIST, the VAD in the ETSI-AMR Option 2 coder, a statistical-model (SM) based VAD, and a Gaussian mixture model (GMM) based VAD. Experimental results on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms these conventional ones whenever interview-style speech is involved. This study also demonstrates that (1) noise reduction is vital for energy-based VAD under low SNR; (2) the ASR transcripts and the ETSI-AMR speech coder do not produce accurate speech and non-speech segmentations; and (3) spectral subtraction makes better use of background spectra than the likelihood-ratio tests in the SM-based VAD.
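An energy-based VAD of the kind the paper builds on can be sketched in a few lines. This is a generic baseline, not the paper's proposed detector: the frame length, the 10 dB margin, and the assumption that the opening frames contain only noise are all illustrative choices:

```python
import numpy as np

def energy_vad(signal, fs, frame_ms=20, threshold_db=10.0, noise_frames=5):
    """Mark frames whose energy exceeds an estimated noise floor by
    `threshold_db` dB as speech. The floor is taken from the first few
    frames, assumed to be non-speech."""
    n = int(fs * frame_ms / 1000)
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.mean(energy_db[:noise_frames])
    return energy_db > noise_floor + threshold_db

# Synthetic test: 0.2 s noise, 0.2 s of a loud tone ("speech"), 0.2 s noise.
rng = np.random.default_rng(0)
fs = 8000
noise = 0.01 * rng.standard_normal(fs // 5)
tone = 0.5 * np.sin(2 * np.pi * 300 * np.arange(fs // 5) / fs)
sig = np.concatenate([noise, tone + 0.01 * rng.standard_normal(fs // 5), noise])
vad = energy_vad(sig, fs)
```

The paper's point (1) is visible in this design's weakness: as the SNR drops, the speech frames sink toward the noise floor, which is why noise reduction before thresholding becomes essential.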