13 research outputs found
Speech Enhancement Using HMM and SNMF
Speech enhancement is the process of improving a speech signal's quality by reducing the noise it contains. It draws on a range of techniques for removing noise and repairing damaged signal segments. In this paper, we propose a new speech enhancement model that combines several speech processing techniques. The proposed model couples supervised sparse non-negative matrix factorization (S-SNMF) with a hidden Markov model (HMM) and a noise-reducing filter: the HMM detects weak points in the signal, and the model reduces missing values and enhances the signal at those points. Experimental results demonstrate the efficiency of the proposed model over the existing model, with improvements of nearly 50% in metrics such as peak signal-to-noise ratio (PSNR), mean squared error (MSE), and signal-to-noise ratio (SNR).
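The abstract evaluates enhancement quality with PSNR, MSE, and SNR. As a minimal sketch of how these three metrics relate (the definitions below are standard textbook ones, not taken from the paper; the signals are synthetic):

```python
import numpy as np

def mse(clean, enhanced):
    """Mean squared error between the clean reference and the enhanced signal."""
    return float(np.mean((clean - enhanced) ** 2))

def snr_db(clean, enhanced):
    """SNR in dB, treating the residual (clean - enhanced) as noise."""
    return float(10 * np.log10(np.mean(clean ** 2) / mse(clean, enhanced)))

def psnr_db(clean, enhanced):
    """PSNR in dB, using the clean signal's peak amplitude as the reference."""
    peak = np.max(np.abs(clean))
    return float(10 * np.log10(peak ** 2 / mse(clean, enhanced)))

# Toy example: a 440 Hz tone as the "clean" signal, plus a small residual error.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
enhanced = clean + 0.01 * rng.standard_normal(clean.shape)
```

With this residual level the SNR comes out near 37 dB and the PSNR slightly higher, since the tone's peak power exceeds its mean power.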
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016
The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16, the result of collaboration and active exchange of information among researchers from sixteen institutes and universities across four continents. The joint submission and several of its 32 sub-systems were among the top-performing systems. Much effort was devoted to two major challenges, namely unlabeled training data and the dataset shift from Switchboard-Mixer to the new Call My Net dataset. This paper summarizes the lessons learned and presents the sixteen research groups' shared view of the recent advances, major paradigm shifts, and common tool chains in speaker recognition witnessed in SRE'16. More importantly, we look into the intriguing question of fusing a large ensemble of sub-systems and the potential benefit of large-scale collaboration.
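Fusing an ensemble of sub-systems typically starts from per-trial score normalization followed by weighted combination. The sketch below is a generic linear score fusion, not the I4U recipe; the scores, weights, and trial layout are invented for illustration:

```python
import numpy as np

def fuse_scores(score_matrix, weights=None):
    """Linearly fuse sub-system scores (rows = trials, cols = sub-systems).
    Each sub-system's scores are z-normalized first so that systems with
    different score ranges contribute comparably; equal weights by default."""
    S = np.asarray(score_matrix, dtype=float)
    mu, sd = S.mean(axis=0), S.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Z = (S - mu) / sd
    if weights is None:
        weights = np.full(S.shape[1], 1.0 / S.shape[1])
    return Z @ weights

# Hypothetical scores from two sub-systems on four trials
# (first two rows are target trials, last two non-target).
scores = np.array([[2.0, 10.0],
                   [1.8,  9.0],
                   [-1.0, 1.0],
                   [-1.2, 0.5]])
fused = fuse_scores(scores)
```

In practice the weights would be trained (e.g. by logistic regression on a development set) rather than fixed to be equal.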
NEGATIVE CLASSIFICATION DIGITAL INFORMATION OF VOICE TONE
In a technical sense, human-level recognition of a person by voice tone does not yet exist. Voice recognition and speech recognition are concepts that are entirely different in content: the purpose of speech recognition is to recognize the speaker's words, whereas voice recognition identifies a known or unknown person. A person's voice tone [1] is a key pattern in the speaker's voice. It can be viewed as the subjective sensation of the loudness, pitch, and timbre of a sound and of its perception in space. The aim of this research is to find, in a speaker's audio recording, characteristic features from which the digital information of voice tone can be determined. Using the digital information of voice tone together with information and communication technology [2], we can recognize a person's gender, a target speaker, or a group of speakers. The intention is to use the hitherto unexploited Walsh-Hadamard coefficients to determine these characteristic features. In the preprocessing stage, we extract the characteristic features for the digital information of voice tone: cepstral analysis determines the fundamental frequency F0 of each recording, from which the corresponding harmonics are computed. The resulting samples are classified by gender. In the next step, Walsh-Hadamard coefficients are extracted from the recordings and classified by gender, and the results of the two classifications are compared. In practice, the model can be used to recognize a predefined class of speakers, or to quickly eliminate a target group of data.
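The abstract's preprocessing combines two transforms: cepstral F0 estimation and the Walsh-Hadamard transform. The sketch below shows minimal textbook versions of both (frame length, sampling rate, and the synthetic harmonic signal are assumptions, not details from the paper):

```python
import numpy as np

def f0_cepstrum(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 of one frame by cepstral analysis: for voiced speech the
    real cepstrum peaks at the quefrency 1/F0 (in samples, fs/F0)."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    qmin, qmax = int(fs / fmax), int(fs / fmin)   # plausible pitch range
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / peak

def fwht(x):
    """Fast Walsh-Hadamard transform (input length must be a power of two)."""
    a = np.array(x, dtype=float)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            u, v = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = u + v, u - v
        h *= 2
    return a

# Synthetic voiced frame: 200 Hz fundamental with decaying harmonics.
fs = 8000
t = np.arange(1024) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5))
f0 = f0_cepstrum(frame, fs)
```

Here `f0` lands near 200 Hz; the harmonics mentioned in the abstract would then be taken at integer multiples of this estimate.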
Pattern mining approaches used in sensor-based biometric recognition: a review
Sensing technologies have generated significant interest in the use of biometrics for recognizing and assessing individuals. Pattern mining techniques constitute a critical step in the development of sensor-based biometric systems capable of perceiving, recognizing, and computing on sensor data: they search for high-level pattern information in low-level sensor readings in order to construct an artificial substitute for human recognition. Designing a successful sensor-based biometric recognition system requires attention to the different stages involved in processing variable data: acquisition of biometric data from a sensor, data pre-processing, feature extraction, recognition and/or classification, clustering, and validation. A significant number of approaches from image processing, pattern recognition, and machine learning have been used to process sensor data. This paper aims to deliver a state-of-the-art summary and present strategies for applying the most widely used pattern mining methods, in order to identify the challenges as well as future research directions of sensor-based biometric systems.
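The processing stages listed above (acquisition, pre-processing, feature extraction, classification) can be sketched as a toy pipeline. Everything below is illustrative: the feature set, the synthetic "sensor" signals, and the nearest-centroid recognizer are simple stand-ins, not methods from the review:

```python
import numpy as np

def preprocess(x):
    """Zero-mean, unit-variance normalization of one raw sensor reading."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-12)

def extract_features(x):
    """Basic statistical features often used as a pattern-mining baseline:
    mean, spread, average sample-to-sample change, and dynamic range."""
    return np.array([x.mean(), x.std(), np.abs(np.diff(x)).mean(), x.max() - x.min()])

class NearestCentroid:
    """Minimal template-matching recognizer: one centroid per enrolled subject."""
    def fit(self, X, y):
        self.centroids_ = {c: np.mean([x for x, t in zip(X, y) if t == c], axis=0)
                           for c in sorted(set(y))}
        return self
    def predict(self, X):
        return [min(self.centroids_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))
                for x in X]

# Two hypothetical "subjects": one smooth periodic signal, one erratic one.
rng = np.random.default_rng(1)
smooth = [np.sin(np.linspace(0, 10, 200)) + 0.1 * rng.standard_normal(200) for _ in range(5)]
noisy = [rng.standard_normal(200) for _ in range(5)]
X = [extract_features(preprocess(s)) for s in smooth + noisy]
y = ["subject_A"] * 5 + ["subject_B"] * 5
model = NearestCentroid().fit(X, y)
pred = model.predict(X)
```

A real system would add the clustering and validation stages the review lists, and far richer features.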
Speech Analysis by Natural Language Processing Techniques: A Possible Tool for Very Early Detection of Cognitive Decline?
Background: The discovery of early, non-invasive biomarkers for the identification of "preclinical" or "pre-symptomatic" Alzheimer's disease and other dementias is a key issue in the field, especially for research purposes, the design of preventive clinical trials, and the drafting of population-based health care policies. Complex behaviors are natural candidates for such biomarkers. In particular, recent studies have suggested that speech alterations might be among the earliest signs of cognitive decline, frequently noticeable years before other cognitive deficits become apparent. Traditional neuropsychological language tests provide ambiguous results in this context. In contrast, the analysis of spoken language productions by Natural Language Processing (NLP) techniques can pinpoint language modifications in potential patients. This interdisciplinary study aimed to use NLP to identify early linguistic signs of cognitive decline in a population of elderly individuals.
Methods: We enrolled 96 participants (age range 50–75): 48 healthy controls (CG) and 48 cognitively impaired participants, of whom 16 had single-domain amnestic Mild Cognitive Impairment (aMCI), 16 multiple-domain MCI (mdMCI), and 16 early Dementia (eD). Each subject underwent a brief neuropsychological screening composed of the MMSE, MoCA, GPCog, CDT, and verbal fluency tests (phonemic and semantic). Spontaneous speech during three tasks (describing a complex picture, describing a typical working day, and recalling the last remembered dream) was then recorded, transcribed, and annotated at various linguistic levels. A multidimensional parameter computation was performed by quantitative analysis of the spoken texts, computing rhythmic, acoustic, lexical, morpho-syntactic, and syntactic features.
Results: Neuropsychological tests showed significant differences between controls and mdMCI participants, and between controls and eD participants; GPCog, MoCA, and phonemic and semantic fluency also discriminated between controls and aMCI. In the linguistic experiments, a number of lexical, acoustic, and syntactic features were significant in differentiating between mdMCI, eD, and CG (non-parametric statistical analysis). Some features, mainly in the acoustic domain, also discriminated between CG and aMCI.
Conclusions: Linguistic features of spontaneous speech transcribed and analyzed by NLP techniques show significant differences between controls and pathological states (not only eD but also MCI) and seem to be a promising approach for the identification of preclinical stages of dementia. Long-duration follow-up studies are needed to confirm this assumption.
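Lexical features of the kind this study computes from transcripts can be illustrated with a tiny function. The specific feature set below (type-token ratio, mean sentence length, mean word length) is a common baseline chosen for illustration, not the paper's actual feature inventory:

```python
import re

def lexical_features(text):
    """A few lexical measures computable from a speech transcript."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    types = set(tokens)
    return {
        "n_tokens": len(tokens),
        # Type-token ratio: vocabulary richness, lower in repetitive speech.
        "type_token_ratio": len(types) / max(len(tokens), 1),
        "mean_sentence_len": len(tokens) / max(len(sentences), 1),
        "mean_word_len": sum(map(len, tokens)) / max(len(tokens), 1),
    }

feats = lexical_features("The cat sat. The cat sat again.")
```

The study's full analysis additionally covers rhythmic, acoustic, morpho-syntactic, and syntactic dimensions, which require audio and parsed transcripts rather than raw text.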
A Study of Voice Activity Detection Techniques for NIST Speaker Recognition Evaluations
Since 2008, interview-style speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has a lower signal-to-noise ratio, which necessitates robust voice activity detectors (VADs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/non-speech segmentation on these files. To overcome these difficulties, this paper proposes using speech enhancement techniques as a preprocessing step for enhancing the reliability of energy-based and statistical-model-based VADs. A decision strategy is also proposed to overcome the undesirable effects caused by impulsive signals and sinusoidal background signals. The proposed VAD is compared with the ASR transcripts provided by NIST, the VAD in the ETSI-AMR Option 2 coder, a statistical-model (SM) based VAD, and a Gaussian mixture model (GMM) based VAD. Experimental results on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms these conventional ones whenever interview-style speech is involved. This study also demonstrates that (1) noise reduction is vital for energy-based VAD under low SNR; (2) the ASR transcripts and the ETSI-AMR speech coder do not produce accurate speech and non-speech segmentations; and (3) spectral subtraction makes better use of background spectra than the likelihood-ratio tests in the SM-based VAD.
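An energy-based VAD of the kind the paper builds on can be sketched in a few lines. This is a generic baseline, not the paper's proposed detector: the frame length, the 10 dB margin, and the assumption that the opening frames contain only noise are all illustrative choices:

```python
import numpy as np

def energy_vad(signal, fs, frame_ms=20, threshold_db=10.0, noise_frames=5):
    """Mark frames whose energy exceeds an estimated noise floor by
    `threshold_db` dB as speech. The floor is taken from the first few
    frames, assumed to be non-speech."""
    n = int(fs * frame_ms / 1000)
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.mean(energy_db[:noise_frames])
    return energy_db > noise_floor + threshold_db

# Synthetic test: 0.2 s noise, 0.2 s of a loud tone ("speech"), 0.2 s noise.
rng = np.random.default_rng(0)
fs = 8000
noise = 0.01 * rng.standard_normal(fs // 5)
tone = 0.5 * np.sin(2 * np.pi * 300 * np.arange(fs // 5) / fs)
sig = np.concatenate([noise, tone + 0.01 * rng.standard_normal(fs // 5), noise])
vad = energy_vad(sig, fs)
```

The paper's point (1) is visible in this design's weakness: as the SNR drops, the speech frames sink toward the noise floor, which is why noise reduction before thresholding becomes essential.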