13 research outputs found

    Speech Enhancement Using HMM and SNMF (OS)

    Speech enhancement is the process of improving a speech signal by reducing noise and raising the signal's quality. It requires a combination of techniques for removing signal noise and for repairing damaged signal patches. In this paper, we propose a new speech enhancement model that amalgamates several speech processing techniques: supervised sparse non-negative matrix factorization (S-SNMF), a hidden Markov model (HMM), and a noise-reducing filter. The model reduces missing values and enhances the signal at the weak points detected by the HMM. The experimental results demonstrate the efficiency of the proposed model in comparison with the existing model, with improvements of nearly 50% recorded in peak signal-to-noise ratio (PSNR), mean squared error (MSE), signal-to-noise ratio (SNR), and related measures.
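    The supervised-NMF part of the idea above can be sketched as follows: speech and noise dictionaries are learned offline, then only the activations are estimated on a noisy spectrogram and a Wiener-like mask is applied. This is a minimal illustration of the general S-SNMF idea under our own assumptions, not the authors' implementation; all function names are ours, and the sparsity penalty and HMM components are omitted.

```python
import numpy as np

def nmf(V, rank, iters=300, seed=0):
    """Basic multiplicative-update NMF: V ~ W @ H, all factors non-negative.
    Used offline to learn a dictionary W from clean speech (or noise) spectra."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def enhance(noisy_spec, W_speech, W_noise, iters=300, seed=0):
    """Fix the concatenated dictionaries, estimate activations on the noisy
    spectrogram, then apply a Wiener-like soft mask to keep the speech part."""
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], noisy_spec.shape[1])) + 1e-3
    for _ in range(iters):  # only H is updated: the dictionaries stay fixed
        H *= (W.T @ noisy_spec) / (W.T @ W @ H + 1e-9)
    k = W_speech.shape[1]
    speech_est = W_speech @ H[:k]
    noise_est = W_noise @ H[k:]
    mask = speech_est / (speech_est + noise_est + 1e-9)
    return mask * noisy_spec
```

    In use, `nmf` would be run once on clean-speech and noise-only training spectrograms to obtain `W_speech` and `W_noise`; at test time only `enhance` runs.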

    The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016

    The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16 resulting from collaboration and active exchange of information among researchers from sixteen institutes and universities across four continents. The joint submission and several of its 32 sub-systems were among the top-performing systems. Considerable effort was devoted to two major challenges, namely unlabeled training data and the dataset shift from Switchboard-Mixer to the new Call My Net dataset. This paper summarizes the lessons learned and presents the sixteen research groups' shared view of the recent advances, major paradigm shifts, and common tool chains in speaker recognition witnessed in SRE'16. More importantly, we look into the intriguing question of fusing a large ensemble of sub-systems and the potential benefit of large-scale collaboration. Peer reviewed.
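    At its simplest, fusing an ensemble of sub-systems is a weighted linear combination of per-system scores after per-system normalisation. The sketch below illustrates only that basic idea; the actual I4U fusion used trained weights (e.g. logistic regression over development data), and all names here are illustrative.

```python
import numpy as np

def znorm(scores):
    """Zero-mean, unit-variance normalisation of one system's score vector,
    so systems with different score scales can be combined fairly."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / (scores.std() + 1e-12)

def fuse(system_scores, weights=None):
    """Weighted linear fusion of z-normalised per-system scores.
    `system_scores` is a list of equal-length score vectors, one per system."""
    normed = np.stack([znorm(s) for s in system_scores])
    if weights is None:  # default: equal weights
        weights = np.full(len(normed), 1.0 / len(normed))
    return np.asarray(weights) @ normed
```

    With independent per-system errors, averaging normalised scores reduces score noise relative to any single system, which is one intuition behind large-ensemble fusion.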

    Negative Classification of Digital Information of Voice Tone

    In a technical sense, human-level recognition of a person by voice tone does not yet exist. Voice recognition and speech recognition are concepts with entirely different content: the purpose of speech recognition is to recognize the speaker's words, while voice recognition identifies a known or unknown person. The voice tone of a person [1] is a key pattern in the speaker's voice. It can be seen as a subjective sense of the loudness, pitch, and color of the sound and its perception in space. The aim of this research is to find, in a speaker's sound recording, characteristic features from which the digital information of voice tone can be determined. Using the digital information of voice tone together with information and communication technology [2], we recognize a person's gender, a target speaker, or a group of speakers. The intention is to use the so-far unexploited Walsh-Hadamard coefficients to determine these characteristic features. In the preprocessing stage, we extract characteristic features for the digital information of voice tone: cepstral analysis determines the fundamental frequency F0 for each sound recording, from which the corresponding harmonics are computed. The obtained samples are classified by gender. In the next step, the Walsh-Hadamard coefficients are extracted from the sound recordings and classified by gender, and the classification results are compared. In practice, the model can be used to recognize a predefined class of speakers, or to quickly eliminate a target group of data.
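    The cepstral F0 step described above can be sketched as follows: take the inverse FFT of the log-magnitude spectrum of a frame and pick the peak quefrency inside a plausible pitch range. This is a minimal illustration, not the paper's implementation; the function name, window, and constants are ours.

```python
import numpy as np

def cepstral_f0(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate F0 from the peak of the real cepstrum.
    The cepstrum of a voiced frame peaks at the pitch period (in samples),
    so F0 = fs / peak_quefrency."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-6)
    cepstrum = np.fft.irfft(log_mag)
    q_lo = int(fs / fmax)  # shortest plausible pitch period, in samples
    q_hi = int(fs / fmin)  # longest plausible pitch period, in samples
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return fs / peak
```

    On a strongly harmonic frame (e.g. a 200 Hz tone with several harmonics at 16 kHz sampling), the cepstral peak falls at the 80-sample pitch period.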

    Speech Analysis by Natural Language Processing Techniques: A Possible Tool for Very Early Detection of Cognitive Decline?

    Background: The discovery of early, non-invasive biomarkers for the identification of “preclinical” or “pre-symptomatic” Alzheimer's disease and other dementias is a key issue in the field, especially for research purposes, the design of preventive clinical trials, and drafting population-based health care policies. Complex behaviors are natural candidates for this. In particular, recent studies have suggested that speech alterations might be one of the earliest signs of cognitive decline, frequently noticeable years before other cognitive deficits become apparent. Traditional neuropsychological language tests provide ambiguous results in this context. In contrast, the analysis of spoken language productions by Natural Language Processing (NLP) techniques can pinpoint language modifications in potential patients. This interdisciplinary study aimed at using NLP to identify early linguistic signs of cognitive decline in a population of elderly individuals.

    Methods: We enrolled 96 participants (age range 50–75): 48 healthy controls (CG) and 48 cognitively impaired participants: 16 with single-domain amnestic Mild Cognitive Impairment (aMCI), 16 with multiple-domain MCI (mdMCI), and 16 with early Dementia (eD). Each subject underwent a brief neuropsychological screening composed of the MMSE, MoCA, GPCog, CDT, and verbal fluency tests (phonemic and semantic). Spontaneous speech during three tasks (describing a complex picture, a typical working day, and the last remembered dream) was then recorded, transcribed, and annotated at various linguistic levels. A multidimensional parameter computation was performed by a quantitative analysis of the spoken texts, computing rhythmic, acoustic, lexical, morpho-syntactic, and syntactic features.

    Results: Neuropsychological tests showed significant differences between controls and mdMCI, and between controls and eD participants; GPCog, MoCA, PF, and SF also discriminated between controls and aMCI. In the linguistic experiments, a number of lexical, acoustic, and syntactic features were significant in differentiating between mdMCI, eD, and CG (non-parametric statistical analysis). Some features, mainly in the acoustic domain, also discriminated between CG and aMCI.

    Conclusions: Linguistic features of spontaneous speech transcribed and analyzed by NLP techniques show significant differences between controls and pathological states (not only eD but also MCI) and seem to be a promising approach for the identification of preclinical stages of dementia. Long-duration follow-up studies are needed to confirm this assumption.
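    As an illustration of the kind of analysis described (computing a lexical feature per transcript, then comparing groups with a non-parametric test), here is a minimal sketch using type-token ratio and a Mann-Whitney U test with a normal approximation. It is not the study's pipeline; the feature choice, function names, and example values are ours.

```python
import math
import numpy as np

def type_token_ratio(transcript):
    """Lexical richness: number of distinct words divided by total words."""
    words = transcript.lower().split()
    return len(set(words)) / max(len(words), 1)

def rank_sum_p(x, y):
    """Two-sided p-value for the Mann-Whitney U test, normal approximation
    (adequate for moderate sample sizes; no tie correction)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # U = number of (x_i, y_j) pairs with x_i > y_j, counting ties as 0.5
    u = (x[:, None] > y).sum() + 0.5 * (x[:, None] == y).sum()
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))
```

    A feature whose values barely overlap between two groups yields a small p-value; identical groups yield a p-value near 1.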

    A Study of Voice Activity Detection Techniques for NIST Speaker Recognition Evaluations

    Since 2008, interview-style speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has a lower signal-to-noise ratio, which necessitates robust voice activity detectors (VADs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/non-speech segmentation in these files. To overcome these difficulties, this paper proposes using speech enhancement techniques as a preprocessing step for enhancing the reliability of energy-based and statistical-model-based VADs. A decision strategy is also proposed to overcome the undesirable effects caused by impulsive signals and sinusoidal background signals. The proposed VAD is compared with the ASR transcripts provided by NIST, the VAD in the ETSI-AMR Option 2 coder, a statistical-model (SM) based VAD, and a Gaussian mixture model (GMM) based VAD. Experimental results based on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms these conventional ones whenever interview-style speech is involved. This study also demonstrates that (1) noise reduction is vital for energy-based VADs under low SNR; (2) the ASR transcripts and the ETSI-AMR speech coder do not produce accurate speech and non-speech segmentations; and (3) spectral subtraction makes better use of background spectra than the likelihood-ratio tests in the SM-based VAD.
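    The core idea of noise reduction as a preprocessing step for an energy-based VAD can be sketched as follows: estimate a noise magnitude spectrum from leading frames, spectrally subtract it, then threshold frame energies a margin above the noise floor. This is illustrative only, not the paper's system; all parameter choices and names are ours.

```python
import numpy as np

def spectral_subtraction(mags, noise_mag, floor=0.05):
    """Subtract a noise-magnitude estimate per frequency bin, flooring the
    result to a fraction of the original magnitude to avoid negatives."""
    return np.maximum(mags - noise_mag, floor * mags)

def energy_vad(signal, fs, frame_ms=25, hop_ms=10, margin_db=8.0, noise_frames=5):
    """Energy-based VAD with spectral subtraction as preprocessing.
    Assumes the leading `noise_frames` frames are noise-only."""
    flen = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n = 1 + (len(signal) - flen) // hop
    frames = np.stack([signal[i * hop: i * hop + flen] for i in range(n)])
    mags = np.abs(np.fft.rfft(frames * np.hanning(flen), axis=1))
    noise_mag = mags[:noise_frames].mean(axis=0)   # noise estimate
    clean = spectral_subtraction(mags, noise_mag)  # denoised magnitudes
    energy_db = 10.0 * np.log10((clean ** 2).sum(axis=1) + 1e-12)
    threshold = energy_db[:noise_frames].mean() + margin_db
    return energy_db > threshold  # True = speech-active frame
```

    Because the noise floor is subtracted before the energy is measured, the threshold margin can be kept small even at low SNR, which is the benefit the paper attributes to noise reduction for energy-based VADs.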