Perceptual adaptation by normally hearing listeners to a simulated "hole" in hearing
Simulations of cochlear implants have demonstrated that the deleterious effects of a frequency misalignment between analysis bands and characteristic frequencies at basally shifted simulated electrode locations are significantly reduced with training. However, a distortion of frequency-to-place mapping may also arise due to a region of dysfunctional neurons that creates a "hole" in the tonotopic representation. This study simulated a 10 mm hole in the mid-frequency region. Noise-band processors were created with six output bands (three apical and three basal to the hole). The spectral information that would have been represented in the hole was either dropped or reassigned to bands on either side. Such reassignment preserves information but warps the place code, which may in itself impair performance. Normally hearing subjects received three hours of training in two reassignment conditions. Speech recognition improved considerably with training. Scores were much lower in a baseline (untrained) condition where information from the hole region was dropped. A second group of subjects trained in this dropped condition did show some improvement; however, scores after training were significantly lower than in the reassignment conditions. These results are consistent with the view that speech processors should present the most informative frequency range irrespective of frequency misalignment. © 2006 Acoustical Society of America
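The drop-versus-reassign manipulation described above can be sketched as a simple band-routing rule. This is an illustrative reconstruction, not the study's processing code: the band center frequencies and hole edges below are hypothetical, and `route_bands` is a made-up helper name.

```python
# Illustrative sketch (not the study's code): route analysis bands around a
# simulated mid-frequency "hole". Bands inside the hole are either dropped
# or reassigned to the nearest surviving band, warping the place code.

def route_bands(centers, hole, mode="reassign"):
    """For each analysis-band center frequency (Hz), return the index of the
    surviving output band it feeds, or None if its information is dropped."""
    lo, hi = hole
    # output bands are the analysis bands that fall outside the hole
    keep = [c for c in centers if not (lo <= c <= hi)]
    routing = []
    for c in centers:
        if not (lo <= c <= hi):
            routing.append(keep.index(c))       # pass through unchanged
        elif mode == "drop":
            routing.append(None)                # information discarded
        else:
            # reassign: nearest surviving band on either side carries it
            routing.append(min(range(len(keep)), key=lambda k: abs(keep[k] - c)))
    return routing

# Hypothetical 8-band analysis with a 1-3 kHz dead region, leaving
# three bands apical and three basal to the hole, as in the study design.
centers = [250, 500, 875, 1500, 2500, 4000, 6000, 8000]
hole = (1000, 3000)
print(route_bands(centers, hole, "reassign"))  # [0, 1, 2, 2, 3, 3, 4, 5]
print(route_bands(centers, hole, "drop"))      # [0, 1, 2, None, None, 3, 4, 5]
```

The reassigned routing preserves the 1.5 and 2.5 kHz information at the hole's edges (indices 2 and 3), which is exactly the place-code warping the abstract notes may itself impair performance before training.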
Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
Generalized Perceptual Linear Prediction (gPLP) Features for Animal Vocalization Analysis
A new feature extraction model, generalized perceptual linear prediction (gPLP), is developed to calculate a set of perceptually relevant features for digital signal analysis of animal vocalizations. The gPLP model is a generalized adaptation of the perceptual linear prediction model, popular in human speech processing, which incorporates perceptual information such as frequency warping and equal loudness normalization into the feature extraction process. Since such perceptual information is available for a number of animal species, this new approach integrates that information into a generalized model to extract perceptually relevant features for a particular species. To illustrate, qualitative and quantitative comparisons are made between the species-specific model, generalized perceptual linear prediction (gPLP), and the original PLP model using a set of vocalizations collected from captive African elephants (Loxodonta africana) and wild beluga whales (Delphinapterus leucas). The models that incorporate perceptual information outperform the original human-based models in both visualization and classification tasks.
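One widely used source of species-specific frequency warping that a gPLP-style filterbank could build on is the Greenwood cochlear position-to-frequency map. The sketch below is illustrative, not the gPLP implementation: the constants shown are the standard human fit, other species would substitute their own fitted values, and `filterbank_centers` is a hypothetical helper.

```python
import numpy as np

def greenwood(x, A=165.4, a=2.1, k=0.88):
    """Greenwood position-to-frequency map (Hz).
    x is relative position along the basilar membrane, 0 (apex) to 1 (base).
    Defaults are the standard human constants; a species-specific model
    would substitute values fitted for that species."""
    return A * (10.0 ** (a * np.asarray(x)) - k)

def filterbank_centers(n_filters, A=165.4, a=2.1, k=0.88):
    """Center frequencies equally spaced along the cochlear place axis,
    i.e. warped according to the species' Greenwood map."""
    x = np.linspace(0.0, 1.0, n_filters)
    return greenwood(x, A, a, k)

print(filterbank_centers(8))  # near-linear spacing at low f, exponential at high f
```

Swapping in a species' own map constants is one simple way to realize the "frequency warping" step the abstract describes while leaving the rest of the PLP pipeline unchanged.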
Survey of Features Extraction and Classification Techniques for Speaker Identification
Speech processing techniques are gaining popularity day by day as a way of providing a great deal of security, and speech is commonly used for authentication. Speaker recognition is the method by which a speaker can be examined and identified; the scheme of speech recognition is distinct from the scheme of speaker recognition. Speaker recognition is commonly used in industry, hospitals, laboratories, and similar settings; its benefits are that it is safer, easier to implement, and more user-friendly. Speaker identification is one of the most commonly used techniques in areas where safety is crucial.
This article presents an overview of various methods that can be used in speaker recognition systems: feature extraction techniques such as Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC), Unique Mapped Real Transform (UMRT), Real Cepstral Coefficients (RCC), and Mel-Frequency Cepstral Coefficients (MFCC), as well as classification techniques such as the Gaussian Mixture Model (GMM), Dynamic Time Warping (DTW), Support Vector Machine (SVM), Neural Network (NN), and Vector Quantization (VQ). The primary purpose is to explain the common speaker recognition methods. The survey finds that MFCC is chosen for its high efficiency and low complexity, and that GMM is helpful for classification, requiring less memory and less training while giving efficient test results.
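As a concrete illustration of the feature the survey recommends, a minimal MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT-II) can be sketched as follows. This is a textbook-style sketch, not code from any surveyed system; all parameter defaults are typical values chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch. Parameter values are common defaults."""
    # 1. frame the signal and apply a Hamming window
    frames = [signal[s:s + n_fft] for s in range(0, len(signal) - n_fft + 1, hop)]
    frames = np.array(frames) * np.hamming(n_fft)
    # 2. power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. triangular filters equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # 4. log filterbank energies
    logmel = np.log(power @ fbank.T + 1e-10)
    # 5. DCT-II to decorrelate; keep the lowest n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_ceps)[None, :])
    return logmel @ dct
```

In a speaker identification system of the kind surveyed, each frame's cepstral vector would then be scored against per-speaker models such as GMMs or VQ codebooks.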
Text-Independent Automatic Speaker Identification Using Partitioned Neural Networks
This dissertation introduces a binary partitioned approach to statistical pattern classification, applied to talker identification using neural networks. In recent years, artificial neural networks have been shown to work exceptionally well for small but difficult pattern classification tasks. However, their application to large tasks (i.e., those with more than 10 to 20 categories) is limited by a dramatic increase in required training time. The time required to train a single network to perform N-way classification is nearly proportional to the exponential of N. In contrast, the binary partitioned approach requires training times on the order of N². Besides partitioning, other related issues were investigated, such as acoustic feature selection for speaker identification and neural network optimization.
The binary partitioned approach was used to develop an automatic speaker identification system for 120 male and 130 female speakers of a standard speech database. The system performs with 100% accuracy in a text-independent mode when trained with about nine to 14 seconds of speech and tested with six to eight seconds of speech.
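The pairwise idea behind a binary partitioned classifier can be illustrated with a toy sketch: train one small binary model per speaker pair, N(N-1)/2 of them in total (hence the roughly N² training cost), and identify a test vector by majority vote. The nearest-class-mean rule below is a stand-in for the dissertation's neural networks, and all names and data are hypothetical.

```python
import numpy as np
from itertools import combinations

def train_pairwise(X, y):
    """One tiny binary model per speaker pair: here, just the two class
    means (a stand-in for training a small binary neural network)."""
    means = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    return [(a, b, means[a], means[b]) for a, b in combinations(sorted(means), 2)]

def identify(model, x):
    """Each pairwise model votes for one of its two speakers;
    the speaker with the most votes wins."""
    votes = {}
    for a, b, ma, mb in model:
        winner = a if np.linalg.norm(x - ma) <= np.linalg.norm(x - mb) else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical data: 3 "speakers" as 2-D feature clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, (20, 2)) for c in ([0, 0], [5, 0], [0, 5])])
y = np.repeat([0, 1, 2], 20)
model = train_pairwise(X, y)
print(len(model))                            # 3 pairwise models for N = 3
print(identify(model, np.array([5.0, 0.1])))  # 1
```

Each binary model sees only two categories, so its size and training time stay flat as N grows; only the number of models grows quadratically, which is the trade the dissertation exploits against the near-exponential cost of one N-way network.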