    Speech Recognition on an FPGA Using Discrete and Continuous Hidden Markov Models

    Speech recognition is a computationally demanding task, particularly the stage that uses Viterbi decoding to convert pre-processed speech data into words or sub-word units. Any device that can reduce the load on, for example, a PC’s processor is therefore advantageous. We present two FPGA implementations of the decoder, based respectively on discrete and continuous hidden Markov models (HMMs) representing monophones. The discrete version can process speech nearly 5,000 times faster than real time using just 12% of the slices of a Xilinx Virtex XCV1000, but it has a lower recognition rate than the continuous implementation, which runs 75 times faster than real time and occupies 45% of the same device.
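    The decoding step described above can be sketched in software as a log-domain Viterbi search. The sketch below is a generic illustration, not the paper's FPGA design. The emission term is passed in as a function, so the same search serves both cases: a table lookup gives a discrete HMM, and a Gaussian log-density gives a continuous one. The helper names (`safe_log`, `discrete_emitter`, `gaussian_emitter`) and the toy model numbers are hypothetical. The continuous case assumes a single diagonal-covariance Gaussian per state rather than mixtures.

```python
import math

NEG_INF = -math.inf


def safe_log(p):
    """log(p), mapping zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_init, log_trans, log_emit, observations):
    """Log-domain Viterbi decoding.

    log_init[i]     : log P(state i at the first frame)
    log_trans[i][j] : log P(state j at t+1 | state i at t)
    log_emit(j, o)  : log b_j(o), the emission log-probability of o in state j
    Returns (best path log-probability, best state sequence).
    """
    n = len(log_init)
    # delta[j] = best log score of any partial path ending in state j
    delta = [log_init[j] + log_emit(j, observations[0]) for j in range(n)]
    backpointers = []
    for o in observations[1:]:
        new_delta, bp = [], []
        for j in range(n):
            # Best predecessor i maximises delta[i] + log a_ij
            i_best = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[i_best] + log_trans[i_best][j] + log_emit(j, o))
            bp.append(i_best)
        delta = new_delta
        backpointers.append(bp)
    # Trace back from the best final state
    state = max(range(n), key=lambda j: delta[j])
    best = delta[state]
    path = [state]
    for bp in reversed(backpointers):
        state = bp[state]
        path.append(state)
    path.reverse()
    return best, path


def discrete_emitter(emit_table):
    """Discrete HMM: observations are codebook indices (e.g. from vector quantisation)."""
    log_table = [[safe_log(p) for p in row] for row in emit_table]
    return lambda j, k: log_table[j][k]


def gaussian_emitter(means, variances):
    """Continuous HMM: one diagonal-covariance Gaussian per state (no mixtures)."""
    def log_b(j, x):
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, means[j], variances[j]))
    return log_b


# Toy two-state left-to-right model (hypothetical numbers, for illustration only)
log_init = [safe_log(p) for p in (1.0, 0.0)]
log_trans = [[safe_log(p) for p in row] for row in ((0.6, 0.4), (0.0, 1.0))]
score, path = viterbi(log_init, log_trans,
                      discrete_emitter([[0.9, 0.1], [0.2, 0.8]]), [0, 0, 1, 1])
print(path)  # [0, 0, 1, 1]
```

    Working in the log domain turns products into sums and avoids underflow on long utterances. Hardware decoders commonly exploit the same property, since it replaces multipliers with adders.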

    THE EFFECTS OF WORD LENGTH, WORD FREQUENCY AND WORD REPETITION IN AUDITORY MEMORY (LEXICON, VERBAL ABILITY, PSYCHOLINGUISTICS)

    Four auditory lexical decision experiments were run to assess the effects of word frequency, word length, and word repetition. Experiment I examined the effects of word length, word frequency, and stimulus repetition on RT. The results showed significant main effects of length and frequency and a significant length-by-frequency interaction, with long words showing the greatest frequency effect. There was no significant repetition effect; in this experiment, repeated stimuli were separated by at least seven minutes. Experiment II examined the effect of stimulus repetition with repeated stimuli separated by 0-15 intervening stimuli. This experiment showed a significant repetition effect but no frequency effect. Experiment III treated word length as a continuous variable and word frequency as a dichotomized variable. Here, word length accounted for 10% of the variance. Word frequency accounted for 4.6% of the variance for polysyllables and 6% for monosyllables. Further, a frequency effect was found only for monosyllables under 500 msec long. Experiment IV treated both word length and word frequency as continuous variables. Word length accounted for 9% of the variance in this experiment, and word frequency accounted for 3.4% of the variance for polysyllables and 4.8% for monosyllables under 500 msec. Finally, Experiments I, II, and IV included subjects' verbal ability as a predictor of RT. In all three experiments, verbal ability was negatively correlated with false positive responses. In Experiment I only, subjects with high verbal ability scores responded faster than subjects with low verbal ability scores (mean difference = 158 msec).

    Performance with a new bone conduction implant audio processor in patients with single-sided deafness

    Purpose: The SAMBA 2 BB audio processor for the BONEBRIDGE bone conduction implant features a new automatic listening environment detection to focus on target speech and to reduce interfering speech and background noise. The aim of this study was to evaluate the audiological benefit of the SAMBA 2 BB (AP2) and to compare it with its predecessor, the SAMBA BB (AP1). Methods: Prospective within-subject comparison study. Seven users with single-sided sensorineural deafness (SSD) participated, and each audio processor was worn for 2 weeks before assessment. We compared aided sound field hearing thresholds, speech understanding in quiet (Freiburg monosyllables), and speech understanding in noise (Oldenburg sentence test) with the AP1 and AP2. For speech understanding in noise, two complex noise scenarios with multiple noise sources, including single-talker interfering speech, were used. In the first scenario speech was presented from the front (S0NMIX), while in the second it was presented from the side of the implanted ear (SIPSINMIX). In addition, subjective evaluation was performed using the SSQ12, APSQ, and BBSS questionnaires. Results: Speech understanding in quiet improved with the AP2 compared to the AP1 aided condition (+17% on average, p = 0.007). In both noise scenarios, the AP2 led to improved speech reception thresholds compared to the AP1, by 1.2 dB (S0NMIX, p = 0.032) and 2.1 dB (SIPSINMIX, p = 0.048). The questionnaires revealed no statistically significant differences, except for an improved APSQ usability score with the AP2. Conclusion: Clinicians can expect patients with SSD to benefit from the SAMBA 2 BB through improved speech understanding both in quiet and in complex noise scenarios, compared with the older SAMBA BB. Keywords: BONEBRIDGE; SAMBA 2 BB; Speech enhancement; Speech understanding in noise; Unilateral deafness

    A simple statistical speech recognition of Mandarin monosyllables

    Each Mandarin syllable is represented by a sequence of linear predictive coding cepstra (LPCC) vectors. Because all syllables have a simple phonetic structure, we partition the LPCC vector sequence of each syllable into equal segments and average the LPCC vectors within each segment. These segment mean vectors form the feature of a syllable. This simple feature avoids the time-consuming and complicated nonlinear contraction and expansion used in dynamic time warping. We propose several probability distributions for the feature values and use a simplified Bayes decision rule to classify Mandarin syllables. For speaker-independent Mandarin digits, the recognition rate is 98.6% when a normal distribution is used for the feature values and 98.1% when an exponential distribution is used for the absolute values of the features. The proposed feature is the simplest representation of a syllable and is much easier to extract than any other known feature. Feature extraction and classification are much faster, and more accurate, than with the HMM method or any other known technique.
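    The feature and classifier described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. It assumes the per-frame LPCC vectors have already been computed. It also assumes that the "simplified Bayes decision rule" means each feature value is an independent normal variable per syllable class, with equal class priors, which is our reading and may differ from the paper's exact simplification. The function names and the small variance floor are hypothetical choices.

```python
import math


def segment_mean_feature(frames, n_segments):
    """Split a syllable's LPCC frames into n_segments contiguous, (nearly) equal
    segments, average each segment, and concatenate the segment means."""
    n = len(frames)
    if n < n_segments:
        raise ValueError("need at least one frame per segment")
    dim = len(frames[0])
    feature = []
    for s in range(n_segments):
        seg = frames[s * n // n_segments:(s + 1) * n // n_segments]
        feature.extend(sum(f[d] for f in seg) / len(seg) for d in range(dim))
    return feature


def train_gaussian(features_by_class, var_floor=1e-6):
    """Estimate a per-dimension mean and variance for each class.
    Assumes independent features; var_floor (hypothetical) guards against zero variance."""
    models = {}
    for label, feats in features_by_class.items():
        m, dim = len(feats), len(feats[0])
        means = [sum(f[d] for f in feats) / m for d in range(dim)]
        variances = [max(sum((f[d] - means[d]) ** 2 for f in feats) / m, var_floor)
                     for d in range(dim)]
        models[label] = (means, variances)
    return models


def classify(feature, models):
    """Pick the class with the highest normal log-likelihood (equal priors assumed).
    The constant -0.5*log(2*pi) per dimension is dropped; it is the same for every class."""
    def log_likelihood(means, variances):
        return sum(-0.5 * math.log(v) - (x - mu) ** 2 / (2 * v)
                   for x, mu, v in zip(feature, means, variances))
    return max(models, key=lambda label: log_likelihood(*models[label]))


feat = segment_mean_feature([[1, 2], [3, 4], [5, 6], [7, 8]], 2)
print(feat)  # [2.0, 3.0, 6.0, 7.0]
```

    Because every syllable maps to a fixed-length vector regardless of its duration, no time alignment is needed before classification. The abstract's second variant could be tried by swapping the normal log-density for an exponential density on the absolute feature values; that variant is not shown here.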

    Attitudes towards English usage in the late modern period: the case of phrasal verbs

    Phrasal verbs are an intrinsic part of Late Modern English, and are found in both informal and colloquial language (check out, listen up) and more formal styles (a thesis might set out some problems and then sum up the main points). They are highly productive: 'up' can be added to almost any verb to signify goal or end-point (read up, finish up, eat up, meet up, fatten up); and once a phrasal verb has been coined, a conversion often follows (for example, the verb 'phone in' was first recorded in 1946, and the noun 'phone-in' in 1967; 'dumb down' was coined in 1933, and we read of 'dumbed-down' material in 1982). Perhaps because of their pervasiveness, phrasal verbs are frequently criticized (although occasionally praised) in Late Modern English texts about language. The purpose of this thesis is to examine such attitudes in three strands. Firstly, over one hundred language texts (grammars, dictionaries, and usage manuals, among others, from 1750 to 1970) were examined to discover how phrasal verbs were recognized and classified in Late Modern English. Secondly, these materials were analyzed in order to find out how attitudes towards phrasal verbs in English developed in relation to broader attitudes towards language in the Late Modern period. Thirdly, phrasal verb usage in A Representative Corpus of Historical English Registers, a corpus of British and American English from 1650 to 1990, was analyzed to determine how such attitudes affect usage. It will be shown that attitudes towards phrasal verbs reflect various strands of language ideology, including opinions about Latinate as opposed to native vocabulary; ideals relating to etymology, polysemy, and redundancy; reactions to neologisms; and attitudes towards language variety. Furthermore, it will be suggested that in the case of certain redundant combinations such as 'return back' and 'raise up', proscriptions of phrasal verbs did have an effect on their usage in the Late Modern period

    Sensitivity to Consonantal Context in Reading English Vowels: The Case of Arabic Learners

    Both experimental and anecdotal evidence document the difficulty Arabic learners of English have when learning to read and write in English. The complex phoneme-grapheme mapping rules of English may partly explain this difficulty, but the question remains why Arabic learners in particular have difficulty decoding English. This dissertation attempts to pinpoint which specific sub-word processes may contribute to this difficulty. Vowel processing is an appropriate place to begin, given the inconsistency of the grapheme-phoneme mapping rules for English vowels. Statistical regularities in English between the onset and the vowel, or between the vowel and the coda, greatly increase the likelihood of a particular vowel pronunciation and so reduce this inconsistency. When reading, native English speakers use the context (preceding and following consonants) in which a vowel occurs to narrow the range of possible pronunciations, and are therefore said to show sensitivity to consonantal context. For this dissertation, sensitivity to consonantal context in reading English vowels was tested in three groups (Arabic speakers, native English speakers, and speakers from other language backgrounds) using an experiment based on prior studies of native English speakers. Results indicate that non-native speakers of English show less sensitivity to consonantal context than native English speakers, especially in their greater use of the critical vowel pronunciation in control contexts. Furthermore, Arabic speakers show even less sensitivity to consonantal context than both native English speakers and speakers from other language backgrounds, especially for vowel-to-coda associations. In fact, for three of six vowel-to-coda test cases, the Arabic speakers' results run counter to the expected outcome, which might be called an anti-sensitivity to consonantal context. The small number of participants in the Arabic group limits the strength of any conclusion, but the fact that the Arabic group's results run opposite to the expected outcome for some test items warrants further study.

    Application-specific instruction set processor for speech recognition.

    Cheung Man Ting. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 69-71). Abstracts in English and Chinese.

    Contents:
    Chapter 1. Introduction
        1.1 The Emergence of ASIP
            1.1.1 Related Work
        1.2 Motivation
        1.3 ASIP Design Methodologies
        1.4 Fundamentals of Speech Recognition
        1.5 Thesis Outline
    Chapter 2. Automatic Speech Recognition
        2.1 Overview of ASR System
        2.2 Theory of Front-end Feature Extraction
        2.3 Theory of HMM-based Speech Recognition
            2.3.1 Hidden Markov Model (HMM)
            2.3.2 The Typical Structure of the HMM
            2.3.3 Discrete HMMs and Continuous HMMs
            2.3.4 The Three Basic Problems for HMMs
            2.3.5 Probability Evaluation
        2.4 The Viterbi Search Engine
        2.5 Isolated Word Recognition (IWR)
    Chapter 3. Design of ASIP Platform
        3.1 Instruction Fetch
        3.2 Instruction Decode
        3.3 Datapath
        3.4 Register File Systems
            3.4.1 Memory Hierarchy
            3.4.2 Register File Organization
            3.4.3 Special Registers
            3.4.4 Address Generation
            3.4.5 Load and Store
    Chapter 4. Implementation of Speech Recognition on ASIP
        4.1 Hardware Architecture Exploration
            4.1.1 Floating Point and Fixed Point
            4.1.2 Multiplication and Accumulation
            4.1.3 Pipelining
            4.1.4 Memory Architecture
            4.1.5 Saturation Logic
            4.1.6 Specialized Addressing Modes
            4.1.7 Repetitive Operation
        4.2 Software Algorithm Implementation
            4.2.1 Implementation Using Base Instruction Set
            4.2.2 Implementation Using Refined Instruction Set
    Chapter 5. Simulation Results
    Chapter 6. Conclusions and Future Work
    Appendices
        A. Base Instruction Set
        B. Special Registers
        C. Chip Microphotograph of ASIP
        D. The Testing Board of ASIP
    Bibliography
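    The contents list names fixed-point arithmetic, multiply-accumulate (MAC) and saturation logic as datapath design topics but does not describe them. As a generic illustration only, not the thesis's instruction set or datapath, the sketch below shows a saturating Q15 dot product. It uses a 32-bit accumulator that clamps instead of wrapping on overflow. The Q15 format, the 32-bit accumulator width and the function names are assumptions chosen for the example.

```python
Q = 15  # assumed Q15 format: 16-bit signed values representing [-1, 1)
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
ACC_MIN, ACC_MAX = -(1 << 31), (1 << 31) - 1  # assumed 32-bit accumulator


def saturate(x, lo, hi):
    """Clamp x to [lo, hi], as saturation logic does instead of two's-complement wrap."""
    return max(lo, min(hi, x))


def to_q15(x):
    """Convert a float to Q15 (round to nearest), saturating at the representable range."""
    return saturate(round(x * (1 << Q)), INT16_MIN, INT16_MAX)


def mac_q15(xs, ys):
    """Dot product of two Q15 vectors with a saturating 32-bit accumulator.
    Each 16x16-bit product is a Q30 value; the final sum is shifted back to Q15."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = saturate(acc + x * y, ACC_MIN, ACC_MAX)  # saturating accumulate
    return saturate(acc >> Q, INT16_MIN, INT16_MAX)


half = to_q15(0.5)
print(mac_q15([half, half], [half, half]))  # 16384, i.e. 0.5 in Q15
```

    Without the clamp, the overflowing sum in a long accumulation would wrap to a large negative number. For likelihood scores, that kind of error is far worse than a clipped maximum.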