49 research outputs found

    Precision-Energy-Throughput Scaling Of Generic Matrix Multiplication and Convolution Kernels Via Linear Projections

    Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs, and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e., changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for a 280% to 440% increase in processing throughput and a 75% to 80% decrease in energy consumption against optimized GEMM and CONV kernels, without any impact on the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications.

    Development and Assessment of Signal Processing Algorithms for Assistive Hearing Devices

    Speech identification in the presence of background noise is difficult for children with auditory processing disorder and adults with sensorineural hearing loss. The listening difficulty arises from deficits in their temporal, spectral, binaural, and/or cognitive processing. Given the lack of improvement with conventional assistive hearing devices, alternate speech processing methodologies, which exaggerate the temporal and spectral cues, need to be developed to improve speech intelligibility for individuals who have poor temporal and/or spectral processing. First, this thesis reports results from a series of experiments on subjective and objective assessments of two envelope enhancement schemes (dynamic and static) across different types and levels of background noise. The subjective results revealed that speech intelligibility scores are lower for children with auditory processing disorder than for children with normal hearing. The subjective results also demonstrated that enhancing the temporal envelope is much more beneficial for children with auditory processing disorder than for children with normal hearing. Comprehensive objective assessments, which were conducted by developing novel intrusive and non-intrusive objective speech intelligibility predictors, demonstrated that both dynamic and static envelope enhancement algorithms are only effective in improving speech intelligibility under certain processing conditions that depend on the type, level and location of the background noise. Furthermore, the application of noise reduction algorithms prior to the envelope enhancement techniques increased their range of effectiveness.
Second, using the proposed objective predictors, the effectiveness of a companding architecture (which enhances both temporal and spectral cues) is shown to be better than temporal envelope enhancement alone, across different noisy environments in the presence of a noise reduction algorithm. Third, the application of binaural dichotic processing is evaluated in stationary and non-stationary background noise environments through subjective experiments. The subjective results demonstrated that the dichotic processing is mainly effective in improving speech intelligibility for stationary background noise at poor signal-to-noise ratios. It is also shown that the incorporation of a noise reduction algorithm as a front-end to the dichotic hearing processing is inferior to increase its range of effectiveness regardless of the type and level of the background noise.

    Analog adaptive nonlinear filtering and spectral analysis for low-power audio applications

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, September 2006. "August 2006." Includes bibliographical references. Filters are one of the basic building blocks of analog circuits. For linear operation, the power consumption is proportional to the dynamic range for a given topology. I have explored techniques to lower the power consumption below this limit by extending operation beyond the linear range. First, I built a power-efficient linear gm-C filter that demonstrates that dynamic range can be shifted to higher linear ranges using capacitive attenuation. In a standard gm-C filter, the minimum noise is limited by the discrete charge on the electrons and holes stored on the capacitor. This noise can only be reduced by collecting more charge on a larger capacitor, consuming more power. The maximum signal is determined by the linear range of the transconductor. This work showed that both the noise and the maximum signal can be amplified by including a capacitive attenuator in the feedback path of the filter. In order to increase the dynamic range, I explored the non-linear operation of the filters, including jump resonance. Unlike harmonic distortion and gain compression, which slowly increase with the input amplitude, jump resonance is not present in a linear system, but develops in the presence of strong nonlinearity. It is characterized by a discontinuous jump in the frequency response near the resonant peak. I have analyzed the behavior using both describing function and state-space techniques. Then, I developed a novel graphical analysis technique. Next, I designed, built, and tested a circuit for avoiding jump resonance for audio filters. Finally, I took advantage of nonlinearities in a filtering system to build a micropower companding speech processor. This system implements the companding speech processing algorithm to improve speech comprehension in moderate noise environments.
The sixteen-channel system increases the spectral contrast of speech signals by performing an adjustable two-tone suppression function, replacing the function of a normally functioning cochlea for hearing aid or cochlear implant users. The system runs on less than 60 µW of power, a consumption so low that it could run for 6 months on a standard hearing aid battery. By Christopher D. Salthouse. Ph.D.

    Biophysical modeling of a cochlear implant system: progress on closed-loop design using a novel patient-specific evaluation platform

    The modern cochlear implant is one of the most successful neural stimulation devices, which partially mimics the workings of the auditory periphery. In the last few decades it has created a paradigm shift in hearing restoration of the deaf population, which has led to more than 324,000 cochlear implant users today. Despite its great success, there is great disparity in patient outcomes without clear understanding of the aetiology of this variance in implant performance. Furthermore, speech recognition in adverse conditions or music appreciation is still not attainable with today's commercial technology. This motivates the research for the next generation of cochlear implants that takes advantage of recent developments in electronics, neuroscience, nanotechnology, micro-mechanics, polymer chemistry and molecular biology to deliver high fidelity sound. The main difficulties in determining the root of the problem in the cases where the cochlear implant does not perform well are twofold: first, there is no clear paradigm for how the electrical stimulation is perceived as sound by the brain, and second, there is limited understanding of the plasticity effects, or learning, of the brain in response to electrical stimulation. These significant knowledge limitations impede the design of novel cochlear implant technologies, as the technical specifications that can lead to better performing implants remain undefined. The motivation of the work presented in this thesis is to compare and contrast the cochlear implant neural stimulation with the operation of the physiological healthy auditory periphery up to the level of the auditory nerve. As such, the design of novel cochlear implant systems can become feasible by gaining insight on the question ‘how well does a specific cochlear implant system approximate the healthy auditory periphery?’
circumventing the necessity of complete understanding of the brain's comprehension of patterned electrical stimulation delivered from a generic cochlear implant device. A computational model, termed Digital Cochlea Stimulation and Evaluation Tool (‘DiCoStET’), has been developed to provide an objective estimate of cochlear implant performance based on neuronal activation measures, such as vector strength and average activation. A patient-specific cochlea 3D geometry is generated using a model derived by a single anatomical measurement from a patient, using non-invasive high resolution computed tomography (HRCT), and anatomically invariant human metrics and relations. Human measurements of the neuron route within the inner ear enable an innervation pattern to be modelled which joins the space from the organ of Corti to the spiral ganglion subsequently descending into the auditory nerve bundle. An electrode is inserted in the cochlea at a depth that is determined by the user of the tool. The geometric relation between the stimulation sites on the electrode and the spiral ganglion is used to estimate an activating function that will be unique for the specific patient's cochlear shape and electrode placement. This ‘transfer function’, so to speak, between electrode and spiral ganglion serves as a ‘digital patient’ for validating novel cochlear implant systems. The novel computational tool is intended for use by bioengineers, surgeons, audiologists and neuroscientists alike. In addition to ‘DiCoStET’, a second computational model is presented in this thesis aiming at enhancing the understanding of the physiological mechanisms of hearing, specifically the workings of the auditory synapse. The purpose of this model is to provide insight on the sound encoding mechanisms of the synapse. A hypothetical mechanism is suggested in the release of neurotransmitter vesicles that permits the auditory synapse to encode temporal patterns of sound separately from sound intensity.
DiCoStET was used to examine the performance of two different types of filters used for spectral analysis in the cochlear implant system, the Gammatone type filter and the Butterworth type filter. The model outputs suggest that the Gammatone type filter performs better than the Butterworth type filter. Furthermore, two stimulation strategies, the Continuous Interleaved Stimulation (CIS) and Asynchronous Interleaved Stimulation (AIS), have been compared. The estimated neuronal stimulation spatiotemporal patterns for each strategy suggest that the overall stimulation pattern is not greatly affected by the temporal sequence change. However, the finer detail of neuronal activation is different between the two strategies, and when compared to healthy neuronal activation patterns the conjecture is made that the sequential stimulation of CIS hinders the transmission of sound fine structure information to the brain. The effect of the two models developed is the feasibility of collaborative work emanating from various disciplines, especially electrical engineering, auditory physiology and neuroscience, for the development of novel cochlear implant systems. This is achieved by using the concept of a ‘digital patient’ whose artificial neuronal activation is compared to a healthy scenario in a computationally efficient manner to allow practical simulation times.
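
The abstract names vector strength as one of the neuronal activation measures but does not give a formula. As a minimal sketch, assuming the standard definition of vector strength (the length of the mean resultant vector of spike phases relative to a stimulus frequency), it could be computed as follows. The function name and the example spike times are hypothetical and are not taken from DiCoStET.

```python
import math

def vector_strength(spike_times, freq_hz):
    """Length of the mean resultant vector of spike phases.

    Values near 1 mean spikes lock to one phase of the stimulus cycle;
    values near 0 mean spike phases are spread evenly over the cycle.
    """
    if not spike_times:
        raise ValueError("at least one spike time is required")
    # Map each spike time (seconds) to a phase angle on the stimulus cycle.
    phases = [2.0 * math.pi * freq_hz * t for t in spike_times]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    return math.hypot(cos_sum, sin_sum) / len(spike_times)

# Hypothetical spike trains for a 100 Hz stimulus.
locked = [k / 100.0 for k in range(10)]      # one spike per cycle, same phase
spread = [0.0, 0.0025, 0.005, 0.0075]        # four evenly spaced phases

vs_locked = vector_strength(locked, 100.0)
vs_spread = vector_strength(spread, 100.0)
```

Here `vs_locked` comes out close to 1 and `vs_spread` close to 0, which is the contrast such a measure is meant to capture.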

    Palmo : a novel pulsed based signal processing technique for programmable mixed-signal VLSI

    In this thesis a new signal processing technique is presented. This technique exploits the use of pulses as the signalling mechanism. This Palmo signalling method applied to signal processing is novel, combining the advantages of both digital and analogue techniques. Pulsed signals are robust, inherently low-power, easily regenerated, and easily distributed across and between chips. The Palmo cells used to perform analogue operations on the pulsed signals are compact, fast, simple and programmable.

    A Study of the Automatic Speech Recognition Process and Speaker Adaptation

    This thesis considers the entire automated speech recognition process and presents a standardised approach to LVCSR experimentation with HMMs. It also discusses various approaches to speaker adaptation such as MLLR and multiscale, and presents experimental results for cross-task speaker adaptation. An analysis of training parameters and data sufficiency for reasonable system performance estimates is also included. It is found that Maximum Likelihood Linear Regression (MLLR) supervised adaptation can result in a 6% absolute reduction in word error rate given only one minute of adaptation data, as compared with an unadapted model set trained on a different task. The unadapted system performed at 24% WER and the adapted system at 18% WER. This is achieved with only 4 to 7 adaptation classes per speaker, as generated from a regression tree.
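
The distinction between absolute and relative reduction matters when reading these figures. A short worked calculation from the two WER values quoted in the abstract shows both; the function name is an illustrative assumption.

```python
def wer_reduction(baseline_wer, adapted_wer):
    """Return (absolute, relative) reduction for WER values given in percent."""
    absolute = baseline_wer - adapted_wer   # percentage points
    relative = absolute / baseline_wer      # fraction of the baseline error
    return absolute, relative

# Figures quoted in the abstract: 24% WER unadapted, 18% WER adapted.
absolute, relative = wer_reduction(24.0, 18.0)
print(f"absolute: {absolute:.1f} points, relative: {relative:.0%}")
# prints "absolute: 6.0 points, relative: 25%"
```

So the 6-point absolute reduction corresponds to removing a quarter of the unadapted system's word errors.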

    Time and frequency domain algorithms for speech coding

    The promise of digital hardware economies (due to recent advances in VLSI technology) has focussed much attention on more complex and sophisticated speech coding algorithms which offer improved quality at relatively low bit rates. This thesis describes the results (obtained from computer simulations) of research into various efficient (time and frequency domain) speech encoders operating at a transmission bit rate of 16 Kbps. In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM) systems employing both forward and backward adaptive prediction were examined. A number of algorithms were proposed and evaluated, including several variants of the Stochastic Approximation Predictor (SAP). A Backward Block Adaptive (BBA) predictor was also developed and found to outperform the conventional stochastic methods, even though its complexity in terms of signal processing requirements is lower. A simplified Adaptive Predictive Coder (APC) employing a single tap pitch predictor, considered next, provided a slight improvement in performance over ADPCM, but with rather greater complexity. The ultimate test of any speech coding system is the perceptual performance of the received speech. Recent research has indicated that this may be enhanced by suitable control of the noise spectrum according to the theory of auditory masking. Various noise shaping ADPCM configurations were examined, and it was demonstrated that a proposed pre-/post-filtering arrangement, which advantageously exploits the predictor-quantizer interaction, leads to the best subjective performance in both forward and backward prediction systems. Adaptive quantization is instrumental to the performance of ADPCM systems. Both the forward adaptive quantizer (AQF) and the backward one-word memory adaptation (AQJ) were examined.
In addition, a novel method of decreasing quantization noise in ADPCM-AQJ coders, which involves the application of correction to the decoded speech samples, provided reduced output noise across the spectrum, with considerable high frequency noise suppression. More powerful (and inevitably more complex) frequency domain speech coders such as the Adaptive Transform Coder (ATC) and the Sub-band Coder (SBC) offer good quality speech at 16 Kbps. To reduce complexity and coding delay, whilst retaining the advantage of sub-band coding, a novel transform based split-band coder (TSBC) was developed and found to compare closely in performance with the SBC. To prevent the heavy side information requirement associated with a large number of bands in split-band coding schemes from impairing coding accuracy, without forgoing the efficiency provided by adaptive bit allocation, a method employing AQJs to code the sub-band signals together with vector quantization of the bit allocation patterns was also proposed. Finally, 'pipeline' methods of bit allocation and step size estimation (using the Fast Fourier Transform (FFT) on the input signal) were examined. Such methods, although less accurate, are nevertheless useful in limiting coding delay associated with SBC schemes employing Quadrature Mirror Filters (QMF).
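
To make the ADPCM-AQJ idea concrete, the following is a minimal sketch of an ADPCM coder with a backward one-word memory adaptive quantizer, where the step size is rescaled after every sample from the code word just sent. It is an illustration only, not one of the coders evaluated in the thesis: the 2-bit quantizer, the fixed first-order predictor, and every constant (`MULTIPLIERS`, `PRED_COEFF`, the step limits) are assumed values chosen for the example.

```python
import math

# All constants below are assumptions for illustration.
MULTIPLIERS = (0.8, 1.6)        # step scaling after an inner / outer level
STEP_MIN, STEP_MAX = 1e-3, 1.0  # clamp so the step cannot collapse or explode
PRED_COEFF = 0.9                # fixed first-order predictor coefficient

def _dequantize(sign, level, step):
    # 2-bit mid-rise quantizer: outputs are +/-0.5*step and +/-1.5*step.
    return sign * (level + 0.5) * step

def _adapt(step, level):
    # One-word memory: the new step depends only on the last code word.
    return min(STEP_MAX, max(STEP_MIN, step * MULTIPLIERS[level]))

def encode(samples, step=0.05):
    codes, recon = [], 0.0
    for x in samples:
        pred = PRED_COEFF * recon
        diff = x - pred
        sign = -1 if diff < 0 else 1
        level = 1 if abs(diff) >= step else 0
        codes.append((sign, level))
        # Track the decoder's reconstruction so both sides stay in step.
        recon = pred + _dequantize(sign, level, step)
        step = _adapt(step, level)
    return codes

def decode(codes, step=0.05):
    out, recon = [], 0.0
    for sign, level in codes:
        recon = PRED_COEFF * recon + _dequantize(sign, level, step)
        out.append(recon)
        step = _adapt(step, level)
    return out

# Hypothetical test signal: a slow sine wave.
signal = [0.5 * math.sin(2.0 * math.pi * k / 40.0) for k in range(200)]
decoded = decode(encode(signal))
mse = sum((a - b) ** 2 for a, b in zip(signal, decoded)) / len(signal)
signal_power = sum(a * a for a in signal) / len(signal)
```

Because the step size is derived from the transmitted code words alone, the decoder can reproduce it without any side information, which is the property that distinguishes backward adaptation from the forward adaptive quantizer mentioned in the abstract.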