91 research outputs found

    Improved compactly computable objective measures for predicting the acceptability of speech communications systems

    Issued as Monthly status reports [1-7], and Final report, Project no. E-21-61

    2-D blood vector velocity estimation using a phase shift estimator

    Fixed-analysis adaptive-synthesis filter banks

    Subband/wavelet analysis-synthesis filter banks are a major component of many compression algorithms. Such algorithms have been applied to images, voice, and video, and have achieved high performance. Typically, the configuration involves a bank of analysis filters whose coefficients have been designed in advance to enable high-quality reconstruction. The analysis system is followed by subband quantization, with decoding on the synthesis side. Decoding is performed using a corresponding set of synthesis filters, and the subbands are merged together. For many years, there has been interest in improving the analysis-synthesis filters in order to achieve better coding quality. Adaptive filter banks, in which the analysis and synthesis filter coefficients are changed dynamically in response to the input, have been explored by a number of authors. A degree of performance improvement has been reported, but this approach requires that the analysis system dynamically maintain synchronization with the synthesis system in order to perform reconstruction. In this thesis, we explore a variant of the adaptive filter bank idea, which we refer to as fixed-analysis adaptive-synthesis filter banks. Unlike the adaptive filter banks proposed previously, there is no analysis-synthesis synchronization issue involved. This implies less coder complexity and more coder flexibility. Such an approach can be compatible with existing subband wavelet encoders. The design methodology and a performance analysis are presented.
    Ph.D.
    Committee Chair: Smith, Mark J. T.; Committee Co-Chair: Mersereau, Russell M.; Committee Member: Anderson, David; Committee Member: Lanterman, Aaron; Committee Member: Rosen, Gail; Committee Member: Wardi, Yora

    Size discrimination of transient signals

    The importance of spectral cues in size discrimination of transient signals was investigated, and a model for this ability, tAIM, was created based on the biological principles of human hearing. A psychophysics experiment involving 40 participants found that the most important cue for size discrimination of transient signals, created by striking different sizes of polystyrene spheres, was similar to that of speakers listening to vowels: the relative positions of the resonances between comparison signals. It was found possible to scale the sphere signals in order to confuse listeners into believing the signal source was a different size, but two methods of scaling signals in order to sound the same size as another proved inconclusive, suggesting the possibility that transient signals cannot be scaled in a linear fashion as has been shown possible for vowels. Filtering the signals in a number of different ways showed that the most important cue in size discrimination of transient signals is the difference between the most prominent resonances available in the spectra of the comparison signals. A model of the auditory system using the dynamic compressive Gammachirp filterbank, and based on the well-known AIM, was created to produce auditory images of transient signals that could be normalised for size. Transient-AIM, or tAIM, used the Mellin transform to produce images that showed size normalisation was possible due to the spectral envelope similarities across the sizes of the spheres. tAIM was extended to carry out size discrimination of the spheres using the information contained within the Mellin images. There was a systematic association between Mellin phase and size of objects of various shapes, which suggests that tAIM is able to infer object size from sound recordings of objects being struck.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    Audio watermarking techniques using singular value decomposition

    In an increasingly digital world, proving ownership of files is more and more difficult. For audio files, many schemes have been put into place to attempt to protect the rights of digital content owners. In general, these techniques fall under the classification of Digital Rights Management (DRM). Audio watermarking is one of the less invasive schemes: it embeds security into the data itself instead of in an outside layer meant to encapsulate and protect the data. There are many domains in which an audio watermark can be applied. The simplest is the time domain; often, however, other domains may be more desirable due to greater imperceptibility and robustness to attack. Common domains include the frequency domain, or domains similar to frequency obtained through functions such as the Wavelet Transform. One domain of particular interest is that of the Singular Value Decomposition (SVD). The goal of this thesis is to propose and test many different watermarking schemes, as well as to test an existing watermarking scheme operating in the SVD domain, in order to assess the viability of the SVD as a watermarking carrier domain. Different carrier matrices as well as bit embedding methods are explored. A standard set of audio files was used to help test the systems; a standard set of watermarking tests was unavailable, so a comparable test bed was implemented and utilized.

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

    Designing sound : procedural audio research based on the book by Andy Farnell

    In procedural media, data normally acquired by measuring something, commonly described as sampling, is replaced by a set of computational rules (procedure) that defines the typical structure and/or behaviour of that thing. Here, a general approach to sound as a definable process, rather than a recording, is developed. By analysis of their physical and perceptual qualities, natural objects or processes that produce sound are modelled by digital Sounding Objects for use in arts and entertainments. This thesis discusses different aspects of Procedural Audio, introducing several new approaches and solutions to this emerging field of Sound Design.

    Offline and real time noise reduction in speech signals using the discrete wavelet packet decomposition

    This thesis describes the development of an offline and real-time wavelet-based speech enhancement system to process speech corrupted with various amounts of white Gaussian noise and other noise types.