254 research outputs found

    Coding overcomplete representations of audio using the MCLT

    Get PDF
    We propose a system for audio coding using the modulated complex lapped transform (MCLT). In general, it is difficult to encode signals using overcomplete representations without avoiding a penalty in rate-distortion performance. We show that the penalty can be significantly reduced for MCLT-based representations, without the need for iterative methods of sparsity reduction. We achieve that via a magnitude-phase polar quantization and the use of magnitude and phase prediction. Compared to systems based on quantization of orthogonal representations such as the modulated lapped transform (MLT), the new system allows for reduced warbling artifacts and more precise computation of frequency-domain auditory masking functions

    Estimation and Modeling Problems in Parametric Audio Coding

    Get PDF

    Complex Neural Networks for Audio

    Get PDF
    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., waveform) and the complex-valued frequency domain (i.e., spectrum). There are advantages to the frequency-domain representation, e.g., the human auditory system is known to process sound in the frequency-domain. Furthermore, linear time-invariant systems are convolved with sources in the time-domain, whereas they may be factorized in the frequency-domain. Neural networks have become rather useful when applied to audio tasks such as machine listening and audio synthesis, which are related by their dependencies on high quality acoustic models. They ideally encapsulate fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex audio. This manuscript is dedicated to addressing the shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables and show that selection of input and output representation has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening. The complex model\u27s performance is better than that of the real-valued models, as well as real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state of the art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement and note that the complex Adam optimizer improves training convergence over the naive implementation

    Algorithms and VLSI architectures for parametric additive synthesis

    Get PDF
    A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable

    Analog dithering techniques for highly linear and efficient transmitters

    Get PDF
    The current thesis is about investigation of new methods and techniques to be able to utilize the switched mode amplifiers, for linear and efficient applications. Switched mode amplifiers benefit from low overlap between the current and voltage wave forms in their output terminals, but they seriously suffer from nonlinearity. This makes it impossible to use them to amplify non-constant envelope message signals, where very high linearity is expected. In order to do that, dithering techniques are studied and a full linearity analysis approach is developed, by which the linearity performance of the dithered amplifier can be analyzed, based on the dithering level and frequency. The approach was based on orthogonalization of the equivalent nonlinearity and is capable of prediction of both co-channel and adjacent channel nonlinearity metrics, for a Gaussian complex or real input random signal. Behavioral switched mode amplifier models are studied and new models are developed, which can be utilized to predict the nonlinear performance of the dithered power amplifier, including the nonlinear capacitors effects. For HFD application, self-oscillating and asynchronous sigma delta techniques are currently used, as pulse with modulators (PWM), to encode a generic RF message signal, on the duty cycle of an output pulse train. The proposed models and analysis techniques were applied to this architecture in the first phase, and the method was validated with measurement on a prototype sample, realized in 65 nm TSMC CMOS technology. Afterwards, based on the same dithering phenomenon, a new linearization technique was proposed, which linearizes the switched mode class D amplifier, and at the same time can reduce the reactive power loss of the amplifier. This method is based on the dithering of the switched mode amplifier with frequencies lower than the band-pass message signal and is called low frequency dithering (LFD). To test this new technique, two test circuits were realized and the idea was applied to them. Both of the circuits were of the hard nonlinear type (class D) and are integrated CMOS and discrete LDMOS technologies respectively. The idea was successfully tested on both test circuits and all of the linearity metric predictions for a digitally modulated RF signal and a random signal were compared to the measurements. Moreover a search method to find the optimum dither frequency was proposed and validated. Finally, inspired by averaging interpretation of the dithering phenomenon, three new topologies were proposed, which are namely DLM, RF-ADC and area modulation power combining, which are all nonlinear systems linearized with dithering techniques. A new averaging method was developed and used for analysis of a Gilbert cell mixer topology, which resulted in a closed form relationship for the conversion gain, for long channel devices

    Applied Signal Processing

    Get PDF
    Being an inter-disciplinary subject, Signal Processing has application in almost all scientific fields. Applied Signal Processing tries to link between the analog and digital signal processing domains. Since the digital signal processing techniques have evolved from its analog counterpart, this book begins by explaining the fundamental concepts in analog signal processing and then progresses towards the digital signal processing. This will help the reader to gain a general overview of the whole subject and establish links between the various fundamental concepts. While the focus of this book is on the fundamentals of signal processing, the understanding of these topics greatly enhances the confident use as well as further development of the design and analysis of digital systems for various engineering and medical applications. Applied Signal Processing also prepares readers to further their knowledge in advanced topics within the field of signal processing

    Spread spectrum-based video watermarking algorithms for copyright protection

    Get PDF
    Merged with duplicate record 10026.1/2263 on 14.03.2017 by CS (TIS)Digital technologies know an unprecedented expansion in the last years. The consumer can now benefit from hardware and software which was considered state-of-the-art several years ago. The advantages offered by the digital technologies are major but the same digital technology opens the door for unlimited piracy. Copying an analogue VCR tape was certainly possible and relatively easy, in spite of various forms of protection, but due to the analogue environment, the subsequent copies had an inherent loss in quality. This was a natural way of limiting the multiple copying of a video material. With digital technology, this barrier disappears, being possible to make as many copies as desired, without any loss in quality whatsoever. Digital watermarking is one of the best available tools for fighting this threat. The aim of the present work was to develop a digital watermarking system compliant with the recommendations drawn by the EBU, for video broadcast monitoring. Since the watermark can be inserted in either spatial domain or transform domain, this aspect was investigated and led to the conclusion that wavelet transform is one of the best solutions available. Since watermarking is not an easy task, especially considering the robustness under various attacks several techniques were employed in order to increase the capacity/robustness of the system: spread-spectrum and modulation techniques to cast the watermark, powerful error correction to protect the mark, human visual models to insert a robust mark and to ensure its invisibility. The combination of these methods led to a major improvement, but yet the system wasn't robust to several important geometrical attacks. In order to achieve this last milestone, the system uses two distinct watermarks: a spatial domain reference watermark and the main watermark embedded in the wavelet domain. By using this reference watermark and techniques specific to image registration, the system is able to determine the parameters of the attack and revert it. Once the attack was reverted, the main watermark is recovered. The final result is a high capacity, blind DWr-based video watermarking system, robust to a wide range of attacks.BBC Research & Developmen

    Project OASIS: The Design of a Signal Detector for the Search for Extraterrestrial Intelligence

    Get PDF
    An 8 million channel spectrum analyzer (MCSA) was designed the meet to meet the needs of a SETI program. The MCSA puts out a very large data base at very high rates. The development of a device which follows the MCSA, is presented
    corecore