Search CORE

1,488 research outputs found

Silicon Technologies for Speaker Independent Speech Processing and Recognition Systems in Noisy Environments

Author: Arun Selvaraj
Karthikeyan Natarajan
Mala John
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Crossref

Personal Sound Zones by Subband Filtering and Time Domain Optimization

Author: Diego Antón María de
Gonzalez Alberto
Molés-Cases Vicent
Piñero Gema
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Personal Sound Zones (PSZ) systems aim to render independent sound signals to multiple listeners within a room by using arrays of loudspeakers. One of the algorithms used to provide PSZ is Weighted Pressure Matching (wPM), which computes the filters required to render a desired response in the listening zones while reducing the acoustic energy arriving at the quiet zones. This algorithm can be formulated in the time and frequency domains. In general, the time-domain formulation (wPM-TD) can obtain good performance with shorter filters and delays than the frequency-domain formulation (wPM-FD). However, wPM-TD requires more computation to obtain the optimal filters. In this article, we propose a novel approach to the wPM algorithm named Weighted Pressure Matching with Subband Decomposition (wPM-SD), which formulates an independent time-domain optimization problem for each of the subbands of a Generalized Discrete Fourier Transform (GDFT) filter bank. Solving the optimization independently for each subband has two main advantages: 1) lower computational complexity than wPM-TD to compute the optimal filters; 2) higher versatility than the classic wPM algorithms, as it allows different configurations (sets of loudspeakers, filter lengths, etc.) in each subband. Moreover, filtering the input signals with a GDFT filter bank (as in wPM-SD) requires lower computational effort than broadband filtering (as in wPM-TD and wPM-FD), which is beneficial for practical PSZ systems. We present experimental evaluations showing that wPM-SD offers very similar performance to wPM-TD. In addition, two cases where the versatility of wPM-SD is beneficial for a PSZ system are presented and experimentally validated. This work was supported by Grants RTI2018-098085-B-C41 (MCIU/AEI/FEDER, UE), RED2018-102668-T and PROMETEO/2019/109.
The work of Vicent Molés-Cases has been supported by the Spanish Ministry of Education under Grant FPU17/01288. Molés-Cases, V.; Piñero, G.; Diego Antón, MD.; Gonzalez, A. (2020). Personal Sound Zones by Subband Filtering and Time Domain Optimization. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 28:2684-2696. https://doi.org/10.1109/TASLP.2020.3023628
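
The closed-form core of pressure matching is easiest to see at a single frequency bin, which is closer to the frequency-domain formulation (wPM-FD) than to the time-domain or subband versions the article proposes. The Python sketch below assumes one common way of writing the weighted cost, ||G_B w - d||^2 + kappa ||G_D w||^2 + beta ||w||^2, with random stand-in transfer functions; the weighting, the regularizer beta, and all function and variable names are illustrative assumptions, not the article's formulation.

```python
import numpy as np

def weighted_pressure_matching(G_bright, G_dark, d, kappa=1.0, beta=1e-3):
    """Closed-form weighted pressure matching for one frequency bin (sketch).

    Minimizes ||G_bright w - d||^2 + kappa ||G_dark w||^2 + beta ||w||^2
    over the complex loudspeaker weights w (one weight per loudspeaker).
    """
    n_spk = G_bright.shape[1]
    # Normal equations of the regularized, weighted least-squares problem.
    A = (G_bright.conj().T @ G_bright
         + kappa * (G_dark.conj().T @ G_dark)
         + beta * np.eye(n_spk))
    return np.linalg.solve(A, G_bright.conj().T @ d)

rng = np.random.default_rng(0)
n_spk, n_bright, n_dark = 8, 6, 6
# Hypothetical transfer functions (microphones x loudspeakers) at a single bin.
G_b = rng.standard_normal((n_bright, n_spk)) + 1j * rng.standard_normal((n_bright, n_spk))
G_d = rng.standard_normal((n_dark, n_spk)) + 1j * rng.standard_normal((n_dark, n_spk))
# Target: reproduce in the bright zone what loudspeaker 0 alone would produce there.
d = G_b[:, 0]

w = weighted_pressure_matching(G_b, G_d, d, kappa=10.0)
# Ratio of acoustic energy in the bright zone to that in the quiet zone.
contrast_db = 10 * np.log10(np.sum(np.abs(G_b @ w) ** 2) / np.sum(np.abs(G_d @ w) ** 2))
print(f"bright/quiet energy ratio: {contrast_db:.1f} dB")
```

Raising `kappa` trades reproduction accuracy in the bright zone for less energy in the quiet zone. As described in the abstract, wPM-SD instead solves a time-domain problem for each GDFT subband, which this single-bin sketch does not attempt.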

RiuNet

Fusion of Audio and Visual Information for Implementing Improved Speech Recognition System

Author: Acharya Vikrant Satish
Publication venue: ScholarWorks@GVSU
Publication date: 01/04/2018

Speech recognition is a useful technology because it enables applications suited to a wide range of user needs. This research attempts to improve the performance of a speech recognition system by combining visual features (lip movement) with audio features. The results were computed from utterances of numerals collected from both male and female participants. Discrete Cosine Transform (DCT) coefficients were used as visual features and Mel Frequency Cepstral Coefficients (MFCC) as audio features. Classification was then carried out using a Support Vector Machine (SVM). The recognition rates of the combined (fused) system were compared with those of two standalone systems (audio only and visual only).
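
One way to combine the two modalities before an SVM is feature-level (early) fusion: compute DCT coefficients of the lip region, keep the low-frequency ones, and concatenate them with the MFCC vector. The sketch below assumes that fusion scheme, a 32x48 grayscale lip region, a 4x4 block of retained coefficients, and 13 precomputed MFCCs; the abstract does not say which fusion scheme or sizes were used, and all names here are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)   # scale the DC row so the matrix is orthonormal
    return C

def visual_features(lip_roi, keep=4):
    """Separable 2-D DCT of a grayscale lip region; keep the keep x keep low-frequency block."""
    h, w = lip_roi.shape
    coeffs = dct_matrix(h) @ lip_roi @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def fuse(audio_feat, visual_feat):
    """Feature-level fusion (assumed): one concatenated vector per sample for the classifier."""
    return np.concatenate([np.ravel(audio_feat), np.ravel(visual_feat)])

rng = np.random.default_rng(1)
lip = rng.random((32, 48))        # hypothetical 32x48 grayscale lip region
mfcc = rng.standard_normal(13)    # stand-in for 13 MFCCs computed elsewhere
x = fuse(mfcc, visual_features(lip, keep=4))
print(x.shape)  # (29,)
```

In practice each dimension of the fused vectors would typically be standardized over the training set before training the SVM, so that neither modality dominates because of its scale.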

ScholarWorks@GVSU

Wavelets and Subband Coding

Author: Kovacevic Jelena
Vetterli Martin
Publication venue: Englewood Cliffs, NJ, Prentice-Hall
Publication date: 18/04/2005

First published in 1995, Wavelets and Subband Coding offered a unified view of the exciting field of wavelets and their discrete-time cousins, filter banks, or subband coding. The book developed the theory in both continuous and discrete time and presented important applications. During the past decade, it filled a useful need in explaining a new view of signal processing based on flexible time-frequency analysis and its applications. Since 2007, the authors have retained the copyright and allow open access to the book.
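
The simplest example of a two-channel filter bank with perfect reconstruction is the orthonormal Haar bank: the input is split into a downsampled lowpass (averaging) channel and a downsampled highpass (differencing) channel, and synthesis recovers it exactly. This is a minimal illustrative sketch, not code from the book; the function names are hypothetical.

```python
import numpy as np

def haar_analysis(x):
    """One level of the orthonormal Haar filter bank (even-length input)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)    # lowpass channel, downsampled by 2
    high = (even - odd) / np.sqrt(2)   # highpass channel, downsampled by 2
    return low, high

def haar_synthesis(low, high):
    """Inverse of haar_analysis: recombine the two channels sample by sample."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
low, high = haar_analysis(x)
x_rec = haar_synthesis(low, high)
print(np.allclose(x, x_rec))  # True
```

Because the bank is orthonormal, the energy of `low` and `high` together equals the energy of `x`; iterating the analysis step on `low` gives a discrete Haar wavelet decomposition.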

Infoscience - École polytechnique fédérale de Lausanne

Transmitter precoding for multi-antenna multi-user communications

Author: Modi Kirtan N.
Publication venue: The Research Repository @ WVU
Publication date: 01/08/2004

Emerging wireless sensor networks and existing wireless cellular and ad hoc networks motivate the design of low-power receivers. Multi-user interference drastically reduces the energy efficiency of wireless multi-user communications by introducing errors in the bits detected at the receiver. Interference rejection algorithms and multiple-antenna techniques can significantly reduce the bit error rate at the receiver. Unfortunately, interference rejection algorithms burden the receiver with heavy signal processing, increasing its power consumption, while the small size of receivers, specifically in sensor networks and in downlink cellular communications, prohibits the use of multiple receive antennas. In a broadcast channel, where a central transmitter sends independent streams to decentralized receivers, the transmitter can have a priori knowledge of the interference, and multiple antennas can be used at the transmitter to enhance energy efficiency. In some systems, such as the base-station transmitter on the downlink of a cellular system, the transmitter has access to a virtually unlimited power supply. The power consumption at the receivers can therefore be reduced by moving some of the receivers' signal processing functionality to the transmitter.

In this thesis, we consider a wireless broadcast channel with a transmitter that is equipped with multiple antennas and has a priori knowledge of the interference. Our objective is to minimize receiver complexity by adding extra signal processing functions to the transmitter. We need to determine the optimal transmitted signal so that interference is completely eliminated and the benefits of using multiple transmit antennas are maximized. For this purpose, we investigate linear precoders, that is, linear transformations applied to the signal before transmission.
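
A standard example of a linear precoder that completely eliminates multi-user interference is zero forcing: with the channel known at the transmitter, the precoder is the right pseudo-inverse of the channel matrix, so each single-antenna user receives only its own symbol. The thesis may study other precoders; this sketch assumes zero forcing, a noiseless 3-user, 4-antenna channel drawn at random, and a unit total power budget, and its names are hypothetical.

```python
import numpy as np

def zero_forcing_precoder(H, total_power=1.0):
    """Zero-forcing linear precoder for a multi-user MISO broadcast channel (sketch).

    H has shape (n_users, n_tx); row k is the channel from all transmit antennas
    to single-antenna user k. Requires n_tx >= n_users and H of full row rank.
    Returns W of shape (n_tx, n_users) such that H @ W is a scaled identity.
    """
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)          # right pseudo-inverse
    W = W * np.sqrt(total_power) / np.linalg.norm(W, "fro")  # meet the power budget
    return W

rng = np.random.default_rng(2)
H = (rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))) / np.sqrt(2)
W = zero_forcing_precoder(H, total_power=1.0)

s = np.array([1 + 1j, -1 + 1j, 1 - 1j]) / np.sqrt(2)  # one QPSK symbol per user
y = H @ W @ s                                           # noiseless received samples
# Each y[k] is a common real gain times s[k]: no interference from other users' streams.
```

The receivers then need only a scalar gain correction rather than interference rejection, which is the kind of complexity shift from receiver to transmitter that the abstract describes.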

The Research Repository @ WVU (West Virginia University)

A Multiscale Pyramid Transform for Graph Signals

Author: Faraji Mohammad Javad
Shuman David I
Vandergheynst Pierre
Publication date: 18/08/2015

Multiscale transforms designed to process analog and discrete-time signals and images cannot be directly applied to analyze high-dimensional data residing on the vertices of a weighted graph, as they do not capture the intrinsic geometric structure of the underlying graph data domain. In this paper, we adapt the Laplacian pyramid transform for signals on Euclidean domains so that it can be used to analyze high-dimensional data residing on the vertices of a weighted graph. Our approach is to study existing methods and develop new methods for the four fundamental operations of graph downsampling, graph reduction, and filtering and interpolation of signals on graphs. Equipped with appropriate notions of these operations, we leverage the basic multiscale constructs and intuitions from classical signal processing to generate a transform that yields both a multiresolution of graphs and an associated multiresolution of a graph signal on the underlying sequence of graphs. Comment: 16 pages, 13 figures.
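
A single level of a graph Laplacian pyramid can be sketched with three of the operations named above: low-pass filter the signal, downsample to a vertex subset, and store the error of interpolating back to the full graph. This sketch assumes particular choices for each step (keeping vertices where the eigenvector of the largest Laplacian eigenvalue is nonnegative, the filter (I + tau L)^{-1}, and a smoothness-regularized interpolation) on a small path graph; the paper studies several methods per operation, and graph reduction is omitted because only one level is built. All names are hypothetical.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an unweighted path graph on n vertices."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def downsample_set(L):
    """Assumed rule: keep vertices where the largest-eigenvalue eigenvector is >= 0."""
    _, U = np.linalg.eigh(L)            # eigenvalues come back in ascending order
    return np.flatnonzero(U[:, -1] >= 0)

def interpolate(L, keep, coarse, eps=0.1):
    """Solve (P^T P + eps L) f = P^T coarse: match the samples, penalize f^T L f."""
    P = np.eye(L.shape[0])[keep]        # sampling (row-selection) matrix
    return np.linalg.solve(P.T @ P + eps * L, P.T @ coarse)

def analyze(L, f, tau=1.0):
    """One pyramid level: low-pass filter, downsample, and keep the prediction error."""
    keep = downsample_set(L)
    smooth = np.linalg.solve(np.eye(L.shape[0]) + tau * L, f)   # h(L) = (I + tau L)^{-1}
    coarse = smooth[keep]
    residual = f - interpolate(L, keep, coarse)
    return keep, coarse, residual

def synthesize(L, keep, coarse, residual):
    """Invert one level: interpolate the coarse signal and add back the residual."""
    return interpolate(L, keep, coarse) + residual

L = path_graph_laplacian(8)
f = np.array([1.0, 2.0, 3.0, 5.0, 4.0, 3.0, 2.0, 2.0])
keep, coarse, residual = analyze(L, f)
print(keep, np.allclose(synthesize(L, keep, coarse, residual), f))
```

On a path graph this rule keeps every other vertex, mirroring classical downsampling by two; as in the Euclidean Laplacian pyramid, perfect reconstruction holds for any interpolator because the residual stores exactly what the prediction misses.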

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Adaptive whitening in neural populations with gain-modulating interneurons

Author: Chklovskii Dmitri B.
Duong Lyndon R.
Heeger David J.
Lipshutz David
Simoncelli Eero P.
Publication date: 03/06/2023

Statistical whitening transformations play a fundamental role in many computational systems, and may also play an important role in biological sensory systems. Existing neural circuit models of adaptive whitening operate by modifying synaptic interactions; however, such modifications would seem both too slow and insufficiently reversible. Motivated by the extensive neuroscience literature on gain modulation, we propose an alternative model that adaptively whitens its responses by modulating the gains of individual neurons. Starting from a novel whitening objective, we derive an online algorithm that whitens its outputs by adjusting the marginal variances of an overcomplete set of projections. We map the algorithm onto a recurrent neural network with fixed synaptic weights and gain-modulating interneurons. We demonstrate numerically that sign-constraining the gains improves the robustness of the network to ill-conditioned inputs, and that a generalization of the circuit achieves a form of local whitening in convolutional populations, such as those found throughout the visual or auditory systems. Comment: 20 pages, 10 figures (incl. appendix). To appear in the Proceedings of the 40th International Conference on Machine Learning.
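
The idea of whitening by adjusting only the gains on an overcomplete set of fixed projections can be illustrated in two dimensions with three projection vectors (N(N+1)/2 = 3 for N = 2). This is a deterministic, expected-value simplification written under stated assumptions: the response form r = (I + W diag(g) W^T)^{-1} s, the update that pushes each projected variance toward 1, the input covariance, and the step size are illustrative choices of this sketch, not the paper's exact equations.

```python
import numpy as np

# Fixed projections: K = 3 unit vectors at 0, 60 and 120 degrees in N = 2 dimensions.
angles = np.deg2rad([0.0, 60.0, 120.0])
W = np.stack([np.cos(angles), np.sin(angles)])   # shape (N, K) = (2, 3); column k is w_k

C_ss = np.array([[4.0, 1.0],
                 [1.0, 2.0]])   # hypothetical input covariance
g = np.zeros(3)                 # interneuron gains: the only adaptive quantities
eta = 0.02                      # step size (assumed)

for _ in range(5000):
    M = np.linalg.inv(np.eye(2) + W @ np.diag(g) @ W.T)   # assumed response r = M s
    C_rr = M @ C_ss @ M.T                                 # output covariance
    z_var = np.einsum("ik,ij,jk->k", W, C_rr, W)          # variance of z_k = w_k^T r
    g += eta * (z_var - 1.0)                              # push each projected variance toward 1

print(np.round(C_rr, 4))   # close to the 2x2 identity
```

With three generic directions in two dimensions, requiring unit variance along every projection pins the output covariance to the identity, so the synaptic weights in `W` never change; only `g` adapts.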

arXiv.org e-Print Archive

Spread spectrum-based video watermarking algorithms for copyright protection

Author: Serdean Cristian Vasile
Publication venue: University of Plymouth
Publication date: 01/01/2002

Digital technologies have seen unprecedented expansion in recent years. Consumers can now benefit from hardware and software that was considered state of the art only a few years ago. The advantages offered by digital technologies are major, but the same technologies open the door to unlimited piracy. Copying an analogue VCR tape was certainly possible and relatively easy, in spite of various forms of protection, but because of the analogue medium each subsequent copy suffered an inherent loss in quality. This was a natural way of limiting the repeated copying of video material. With digital technology this barrier disappears: as many copies as desired can be made without any loss in quality whatsoever. Digital watermarking is one of the best available tools for fighting this threat. The aim of the present work was to develop a digital watermarking system for video broadcast monitoring, compliant with the recommendations drawn up by the EBU. Since the watermark can be inserted in either the spatial domain or a transform domain, this aspect was investigated, leading to the conclusion that the wavelet transform is one of the best solutions available. Because watermarking is not an easy task, especially considering robustness under various attacks, several techniques were employed to increase the capacity and robustness of the system: spread-spectrum and modulation techniques to cast the watermark, powerful error correction to protect the mark, and human visual models to insert a robust mark while ensuring its invisibility. The combination of these methods led to a major improvement, but the system was still not robust to several important geometric attacks. To achieve this last milestone, the system uses two distinct watermarks: a spatial-domain reference watermark and the main watermark embedded in the wavelet domain.
By using this reference watermark and techniques specific to image registration, the system is able to determine the parameters of the attack and revert it. Once the attack has been reverted, the main watermark is recovered. The final result is a high-capacity, blind, DWT-based video watermarking system that is robust to a wide range of attacks.
BBC Research & Development
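
The spread-spectrum idea of casting a watermark can be shown in isolation: each payload bit is spread over many host coefficients by a keyed pseudorandom +/-1 sequence, and a blind detector recovers the bit by correlating with the same sequence, without access to the original host. This sketch works on a generic vector of stand-in coefficients (for example, wavelet coefficients) with an assumed embedding strength and additive-noise attack; it omits the error correction, visual masking, and reference watermark described above, and all names are hypothetical.

```python
import numpy as np

def chip_sequence(key, length):
    """Keyed pseudorandom +/-1 spreading sequence; the key acts as the shared secret."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=length)

def embed(host, bits, key, alpha=0.5):
    """Spread each bit over len(host) // len(bits) host coefficients."""
    chips_per_bit = len(host) // len(bits)
    pn = chip_sequence(key, chips_per_bit * len(bits))
    symbols = np.repeat(2 * np.asarray(bits) - 1, chips_per_bit)   # {0, 1} -> {-1, +1}
    marked = np.asarray(host, dtype=float).copy()
    marked[: len(pn)] += alpha * symbols * pn
    return marked

def detect(received, n_bits, key):
    """Blind detection: correlate with the same keyed sequence; the original host is not needed."""
    chips_per_bit = len(received) // n_bits
    pn = chip_sequence(key, chips_per_bit * n_bits)
    corr = (received[: len(pn)] * pn).reshape(n_bits, chips_per_bit).sum(axis=1)
    return (corr > 0).astype(int)

rng = np.random.default_rng(7)
host = rng.standard_normal(8192)          # stand-in for host (e.g. wavelet) coefficients
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
marked = embed(host, bits, key=1234)
attacked = marked + 0.5 * rng.standard_normal(marked.shape)   # additive-noise "attack"
print(detect(attacked, len(bits), key=1234))
```

With 1024 coefficients per bit, the correlation gain (about alpha times 1024) far exceeds the host and noise interference (standard deviation on the order of the square root of 1024), which is why spreading buys robustness at the cost of payload capacity.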

Plymouth Electronic Archive and Research Library