181 research outputs found

    Blind Image Deblurring Driven by Nonlinear Processing in the Edge Domain

    This work addresses the problem of blind image deblurring, that is, recovering an original image observed through one or more unknown linear channels and corrupted by additive noise. We resort to an iterative algorithm, belonging to the class of Bussgang algorithms, that alternates a linear and a nonlinear image estimation stage. In detail, we investigate the design of a novel nonlinear processing stage acting on the Radon transform of the image edges. This choice is motivated by the fact that the Radon transform of the image edges describes well both the structural image features and the effect of blur, thus simplifying the nonlinearity design. The effect of the nonlinear processing is to thin the blurred image edges and to drive the overall blind restoration algorithm to a sharp, focused image. The performance of the algorithm is assessed by experimental results on the restoration of blurred natural images.

    User-Symbiotic Speech Enhancement for Hearing Aids

    New Insights Into the MVDR Beamformer in Room Acoustics

    Fir filter for makhraj recognition system

    Audio and speech processing systems have steadily risen in importance in the everyday lives of most people in developed countries. Speech recognition is the process of converting an acoustic signal, captured by a microphone, into a set of words. Recognition is generally more difficult when vocabularies are larger or contain many similar-sounding words. Several external parameters can affect the performance of a speech recognition system, including the characteristics of the environmental noise and the type and placement of the microphone. A particular objective of the project is to recognize the correct makhraj pronunciation, using a pre-processing database in Matlab for the recognition analysis. In this project, speech processing for makhraj recognition is built using a Finite Impulse Response (FIR) filter. All speech data are collected from respondents. This requires simultaneously recording the speech wave and as many parameters as possible. Then the correct makhraj pronunciations are obtained, for example (alif), (ba), (ta), (tsa), (jim), (ha) and others. After that, the project will be built using Matlab software
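The abstract does not say which FIR design is used, so the following is only a minimal sketch of one common choice: a Hamming-windowed sinc low-pass filter applied by direct convolution. The function names `design_lowpass_fir` and `fir_filter`, the tap count, and the cutoff are all assumptions made for illustration, not details taken from the project.

```python
import math


def design_lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter (illustrative design choice).

    cutoff is a fraction of the sampling rate, 0 < cutoff < 0.5.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response (sinc), handling the centre tap.
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        # Hamming window to reduce ripple from truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def fir_filter(taps, samples):
    """Apply the FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out
```

With a hypothetical cutoff of 0.1 of the sampling rate, `fir_filter(design_lowpass_fir(31, 0.1), samples)` keeps slowly varying content and strongly attenuates components near the Nyquist frequency.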

    Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

    Recently, frequency domain all-neural beamforming methods have achieved remarkable progress in multichannel speech separation. In parallel, the integration of time domain network structures and beamforming has also gained significant attention. This study proposes a novel all-neural beamforming method in the time domain and attempts to unify the all-neural beamforming pipelines for time domain and frequency domain multichannel speech separation. The proposed model consists of two modules: separation and beamforming. Both modules perform temporal-spectral-spatial modeling and are trained end-to-end using a joint loss function. The novelty of this study is twofold. Firstly, a time domain directional feature conditioned on the direction of the target speaker is proposed, which can be jointly optimized within the time domain architecture to enhance target signal estimation. Secondly, an all-neural beamforming network in the time domain is designed to refine the pre-separated results. This module features parametric time-variant beamforming coefficient estimation, without explicitly following the derivation of optimal filters that may lead to an upper bound. The proposed method is evaluated on simulated reverberant overlapped speech data derived from the AISHELL-1 corpus. Experimental results demonstrate significant performance improvements over frequency domain state-of-the-art methods, ideal magnitude masks and existing time domain neural beamforming methods.

    Adaptive noise cancellation using multichannel lattice structure.

    This thesis presents a multichannel adaptive noise cancellation technique (MCLS) for cancelling noise over a nonlinear transmission channel. The technique applies to the situation in which the reference signal and the noisy primary signal are collected simultaneously. The coefficients of the multichannel multiple regression transversal filter are modified adaptively according to the backward prediction error vector generated from the multichannel adaptive lattice predictor. This multichannel adaptive noise cancellation procedure involves the NLMS adaptive algorithm. The performance of the new technique using different types of transmission channels, different types of reference inputs and different types of noise-free primary inputs is examined analytically. The new approach is experimentally shown to have better noise cancellation performance than the existing single-channel adaptive lattice noise cancellation algorithm (SCLS) in the nonlinear transmission channel case, especially in low input SNR situations. Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .X54. Source: Masters Abstracts International, Volume: 43-01, page: 0288. Adviser: H. K. Kwan. Thesis (M.A.Sc.)--University of Windsor (Canada), 2004.
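To make the NLMS step mentioned in the abstract concrete, here is a minimal single-channel sketch of adaptive noise cancellation with a plain transversal filter. It is not the multichannel lattice structure (MCLS) of the thesis: the linear noise path, the step size `mu`, the tap count, and the names `nlms_cancel` and `demo` are all assumptions chosen for illustration.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Return e[n] = primary[n] - w^T x[n], the noise-cancelled output."""
    w = [0.0] * taps    # adaptive filter weights
    buf = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # estimate of the noise in the primary
        e = d - y                                    # error signal = cleaned sample
        norm = eps + sum(xi * xi for xi in buf)      # input power used for normalisation
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        cleaned.append(e)
    return cleaned


def demo(n=4000, seed=0):
    """Compare mean squared error before and after cancellation on synthetic data."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical *linear* noise path (the thesis studies nonlinear channels).
    path = [0.6, -0.3, 0.1]
    coloured = [
        sum(path[j] * noise[k - j] for j in range(len(path)) if k - j >= 0)
        for k in range(n)
    ]
    primary = [s + v for s, v in zip(signal, coloured)]
    cleaned = nlms_cancel(primary, noise)
    tail = n // 2  # measure after the filter has had time to converge
    count = n - tail
    mse_before = sum((p - s) ** 2 for p, s in zip(primary[tail:], signal[tail:])) / count
    mse_after = sum((c - s) ** 2 for c, s in zip(cleaned[tail:], signal[tail:])) / count
    return mse_before, mse_after
```

Calling `demo()` returns the error power of the noisy primary and of the cancelled output over the second half of the run, so the two can be compared directly.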

    Robust speaker diarization for meetings

    This thesis presents research on speaker diarization for meeting rooms. It looks into the algorithms and the implementation of an offline speaker segmentation and clustering system for meeting recordings, where usually more than one microphone is available. The main research and system implementation were done while visiting the International Computer Science Institute (ICSI, Berkeley, California) for a period of two years. Speaker diarization is a well-studied topic in the domain of broadcast news recordings. Most of the proposed systems involve some sort of hierarchical clustering of the data into clusters, where the optimum number of speakers and their identities are unknown a priori. A very commonly used method is called bottom-up clustering, where multiple initial clusters are iteratively merged until the optimum number of clusters is reached, according to some stopping criterion. Such systems are based on a single channel input, which prevents their direct application to the meetings domain. Although some efforts have been made to adapt such systems to multichannel data, at the start of this thesis no effective implementation had been proposed. Furthermore, many of these speaker diarization algorithms involve some sort of model training or parameter tuning using external data, which impedes their use with data different from what they have been adapted to. The implementation proposed in this thesis works towards solving the aforementioned problems. Taking the existing hierarchical bottom-up mono-channel speaker diarization system from ICSI, it first uses flexible acoustic beamforming to extract speaker location information and obtain a single enhanced signal from all available microphones. It then applies train-free speech/non-speech detection to that signal and processes the resulting speech segments with an improved version of the mono-channel speaker diarization system. This system has been modified to use speaker location information (when available), and several algorithms have been adapted or newly created to adapt the system behavior to each particular recording by obtaining information directly from the acoustics, making it less dependent on the development data. The resulting system is flexible with respect to any meeting room layout regarding the number of microphones and their placement. It is train-free, making it easy to adapt to different sorts of data and domains of application. Finally, it takes a step forward in the use of parameters that are more robust to changes in the acoustic data. Two versions of the system were submitted with excellent results in the RT05s and RT06s NIST Rich Transcription evaluations for meetings, where data from two different subdomains (lectures and conferences) was evaluated. Also, experiments using the RT datasets from all meetings evaluations were used to test the different proposed algorithms, proving their suitability to the task. Postprint (published version)
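The bottom-up clustering idea described in the abstract (start with many clusters, merge the closest pair repeatedly, stop on a criterion) can be sketched as follows. This is a toy version on one-dimensional points: the centroid distance and the fixed `stop_distance` threshold are assumptions standing in for whatever acoustic models and stopping criterion the actual system uses, and `bottom_up_cluster` is a hypothetical name.

```python
def bottom_up_cluster(points, stop_distance):
    """Merge the two closest clusters until no pair is closer than stop_distance."""
    clusters = [[p] for p in points]  # start with one cluster per point

    def mean(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > 1:
        # Find the closest pair of clusters by centroid distance (assumed measure).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(mean(clusters[i]) - mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > stop_distance:  # stopping criterion: remaining clusters are too far apart
            break
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters
```

For example, `bottom_up_cluster([0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0], 1.0)` groups the points into three clusters, one per well-separated group.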

    Linear Reconstruction of Non-Stationary Image Ensembles Incorporating Blur and Noise Models

    Two new linear reconstruction techniques are developed to improve the resolution of images collected by ground-based telescopes imaging through atmospheric turbulence. The classical approach involves the application of constrained least squares (CLS) to the deconvolution from wavefront sensing (DWFS) technique. The new algorithm incorporates blur and noise models to select the appropriate regularization constant automatically. In all cases examined, the Newton-Raphson minimization converged to a solution in less than 10 iterations. The non-iterative Bayesian approach involves the development of a new vector Wiener filter which is optimal with respect to mean square error (MSE) for a non-stationary object class degraded by atmospheric turbulence and measurement noise. This research involves the first extension of the Wiener filter to account properly for shot noise and an unknown, random optical transfer function (OTF). The vector Wiener filter provides superior reconstructions when compared to the traditional scalar Wiener filter for a non-stationary object class. In addition, the new filter can provide a superresolution capability when the object's Fourier domain statistics are known for spatial frequencies beyond the OTF cutoff. A generalized performance and robustness study of the vector Wiener filter showed that MSE performance is fundamentally limited by object signal-to-noise ratio (SNR) and correlation between object pixels.
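For orientation, the traditional scalar Wiener filter and the constrained least squares (CLS) estimator named in the abstract are usually written in the Fourier domain as below. These are the textbook stationary forms, not the vector Wiener filter or the automatic regularization developed in the work. The notation is assumed here: G is the spectrum of the observed image, H the optical transfer function, S_f and S_n the object and noise power spectra, P a regularization operator (often a Laplacian), and gamma the regularization constant.

```latex
\hat{F}_{\mathrm{W}}(u) = \frac{H^{*}(u)\, S_{f}(u)}{|H(u)|^{2}\, S_{f}(u) + S_{n}(u)}\; G(u),
\qquad
\hat{F}_{\mathrm{CLS}}(u) = \frac{H^{*}(u)}{|H(u)|^{2} + \gamma\, |P(u)|^{2}}\; G(u)
```

In both expressions the denominator keeps the estimate bounded at spatial frequencies where H(u) is close to zero, which is where plain inverse filtering would amplify noise.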

    Informed source extraction from a mixture of sources exploiting second order temporal structure

    Extracting a specific signal from among man