A Generative Product-of-Filters Model of Audio
We propose the product-of-filters (PoF) model, a generative model that
decomposes audio spectra as sparse linear combinations of "filters" in the
log-spectral domain. PoF makes similar assumptions to those used in the classic
homomorphic filtering approach to signal processing, but replaces hand-designed
decompositions built of basic signal processing operations with a learned
decomposition based on statistical inference. This paper formulates the PoF
model and derives a mean-field method for posterior inference and a variational
EM algorithm to estimate the model's free parameters. We demonstrate PoF's
potential for audio processing on a bandwidth expansion task, and show that PoF
can serve as an effective unsupervised feature extractor for a speaker
identification task. Comment: ICLR 2014 conference-track submission. Added link to the source cod
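A minimal sketch of the idea behind the name, not the paper's inference or EM procedure: if a log-spectrum is a linear combination of filters, then the spectrum itself is a product of per-filter gains. The filter matrix `U`, the activation vector `a`, and both function names are hypothetical, and the sketch omits the noise model and priors that a full generative model would need.

```python
import numpy as np


def pof_log_spectrum(filters, activations):
    """Linear combination of filters in the log-spectral domain.

    filters: (n_bins, n_filters) array, one filter per column (assumed layout).
    activations: (n_filters,) array of mostly-zero weights.
    """
    return filters @ activations


def pof_spectrum(filters, activations):
    """Map the log-spectrum back to a strictly positive spectrum."""
    return np.exp(pof_log_spectrum(filters, activations))


rng = np.random.default_rng(0)
n_bins, n_filters = 8, 4
U = rng.normal(size=(n_bins, n_filters))  # hypothetical filter dictionary
a = np.array([0.0, 1.5, 0.0, 0.7])        # sparse activations (illustrative)

spectrum = pof_spectrum(U, a)

# The same spectrum written as a product of per-filter gains exp(U[:, l] * a[l]).
product = np.prod(np.exp(U * a), axis=1)
```

Inactive filters (zero activation) contribute a gain of exactly 1, which is why sparsity in the log domain corresponds to only a few filters shaping the spectrum.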
Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements
Background subtraction has been a fundamental and widely studied task in
video analysis, with a wide range of applications in video surveillance,
teleconferencing and 3D modeling. Recently, motivated by compressive imaging,
background subtraction from compressive measurements (BSCM) is becoming an
active research task in video surveillance. In this paper, we propose a novel
tensor-based robust PCA (TenRPCA) approach for BSCM by decomposing video frames
into backgrounds with spatio-temporal correlations and foregrounds with
spatio-temporal continuity in a tensor framework. In this approach, we use 3D
total variation (TV) to enhance the spatio-temporal continuity of foregrounds,
and Tucker decomposition to model the spatio-temporal correlations of video
background. Based on this idea, we design a basic tensor RPCA model over the
video frames, dubbed as the holistic TenRPCA model (H-TenRPCA). To characterize
the correlations among the groups of similar 3D patches of video background, we
further design a patch-group-based tensor RPCA model (PG-TenRPCA) by joint
tensor Tucker decompositions of 3D patch groups for modeling the video
background. Efficient algorithms using alternating direction method of
multipliers (ADMM) are developed to solve the proposed models. Extensive
experiments on simulated and real-world videos demonstrate the superiority of
the proposed approaches over the existing state-of-the-art approaches. Comment: To appear in IEEE TI
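To illustrate why 3D total variation favors spatio-temporally continuous foregrounds, here is a minimal sketch of an anisotropic 3D TV penalty on a video tensor. The anisotropic form (sum of absolute forward differences), the (height, width, time) axis order, and the function name `tv3d` are assumptions made for this sketch; the abstract does not say which TV variant the models use.

```python
import numpy as np


def tv3d(video):
    """Anisotropic 3D total variation of a (height, width, time) tensor.

    Sums the absolute forward differences along both spatial axes and the
    temporal axis. No boundary padding is applied.
    """
    v = np.asarray(video, dtype=float)
    return float(sum(np.abs(np.diff(v, axis=ax)).sum() for ax in range(3)))


# A toy foreground: a 2x2 block that stays in place for 3 frames.
foreground = np.zeros((4, 4, 3))
foreground[1:3, 1:3, :] = 1.0

# The same number of active pixels, scattered so none are adjacent.
scattered = np.zeros((4, 4, 3))
scattered[0, 0, 0] = scattered[2, 2, 0] = scattered[0, 2, 1] = 1.0
scattered[2, 0, 1] = scattered[1, 1, 2] = scattered[3, 3, 2] = 1.0
scattered[0, 3, 0] = scattered[3, 0, 0] = scattered[1, 3, 1] = 1.0
scattered[3, 1, 1] = scattered[1, 0, 2] = scattered[2, 3, 2] = 1.0

tv_block = tv3d(foreground)
tv_scattered = tv3d(scattered)
```

The compact, persistent block only pays a penalty along its spatial boundary, while the scattered pixels pay at almost every neighbor, so minimizing this term pushes a foreground estimate toward contiguous moving regions.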
Speech enhancement with frequency domain auto-regressive modeling
Speech applications in far-field real-world settings often deal with signals
that are corrupted by reverberation. The task of dereverberation constitutes an
important step to improve the audible quality and to reduce the error rates in
applications like automatic speech recognition (ASR). We propose a unified
framework of speech dereverberation for improving the speech quality and the
ASR performance using the approach of envelope-carrier decomposition provided
by an autoregressive (AR) model. The AR model is applied in the frequency
domain of the sub-band speech signals to separate the envelope and carrier
parts. A novel neural architecture based on dual-path long short-term memory
(DPLSTM) model is proposed, which jointly enhances the sub-band envelope and
carrier components. The dereverberated envelope-carrier signals are modulated
and the sub-band signals are synthesized to reconstruct the audio signal.
The DPLSTM model for dereverberation of envelope and carrier components also
allows the joint learning of the network weights for the downstream ASR task.
In the ASR tasks on the REVERB challenge dataset as well as on the VOiCES
dataset, we illustrate that the joint learning of speech dereverberation
network and the E2E ASR model yields significant performance improvements over
the baseline ASR system trained on log-mel spectrogram as well as other
benchmarks for dereverberation (average relative improvements of 10-24% over
the baseline system). The speech quality improvements, evaluated using
subjective listening tests, further highlight the improved quality of the
reconstructed audio. Comment: 10 page
A novel method for subjective picture quality assessment and further studies of HDTV formats
This is the author's accepted manuscript. Copyright © IEEE 2008. This paper proposes a novel method for the assessment of picture quality, called the triple stimulus continuous evaluation scale (TSCES), to allow the direct comparison of different HDTV formats. The method uses an upper picture quality anchor and a lower picture quality anchor with defined impairments. The HDTV format under test is evaluated in a subjective comparison with the upper and lower anchors. The method utilizes three displays in a particular vertical arrangement. In an initial series of tests with the novel method, the HDTV formats 1080p/50, 1080i/25, and 720p/50 were compared at various bit rates and with seven different content types on three identical 1920 × 1080 pixel displays. It was found that the new method provided stable and consistent results. The method was tested with 1080p/50, 1080i/25, and 720p/50 HDTV images that had been coded with H.264/AVC High profile. The result of the assessment was that the assessors preferred the progressive HDTV formats over the interlaced HDTV format. A system chain proposal is given for future media production and delivery to take advantage of this outcome. Recommendations for future research conclude the paper.
Data compression techniques applied to high resolution high frame rate video technology
An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of methods of video data compression, described in the open literature, was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and an assessment of image degradation and video data parameters. An assessment is made of present and near-term future technology for implementation of video data compression in high-speed imaging systems. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementation of video data compression, are presented. Case studies of three microgravity experiments are presented, and specific compression techniques and implementations are recommended.