1,365 research outputs found

    Underdetermined source separation using a sparse STFT framework and weighted laplacian directional modelling

    The instantaneous underdetermined audio source separation problem for a K-sensor, L-source mixing scenario (where K < L) has been addressed by many different approaches, provided the sources remain quite distinct in the virtual positioning space spanned by the sensors. This problem can be tackled as a directional clustering problem along the source position angles in the mixture. The use of Generalised Directional Laplacian Densities (DLD) in the MDCT domain for underdetermined source separation has been proposed before. Here, we derive weighted mixtures of DLDs in a sparser representation of the data in the STFT domain to perform separation. The proposed approach yields improved results compared to our previous offering and compares favourably with the state-of-the-art. Comment: EUSIPCO 2016, Budapest, Hungary
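The directional clustering idea in this abstract can be illustrated with a minimal sketch. It is not the weighted DLD mixture the authors propose: it swaps in a hard nearest-angle assignment, and it assumes two sensors, real-valued transform coefficients, already-known source angles, and at most one active source per coefficient. The names `mix`, `separate`, and `source_angles` are hypothetical.

```python
import math

def direction(x1, x2):
    """Angle of a two-sensor coefficient, folded to [0, pi) because sign does not change direction."""
    return math.atan2(x2, x1) % math.pi

def angular_distance(a, b):
    """Distance between two directions on the half-circle, with wrap-around at pi."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def mix(sources, source_angles):
    """Instantaneous mixing: each source contributes along its unit direction vector."""
    n = len(sources[0])
    coeffs = []
    for i in range(n):
        x1 = sum(s[i] * math.cos(a) for s, a in zip(sources, source_angles))
        x2 = sum(s[i] * math.sin(a) for s, a in zip(sources, source_angles))
        coeffs.append((x1, x2))
    return coeffs

def separate(coeffs, source_angles):
    """Assign each coefficient to the closest source direction (hard clustering)."""
    estimates = [[0.0] * len(coeffs) for _ in source_angles]
    for i, (x1, x2) in enumerate(coeffs):
        theta = direction(x1, x2)
        k = min(range(len(source_angles)),
                key=lambda j: angular_distance(theta, source_angles[j]))
        # Project the sensor pair onto the chosen direction to estimate the source value.
        estimates[k][i] = x1 * math.cos(source_angles[k]) + x2 * math.sin(source_angles[k])
    return estimates

# Three sources observed by two sensors (L = 3 > K = 2), disjoint in the sparse domain.
angles = [0.2, 0.8, 1.4]
sources = [[1.0, 0.0, 0.0, -2.0],
           [0.0, 0.5, 0.0, 0.0],
           [0.0, 0.0, -3.0, 0.0]]
estimates = separate(mix(sources, angles), angles)
```

When the sources overlap in the transform domain, a hard assignment like this introduces errors, which is one motivation for soft, probabilistic directional models.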

    Audio Source Separation Using Sparse Representations

    This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley. Audio Source Separation Using Sparse Representations. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 10, pp. 246-264. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010. The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used based on orthogonal basis functions that are learned from the observed data, instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics, by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research

    Networked Computing in Wireless Sensor Networks for Structural Health Monitoring

    This paper studies the problem of distributed computation over a network of wireless sensors. While this problem applies to many emerging applications, to keep our discussion concrete we will focus on sensor networks used for structural health monitoring. Within this context, the heaviest computation is to determine the singular value decomposition (SVD) to extract mode shapes (eigenvectors) of a structure. Compared to collecting raw vibration data and performing SVD at a central location, computing SVD within the network can result in significantly lower energy consumption and delay. Using recent results on decomposing SVD, a well-known centralized operation, into components, we seek to determine a near-optimal communication structure that enables the distribution of this computation and the reassembly of the final results, with the objective of minimizing energy consumption subject to a computational delay constraint. We show that this reduces to a generalized clustering problem; a cluster forms a unit on which a component of the overall computation is performed. We establish that this problem is NP-hard. By relaxing the delay constraint, we derive a lower bound to this problem. We then propose an integer linear program (ILP) to solve the constrained problem exactly as well as an approximate algorithm with a proven approximation ratio. We further present a distributed version of the approximate algorithm. We present both simulation and experimentation results to demonstrate the effectiveness of these algorithms

    Application of Fractal Dimension for Quantifying Noise Texture in Computed Tomography Images

    Purpose: Evaluation of noise texture information in CT images is important for assessing image quality. Noise texture is often quantified by the noise power spectrum (NPS), which requires numerous image realizations to estimate. This study evaluated fractal dimension for quantifying noise texture as a scalar metric that can potentially be estimated using one image realization. Methods: The American College of Radiology CT accreditation phantom (ACR) was scanned on a clinical scanner (Discovery CT750, GE Healthcare) at 120 kV and 25 and 90 mAs. Images were reconstructed using filtered back projection (FBP/ASIR 0%) with varying reconstruction kernels: Soft, Standard, Detail, Chest, Lung, Bone, and Edge. For each kernel, images were also reconstructed using ASIR 50% and ASIR 100% iterative reconstruction (IR) methods. Fractal dimension was estimated using the differential box‐counting algorithm applied to images of the uniform section of the ACR phantom. The two‐dimensional noise power spectrum (NPS) and the one‐dimensional radially averaged NPS were estimated using established techniques. By changing the radiation dose, the effect of noise magnitude on fractal dimension was evaluated. The Spearman correlation between the fractal dimension and the frequency of the NPS peak was calculated. The number of images required to reliably estimate fractal dimension was determined and compared to the number of images required to estimate the NPS‐peak frequency. The effect of region of interest (ROI) size on fractal dimension estimation was evaluated. Feasibility of estimating fractal dimension in an anthropomorphic phantom and a clinical image was also investigated, with the resulting fractal dimension compared to that estimated within the uniform section of the ACR phantom. Results: Fractal dimension was strongly correlated with the frequency of the peak of the radially averaged NPS curve, having a Spearman rank‐order coefficient of 0.98 (P‐value < 0.01) for ASIR 0%. The mean fractal dimension at ASIR 0% was 2.49 (Soft), 2.51 (Standard), 2.52 (Detail), 2.57 (Chest), 2.61 (Lung), 2.66 (Bone), and 2.7 (Edge). A reduction in fractal dimension was observed with increasing ASIR levels for all investigated reconstruction kernels. Fractal dimension was found to be independent of noise magnitude. Fractal dimension was successfully estimated from four ROIs of size 64 × 64 pixels or one ROI of 128 × 128 pixels. Fractal dimension was found to be sensitive to non‐noise structures in the image, such as ring artifacts and anatomical structure. Fractal dimension estimated within a uniform region of an anthropomorphic phantom and a clinical head image matched that estimated within the ACR phantom for filtered back projection reconstruction. Conclusions: Fractal dimension correlated with the NPS‐peak frequency and was independent of noise magnitude, suggesting that the scalar metric of fractal dimension can be used to quantify the change in noise texture across reconstruction approaches. Results demonstrated that fractal dimension can be estimated from four 64 × 64‐pixel ROIs or one 128 × 128 ROI within a head CT image, which may make it amenable to quantifying noise texture within clinical images
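For readers unfamiliar with differential box counting, the following is a minimal sketch of one common formulation, not necessarily the exact variant used in the study. It assumes a square grey-level image whose side is divisible by every grid size, 256 grey levels, and a least-squares fit of log box count against log inverse scale. The function names and the default `sizes` are hypothetical choices.

```python
import math
import random

def box_count(image, s, gray_levels=256):
    """Total number of boxes of side s needed to cover the intensity surface."""
    m = len(image)
    h = s * gray_levels / m  # box height in grey levels, scaled with the grid size
    total = 0
    for r0 in range(0, m, s):
        for c0 in range(0, m, s):
            block = [image[r][c] for r in range(r0, r0 + s) for c in range(c0, c0 + s)]
            # Boxes spanned between the lowest and highest grey level in this block.
            total += math.floor(max(block) / h) - math.floor(min(block) / h) + 1
    return total

def fractal_dimension(image, sizes=(2, 4, 8, 16, 32)):
    """Slope of log(box count) versus log(1/scale), fitted by least squares."""
    m = len(image)
    xs = [math.log(m / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# A perfectly flat 64 x 64 image behaves like a plane; white noise is rougher.
flat = [[100] * 64 for _ in range(64)]
rng = random.Random(0)
noise = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
fd_flat = fractal_dimension(flat)
fd_noise = fractal_dimension(noise)
```

A flat region yields a dimension of 2 under this estimator, and rougher noise texture pushes the estimate above 2, which is the property the abstract exploits as a scalar texture metric.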

    Audio Inpainting

    (c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published version: IEEE Transactions on Audio, Speech and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1109/TASL.2011.2168211

    Frequency Domain Methods for Coding the Linear Predictive Residual of Speech Signals

    The most frequently used speech coding paradigm is ACELP, known for encoding speech with high quality while consuming little bandwidth. ACELP performs linear prediction filtering in order to eliminate the effect of the spectral envelope from the signal. The noise-like excitation is then encoded using algebraic codebooks. The search of this codebook, however, cannot be performed optimally with conventional encoders due to the correlation between their samples. Because of this, more complex algorithms are required in order to maintain the quality. Four different transformation algorithms have been implemented (DCT, DFT, eigenvalue decomposition and Vandermonde decomposition) in order to decorrelate the samples of the innovative excitation in ACELP. These transformations have been integrated in the ACELP of the EVS codec. The transformed innovative excitation is coded using the envelope-based arithmetic coder. Objective and subjective tests have been carried out to evaluate the quality of the encoding, the degree of decorrelation achieved by the transformations and the computational complexity of the algorithms
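Of the four transforms named in this abstract, the DCT is the simplest to illustrate. The sketch below is a plain orthonormal DCT-II and its inverse applied to a short vector standing in for an excitation frame. It is not code from the EVS codec, and the function names and the sample `frame` are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k in range(n)))
    return out

# A strongly correlated (constant) frame collapses into a single coefficient.
constant_coeffs = dct([1.0, 1.0, 1.0, 1.0])

# An arbitrary frame survives the round trip, and its energy is preserved.
frame = [0.3, -1.2, 0.8, 0.1, -0.5, 0.9, -0.2, 0.4]
coeffs = dct(frame)
restored = idct(coeffs)
```

Because the transform is orthonormal, quantisation error introduced on the coefficients carries over to the time domain with the same energy, which is one reason orthogonal transforms are convenient in coding.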