551 research outputs found

    Multiple and single snapshot compressive beamforming

    For a sound field observed on a sensor array, compressive sensing (CS) reconstructs the direction-of-arrival (DOA) of multiple sources using a sparsity constraint. The DOA estimation is posed as an underdetermined problem by expressing the acoustic pressure at each sensor as a phase-lagged superposition of source amplitudes at all hypothetical DOAs. Regularizing with an $\ell_1$-norm constraint renders the problem solvable with convex optimization, and promoting sparsity gives high-resolution DOA maps. Here, the sparse source distribution is derived using maximum a posteriori (MAP) estimates for both single and multiple snapshots. CS does not require inversion of the data covariance matrix and thus works well even for a single snapshot where it gives higher resolution than conventional beamforming. For multiple snapshots, CS outperforms conventional high-resolution methods, even with coherent arrivals and at low signal-to-noise ratio. The superior resolution of CS is demonstrated with vertical array data from the SWellEx96 experiment for coherent multi-paths. Comment: In press, Journal of Acoustical Society of America
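
The single-snapshot formulation in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform linear array with half-wavelength spacing, the 1-degree angle grid, the noise-free data, the regularization weight `mu`, and the choice of iterative soft thresholding (ISTA) as the convex solver are all assumptions made for illustration.

```python
import numpy as np

def steering_matrix(n_sensors, angles_deg, spacing_wavelengths=0.5):
    """Sensing matrix: column k holds the phase lags of a plane wave from angles_deg[k]."""
    n = np.arange(n_sensors)[:, None]
    u = np.sin(np.deg2rad(np.asarray(angles_deg, dtype=float)))[None, :]
    return np.exp(2j * np.pi * spacing_wavelengths * n * u)

def objective(A, y, x, mu):
    """l1-regularized least-squares cost: 0.5*||y - A x||_2^2 + mu*||x||_1."""
    return 0.5 * np.linalg.norm(y - A @ x) ** 2 + mu * np.sum(np.abs(x))

def ista_l1(A, y, mu, n_iter=500):
    """Minimize the cost above by iterative soft thresholding (assumed solver)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # inverse Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        z = x + step * (A.conj().T @ (y - A @ x))  # gradient step on the data-fit term
        # Complex soft threshold: shrink each magnitude by step*mu, keep the phase.
        scale = np.maximum(1.0 - step * mu / np.maximum(np.abs(z), 1e-12), 0.0)
        x = z * scale
    return x

# Hypothetical scenario: 20 sensors, two sources at -10 and 15 degrees, one snapshot.
grid = np.arange(-90, 91)                 # hypothetical DOAs, 1-degree grid
A = steering_matrix(20, grid)             # 20 x 181, underdetermined
x_true = np.zeros(grid.size, dtype=complex)
x_true[grid == -10] = 1.0
x_true[grid == 15] = 0.8
y = A @ x_true                            # noise-free single snapshot (assumption)

x_hat = ista_l1(A, y, mu=0.5)
peak_deg = grid[np.argmax(np.abs(x_hat))]         # strongest estimated DOA
cbf_power = np.abs(A.conj().T @ y) ** 2 / 20**2   # conventional beamformer, for comparison
```

Plotting `np.abs(x_hat)` against `cbf_power` over `grid` shows the difference in peak width between the sparse estimate and conventional beamforming for this toy case.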

    Sparse Modeling of Grouped Line Spectra

    This licentiate thesis focuses on clustered parametric models for estimation of line spectra, when the spectral content of a signal source is assumed to exhibit some form of grouping. Different from previous parametric approaches, which generally require explicit knowledge of the model orders, this thesis exploits sparse modeling, where the orders are implicitly chosen. For line spectra, the non-linear parametric model is approximated by a linear system, containing an overcomplete basis of candidate frequencies, called a dictionary, and a large set of linear response variables that selects and weights the components in the dictionary. Frequency estimates are obtained by solving a convex optimization program, where the sum of squared residuals is minimized. To discourage overfitting and to infer certain structure in the solution, different convex penalty functions are introduced into the optimization. The cost trade-off between fit and penalty is set by some user parameters, as to approximate the true number of spectral lines in the signal, which implies that the response variable will be sparse, i.e., have few non-zero elements. Thus, instead of explicit model orders, the orders are implicitly set by this trade-off. For grouped variables, the dictionary is customized, and appropriate convex penalties selected, so that the solution becomes group sparse, i.e., has few groups with non-zero variables. In an array of sensors, the specific time-delays and attenuations will depend on the source and sensor positions. By modeling this, one may estimate the location of a source. In this thesis, a novel joint location and grouped frequency estimator is proposed, which exploits sparse modeling for both spectral and spatial estimates, showing robustness against sources with overlapping frequency content. For audio signals, this thesis uses two different features for clustering. 
Pitch is a perceptual property of sound that may be described by the harmonic model, i.e., by a group of spectral lines at integer multiples of a fundamental frequency, which we estimate by exploiting a novel adaptive total variation penalty. The other feature, chroma, is a concept in musical theory, collecting pitches at powers of 2 from each other into groups. Using a chroma dictionary, together with appropriate group sparse penalties, we propose an automatic transcription of the chroma content of a signal
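
The two grouping features named in this abstract can be sketched in a few lines. This is only an illustration of the definitions, not the thesis's estimator: the 12-bin equal-tempered chroma scale and the 440 Hz reference are assumptions, and both function names are hypothetical.

```python
import math

def harmonic_frequencies(f0_hz, n_harmonics):
    """Harmonic model: spectral lines at integer multiples of the fundamental."""
    return [k * f0_hz for k in range(1, n_harmonics + 1)]

def chroma_class(freq_hz, ref_hz=440.0, bins_per_octave=12):
    """Chroma index: frequencies a power of 2 apart fall into the same group.

    ref_hz and bins_per_octave are illustrative assumptions (equal temperament).
    """
    bins_from_ref = round(bins_per_octave * math.log2(freq_hz / ref_hz))
    return bins_from_ref % bins_per_octave

# Octave-related pitches share one chroma group.
octave_groups = [chroma_class(f) for f in (220.0, 440.0, 880.0)]
# The harmonics of one pitch spread over several chroma groups.
harmonic_groups = [chroma_class(f) for f in harmonic_frequencies(110.0, 3)]
```

Under these assumptions the third harmonic of 110 Hz lands in a different chroma group than the fundamental, which is one reason pitch and chroma call for differently structured dictionaries.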

    Pilot-Aided Equalization with a Constrained Noise-Estimation Filter

    In this paper we focus on a single-carrier pilot-assisted transmission scheme where one pilot symbol is periodically inserted in the transmitted sequence on a time-division multiplexing basis. A new equalization scheme, where the knowledge of pilot symbols is exploited by the equalizer to generate an estimate of the noise affecting the symbol to be detected, is introduced and analyzed. The criterion used to compute the equalizer coefficients is the minimization of the mean-square error (MSE). The main new result of our analysis is that the optimal pilot-aided equalizer (PAE) can be decomposed as the cascade of an unconstrained minimum MSE (MMSE) linear equalizer (LE) and a data-aided noise estimation filter. This result completes and extends the noise-predictive view of decision feedback equalization to general data-aided equalization. The PAE is compared here to the MMSE-LE and to the MSE decision feedback equalizer on two frequency-selective wireless channels

    Absolute identification by relative judgment

    In unidimensional absolute identification tasks, participants identify stimuli that vary along a single dimension. Performance is surprisingly poor compared with discrimination of the same stimuli. Existing models assume that identification is achieved using long-term representations of absolute magnitudes. The authors propose an alternative relative judgment model (RJM) in which the elemental perceptual units are representations of the differences between current and previous stimuli. These differences are used, together with the previous feedback, to respond. Without using long-term representations of absolute magnitudes, the RJM accounts for (a) information transmission limits, (b) bowed serial position effects, and (c) sequential effects, where responses are biased toward immediately preceding stimuli but away from more distant stimuli (assimilation and contrast)

    Feature Topography and Sound Intensity Level Encoding in Primary Auditory Cortex

    The primary auditory cortex (A1) in mammals is one of the first areas in the neocortex that receives auditory-related spiking activity from the thalamus. Because the neocortex is implicated in regulating high-level brain phenomena, such as attention and perception, it is important with regard to these high-level behaviors to understand how sounds are represented and transformed by neuronal circuits in this area. The topographic organization of neuronal responses to auditory features in A1 provides evidence for potential mechanisms and functional roles of this neural circuitry. This dissertation presents results from models of topographic organization supporting the notion that if the topographic organization of frequency responses, termed tonotopy or cochleotopy, is aligned along the longest anatomical line segment in A1, as supported by some physiological studies, then it is unlikely that any other topography is mapped monotonically along the orthogonal axis. Thresholds of neuronal responses to sound intensity level represent a particular feature that may have a local, highly periodic topography and that is vital to the sensitivity of the auditory system. The neuronal representation of sound level in A1, particularly as it relates to encoding accuracy, contains a distribution of neurons with varying amounts of inhibition at high sound levels. Neurons with large amounts of this high-level inhibition are described as nonmonotonic or level-tuned. This dissertation presents evidence from single neuron recordings in A1 that neurons exhibiting greater high-level inhibition also exhibit lower neuronal thresholds and that lower thresholds in these nonmonotonic neurons are preserved even when much of the neuronal population is adapted for accurately encoding more intense sounds.
Evidence presented in this dissertation also suggests that nonmonotonic neurons have transient responses to time-varying (dynamic) level stimuli that adapt more quickly in response to low-level sounds than those of monotonic neurons. Together these results imply that under static, steady-state-dynamic and transient-dynamic sound level conditions, nonmonotonic neurons are specialized encoders of less intense sounds that allow the auditory system to maintain sensitivity under a variety of environmental conditions

    An evaluation of the broadband direction finding capabilities of array signal processing techniques

    The objective of this study was to determine and compare the direction finding capabilities of high resolution spectral analysis techniques applied to the signals from an antenna array. The maintenance of acceptable resolution over a broad operating frequency range was of particular concern. The comparison was accomplished by computer simulation of the performance of a linear array of eleven isotropic elements, spaced 15 cm apart, over the frequency range from 100 MHz to 1.0 GHz. The two-signal resolution of three linear prediction based algorithms was compared. The variation in performance with signal-to-noise ratio, frequency, and center angle of arrival was also evaluated. An algorithm due to Tufts and Kumaresan, which reduces the effects of noise by replacing the noisy signal correlation matrix by a smoothed, least-squares fit to it, gave the best performance at the cost of the highest computational complexity. A special case of this method which is easy to compute exhibited blind angles, where performance was severely degraded in spite of wide spacing of the sources. The ratio of the physical length of the array to the length of the modulation envelope set up by the interference of the two incoming signals was found to be a constant at the point of resolution. This led to an expression for the two-signal resolution as a function of look angle, array length, frequency, and this algorithm dependent constant
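
To see why resolution over the band is a concern, it helps to express the stated geometry (eleven elements, 15 cm spacing, 100 MHz to 1.0 GHz) in wavelengths. The short calculation below is an illustrative sketch that assumes free-space propagation at the speed of light; the function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, free-space propagation is an assumption

def array_geometry_in_wavelengths(freq_hz, n_elements=11, spacing_m=0.15):
    """Return (element spacing, total aperture), both in wavelengths."""
    wavelength_m = SPEED_OF_LIGHT / freq_hz
    aperture_m = (n_elements - 1) * spacing_m  # 10 gaps of 15 cm = 1.5 m
    return spacing_m / wavelength_m, aperture_m / wavelength_m

low_spacing, low_aperture = array_geometry_in_wavelengths(100e6)
high_spacing, high_aperture = array_geometry_in_wavelengths(1.0e9)
print(f"100 MHz: spacing {low_spacing:.3f}, aperture {low_aperture:.2f} wavelengths")
print(f"1.0 GHz: spacing {high_spacing:.3f}, aperture {high_aperture:.2f} wavelengths")
```

The aperture grows from roughly half a wavelength at 100 MHz to roughly five wavelengths at 1.0 GHz, so the same physical array is electrically ten times shorter at the bottom of the band.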

    Improving Pure-Tone Audiometry Using Probabilistic Machine Learning Classification

    Hearing loss is a critical public health concern, affecting hundreds of millions of people worldwide and dramatically impacting quality of life for affected individuals. While treatment techniques have evolved in recent years, methods for assessing hearing ability have remained relatively unchanged for decades. The standard clinical procedure is the modified Hughson-Westlake procedure, an adaptive pure-tone detection task that is typically performed manually by audiologists, costing millions of collective hours annually among healthcare professionals. In addition to the high burden of labor, the technique provides limited detail about an individual's hearing ability, estimating only detection thresholds at a handful of pre-defined pure-tone frequencies (a threshold audiogram). An efficient technique that produces a detailed estimate of the audiometric function, including threshold and spread, could allow for better characterization of particular hearing pathologies and provide more diagnostic value. Parametric techniques exist to efficiently estimate multidimensional psychometric functions, but are ill-suited for estimation of audiometric functions because these functions cannot be easily parameterized. The Gaussian process is a compelling machine learning technique for inference of nonparametric multidimensional functions using binary data. The work described in this thesis utilizes Gaussian process classification to build an automated framework for efficient, high-resolution estimation of the full audiometric function, which we call the machine learning audiogram (MLAG). This Bayesian technique iteratively computes a posterior distribution describing its current belief about detection probability given the current set of observed pure tones and detection responses.
The posterior distribution can be used to provide a current point estimate of the psychometric function as well as to select an informative query point for the next stimulus to be provided to the listener. The Gaussian process covariance function encodes correlations between variables, reflecting prior beliefs on the system; MLAG uses a composite linear/squared exponential covariance function that enforces monotonicity with respect to intensity but only smoothness with respect to frequency for the audiometric function. This framework was initially evaluated in human subjects for threshold audiogram estimation. Two repetitions of MLAG and one repetition of manual clinical audiometry were conducted in each of 21 participants. Results indicated that MLAG both agreed with clinical estimates and exhibited test-retest reliability to within accepted clinical standards, but with significantly fewer tone deliveries required compared to clinical methods while also providing an effectively continuous threshold estimate along frequency. This framework's ability to evaluate full psychometric functions was then evaluated using simulated experiments. As a feasibility check, performance for estimating unidimensional psychometric functions was assessed and directly compared to inference using standard maximum-likelihood probit regression; results indicated that the two methods exhibited near identical performance for estimating threshold and spread. MLAG was then used to estimate 2-dimensional audiometric functions constructed using existing audiogram phenotypes.
Results showed that this framework could estimate both threshold and spread of the full audiometric function with high accuracy and reliability given a sufficient sample count; non-active sampling using the Halton set required between 50 and 100 queries to reach clinical reliability, while active sampling strategies reduced the required number to around 20-30, with Bayesian active learning by disagreement exhibiting the best performance of the tested methods. Overall, MLAG's accuracy, reliability, and high degree of detail make it a promising method for estimation of threshold audiograms and audiometric functions, and the framework's flexibility enables it to be easily extended to other psychophysical domains
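
The unidimensional baseline mentioned in this abstract, a probit psychometric function described by a threshold and a spread, can be sketched as follows. This is an illustrative reading and not the thesis's code: treating the spread as the standard deviation of the underlying normal CDF, the dB units, and the function names are all assumptions.

```python
import math

def detection_probability(level_db, threshold_db, spread_db):
    """Probit psychometric function: normal CDF centered on the threshold.

    Assumption: 'spread' is taken as the standard deviation of the normal CDF.
    """
    z = (level_db - threshold_db) / spread_db
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def negative_log_likelihood(levels_db, responses, threshold_db, spread_db):
    """Bernoulli negative log-likelihood of binary detection responses."""
    eps = 1e-12  # keeps log() finite when a probability rounds to 0 or 1
    total = 0.0
    for level, heard in zip(levels_db, responses):
        p = detection_probability(level, threshold_db, spread_db)
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if heard else math.log(1.0 - p)
    return total

# Hypothetical responses: tones at 10 and 20 dB missed, tones at 40 and 50 dB heard.
levels = [10.0, 20.0, 40.0, 50.0]
heard = [0, 0, 1, 1]
nll_plausible = negative_log_likelihood(levels, heard, threshold_db=30.0, spread_db=5.0)
nll_implausible = negative_log_likelihood(levels, heard, threshold_db=60.0, spread_db=5.0)
```

Maximum-likelihood probit regression amounts to searching for the threshold and spread that minimize this quantity; the Gaussian process approach described in the abstract replaces the fixed parametric form with a nonparametric function over frequency and intensity.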

    Shape and Topology Constrained Image Segmentation with Stochastic Models

    The central theme of this thesis has been to develop robust algorithms for the task of image segmentation. All segmentation techniques that have been proposed in this thesis are based on the sound modeling of the image formation process. This approach to image partition enables the derivation of objective functions, which make all modeling assumptions explicit. Based on the Parametric Distributional Clustering (PDC) technique, improved variants have been derived, which explicitly incorporate topological assumptions in the corresponding cost functions. In this thesis, the questions of robustness and generalizability of segmentation solutions have been addressed in an empirical manner, giving comprehensive example sets for both problems. It has been shown that the PDC framework is indeed capable of producing highly robust image partitions. In the context of PDC-based segmentation, a probabilistic representation of shape has been constructed. Furthermore, likelihood maps for given objects of interest were derived from the PDC cost function. Interpreting the shape information as a prior for the segmentation task, it has been combined with the likelihoods in a Bayesian setting. The resulting posterior probability for the occurrence of an object of a specified semantic category has been demonstrated to achieve excellent segmentation quality on very hard testbeds of images from the Corel gallery