58 research outputs found

    Spectral Analysis for Signal Detection and Classification: Reducing Variance and Extracting Features

    Spectral analysis encompasses several powerful signal processing methods. The papers in this thesis present methods for finding good spectral representations, considering both stationary and non-stationary signals. Stationary methods can be used for real-time evaluation, analysing shorter segments of an incoming signal, while non-stationary methods can be used to analyse the instantaneous frequencies of fully recorded signals. All the presented methods aim to produce spectral representations that have high resolution and are easy to interpret. Such representations allow for detection of individual signal components in multi-component signals, as well as separation of close signal components. This makes feature extraction in the spectral representation possible; relevant features include the frequency or instantaneous frequency of components, the number of components in the signal, and the time duration of the components. Two methods that extract some of these features automatically for two types of signals are presented in this thesis. One is adapted to signals with two longer-duration frequency-modulated components and detects the instantaneous frequencies and cross-terms in the Wigner-Ville distribution; the other is designed for signals with an unknown number of short-duration oscillations and detects the instantaneous frequencies in a reassigned spectrogram. This thesis also presents two multitaper methods that reduce the influence of noise on the spectral representations. One is designed for stationary signals and the other for non-stationary signals with multiple short-duration oscillations. Applications for the methods presented in this thesis include several within medicine, e.g. diagnosis from analysis of heart rate variability, improved ultrasound resolution, and interpretation of brain activity from the electroencephalogram.
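
    As a concrete illustration of the variance-reduction idea, here is a minimal sketch of a classical multitaper spectrum estimate using DPSS (Slepian) tapers; it shows the general principle only, not the thesis's specific stationary or non-stationary estimators, and all parameter values are illustrative:

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, nw=4.0, n_tapers=7):
    """Average periodograms over DPSS (Slepian) tapers to reduce variance."""
    n = len(x)
    tapers = dpss(n, nw, n_tapers)              # shape (n_tapers, n)
    # Taper the signal, FFT each copy, and average the squared magnitudes.
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    psd = spectra.mean(axis=0) / fs
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

# Example: two close sinusoids in white noise.
fs = 1000.0
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 110 * t)
x += 0.5 * np.random.randn(t.size)
freqs, psd = multitaper_psd(x, fs)
```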

    Scaled reassigned spectrograms applied to linear transducer signals

    This study evaluates the applicability of scaled reassigned spectrograms (ReSTS) to ultrasound radio frequency data obtained with a clinical linear array ultrasound transducer. The ability of the ReSTS to resolve axially closely spaced objects in a phantom as individual reflectors is compared to that of the classical cross-correlation method, using ultrasound pulses of different lengths. The results show that the axial resolution achieved with the ReSTS was superior to the cross-correlation method when the reflected pulses from two objects overlap. A novel B-mode imaging method, facilitating higher image resolution for distinct reflectors, is proposed.
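
    For context, the classical baseline the study compares against can be sketched as matched filtering of the RF line with the transmitted pulse. This is a generic illustration under simplified assumptions (the peak-picking rule and threshold are ours), not the ReSTS method itself:

```python
import numpy as np

def echo_delays(rf_line, pulse, fs, threshold=0.5):
    """Classical cross-correlation (matched filter) estimate of reflector depths.

    Correlate the received RF line with the transmitted pulse and pick
    local maxima above a fraction of the global maximum; overlapping echoes
    from close reflectors merge into one broad peak, which is what limits
    the axial resolution of this baseline.
    """
    xc = np.correlate(rf_line, pulse, mode="same")
    env = np.abs(xc)
    peaks = (env[1:-1] > env[:-2]) & (env[1:-1] > env[2:]) \
            & (env[1:-1] > threshold * env.max())
    sample_idx = np.flatnonzero(peaks) + 1
    return sample_idx / fs          # echo arrival times in seconds
```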

    Sparse Semi-Parametric Estimation of Harmonic Chirp Signals

    In this work, we present a method for estimating the parameters detailing an unknown number of linear, possibly harmonically related, chirp signals, using an iterative sparse reconstruction framework. The proposed method is initiated by a re-weighted group-sparsity approach, followed by an iterative relaxation-based refining step, to allow for high resolution estimates. Numerical simulations illustrate the achievable performance, offering a notable improvement as compared to other recent approaches. The resulting estimates are found to be statistically efficient, achieving the corresponding Cramér-Rao lower bound.
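
    For reference, a common way to write the harmonic linear-chirp signal model that such estimators address is the following (the notation is ours and may differ from the paper's):

```latex
% Harmonic linear-chirp model with K harmonics in noise w(t): each
% harmonic k shares the fundamental's start frequency f_0 and chirp
% rate beta, with its own complex amplitude alpha_k.
x(t) = \sum_{k=1}^{K} \alpha_k \, e^{\, i 2\pi k \left( f_0 t + \frac{1}{2}\beta t^2 \right)} + w(t)
```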

    Time-frequency component analyzer

    In this thesis, a new algorithm, the time-frequency component analyzer (TFCA), is proposed to analyze composite signals whose components have compact time-frequency supports. Examples of this type of signal include biological, acoustic, seismic, speech, radar and sonar signals. By conducting its time-frequency analysis in an adaptively chosen warped fractional domain, the method provides time-frequency distributions which are as sharp as the Wigner distribution, while suppressing the undesirable interference terms present in the Wigner distribution. Being almost fully automated, TFCA does not require any a priori information on the analyzed signal. By making use of the recently developed fast Wigner slice computation algorithm, the directionally smoothed Wigner distribution algorithm and the fractional domain incision algorithm in the warped fractional domain, the method provides an overall time-frequency representation of the composite signal. It also provides time-frequency representations corresponding to the individual signal components constituting the composite signal. Since TFCA-based analysis enables the extraction of the identified components from the composite signal, it allows detailed post-processing of the extracted signal components and their corresponding time-frequency distributions as well.
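
    A minimal numpy sketch of the plain discrete Wigner-Ville distribution may help make the cross-term problem concrete; TFCA's warped-domain smoothing and incision steps are not reproduced here:

```python
import numpy as np
from scipy.signal import hilbert

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a real signal.

    The analytic signal halves interference from negative frequencies,
    but cross-terms between distinct components remain -- these are
    the artifacts that methods like TFCA aim to suppress.
    """
    z = hilbert(x)
    n = len(z)
    wvd = np.zeros((n, n))
    for t in range(n):
        tau_max = min(t, n - 1 - t)
        tau = np.arange(-tau_max, tau_max + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[tau % n] = z[t + tau] * np.conj(z[t - tau])
        wvd[:, t] = np.real(np.fft.fft(kernel))
    return wvd    # rows: frequency bins, columns: time samples
```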

    Combining time-frequency signal analysis and machine learning, with an example in the detection of gravitational waves

    This paper presents a method for classifying noisy, non-stationary signals in the time-frequency domain using artificial intelligence. The preprocessed time-series signals are transformed into time-frequency representations (TFRs) from Cohen's class, resulting in TFR images, which are used as input to the machine learning algorithms. We have used three state-of-the-art deep-learning 2D convolutional neural network (CNN) architectures (ResNet-101, Xception, and EfficientNet). The method was demonstrated on the challenging task of detecting gravitational-wave (GW) signals in intensive real-life, non-stationary, non-Gaussian, and non-white noise. The results show excellent classification performance of the proposed approach in terms of classification accuracy, area under the receiver operating characteristic curve (ROC AUC), recall, precision, F1 score, and area under the precision-recall curve (PR AUC). The novel method outperforms the baseline machine learning model trained on the time-series data in terms of all considered metrics. The study indicates that the proposed technique can also be extended to various other applications dealing with non-stationary data in intensive noise.
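
    A hedged sketch of the image-classification stage might look as follows, assuming the Cohen's-class TFR has already been rendered as a 3-channel image tensor; the model choice matches one of the named architectures, but the sizes and training setup are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

# Binary GW-vs-noise classifier head on ResNet-101. The TFR image is
# assumed to be precomputed upstream (a Cohen's-class distribution
# rendered as a 3-channel image); all hyperparameters are illustrative.
model = models.resnet101(weights=None)            # or pretrained weights
model.fc = nn.Linear(model.fc.in_features, 2)     # {noise, GW event}

tfr_batch = torch.randn(8, 3, 224, 224)           # stand-in for TFR images
logits = model(tfr_batch)
probs = torch.softmax(logits, dim=1)              # class probabilities
```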

    Application of sound source separation methods to advanced spatial audio systems

    This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result, its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast and unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues, enabling separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented. Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
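
    A toy version of time-frequency masking driven by a stereo panning cue, in the spirit of (but much simpler than) the thesis's multi-level thresholding segmentation, could be sketched as follows; the threshold edges and the three-source split are our illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft, istft

def pan_based_separation(left, right, fs, edges=(0.45, 0.55)):
    """Separate a stereo mix by thresholding a panning cue per TF bin.

    Each time-frequency bin is assigned to one of three sources
    (panned left, centre, panned right) according to the interchannel
    level ratio, assuming sparse, pan-mixed sources.
    """
    f, t, L = stft(left, fs)
    _, _, R = stft(right, fs)
    ratio = np.abs(L) / (np.abs(L) + np.abs(R) + 1e-12)    # 1.0 = hard left
    masks = [ratio > edges[1],                             # panned left
             (ratio >= edges[0]) & (ratio <= edges[1]),    # centre
             ratio < edges[0]]                             # panned right
    mix = L + R
    return [istft(mix * m, fs)[1] for m in masks]
```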

    Guided Lamb Wave Based 2-D Spiral Phased Array for Structural Health Monitoring of Thin Panel Structures

    In almost all industries in the mechanical, aerospace, and civil engineering fields, structural health monitoring (SHM) technology is essential for providing reliable information on the structural integrity of safety-critical structures, which can help reduce the risk of unexpected and sometimes catastrophic failures, and also offer cost-effective inspection and maintenance of the structures. State-of-the-art SHM research on structural damage diagnosis is focused on developing global and real-time technologies to identify the existence, location, extent, and type of damage. In order to detect and monitor structural damage in plate-like structures, SHM technology based on guided Lamb wave (GLW) interrogation is becoming more attractive due to its potential benefits, such as coverage of a large inspection area in a short time, a simple inspection mechanism, and sensitivity to small damage. However, the GLW method has a few critical issues, such as its dispersive nature, mode conversion and separation, and the existence of multiple modes. The phased array technique, widely used in civil, military, scientific, and medical fields, may be employed to resolve the drawbacks of the GLW method. The GLW-based phased array approach is able to effectively examine and analyze complicated structural vibration responses in thin plate structures. Because the phased sensor array operates as a spatial filter for the GLW signals, the array signal processing method can enhance a desired signal component at a specific direction while eliminating signal components from other directions. This dissertation presents the development, experimental validation, and damage detection applications of an innovative signal processing algorithm based on a two-dimensional (2-D) spiral phased array in conjunction with the GLW interrogation technique. It starts with general background on SHM and the associated technology, including the GLW interrogation method. It then focuses on the fundamentals of the GLW-based phased array approach and the development of an innovative signal processing algorithm associated with the 2-D spiral phased sensor array. The SHM approach based on array responses determined by the proposed phased array algorithm implementation is addressed. The experimental validation of the GLW-based 2-D spiral phased array technology and the associated damage detection applications to thin isotropic plate and anisotropic composite plate structures are presented.
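
    The spatial-filtering idea can be illustrated with a generic delay-and-sum beamformer. This sketch assumes a single non-dispersive wave speed, which real Lamb modes violate, and it is not the dissertation's spiral-array algorithm:

```python
import numpy as np

def delay_and_sum(signals, sensor_xy, fs, angle_deg, c=5100.0):
    """Steer a 2-D sensor array toward a direction by delay-and-sum.

    signals:   (n_sensors, n_samples) GLW time traces
    sensor_xy: (n_sensors, 2) sensor coordinates in metres
    c:         assumed constant wave speed in m/s -- a simplification,
               since real Lamb modes are dispersive.
    """
    theta = np.deg2rad(angle_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    delays = sensor_xy @ direction / c          # seconds, per sensor
    shifts = np.round(delays * fs).astype(int)
    shifts -= shifts.min()                      # keep shifts non-negative
    out = np.zeros(signals.shape[1])
    for trace, s in zip(signals, shifts):
        # np.roll wraps samples around -- fine for a sketch only.
        out += np.roll(trace, -s)               # align wavefronts, then sum
    return out / len(signals)
```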

    The Hilbert-Huang Transform for Damage Detection in Plate Structures

    This thesis investigates the detection of structural damage in plate structures using the empirical mode decomposition method along with Hilbert spectral analysis. In recent years there has been an extensive amount of research associated with the development of health monitoring methods for aerospace systems, such as aging aircraft and Health and Usage Monitoring Systems (HUMS) for rotorcraft. The method developed here exploits a new time-frequency signal processing tool, the Hilbert-Huang transform, along with Lamb wave propagation in thin plates. With the use of wave reflections from discontinuities, damage identification methods were developed to determine the presence, location and extent of damage in isotropic and composite plate structures. The ability of the empirical mode decomposition to extract embedded oscillations, to reveal hidden reflections in the data, and to provide a high-resolution energy-time-frequency spectrum is used to describe the interaction of Lamb waves with various damaged regions.
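
    The Hilbert spectral analysis step can be sketched as follows for a single intrinsic mode function; the EMD sifting that produces the IMF is omitted:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(imf, fs):
    """Hilbert spectral step of the Hilbert-Huang transform.

    `imf` is one intrinsic mode function already extracted by empirical
    mode decomposition. Returns the instantaneous amplitude and
    frequency (Hz) over time, from which the energy-time-frequency
    spectrum is assembled.
    """
    z = hilbert(imf)                     # analytic signal
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2 * np.pi)
    return amplitude[:-1], freq
```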

    Sound Event Localization, Detection, and Tracking by Deep Neural Networks

    In this thesis, we present novel sound representations and classification methods for the task of sound event localization, detection, and tracking (SELDT). The human auditory system has evolved to localize multiple sound events, recognize them, and further track their motion individually in an acoustic environment. This ability makes humans context-aware and enables them to interact with their surroundings naturally. Developing similar methods for machines will provide an automatic description of social and human activities around them and enable machines to be context-aware similarly to humans. Such methods can be employed to assist the hearing impaired to visualize sounds, for robot navigation, and to monitor biodiversity, the home, and cities. A real-life acoustic scene is complex in nature, with multiple sound events that are temporally and spatially overlapping, including stationary and moving events with varying angular velocities. Additionally, each individual sound event class, for example a car horn, can exhibit a lot of variability: different cars have different horns, and within the same model of car, the duration and temporal structure of the horn sound are driver dependent. Performing SELDT robustly in such overlapping and dynamic sound scenes is challenging for machines. Hence, we propose to investigate the SELDT task in this thesis using a data-driven approach based on deep neural networks (DNNs). The sound event detection (SED) task requires the detection of onset and offset times for individual sound events and their corresponding labels. In this regard, we propose to use spatial and perceptual features extracted from multichannel audio for SED using two different DNNs, recurrent neural networks (RNNs) and convolutional recurrent neural networks (CRNNs). We show that using multichannel audio features improves the SED performance for overlapping sound events in comparison to traditional single-channel audio features. The proposed novel features and methods produced state-of-the-art performance for the real-life SED task and won the IEEE AASP DCASE challenge consecutively in 2016 and 2017. Sound event localization is the task of spatially locating the position of individual sound events. Traditionally, this has been approached using parametric methods. In this thesis, we propose a CRNN for detecting the azimuth and elevation angles of multiple temporally overlapping sound events. This is the first DNN-based method performing localization in the complete azimuth and elevation space. In comparison to parametric methods, which require information on the number of active sources, the proposed method learns this information directly from the input data and estimates the respective spatial locations. Further, the proposed CRNN is shown to be more robust than parametric methods in reverberant scenarios. Finally, the detection and localization tasks are performed jointly using a CRNN. This method additionally tracks the spatial location over time, thus producing the SELDT results. This is the first DNN-based SELDT method, and it is shown to perform on par with stand-alone baselines for SED, localization, and tracking. The proposed SELDT method is evaluated on nine datasets that represent anechoic and reverberant sound scenes, stationary and moving sources with varying velocities, different numbers of overlapping sound events, and different microphone array formats. The results show that the SELDT method can track multiple overlapping sound events that are both spatially stationary and moving.
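
    A minimal CRNN in the spirit of the thesis's SED models might look like the following; layer sizes, channel count, and class count are illustrative, not the thesis's exact architecture:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CRNN for SED: convolutional layers learn spectral
    patterns, a GRU models temporal structure, and a sigmoid head gives
    per-frame, per-class activity (onsets/offsets) for overlapping events."""
    def __init__(self, n_channels=4, n_mels=64, n_classes=11):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                    # pool frequency only
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.rnn = nn.GRU(64 * (n_mels // 16), 128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, n_classes)

    def forward(self, x):                            # x: (batch, ch, time, mel)
        z = self.cnn(x)                              # (batch, 64, time, mel/16)
        z = z.permute(0, 2, 1, 3).flatten(2)         # (batch, time, features)
        z, _ = self.rnn(z)
        return torch.sigmoid(self.head(z))           # per-frame class activity
```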

    Audio computing in the wild: frameworks for big data and small computers

    This dissertation presents machine learning algorithms that are designed to process as much data as needed while spending the least possible amount of resources, such as time, energy, and memory. Examples of such applications include, but are not limited to: a large-scale multimedia information retrieval system where both the queries and the items in the database are noisy signals; collaborative audio enhancement from hundreds of user-created clips of a music concert; an event detection system running on a small device that has to process various sensor signals in real time; a lightweight custom chipset for speech enhancement on hand-held devices; and an instant music analysis engine running in smartphone apps. In all those applications, efficient machine learning algorithms are supposed to achieve not only good performance, but also great resource-efficiency. We start from some efficient dictionary-based single-channel source separation algorithms. We can train these source-specific dictionaries by using matrix factorization or topic modeling, whose elements form a representative set of spectra for the particular source. At test time, the system estimates the contribution of the participating dictionary items for an unknown mixture spectrum. In this way we can estimate the activation of each source separately, and then recover the source of interest by using that particular source's reconstruction. There are some efficiency issues with this procedure. First, searching for the optimal dictionary size is time consuming. Although for some very common types of sources, e.g. English speech, we know the optimal rank of the model by trial and error, it is hard to know in advance the optimal number of dictionary elements for the unknown sources, which are usually modeled at test time in semi-supervised separation scenarios. On top of that, when it comes to non-stationary unknown sources, it is better to maintain a dictionary that adapts its size and contents to changes in the source's nature. In this online semi-supervised separation scenario, a mechanism that can efficiently learn the optimal rank is helpful. To this end, a deflation method is proposed for modeling the unknown source with a nonnegative dictionary whose size is optimal. Since it has to be done at test time, the deflation method, which incrementally adds new dictionary items, shows better efficiency than a corresponding naïve approach where we simply try a number of different models. We face another efficiency issue when we want to use a large dictionary for better separation. It is known that considering the manifold of the training data can enhance separation performance. This is because the usual manifold-ignorant convex combination models, such as those from low-rank matrix decomposition or topic modeling, tend to result in ambiguous regions in the source-specific subspace defined by the dictionary items as bases: regions where no original data samples can reside. Although source separation techniques that respect the data manifold can increase performance, they call for more memory and computational resources, because the models require larger dictionaries and involve sparse coding at test time.
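
    The dictionary-based separation described above can be sketched with a supervised NMF in which the dictionaries are fixed and only the activations are updated; the multiplicative update and the soft masking are standard KL-NMF choices, not necessarily the dissertation's exact variants:

```python
import numpy as np

def separate_with_dictionary(V, W_source, W_noise, n_iter=100):
    """Supervised NMF separation sketch.

    Explain the mixture magnitude spectrogram V with two pre-trained
    spectral dictionaries, update the activations H only, then apply a
    Wiener-style soft mask to recover the target source.
    """
    W = np.hstack([W_source, W_noise])            # (n_freq, k_s + k_n)
    H = np.random.rand(W.shape[1], V.shape[1])    # activations over time
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        # Multiplicative update minimizing KL divergence, W held fixed.
        H *= (W.T @ (V / WH)) / W.T.sum(axis=1, keepdims=True)
    k = W_source.shape[1]
    V_source = W_source @ H[:k]                   # target reconstruction
    V_total = W @ H + 1e-12
    return V * (V_source / V_total)               # soft mask on the mixture
```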
    These limitations motivated the development of hashing-based encoding of the audio spectra, so that some computationally heavy routines, such as nearest neighbor searches for sparse coding, can be performed in a cheaper bit-wise fashion. Matching audio signals can be challenging as well, especially if the signals are noisy and the matching task involves a large number of signals. In an information retrieval application, for example, a bigger dataset leads to a longer response time. On top of that, if the signals are defective, we either have to perform the enhancement or separation job before matching, or we need a matching mechanism that is robust to all those different kinds of artifacts. Likewise, the noisy nature of the signals can add complexity to the system. In this dissertation we will also see some compact integer (and eventually binary) representations for those matching systems. One possible compact representation is a hashing-based matching method, where we employ a particular kind of hash function to preserve the similarity among the original signals in the hash code domain. We will see that a variant of Winner Take All hashing can provide a Hamming distance from noise-robust binary features, and that matching using the hash codes works well for some keyword spotting tasks. From the fact that some landmark hashes (e.g. local maxima from non-maximum suppression on the magnitudes of a mel-scaled spectrogram) can also robustly and efficiently represent the time-frequency domain signal, a matrix decomposition algorithm is also proposed to take those irregular sparse matrices as input. Based on the assumption that the number of landmarks is much smaller than the number of all the time-frequency coefficients, we can consider this matching algorithm efficient if it operates entirely on the landmark representation. In contrast to the usual landmark matching schemes, where matching is defined rigorously, we see the audio matching problem as soft matching, where we find a constellation of landmarks similar to the query. In order to perform this soft matching job, the landmark positions are smoothed by fixed-width Gaussian caps, with which the matching job is reduced to calculating the amount of overlap between those Gaussians. The Gaussian-based density approximation is also useful when we perform decomposition on this landmark representation, because otherwise the landmarks are usually too sparse for an ordinary matrix factorization algorithm, which is originally designed for a dense input matrix. We also expand this concept to the matrix deconvolution problem, where we see the input landmark representation of a source as a two-dimensional convolution between a source pattern and its corresponding sparse activations. If there is more than one source, as in a noisy signal, we can think of this problem as factor deconvolution, where the mixture is the combination of all the source-specific convolutions.
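
    A small sketch of the Winner-Take-All hashing idea mentioned above; the window size and code count are illustrative choices:

```python
import numpy as np

def wta_hash(x, n_codes=64, window=4, seed=0):
    """Winner-Take-All hash sketch: for each code, randomly permute the
    feature vector and record the index of the maximum among the first
    `window` entries. Rank-order codes are robust to any noise that
    preserves the ordering of values, and Hamming distance between two
    hashes approximates rank correlation between the original vectors.
    """
    rng = np.random.default_rng(seed)
    codes = np.empty(n_codes, dtype=np.int64)
    for i in range(n_codes):
        perm = rng.permutation(len(x))[:window]
        codes[i] = np.argmax(x[perm])
    return codes    # compare two hashes with np.mean(c1 != c2)
```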
    In the sense that the recordings come from unsynchronized, heterogeneous sensors, we can also think of CAE as big ad-hoc sensor array processing. In CAE, each recording is assumed to be uniquely corrupted by the specific frequency response of its microphone, an aggressive audio coding algorithm, interference, band-pass filtering, clipping, etc. To consolidate all these recordings and come up with an enhanced audio signal, Probabilistic Latent Component Sharing (PLCS) has been proposed as a method of simultaneous probabilistic topic modeling on synchronized input signals. In PLCS, some of the parameters are fixed to be the same during and after the learning process to capture the common audio content, while the rest of the parameters account for the unwanted recording-specific interference and artifacts. We can speed up PLCS by incorporating a hashing-based nearest neighbor search so that at every EM iteration PLCS is applied only to a small number of recordings that are closest to the current source estimate. Experiments on a small simulated CAE setup show that the proposed PLCS can improve the sound quality of variously contaminated recordings. The nearest neighbor search technique in PLCS provides a sensible speed-up in larger-scale experiments (up to 1000 recordings). Finally, to describe an extremely optimized deep learning deployment system, Bitwise Neural Networks (BNNs) are also discussed. In the proposed BNN, all the input, hidden, and output nodes are binary (+1 and -1), and so are all the weights and biases. Consequently, the operations on them at test time are defined with Boolean algebra, too. BNNs are spatially and computationally efficient in implementation, since (a) we represent a real-valued sample or parameter with a single bit, and (b) multiplication and addition correspond to bitwise XNOR and bit-counting, respectively. Therefore, BNNs can be used to implement a deep learning system in a resource-constrained environment, so that we can deploy a deep learning system on small devices without using up the power, memory, CPU clocks, etc. The training procedure for BNNs is based on a straightforward extension of backpropagation, characterized by the use of a quantization noise injection scheme and an initialization strategy that learns a weight-compressed real-valued network for initialization purposes only. Preliminary results on the MNIST dataset and on speech denoising demonstrate that this straightforward extension of backpropagation can successfully train BNNs whose performance is comparable while requiring vastly fewer computational resources.
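
    Finally, the bitwise arithmetic behind BNNs can be illustrated conceptually. This sketch uses numpy 0/1 arrays rather than packed machine words, so it shows the XNOR/popcount logic only, not an optimized implementation:

```python
import numpy as np

def bnn_layer(x_bits, w_bits, bias_bits):
    """One bitwise neural network layer, conceptually.

    Inputs, weights, and biases are all in {+1, -1}, encoded here as
    {1, 0} bits. XNOR of two bits multiplies the corresponding +1/-1
    values, and a popcount majority vote replaces the weighted sum
    followed by a sign activation.
    """
    n = x_bits.size
    out = np.empty(w_bits.shape[0], dtype=np.uint8)
    for j, (w, b) in enumerate(zip(w_bits, bias_bits)):
        agree = np.sum(~(x_bits ^ w) & 1) + (b & 1)    # XNOR popcount + bias
        out[j] = 1 if 2 * agree > n + 1 else 0         # sign(sum) as majority
    return out
```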