
    Using autoencoders for radio signal denoising

    We investigated the use of a Deep Learning approach to radio signal de-noising. This data-driven approach does not require explicit use of expert knowledge to set up the parameters of the denoising procedure, and it grants great flexibility across many channel conditions. The core component used in this work is a Convolutional De-noising AutoEncoder (CDAE), known to be very effective in image processing. The key to our approach is transforming the radio signal into a representation suitable for the CDAE: we transform the time-domain signal into a 2D signal using the Short Time Fourier Transform. We report on the performance of the approach in preamble denoising across protocols of the IEEE 802.11 family, studied using simulation data. This approach could be used within a machine learning pipeline: the denoised data can be fed to a protocol classifier. A prospective advantage of using the AutoEncoders in that pipeline is that they can be co-trained with the downstream classifier, to optimize the classification accuracy.
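
    A minimal sketch of the STFT front end described above, assuming a complex baseband signal; the sample rate, window length, and overlap are illustrative values, not the paper's settings:

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    FS, NPERSEG, NOVERLAP = 20e6, 64, 48  # assumed, not from the paper

    def to_spectrogram(x):
        """Turn a 1-D complex baseband signal into a 2-D time-frequency
        image suitable for a convolutional denoising autoencoder."""
        _, _, Z = stft(x, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP,
                       return_onesided=False)
        # Feed the magnitude to the CDAE; keep the phase for resynthesis.
        return np.abs(Z), np.angle(Z)

    def from_spectrogram(mag, phase):
        """Invert the STFT once the magnitude has been denoised."""
        _, x = istft(mag * np.exp(1j * phase), fs=FS, nperseg=NPERSEG,
                     noverlap=NOVERLAP, input_onesided=False)
        return x
    ```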

    A Deep Learning Approach to Radio Signal Denoising

    This paper proposes a Deep Learning approach to radio signal de-noising. The approach is data-driven, so it can de-noise signals corresponding to distinct protocols without requiring explicit use of expert knowledge, granting higher flexibility. The core component of the Artificial Neural Network architecture used in this work is a Convolutional De-noising AutoEncoder. We report on the performance of the system in spectrogram-based denoising of the protocol preamble across protocols of the IEEE 802.11 family, studied using simulation data. This approach can be used within a machine learning pipeline: the denoised data can be fed to a protocol classifier. A further prospective advantage of using the AutoEncoders in such a pipeline is that they can be co-trained with the downstream classifier (protocol detector), to optimize its accuracy.
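
    The core component can be sketched as a small convolutional denoising autoencoder in PyTorch. Layer counts and channel widths below are illustrative guesses, not the architecture reported in the paper:

    ```python
    import torch
    import torch.nn as nn

    class CDAE(nn.Module):
        """Convolutional denoising autoencoder for spectrogram patches."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # H/2 x W/2
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # H/4 x W/4
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                                   output_padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                                   output_padding=1),
            )

        def forward(self, noisy):
            return self.decoder(self.encoder(noisy))

    # Train on (noisy, clean) spectrogram pairs with a reconstruction loss;
    # co-training would add the downstream classifier's loss to this objective.
    model, loss_fn = CDAE(), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    noisy = torch.randn(8, 1, 64, 64)  # placeholder batch
    clean = torch.randn(8, 1, 64, 64)
    loss = loss_fn(model(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()
    ```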

    Deep Room Recognition Using Inaudible Echos

    Recent years have seen an increasing need for location awareness in mobile applications. This paper presents a room-level indoor localization approach based on a room's measured echoes in response to a two-millisecond single-tone inaudible chirp emitted by a smartphone's loudspeaker. Unlike other acoustics-based room recognition systems that record full-spectrum audio for up to ten seconds, our approach records audio in a narrow inaudible band for 0.1 seconds only, to preserve the user's privacy. However, the short-time and narrowband audio signal carries limited information about the room's characteristics, presenting challenges to accurate room recognition. This paper applies deep learning to effectively capture the subtle fingerprints in the rooms' acoustic responses. Our extensive experiments show that a two-layer convolutional neural network fed with the spectrogram of the inaudible echoes achieves the best performance, compared with alternative designs using other raw data formats and deep models. Based on this result, we design a RoomRecognize cloud service and its mobile client library that enable mobile application developers to readily implement the room recognition functionality without resorting to any existing infrastructure or add-on hardware. Extensive evaluation shows that RoomRecognize achieves 99.7%, 97.7%, 99%, and 89% accuracy in differentiating 22 and 50 residential/office rooms, 19 spots in a quiet museum, and 15 spots in a crowded museum, respectively. Compared with state-of-the-art approaches based on support vector machines, RoomRecognize significantly improves the Pareto frontier of recognition accuracy versus robustness against interfering sounds (e.g., ambient music). (Comment: 29 pages.)
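
    To make the probing scheme concrete, the following Python sketch covers the probe-and-feature step: emit a 2 ms single-tone chirp, capture a 0.1 s recording, and compute a narrowband spectrogram as the CNN input. The sample rate, tone frequency, and band edges are assumptions, not the paper's values:

    ```python
    import numpy as np
    from scipy.signal import spectrogram

    FS = 48_000   # smartphone sample rate (assumed)
    F0 = 20_000   # single near-ultrasonic tone (assumed)

    def make_chirp(duration=0.002, fs=FS, f0=F0):
        """2 ms single-tone probe played through the loudspeaker."""
        t = np.arange(int(duration * fs)) / fs
        return np.sin(2 * np.pi * f0 * t)

    def echo_feature(recording, fs=FS):
        """Spectrogram of the 0.1 s narrowband echo recording, the input
        to the room-recognition CNN. Window sizes are illustrative."""
        f, t, S = spectrogram(recording, fs=fs, nperseg=256, noverlap=192)
        band = (f >= 18_000) & (f <= 22_000)    # keep only the inaudible band
        return 10 * np.log10(S[band] + 1e-12)   # log-power image for the CNN

    probe = make_chirp()
    fake_recording = np.random.randn(int(0.1 * FS))  # stand-in for mic capture
    feat = echo_feature(fake_recording)
    print(feat.shape)
    ```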

    A Detailed Investigation into Low-Level Feature Detection in Spectrogram Images

    Being the first stage of analysis within an image, low-level feature detection is a crucial step in the image analysis process and, as such, deserves suitable attention. This paper presents a systematic investigation into low-level feature detection in spectrogram images, the result of which is the identification of frequency tracks. Analysis of the literature identifies different strategies for accomplishing low-level feature detection; nevertheless, the advantages and disadvantages of each are not explicitly investigated. Three model-based detection strategies are outlined, each extracting an increasing amount of information from the spectrogram, and, through ROC analysis, it is shown that detection rates increase with the level of extraction. Nevertheless, further investigation suggests that model-based detection has a limitation: it is not computationally feasible to fully evaluate the model of even a simple sinusoidal track. Therefore, alternative approaches, such as dimensionality reduction, are investigated to reduce the complex search space. It is shown that, if carefully selected, these techniques can approach the detection rates of model-based strategies that perform the same level of information extraction. The implementations used to derive the results presented within this paper are available online from http://stdetect.googlecode.com
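
    For contrast with the model-based strategies discussed above, the simplest low-level detector can be sketched as per-frame peak picking in the spectrogram, with the candidate points later linked into frequency tracks. The threshold rule here is illustrative and is not one of the paper's evaluated detectors:

    ```python
    import numpy as np

    def detect_track_points(spec, thresh_db=6.0):
        """Low-level feature detection: mark spectrogram cells that are
        local maxima along frequency and exceed the frame's median power
        by thresh_db. A deliberately simple detector; the paper's
        model-based strategies extract progressively more information."""
        spec_db = 10 * np.log10(spec + 1e-12)
        out = np.zeros_like(spec_db, dtype=bool)
        for j in range(spec_db.shape[1]):        # each time frame
            col = spec_db[:, j]
            floor = np.median(col)
            for i in range(1, len(col) - 1):     # local maxima in frequency
                if (col[i] > col[i - 1] and col[i] > col[i + 1]
                        and col[i] - floor > thresh_db):
                    out[i, j] = True
        return out   # candidate points to be linked into frequency tracks
    ```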

    Acoustic signal processing based on the short-time spectrum

    Technical report. The frequency domain representation of a time signal afforded by the Fourier transform is a powerful tool in acoustic signal processing. The usefulness of this representation is rooted in the mechanisms of sound production and perception. Many sources of sound exhibit normal modes or natural frequencies of vibration, and can be described concisely in the frequency domain. The human auditory system performs frequency analysis early in the hearing process, so perception is often best described by frequency domain parameters. This dissertation investigates a new approach to acoustic signal processing based on the short-time Fourier transform, a two-dimensional representation which shows the time and frequency structure of sounds. This representation is appropriate for signals such as speech and music, where the natural frequencies of the source change and the timing of these changes is important to perception. The principal advantage of this approach is that the signal processing domain is similar to the perceptual domain, so that signal modifications can be related to perceptual criteria. The mathematical basis for this type of processing is developed, and four examples are described: removal of broad-band background noise, isolation of perceptually important speech features, dynamic range compression and expansion, and removal of locally periodic interfering signals.
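
    The first of the four examples, broad-band noise removal, is commonly realized as spectral subtraction in the short-time spectrum. The sketch below is a textbook version under assumed parameters, not the dissertation's exact procedure:

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtract(x, fs, noise_seconds=0.25, nperseg=512):
        """Broad-band background-noise removal in the short-time spectrum:
        estimate the noise magnitude from a leading noise-only segment,
        subtract it from every frame, and resynthesize with the noisy
        phase. Parameters are illustrative assumptions."""
        f, t, Z = stft(x, fs=fs, nperseg=nperseg)
        hop = nperseg // 2                        # default STFT hop
        n_frames = max(1, int(noise_seconds * fs / hop))
        noise_mag = np.abs(Z[:, :n_frames]).mean(axis=1, keepdims=True)
        # Subtract the noise estimate, keeping a small spectral floor to
        # limit musical-noise artifacts.
        mag = np.maximum(np.abs(Z) - noise_mag, 0.05 * np.abs(Z))
        _, y = istft(mag * np.exp(1j * np.angle(Z)), fs=fs, nperseg=nperseg)
        return y
    ```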

    The analysis of composition techniques in utp_: Synthetic composition for electroacoustic ensembles

    This thesis analyses and describes a number of spectrally oriented composition techniques for composing music for electroacoustic ensemble. These techniques aim to achieve a synthetic approach to combining electronic and acoustic sound sources in live performance. To this end, an in-depth analysis of utp_ (2008) by Alva Noto and Ryuichi Sakamoto, in collaboration with Ensemble Modern, is conducted. utp_ utilises a large acoustic ensemble, live electronic processing, prerecorded electronic sound and video projections in performance. The discussion also queries the possible problems of electroacoustic performance, and examines ways to resolve the most prevalent issues. This involves a discussion of the materials of electroacoustic works, timbral differences between acoustic and electronic sounds, and liveness in electroacoustic music performance. The analysis uses spectral and score analysis to identify composition techniques. The final section describes the way these composition techniques are applied in my own work for electroacoustic ensemble, lucidity.

    Evaluating Content-centric vs User-centric Ad Affect Recognition

    Despite the fact that advertisements (ads) often include strongly emotional content, very little work has been devoted to affect recognition (AR) from ads. This work explicitly compares content-centric and user-centric ad AR methodologies, and evaluates the impact of enhanced AR on computational advertising via a user study. Specifically, we (1) compile an affective ad dataset capable of evoking coherent emotions across users; (2) explore the efficacy of content-centric convolutional neural network (CNN) features for encoding emotions, and show that CNN features outperform low-level emotion descriptors; (3) examine user-centric ad AR by analyzing Electroencephalogram (EEG) responses acquired from eleven viewers, and find that EEG signals encode emotional information better than content descriptors; (4) investigate the relationship between objective AR and subjective viewer experience while watching an ad-embedded online video stream, based on a study involving 12 users. To our knowledge, this is the first work to (a) expressly compare user-centric vs content-centric AR for ads, and (b) study the relationship between the modeling of ad emotions and its impact on a real-life advertising application. (Comment: Accepted at the ACM International Conference on Multimodal Interaction (ICMI).)
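
    The content-centric arm of such a comparison is often built by reading out features from a pretrained CNN and fitting a small affect classifier on top. The backbone choice and head size below are assumptions for illustration, not the authors' exact pipeline:

    ```python
    import torch
    import torch.nn as nn
    from torchvision import models

    # Content-centric AR sketch: a pretrained CNN as a frame-level feature
    # extractor with a small valence classifier on top (assumed setup).
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = nn.Identity()      # expose the 512-d penultimate features
    backbone.eval()

    head = nn.Linear(512, 2)         # high/low valence, illustrative head

    frames = torch.randn(16, 3, 224, 224)   # placeholder batch of ad frames
    with torch.no_grad():
        feats = backbone(frames)             # (16, 512) content descriptors
    logits = head(feats)                     # per-frame affect predictions
    ```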