104 research outputs found

    Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks

    Enhancement algorithms for wireless acoustic sensor networks (WASNs) are indispensable given the increasing availability and use of connected devices with microphones. Conventional spatial filtering approaches for enhancement in WASNs approximate quantization noise as additive Gaussian noise, which limits performance because quantization noise is non-linear at lower bitrates. In this work, we propose a postfilter for enhancement based on Bayesian statistics that explicitly models the quantization noise to obtain a multi-device signal estimate. Our experiments using PSNR, PESQ, and MUSHRA scores demonstrate that the proposed postfilter can be used to enhance signal quality in ad-hoc sensor networks.

    Efficient speaker recognition for mobile devices

    Recent Advances in Signal Processing

    Signal processing is a critical part of most new technological inventions and poses challenges in a wide variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be grouped into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages, in that order. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Secure covert communications over streaming media using dynamic steganography

    Streaming technologies such as VoIP are widely embedded in commercial and industrial applications, so it is imperative to address data security issues before they become serious. This thesis describes a theoretical and experimental investigation of secure covert communications over streaming media using dynamic steganography. A covert VoIP communications system was developed in C++ to support the implementation of this work. A new information-theoretic model of secure covert communications over streaming media was constructed to describe the security scenarios in streaming media-based steganographic systems under passive attacks. The model involves a stochastic process that models an information source for covert VoIP communications and the theory of hypothesis testing, which is used to analyse the adversary's detection performance. The potential of hardware-based true random key generation and chaotic interval selection for innovative applications in covert VoIP communications was explored. A method that uses the CPU's read time stamp counter as an entropy source was designed to generate true random numbers as secret keys for streaming media steganography. A novel interval selection algorithm was devised to randomly choose data embedding locations in VoIP streams using random sequences generated from a chaotic process. A steganographic algorithm based on dynamic key updating and transmission, which integrates a one-way cryptographic accumulator into dynamic key exchange, was devised to provide secure key exchange for covert communications over streaming media. Analysis based on the discrete logarithm problem and steganalysis using a t-test indicated that the algorithm has the advantage of being the most solid method of key distribution over a public channel.
The effectiveness of the new steganographic algorithm for covert communications over streaming media was examined by means of security analysis, steganalysis using nonparametric Mann-Whitney-Wilcoxon statistical testing, and performance and robustness measurements. The algorithm achieved an average data embedding rate of 800 bps, comparable to other related algorithms. The results indicated that the algorithm has little or no impact on real-time VoIP communications in terms of speech quality (< 5% change in PESQ with hidden data), signal distortion (6% change in SNR after steganography), and imperceptibility, and that it is more secure and effective in addressing the security problems than other related algorithms.

    Estimation and Modeling Problems in Parametric Audio Coding

    Measurements in Perceptual Annoyance of Audio Coding Artifacts

    This thesis discusses the perceptual annoyance of several audio coding artifacts that have become of interest during the development of USAC, a new low-bitrate speech and audio coder. A total of four different coding-related phenomena, all of which are explained below, were investigated in this study. All artifacts were artificially generated using MATLAB(R) and evaluated in listening tests with approximately ten participants in each. This work was commissioned by Fraunhofer IIS, Germany, a leader in audio coding technology and the home of MP3. In audio coding, signals are usually processed in frames with a length of 20 to 50 milliseconds, which may cause rapid variations in artifacts. In our tests, the level of critical-bandwidth noise or of single harmonics was varied at different speeds. The results suggest that moderate-speed variations are considered the most annoying. Harmonic bandwidth extension is a method that generates artificial harmonics above a crossover frequency by stretching the original spectrum in frequency. It is useful in audio compression because upper harmonics need not be encoded explicitly, but can be approximately reconstructed in the decoding phase. However, the generated harmonic patch will inevitably be incomplete, which may cause a false additional pitch sensation. The perceived strength of this ghost pitch was examined with synthetic tones as a function of fundamental and crossover frequencies. The masking curve of a signal frame can be efficiently modelled with a spectral envelope. It can then be used for transferring the frame to the perceptual domain for quantization.
The resulting quantization noise will be less audible if the smoothness of the envelope is properly adjusted in the first place by modifying the transfer function with a constant. A proposal for the optimal constant value is provided in this study. Strong parts of a signal spectrum can be boosted and weak parts diminished by multiplying the spectrum by its modified envelope. This technique, known as formant enhancement, enables better masking of quantization noise, but tends to render the overall tone unnatural. A model for selecting the optimal spectrum modification parameter values as a function of perceptual signal-to-noise ratio is proposed.

    Bitstream-based video quality modeling and analysis of HTTP-based adaptive streaming

    The spread of affordable capture technology and increased internet bandwidth allow high-quality videos (resolutions > 1080p, frame rates ≥ 60 fps) to be streamed online. HTTP-based adaptive streaming is the preferred method for streaming videos, adjusting video quality based on available bandwidth. Although adaptive streaming reduces the occurrences of video playout being stopped (called "stalling") due to narrow network bandwidth, the automatic adaptation has an impact on the quality perceived by the user, which results in the need to systematically assess the perceived quality. Such an evaluation is usually done on a short-term (few seconds) and overall session basis (up to several minutes). In this thesis, both these aspects are assessed using subjective and instrumental methods. The subjective assessment of short-term video quality consists of a series of lab-based video quality tests that have resulted in publicly available datasets. The overall integral quality was subjectively assessed in lab tests with human viewers mimicking a real-life viewing scenario. In addition to the lab tests, the out-of-the-lab test method was investigated for both short-term video quality and overall session quality assessment to explore the possibility of alternative approaches for subjective quality assessment. The instrumental method of quality evaluation was addressed in terms of bitstream- and hybrid pixel-based video quality models developed as part of this thesis.
For this, a family of models, namely AVQBits, has been conceived using the results of the lab tests as ground truth. Based on the available input information, four different instances of AVQBits are presented: a Mode 3, a Mode 1, a Mode 0, and a Hybrid Mode 0 model. The model instances have been evaluated and perform better than or on par with other state-of-the-art models. These models have further been applied to 360° and gaming videos, HFR content, and images. Also, a long-term integration (1 to 5 minutes) model based on the ITU-T P.1203.3 model is presented. In this work, the different instances of AVQBits, with their per-second score output, are employed as the video quality component of the proposed long-term integration model. All AVQBits variants as well as the long-term integration module and the subjective test data are made publicly available for further research.

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.

    Modelling, Simulation and Data Analysis in Acoustical Problems

    Modelling and simulation in acoustics are currently gaining importance. With the development and improvement of innovative computational techniques and the growing need for predictive models, an impressive boost has been observed in several research and application areas, such as noise control, indoor acoustics, and industrial applications. This led us to propose a special issue on "Modelling, Simulation and Data Analysis in Acoustical Problems", as we believe in the importance of these topics in modern acoustics studies. In total, 81 papers were submitted and 33 of them were published, with an acceptance rate of 37.5%. Judging by the number of papers submitted, this is a trending topic in the scientific and academic community, and this special issue aims to provide a future reference for the research that will be developed in the coming years.