483 research outputs found

    A Method to Detect AAC Audio Forgery

    Get PDF
    Advanced Audio Coding (AAC), a standardized lossy compression scheme for digital audio, which was designed to be the successor of the MP3 format, generally achieves better sound quality than MP3 at similar bit rates. While AAC is also the default or standard audio format for many devices and AAC audio files may be presented as important digital evidences, the authentication of the audio files is highly needed but relatively missing. In this paper, we propose a scheme to expose tampered AAC audio streams that are encoded at the same encoding bit-rate. Specifically, we design a shift-recompression based method to retrieve the differential features between the re-encoded audio stream at each shifting and original audio stream, learning classifier is employed to recognize different patterns of differential features of the doctored forgery files and original (untouched) audio files. Experimental results show that our approach is very promising and effective to detect the forgery of the same encoding bit-rate on AAC audio streams. Our study also shows that shift recompression-based differential analysis is very effective for detection of the MP3 forgery at the same bit rate

    Audio Coding Based on Integer Transforms

    Get PDF
    Die Audiocodierung hat sich in den letzten Jahren zu einem sehr populĂ€ren Forschungs- und Anwendungsgebiet entwickelt. Insbesondere gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3 (MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden hĂ€ufig zur effizienten Speicherung und Übertragung von Audiosignalen verwendet. FĂŒr professionelle Anwendungen, wie etwa die Archivierung und Übertragung im Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht. Die bisherigen AnsĂ€tze fĂŒr gehörangepasste und verlustlose Audiocodierung sind technisch völlig verschieden. Moderne gehörangepasste Audiocoder basieren meist auf FilterbĂ€nken, wie etwa der ĂŒberlappenden orthogonalen Transformation "Modifizierte Diskrete Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen verwenden meist prĂ€diktive Codierung zur Redundanzreduktion. Nur wenige AnsĂ€tze zur transformationsbasierten verlustlosen Audiocodierung wurden bisher versucht. Diese Arbeit prĂ€sentiert einen neuen Ansatz hierzu, der das Lifting-Schema auf die in der gehörangepassten Audiocodierung verwendeten ĂŒberlappenden Transformationen anwendet. Dies ermöglicht eine invertierbare Integer-Approximation der ursprĂŒnglichen Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die selbe Technik kann auch fĂŒr FilterbĂ€nke mit niedriger Systemverzögerung angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler Lifting-Ansatz und eine Technik zur Spektralformung von Quantisierungsfehlern eine Verbesserung der Approximation der ursprĂŒnglichen Transformation. Basierend auf diesen neuen Integer-Transformationen werden in dieser Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren umfassen verlustlose Audiocodierung, eine skalierbare verlustlose Erweiterung eines gehörangepassten Audiocoders und einen integrierten Ansatz zur fein skalierbaren gehörangepassten und verlustlosen Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for research and applications. Especially perceptual audio coding schemes, such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are widely used for efficient storage and transmission of music signals. Nevertheless, for professional applications, such as archiving and transmission in studio environments, lossless audio coding schemes are considered more appropriate. Traditionally, the technical approaches used in perceptual and lossless audio coding have been separate worlds. In perceptual audio coding, the use of filter banks, such as the lapped orthogonal transform "Modified Discrete Cosine Transform" (MDCT), has been the approach of choice being used by many state of the art coding schemes. On the other hand, lossless audio coding schemes mostly employ predictive coding of waveforms to remove redundancy. Only few attempts have been made so far to use transform coding for the purpose of lossless audio coding. This work presents a new approach of applying the lifting scheme to lapped transforms used in perceptual audio coding. This allows for an invertible integer-to-integer approximation of the original transform, e.g. the IntMDCT as an integer approximation of the MDCT. The same technique can also be applied to low-delay filter banks. A generalized, multi-dimensional lifting approach and a noise-shaping technique are introduced, allowing to further optimize the accuracy of the approximation to the original transform. Based on these new integer transforms, this work presents new audio coding schemes and applications. The audio coding applications cover lossless audio coding, scalable lossless enhancement of a perceptual audio coder and fine-grain scalable perceptual and lossless audio coding. Finally an approach to data hiding with high data rates in uncompressed audio signals based on integer transforms is described

    Audio Coding Based on Integer Transforms

    Get PDF
    Die Audiocodierung hat sich in den letzten Jahren zu einem sehr populĂ€ren Forschungs- und Anwendungsgebiet entwickelt. Insbesondere gehörangepasste Verfahren zur Audiocodierung, wie etwa MPEG-1 Layer-3 (MP3) oder MPEG-2 Advanced Audio Coding (AAC), werden hĂ€ufig zur effizienten Speicherung und Übertragung von Audiosignalen verwendet. FĂŒr professionelle Anwendungen, wie etwa die Archivierung und Übertragung im Studiobereich, ist hingegen eher eine verlustlose Audiocodierung angebracht. Die bisherigen AnsĂ€tze fĂŒr gehörangepasste und verlustlose Audiocodierung sind technisch völlig verschieden. Moderne gehörangepasste Audiocoder basieren meist auf FilterbĂ€nken, wie etwa der ĂŒberlappenden orthogonalen Transformation "Modifizierte Diskrete Cosinus-Transformation" (MDCT). Verlustlose Audiocoder hingegen verwenden meist prĂ€diktive Codierung zur Redundanzreduktion. Nur wenige AnsĂ€tze zur transformationsbasierten verlustlosen Audiocodierung wurden bisher versucht. Diese Arbeit prĂ€sentiert einen neuen Ansatz hierzu, der das Lifting-Schema auf die in der gehörangepassten Audiocodierung verwendeten ĂŒberlappenden Transformationen anwendet. Dies ermöglicht eine invertierbare Integer-Approximation der ursprĂŒnglichen Transformation, z.B. die IntMDCT als Integer-Approximation der MDCT. Die selbe Technik kann auch fĂŒr FilterbĂ€nke mit niedriger Systemverzögerung angewandt werden. Weiterhin ermöglichen ein neuer, mehrdimensionaler Lifting-Ansatz und eine Technik zur Spektralformung von Quantisierungsfehlern eine Verbesserung der Approximation der ursprĂŒnglichen Transformation. Basierend auf diesen neuen Integer-Transformationen werden in dieser Arbeit neue Verfahren zur Audiocodierung vorgestellt. Die Verfahren umfassen verlustlose Audiocodierung, eine skalierbare verlustlose Erweiterung eines gehörangepassten Audiocoders und einen integrierten Ansatz zur fein skalierbaren gehörangepassten und verlustlosen Audiocodierung. Schließlich wird mit Hilfe der Integer-Transformationen ein neuer Ansatz zur unhörbaren Einbettung von Daten mit hohen Datenraten in unkomprimierte Audiosignale vorgestellt.In recent years audio coding has become a very popular field for research and applications. Especially perceptual audio coding schemes, such as MPEG-1 Layer-3 (MP3) and MPEG-2 Advanced Audio Coding (AAC), are widely used for efficient storage and transmission of music signals. Nevertheless, for professional applications, such as archiving and transmission in studio environments, lossless audio coding schemes are considered more appropriate. Traditionally, the technical approaches used in perceptual and lossless audio coding have been separate worlds. In perceptual audio coding, the use of filter banks, such as the lapped orthogonal transform "Modified Discrete Cosine Transform" (MDCT), has been the approach of choice being used by many state of the art coding schemes. On the other hand, lossless audio coding schemes mostly employ predictive coding of waveforms to remove redundancy. Only few attempts have been made so far to use transform coding for the purpose of lossless audio coding. This work presents a new approach of applying the lifting scheme to lapped transforms used in perceptual audio coding. This allows for an invertible integer-to-integer approximation of the original transform, e.g. the IntMDCT as an integer approximation of the MDCT. The same technique can also be applied to low-delay filter banks. A generalized, multi-dimensional lifting approach and a noise-shaping technique are introduced, allowing to further optimize the accuracy of the approximation to the original transform. Based on these new integer transforms, this work presents new audio coding schemes and applications. The audio coding applications cover lossless audio coding, scalable lossless enhancement of a perceptual audio coder and fine-grain scalable perceptual and lossless audio coding. Finally an approach to data hiding with high data rates in uncompressed audio signals based on integer transforms is described

    A generic audio classification and segmentation approach for multimedia indexing and retrieval

    Full text link

    UNSUPERVISED SEGMENTATION AND CLASSIFICATION OVER MP3 AND AAC AUDIO BITSTREAMS

    Full text link

    Scalable and perceptual audio compression

    Get PDF
    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    Design and Implementation of HD Wireless Video Transmission System Based on Millimeter Wave

    Get PDF
    With the improvement of optical fiber communication network construction and the improvement of camera technology, the video that the terminal can receive becomes clearer, with resolution up to 4K. Although optical fiber communication has high bandwidth and fast transmission speed, it is not the best solution for indoor short-distance video transmission in terms of cost, laying difficulty and speed. In this context, this thesis proposes to design and implement a multi-channel wireless HD video transmission system with high transmission performance by using the 60GHz millimeter wave technology, aiming to improve the bandwidth from optical nodes to wireless terminals and improve the quality of video transmission. This thesis mainly covers the following parts: (1) This thesis implements wireless video transmission algorithm, which is divided into wireless transmission algorithm and video transmission algorithm, such as 64QAM modulation and demodulation algorithm, H.264 video algorithm and YUV420P algorithm. (2) This thesis designs the hardware of wireless HD video transmission system, including network processing unit (NPU) and millimeter wave module. Millimeter wave module uses RWM6050 baseband chip and TRX-BF01 rf chip. This thesis will design the corresponding hardware circuit based on the above chip, such as 10Gb/s network port, PCIE. (3) This thesis realizes the software design of wireless HD video transmission system, selects FFmpeg and Nginx to build the sending platform of video transmission system on NPU, and realizes video multiplex transmission with Docker. On the receiving platform of video transmission, FFmpeg and Qt are selected to realize video decoding, and OpenGL is combined to realize video playback. (4) Finally, the thesis completed the wireless HD video transmission system test, including pressure test, Web test and application scenario test. It has been verified that its HD video wireless transmission system can transmit HD VR video with three-channel bit rate of 1.2GB /s, and its rate can reach up to 3.7GB /s, which meets the research goal

    Low Complexity Image Recognition Algorithms for Handheld devices

    Get PDF
    Content Based Image Retrieval (CBIR) has gained a lot of interest over the last two decades. The need to search and retrieve images from databases, based on information (“features”) extracted from the image itself, is becoming increasingly important. CBIR can be useful for handheld image recognition devices in which the image to be recognized is acquired with a camera, and thus there is no additional metadata associated to it. However, most CBIR systems require large computations, preventing their use in handheld devices. In this PhD work, we have developed low-complexity algorithms for content based image retrieval in handheld devices for camera acquired images. Two novel algorithms, ‘Color Density Circular Crop’ (CDCC) and ‘DCT-Phase Match’ (DCTPM), to perform image retrieval along with a two-stage image retrieval algorithm that combines CDCC and DCTPM, to achieve the low complexity required in handheld devices are presented. The image recognition algorithms run on a handheld device over a large database with fast retrieval time besides having high accuracy, precision and robustness to environment variations. Three algorithms for Rotation, Scale, and Translation (RST) compensation for images were also developed in this PhD work to be used in conjunction with the two-stage image retrieval algorithm. The developed algorithms are implemented, using a commercial fixed-point Digital Signal Processor (DSP), into a device, called ‘PictoBar’, in the domain of Alternative and Augmentative Communication (AAC). The PictoBar is intended to be used in the field of electronic aid for disabled people, in areas like speech rehabilitation therapy, education etc. The PictoBar is able to recognize pictograms and pictures contained in a database. Once an image is found in the database, a corresponding associated speech message is played. A methodology for optimal implementation and systematic testing of the developed image retrieval algorithms on a fixed point DSP is also established as part of this PhD work

    Degraded Visual Environment Tracker

    Get PDF
    Compressive Sensing (CS) has proven its ability to reduce the number of measurements required to reproduce images with similar quality to those reconstructed by observing the Shannon-Nyquest sampling criteria. By exploiting spatial redundancies, it was shown that CS can be used to denoise and enhance image quality. In this thesis we propose a method that incorporates an efficient use of CS to locate a specific object in zero-visibility environments. This method was developed after multiple implementations of dictionary learning, reconstruction, detection, and tracking algorithms in order to identify the shortcomings of existing techniques and enhance our results. We show that with the use of an over-complete dictionary of the target our technique can perceive the location of the target from hidden information in the scene. This thesis will summarize the previously implemented algorithms, detail the shortcomings evident in their outputs, explain our setups, and present quantified results to support its efficacy in the results section
    • 

    corecore