28 research outputs found

    Machine Learning for Beamforming in Audio, Ultrasound, and Radar

    Get PDF
    Multi-sensor signal processing plays a crucial role in the working of several everyday technologies, from correctly understanding speech on smart home devices to ensuring aircraft fly safely. A specific type of multi-sensor signal processing called beamforming forms a central part of this thesis. Beamforming works by combining the information from several spatially distributed sensors to directionally filter information, boosting the signal from a certain direction but suppressing others. The idea of beamforming is key to the domains of audio, ultrasound, and radar. Machine learning is the other central part of this thesis. Machine learning, and especially its sub-field of deep learning, has enabled breakneck progress in tackling several problems that were previously thought intractable. Today, machine learning powers many of the cutting edge systems we see on the internet for image classification, speech recognition, language translation, and more. In this dissertation, we look at beamforming pipelines in audio, ultrasound, and radar from a machine learning lens and endeavor to improve different parts of the pipelines using ideas from machine learning. We start off in the audio domain and derive a machine learning inspired beamformer to tackle the problem of ensuring the audio captured by a camera matches its visual content, a problem we term audiovisual zooming. Staying in the audio domain, we then demonstrate how deep learning can be used to improve the perceptual qualities of speech by denoising speech clipping, codec distortions, and gaps in speech. Transitioning to the ultrasound domain, we improve the performance of short-lag spatial coherence ultrasound imaging by exploiting the differences in tissue texture at each short lag value by applying robust principal component analysis. Next, we use deep learning as an alternative to beamforming in ultrasound and improve the information extraction pipeline by simultaneously generating both a segmentation map and B-mode image of high quality directly from raw received ultrasound data. Finally, we move to the radar domain and study how deep learning can be used to improve signal quality in ultra-wideband synthetic aperture radar by suppressing radio frequency interference, random spectral gaps, and contiguous block spectral gaps. By training and applying the networks on raw single-aperture data prior to beamforming, it can work with myriad sensor geometries and different beamforming equations, a crucial requirement in synthetic aperture radar

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Get PDF
    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings - the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data. The use of I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging and fruitful but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures that show superior performance as compared to the widely used mean squared error (MSE) and are computationally simple compared with other state-of-the-art perceptual quality measures. In addition, SSIM has a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance. Firstly, the original SSIM is a Full-Reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general purpose Reduced-Reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image. Furthermore, we introduce and demonstrate the novel idea of partially repairing an image using RR features. Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, starting from image acquisition to image display at the receiver. We incorporate SSIM into the framework of sparse signal representation and non-local means methods and demonstrate improved performance in image de-noising and super-resolution. Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive optimization method that transforms the DCT domain frame residuals to a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs. Finally, in real-world visual communications, it is a common experience that end-users receive video with significantly time-varying quality due to the variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well-understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality. The model improves upon the direct SSIM approach in predicting subjective perceptual experience of time-varying video quality

    Handbook of Digital Face Manipulation and Detection

    Get PDF
    This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, which address readers wishing to gain a brief overview of the state-of-the-art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area

    Recent Advances in Signal Processing

    Get PDF
    The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity

    Handbook of Digital Face Manipulation and Detection

    Get PDF
    This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, which address readers wishing to gain a brief overview of the state-of-the-art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area

    (b2023 to 2014) The UNBELIEVABLE similarities between the ideas of some people (2006-2016) and my ideas (2002-2008) in physics (quantum mechanics, cosmology), cognitive neuroscience, philosophy of mind, and philosophy (this manuscript would require a REVOLUTION in international academy environment!)

    Get PDF
    (b2023 to 2014) The UNBELIEVABLE similarities between the ideas of some people (2006-2016) and my ideas (2002-2008) in physics (quantum mechanics, cosmology), cognitive neuroscience, philosophy of mind, and philosophy (this manuscript would require a REVOLUTION in international academy environment!
    corecore