768 research outputs found
Graph Spectral Image Processing
The recent advent of graph signal processing (GSP) has spurred intensive studies
of signals that live naturally on irregular data kernels described by graphs
(e.g., social networks, wireless sensor networks). Though a digital image
contains pixels that reside on a regularly sampled 2D grid, if one can design
an appropriate underlying graph connecting pixels with weights that reflect the
image structure, then one can interpret the image (or image patch) as a signal
on a graph, and apply GSP tools for processing and analysis of the signal in
graph spectral domain. In this article, we overview recent graph spectral
techniques in GSP specifically for image / video processing. The topics covered
include image compression, image restoration, image filtering, and image
segmentation.
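As a rough illustration of the idea in the abstract above (treating an image patch as a signal on a graph whose edge weights reflect image structure), the following sketch builds a 4-connected pixel graph and computes a graph Fourier transform. It is not taken from the article: the Gaussian intensity-difference weights, the parameter `sigma`, and the function names are assumptions made for this example.

```python
import numpy as np

def patch_graph_laplacian(patch, sigma=0.1):
    """Combinatorial Laplacian L = D - W of a 4-connected pixel graph.

    Edge weights use a Gaussian kernel on intensity differences
    (an assumption for this sketch; other weightings are possible).
    """
    h, w = patch.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    diff = patch[r, c] - patch[rr, cc]
                    weight = np.exp(-(diff ** 2) / (2 * sigma ** 2))
                    W[i, j] = W[j, i] = weight
    return np.diag(W.sum(axis=1)) - W

def graph_fourier_transform(patch, sigma=0.1):
    """Return Laplacian eigenvalues, eigenvectors, and spectral coefficients."""
    L = patch_graph_laplacian(patch, sigma)
    eigvals, eigvecs = np.linalg.eigh(L)      # L is symmetric
    coeffs = eigvecs.T @ patch.reshape(-1)    # project the pixels onto the basis
    return eigvals, eigvecs, coeffs

# A 4x4 patch with a vertical edge down the middle.
patch = np.array([[0.1, 0.1, 0.9, 0.9]] * 4)
eigvals, eigvecs, coeffs = graph_fourier_transform(patch)

# The inverse transform recovers the patch because the basis is orthonormal.
recon = (eigvecs @ coeffs).reshape(patch.shape)
```

Because edges across the intensity jump receive very small weights, the basis adapts to the patch content, which is the property graph spectral methods exploit for compression and filtering.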
Physics-Guided Deep Learning for Dynamical Systems: A survey
Modeling complex physical dynamics is a fundamental task in science and
engineering. Traditional physics-based models are interpretable but rely on
rigid assumptions. Moreover, direct numerical approximation is usually
computationally intensive, requiring significant computational resources and
expertise. While deep learning (DL) provides novel alternatives for efficiently
recognizing complex patterns and emulating nonlinear dynamics, it does not
necessarily obey the governing laws of physical systems, nor does it generalize
well across different systems. Thus, the study of physics-guided DL has emerged
and made great progress. It aims to take the best from both physics-based
modeling and state-of-the-art DL models to better solve scientific problems. In
this paper, we provide a structured overview of existing methodologies for
integrating prior physical knowledge or physics-based modeling into DL and
discuss the emerging opportunities.
Controllable Neural Synthesis for Natural Images and Vector Art
Neural image synthesis approaches have become increasingly popular over the last years due to their ability to generate photorealistic images useful for several applications, such as digital entertainment, mixed reality, synthetic dataset creation, and computer art, to name a few. Despite the progress over the last years, current approaches lack two important aspects: (a) they often fail to capture long-range interactions in the image, and as a result, they fail to generate scenes with complex dependencies between their different objects or parts; (b) they often ignore the underlying 3D geometry of the shape/scene in the image, and as a result, they frequently lose coherency and details. My thesis proposes novel solutions to the above problems. First, I propose a neural transformer architecture that captures long-range interactions and context for image synthesis at high resolutions, leading to the synthesis of interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that could not be generated reliably with previous ConvNet- and other transformer-based approaches. The key idea of the architecture is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolution. I present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of the method and its superiority compared to the state of the art. Second, I propose a method that generates artistic images with the guidance of input 3D shapes. In contrast to previous methods, the use of a geometric representation of 3D shape enables the synthesis of more precise stylized drawings with fewer artifacts. My method outputs the synthesized images in a vector representation, enabling richer downstream analysis or editing in interactive applications. I also show that the method produces substantially better results than existing image-based methods, in terms of predicting artists’ drawings and in user evaluation of results.
RFD-ECNet: Extreme Underwater Image Compression with Reference to Feature Dictionar
Thriving underwater applications demand efficient extreme compression
technology to realize the transmission of underwater images (UWIs) in very
narrow underwater bandwidth. However, existing image compression methods
achieve inferior performance on UWIs because they do not consider the
characteristics of UWIs: (1) Multifarious underwater styles of color shift and
distance-dependent clarity, caused by the unique underwater physical imaging;
(2) Massive redundancy between different UWIs, caused by the fact that
different UWIs contain several common ocean objects, which have plenty of
similarities in structures and semantics. To remove redundancy among UWIs, we
first construct an exhaustive underwater multi-scale feature dictionary to
provide coarse-to-fine reference features for UWI compression. Subsequently, an
extreme UWI compression network with reference to the feature dictionary
(RFD-ECNet) is creatively proposed, which utilizes feature match and reference
feature variant to significantly remove redundancy among UWIs. To align the
multifarious underwater styles and improve the accuracy of feature match, an
underwater style normalized block (USNB) is proposed, which utilizes underwater
physical priors extracted from the underwater physical imaging model to
normalize the underwater styles of dictionary features toward the input.
Moreover, a reference feature variant module (RFVM) is designed to adaptively
morph the reference features, improving the similarity between the reference
and input features. Experimental results on four UWI datasets show that our
RFD-ECNet is the first work that achieves a significant BD-rate saving of 31%
over the most advanced VVC.
Guided Depth Super-Resolution by Deep Anisotropic Diffusion
Performing super-resolution of a depth image using the guidance from an RGB
image is a problem that concerns several fields, such as robotics, medical
imaging, and remote sensing. While deep learning methods have achieved good
results in this problem, recent work highlighted the value of combining modern
methods with more formal frameworks. In this work, we propose a novel approach
which combines guided anisotropic diffusion with a deep convolutional network
and advances the state of the art for guided depth super-resolution. The edge
transferring/enhancing properties of the diffusion are boosted by the
contextual reasoning capabilities of modern networks, and a strict adjustment
step guarantees perfect adherence to the source image. We achieve unprecedented
results in three commonly used benchmarks for guided depth super-resolution.
The performance gain compared to other methods is the largest at larger scales,
such as x32 scaling. Code for the proposed method will be made available to
promote reproducibility of our results.
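To make the two ingredients named in the abstract above more concrete (edge-aware diffusion steered by a guide image, and an adjustment step that enforces adherence to the low-resolution source), here is a minimal classical sketch. It is not the authors' method: it omits the deep network entirely, and the Perona-Malik style coefficients, the multiplicative block-mean adjustment, and the parameters `k`, `lam`, and `iters` are assumptions chosen for illustration. The guide is taken to be a single-channel image at the target resolution, and depths are assumed positive.

```python
import numpy as np

def diffusion_coeffs(guide, k=0.05):
    """Edge-stopping weights from guide gradients (Perona-Malik style, assumed)."""
    dv = guide[1:, :] - guide[:-1, :]   # differences between vertical neighbours
    dh = guide[:, 1:] - guide[:, :-1]   # differences between horizontal neighbours
    cv = 1.0 / (1.0 + (dv / k) ** 2)
    ch = 1.0 / (1.0 + (dh / k) ** 2)
    return cv, ch

def diffuse_step(depth, cv, ch, lam=0.2):
    """One explicit diffusion step; flow is suppressed across guide edges."""
    fv = cv * (depth[1:, :] - depth[:-1, :])   # flux between rows i and i+1
    fh = ch * (depth[:, 1:] - depth[:, :-1])   # flux between columns j and j+1
    out = depth.copy()
    out[:-1, :] += lam * fv
    out[1:, :] -= lam * fv
    out[:, :-1] += lam * fh
    out[:, 1:] -= lam * fh
    return out

def adjust(depth, source, scale):
    """Rescale each scale x scale block so its mean equals the source pixel."""
    h, w = source.shape
    blocks = depth.reshape(h, scale, w, scale)
    means = blocks.mean(axis=(1, 3))
    ratio = source / np.maximum(means, 1e-8)
    return (blocks * ratio[:, None, :, None]).reshape(h * scale, w * scale)

def guided_sr(source, guide, scale, iters=100):
    """Upsample `source` by `scale`, diffusing along the structure of `guide`."""
    depth = np.kron(source, np.ones((scale, scale)))  # nearest-neighbour upsampling
    cv, ch = diffusion_coeffs(guide)
    for _ in range(iters):
        depth = diffuse_step(depth, cv, ch)
        depth = adjust(depth, source, scale)
    return depth

# Usage with synthetic data (hypothetical inputs for illustration only).
rng = np.random.default_rng(0)
source = rng.uniform(1.0, 2.0, size=(4, 4))    # low-resolution depth
guide = rng.uniform(0.0, 1.0, size=(16, 16))   # high-resolution guide intensity
result = guided_sr(source, guide, scale=4, iters=20)
```

Running the adjustment after every diffusion step means that average-pooling the output by `scale` reproduces the source, which is one simple way to read the "perfect adherence" property the abstract describes.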
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation
Significant progress has recently been made in creative applications of large
pre-trained models for downstream tasks in 3D vision, such as text-to-shape
generation. This motivates our investigation of how these pre-trained models
can be used effectively to generate 3D shapes from sketches, which has largely
remained an open challenge due to the limited sketch-shape paired datasets and
the varying level of abstraction in the sketches. We discover that conditioning
a 3D generative model on the features (obtained from a frozen large pre-trained
vision model) of synthetic renderings during training enables us to effectively
generate 3D shapes from sketches at inference time. This suggests that the
large pre-trained vision model features carry semantic signals that are
resilient to domain shifts, i.e., allowing us to use only RGB renderings, but
generalizing to sketches at inference time. We conduct a comprehensive set of
experiments investigating different design factors and demonstrate the
effectiveness of our straightforward approach for generating multiple 3D
shapes per input sketch, regardless of their level of abstraction, without
requiring any paired datasets during training.
A Review of Deep Learning Techniques for Speech Processing
The field of speech processing has undergone a transformative shift with the
advent of deep learning. The use of multiple processing layers has enabled the
creation of models capable of extracting intricate features from speech data.
This development has paved the way for unparalleled advancements in speech
recognition, text-to-speech synthesis, automatic speech recognition, and
emotion recognition, propelling the performance of these tasks to unprecedented
heights. The power of deep learning techniques has opened up new avenues for
research and innovation in the field of speech processing, with far-reaching
implications for a range of industries and applications. This review paper
provides a comprehensive overview of the key deep learning models and their
applications in speech-processing tasks. We begin by tracing the evolution of
speech processing research, from early approaches, such as MFCC and HMM, to
more recent advances in deep learning architectures, such as CNNs, RNNs,
transformers, conformers, and diffusion models. We categorize the approaches
and compare their strengths and weaknesses for solving speech-processing tasks.
Furthermore, we extensively cover various speech-processing tasks, datasets,
and benchmarks used in the literature and describe how different deep-learning
networks have been utilized to tackle these tasks. Additionally, we discuss the
challenges and future directions of deep learning in speech processing,
including the need for more parameter-efficient, interpretable models and the
potential of deep learning for multimodal speech processing. By examining the
field's evolution, comparing and contrasting different approaches, and
highlighting future directions and challenges, we hope to inspire further
research in this exciting and rapidly advancing field.
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing. Much research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years there have been attempts to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.