308 research outputs found
Optimising Spatial and Tonal Data for PDE-based Inpainting
Some recent methods for lossy signal and image compression store only a few
selected pixels and fill in the missing structures by inpainting with a partial
differential equation (PDE). Suitable operators include the Laplacian, the
biharmonic operator, and edge-enhancing anisotropic diffusion (EED). The
quality of such approaches depends substantially on the selection of the data
that is kept. Optimising this data in the domain and codomain gives rise to
challenging mathematical problems that shall be addressed in our work.
In the 1D case, we prove results that provide insights into the difficulty of
this problem, and we give evidence that a splitting into spatial and tonal
(i.e. function value) optimisation does hardly deteriorate the results. In the
2D setting, we present generic algorithms that achieve a high reconstruction
quality even if the specified data is very sparse. To optimise the spatial
data, we use a probabilistic sparsification, followed by a nonlocal pixel
exchange that avoids getting trapped in bad local optima. After this spatial
optimisation we perform a tonal optimisation that modifies the function values
in order to reduce the global reconstruction error. For homogeneous diffusion
inpainting, this comes down to a least squares problem for which we prove that
it has a unique solution. We demonstrate that it can be found efficiently with
a gradient descent approach that is accelerated with fast explicit diffusion
(FED) cycles. Our framework allows to specify the desired density of the
inpainting mask a priori. Moreover, is more generic than other data
optimisation approaches for the sparse inpainting problem, since it can also be
extended to nonlinear inpainting operators such as EED. This is exploited to
achieve reconstructions with state-of-the-art quality.
We also give an extensive literature survey on PDE-based image compression
methods
Deep learning of curvature features for shape completion
The paper presents a novel solution to the issue of incomplete regions in 3D meshes obtained through
digitization. Traditional methods for estimating the surface of missing geometry and topology often
yield unrealistic outcomes for intricate surfaces. To overcome this limitation, the paper proposes
a neural network-based approach that generates points in areas where geometric information is
lacking. The method employs 2D inpainting techniques on color images obtained from the original
mesh parameterization and curvature values. The network used in this approach can reconstruct the
curvature image, which then serves as a reference for generating a polygonal surface that closely
resembles the predicted one. The paper’s experiments show that the proposed method effectively fills
complex holes in 3D surfaces with a high degree of naturalness and detail. This paper improves the
previous work in terms of a more in-depth explanation of the different stages of the approach as well
as an extended results section with exhaustive experiments.Spanish Ministry of Science
and Technology under projects PID2020-119478GB-I00TED2021-132702B-C21MCIN/AEI/10.13039/501100
011033European Regional Development Fund (ERDF
Recommended from our members
Learning to See with Minimal Human Supervision
Deep learning has significantly advanced computer vision in the past decade, paving the way for practical applications such as facial recognition and autonomous driving. However, current techniques depend heavily on human supervision, limiting their broader deployment. This dissertation tackles this problem by introducing algorithms and theories to minimize human supervision in three key areas: data, annotations, and neural network architectures, in the context of various visual understanding tasks such as object detection, image restoration, and 3D generation.
First, we present self-supervised learning algorithms to handle in-the-wild images and videos that traditionally require time-consuming manual curation and labeling. We demonstrate that when a deep network is trained to be invariant to geometric and photometric transformations, representations from its intermediate layers are highly predictive of object semantic parts such as eyes and noses. This insight offers a simple unsupervised learning framework that significantly improves the efficiency and accuracy of few-shot landmark prediction and matching. We then present a technique for learning single-view 3D object pose estimation models by utilizing in-the-wild videos where objects turn (e.g., cars in roundabouts). This technique achieves competitive performance with respect to existing state-of-the-art without requiring any manual labels during training. We also contribute an Accidental Turntables Dataset, containing a challenging set of 41,212 images of cars in cluttered backgrounds, motion blur, and illumination changes that serve as a benchmark for 3D pose estimation.
Second, we address variations in labeling styles across different annotators, which leads to a type of noisy label referred to as heterogeneous label. This variability in human annotation can cause subpar performance during both the training and testing phases. To mitigate this, we have developed a framework that models the labeling styles of individual annotators, reducing the impact of human annotation variations and enhancing the performance of standard object detection models. We have also applied this framework to analyze ecological data, which are often collected opportunistically across different case studies without consistent annotation guidelines. Through this application, we have obtained several insightful observations into large-scale bird migration behaviors and their relationship to climate change.
Our next study explores the challenges of designing neural networks, an area that lacks a comprehensive theoretical understanding. By linking deep neural networks with Gaussian processes, we propose a novel Bayesian interpretation of the deep image prior, which parameterizes a natural image as the output of a convolutional network with random parameters and random input. This approach offers valuable insights to optimize the design of neural networks for various image restoration tasks.
Lastly, we introduce several machine-learning techniques to reconstruct and edit 3D shapes from 2D images with minimal human effort. We first present a generic multi-modal generative model that bridges 2D images and 3D shapes via a shared latent space, and demonstrate its applications on versatile 3D shape generation and manipulation tasks. Additionally, we develop a framework for joint estimation of 3D neural scene representation and camera poses. This approach outperforms prior works and allows us to operate in the general SE(3) camera pose setting, unlike the baselines. The results also indicate this method can be complementary to classical structure-from-motion (SfM) pipelines as it compares favorably to SfM on low-texture and low-resolution images
Deep spatial and tonal data optimisation for homogeneous diffusion inpainting
Difusion-based inpainting can reconstruct missing image areas with high quality from sparse data, provided that their location and their values are well optimised. This is particularly useful for applications such as image compression, where the
original image is known. Selecting the known data constitutes a challenging optimisation problem, that has so far been only
investigated with model-based approaches. So far, these methods require a choice between either high quality or high speed
since qualitatively convincing algorithms rely on many time-consuming inpaintings. We propose the frst neural network
architecture that allows fast optimisation of pixel positions and pixel values for homogeneous difusion inpainting. During
training, we combine two optimisation networks with a neural network-based surrogate solver for difusion inpainting. This
novel concept allows us to perform backpropagation based on inpainting results that approximate the solution of the inpainting equation. Without the need for a single inpainting during test time, our deep optimisation accelerates data selection by
more than four orders of magnitude compared to common model-based approaches. This provides real-time performance
with high quality results
Recent Advances in Image Restoration with Applications to Real World Problems
In the past few decades, imaging hardware has improved tremendously in terms of resolution, making widespread usage of images in many diverse applications on Earth and planetary missions. However, practical issues associated with image acquisition are still affecting image quality. Some of these issues such as blurring, measurement noise, mosaicing artifacts, low spatial or spectral resolution, etc. can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, which includes image super-resolution, image fusion to enhance spatial, spectral resolution, and temporal resolutions, and the generation of synthetic images using deep learning techniques. Some practical applications are also included
FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly
While text-3D editing has made significant strides in leveraging score
distillation sampling, emerging approaches still fall short in delivering
separable, precise and consistent outcomes that are vital to content creation.
In response, we introduce FocalDreamer, a framework that merges base shape with
editable parts according to text prompts for fine-grained editing within
desired regions. Specifically, equipped with geometry union and dual-path
rendering, FocalDreamer assembles independent 3D parts into a complete object,
tailored for convenient instance reuse and part-wise control. We propose
geometric focal loss and style consistency regularization, which encourage
focal fusion and congruent overall appearance. Furthermore, FocalDreamer
generates high-fidelity geometry and PBR textures which are compatible with
widely-used graphics engines. Extensive experiments have highlighted the
superior editing capabilities of FocalDreamer in both quantitative and
qualitative evaluations.Comment: Project website: https://focaldreamer.github.i
Integrating View Conditions for Image Synthesis
In the field of image processing, applying intricate semantic modifications
within existing images remains an enduring challenge. This paper introduces a
pioneering framework that integrates viewpoint information to enhance the
control of image editing tasks. By surveying existing object editing
methodologies, we distill three essential criteria, consistency,
controllability, and harmony, that should be met for an image editing method.
In contrast to previous approaches, our method takes the lead in satisfying all
three requirements for addressing the challenge of image synthesis. Through
comprehensive experiments, encompassing both quantitative assessments and
qualitative comparisons with contemporary state-of-the-art methods, we present
compelling evidence of our framework's superior performance across multiple
dimensions. This work establishes a promising avenue for advancing image
synthesis techniques and empowering precise object modifications while
preserving the visual coherence of the entire composition
- …