
    On incorporating inductive biases into deep neural networks

    A machine learning (ML) algorithm can be interpreted as a system that learns to capture patterns in data distributions. Before the modern \emph{deep learning era}, structured representations and strong inductive biases, in an attempt to emulate the human brain, were prevalent in building ML models, partly because of expensive computational resources and the limited availability of data. In contrast, armed with increasingly cheap hardware and abundant data, deep learning has made unprecedented progress during the past decade, showing strong performance on a diverse set of ML tasks. Unlike \emph{classical ML} models, deep models seek to minimize structured representations and inductive bias during learning, implicitly favoring flexible learning over manual intervention. Despite this impressive performance, attention is turning to the (relatively) weaker areas of deep models, such as learning with limited resources, robustness, realizing simple relationships with minimal overhead, and generalizing learned representations beyond the training conditions, which were (arguably) the forte of classical ML. Consequently, a hybrid trend is emerging that aims to blend structured representations and substantial inductive bias into deep models in the hope of improving them. Motivated by this, the thesis investigates methods to improve the performance of deep models using inductive bias and structured representations across multiple problem domains. To this end, we inject a priori knowledge into deep models in the form of enhanced feature extraction techniques, geometric priors, engineered features, and optimization constraints. In particular, we show that by leveraging prior knowledge about the task at hand and the structure of the data, the performance of deep learning models can be significantly improved. We begin by exploring equivariant representation learning. 
In general, real-world observations are subject to fundamental transformations (e.g., translation, rotation), and deep models typically demand expensive data augmentation and a large number of filters to cope with such variance. In comparison, carefully designed equivariant filters handle these transformations by construction. Hence, we propose a novel \emph{volumetric convolution} operation that can convolve arbitrary functions in the unit ball ($\mathbb{B}^3$) while preserving rotational equivariance, by projecting the input data onto the Zernike basis. We conduct extensive experiments and show that our formulations can be used to construct significantly cheaper ML models. Next, we study generative modeling of 3D objects and propose a principled approach to synthesize 3D point clouds in the spectral domain by obtaining a structured representation of 3D points as functions on the unit sphere ($\mathbb{S}^2$). Using prior knowledge about the spectral moments and the output data manifold, we design an architecture that can maximally utilize the information in the inputs and generate high-resolution point clouds with minimal computational overhead. Finally, we propose a framework to build normalizing flows (NFs) based on increasing triangular maps and Bernstein-type polynomials. Compared to existing NF approaches, our framework has characteristics that favor fusing inductive bias into the model, i.e. theoretical upper bounds on the approximation error, robustness, higher interpretability, suitability for compactly supported densities, and the ability to employ higher-degree polynomials without training instability. Most importantly, we present a constructive universality proof, which allows us to analytically derive the optimal model coefficients for known transformations without training.
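The normalizing-flow construction above rests on a simple fact: a Bernstein-type polynomial with strictly increasing coefficients is strictly increasing on $[0, 1]$, so composing such polynomials into a triangular map keeps the map invertible and its Jacobian triangular. The Python sketch below illustrates only that building block, under our own assumptions; the function names, the softplus reparameterization of the coefficients, and the toy conditioner are hypothetical and are not taken from the thesis.

```python
import math
from typing import Callable, List, Sequence, Tuple


def bernstein(coeffs: Sequence[float], x: float) -> float:
    """Evaluate sum_k c_k * C(n, k) * x^k * (1 - x)^(n - k) for x in [0, 1]."""
    n = len(coeffs) - 1
    return sum(c * math.comb(n, k) * x**k * (1 - x) ** (n - k)
               for k, c in enumerate(coeffs))


def bernstein_derivative(coeffs: Sequence[float], x: float) -> float:
    """d/dx of the polynomial above: n * sum_k (c_{k+1} - c_k) * C(n-1, k) x^k (1-x)^(n-1-k)."""
    n = len(coeffs) - 1
    diffs = [coeffs[k + 1] - coeffs[k] for k in range(n)]
    return n * bernstein(diffs, x)


def increasing_coeffs(raw: Sequence[float]) -> List[float]:
    """Hypothetical reparameterization: free params -> 0 = c_0 < c_1 < ... < c_n = 1."""
    incs = [math.log1p(math.exp(r)) for r in raw]  # softplus keeps every increment > 0
    total = sum(incs)
    coeffs = [0.0]
    for inc in incs:
        coeffs.append(coeffs[-1] + inc / total)  # cumulative sum, normalized to end at 1
    return coeffs


def triangular_map(x1: float, x2: float, raw1: Sequence[float],
                   conditioner: Callable[[float], Sequence[float]]
                   ) -> Tuple[Tuple[float, float], float]:
    """2D increasing triangular map: y1 = T1(x1), y2 = T2(x2; x1)."""
    c1 = increasing_coeffs(raw1)
    c2 = increasing_coeffs(conditioner(x1))  # T2's coefficients depend on x1 only
    y1, y2 = bernstein(c1, x1), bernstein(c2, x2)
    # The Jacobian is lower-triangular, so log|det J| is the sum of the diagonal log-derivatives
    logdet = (math.log(bernstein_derivative(c1, x1))
              + math.log(bernstein_derivative(c2, x2)))
    return (y1, y2), logdet


# Toy usage with a hypothetical conditioner that maps x1 to three free parameters
(y1, y2), logdet = triangular_map(0.2, 0.7, [0.1, 0.2, 0.3],
                                  lambda x1: [x1, -x1, 2 * x1])
```

Because the Jacobian of a triangular map is lower-triangular, its log-determinant reduces to a sum of one-dimensional log-derivatives, which is what keeps density evaluation cheap in this family of flows.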

    3D-aware Blending with Generative NeRFs

    Image blending aims to combine multiple images seamlessly. It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes. To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRF), which includes two key components: 3D-aware alignment and 3D-aware blending. For 3D-aware alignment, we first estimate the camera pose of the reference image with respect to generative NeRFs and then perform 3D local alignment for each part. To further leverage the 3D information of the generative NeRF, we propose 3D-aware blending, which blends images directly in the NeRF's latent representation space rather than in raw pixel space. Collectively, our method outperforms existing 2D baselines, as validated by extensive quantitative and qualitative evaluations on FFHQ and AFHQ-Cat. Comment: ICCV 2023, Project page: https://blandocs.github.io/blendner
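The idea of blending in a generator's latent space rather than in pixel space can be illustrated with a toy problem. The Python sketch below assumes a hypothetical linear generator $G(w) = Aw$ and finds the latent code whose output matches the reference image inside a mask and the original image outside it, via weighted least squares. This is a generic latent-optimization illustration, not the paper's method: real generative NeRFs are nonlinear and would be optimized by gradient descent, and the names blend_in_latent_space and lam are ours.

```python
import numpy as np


def blend_in_latent_space(A: np.ndarray, ref_img: np.ndarray, orig_img: np.ndarray,
                          mask: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Find w minimizing ||m*(A w - ref)||^2 + lam*||(1-m)*(A w - orig)||^2.

    A is a hypothetical linear generator (pixels x latent dims); images are flattened.
    """
    m = mask.astype(float)
    w_in = np.sqrt(m)                  # rows inside the mask are pulled toward ref_img
    w_out = np.sqrt(lam * (1.0 - m))   # rows outside the mask are pulled toward orig_img
    lhs = np.vstack([w_in[:, None] * A, w_out[:, None] * A])
    rhs = np.concatenate([w_in * ref_img, w_out * orig_img])
    w, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return w


# Toy usage: 20 "pixels", 5 latent dimensions, mask covering the first 8 pixels
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
ref, orig = A @ rng.normal(size=5), A @ rng.normal(size=5)
mask = np.zeros(20)
mask[:8] = 1.0
blended = A @ blend_in_latent_space(A, ref, orig, mask)  # rendered blend G(w)
```

Unlike a pixel-space composite such as mask * ref + (1 - mask) * orig, the rendered result A @ w always lies in the range of the generator, which is the property that latent-space blending exploits.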

    Integrating GAN and Texture Synthesis for Enhanced Road Damage Detection

    In traffic safety and road maintenance, precise detection of road damage is crucial for ensuring safe driving and prolonging road durability. However, current methods often fall short because of limited data. Prior attempts have used Generative Adversarial Networks (GANs) to generate damage with diverse shapes and manually integrate it into appropriate positions. However, the problem has not been well explored and faces two challenges. First, these methods only enrich the location and shape of damage while neglecting the diversity of severity levels, and their realism still needs improvement. Second, they require a significant amount of manual effort. To address these challenges, we propose an approach that, in addition to using a GAN to generate damage with various shapes, employs texture synthesis techniques to extract road textures. These two elements are then mixed with different weights, allowing us to control the severity of the synthesized damage, which is then embedded back into the original images via Poisson blending. Our method ensures both a rich range of damage severity and better alignment with the background. To save labor costs, we leverage structural similarity for automated sample selection during embedding. The augmented data for each original image contain versions with varying severity levels. We implement a straightforward screening strategy to mitigate distribution drift. Experiments are conducted on a public road damage dataset. The proposed method not only eliminates the need for manual labor but also improves mAP by 4.1% and F1-score by 4.5%. Comment: 10 pages, 13 figures, 2 tables
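Two steps of this pipeline, severity-controlled mixing and similarity-based selection, can be sketched in a few lines of Python with NumPy. The sketch assumes single-channel patches, a scalar severity weight, and a simplified single-window SSIM (standard SSIM averages over local windows). It also assumes that selection means picking the candidate placement whose background is most similar to the patch, which is only one possible reading of the abstract. The function names are hypothetical, and Poisson blending is omitted.

```python
import numpy as np


def mix_damage(damage: np.ndarray, texture: np.ndarray, severity: float) -> np.ndarray:
    """Blend a GAN damage patch with a synthesized road texture.

    severity = 1 keeps the full damage; severity = 0 keeps only road texture.
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must lie in [0, 1]")
    return severity * damage + (1.0 - severity) * texture


def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM over whole patches (a simplification of windowed SSIM)."""
    a, b = a.astype(np.float64), b.astype(np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)


def best_location(image: np.ndarray, patch: np.ndarray, candidates):
    """Return the candidate top-left corner whose background region best matches the patch."""
    h, w = patch.shape
    scores = [global_ssim(image[r:r + h, c:c + w], patch) for r, c in candidates]
    return candidates[int(np.argmax(scores))]
```

For the final embedding step, OpenCV's cv2.seamlessClone implements Poisson image editing and could take the selected location as its center point.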

    Generating 3D faces using multi-column graph convolutional networks

    In this work, we introduce multi-column graph convolutional networks (MGCNs), a deep generative model for 3D mesh surfaces that effectively learns a non-linear facial representation. We perform spectral decomposition of meshes and apply convolutions directly in the frequency domain. Our network architecture involves multiple columns of graph convolutional networks (GCNs), namely a large GCN (L-GCN), a medium GCN (M-GCN) and a small GCN (S-GCN), with different filter sizes to extract features at different scales. L-GCN is better suited to extracting large-scale features, S-GCN is effective at extracting subtle, fine-grained features, and M-GCN captures information in between. Therefore, to obtain a high-quality representation, we propose a selective fusion method that adaptively integrates these three kinds of information. Spatially non-local relationships are also exploited through a self-attention mechanism to further improve the representation ability in the latent vector space. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction. Moreover, with the help of variational inference, our model has excellent generative ability.
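The spectral convolution and the multi-column idea can be sketched with NumPy on a small mesh graph. The sketch assumes that filters are polynomials in the eigenvalues of the graph Laplacian, so a degree-$K$ filter only mixes vertices up to $K$ hops apart and the degree plays the role of filter size. The column degrees and the per-vertex softmax gate used for selective fusion are our assumptions rather than the paper's exact design, and all names are hypothetical; self-attention and the variational components are omitted.

```python
import numpy as np


def graph_laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected mesh graph."""
    return np.diag(adj.sum(axis=1)) - adj


def spectral_conv(x: np.ndarray, lap: np.ndarray, theta) -> np.ndarray:
    """Filter vertex signals x (n, d) with g(lam) = sum_k theta[k] * lam**k in the graph Fourier basis."""
    lam, U = np.linalg.eigh(lap)            # eigenvectors of L form the graph Fourier basis
    g = np.polyval(list(theta)[::-1], lam)  # polyval expects the highest degree first
    return U @ (g[:, None] * (U.T @ x))     # transform, scale each frequency, transform back


def selective_fusion(columns, gate_logits: np.ndarray) -> np.ndarray:
    """Per-vertex softmax over columns, then a weighted sum of the column outputs."""
    g = np.exp(gate_logits - gate_logits.max(axis=1, keepdims=True))
    g /= g.sum(axis=1, keepdims=True)       # (n, C), each row sums to 1
    stacked = np.stack(columns, axis=-1)    # (n, d, C)
    return (stacked * g[:, None, :]).sum(axis=-1)


def multi_column(x, lap, thetas, gate_logits):
    """Small/medium/large columns = filters of increasing polynomial degree (wider support)."""
    return selective_fusion([spectral_conv(x, lap, t) for t in thetas], gate_logits)


# Toy usage: a 6-vertex ring standing in for a mesh, with 3-D vertex features
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
L = graph_laplacian(adj)
x = np.random.default_rng(0).normal(size=(n, 3))
thetas = [[0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0]]  # degrees 1, 2, 4
fused = multi_column(x, L, thetas, np.zeros((n, 3)))
```

In a trained model the gate logits would be computed from the features themselves; they are fixed to zero here only to keep the example self-contained.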

    Medical image synthesis using generative adversarial networks: towards photo-realistic image synthesis

    This work addresses photo-realism in synthetic images. We introduce a modified generative adversarial network, StencilGAN: a perceptually-aware generative adversarial network that synthesizes images based on overlaid labelled masks. This technique can be a prominent solution to the scarcity of resources in the healthcare sector.