
    Physics-Informed Computer Vision: A Review and Perspectives

    Full text link
    Incorporating physical information into machine learning frameworks is opening up and transforming many application domains. Here the learning process is augmented by incorporating fundamental knowledge and governing physical laws. In this work we explore their utility for computer vision tasks in interpreting and understanding visual data. We present a systematic literature review of formulations of and approaches to computer vision tasks guided by physical laws. We begin by decomposing the popular computer vision pipeline into a taxonomy of stages and investigate approaches to incorporating governing physical equations in each stage. Existing approaches in each task are analyzed with regard to which governing physical processes are modeled, how they are formulated, and how they are incorporated, i.e., by modifying data (observation bias), modifying networks (inductive bias), or modifying losses (learning bias). The taxonomy offers a unified view of the application of physics-informed capabilities, highlighting where physics-informed learning has been conducted and where the gaps and opportunities are. Finally, we highlight open problems and challenges to inform future research. While still in its early days, the study of physics-informed computer vision promises to yield computer vision models with improved physical plausibility, accuracy, data efficiency, and generalization in increasingly realistic applications.
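    To make the three incorporation routes concrete, the sketch below illustrates the "learning bias" route: a data-fit term is combined with a penalty on the residual of an assumed governing law, here a scalar advection equation u_t + c·u_x = 0. The finite-difference discretization and function names are illustrative, not taken from the reviewed works.

    ```python
    import numpy as np

    def advection_residual(u, dx=1.0, dt=1.0, c=1.0):
        # Finite-difference residual of u_t + c * u_x = 0 on a (time, space) grid.
        u_t = (u[1:, :-1] - u[:-1, :-1]) / dt
        u_x = (u[:-1, 1:] - u[:-1, :-1]) / dx
        return u_t + c * u_x

    def physics_informed_loss(pred, target, weight=0.1):
        # Learning bias: fit the observations while penalizing violations
        # of the (assumed) governing physical law on the predicted field.
        data_term = np.mean((pred - target) ** 2)
        physics_term = np.mean(advection_residual(pred) ** 2)
        return data_term + weight * physics_term
    ```

    Observation bias would instead modify the training data (e.g., simulation-augmented samples), and inductive bias would bake the constraint into the network architecture itself.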

    Generative Models for Inverse Imaging Problems

    Get PDF

    Learning Patterns with Kernels and Learning Kernels from Patterns

    Get PDF
    A major technique in learning involves the identification of patterns and their use to make predictions. In this work, we examine the symbiotic relationship between patterns and Gaussian process regression (GPR), which is mathematically equivalent to kernel interpolation. We introduce techniques where GPR can be used to learn patterns in denoising and mode (signal) decomposition. Additionally, we present the kernel flow (KF) algorithm, which learns a kernel from patterns in the data with a methodology inspired by cross-validation. We further show how the KF algorithm can be applied to artificial neural networks (ANNs) to improve the learning of patterns in images.

    In our denoising and mode decomposition examples, we show how kernels can be constructed to estimate patterns that may be hidden due to data corruption; in other words, we demonstrate how to learn patterns with kernels. Donoho and Johnstone proposed a near-minimax method for reconstructing an unknown smooth function u from noisy data u + ζ by translating the empirical wavelet coefficients of u + ζ towards zero. We consider the situation where the prior information on the unknown function u may not be the regularity of u, but that of ℒu, where ℒ is a linear operator such as a partial differential equation (PDE) or a graph Laplacian. We show that a near-minimax approximation of u can be obtained by truncating the ℒ-gamblet (operator-adapted wavelet) coefficients of u + ζ. The recovery of u is precisely a Gaussian conditioning of u + ζ on measurement functions with a length scale dependent on the signal-to-noise ratio.

    We next introduce kernel mode decomposition (KMD), designed to learn the modes vi = ai(t)yi(θi(t)) of a (possibly noisy) signal Σi vi when the amplitudes ai, instantaneous phases θi, and periodic waveforms yi may all be unknown. GPR with Gabor wavelet-inspired kernels is used to estimate ai, θi, and yi. We show near machine-precision recovery under regularity and separation assumptions on the instantaneous amplitudes ai and frequencies θ̇i.

    GPR and kernel interpolation require the selection of an appropriate kernel modeling the data. We present the KF algorithm, a numerical-approximation approach to this selection. The main principle the method exploits is that a "good" kernel is able to make accurate predictions from small subsets of a training set; in this way, we learn a kernel from patterns. In image classification, we show that the learned kernels are able to classify accurately using only one training image per class, and they show signs of unsupervised learning. Furthermore, we combine the KF algorithm with conventional neural-network training. This combination trains the intermediate-layer outputs of the network simultaneously with the final-layer output. We test the proposed method on Convolutional Neural Networks (CNNs) and Wide Residual Networks (WRNs) without altering their structure or their output classifier, and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift without a significant increase in computational complexity relative to standard CNN and WRN training (with Dropout and Batch Normalization). As a whole, this work highlights the interplay between kernel techniques, pattern recognition, and numerical approximation.
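    As an illustration of the KF selection principle described above, here is a minimal numpy sketch: a candidate kernel is scored by how much accuracy is lost when the interpolant is built from a random half of the training data (the relative RKHS-norm error ρ), and the lengthscale with the lowest average score wins. The grid search stands in for the paper's stochastic-gradient flow on kernel parameters, and all names and the toy dataset are illustrative.

    ```python
    import numpy as np

    def rbf(X, Y, lengthscale):
        # Gaussian (RBF) kernel matrix between point sets X and Y.
        d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * lengthscale ** 2))

    def kf_rho(X, y, lengthscale, rng, reg=1e-8):
        # KF criterion: rho = 1 - ||v||^2 / ||u||^2, where u interpolates all
        # data and v interpolates a random half; small rho = "good" kernel.
        n = len(y)
        sub = rng.choice(n, size=n // 2, replace=False)
        K = rbf(X, X, lengthscale) + reg * np.eye(n)
        Kc = rbf(X[sub], X[sub], lengthscale) + reg * np.eye(len(sub))
        full = y @ np.linalg.solve(K, y)        # squared RKHS norm of u
        coarse = y[sub] @ np.linalg.solve(Kc, y[sub])  # squared RKHS norm of v
        return 1.0 - coarse / full

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(64, 1))
    y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(64)
    # Grid-search variant of KF: pick the lengthscale with the smallest mean rho.
    candidates = [0.05, 0.1, 0.2, 0.5, 1.0]
    scores = [np.mean([kf_rho(X, y, s, rng) for _ in range(20)]) for s in candidates]
    best = candidates[int(np.argmin(scores))]
    ```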

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    Full text link
    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. This has not prevented the design of image representations that trade off efficiency against complexity, achieving accurate rendering of smooth regions while faithfully reproducing contours and textures. The most recent ones, proposed in the past decade, share a hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain, and sometimes invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.
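    For orientation, the sketch below implements the classical baseline that these oriented dictionaries extend: one level of the separable 2D Haar wavelet transform, whose three detail bands already exhibit coarse directional selectivity. It is a generic textbook construction, not an implementation from the surveyed papers.

    ```python
    import numpy as np

    def haar2d(img):
        # One level of the separable 2D Haar transform: an approximation band
        # plus three oriented detail bands. Image height/width must be even.
        a = (img[0::2, :] + img[1::2, :]) / 2.0   # low-pass along rows
        d = (img[0::2, :] - img[1::2, :]) / 2.0   # high-pass along rows
        ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # smooth approximation
        lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # high-pass across columns: vertical edges
        hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # high-pass across rows: horizontal edges
        hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
        return ll, lh, hl, hh

    img = np.zeros((8, 8))
    img[:, 3:] = 1.0                # vertical step edge
    ll, lh, hl, hh = haar2d(img)
    # The vertical edge concentrates its energy in lh alone, illustrating the
    # (coarse) orientation selectivity that richer dictionaries refine.
    ```

    Curvelets, contourlets, and shearlets can be read as refinements of this picture: more orientations per scale, at the cost of redundancy.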

    Sensor Signal and Information Processing II

    Get PDF
    In the current age of information explosion, newly invented technological sensors and software are now tightly integrated with our everyday lives. Many sensor processing algorithms have incorporated some form of computational intelligence as part of their core framework for problem solving. These algorithms have the capacity to generalize, discover knowledge for themselves, and learn new information whenever unseen data are captured. The primary aim of sensor processing is to develop techniques to interpret, understand, and act on the information contained in the data. The interest of this book is in developing intelligent signal processing in order to pave the way for smart sensors. This involves mathematical advancement of nonlinear signal processing theory and its applications, extending far beyond traditional techniques. It bridges the boundary between theory and application, developing novel theoretically inspired methodologies targeting both longstanding and emergent signal processing applications. The topics range from phishing detection to integration of terrestrial laser scanning, and from fault diagnosis to bio-inspired filtering. The book will appeal to established practitioners, along with researchers and students in the emerging field of smart sensor processing.

    A New Algorithm for Compression of High Amplitude Resolution Seismic Data

    Get PDF
    Renewable sources cannot meet the energy demand of a growing global market. It is therefore expected that oil & gas will remain a substantial source of energy in the coming years. To find new oil & gas deposits that would satisfy growing global energy demands, significant efforts are constantly invested in increasing the efficiency of seismic surveys. It is commonly considered that, in the initial phase of exploration and production of a new field, high-resolution and high-quality images of the subsurface are of great importance. As one step in the seismic data processing chain, efficient management and delivery of the large data sets produced by the industry during seismic surveys becomes extremely important in order to facilitate further seismic data processing and interpretation. In this respect, efficiency to a large extent relies on the efficiency of the compression scheme, which is often required to enable faster transfer of and access to data, as well as efficient data storage.

    Motivated by the superior performance of High Efficiency Video Coding (HEVC), and driven by the rapid growth in data volume produced by seismic surveys, this work explores a 32 bits per pixel (b/p) extension of the HEVC codec for compression of seismic data. It is proposed to reassemble seismic slices into a format that corresponds to a video signal and to benefit from the coding gain achieved by the HEVC inter mode, besides the possible advantages of the (still image) HEVC intra mode. To this end, this work modifies almost all components of the original HEVC codec to cater for high bit-depth coding of seismic data: the Lagrange multiplier used in optimization of the coding parameters has been adapted to the new data statistics, the core transform and quantization have been reimplemented to handle the increased bit-depth range, and a modified adaptive binary arithmetic coder has been employed for efficient entropy coding. In addition, optimized block selection, reduced intra prediction modes, and flexible motion estimation are tested to adapt to the structure of seismic data. Even though the new codec, after implementation of the proposed modifications, goes beyond the standardized HEVC, it still maintains a generic HEVC structure and is developed under the general HEVC framework.

    There is no similar work in the field of seismic data compression that uses HEVC as the base codec. Thus, a specific codec design has been tailored which, when compared to JPEG-XR and a commercial wavelet-based codec, significantly improves the peak signal-to-noise ratio (PSNR) vs. compression ratio performance for 32 b/p seismic data. Depending on the proposed configuration, the PSNR gain ranges from 3.39 dB up to 9.48 dB. Also, relying on the specific characteristics of seismic data, an optimized encoder is proposed in this work. It reduces encoding time by 67.17% for the All-I configuration on the trace-image dataset, and by 67.39% for the All-I, 97.96% for the P2, and 98.64% for the B configuration on the 3D wavefield dataset, with negligible coding performance losses. As a side contribution, HEVC is analyzed across all of its functional units, so that the presented work can also serve as an overview of the methods incorporated into the standard.
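    For reference, here is a minimal sketch of the two evaluation metrics quoted above. Taking the PSNR peak as the dynamic range of the original slice is an assumption on our part (conventions for 32 b/p seismic samples vary, and the thesis may use a fixed peak such as 2^32 - 1); the function names are illustrative.

    ```python
    import numpy as np

    def psnr_32bpp(original, reconstructed):
        # PSNR in dB for high bit-depth seismic slices. The peak is taken as
        # the dynamic range of the original data (an assumption here).
        orig = original.astype(np.float64)
        rec = reconstructed.astype(np.float64)
        mse = np.mean((orig - rec) ** 2)
        if mse == 0.0:
            return np.inf
        peak = orig.max() - orig.min()
        return 10.0 * np.log10(peak ** 2 / mse)

    def compression_ratio(raw_bytes, coded_bytes):
        # Ratio of uncompressed to compressed size, e.g., 10.0 means 10:1.
        return raw_bytes / coded_bytes
    ```

    The reported PSNR-vs-ratio curves can then be read as sweeping the quantization parameter of the codec and recording both quantities per operating point.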