
    Wavelet pooling for convolutional neural networks

    Bachelor's thesis in Computer Engineering (Treballs Finals de Grau d'Enginyeria Informàtica), Facultat de Matemàtiques, Universitat de Barcelona. Year: 2018. Advisor: Petia Radeva. Wavelets are mathematical functions used in many computer vision problems, such as image denoising and image compression. In this work, we first study the basic theory of wavelets to build the background needed to develop a new application. To that end, we propose two pooling methods based on wavelets: one based on a simple wavelet basis and one that combines two bases working in parallel. We test them and show that they can reach the same level of performance as max and average pooling.
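To make the idea of wavelet-based pooling concrete, here is a minimal sketch in plain Python. It is not the thesis's method: it assumes the Haar basis, keeps only the approximation (LL) subband of a one-level 2D transform, and the function name `haar_pool` is hypothetical.

```python
def haar_pool(feature_map):
    """Halve a 2D feature map by keeping the Haar approximation (LL) subband.

    Illustrative assumption: orthonormal Haar filters, one decomposition level,
    even height and width. For each 2x2 block [[a, b], [c, d]] the LL
    coefficient is (a + b + c + d) / 2.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    pooled = []
    for i in range(0, rows, 2):
        out_row = []
        for j in range(0, cols, 2):
            a, b = feature_map[i][j], feature_map[i][j + 1]
            c, d = feature_map[i + 1][j], feature_map[i + 1][j + 1]
            out_row.append((a + b + c + d) / 2.0)
        pooled.append(out_row)
    return pooled


fmap = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
pooled = haar_pool(fmap)
```

With the Haar basis the LL subband is simply twice the 2x2 average, so this particular sketch behaves like scaled average pooling; other wavelet bases would weight the neighbourhood differently.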

    A Study in Image Watermarking Schemes using Neural Networks

    Digital watermarking, an effective way to protect images, has become a research focus in the neural network field. The purpose of this paper is to provide a brief study of the broad theory and to discuss the different types of neural networks used for image watermarking. Most research on neural network-based image watermarking works in the discrete wavelet transform or discrete cosine transform domain. Generally, neural networks are applied to image watermarking to reduce error, improve the learning rate, and achieve good imperceptibility and robustness. This study should be useful for researchers implementing effective image watermarking with neural networks.

    Driver Distraction Identification with an Ensemble of Convolutional Neural Networks

    The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide, and the number has been continuously increasing over the last few years. Nearly a fifth of these accidents are caused by distracted drivers. Existing work on distracted driver detection is concerned with a small set of distractions (mostly cell phone usage), and unreliable ad-hoc methods are often used. In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves 90% accuracy. The system consists of a genetically-weighted ensemble of convolutional neural networks; we show that weighting an ensemble of classifiers with a genetic algorithm yields better classification confidence. We also study the effect of different visual elements in distraction detection by means of face and hand localization and skin segmentation. Finally, we present a thinned version of our ensemble that achieves 84.64% classification accuracy and can operate in a real-time environment. Comment: arXiv admin note: substantial text overlap with arXiv:1706.0949
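The weighted-ensemble step can be sketched as a weighted average of per-model class probabilities. This is only an illustration under stated assumptions: the weights are passed in by hand (the paper finds them with a genetic algorithm, which is not reproduced here), and the names `weighted_ensemble` and `predict` are hypothetical.

```python
def weighted_ensemble(prob_lists, weights):
    """Weighted average of class-probability vectors, one vector per model.

    Assumption: weights are non-negative and given externally; they are
    normalised here so that they sum to 1.
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * len(prob_lists[0])
    for probs, weight in zip(prob_lists, weights):
        for k, p in enumerate(probs):
            combined[k] += (weight / total) * p
    return combined


def predict(prob_lists, weights):
    """Return the index of the class with the highest combined probability."""
    combined = weighted_ensemble(prob_lists, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Two hypothetical models that disagree on a two-class problem.
model_probs = [[0.6, 0.4], [0.2, 0.8]]
label_trusting_first = predict(model_probs, [9, 1])
label_trusting_second = predict(model_probs, [1, 3])
```

Changing the weights changes which model dominates the vote, which is the quantity a search procedure such as a genetic algorithm would tune against validation data.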

    MULTIWAVELET NEURAL NETWORK PREPROCESSING OF IRREGULARLY SAMPLED DATA

    Abstract. Multiwavelets are briefly reviewed and preprocessing and postprocessing for such wavelets are introduced. Least squares curve fitting of irregularly sampled data is achieved by means of unshifted and shifted multiscaling functions. This preprocessing procedure combined with multiwavelet neural networks for data-adaptive curve fitting is shown to perform well in the case of high resolution. In the case of low resolution it is more accurate than numerical integration and cheaper than matrix inversion

    High Performance Techniques for Face Recognition

    The identification of individuals using face recognition techniques is a challenging task. This is due to the variations resulting from facial expressions, makeup, rotations, illuminations, gestures, etc. Also, facial images contain a great deal of redundant information, which negatively affects the performance of the recognition system. The dimensionality and the redundancy of the facial features have a direct effect on the face recognition accuracy. Not all the features in the feature vector space are useful. For example, non-discriminating features in the feature vector space not only degrade the recognition accuracy but also increase the computational complexity. In the fields of computer vision, pattern recognition, and image processing, face recognition has become a popular research topic. This is due to its widespread applications in security and control, which allow the identified individual to access secure areas, personal information, etc. The performance of any recognition system depends on three factors: 1) the storage requirements, 2) the computational complexity, and 3) the recognition rates. Two different recognition system families are presented and developed in this dissertation. Each family consists of several face recognition systems. Each system contains three main steps, namely, preprocessing, feature extraction, and classification. Several preprocessing steps, such as cropping, facial detection, dividing the facial image into sub-images, etc., are applied to the facial images. This reduces the effect of the irrelevant information (background) and improves the system performance. In this dissertation, either a Neural Network (NN) based classifier or Euclidean distance is used for classification purposes. Five widely used databases, namely, ORL, YALE, FERET, FEI, and LFW, each containing different facial variations, such as light condition, rotations, facial expressions, facial details, etc., are used to evaluate the proposed systems.
The experimental results of the proposed systems are analyzed using K-fold Cross Validation (CV). In family-1, several systems are proposed for face recognition. Each system employs different integrated tools in the feature extraction step. These tools, Two Dimensional Discrete Multiwavelet Transform (2D DMWT), 2D Radon Transform (2D RT), 2D or 3D DWT, and Fast Independent Component Analysis (FastICA), are applied to the processed facial images to reduce the dimensionality and to obtain discriminating features. Each proposed system produces a unique representation, and achieves lower storage requirements and better performance than the existing methods. For further facial compression, there are three face recognition systems in the second family. Each system uses different integrated tools to obtain better facial representation. The integrated tools, Vector Quantization (VQ), Discrete Cosine Transform (DCT), and 2D DWT, are applied to the facial images for further facial compression and better facial representation. In the systems using the tools VQ/2D DCT and VQ/2D DWT, each pose in the databases is represented by one centroid with 4*4*16 dimensions. In the third system, VQ/Facial Part Detection (FPD), each person in the databases is represented by four centroids with 4*Centroids (4*4*16) dimensions. The systems in family-2 are proposed to further reduce the dimensions of the data compared to the systems in family-1 while attaining comparable results. For example, in family-1, the integrated tools FastICA/2D DMWT, applied to different combinations of sub-images in the FERET database with K-fold=5 (9 different poses used in the training mode), reduce the dimensions of the database by 97.22% and achieve 99% accuracy. In contrast, the integrated tools VQ/FPD in family-2 reduce the dimensions of the data by 99.31% and achieve 97.98% accuracy.
In this example, the integrated tools VQ/FPD achieved greater data compression but lower accuracy than the FastICA/2D DMWT tools. Various experiments and simulations were carried out in MATLAB. The experimental results of both families confirm improvements in storage requirements as well as recognition rates compared with some recently reported methods.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

    Transformer Meets Boundary Value Inverse Problems

    A Transformer-based deep direct sampling method is proposed for a class of boundary value inverse problems. A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and the reconstructed images. An effort is made to give a specific example of a fundamental question: whether and how one can benefit from the theoretical structure of a mathematical problem to develop task-oriented and structure-conforming deep neural networks. Specifically, inspired by direct sampling methods for inverse problems, the 1D boundary data in different frequencies are preprocessed by a partial differential equation-based feature map to yield 2D harmonic extensions as different input channels. Then, by introducing learnable non-local kernels, the direct sampling is recast as a modified attention mechanism. The proposed method is then applied to electrical impedance tomography, a well-known severely ill-posed nonlinear inverse problem. The new method achieves superior accuracy over its predecessors and contemporary operator learners, and is robust to noise. This research strengthens the insight that the attention mechanism, despite being invented for natural language processing tasks, offers great flexibility to be modified in conformity with a priori mathematical knowledge, which ultimately leads to the design of more physics-compatible neural architectures.
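For readers unfamiliar with the attention mechanism that the abstract builds on, the following is a minimal sketch of standard scaled dot-product attention in plain Python. It is the textbook baseline, not the modified, kernel-based attention the paper proposes, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Standard scaled dot-product attention on lists of vectors.

    Each query is compared with every key; the softmax of the scaled dot
    products gives weights used to average the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs


# With identical keys every value gets the same weight,
# so the output is the plain mean of the value vectors.
result = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Every output position depends on all input positions, which is the non-local behaviour the abstract relates to direct sampling.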

    Neural Operator: Is data all you need to model the world? An insight into the impact of Physics Informed Machine Learning

    Numerical approximations of partial differential equations (PDEs) are routinely employed to formulate the solution of physics, engineering and mathematical problems involving functions of several variables, such as the propagation of heat or sound, fluid flow, elasticity, electrostatics, electrodynamics, and more. While this has made many complex phenomena tractable, there are some limitations. Conventional approaches such as Finite Element Methods (FEMs) and Finite Difference Methods (FDMs) require considerable time and are computationally expensive. In contrast, data-driven machine learning-based methods such as neural networks provide a faster, fairly accurate alternative, and have certain advantages such as discretization invariance and resolution invariance. This article aims to provide comprehensive insight into how data-driven approaches can complement conventional techniques to solve engineering and physics problems, while also noting some of the major pitfalls of machine learning-based approaches. Furthermore, we highlight a novel and fast (~1000x) machine learning-based approach to learning the solution operator of a PDE: operator learning. We note how these new computational approaches can bring immense advantages in tackling many problems in fundamental and applied physics.