20 research outputs found

    Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images

    Modeling statistical regularity plays an essential role in ill-posed image processing problems. Recently, deep-learning-based methods have been presented that implicitly learn a statistical representation of pixel distributions in natural images and leverage it as a constraint to facilitate subsequent tasks, such as color constancy and image dehazing. However, the existing CNN architecture is susceptible to the variability and diversity of pixel intensity within and between local regions, which may result in an inaccurate statistical representation. To address this problem, this paper presents a novel fully point-wise CNN architecture for modeling statistical regularities in natural images. Specifically, we propose to randomly shuffle the pixels in the original images and use the shuffled image as input, so that the CNN focuses on the statistical properties. Moreover, since the pixels in the shuffled image are independent and identically distributed, we can replace all the large convolution kernels in the CNN with point-wise (1×1) convolution kernels while maintaining the representation ability. Experimental results on two applications, color constancy and image dehazing, demonstrate the superiority of our proposed network over the existing architectures: it uses 1/10 to 1/100 of the network parameters and computational cost while achieving comparable performance. Comment: 9 pages, 7 figures. To appear in ACM MM 201
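
The shuffling idea in this abstract can be illustrated with a minimal sketch. It is not the paper's network: the function names, the tiny 4x4 image, and the weights are all hypothetical, and plain Python lists stand in for tensors. The sketch only shows the property the abstract relies on: a 1×1 convolution applies the same linear map to every pixel independently, so shuffling pixel positions changes where the outputs land but not which output values occur.

```python
import random


def shuffle_pixels(image, rng):
    """Return a copy of `image` with pixel positions randomly permuted.

    `image` is a list of rows; each row is a list of pixels; each pixel is
    a list of channel values. Channel values stay together, only the
    positions of whole pixels change.
    """
    width = len(image[0])
    pixels = [pixel for row in image for pixel in row]
    rng.shuffle(pixels)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def pointwise_conv(image, weights, bias):
    """Apply a 1x1 convolution: one shared linear map per pixel.

    `weights[o][c]` is the weight from input channel c to output channel o.
    """
    return [
        [
            [sum(w * v for w, v in zip(w_row, pixel)) + b
             for w_row, b in zip(weights, bias)]
            for pixel in row
        ]
        for row in image
    ]


def flat_sorted(image):
    """Flatten to a sorted list of pixels, discarding spatial positions."""
    return sorted(pixel for row in image for pixel in row)


rng = random.Random(0)
# Hypothetical 4x4 RGB image with integer intensities (exact arithmetic).
image = [[[rng.randrange(256) for _ in range(3)] for _ in range(4)]
         for _ in range(4)]
weights = [[1, -2, 3], [0, 4, -1]]  # 2 output channels, 3 input channels
bias = [5, -7]

shuffled = shuffle_pixels(image, rng)
out_original = pointwise_conv(image, weights, bias)
out_shuffled = pointwise_conv(shuffled, weights, bias)

# Same multiset of output pixels, regardless of where each pixel sits.
print(flat_sorted(out_original) == flat_sorted(out_shuffled))
```

For the same channel counts, a k×k kernel holds k² times as many weights as a 1×1 kernel, which is one source of the parameter savings the abstract reports.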

    Alternating Optimization: Constrained Problems, Adversarial Networks, and Robust Models

    Data-driven machine learning methods have achieved impressive performance in many industrial applications and academic tasks. Machine learning methods usually have two stages: training a model on large-scale samples, and inference on new samples after the model is deployed. Training modern models relies on solving difficult optimization problems that involve nonconvex, nondifferentiable objective functions and constraints; this is sometimes slow and often requires expertise to tune hyperparameters. While inference is much faster than training, it is often not fast enough for real-time applications. We focus on machine learning problems that can be formulated as a minimax problem in training, and study alternating optimization methods that serve as fast, scalable, stable, and automated solvers. First, we focus on the alternating direction method of multipliers (ADMM) for constrained problems in classical convex and nonconvex optimization. Popular machine learning applications include sparse and low-rank models, regularized linear models, total variation image processing, semidefinite programming, and consensus distributed computing. We propose adaptive ADMM (AADMM), a fully automated solver that achieves fast practical convergence by adapting the only free parameter in ADMM. We further automate several variants of ADMM (relaxed ADMM, multi-block ADMM, and consensus ADMM), and prove convergence rate guarantees that are widely applicable to variants of ADMM with changing parameters. We release fast implementations for more than ten applications and validate their efficiency on several benchmark datasets for each application. Second, we focus on the minimax problem of generative adversarial networks (GANs). We apply prediction steps to stabilize stochastic alternating methods for training GANs, and demonstrate the advantages of GAN-based losses for image processing tasks. We also propose GAN-based knowledge distillation methods to train small neural networks for inference acceleration, and empirically study the trade-off between acceleration and accuracy. Third, we present preliminary results on adversarial training for robust models. We study fast algorithms for the attack and defense for universal perturbations, and then explore network architectures to boost robustness.
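
To make "adapting the only free parameter in ADMM" concrete, here is a minimal sketch of ADMM with an adaptive penalty parameter `rho`. The abstract does not state the AADMM update rule, so this sketch substitutes residual balancing, a standard heuristic that raises `rho` when the primal residual dominates and lowers it when the dual residual dominates. The toy problem, the function names, and the constants `mu` and `tau` are assumptions chosen so the answer can be checked by hand: the minimizer is the soft-thresholding of `b` by `lam`.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.|, applied to a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def admm_l1_denoise(b, lam, rho=1.0, mu=10.0, tau=2.0,
                    tol=1e-9, max_iter=500):
    """Solve min 0.5*||x - b||^2 + lam*||z||_1 subject to x = z.

    Uses scaled-form ADMM; `rho` is adapted by residual balancing
    (a stand-in heuristic, not the AADMM rule from the thesis).
    """
    n = len(b)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(max_iter):
        # x-update: closed-form minimizer of the quadratic subproblem.
        x = [(bi + rho * (zi - ui)) / (1.0 + rho)
             for bi, zi, ui in zip(b, z, u)]
        # z-update: soft-thresholding with threshold lam / rho.
        z_old = z
        z = [soft_threshold(xi + ui, lam / rho) for xi, ui in zip(x, u)]
        # Dual update with the primal residual r = x - z.
        r = [xi - zi for xi, zi in zip(x, z)]
        u = [ui + ri for ui, ri in zip(u, r)]

        r_norm = norm(r)
        s_norm = rho * norm([zi - zo for zi, zo in zip(z, z_old)])
        if r_norm < tol and s_norm < tol:
            break
        # Residual balancing; rescale u so that rho * u stays unchanged.
        if r_norm > mu * s_norm:
            rho *= tau
            u = [ui / tau for ui in u]
        elif s_norm > mu * r_norm:
            rho /= tau
            u = [ui * tau for ui in u]
    return z


solution = admm_l1_denoise([3.0, -0.5, 1.2, -4.0], lam=1.0)
print([round(v, 6) for v in solution])
```

Rescaling `u` whenever `rho` changes keeps the unscaled dual variable `rho * u` fixed, which is what makes changing the penalty mid-run consistent in the scaled form.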

    Computational strategies for understanding underwater optical image datasets

    Thesis: Ph. D. in Mechanical and Oceanographic Engineering, Joint Program in Oceanography/Applied Ocean Science and Engineering (Massachusetts Institute of Technology, Department of Mechanical Engineering; and the Woods Hole Oceanographic Institution), 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 117-135). A fundamental problem in autonomous underwater robotics is the high latency between the capture of image data and the time at which operators are able to gain a visual understanding of the survey environment. Typical missions can generate imagery at rates hundreds of times greater than highly compressed images can be transmitted acoustically, delaying that understanding until after the vehicle has been recovered and the data analyzed. While automated classification algorithms can lessen the burden on human annotators after a mission, most are too computationally expensive or lack the robustness to run in situ on a vehicle. Fast algorithms designed for mission-time performance could lessen the latency of understanding by producing low-bandwidth semantic maps of the survey area that can then be telemetered back to operators during a mission. This thesis presents a lightweight framework for processing imagery in real time aboard a robotic vehicle. We begin with a review of pre-processing techniques for correcting illumination and attenuation artifacts in underwater images, presenting our own approach based on multi-sensor fusion and a strong physical model. Next, we construct a novel image pyramid structure that can reduce the complexity necessary to compute features across multiple scales by an order of magnitude, and recommend features which are fast to compute and invariant to underwater artifacts. Finally, we implement our framework on real underwater datasets and demonstrate how it can be used to select summary images for the purpose of creating low-bandwidth semantic maps capable of being transmitted acoustically. By Jeffrey W. Kaeli.

    Semi-supervised deep learning techniques for spectrum reconstruction from RGB images

    In this thesis we introduce techniques that come close to state-of-the-art performance in spectrum reconstruction while requiring as input only a few hyperspectral (HS) images or even pixels. We also show the importance of exploiting the physical model in the training pipeline to constrain the output and ease the whole process.

    Measurement model of brass plated tyre steel cord based on wave feature extraction

    In the production of Truck and Bus Radial (TBR) vehicle tyres, one of the essential components is the wire that supports the tyre. There are several types of tyre wire, one of which is Brass Plated Tyre Steel Cord (BPTSC), produced by Bekaert Indonesia Company. BPTSC is a micro-scale, wave-shaped object with a diameter of 0.230 mm. Quality checks are usually carried out manually by experienced experts, who measure the diameter with a micrometre and the wave amount and wavelength with a profile projector. Manual measurement is inaccurate because of eye fatigue and low lighting, and it must be repeated, which consumes more time. Computer vision technology is increasingly widespread, and based on a review of the literature, this study proposes combining existing models into a new model to solve this problem. The objectives of this study were to implement and evaluate an automatic segmentation method for obtaining regions of interest, to propose a BPTSC diameter, wave amount, and wavelength measurement model based on the cord's edge, and to evaluate the proposed model by comparing its results with standard and industrial measurement results. The BPTSC images were prepared in two ways. First, image acquisition techniques were applied with image quality enhancement, noise removal, and edge detection. Second, ground truth techniques were used to verify the stages of the image acquisition process. Finally, sensitivity testing was conducted to measure the similarity between the acquired images and the ground truth data using the Jaccard, Dice, and Cosine similarity methods. Over 148 wire samples, the average similarity was 93% by Jaccard, 96% by Dice, and 91% by the Cosine method. Thus, it can be concluded that the acquisition stage for BPTSC can be carried out with image processing techniques. In the subsequent process, the pixel distance and sliding window models correctly detect the diameter of the BPTSC. The wave amount and wavelength of the wave-shaped BPTSC were measured using several approaches based on local minima and maxima. For the wave amount, these were the maximum of the local minima-maxima distances, the average of the local minima-maxima distances, and the perpendicular shape-to-centre distance. For the wavelength, the midpoint of the local maxima-minima distance and the intersection of the local maxima and minima with a central line were used. The measurement results were evaluated against standard production values for accuracy and efficiency using accuracy, precision, recall, and Root Mean Square Error (RMSE). From the evaluation results of the two methods, the accuracy rate of diameter measurement is 97%, wave rate measurement is 95%, and wavelength measurement is 90%. A new model was formed from the evaluation results that could solve these problems and provide scientific and beneficial contributions to society in general and to companies in this industry.
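
The three similarity measures named in this abstract can be sketched for binary segmentation masks as follows. This is a minimal illustration under the assumption that both the acquired image and the ground truth are flattened to 0/1 lists of equal length; the function names and the sample masks are hypothetical, and the abstract does not say how the study itself computed these scores.

```python
import math


def overlap_counts(mask_a, mask_b):
    """Return (intersection size, size of A, size of B) for 0/1 masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    return inter, size_a, size_b


def jaccard(mask_a, mask_b):
    """|A and B| / |A or B|; two empty masks count as identical."""
    inter, size_a, size_b = overlap_counts(mask_a, mask_b)
    union = size_a + size_b - inter
    return inter / union if union else 1.0


def dice(mask_a, mask_b):
    """2 * |A and B| / (|A| + |B|); two empty masks count as identical."""
    inter, size_a, size_b = overlap_counts(mask_a, mask_b)
    total = size_a + size_b
    return 2.0 * inter / total if total else 1.0


def cosine_similarity(mask_a, mask_b):
    """|A and B| / sqrt(|A| * |B|) for binary vectors."""
    inter, size_a, size_b = overlap_counts(mask_a, mask_b)
    denom = math.sqrt(size_a * size_b)
    if denom == 0.0:
        # Convention: both empty -> 1.0, exactly one empty -> 0.0.
        return 1.0 if size_a == size_b else 0.0
    return inter / denom


# Hypothetical flattened masks: 1 = cord pixel, 0 = background.
segmented = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 1, 1, 0]
print(jaccard(segmented, ground_truth),
      dice(segmented, ground_truth),
      cosine_similarity(segmented, ground_truth))
```

Here the masks share 2 foreground pixels out of 3 each, so Jaccard is 2/4, Dice is 4/6, and cosine is 2/3. Jaccard is never larger than Dice on the same pair, which is consistent with the ordering of the Jaccard and Dice averages reported above.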