
    Extreme Image Compression with Deep Learning Autoencoder

    Image compression can save billions of dollars in industry by reducing the bits needed to store and transfer an image without significantly degrading visual quality. Traditional image compression methods use transforms, quantization, predictive coding, and entropy coding, represented by international standards such as JPEG (Joint Photographic Experts Group), JPEG 2000, BPG (Better Portable Graphics), and HEIC (High Efficiency Image File Format). Recently, deep learning based image compression approaches have achieved similar or better performance than traditional methods, represented by autoencoder, GAN (generative adversarial network), and super-resolution based approaches. In this paper, we built autoencoder-based pipelines for extreme end-to-end image compression based on Ballé's 2017 and 2018 approaches and improved the cost function and network structure. We replaced MSE (mean square error) with RMSE (root mean square error) in the cost function and deepened the network by adding one more hidden layer before each strided convolutional layer. The source code is available at bit.ly/deepimagecompressiongithub. Our 2018 approach outperformed Ballé's 2018 approach, the state-of-the-art open-source implementation of deep learning based image compression, in terms of PSNR (peak signal-to-noise ratio) and MS-SSIM (multi-scale structural similarity) at similar bpp (bits per pixel). It also outperformed all traditional image compression methods, including JPEG and HEIC, in terms of reconstructed image quality. Regarding encoding and decoding time, our 2018 approach takes significantly longer than traditional methods even with GPU support; this needs to be measured and improved in the future. Experimental results showed that deepening the network in an autoencoder can effectively increase model fitting without losing generalization when applied to image compression, provided the network is designed appropriately.
In the future, this image compression method could be applied to video compression if encoding and decoding time can be reduced to an acceptable level. Automatic neural architecture search might also help find the optimal network structure for the autoencoder. The optimizer could also be replaced with a trainable one, such as an LSTM (long short-term memory) based optimizer. Last but not least, the cost function could also include encoding and decoding time, so that these two metrics are optimized during training as well.
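The abstract's key change to the cost function, replacing the MSE distortion term with RMSE, can be sketched in plain Python. The `rate_distortion_cost` function and its `lam` weight are hypothetical illustrations of a generic rate-distortion objective, not the paper's actual formulation:

```python
import math

def mse(orig, recon):
    """Mean squared error between two equal-length pixel sequences."""
    n = len(orig)
    return sum((o - r) ** 2 for o, r in zip(orig, recon)) / n

def rmse(orig, recon):
    """Root mean squared error: the distortion term used in place of MSE."""
    return math.sqrt(mse(orig, recon))

def rate_distortion_cost(bpp, orig, recon, lam=0.01):
    # Hypothetical combined objective: rate (bits per pixel) plus a
    # lambda-weighted RMSE distortion; lam is an illustrative value only.
    return bpp + lam * rmse(orig, recon)

orig = [0.1, 0.5, 0.9, 0.3]
recon = [0.1, 0.4, 1.0, 0.3]
print(round(rmse(orig, recon), 4))  # → 0.0707
```

Because RMSE is in the same units as the pixel values, it changes the gradient scale relative to MSE, which can alter how the distortion term trades off against the rate term during training.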

    SAMPLE-BASED DYNAMIC HIERARCHICAL TRANSFORMER WITH LAYER AND HEAD FLEXIBILITY VIA CONTEXTUAL BANDIT

    Transformers require a fixed number of layers and heads, which makes them inflexible to the complexity of individual samples and expensive in training and inference. To address this, we propose a sample-based Dynamic Hierarchical Transformer (DHT) model whose layers and heads can be dynamically configured per data sample by solving contextual bandit problems. To determine the number of layers and heads, we use the Upper Confidence Bound algorithm, and we deploy combinatorial Thompson Sampling to select specific head combinations given their number. Unlike previous work that focuses on compressing trained networks for inference only, DHT not only adaptively optimizes the underlying network architecture during training but also yields a flexible network for efficient inference. To the best of our knowledge, this is the first comprehensive data-driven dynamic transformer that requires no additional auxiliary neural networks to implement the dynamic system. According to the experimental results, we achieve up to 74% computational savings for both training and inference with minimal loss of accuracy.
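The bandit-based architecture choice described above can be illustrated with a minimal UCB1 simulation, where each arm stands for a candidate layer count. The reward values and the toy setup are invented for illustration and do not reflect the paper's actual contextual formulation:

```python
import math
import random

def ucb_select(counts, values, t):
    """Pick the arm (e.g., a candidate layer count) maximizing the
    UCB1 score: empirical mean reward plus an exploration bonus."""
    for arm in range(len(counts)):
        if counts[arm] == 0:
            return arm  # play each arm once before applying the bound
    scores = [values[a] + math.sqrt(2 * math.log(t) / counts[a])
              for a in range(len(counts))]
    return scores.index(max(scores))

# Toy simulation: three hypothetical layer counts with different
# (unknown) expected rewards, e.g. accuracy minus compute cost.
random.seed(0)
true_reward = {0: 0.2, 1: 0.6, 2: 0.4}
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]
for t in range(1, 501):
    arm = ucb_select(counts, values, t)
    reward = true_reward[arm] + random.uniform(-0.05, 0.05)
    counts[arm] += 1
    # Incremental update of the empirical mean for the pulled arm.
    values[arm] += (reward - values[arm]) / counts[arm]
print(counts.index(max(counts)))  # the most-pulled arm
```

Over repeated pulls the bonus term shrinks for frequently chosen arms, so the selector concentrates on the configuration with the best observed reward while still occasionally re-checking the others; the paper's contextual variant additionally conditions this choice on features of the individual sample.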