    Depth Super-Resolution from Explicit and Implicit High-Frequency Features

    We propose a novel multi-stage depth super-resolution network, which progressively reconstructs high-resolution depth maps from explicit and implicit high-frequency features. The former are extracted by an efficient transformer processing both local and global contexts, while the latter are obtained by projecting color images into the frequency domain. Both are combined with depth features by means of a fusion strategy within a multi-stage and multi-scale framework. Experiments on the main benchmarks, such as NYUv2, Middlebury, DIML, and RGBDD, show that our approach outperforms existing methods by a large margin (~20% on NYUv2 and DIML against the contemporary work DADA, with 16x upsampling), establishing a new state of the art in guided depth super-resolution.
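
    As a rough illustration of the "implicit" branch described above, the sketch below extracts the high-frequency content of a guidance image with an FFT high-pass and adds a fraction of it to an upsampled depth map. The cutoff_ratio, the naive additive fusion, and all names are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch of the "implicit high-frequency features" idea: project a
# guidance image into the frequency domain, keep only the high-frequency band,
# and use it to sharpen an upsampled depth map.
import numpy as np

def high_frequency_component(img: np.ndarray, cutoff_ratio: float = 0.1) -> np.ndarray:
    """High-frequency residual of a grayscale image via an FFT high-pass."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = cutoff_ratio * min(h, w)
    # Zero out the low-frequency disc around the centre of the spectrum.
    low_pass_mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    f[low_pass_mask] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

# Toy usage: add a fraction of the guidance image's high-frequency content to a
# (bicubic-style) upsampled depth map, as a crude stand-in for the learned fusion.
color = np.random.rand(64, 64)            # grayscale guidance image
depth_up = np.random.rand(64, 64)         # low-resolution depth, already upsampled
depth_refined = depth_up + 0.1 * high_frequency_component(color)
```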

    Entropy in Image Analysis II

    Image analysis is a fundamental task for any application where extracting information from images is required. The analysis calls for highly sophisticated numerical and analytical methods, particularly for applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In reading the present volume, the reader will appreciate the richness of the methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.

    Improved steganalysis technique based on least significant bit using artificial neural network for MP3 files

    MP3 files are one of the most widely used digital audio formats, providing a high compression ratio with reliable quality. Their widespread use has made MP3 audio files excellent covers for carrying hidden information in audio steganography on the Internet. Emerging interest in uncovering such hidden information has opened up a field of research called steganalysis, which focuses on detecting hidden messages in a given medium. Unfortunately, detection accuracy in steganalysis is affected by bit rates, the sampling rate of the data type, compression rates, file track size and standard, as well as the benchmark dataset of MP3 files. This thesis thus proposed an effective technique for the steganalysis of MP3 audio files by deriving a combination of features from MP3 file properties. Several trials were run to select relevant features of MP3 files, such as total harmonic distortion, power spectral density, and peak signal-to-noise ratio (PSNR), in order to investigate the correlation between different channels of MP3 signals. The least significant bit (LSB) technique was used to detect embedded secret files in stego-objects. This involved reading the stego-objects, statistically evaluating possible locations of secret messages, and classifying these locations as having either a high or a low tendency to contain secret messages. A feed-forward neural network with three layers, trained with the traingdx function and an activation function for each layer, was also used. The network vector contains information about all the features and is used to create a network for the given learning process. Finally, an evaluation process was performed in which the ANN results were compared with those of previous techniques. A 97.92% accuracy rate was recorded when detecting MP3 files under 96 kbps compression. These experimental results showed that the proposed approach was effective in detecting embedded information in MP3 files and demonstrated a significant improvement in detection accuracy at low embedding rates compared with previous work.
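
    The following sketch illustrates, under stated assumptions, the kind of pipeline the abstract describes: simple statistics of the LSB plane serve as features and a small feed-forward network separates cover from stego signals. The feature set, the synthetic toy data, and the use of scikit-learn's SGD with momentum and an adaptive learning rate as a stand-in for MATLAB's traingdx are all assumptions, not the thesis implementation.

```python
# Illustrative sketch only, not the thesis code.
import numpy as np
from sklearn.neural_network import MLPClassifier

def lsb_features(samples: np.ndarray) -> np.ndarray:
    """Simple statistics of the least-significant-bit plane of one signal."""
    lsb = samples.astype(np.int64) & 1
    transitions = np.abs(np.diff(lsb)).mean()   # how often the LSB flips
    ones_ratio = lsb.mean()                     # proportion of 1 bits
    bias = (ones_ratio - 0.5) ** 2 / 0.25       # crude chi-square-like bias score
    return np.array([transitions, ones_ratio, bias])

rng = np.random.default_rng(0)
# Toy "cover" signals whose LSBs occur in runs (loosely mimicking slowly varying
# audio) versus "stego" signals whose LSBs were overwritten with a random payload.
run_bits = np.repeat(rng.integers(0, 2, size=(200, 512)), 8, axis=1)
cover = rng.integers(-1000, 1000, size=(200, 4096)) * 2 + run_bits
stego = (cover & ~1) | rng.integers(0, 2, size=cover.shape)

X = np.array([lsb_features(x) for x in np.vstack([cover, stego])])
y = np.array([0] * len(cover) + [1] * len(stego))

# Three-layer feed-forward classifier trained with SGD + momentum.
clf = MLPClassifier(hidden_layer_sizes=(16, 8), solver="sgd", momentum=0.9,
                    learning_rate="adaptive", max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```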

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and it has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
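
    A minimal sketch of the preprocessing idea follows: replace the raw image with a noise-robust delineation (contour) map before the CNN sees it. The CORF push-pull operator itself is not reimplemented here; a difference-of-Gaussians response inhibited by a rectified, finer-scale, opposite-polarity "pull" term stands in for it purely to illustrate the pipeline, and all parameter values are assumptions.

```python
# Toy stand-in for the CORF push-pull preprocessing step (NOT the CORF operator):
# an excitatory difference-of-Gaussians "push" response is inhibited by the
# rectified opposite-polarity response at a finer, noise-sensitive scale ("pull"),
# and the result is used as the delineation map fed to the classifier.
import numpy as np
from scipy.ndimage import gaussian_filter

def pushpull_delineation(img: np.ndarray, sigma: float = 1.0,
                         inhibition: float = 0.8) -> np.ndarray:
    """Toy push-pull-style delineation map of a grayscale image in [0, 1]."""
    def dog(x, s):
        # Band-pass (difference-of-Gaussians) response at scale s.
        return gaussian_filter(x, s) - gaussian_filter(x, 2.0 * s)

    push = np.maximum(dog(img, sigma), 0)            # preferred-polarity response
    pull = np.maximum(-dog(img, 0.5 * sigma), 0)     # opposite polarity, finer scale
    response = np.maximum(push - inhibition * pull, 0)
    return response / (response.max() + 1e-8)        # normalise to [0, 1]

# Usage: transform every train/test image into its delineation map, then train
# the CNN (e.g. AlexNet) on the maps rather than on the raw pixels.
noisy_img = np.clip(np.random.rand(28, 28) + 0.1 * np.random.randn(28, 28), 0, 1)
delineation_map = pushpull_delineation(noisy_img)
```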

    Lossy Light Field Compression Using Modern Deep Learning and Domain Randomization Techniques

    Lossy data compression is a particular type of informational encoding that uses approximations in order to efficiently trade off accuracy in favour of smaller file sizes. The transmission and storage of images is a typical example of this in the modern digital world. However, the reconstructed images often suffer from degradation and display observable visual artifacts. Convolutional Neural Networks have garnered much attention in all corners of Computer Vision, including the tasks of image compression and artifact reduction. We study how lossy compression can be extended to higher-dimensional images with varying viewpoints, known as light fields. Domain Randomization is explored in detail and used to generate the largest light field dataset we are aware of, which serves as training data. We formulate the task of compression under the framework of neural networks and calculate a quantization tensor for the 4-D Discrete Cosine Transform coefficients of the light fields. In order to train the network accurately, a high-degree approximation to the rounding operation is introduced. In addition, we present a multi-resolution convolutional light field enhancer, producing average gains of 0.854 dB in Peak Signal-to-Noise Ratio and 0.0338 in Structural Similarity Index Measure over the base model, across a wide range of bitrates.
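
    The sketch below illustrates the two ingredients mentioned above under assumed forms: a 4-D DCT of a toy light field followed by quantization with a quantization tensor, and a smooth, differentiable stand-in for rounding (a truncated Fourier series) that a network could be trained through. The specific series, the random quantization tensor, and the toy dimensions are illustrative, not the thesis' actual choices.

```python
# Sketch: 4-D DCT + quantization of a toy light field, with a differentiable
# approximation of rounding so the quantizer can sit inside a trainable model.
import numpy as np
from scipy.fft import dctn, idctn

def soft_round(x: np.ndarray, terms: int = 10) -> np.ndarray:
    """Differentiable approximation of np.round:
    round(x) ~= x - sum_{k=1..terms} (-1)**(k+1) * sin(2*pi*k*x) / (pi*k)."""
    k = np.arange(1, terms + 1).reshape((-1,) + (1,) * x.ndim)
    series = ((-1.0) ** (k + 1)) * np.sin(2.0 * np.pi * k * x) / (np.pi * k)
    return x - series.sum(axis=0)

# Toy light field with axes (view_v, view_u, height, width).
lf = np.random.rand(5, 5, 32, 32)
coeffs = dctn(lf, norm="ortho")                    # 4-D DCT over all four axes
q = np.random.uniform(0.01, 0.1, size=lf.shape)    # stand-in quantization tensor
quantized = soft_round(coeffs / q)                 # differentiable "rounding"
reconstructed = idctn(quantized * q, norm="ortho")

mse = np.mean((lf - reconstructed) ** 2)
print("reconstruction PSNR (dB):", 10.0 * np.log10(1.0 / mse))
```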

    Selected Papers from the First International Symposium on Future ICT (Future-ICT 2019) in Conjunction with 4th International Symposium on Mobile Internet Security (MobiSec 2019)

    The International Symposium on Future ICT (Future-ICT 2019), held in conjunction with the 4th International Symposium on Mobile Internet Security (MobiSec 2019), took place on 17–19 October 2019 in Taichung, Taiwan. The symposium provided academic and industry professionals an opportunity to discuss the latest issues and progress in advancing smart applications based on future ICT and their related security. The symposium aimed to publish high-quality papers strictly related to the various theories and practical applications concerning advanced smart applications, future ICT, and the related communications and networks. It was expected that the symposium and its publications would be a trigger for further related research and technology improvements in this field.

    Tamper detection of Qur'anic text watermarking scheme based on vowel letters with Kashida using exclusive-or and queueing technique

    The most sensitive Arabic text available online is the digital Holy Qur'an. This sacred Islamic religious book is recited by all Muslims worldwide, including non-Arabs, as part of their worship, and it should be protected from any kind of tampering to keep its invaluable meaning intact. Different characteristics of the Arabic letters, like the vowels (أ، و، ي), Kashida (extended letters), and other symbols in the Holy Qur'an, must be secured from alteration. The cover text of the Qur'an and its watermarked text differ because of the low values of the Peak Signal-to-Noise Ratio (PSNR), Embedding Ratio (ER), and Normalized Cross-Correlation (NCC), so the localization of tampering achieves low accuracy. A watermarking technique with enhanced attributes must therefore be designed for the Qur'an text using Arabic vowel letters with Kashida. Most existing detection methods that try to locate tampering in the Qur'an text accurately show various limitations in handling diacritics, alif mad surah, double spaces, the separate shapes of Arabic letters, and Kashida. The gap addressed by this research is to improve the security of Arabic text in the Holy Qur'an by using vowel letters with Kashida. The purpose of this research is to enhance a Qur'an text watermarking scheme based on exclusive-or and reversing with queueing techniques. The methodology consists of four phases. The first phase is pre-processing, followed by the embedding phase, which hides the data after the vowel letters: if the secret bit is '1', a Kashida is inserted, and if the bit is '0', it is not. The third phase is the extraction process, and the last phase evaluates the performance of the proposed scheme using PSNR (for imperceptibility), ER (for capacity), and NCC (for the security of the watermarking). The experimental results revealed improvements in NCC of 1.77%, PSNR of 9.6%, and ER of 8.6% compared with current schemes. Hence, it can be concluded that the proposed scheme can accurately detect the location of tampering under insertion, deletion, and reordering attacks.
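
    A minimal sketch of the basic hiding rule described above follows: after each vowel letter in the cover text a Kashida is inserted when the next secret bit is '1' and omitted when it is '0', and extraction simply checks whether each vowel letter is followed by a Kashida. The exclusive-or/queueing hardening and the tamper localization are omitted; function names and the toy cover text are hypothetical.

```python
# Minimal sketch of Kashida-based embedding after vowel letters; not the full scheme.
KASHIDA = "\u0640"
VOWELS = set("اأوي")   # alif (with/without hamza), waw, ya

def embed(cover: str, bits: str) -> str:
    out, i = [], 0
    for ch in cover:
        out.append(ch)
        if ch in VOWELS and i < len(bits):
            if bits[i] == "1":
                out.append(KASHIDA)   # bit '1' -> Kashida after the vowel letter
            i += 1                    # bit '0' -> vowel letter left untouched
    return "".join(out)

def extract(stego: str, n_bits: int) -> str:
    bits, chars = [], list(stego)
    for j, ch in enumerate(chars):
        if ch in VOWELS and len(bits) < n_bits:
            nxt = chars[j + 1] if j + 1 < len(chars) else ""
            bits.append("1" if nxt == KASHIDA else "0")
    return "".join(bits)

secret = "1011"
stego_text = embed("قول وفي يد", secret)   # toy cover text
assert extract(stego_text, len(secret)) == secret
```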

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge to establishing trust in what we see, hear, and read, and they make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensic capabilities related to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students with a holistic view of the field.