Optimal coding unit decision for early termination in high efficiency video coding using enhanced whale optimization algorithm
Video compression is an emerging research topic in the field of block-based video encoders. Owing to advances in video coding technology, high efficiency video coding (HEVC) delivers superior coding performance; it improves rate-distortion (RD) performance at the cost of increased encoding complexity. In video compression, large coding units (CUs) carry the highest encoding complexity, so computational encoding cost and complexity remain vital concerns that can be framed as an optimization task. In this manuscript, an enhanced whale optimization algorithm (EWOA) is implemented to reduce the computational time and complexity of HEVC. In the EWOA, a cosine function is incorporated into the controlling parameter A, and two correlation factors are added to the WOA to control the whales' positions and regulate the movement of the search mechanism during optimization. The EWOA selects the bit streams in the luma coding tree block that define the CU neighbours used in HEVC. The results indicate that the EWOA achieves the best bit rate (BR), time saving, and peak signal-to-noise ratio (PSNR), with 0.006-0.012 dB higher PSNR than existing models on real-time videos.
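A minimal NumPy sketch of the flavour of position update the EWOA description suggests; the exact cosine modulation of A and the placement of the two correlation factors (here `w1`, `w2`) are illustrative assumptions, not the paper's actual equations:

```python
import numpy as np

def ewoa_step(whales, best, t, t_max, rng):
    """One EWOA-style position update (illustrative sketch only)."""
    a = 2.0 * (1.0 - t / t_max)                  # standard WOA linear decay
    new = np.empty_like(whales)
    for i, x in enumerate(whales):
        r1, r2 = rng.random(), rng.random()
        # cosine-modulated controlling parameter A (assumed form)
        A = (2.0 * a * r1 - a) * np.cos(2.0 * np.pi * r2)
        C = 2.0 * rng.random()
        w1, w2 = rng.random(), rng.random()      # assumed correlation factors
        if abs(A) < 1:                            # exploit: encircle the best solution
            new[i] = w1 * best - A * np.abs(C * best - x)
        else:                                     # explore: move w.r.t. a random whale
            rand = whales[rng.integers(len(whales))]
            new[i] = w2 * rand - A * np.abs(C * rand - x)
    return new

# usage sketch: whales = rng.random((30, dim)); iterate ewoa_step, score CU splits
```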
Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey
Adversarial attacks and defenses in machine learning and deep neural networks
have been gaining significant attention due to the rapidly growing applications
of deep learning in the Internet and relevant scenarios. This survey provides a
comprehensive overview of the recent advancements in the field of adversarial
attack and defense techniques, with a focus on deep neural network-based
classification models. Specifically, we conduct a comprehensive classification
of recent adversarial attack methods and state-of-the-art adversarial defense
techniques based on attack principles, and present them in visually appealing
tables and tree diagrams. This is based on a rigorous evaluation of the
existing works, including an analysis of their strengths and limitations. We
also categorize the methods into counter-attack detection and robustness
enhancement, with a specific focus on regularization-based methods for
enhancing robustness. New avenues of attack are also explored, including
search-based, decision-based, drop-based, and physical-world attacks, and a
hierarchical classification of the latest defense methods is provided,
highlighting the challenges of balancing training costs with performance,
maintaining clean accuracy, overcoming the effect of gradient masking, and
ensuring method transferability. Finally, the lessons learned and open
challenges are summarized, and future research opportunities are recommended. Comment: 46 pages, 21 figures
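As an illustration of the gradient-based end of the attack spectrum surveyed here, a minimal PyTorch sketch of the classic fast gradient sign method (FGSM); the model, inputs, and epsilon budget are placeholders:

```python
import torch

def fgsm_attack(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: a one-step L-infinity attack."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # step in the direction that increases the loss, clamp to valid pixel range
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```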
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
The growing use of voice user interfaces has led to a surge in the collection
and storage of speech data. While data collection allows for the development of
efficient tools powering most speech services, it also poses serious privacy
issues for users as centralized storage makes private personal speech data
vulnerable to cyber threats. With the increasing use of voice-based digital
assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the
increasing ease with which personal speech data can be collected, the risk of
malicious use of voice cloning and of speaker, gender, or pathology recognition
has increased.
This thesis proposes solutions for anonymizing speech and evaluating the
degree of the anonymization. In this work, anonymization refers to making
personal speech data unlinkable to an identity while maintaining the usefulness
(utility) of the speech signal (e.g., access to linguistic content). We start
by identifying several challenges that evaluation protocols need to consider to
evaluate the degree of privacy protection properly. We clarify how
anonymization systems must be configured for evaluation purposes and highlight
that many practical deployment configurations do not permit privacy evaluation.
Furthermore, we examine the most common voice conversion-based
anonymization system and identify its weak points before suggesting new methods
to overcome some limitations. We isolate all components of the anonymization
system to evaluate the degree of speaker personally identifiable information (PPI) associated with each of them.
Then, we propose several transformation methods for each component to reduce
speaker PPI as much as possible while maintaining utility. We promote
anonymization algorithms based on quantization-based transformation as an
alternative to the most widely used and well-known noise-based approach. Finally, we
develop a new attack method to invert anonymization. Comment: PhD Thesis, Pierre Champion | Université de Lorraine - INRIA Nancy |
for associated source code, see https://github.com/deep-privacy/SA-toolki
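As a rough illustration of the quantization-based transformations the thesis promotes, a minimal NumPy sketch; the feature type (`feats`) and the codebook construction are assumptions for illustration, not the thesis' exact design:

```python
import numpy as np

def quantize_features(feats, codebook):
    """Map each frame-level feature vector to its nearest codebook centroid.

    Replacing continuous acoustic features with discrete centroids removes
    fine-grained speaker detail while keeping enough structure for synthesis.
    `codebook` would come from, e.g., k-means over a pool of many speakers.
    """
    d = np.linalg.norm(feats[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[d.argmin(axis=1)]
```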
Examining the Impact of Provenance-Enabled Media on Trust and Accuracy Perceptions
In recent years, industry leaders and researchers have proposed to use
technical provenance standards to address visual misinformation spread through
digitally altered media. By adding immutable and secure provenance information
such as authorship and edit date to media metadata, social media users could
potentially better assess the validity of the media they encounter. However, it
is unclear how end users would respond to provenance information, or how to
best design provenance indicators to be understandable to laypeople. We
conducted an online experiment with 595 participants from the US and UK to
investigate how provenance information altered users' accuracy perceptions and
trust in visual content shared on social media. We found that provenance
information often lowered trust and caused users to doubt deceptive media,
particularly when it revealed that the media was composited. We additionally
tested conditions where the provenance information itself was shown to be
incomplete or invalid, and found that these states have a significant impact on
participants' accuracy perceptions and trust in media, leading them, in some
cases, to disbelieve honest media. Our findings show that provenance, although
enlightening, is still not a concept well-understood by users, who confuse
media credibility with the orthogonal (albeit related) concept of provenance
credibility. We discuss how design choices may contribute to provenance
(mis)understanding, and conclude with implications for usable provenance
systems, including clearer interfaces and user education. Comment: Accepted to CSCW 202
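For readers unfamiliar with the mechanics, a toy sketch of the idea behind provenance metadata: binding authorship and edit date to the media bytes so that tampering is detectable. Real standards such as C2PA use certificate-backed signatures; the shared-key HMAC here is only for brevity:

```python
import hashlib, hmac, json

def attach_provenance(media_bytes, author, edit_date, key):
    """Toy provenance record: binds metadata to the media's hash with an HMAC."""
    record = {
        "author": author,
        "edit_date": edit_date,
        "media_sha256": hashlib.sha256(media_bytes).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record  # any edit to media or metadata invalidates the signature
```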
A survey on artificial intelligence-based acoustic source identification
The concept of Acoustic Source Identification (ASI), which refers to the process of identifying noise sources, has attracted increasing attention in recent years. ASI technology can be used for surveillance, monitoring, and maintenance applications in a wide range of sectors, such as defence, manufacturing, healthcare, and agriculture. Acoustic signature analysis and pattern recognition remain the core technologies for noise source identification. Manual identification of acoustic signatures, however, has become increasingly challenging as dataset sizes grow. As a result, the use of Artificial Intelligence (AI) techniques for identifying noise sources has become increasingly relevant and useful. In this paper, we provide a comprehensive review of AI-based acoustic source identification techniques. We analyze the strengths and weaknesses of AI-based ASI processes and the associated methods proposed in the literature. Additionally, we conducted a detailed survey of ASI applications in machinery, underwater settings, environment/event source recognition, healthcare, and other fields. We also highlight relevant research directions.
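A minimal sketch of a typical AI-based ASI pipeline of the kind the survey covers: pooled log-spectrogram features feeding a generic classifier. The feature choice and classifier are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def log_spectrogram(signal, frame=1024, hop=512):
    """Pooled log-magnitude STFT features, a common acoustic-signature front end."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame, hop)]
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag).mean(axis=0)        # one pooled feature vector per clip

# usage sketch, with `clips` (1-D arrays) and `labels` (source class) as placeholders:
# X = np.stack([log_spectrogram(c) for c in clips])
# clf = RandomForestClassifier().fit(X, labels)
```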
An ensemble architecture for forgery detection and localization in digital images
This thesis presents a unified ensemble approach for forgery detection and localization in digital images. The focus of the research is on two of the most common but effective forgery techniques: copy-move and splicing. The ensemble architecture combines a set of forgery detection and localization methods in order to achieve improved performance with respect to standalone approaches. The main contributions of this work are listed in the following.
First, an extensive review of the current state of the art in forgery detection, with a focus on deep learning-based approaches, is presented in Chapters 1 and 2. An important insight derived from it is the following: these approaches, although promising, cannot be easily compared in terms of performance because they are typically evaluated on custom datasets, owing to the lack of precisely annotated data. Moreover, these datasets are often not publicly available.
We then designed a keypoint-based copy-move detection algorithm, described in Chapter 3. Compared to existing keypoint-based approaches, we added a density-based clustering step to filter out noisy keypoint matches. This method has been demonstrated to perform well on two benchmark datasets and outperforms one of the most cited state-of-the-art methods.
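A minimal sketch of this kind of keypoint-plus-clustering pipeline, using OpenCV SIFT and scikit-learn's DBSCAN; the parameters and filtering rule are illustrative, not the thesis' implementation:

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def copy_move_candidates(img_gray, ratio=0.75, eps=30, min_samples=4):
    """Self-match SIFT keypoints, then keep only matches in dense clusters."""
    sift = cv2.SIFT_create()
    kp, des = sift.detectAndCompute(img_gray, None)
    matches = cv2.BFMatcher().knnMatch(des, des, k=3)
    pairs = []
    for m in matches:
        # m[0] is the trivial self-match; apply Lowe's ratio test to m[1] vs m[2]
        if len(m) >= 3 and m[1].distance < ratio * m[2].distance:
            pairs.append((kp[m[0].queryIdx].pt, kp[m[1].trainIdx].pt))
    pts = np.array([p for pair in pairs for p in pair])
    if len(pts) == 0:
        return []
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    # discard matches whose endpoints do not both fall in a dense cluster
    return [pairs[i] for i in range(len(pairs))
            if labels[2 * i] != -1 and labels[2 * i + 1] != -1]
```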
In Chapter 4, a novel architecture is proposed to predict the 3D light direction in a given image. This approach combines a data-driven method with a physical illumination model, which allows for improved regression performance. To fill the gap of data scarcity for training highly parameterized deep learning architectures, especially for the task of intrinsic image decomposition, we developed two data generation algorithms that were used to produce two datasets - one synthetic and one of real images - to train and evaluate our approach.
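The physical half of such a combination is typically a Lambertian shading model; under that assumption, a light direction can be recovered from sampled surface normals and intensities by least squares. A sketch of that classical component only (the data-driven half is a trained network and is not reproduced here):

```python
import numpy as np

def estimate_light_direction(normals, intensities):
    """Least-squares light direction under a Lambertian model.

    normals:     (N, 3) unit surface normals for sampled pixels
    intensities: (N,)   observed intensities at those pixels
    Solves I = n . l + ambient for the light vector l.
    """
    A = np.hstack([normals, np.ones((len(normals), 1))])  # last column: ambient term
    sol, *_ = np.linalg.lstsq(A, intensities, rcond=None)
    l, ambient = sol[:3], sol[3]
    return l / np.linalg.norm(l), ambient
```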
The proposed light direction estimation model has then been employed to design a novel splicing detection approach, discussed in Chapter 5, in which light direction inconsistencies between different regions in the image are used to highlight potential splicing attacks.
The proposed ensemble scheme for forgery detection is described in the last chapter. It includes a "FusionForgery" module that combines the outputs of the different previously proposed "base" methods and assigns a binary label (forged vs. pristine) to the input image. If an image is predicted as forged, our method also tries to further specialize the decision between splicing and copy-move attacks; if it is predicted as copy-moved, an attempt is also made to reconstruct the source regions used in the attack. The performance of the proposed approach has been assessed by training and testing it on a synthetic dataset, generated by us, comprising both copy-move and splicing attacks. The ensemble approach outperforms all of the individual "base" methods, demonstrating the validity of the proposed strategy.
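At its simplest, a fusion module of this kind can be approximated by a meta-classifier stacked on the base detectors' scores; a toy sketch with placeholder data (the actual FusionForgery module is more elaborate):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
base_scores = rng.random((200, 3))   # placeholder scores from three base detectors
labels = rng.integers(0, 2, 200)     # placeholder labels: forged (1) vs. pristine (0)

# Stack a simple meta-classifier on top of the base detectors' outputs.
fusion = LogisticRegression().fit(base_scores, labels)
print(fusion.predict(base_scores[:5]))   # fused forged/pristine decisions
```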
Robust image steganography method suited for printing
This doctoral dissertation presents a robust steganographic method developed and adapted for the printing process. The primary goal of the method is to protect packaging against counterfeiting. Protection is achieved by embedding multiple bits of information into an image at the encoder and then masking the information so that it is invisible to the human eye; at the decoder, the information is detected with an infrared camera.

Preliminary research showed that methods developed for the print domain are scarce in the relevant literature. The reason is that developing steganographic methods for print requires more resources and materials than developing comparable methods for the digital domain. Methods for print also tend to require a higher level of complexity, since various forms of processing occur during reproduction that can compromise the information in the image [1]. To preserve the hidden information, the method must be robust to the processing that takes place during reproduction.

To achieve a high level of robustness, the information can be embedded in the frequency domain of the image [2], [3], which is accessed through mathematical transforms. The most commonly used are the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and the discrete Fourier transform (DFT) [2], [4]. Each of these transforms has advantages and disadvantages, depending on the context in which the method is developed [5]. For methods adapted to the printing process, the DFT is the optimal choice, since DFT-based methods are robust to the geometric transformations that occur during reproduction [5], [6].

This research uses images in the CMYK colour space. Each image is first divided into blocks, and the information is embedded into each block individually. Using the DFT, the K channel of an image block is transformed into the frequency domain, where the information is embedded. Achromatic replacement is used to mask the visible artefacts introduced by the embedding; examples of its successful use for masking artefacts can be found in [7] and [8]. After the information has been embedded into every block, the blocks are reassembled into a single image. Achromatic replacement then modifies the values of the image's C, M, and Y channels, while the K channel, which carries the embedded information, remains unchanged. After masking, the marked image therefore has the same visual properties as the image before marking.

The experimental part of the thesis uses 1000 images in the CMYK colour space. In a digital environment, the method's robustness was tested against image attacks specific to the reproduction process: scaling, blurring, noise, rotation, and compression. Robustness to the reproduction process itself was also tested using printed samples. The objective bit error rate (BER) metric was used for evaluation. The potential for optimizing the method was tested through image processing (unsharp filter) and the use of error correction codes (ECC).

Image quality after embedding was also studied, using the objective metrics peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). PSNR and SSIM are so-called full-reference metrics: both the unmarked and the marked image are required simultaneously to determine the level of similarity between them [9], [10]. A subjective analysis was carried out with 36 participants, using a total of 144 image samples; participants rated the visibility of artefacts on a scale from zero (invisible) to three (highly visible).

The results show that the method is highly robust to the reproduction process and that it was indeed improved by the unsharp filter and ECC. Image quality remains high despite the embedded information, as confirmed both by the experiments with objective metrics and by the subjective analysis.
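A toy sketch of the core DFT embedding step and the BER metric under the assumptions above; the coefficient positions, embedding strength, and detection rule used in the dissertation are not reproduced here:

```python
import numpy as np

def embed_bit(block_k, bit, pos=(10, 10), strength=50.0):
    """Embed one bit into the DFT magnitude of a K-channel image block."""
    spec = np.fft.fft2(block_k)
    mag, phase = np.abs(spec), np.angle(spec)
    r, c = pos
    # raise a mid-frequency coefficient (and its symmetric twin) to encode '1'
    if bit:
        mag[r, c] = mag[-r, -c] = mag[r, c] + strength
    return np.real(np.fft.ifft2(mag * np.exp(1j * phase)))

def bit_error_rate(sent, received):
    """BER: fraction of payload bits flipped by the print-and-scan channel."""
    sent, received = np.asarray(sent), np.asarray(received)
    return np.mean(sent != received)
```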
The Automation of the Extraction of Evidence masked by Steganographic Techniques in WAV and MP3 Audio Files
Anti-forensics techniques, particularly steganography and cryptography, have
become increasingly pressing issues affecting current digital forensics
practice. Both techniques are widely researched and developed, being considered
at the heart of the modern digital era, but remain double-edged swords standing
between the privacy-conscious and the criminally malicious, depending on the
severity of the methods deployed. This paper advances the automation of hidden
evidence extraction in audio files, enabling the correlation between
unprocessed evidence artefacts and extreme steganographic and cryptographic
techniques using the least significant bit (LSB) extraction method. The
research presents an in-depth review of current digital forensic toolkits and
systems and formally addresses their capabilities in handling
steganography-related cases. We opted for an experimental research methodology
in the form of a quantitative analysis of the efficiency of detecting and
extracting hidden artefacts in WAV and MP3 audio files by comparing standard
industry software. This work establishes an environment for the practical
implementation and testing of the proposed approach and the new toolkit for
extracting evidence hidden by cryptographic and steganographic techniques
during forensic investigations. The proposed multi-approach automation
demonstrated a large positive impact in terms of efficiency and accuracy,
notably on large audio files (MP3 and WAV) for which forensic analysis is
time-consuming and requires significant computational resources and memory.
However, the proposed automation may occasionally produce false positives
(detecting steganography where none exists) or false negatives (failing to
detect steganography that is present), but overall achieves a balance between
detecting hidden data accurately and minimising false alarms. Comment: Wires Forensics Sciences, Under Review
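A minimal sketch of the LSB extraction step such a toolkit automates, for 16-bit PCM WAV input (MP3 would require decoding first); the payload framing is an assumption:

```python
import wave
import numpy as np

def extract_lsb_wav(path, n_bytes=64):
    """Recover the least significant bit of each 16-bit PCM sample and
    pack the first n_bytes * 8 bits into bytes."""
    with wave.open(path, "rb") as w:
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    bits = (samples & 1).astype(np.uint8)[: n_bytes * 8]
    return np.packbits(bits).tobytes()
```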
RAWIW: RAW Image Watermarking Robust to ISP Pipeline
Invisible image watermarking is essential for image copyright protection.
Compared to RGB images, RAW format images use a higher dynamic range to capture
the radiometric characteristics of the camera sensor, providing greater
flexibility in post-processing and retouching. Similar to the master recording
in the music industry, RAW images are considered the original format for
distribution and image production, thus requiring copyright protection.
Existing watermarking methods typically target RGB images, leaving a gap for
RAW images. To address this issue, we propose the first deep learning-based RAW
Image Watermarking (RAWIW) framework for copyright protection. Unlike RGB image
watermarking, our method achieves cross-domain copyright protection. We
directly embed copyright information into RAW images, which can be later
extracted from the corresponding RGB images generated by different
post-processing methods. To achieve end-to-end training of the framework, we
integrate a neural network that simulates the ISP pipeline to handle the
RAW-to-RGB conversion process. To further validate the generalization of our
framework to traditional ISP pipelines and its robustness to transmission
distortion, we adopt a distortion network. This network simulates various types
of noise introduced during the traditional ISP pipeline and transmission.
Furthermore, we employ a three-stage training strategy to strike a balance
between robustness and concealment of watermarking. Our extensive experiments
demonstrate that RAWIW successfully achieves cross-domain copyright protection
for RAW images while maintaining their visual quality and robustness to ISP
pipeline distortions.
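A skeleton of an end-to-end pipeline in the spirit of RAWIW, in PyTorch: a message is embedded into the RAW image, a small network stands in for the ISP's RAW-to-RGB conversion, and a decoder recovers the bits from the RGB output. Layer sizes and components are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class RawWatermarker(nn.Module):
    """Skeleton of an end-to-end RAW watermarking pipeline (illustrative only)."""
    def __init__(self, bits=64):
        super().__init__()
        self.embed = nn.Conv2d(1 + bits, 1, 3, padding=1)   # RAW + message -> marked RAW
        self.isp = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1), nn.Sigmoid())  # ISP proxy
        self.decode = nn.Sequential(nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                                    nn.Linear(3 * 64, bits))

    def forward(self, raw, msg):
        # broadcast the message over the spatial grid and fuse it with the RAW image
        m = msg[:, :, None, None].expand(-1, -1, *raw.shape[-2:])
        marked_raw = self.embed(torch.cat([raw, m], dim=1))
        rgb = self.isp(marked_raw)                 # simulated RAW-to-RGB conversion
        return marked_raw, rgb, self.decode(rgb)   # recover bits from the RGB image
```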
Image Data Augmentation from Small Training Datasets Using Generative Adversarial Networks (GANs)
The scarcity of labelled data is a serious problem since deep models generally require a large amount of training data to achieve desired performance. Data augmentation is widely adopted to enhance the diversity of original datasets and further improve the performance of deep learning models. Learning-based methods, compared to traditional techniques, are specialized in feature extraction, which enhances the effectiveness of data augmentation.
Generative adversarial networks (GANs), a class of learning-based generative models, have made remarkable advances in data synthesis. However, GANs still face many challenges in generating high-quality augmented images from small datasets, because learning-based generative methods struggle to produce reliable outcomes without sufficient training data. This difficulty undermines data augmentation applications that rely on learning-based methods. In this thesis, to tackle the problem of labelled data scarcity and the difficulty of augmenting image data from small datasets, three novel GAN models suitable for training with a small number of samples are proposed, based on three different mapping relationships between the input and output images: one-to-many mapping, one-to-one mapping, and many-to-many mapping. The proposed GANs employ limited training data, such as a small number of images and limited conditional features, and the synthetic images they generate are expected to exhibit not only high generative quality but also desirable data diversity.
To evaluate the effectiveness of the augmented images generated by the proposed models, inception distances and human perception methods are adopted. Additionally, different image classification tasks were carried out, and the accuracies obtained with the original datasets and the augmented datasets were compared. Experimental results illustrate that image classification performance based on convolutional neural networks (AlexNet, GoogLeNet, ResNet, and VGGNet) is comprehensively enhanced, and the scale of improvement is significant when a small number of training samples is involved.
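For reference, a minimal generic GAN training step in PyTorch (not one of the three proposed models); the network shapes assume flattened 28x28 images:

```python
import torch
import torch.nn as nn

# Generic generator/discriminator pair over flattened images (placeholder sizes).
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                      # real: (B, 784) flattened images
    z = torch.randn(real.size(0), 100)
    fake = G(z)
    # discriminator: push real towards 1, generated samples towards 0
    loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator: try to make the discriminator label fakes as real
    loss_g = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```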