1,197 research outputs found

    Covert Communication in Autoencoder Wireless Systems

    Get PDF
    The broadcast nature of wireless communications presents security and privacy challenges. Covert communication is a wireless security practice that focuses on intentionally hiding transmitted information. Recently, wireless systems have experienced significant growth, including the emergence of autoencoder-based models. These models, like other DNN architectures, are vulnerable to adversarial attacks, highlighting the need to study their susceptibility to covert communication. While there is ample research on covert communication in traditional wireless systems, the investigation of autoencoder wireless systems remains scarce. Furthermore, many existing covert methods are either detectable analytically or difficult to adapt to diverse wireless systems. The first part of this thesis provides a comprehensive examination of autoencoder-based communication systems in various scenarios and channel conditions. It begins with an introduction to autoencoder communication systems, followed by a detailed discussion of our own implementation and evaluation results. This serves as a solid foundation for the subsequent part of the thesis, where we propose a GAN-based covert communication model. By treating the covert sender, covert receiver, and observer as generator, decoder, and discriminator neural networks, respectively, we conduct joint training in an adversarial setting to develop a covert communication scheme that can be integrated into any normal autoencoder. Our proposal minimizes the impact on ongoing normal communication, addressing previous works shortcomings. We also introduce a training algorithm that allows for the desired tradeoff between covertness and reliability. Numerical results demonstrate the establishment of a reliable and undetectable channel between covert users, regardless of the cover signal or channel condition, with minimal disruption to the normal system operation

    Information Forensics and Security: A quarter-century-long journey

    Get PDF
    Information forensics and security (IFS) is an active R&D area whose goal is to ensure that people use devices, data, and intellectual properties for authorized purposes and to facilitate the gathering of solid evidence to hold perpetrators accountable. For over a quarter century, since the 1990s, the IFS research area has grown tremendously to address the societal needs of the digital information era. The IEEE Signal Processing Society (SPS) has emerged as an important hub and leader in this area, and this article celebrates some landmark technical contributions. In particular, we highlight the major technological advances by the research community in some selected focus areas in the field during the past 25 years and present future trends

    A Survey on Reversible Image Data Hiding Using the Hierarchical Block Embedding Technique

    Get PDF
    The use of graphics for data concealment has significantly advanced the fields of secure communication and identity verification. Reversible data hiding (RDH) involves hiding data within host media, such as images, while allowing for the recovery of the original cover. Various RDH approaches have been developed, including difference expansion, interpolation techniques, prediction, and histogram modification. However, these methods were primarily applied to plain photos. This study introduces a novel reversible image transformation technique called Block Hierarchical Substitution (BHS). BHS enhances the quality of encrypted images and enables lossless restoration of the secret image with a low Peak Signal-to-Noise Ratio (PSNR). The cover image is divided into non-overlapping blocks, and the pixel values within each block are encrypted using the modulo function. This ensures that the linear prediction difference in the block remains consistent before and after encryption, enabling independent data extraction without picture decryption. In order to address the challenges associated with secure multimedia data processing, such as data encryption during transmission and storage, this survey investigates the specific issues related to reversible data hiding in encrypted images (RDHEI). Our proposed solution aims to enhance security (low Mean Squared Error) and improve the PSNR value by applying the method to encrypted images

    Cybersecurity: Past, Present and Future

    Full text link
    The digital transformation has created a new digital space known as cyberspace. This new cyberspace has improved the workings of businesses, organizations, governments, society as a whole, and day to day life of an individual. With these improvements come new challenges, and one of the main challenges is security. The security of the new cyberspace is called cybersecurity. Cyberspace has created new technologies and environments such as cloud computing, smart devices, IoTs, and several others. To keep pace with these advancements in cyber technologies there is a need to expand research and develop new cybersecurity methods and tools to secure these domains and environments. This book is an effort to introduce the reader to the field of cybersecurity, highlight current issues and challenges, and provide future directions to mitigate or resolve them. The main specializations of cybersecurity covered in this book are software security, hardware security, the evolution of malware, biometrics, cyber intelligence, and cyber forensics. We must learn from the past, evolve our present and improve the future. Based on this objective, the book covers the past, present, and future of these main specializations of cybersecurity. The book also examines the upcoming areas of research in cyber intelligence, such as hybrid augmented and explainable artificial intelligence (AI). Human and AI collaboration can significantly increase the performance of a cybersecurity system. Interpreting and explaining machine learning models, i.e., explainable AI is an emerging field of study and has a lot of potentials to improve the role of AI in cybersecurity.Comment: Author's copy of the book published under ISBN: 978-620-4-74421-

    Timely Classification of Encrypted or ProtocolObfuscated Internet Traffic Using Statistical Methods

    Get PDF
    Internet traffic classification aims to identify the type of application or protocol that generated a particular packet or stream of packets on the network. Through traffic classification, Internet Service Providers (ISPs), governments, and network administrators can access basic functions and several solutions, including network management, advanced network monitoring, network auditing, and anomaly detection. Traffic classification is essential as it ensures the Quality of Service (QoS) of the network, as well as allowing efficient resource planning. With the increase of encrypted or obfuscated protocol traffic on the Internet and multilayer data encapsulation, some classical classification methods have lost interest from the scientific community. The limitations of traditional classification methods based on port numbers and payload inspection to classify encrypted or obfuscated Internet traffic have led to significant research efforts focused on Machine Learning (ML) based classification approaches using statistical features from the transport layer. In an attempt to increase classification performance, Machine Learning strategies have gained interest from the scientific community and have shown promise in the future of traffic classification, specially to recognize encrypted traffic. However, ML approach also has its own limitations, as some of these methods have a high computational resource consumption, which limits their application when classifying large traffic or realtime flows. Limitations of ML application have led to the investigation of alternative approaches, including featurebased procedures and statistical methods. In this sense, statistical analysis methods, such as distances and divergences, have been used to classify traffic in large flows and in realtime. The main objective of statistical distance is to differentiate flows and find a pattern in traffic characteristics through statistical properties, which enable classification. Divergences are functional expressions often related to information theory, which measure the degree of discrepancy between any two distributions. This thesis focuses on proposing a new methodological approach to classify encrypted or obfuscated Internet traffic based on statistical methods that enable the evaluation of network traffic classification performance, including the use of computational resources in terms of CPU and memory. A set of traffic classifiers based on KullbackLeibler and JensenShannon divergences, and Euclidean, Hellinger, Bhattacharyya, and Wootters distances were proposed. The following are the four main contributions to the advancement of scientific knowledge reported in this thesis. First, an extensive literature review on the classification of encrypted and obfuscated Internet traffic was conducted. The results suggest that portbased and payloadbased methods are becoming obsolete due to the increasing use of traffic encryption and multilayer data encapsulation. MLbased methods are also becoming limited due to their computational complexity. As an alternative, Support Vector Machine (SVM), which is also an ML method, and the KolmogorovSmirnov and Chisquared tests can be used as reference for statistical classification. In parallel, the possibility of using statistical methods for Internet traffic classification has emerged in the literature, with the potential of good results in classification without the need of large computational resources. The potential statistical methods are Euclidean Distance, Hellinger Distance, Bhattacharyya Distance, Wootters Distance, as well as KullbackLeibler (KL) and JensenShannon divergences. Second, we present a proposal and implementation of a classifier based on SVM for P2P multimedia traffic, comparing the results with KolmogorovSmirnov (KS) and Chisquare tests. The results suggest that SVM classification with Linear kernel leads to a better classification performance than KS and Chisquare tests, depending on the value assigned to the Self C parameter. The SVM method with Linear kernel and suitable values for the Self C parameter may be a good choice to identify encrypted P2P multimedia traffic on the Internet. Third, we present a proposal and implementation of two classifiers based on KL Divergence and Euclidean Distance, which are compared to SVM with Linear kernel, configured with the standard Self C parameter, showing a reduced ability to classify flows based solely on packet sizes compared to KL and Euclidean Distance methods. KL and Euclidean methods were able to classify all tested applications, particularly streaming and P2P, where for almost all cases they efficiently identified them with high accuracy, with reduced consumption of computational resources. Based on the obtained results, it can be concluded that KL and Euclidean Distance methods are an alternative to SVM, as these statistical approaches can operate in realtime and do not require retraining every time a new type of traffic emerges. Fourth, we present a proposal and implementation of a set of classifiers for encrypted Internet traffic, based on JensenShannon Divergence and Hellinger, Bhattacharyya, and Wootters Distances, with their respective results compared to those obtained with methods based on Euclidean Distance, KL, KS, and ChiSquare. Additionally, we present a comparative qualitative analysis of the tested methods based on Kappa values and Receiver Operating Characteristic (ROC) curves. The results suggest average accuracy values above 90% for all statistical methods, classified as ”almost perfect reliability” in terms of Kappa values, with the exception of KS. This result indicates that these methods are viable options to classify encrypted Internet traffic, especially Hellinger Distance, which showed the best Kappa values compared to other classifiers. We conclude that the considered statistical methods can be accurate and costeffective in terms of computational resource consumption to classify network traffic. Our approach was based on the classification of Internet network traffic, focusing on statistical distances and divergences. We have shown that it is possible to classify and obtain good results with statistical methods, balancing classification performance and the use of computational resources in terms of CPU and memory. The validation of the proposal supports the argument of this thesis, which proposes the implementation of statistical methods as a viable alternative to Internet traffic classification compared to methods based on port numbers, payload inspection, and ML.A classificação de tráfego Internet visa identificar o tipo de aplicação ou protocolo que gerou um determinado pacote ou fluxo de pacotes na rede. Através da classificação de tráfego, Fornecedores de Serviços de Internet (ISP), governos e administradores de rede podem ter acesso às funções básicas e várias soluções, incluindo gestão da rede, monitoramento avançado de rede, auditoria de rede e deteção de anomalias. Classificar o tráfego é essencial, pois assegura a Qualidade de Serviço (QoS) da rede, além de permitir planear com eficiência o uso de recursos. Com o aumento de tráfego cifrado ou protocolo ofuscado na Internet e do encapsulamento de dados multicamadas, alguns métodos clássicos da classificação perderam interesse de investigação da comunidade científica. As limitações dos métodos tradicionais da classificação com base no número da porta e na inspeção de carga útil payload para classificar o tráfego de Internet cifrado ou ofuscado levaram a esforços significativos de investigação com foco em abordagens da classificação baseadas em técnicas de Aprendizagem Automática (ML) usando recursos estatísticos da camada de transporte. Na tentativa de aumentar o desempenho da classificação, as estratégias de Aprendizagem Automática ganharam o interesse da comunidade científica e se mostraram promissoras no futuro da classificação de tráfego, principalmente no reconhecimento de tráfego cifrado. No entanto, a abordagem em ML também têm as suas próprias limitações, pois alguns desses métodos possuem um elevado consumo de recursos computacionais, o que limita a sua aplicação para classificação de grandes fluxos de tráfego ou em tempo real. As limitações no âmbito da aplicação de ML levaram à investigação de abordagens alternativas, incluindo procedimentos baseados em características e métodos estatísticos. Neste sentido, os métodos de análise estatística, tais como distâncias e divergências, têm sido utilizados para classificar tráfego em grandes fluxos e em tempo real. A distância estatística possui como objetivo principal diferenciar os fluxos e permite encontrar um padrão nas características de tráfego através de propriedades estatísticas, que possibilitam a classificação. As divergências são expressões funcionais frequentemente relacionadas com a teoria da informação, que mede o grau de discrepância entre duas distribuições quaisquer. Esta tese focase na proposta de uma nova abordagem metodológica para classificação de tráfego cifrado ou ofuscado da Internet com base em métodos estatísticos que possibilite avaliar o desempenho da classificação de tráfego de rede, incluindo a utilização de recursos computacionais, em termos de CPU e memória. Foi proposto um conjunto de classificadores de tráfego baseados nas Divergências de KullbackLeibler e JensenShannon e Distâncias Euclidiana, Hellinger, Bhattacharyya e Wootters. A seguir resumemse os tese. Primeiro, realizámos uma ampla revisão de literatura sobre classificação de tráfego cifrado e ofuscado de Internet. Os resultados sugerem que os métodos baseados em porta e baseados em carga útil estão se tornando obsoletos em função do crescimento da utilização de cifragem de tráfego e encapsulamento de dados multicamada. O tipo de métodos baseados em ML também está se tornando limitado em função da complexidade computacional. Como alternativa, podese utilizar a Máquina de Vetor de Suporte (SVM), que também é um método de ML, e os testes de KolmogorovSmirnov e Quiquadrado como referência de comparação da classificação estatística. Em paralelo, surgiu na literatura a possibilidade de utilização de métodos estatísticos para classificação de tráfego de Internet, com potencial de bons resultados na classificação sem aporte de grandes recursos computacionais. Os métodos estatísticos potenciais são as Distâncias Euclidiana, Hellinger, Bhattacharyya e Wootters, além das Divergências de Kullback–Leibler (KL) e JensenShannon. Segundo, apresentamos uma proposta e implementação de um classificador baseado na Máquina de Vetor de Suporte (SVM) para o tráfego multimédia P2P (PeertoPeer), comparando os resultados com os testes de KolmogorovSmirnov (KS) e Quiquadrado. Os resultados sugerem que a classificação da SVM com kernel Linear conduz a um melhor desempenho da classificação do que os testes KS e Quiquadrado, dependente do valor atribuído ao parâmetro Self C. O método SVM com kernel Linear e com valores adequados para o parâmetro Self C pode ser uma boa escolha para identificar o tráfego Par a Par (P2P) multimédia cifrado na Internet. Terceiro, apresentamos uma proposta e implementação de dois classificadores baseados na Divergência de KullbackLeibler (KL) e na Distância Euclidiana, sendo comparados com a SVM com kernel Linear, configurado para o parâmestro Self C padrão, apresenta reduzida capacidade de classificar fluxos com base apenas nos tamanhos dos pacotes em relação aos métodos KL e Distância Euclidiana. Os métodos KL e Euclidiano foram capazes de classificar todas as aplicações testadas, destacandose streaming e P2P, onde para quase todos os casos foi eficiente identificálas com alta precisão, com reduzido consumo de recursos computacionais.Com base nos resultados obtidos, podese concluir que os métodos KL e Distância Euclidiana são uma alternativa à SVM, porque essas abordagens estatísticas podem operar em tempo real e não precisam de retreinamento cada vez que surge um novo tipo de tráfego. Quarto, apresentamos uma proposta e implementação de um conjunto de classificadores para o tráfego de Internet cifrado, baseados na Divergência de JensenShannon e nas Distâncias de Hellinger, Bhattacharyya e Wootters, sendo os respetivos resultados comparados com os resultados obtidos com os métodos baseados na Distância Euclidiana, KL, KS e Quiquadrado. Além disso, apresentamos uma análise qualitativa comparativa dos métodos testados com base nos valores de Kappa e Curvas Característica de Operação do Receptor (ROC). Os resultados sugerem valores médios de precisão acima de 90% para todos os métodos estatísticos, classificados como “confiabilidade quase perfeita” em valores de Kappa, com exceçãode KS. Esse resultado indica que esses métodos são opções viáveis para a classificação de tráfego cifrado da Internet, em especial a Distância de Hellinger, que apresentou os melhores resultados do valor de Kappa em comparaçãocom os demais classificadores. Concluise que os métodos estatísticos considerados podem ser precisos e económicos em termos de consumo de recursos computacionais para classificar o tráfego da rede. A nossa abordagem baseouse na classificação de tráfego de rede Internet, focando em distâncias e divergências estatísticas. Nós mostramos que é possível classificar e obter bons resultados com métodos estatísticos, equilibrando desempenho de classificação e uso de recursos computacionais em termos de CPU e memória. A validação da proposta sustenta o argumento desta tese, que propõe a implementação de métodos estatísticos como alternativa viável à classificação de tráfego da Internet em relação aos métodos com base no número da porta, na inspeção de carga útil e de ML.Thesis prepared at Instituto de Telecomunicações Delegação da Covilhã and at the Department of Computer Science of the University of Beira Interior, and submitted to the University of Beira Interior for discussion in public session to obtain the Ph.D. Degree in Computer Science and Engineering. This work has been funded by Portuguese FCT/MCTES through national funds and, when applicable, cofunded by EU funds under the project UIDB/50008/2020, and by operation Centro010145FEDER000019 C4 Centro de Competências em Cloud Computing, cofunded by the European Regional Development Fund (ERDF/FEDER) through the Programa Operacional Regional do Centro (Centro 2020). This work has also been funded by CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education) within the Ministry of Education of Brazil under a scholarship supported by the International Cooperation Program CAPES/COFECUB Project 9090134/ 2013 at the University of Beira Interior

    H4VDM: H.264 Video Device Matching

    Full text link
    Methods that can determine if two given video sequences are captured by the same device (e.g., mobile telephone or digital camera) can be used in many forensics tasks. In this paper we refer to this as "video device matching". In open-set video forensics scenarios it is easier to determine if two video sequences were captured with the same device than identifying the specific device. In this paper, we propose a technique for open-set video device matching. Given two H.264 compressed video sequences, our method can determine if they are captured by the same device, even if our method has never encountered the device in training. We denote our proposed technique as H.264 Video Device Matching (H4VDM). H4VDM uses H.264 compression information extracted from video sequences to make decisions. It is more robust against artifacts that alter camera sensor fingerprints, and it can be used to analyze relatively small fragments of the H.264 sequence. We trained and tested our method on a publicly available video forensics dataset consisting of 35 devices, where our proposed method demonstrated good performance

    Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

    Full text link
    The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice-cloning and speaker/gender/pathological/etc. recognition has increased. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider to evaluate the degree of privacy protection properly. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we study and examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce as much as possible speaker PPI while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the most-used and well-known noise-based approach. Finally, we endeavor a new attack method to invert anonymization.Comment: PhD Thesis Pierre Champion | Universit\'e de Lorraine - INRIA Nancy | for associated source code, see https://github.com/deep-privacy/SA-toolki

    Black-box Dataset Ownership Verification via Backdoor Watermarking

    Full text link
    Deep learning, especially deep neural networks (DNNs), has been widely and successfully adopted in many critical applications for its high effectiveness and efficiency. The rapid development of DNNs has benefited from the existence of some high-quality datasets (e.g.e.g., ImageNet), which allow researchers and developers to easily verify the performance of their methods. Currently, almost all existing released datasets require that they can only be adopted for academic or educational purposes rather than commercial purposes without permission. However, there is still no good way to ensure that. In this paper, we formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model, where defenders can only query the model while having no information about its parameters and training details. Based on this formulation, we propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Our method contains two main parts, including dataset watermarking and dataset verification. Specifically, we exploit poison-only backdoor attacks (e.g.e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification. We also provide some theoretical analyses of our methods. Experiments on multiple benchmark datasets of different tasks are conducted, which verify the effectiveness of our method. The code for reproducing main experiments is available at \url{https://github.com/THUYimingLi/DVBW}.Comment: This paper is accepted by IEEE TIFS. 15 pages. The preliminary short version of this paper was posted on arXiv (arXiv:2010.05821) and presented in a non-archival NeurIPS Workshop (2020

    A survey on artificial intelligence-based acoustic source identification

    Get PDF
    The concept of Acoustic Source Identification (ASI), which refers to the process of identifying noise sources has attracted increasing attention in recent years. The ASI technology can be used for surveillance, monitoring, and maintenance applications in a wide range of sectors, such as defence, manufacturing, healthcare, and agriculture. Acoustic signature analysis and pattern recognition remain the core technologies for noise source identification. Manual identification of acoustic signatures, however, has become increasingly challenging as dataset sizes grow. As a result, the use of Artificial Intelligence (AI) techniques for identifying noise sources has become increasingly relevant and useful. In this paper, we provide a comprehensive review of AI-based acoustic source identification techniques. We analyze the strengths and weaknesses of AI-based ASI processes and associated methods proposed by researchers in the literature. Additionally, we did a detailed survey of ASI applications in machinery, underwater applications, environment/event source recognition, healthcare, and other fields. We also highlight relevant research directions
    corecore