323 research outputs found

    Randomly-based Stepwise Multi-Level Distributed Medical Image Steganography

    Get PDF
    Steganography deals with concealing sensitive information that is either communicated across a network or stored in a secure location. The concealment is accomplished through a carrier, making access to the data by an unauthorized person more difficult. Many steganographic techniques have been used; unfortunately, the hybrid multi-level approach has been neglected. For this reason, the current research applied image steganography at a hybrid multi-level involving encryption, data compression, and two-stage high-capacity data concealment. The proposed technique can conceal information in medical images without any distortion, allowing flexible and secure transfer. After the Triple DES algorithm encrypts the secret text at the beginning of the process, the next step embeds the encrypted cipher message into the host image while keeping the image intact. The findings indicate that the PSNR and NCC values are satisfactory relative to the sensitivity of the human eye, so the confidential message is hidden from the adversary. The PSNR value is quite high, indicating that the image after the steganographic process is closely similar to the original image.
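
    As a rough illustration of the encrypt-then-embed pipeline the abstract describes, the sketch below encrypts a secret with Triple DES and hides the ciphertext bits in pixel LSBs. The 24-byte demo key, the ECB mode, and plain least-significant-bit embedding are simplifying assumptions for illustration; the paper's compression stage and two-stage concealment are not reproduced here.

```python
# Minimal encrypt-then-embed sketch: Triple DES encryption followed by LSB
# embedding. Key, mode, and embedding scheme are illustrative assumptions.
import numpy as np
from Crypto.Cipher import DES3
from Crypto.Util.Padding import pad

def encrypt_secret(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt the secret text with Triple DES (ECB chosen only for brevity)."""
    cipher = DES3.new(key, DES3.MODE_ECB)
    return cipher.encrypt(pad(plaintext, DES3.block_size))

def embed_lsb(cover: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide the payload bits in the least significant bit of each pixel byte."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = cover.flatten()                      # flatten() returns a copy
    if bits.size > flat.size:
        raise ValueError("cover image too small for payload")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)

key = DES3.adjust_key_parity(b"0123456789abcdef01234567")    # 24-byte demo key
cover = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in for a medical image
stego = embed_lsb(cover, encrypt_secret(b"secret diagnosis", key))
```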

    Triple scheme based on image steganography to improve imperceptibility and security

    Get PDF
    A foremost priority in the information technology and communication era is achieving an effective and secure steganography scheme for information hiding. Digital images are commonly used as the cover for steganography owing to the redundancy in their representation, which helps keep the hidden data imperceptible to intruders. Nevertheless, any steganography system launched over the internet can be attacked once the stego cover is recognized. At present, the design and development of an effective image steganography system face several challenging issues, including low capacity, poor security, and poor imperceptibility. To overcome these issues, a new decomposition scheme was proposed for image steganography with a new approach known as the Triple Number Approach (TNA). In this study, three main stages were used to achieve the objectives, beginning with image and text preparation, followed by embedding, and culminating in extraction; a final evaluation stage employed several measures to benchmark the results. This study makes three contributions. The first is a Triple Text Coding Method (TTCM) for preparing the secret message prior to embedding. The second is a Triple Embedding Method (TEM) for the embedding process itself. The third concerns security criteria based on a new partitioning of the image known as the Image Partitioning Method (IPM), which performs random pixel selection by partitioning the image into three phases with three iterations of the Hénon map function. An enhanced Huffman coding algorithm was utilised to compress the secret message before the TTCM process. A standard dataset from the Signal and Image Processing Institute (SIPI), containing color and grayscale images of 512 x 512 pixels, was utilised in this study. Different parameters were used to test the performance of the proposed scheme in terms of security and imperceptibility (image quality). For image quality, four measurements were used: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Mean Square Error (MSE), and histogram analysis. For security, two measurements were used: Human Visual System (HVS) inspection and Chi-square (χ2) attack analysis. On the Lena grayscale image, the scheme obtained a PSNR of 78.09 dB and an SSIM of 1. Meanwhile, the HVS and χ2 attack results were high compared with the existing schemes in the literature. Based on these findings, the proposed scheme gives evidence of increased capacity, imperceptibility, and security, overcoming the existing issues.
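
    The chaotic pixel selection can be illustrated with a short sketch: iterate the Hénon map and fold the chaotic values into pixel coordinates. The classical parameters a = 1.4, b = 0.3, the seed (0.1, 0.3), and the folding scheme are assumptions for illustration, not the paper's exact three-phase IPM partitioning.

```python
# Chaotic pixel selection via the Hénon map: x' = 1 - a*x^2 + y, y' = b*x.
# Parameters and coordinate folding are illustrative assumptions.
def henon_pixels(n, width, height, x0=0.1, y0=0.3, a=1.4, b=0.3):
    """Yield n distinct pixel coordinates in a chaotic, seed-dependent order."""
    seen, x, y = set(), x0, y0
    while len(seen) < n:
        x, y = 1 - a * x * x + y, b * x                # Hénon map iteration
        pos = (int(abs(y) * 1e6) % height, int(abs(x) * 1e6) % width)
        if pos not in seen:                            # skip repeated positions
            seen.add(pos)
            yield pos

# e.g. select 1024 embedding positions in a 512 x 512 cover image
positions = list(henon_pixels(1024, 512, 512))
```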

    Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis

    Full text link
    Text sanitization is the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred to in it. In this paper, we consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance on two recently published datasets: the Text Anonymization Benchmark (Pilán et al., 2022) and a collection of Wikipedia biographies (Papadopoulou et al., 2022). The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information. This recognizer is trained by combining a standard named entity recognition model with a gazetteer populated by person-related terms extracted from Wikidata. The second step consists of assessing the privacy risk associated with each detected text span, either in isolation or in combination with other text spans. We present five distinct indicators of the re-identification risk, based respectively on language model probabilities, text span classification, sequence labelling, perturbations, and web search. We provide a contrastive analysis of each privacy indicator and highlight their benefits and limitations, notably in relation to the available labeled data.
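
    A hedged sketch of the first indicator (language model probabilities) may make the idea concrete: mask a detected span and check whether a masked language model recovers it from context; a high recovery probability suggests the span remains inferable even after redaction. The model choice (bert-base-cased), the single-token restriction, and the scoring function below are illustrative assumptions, not the paper's implementation.

```python
# LM-probability risk indicator sketch: can a masked LM guess the redacted span?
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-cased")

def lm_recovery_score(text: str, span: str) -> float:
    """Probability that the LM re-predicts the masked span from its context.
    A high score suggests the span is inferable even after redaction."""
    masked = text.replace(span, unmasker.tokenizer.mask_token, 1)
    for candidate in unmasker(masked, top_k=50):
        if candidate["token_str"].strip() == span:
            return candidate["score"]
    return 0.0

risk = lm_recovery_score("Ada Lovelace was born in London in 1815.", "London")
```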

    Malware detection with artificial intelligence: A systematic literature review

    Get PDF
    In this survey, we review the key developments in the field of malware detection using AI and analyze the core challenges. We systematically survey state-of-the-art methods across five critical aspects of building an accurate and robust AI-powered malware-detection model: malware sophistication, analysis techniques, malware repositories, feature selection, and machine learning vs. deep learning. The effectiveness of an AI model depends on the quality of the features it is trained with; in turn, the quality and authenticity of those features depend on the quality of the dataset and the suitability of the analysis tool. Static analysis is fast but is limited by the widespread use of obfuscation. Dynamic analysis is not impacted by obfuscation but is defeated by ubiquitous anti-analysis techniques and requires more computational power. Extracting authentic discriminatory features from sophisticated and evasive malware is challenging and, combined with poor-quality datasets, can lead to a situation where a model achieves high accuracy on only one specific dataset.
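
    To make the static-analysis discussion concrete, the sketch below pulls a few structural header features from a Windows PE binary with the pefile library; this particular feature set is an illustrative choice, not one prescribed by the surveyed literature.

```python
# Static feature extraction sketch for an ML malware detector.
# The feature set here is an illustrative assumption.
import pefile

def static_features(path: str) -> dict:
    """Extract a few structural features from a PE executable's headers."""
    pe = pefile.PE(path, fast_load=True)   # parse headers only, skip data directories
    return {
        "num_sections": pe.FILE_HEADER.NumberOfSections,
        "size_of_code": pe.OPTIONAL_HEADER.SizeOfCode,
        "entry_point": pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        "timestamp": pe.FILE_HEADER.TimeDateStamp,
    }

# features = static_features("sample.exe")  # one row of a classifier's feature matrix
```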

    Classifying tor traffic using character analysis

    Get PDF
    Tor is a privacy-preserving network that enables users to browse the Internet anonymously. Although the prospect of such anonymity is welcomed in many quarters, Tor can also be used for malicious purposes, prompting the need to monitor Tor network connections. Because traffic is encrypted, most traffic classification methods depend on flow-based features. However, these features can be less reliable due to issues like asymmetric routing, and processing multiple packets can be time-intensive. In light of Tor’s sophisticated multilayered payload encryption compared with nonTor encryption, our research explored patterns in the encrypted data of both networks, challenging the conventional assumption that ciphertexts are indistinguishable from random strings of equal length. Our novel approach leverages machine learning to differentiate Tor from nonTor traffic using only the encrypted payload. We focused on extracting statistical hex-character-based features from the encrypted data. For consistent findings, we drew from two datasets, both covering Tor and nonTor traffic: a public one, which was divided into eight application types for more granular insight, and a private one. We developed a custom Python script called Charcount to extract the relevant data and features accurately. To verify the robustness of our results, we utilized both Weka and scikit-learn for classification. In our first line of research, we conducted hex character analysis on the encrypted payloads of both Tor and nonTor traffic using statistical testing. Our investigation revealed a significant differentiation rate between Tor and nonTor traffic of 95.42% for the public dataset and 100% for the private dataset. The second phase of our study aimed to distinguish between Tor and nonTor traffic using machine learning, focusing on encrypted payload features that are independent of length. In our evaluations, the public dataset yielded an average accuracy of 93.56% when classified with the Decision Tree (DT) algorithm in scikit-learn, and 95.65% with the J48 algorithm in Weka. For the private dataset, the accuracies were 95.23% and 97.12%, respectively. Additionally, we found that the combination of WrapperSubsetEval+BestFirst with the J48 classifier both enhanced accuracy and optimized processing efficiency. In conclusion, our study contributes both to demonstrating the distinction between Tor and nonTor traffic and to achieving efficient classification of both traffic types using features derived exclusively from a single encrypted payload packet. This work holds significant implications for cybersecurity and points towards further advancements in the field.
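
    The hex-character feature idea can be sketched briefly: hex-encode each encrypted payload, compute the relative frequency of the 16 hex digits (a length-independent feature vector), and train a decision tree. The placeholder data and this exact feature definition are assumptions standing in for the authors' Charcount script, which is not public here.

```python
# Hex-digit frequency features plus a scikit-learn decision tree, as a
# stand-in for the authors' Charcount pipeline. Data below is placeholder.
from collections import Counter
import numpy as np
from sklearn.tree import DecisionTreeClassifier

HEX_DIGITS = "0123456789abcdef"

def hex_features(payload: bytes) -> list:
    """Relative frequency of each hex digit in the payload's hex encoding."""
    encoded = payload.hex()
    counts = Counter(encoded)
    total = max(len(encoded), 1)
    return [counts.get(c, 0) / total for c in HEX_DIGITS]

# One 16-dim feature vector per packet payload; labels: 1 = Tor, 0 = nonTor
payloads = [np.random.bytes(512) for _ in range(100)]   # placeholder payloads
X = [hex_features(p) for p in payloads]
y = np.random.randint(0, 2, size=100)                   # placeholder labels
clf = DecisionTreeClassifier().fit(X, y)
```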