22 research outputs found

    Semantic-Preserving Linguistic Steganography by Pivot Translation and Semantic-Aware Bins Coding

    Linguistic steganography (LS) aims to embed secret information into a highly encoded text for covert communication. It can be roughly divided into two main categories: modification-based LS (MLS) and generation-based LS (GLS). Unlike MLS, which hides secret data by slightly modifying a given text without impairing its meaning, GLS uses a trained language model to directly generate a text carrying secret data. A common disadvantage of MLS methods is a very low embedding payload, the price paid for preserving the semantic quality of the text well. In contrast, GLS allows the data hider to embed a high payload, but at the high price of uncontrollable semantics. In this paper, we propose a novel LS method that modifies a given text by pivoting it between two different languages and embeds secret data by applying a GLS-like information encoding strategy. Our purpose is to alter the expression of the given text, enabling a high payload to be embedded while keeping the semantic information unchanged. Experimental results show that the proposed work not only achieves a high embedding payload but also shows superior performance in maintaining semantic consistency and resisting linguistic steganalysis.
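    The abstract names a GLS-like encoding strategy ("semantic-aware bins coding") without detailing it. As a rough illustration of the classic bins idea such encoders build on, the sketch below partitions a vocabulary into 2**k keyed bins; each group of k secret bits selects a bin, and the most probable candidate from that bin is emitted. The toy vocabulary, the `ranked_steps` lists standing in for a language or translation model's ranked outputs, and the names `make_bins`, `embed` and `extract` are all hypothetical; the paper's semantic-aware variant and its pivot-translation stage are not reproduced.

```python
import random

def make_bins(vocab, k, key):
    """Assign every word to one of 2**k bins via a keyed shuffle shared by sender and receiver."""
    words = sorted(vocab)                  # canonical order so both sides agree
    random.Random(key).shuffle(words)      # the shared key determines the partition
    return {w: i % (2 ** k) for i, w in enumerate(words)}

def embed(bits, ranked_steps, bin_of, k):
    """At each step, emit the most probable candidate whose bin index equals the next k bits."""
    if len(bits) % k:
        raise ValueError("bit string length must be a multiple of k")
    out = []
    for step, i in enumerate(range(0, len(bits), k)):
        target = int(bits[i:i + k], 2)
        for word in ranked_steps[step]:    # hypothetical model output, most probable first
            if bin_of.get(word) == target:
                out.append(word)
                break
        else:
            raise ValueError(f"no candidate in bin {target} at step {step}")
    return out

def extract(words, bin_of, k):
    """The receiver needs only the key-derived bins: each word's bin index yields k bits."""
    return "".join(format(bin_of[w], f"0{k}b") for w in words)
```

    For example, with `vocab = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]`, `bins = make_bins(vocab, 2, 42)` and two ranked steps, `extract(embed("1001", [vocab, vocab[::-1]], bins, 2), bins, 2)` returns `"1001"`. Picking the highest-ranked word inside the chosen bin is what trades payload against fluency; a semantic-aware scheme would constrain the candidates further.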

    Novel linguistic steganography based on character-level text generation

    With the development of natural language processing, linguistic steganography has become a research hotspot in the field of information security. However, most existing linguistic steganographic methods suffer from low embedding capacity. Therefore, this paper proposes a character-level linguistic steganographic method (CLLS) that embeds secret information into characters instead of words by employing a long short-term memory (LSTM) based language model. First, the proposed method uses the LSTM model and a large-scale corpus to construct and train a character-level text generation model; the best-performing trained model is then used as the prediction model for generating stego text. Next, the secret information serves as control information for selecting the next character from the predictions of the trained model. Because predicted characters with different prediction probabilities can be encoded as different secret bit values, the secret information is thereby hidden in the generated text. For the same secret information, the generated stego text varies with the starting string of the text generation model, so we design a selection strategy that generates candidate stego texts from different starting strings and picks the highest-quality one as the final stego text. The experimental results demonstrate that, compared with other similar methods, the proposed method has the fastest running speed and the highest embedding capacity. Moreover, extensive experiments verify the effect of the number of candidate stego texts on the quality of the final stego text: quality increases with the number of candidates, but the rate of improvement slows down.
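    The character-selection step can be sketched as fixed-length coding over the model's top predictions: each k-bit chunk of the secret picks one of the 2**k most probable next characters, and the receiver re-runs the same predictor to read the index back. This is a minimal sketch under stated assumptions: `toy_rank` is a hypothetical deterministic stand-in for the trained LSTM, `ALPHABET`, `embed` and `extract` are illustrative names, the receiver is assumed to know the starting string, and the candidate-selection strategy over several starting strings is omitted.

```python
ALPHABET = "etaoinshrdlu "   # hypothetical toy character set

def toy_rank(context: str) -> list:
    """Hypothetical stand-in for the trained LSTM: next-character candidates, most probable first."""
    shift = ord(context[-1]) % len(ALPHABET)
    return list(ALPHABET[shift:] + ALPHABET[:shift])

def embed(bits: str, start: str, k: int = 2) -> str:
    """Each k-bit chunk selects one of the top 2**k predicted characters."""
    if len(bits) % k:
        raise ValueError("bit string length must be a multiple of k")
    text = start
    for i in range(0, len(bits), k):
        candidates = toy_rank(text)[: 2 ** k]
        text += candidates[int(bits[i:i + k], 2)]
    return text

def extract(stego: str, start: str, k: int = 2) -> str:
    """The receiver re-runs the same predictor from the same starting string."""
    bits, text = [], start
    for ch in stego[len(start):]:
        idx = toy_rank(text)[: 2 ** k].index(ch)
        bits.append(format(idx, f"0{k}b"))
        text += ch
    return "".join(bits)
```

    With k = 2, every generated character carries two bits; a larger k raises capacity but forces the generator to emit less probable characters more often, which is the quality trade-off the candidate-selection strategy is meant to offset.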

    Hiding Information in Reversible English Transforms for a Blind Receiver

    This paper proposes a new technique for hiding secret messages in ordinary English text. The proposed technique exploits the redundancies existing in some English language constructs. These redundancies result from the flexibility to rearrange certain statement constituents without altering the statement's meaning or correctness. For example, one can say “she went to sleep, because she was tired” or “Because she was tired, she went to sleep.” The paper provides a number of such transformations that can be applied concurrently while keeping the overall meaning and grammar intact. The proposed data hiding technique is blind, since the receiver does not keep a copy of the original uncoded text (cover). Moreover, it can hide more than three bits per statement, which is higher than that achieved in prior work. A secret key that is a function of the various transformations used is proposed to protect the confidentiality of the hidden message. Our security analysis shows that even if the attacker knows how the transforms are employed, the secret key provides enough security to protect the confidentiality of the hidden message. Moreover, we show that the proposed transformations do not affect the inconspicuousness of the transformed statements, which are thus unlikely to draw suspicion.
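    The "because" example above already shows why the receiver can be blind: the form of the sentence itself is the bit. A minimal sketch of that single transformation, assuming one bit per sentence (0 = main clause first, 1 = reason clause first), follows. The regular expressions and the names `embed_bit` and `extract_bit` are hypothetical; the paper combines several concurrent transforms under a secret key, and the naive recapitalization here would mishandle proper nouns and "I".

```python
import re

# Hypothetical patterns for the two equivalent orderings of a "because" sentence.
MAIN_FIRST = re.compile(r"^(?P<main>[^,]+), because (?P<reason>.+)\.$")
REASON_FIRST = re.compile(r"^Because (?P<reason>[^,]+), (?P<main>.+)\.$")

def _parts(sentence):
    """Return (main clause, reason clause) if the sentence uses either ordering."""
    for pattern in (MAIN_FIRST, REASON_FIRST):
        m = pattern.match(sentence)
        if m:
            return m["main"], m["reason"]
    return None

def embed_bit(sentence, bit):
    """Rewrite the sentence into the ordering that encodes `bit`; meaning is unchanged."""
    parts = _parts(sentence)
    if parts is None:
        raise ValueError("sentence has no 'because' clause to reorder")
    main, reason = parts
    if bit == "1":
        return f"Because {reason}, {main[0].lower()}{main[1:]}."
    return f"{main[0].upper()}{main[1:]}, because {reason}."

def extract_bit(sentence):
    """Blind extraction: the ordering alone reveals the bit, no cover text needed."""
    if REASON_FIRST.match(sentence):
        return "1"
    if MAIN_FIRST.match(sentence):
        return "0"
    return None   # sentence carries no bit under this transform
```

    For instance, `embed_bit("She went to sleep, because she was tired.", "1")` yields `"Because she was tired, she went to sleep."`, and `extract_bit` on that result returns `"1"`.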

    Covert Channels Within IRC

    Exploring advanced information hiding techniques is important for understanding and defending against illicit data extraction over networks. Many techniques have been developed to covertly transmit data over networks, each differing in capability, method, and complexity. This research introduces a new class of information hiding techniques for use over Internet Relay Chat (IRC), called the Variable Advanced Network IRC Stealth Handler (VANISH) system. Three methods for concealing information are developed under this framework to suit the needs of an attacker. These methods are referred to as the Throughput, Stealth, and Baseline scenarios. Each is designed for a specific purpose: to maximize channel capacity, to minimize shape-based detectability, or to provide a baseline for comparison using established techniques applied to IRC. The effectiveness of these scenarios is empirically tested using public IRC servers in Chicago, Illinois, and Amsterdam, Netherlands. The Throughput method exfiltrates covert data at nearly 800 bits per second (bps), compared to 18 bps with the Baseline method and 0.13 bps with the Stealth method. The Stealth method uses Reed-Solomon forward error correction to reduce bit errors from 3.1% to nearly 0% with minimal additional overhead. The Stealth method also successfully evades shape-based detection tests but is vulnerable to regularity-based tests.
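    To make the capacity gap between the three scenarios concrete, the arithmetic below converts the reported rates into the time needed to exfiltrate a 1 KiB payload. It assumes idealized rates: "nearly 800 bps" is rounded to 800, and forward error correction overhead, bit errors, and retransmissions are ignored. The names `RATES_BPS` and `transfer_seconds` are illustrative only.

```python
# Approximate rates reported in the abstract ("nearly 800 bps" rounded to 800).
RATES_BPS = {"Throughput": 800, "Baseline": 18, "Stealth": 0.13}

def transfer_seconds(n_bytes: int, rate_bps: float) -> float:
    """Idealized time to move n_bytes at rate_bps, ignoring FEC overhead and retransmissions."""
    return n_bytes * 8 / rate_bps

for name, rate in RATES_BPS.items():
    secs = transfer_seconds(1024, rate)
    print(f"{name}: {secs:,.1f} s ({secs / 3600:.2f} h)")
```

    Under these assumptions, 1 KiB (8,192 bits) takes about 10 seconds with Throughput, about 7.6 minutes with Baseline, and about 17.5 hours with Stealth, which is the cost the Stealth scenario pays for evading shape-based detection.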

    Hybrid Arabic text steganography

    An improved method for Arabic text steganography is introduced in this paper. The method hides one Arabic text inside another using a hybrid approach that combines Kashida and Arabic diacritics. For security purposes, it benefits from the natural presence of diacritics in written Arabic, where they represent vowel sounds. The paper exploits the possibility of hiding data in the Fathah diacritic and in Kashida elongation characters, adapting previously presented schemes that are each based on a single method only. The secret message is divided into two parts and the cover text is prepared; one part is hidden with the Diacritics (Harakat) method and the other with the Kashida method, and the two parts are then combined. When the ‘StegoText’ is received, a split mechanism separates the two parts to recover the original message. The described hybrid Arabic StegoText showed higher capacity and security, with promising results compared to other methods.
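    The Kashida half of the hybrid can be sketched as follows: every position where an elongation character could legitimately be drawn is a slot, a 1 bit inserts a Kashida (U+0640, Arabic Tatweel) there, and a 0 bit leaves it alone. This is a minimal sketch, not the paper's scheme: the joining rules are simplified (the `NON_JOINING` set is an assumed, partial list of letters that do not connect to the next letter), the cover is assumed to contain no Kashida beforehand, the Fathah channel and the split/combine step are omitted, and all function names are hypothetical.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL, the elongation character used as Kashida
# Assumed, simplified list of letters that never join to the following letter.
NON_JOINING = set("\u0627\u0623\u0625\u0622\u062F\u0630\u0631\u0632\u0648\u0624\u0629\u0621")

def is_letter(ch: str) -> bool:
    """Basic Arabic letters in U+0621..U+064A, excluding the tatweel itself."""
    return "\u0621" <= ch <= "\u064A" and ch != KASHIDA

def eligible(text: str, i: int) -> bool:
    """A slot follows a joining letter that is directly followed by another Arabic letter."""
    return (is_letter(text[i]) and text[i] not in NON_JOINING
            and i + 1 < len(text) and is_letter(text[i + 1]))

def capacity(cover: str) -> int:
    return sum(eligible(cover, i) for i in range(len(cover)))

def embed(cover: str, bits: str) -> str:
    """Bit 1 inserts a Kashida after an eligible letter; bit 0 leaves the slot unchanged."""
    if KASHIDA in cover:
        raise ValueError("cover must not already contain Kashida")
    if len(bits) > capacity(cover):
        raise ValueError("message longer than cover capacity")
    out, k = [], 0
    for i, ch in enumerate(cover):
        out.append(ch)
        if k < len(bits) and eligible(cover, i):
            if bits[k] == "1":
                out.append(KASHIDA)
            k += 1
    return "".join(out)

def extract(stego: str, n_bits: int) -> str:
    """Strip Kashidas, remembering which letters carried one, then read the eligible slots."""
    plain, marked = [], []
    for ch in stego:
        if ch == KASHIDA:
            marked[-1] = True
        else:
            plain.append(ch)
            marked.append(False)
    text = "".join(plain)
    bits = ["1" if marked[i] else "0" for i in range(len(text)) if eligible(text, i)]
    return "".join(bits[:n_bits])
```

    Because the slots are recomputed from the text after the Kashidas are stripped, the receiver needs no copy of the cover; the Harakat half would be a second, independent channel over the same cover.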

    Towards Code Watermarking with Dual-Channel Transformations

    The expansion of the open source community and the rise of large language models have raised ethical and security concerns about the distribution of source code, such as misconduct involving copyrighted code, distribution without proper licenses, or misuse of code for malicious purposes. Tracking the ownership of source code is therefore important, and watermarking is a major technique for doing so. Yet, unlike natural-language watermarking, source code watermarking requires far stricter and more complicated rules to preserve both the readability and the functionality of the code. We therefore introduce SrcMarker, a watermarking system that unobtrusively encodes ID bitstrings into source code without affecting the usage and semantics of the code. To this end, SrcMarker performs transformations on an AST-based intermediate representation that enables unified transformations across different programming languages. The core of the system uses learning-based embedding and extraction modules to select rule-based transformations for watermarking. In addition, a novel feature-approximation technique is designed to tackle the inherent non-differentiability of rule selection, seamlessly integrating the rule-based transformations and learning-based networks into an interconnected system that enables end-to-end training. Extensive experiments demonstrate the superiority of SrcMarker over existing methods across various watermarking requirements. Comment: 16 pages
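    The idea of a rule-based, semantics-preserving transformation carrying watermark bits can be illustrated with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). This sketch is not SrcMarker: it hard-codes a single rule instead of learning which rules to apply, works on one language, and uses no intermediate representation. Each comparison of the form `a < b` or `a > b` is one slot; bit 0 writes it with `<` and bit 1 with `>`, swapping operands so the meaning is unchanged. Slots are restricted to names and constants so that swapping operands cannot reorder side effects. The names `embed`, `extract` and `_slots` are hypothetical.

```python
import ast

SIMPLE = (ast.Name, ast.Constant)   # operands whose evaluation order cannot matter

def _slots(tree):
    """Comparisons `x < y` / `x > y` with simple operands, in a deterministic walk order."""
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.Compare) and len(n.ops) == 1
            and isinstance(n.ops[0], (ast.Lt, ast.Gt))
            and isinstance(n.left, SIMPLE) and isinstance(n.comparators[0], SIMPLE)]

def embed(source: str, bits: str) -> str:
    """Rewrite each slot as `<` (bit 0) or `>` (bit 1), swapping operands to keep semantics."""
    tree = ast.parse(source)
    slots = _slots(tree)
    if len(bits) > len(slots):
        raise ValueError("not enough comparison slots for the watermark")
    for node, bit in zip(slots, bits):
        want = ast.Gt if bit == "1" else ast.Lt
        if not isinstance(node.ops[0], want):
            node.left, node.comparators[0] = node.comparators[0], node.left
            node.ops[0] = want()
    return ast.unparse(tree)

def extract(source: str, n_bits: int) -> str:
    """Read the operator direction of the first n_bits slots."""
    slots = _slots(ast.parse(source))[:n_bits]
    return "".join("1" if isinstance(s.ops[0], ast.Gt) else "0" for s in slots)
```

    Embedding `"10"` into a `clamp` function containing `if x < lo:` and `if x > hi:` rewrites them as `if lo > x:` and `if hi < x:`, and the two versions return identical results for any input.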