9 research outputs found

    A semantics aware approach to automated reverse engineering unknown protocols

    Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats through keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.

    Unsupervised Time Series Extraction from Controller Area Network Payloads

    This paper introduces a method for unsupervised tokenization of Controller Area Network (CAN) data payloads using bit level transition analysis and a greedy grouping strategy. The primary goal of this proposal is to extract individual time series which have been concatenated together before transmission onto a vehicle's CAN bus. This process is necessary because the documentation for how to properly extract data from a network may not always be available; passenger vehicle CAN configurations are protected as trade secrets. At least one major manufacturer has also been found to deliberately misconfigure their documented extraction methods. Thus, this proposal serves as a critical enabler for robust third-party security auditing and intrusion detection systems which do not rely on manufacturers sharing confidential information. Comment: 2018 IEEE 88th Vehicular Technology Conference (VTC2018-Fall)
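The abstract above names two ingredients, bit-level transition analysis and greedy grouping, without giving details. The following is a minimal sketch of that general idea, not the paper's algorithm. It assumes payloads arrive as fixed-width integers and that, within an unsigned field, bits flip more often toward the least significant end. The function names `bit_flip_rates` and `greedy_group` are hypothetical.

```python
def bit_flip_rates(payloads, width=64):
    """Fraction of consecutive payload pairs in which each bit position flips.

    Position 0 is the most significant bit of the payload.
    """
    flips = [0] * width
    for prev, cur in zip(payloads, payloads[1:]):
        changed = prev ^ cur
        for pos in range(width):
            if (changed >> (width - 1 - pos)) & 1:
                flips[pos] += 1
    pairs = max(len(payloads) - 1, 1)
    return [f / pairs for f in flips]


def greedy_group(rates):
    """Greedily group adjacent bits into candidate fields (assumed heuristic).

    A field is extended while the flip rate does not decrease toward the
    least significant bit; a drop in flip rate starts a new field.
    Bits that never flip are treated as constant and skipped.
    """
    tokens = []
    start = None
    for pos, rate in enumerate(rates):
        if rate == 0.0:
            if start is not None:
                tokens.append((start, pos - 1))
                start = None
            continue
        if start is None:
            start = pos
        elif rate < rates[pos - 1]:
            tokens.append((start, pos - 1))
            start = pos
    if start is not None:
        tokens.append((start, len(rates) - 1))
    return tokens


# Synthetic 8-bit payloads: two 4-bit counters packed side by side.
payloads = [((i % 16) << 4) | ((i // 2) % 16) for i in range(64)]
fields = greedy_group(bit_flip_rates(payloads, width=8))
print(fields)  # [(0, 3), (4, 7)]
```

Each returned pair is an inclusive (first bit, last bit) range. Extracting the integer value of one range from every payload yields one candidate time series.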

    Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns

    Nowadays, there is high interest in modeling the type of multimedia traffic with the purpose of estimating the network resources required to guarantee the quality delivered to the user. In this work we propose a multimedia traffic classification model based on patterns that allows us to differentiate the type of traffic by using video streaming and network characteristics as input parameters. We show that there is low correlation between network parameters and the delivered video quality. Because of this, in addition to network parameters, we also add video streaming parameters in order to improve the efficiency of our system. Finally, it should be noted that, based on the objective video quality received by the user, we have extracted traffic patterns that we use to develop the classification model. This work has been supported by the Ministerio de Economia y Competitividad in the Programa Estatal de Fomento de la Investigacion Cientifica y Tecnica de Excelencia, Subprograma Estatal de Generacion de Conocimiento within the Project with reference TIN2017-84802-C2-1-P. Canovas Solbes, A.; Jimenez, JM.; Romero Martínez, JO.; Lloret, J. (2018). Multimedia Data Flow Traffic Classification Using Intelligent Models Based on Traffic Patterns. IEEE Network. 32(6):100-107. doi:10.1109/MNET.2018.1800121

    Message Type Identification of Binary Network Protocols using Continuous Segment Similarity

    Protocol reverse engineering based on traffic traces infers the behavior of unknown network protocols by analyzing observable network messages. To perform correct deduction of message semantics or behavior analysis, accurate message type identification is an essential first step. However, identifying message types is particularly difficult for binary protocols, whose structural features are hidden in their densely packed data representation. We leverage the intrinsic structural features of binary protocols and propose an accurate method for discriminating message types. Our approach uses a similarity measure with a continuous value range by comparing feature vectors where vector elements correspond to the fields in a message, rather than discrete byte values. This enables a better recognition of structural patterns, which remain hidden when only exact value matches are considered. We combine Hirschberg alignment with DBSCAN as the clustering algorithm to yield a novel inference mechanism. By applying novel autoconfiguration schemes, we do not require manually configured parameters for the analysis of an unknown protocol, as required by earlier approaches. Results of our evaluations show that our approach has considerable advantages in message type identification result quality and also execution performance over previous approaches. Comment: 11 pages, 4 figures, to be published in IEEE International Conference on Computer Communications. INFOCOM. Beijing, China, 202

    P-gram: positional N-gram for the clustering of machine-generated messages

    An IT system generates messages for other systems or users to consume, through direct interaction or as system logs. Automatically identifying the types of these machine-generated messages has many applications, such as intrusion detection and system behavior discovery. Among various heuristic methods for automatically identifying message types, the clustering methods based on keyword extraction have been quite effective. However, these methods still suffer from keyword misidentification problems, i.e., some keyword occurrences are wrongly identified as payload and some strings in the payload are wrongly identified as keyword occurrences, leading to the misidentification of the message types. In this paper, we propose a new machine language processing (MLP) approach, called P-gram, specifically designed for identifying keywords in, and subsequently clustering, machine-generated messages. First, we introduce a novel concept and technique, positional n-gram, for message keyword extraction. By associating the position as meta-data with each n-gram, we can more accurately discern which n-grams are keywords of a message and which n-grams are parts of the payload information. Then, the positional keywords are used as features to cluster the messages, and an entropy-based positional weighting method is devised to measure the importance or weight of the positional keywords to each message. Finally, a general centroid clustering method, K-Medoids, is used to leverage the importance of the keywords and cluster messages into groups reflecting their types. We evaluate our method on a range of machine-generated (text and binary) messages from real-world systems and show that our method achieves higher accuracy than the current state-of-the-art tools.
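The positional n-gram idea in the abstract above, attaching a position to each n-gram, can be illustrated with a short sketch. This is only an illustration of the concept, not the P-gram implementation. It assumes the position is the byte offset from the start of the message, and it omits the entropy-based weighting and K-Medoids steps. The function names are hypothetical.

```python
from collections import Counter


def positional_ngrams(message: bytes, n: int = 3):
    """Return (offset, n-gram) pairs for every n-byte window of the message."""
    return [(i, message[i:i + n]) for i in range(len(message) - n + 1)]


def count_positional_ngrams(messages, n: int = 3):
    """Count in how many messages each (offset, n-gram) pair occurs."""
    counts = Counter()
    for msg in messages:
        # A set ensures each pair is counted at most once per message.
        counts.update(set(positional_ngrams(msg, n)))
    return counts


messages = [b"GET /a", b"GET /b", b"PUT /a"]
counts = count_positional_ngrams(messages, n=3)
print(counts[(0, b"GET")])  # 2
print(counts[(0, b"PUT")])  # 1
```

A pair such as `(0, b"GET")` that recurs at the same offset across many messages is a keyword candidate, whereas the same bytes appearing at scattered offsets are more likely payload.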

    Enabling Auditing and Intrusion Detection of Proprietary Controller Area Networks

    The goal of this dissertation is to provide automated methods for security researchers to overcome ‘security through obscurity’ used by manufacturers of proprietary Industrial Control Systems (ICS). ‘White hat’ security analysts waste significant time reverse engineering these systems' opaque network configurations instead of performing meaningful security auditing tasks. Automating the process of documenting proprietary protocol configurations is intended to improve independent security auditing of ICS networks. The major contributions of this dissertation are a novel approach for unsupervised lexical analysis of binary network data flows and analysis of the time series data extracted as a result. We demonstrate the utility of these methods using Controller Area Network (CAN) data sampled from passenger vehicles.

    Automatic Configuration of Programmable Logic Controller Emulators

    Programmable logic controllers (PLCs), which are used to control much of the world's critical infrastructures, are highly vulnerable and exposed to the Internet. Many efforts have been undertaken to develop decoys, or honeypots, of these devices in order to characterize, attribute, or prevent attacks against Industrial Control Systems (ICS) networks. Unfortunately, since ICS devices typically are proprietary and unique, one emulation solution for a particular vendor's model will not likely work on other devices. Many previous efforts have manually developed ICS honeypots, but it is a very time-intensive process. Thus, a scalable solution is needed in order to automatically configure PLC emulators. The ScriptGenE Framework presented in this thesis leverages several techniques used in reverse engineering protocols in order to automatically configure PLC emulators using network traces. The accuracy, flexibility, and efficiency of the ScriptGenE Framework are tested in three fully automated experiments.

    GrAMeFFSI: Graph Analysis Based Message Format and Field Semantics Inference For Binary Protocols, Using Recorded Network Traffic

    Protocol specifications describe the interaction between different entities by defining message formats and message processing rules. Having access to such protocol specifications is highly desirable for many tasks, including the analysis of botnets, building honeypots, defining network intrusion detection rules, and fuzz testing protocol implementations. Unfortunately, many protocols of interest are proprietary, and their specifications are not publicly available. Protocol reverse engineering is an approach to reconstruct the specifications of such closed protocols. Protocol reverse engineering can be tedious work if done manually, so prior research focused on automating the reverse engineering process as much as possible. Some approaches rely on access to the protocol implementation, but in many cases, the protocol implementation itself is not available or its license does not permit its use for reverse engineering purposes. Hence, in this paper, we focus on reverse engineering protocol specifications relying solely on recorded network traffic. More specifically, we propose GrAMeFFSI, a method based on graph analysis that can infer protocol message formats as well as certain field semantics for binary protocols from network traces. We demonstrate the usability of our approach by running it on packet captures of two known protocols, Modbus and MQTT, then comparing the inferred specifications to the official specifications of these protocols