A framework for malicious host fingerprinting using distributed network sensors

Abstract

Numerous software agents exist and are responsible for increasing volumes of malicious traffic that is observed on the Internet today. From a technical perspective the existing techniques for monitoring malicious agents and traffic were not developed to allow for the interrogation of the source of malicious traffic. This interrogation or reconnaissance would be considered active analysis as opposed to existing, mostly passive analysis. Unlike passive analysis, the active techniques are time-sensitive and their results become increasingly inaccurate as time delta between observation and interrogation increases. In addition to this, some studies had shown that the geographic separation of hosts on the Internet have resulted in pockets of different malicious agents and traffic targeting victims. As such it would be important to perform any kind of data collection over various source and in distributed IP address space. The data gathering and exposure capabilities of sensors such as honeypots and network telescopes were extended through the development of near-realtime Distributed Sensor Network modules that allowed for the near-realtime analysis of malicious traffic from distributed, heterogeneous monitoring sensors. In order to utilise the data exposed by the near-realtime Distributed Sensor Network modules an Automated Reconnaissance Framework was created, this framework was tasked with active and passive information collection and analysis of data in near-realtime and was designed from an adapted Multi Sensor Data Fusion model. The hypothesis was made that if sufficiently different characteristics of a host could be identified; combined they could act as a unique fingerprint for that host, potentially allowing for the re-identification of that host, even if its IP address had changed. To this end the concept of Latency Based Multilateration was introduced, acting as an additional metric for remote host fingerprinting. The vast amount of information gathered by the AR-Framework required the development of visualisation tools which could illustrate this data in near-realtime and also provided various degrees of interaction to accommodate human interpretation of such data. Ultimately the data collected through the application of the near-realtime Distributed Sensor Network and AR-Framework provided a unique perspective of a malicious host demographic. Allowing for new correlations to be drawn between attributes such as common open ports and operating systems, location, and inferred intent of these malicious hosts. The result of which expands our current understanding of malicious hosts on the Internet and enables further research in the area

    Similar works