
    D-FENS: DNS Filtering & Extraction Network System for Malicious Domain Names

    While the DNS (Domain Name System) has become a cornerstone for the operation of the Internet, it has also fostered creative cases of maliciousness, including phishing, typosquatting, and botnet communication, among others. To address this problem, this dissertation focuses on identifying and mitigating such malicious domain names through prior knowledge and machine learning. In the first part of this dissertation, we explore a method of registering domain names with deliberate typographical mistakes (i.e., typosquatting) to masquerade as popular and well-established domain names. To understand the effectiveness of typosquatting, we conducted a user study that shed light on which techniques were more successful than others in deceiving users. While certain techniques fared better than others, they failed to take the context of the user into account. Therefore, in the second part of this dissertation we look at the possibility of an advanced attack which takes context into account when generating domain names. The main idea is to determine whether an adversary can improve their success rate of deceiving users with specifically targeted malicious domain names. While these malicious domains typically target users, other types of domain names are generated by botnets for command & control (C2) communication. Therefore, in the third part of this dissertation we investigate domain generation algorithms (DGA) used by botnets and propose a method to identify DGA-based domain names. By analyzing DNS traffic for certain patterns of NXDomain (non-existent domain) query responses, we can accurately predict DGA-based domain names before they are registered. Given all of these approaches to malicious domain names, we ultimately propose a system called D-FENS (DNS Filtering & Extraction Network System). D-FENS uses machine learning and prior knowledge to accurately predict unreported malicious domain names in real time, thereby preventing Internet devices from unknowingly connecting to a potentially malicious domain name.
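
    The NXDomain-based detection idea can be made concrete with a small sketch. The following Python snippet is only an illustration of the general heuristic described above (flagging clients that produce bursts of NXDOMAIN responses for random-looking names); it is not the D-FENS classifier, and the entropy and count thresholds are arbitrary placeholder values.

    import math
    from collections import Counter, defaultdict

    def label_entropy(label):
        # Character entropy of a domain label; algorithmically generated
        # names tend to score higher than human-chosen ones.
        counts = Counter(label)
        total = len(label)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def flag_dga_suspects(dns_events, nx_threshold=20, entropy_threshold=3.5):
        # dns_events: iterable of (client_ip, queried_domain, rcode) tuples,
        # where rcode == "NXDOMAIN" marks a non-existent domain response.
        suspicious_nx = defaultdict(int)
        for client_ip, domain, rcode in dns_events:
            if rcode != "NXDOMAIN":
                continue
            label = domain.split(".")[0]  # e.g. "kq3vz9apwu7x04" from "kq3vz9apwu7x04.com"
            if label_entropy(label) >= entropy_threshold:
                suspicious_nx[client_ip] += 1
        return {ip for ip, n in suspicious_nx.items() if n >= nx_threshold}

    # Toy usage: one client issuing many failed lookups for random-looking names.
    events = [("10.0.0.5", "kq3vz9apwu7x%02d.com" % i, "NXDOMAIN") for i in range(25)]
    print(flag_dga_suspects(events))  # -> {'10.0.0.5'} under these toy thresholds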

    On Frequency Estimation and Detection of Heavy Hitters in Data Streams

    A stream can be thought of as a very large, sometimes even infinite, set of data that arrives sequentially and must be processed without the possibility of being stored. In fact, the memory available to the algorithm is limited, and it is not possible to store the whole stream, which is instead scanned upon arrival and summarized through a succinct data structure in order to maintain only the information of interest. Two of the main tasks related to data stream processing are frequency estimation and heavy hitter detection. The frequency estimation problem requires estimating the frequency of each item, that is, the number of times (or the total weight with which) it appears in the stream, while heavy hitter detection means reporting all items whose frequency exceeds a fixed threshold. In this work we design and analyze ACMSS, an algorithm for frequency estimation and heavy hitter detection, and compare it against the state-of-the-art ASKETCH algorithm. We show that, given the same budgeted amount of memory, our algorithm outperforms ASKETCH in accuracy on the frequency estimation task. Furthermore, we show that, under the assumptions stated by its authors, ASKETCH may not be able to report all of the heavy hitters, whilst ACMSS will, with high probability, provide the full list of heavy hitters.
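
    To make the two tasks concrete, here is a minimal sketch of the classic Space-Saving algorithm, a standard textbook approach to heavy hitter detection under a fixed memory budget. It is shown only as background for the problem setting; it is neither ACMSS nor ASKETCH, and the stream and threshold below are toy values.

    def space_saving(stream, k):
        # Keep at most k counters; any item with true frequency above m/k
        # (m = stream length) is guaranteed to be among the monitored items,
        # and each counter overestimates the true frequency by a bounded amount.
        counters = {}
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < k:
                counters[item] = 1
            else:
                # Evict the minimum counter and let the new item inherit its count.
                victim = min(counters, key=counters.get)
                counters[item] = counters.pop(victim) + 1
        return counters

    # Toy usage: report items estimated above 10% of the stream length.
    stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2 + ["a"] * 20
    estimates = space_saving(stream, k=4)
    threshold = len(stream) / 10
    print({x: c for x, c in estimates.items() if c >= threshold})  # {'a': 70, 'b': 30}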

    Frequent Elements with Witnesses in Data Streams

    Detecting frequent elements is among the oldest and most-studied problems in the area of data streams. Given a stream of $m$ data items in $\{1, 2, \dots, n\}$, the objective is to output the items that appear at least $d$ times, for some threshold parameter $d$, and provably optimal algorithms are known today. However, in many applications, knowing only the frequent elements themselves is not enough: for example, an Internet router may not only need to know the most frequent destination IP addresses of forwarded packets, but also the timestamps of when these packets appeared or any other meta-data that "arrived" with them, e.g., their source IP addresses. In this paper, we introduce the witness version of the frequent elements problem: given a desired approximation guarantee $\alpha \ge 1$ and a desired frequency $d \le \Delta$, where $\Delta$ is the frequency of the most frequent item, the objective is to report an item together with at least $d/\alpha$ timestamps of when the item appeared in the stream (or any other meta-data that arrived with the items). We give provably optimal algorithms for both the insertion-only and insertion-deletion stream settings: in insertion-only streams, we show that space $\tilde{O}(n + d \cdot n^{1/\alpha})$ is necessary and sufficient for every integral $1 \le \alpha \le \log n$. In insertion-deletion streams, we show that space $\tilde{O}(n d / \alpha^2)$ is necessary and sufficient for every $\alpha \le \sqrt{n}$.
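
    As a rough illustration of the problem statement only (not the paper's space-optimal algorithms), the sketch below pairs Misra-Gries style counters with a bounded list of timestamps kept per monitored item, so that each reported frequent item comes with witnesses of its occurrences. The parameter names are invented for this example.

    def frequent_with_witnesses(stream, k, max_witnesses):
        # stream: iterable of (timestamp, item) pairs;
        # k: number of counters kept (Misra-Gries summary size);
        # max_witnesses: how many timestamps to retain per monitored item.
        counters, witnesses = {}, {}
        for ts, item in stream:
            if item in counters:
                counters[item] += 1
                if len(witnesses[item]) < max_witnesses:
                    witnesses[item].append(ts)
            elif len(counters) < k:
                counters[item] = 1
                witnesses[item] = [ts]
            else:
                # Misra-Gries decrement step: shrink every counter and drop
                # the items (and their witnesses) whose counters reach zero.
                for other in list(counters):
                    counters[other] -= 1
                    if counters[other] == 0:
                        del counters[other], witnesses[other]
        return {x: (counters[x], witnesses[x]) for x in counters}

    # Toy usage: "a" appears twice as often as "b"; both come with timestamps.
    stream = [(t, "a" if t % 3 else "b") for t in range(30)]
    print(frequent_with_witnesses(stream, k=2, max_witnesses=5))
    # {'b': (10, [0, 3, 6, 9, 12]), 'a': (20, [1, 2, 4, 5, 7])}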