4 research outputs found

    Regular Language Distance and Entropy

    This paper addresses the problem of determining the distance between two regular languages. It shows how to extend Jaccard distance, which is defined on finite sets, to potentially infinite regular languages. The entropy of a regular language plays a large role in the extension, and much of the paper is spent investigating it. This includes addressing issues that have required previous authors to rely on the upper limit of Shannon's traditional formulation of channel capacity, because its limit does not always exist. The paper also proposes a new limit-based formulation for the entropy of a regular language and proves that this formulation always exists and is equivalent to Shannon's original formulation (when the latter exists). Additionally, the proposed formulation is shown to equal an analogous but formally quite different notion of topological entropy from symbolic dynamics, consequently also showing Shannon's original formulation to be equivalent to topological entropy. Surprisingly, the natural Jaccard-like entropy distance is trivial in most cases. Instead, the entropy sum distance metric is suggested and shown to be granular in certain situations.
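
As a rough illustration of the two ingredients named in this abstract, the sketch below computes Jaccard distance on finite sets and counts the words of a given length accepted by a DFA, from which a Shannon-style growth rate log2(count)/n can be read off. This is not the paper's construction: the function names, the dictionary encoding of the DFA, and the convention that two empty sets are at distance 0 are all assumptions made for the example.

```python
from math import log2


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| for finite sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are at distance 0
    return 1.0 - len(a & b) / len(union)


def count_words(transitions: dict, start, accepting: set, n: int) -> int:
    """Count the words of length n accepted by a DFA.

    transitions maps (state, symbol) -> next state (hypothetical encoding).
    """
    counts = {start: 1}  # number of length-i words leading to each state
    for _ in range(n):
        nxt = {}
        for (state, _symbol), target in transitions.items():
            if state in counts:
                nxt[target] = nxt.get(target, 0) + counts[state]
        counts = nxt
    return sum(c for state, c in counts.items() if state in accepting)


def growth_rate(transitions: dict, start, accepting: set, n: int):
    """Return log2(#words of length n) / n, or None if no such word exists."""
    count = count_words(transitions, start, accepting, n)
    return log2(count) / n if count > 0 and n > 0 else None


# Example DFA: all even-length strings over {a, b}; state 0 is accepting.
EVEN_LENGTH = {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}
```

For the even-length language, `growth_rate(EVEN_LENGTH, 0, {0}, n)` is defined for even `n` but returns `None` for odd `n`, since no odd-length word is accepted. That is one simple way the plain limit of log2(count)/n can fail to exist while its upper limit still does.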

    Comparing the Effectiveness of Different Classification Techniques in Predicting DNS Tunnels

    DNS is one of the most widely used protocols on the internet and is used to translate domain names into IP addresses in order to correctly route messages between computers. It presents an attractive attack vector for criminals, as the service is not as closely monitored by security experts as other protocols such as HTTP or FTP. Its use as a covert means of communication has increased with the availability of tools that allow for the creation of DNS tunnels using the protocol. One of the primary motivations for using DNS tunnels is the illegal extraction of information from a company’s network. This can lead to reputational damage for the organisation and result in significant fines, particularly with the introduction of the General Data Protection Regulation in the EU. Most of the research into the detection of DNS tunnels has used anomalies in the relationship between DNS requests and other protocols, or anomalies in the rate of DNS requests made over specific time periods. This study will look at the characteristics of individual DNS requests to see how effective different classification techniques are at identifying tunnels. The techniques selected are Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM). The effectiveness of the techniques will be measured and compared using a Cochran’s Q test to see whether there are statistically significant differences between them. The results will indicate that DT, RF and SVM are the most effective techniques at categorising DNS requests, and that they are significantly different to the other models. Key Words: DNS Tunnel, Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Cochran’s Q Test
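
For readers unfamiliar with the test named in this abstract, the following is a minimal sketch of the Cochran's Q statistic for comparing k classifiers on the same samples. It is not the study's code: the function name and the 0/1 outcome matrix (rows are samples, columns are classifiers, 1 means the classifier labelled the sample correctly) are assumptions made for the example.

```python
def cochran_q(outcomes: list[list[int]]) -> float:
    """Cochran's Q statistic for a matrix of binary outcomes.

    outcomes[i][j] is 1 if classifier j was correct on sample i, else 0
    (hypothetical encoding chosen for this sketch).
    """
    k = len(outcomes[0])  # number of classifiers being compared
    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    grand_total = sum(row_totals)

    denominator = k * grand_total - sum(r * r for r in row_totals)
    if denominator == 0:
        # Every sample was classified correctly by all or by none.
        raise ValueError("Cochran's Q is undefined: no discordant samples")

    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand_total ** 2)
    return numerator / denominator


# Made-up outcomes for three classifiers on six samples.
EXAMPLE = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
```

Under the null hypothesis that all classifiers are equally accurate, Q is compared against a chi-square distribution with k - 1 degrees of freedom. The example data are invented purely to exercise the formula.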

    Comparing Languages and Reducing Automata Used in Network Traffic Filtering

    The focus of this thesis is the comparison of languages and the reduction of automata used in network traffic monitoring. Several approaches for approximate (language non-preserving) reduction of automata, and an approach for comparing their languages, are proposed. The reductions are based either on under-approximating the language of an automaton by pruning its states, or on over-approximating the language by introducing new self-loops (and pruning redundant states later). The proposed approximate reduction methods and the proposed probabilistic distance utilize information from network traffic. Formal guarantees are provided with respect to a model of network traffic represented by a probabilistic automaton. The methods were implemented and evaluated on automata used in network traffic filtering.