14 research outputs found
Exploring Alternative Approaches for TwitterForensics: Utilizing Social Network Analysis to Identify Key Actors and Potential Suspects
SNA (Social Network Analysis) models users as points (nodes) and interactions between users as lines (edges). The method reveals patterns of social interaction in a network, starting with the identification of key actors. The novelty of this study lies in extending the analysis to other potential suspects, not only the key actors identified so far. The method narrows the network mapping by examining only nodes connected to key actors; secondary key actors are identified not by centrality but by the weight indicators on the edges. A case study using the hashtag "Manchester United" on the social media platform Twitter was conducted.
The results of the Social Network Analysis (SNA) revealed that the @david_ornstein account is a key actor, with a degree centrality of 2298. The second approach found that the @hadrien_grenier, @footballforall, and @theutdjournal accounts had a particularly high intensity of interaction with the key actor; the communication weight between these secondary actors and the key actor is close to or above 50. The results of this analysis can be used to identify other potential suspects who have strong ties to key actors by examining edge weights.
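The two-stage procedure the abstract describes (degree centrality to find the key actor, then edge weights to find secondary actors) can be sketched in a few lines of Python. The handles and interaction weights below are illustrative, not the study's actual Twitter data, and the threshold of 45 approximates the paper's "close to or above 50" criterion.

```python
# A minimal sketch of the two-stage actor analysis, using plain dicts.
from collections import defaultdict

# Undirected interaction graph: edge -> interaction weight (illustrative).
edges = {
    ("@david_ornstein", "@hadrien_grenier"): 61,
    ("@david_ornstein", "@footballforall"): 55,
    ("@david_ornstein", "@theutdjournal"): 49,
    ("@david_ornstein", "@casual_fan"): 3,
    ("@hadrien_grenier", "@casual_fan"): 2,
}

degree = defaultdict(int)
weight_to = defaultdict(dict)
for (u, v), w in edges.items():
    degree[u] += 1
    degree[v] += 1
    weight_to[u][v] = weight_to[v][u] = w

# Stage 1: the key actor is the node with the highest degree centrality.
key_actor = max(degree, key=degree.get)

# Stage 2: secondary actors are neighbours of the key actor whose edge
# weight is close to or above the study's threshold.
secondary = sorted(n for n, w in weight_to[key_actor].items() if w >= 45)
print(key_actor, secondary)
```

The second stage deliberately ignores centrality: a low-degree account can still be a strong secondary suspect if its tie to the key actor is heavy.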
Detecting Traffic Snooping in Anonymity Networks Using Decoys
Anonymous communication networks like Tor partially protect the confidentiality of their users' traffic by encrypting all intra-overlay communication. However, when the relayed traffic reaches the boundaries of the overlay network towards its actual destination, the original user traffic is inevitably exposed. At this point, unless end-to-end encryption is used, sensitive user data can be snooped by a malicious or compromised exit node, or by any other rogue network entity on the path towards the actual destination. We explore the use of decoy traffic for the detection of traffic interception on anonymous proxying systems. Our approach is based on the injection of traffic that exposes bait credentials for decoy services that require user authentication. Our aim is to entice prospective eavesdroppers to access decoy accounts on servers under our control using the intercepted credentials. We have deployed our prototype implementation in the Tor network using decoy IMAP and SMTP servers. During the course of six months, our system detected eight cases of traffic interception that involved eight different Tor exit nodes. We provide a detailed analysis of the detected incidents, discuss potential improvements to our system, and outline how our approach can be extended for the detection of HTTP session hijacking attacks.
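The bait-credential idea can be illustrated with a short sketch: each decoy credential is unique and tied to the exit node through which it was exposed, so any later login attempt using it implicates that node. All names, fingerprints, and the log format below are hypothetical.

```python
# Sketch of bait-credential bookkeeping for eavesdropper detection.
import secrets

def make_bait(exit_fingerprint):
    # Each bait is unique and bound to the exit node it will traverse,
    # so a later use of it pinpoints the snooping node.
    return {
        "user": f"decoy_{secrets.token_hex(4)}",
        "password": secrets.token_hex(8),
        "exit": exit_fingerprint,
    }

# One bait per exit node under observation (hypothetical fingerprints).
baits = {b["user"]: b for b in (make_bait(f"EXIT{i:02d}") for i in range(3))}

def flag_interceptions(auth_log, baits):
    # Periodic tally: any authentication attempt at the decoy server
    # with a bait username was necessarily snooped in transit.
    return sorted({baits[u]["exit"] for u, ok in auth_log if u in baits})

# Illustrative server log of (username, success) pairs.
some_bait_user = next(iter(baits))
log = [("alice", True), (some_bait_user, False)]
print(flag_interceptions(log, baits))
```

The injection side (sending the bait through a chosen exit node) and the decoy IMAP/SMTP servers themselves are omitted here; the sketch only shows why unique per-exit credentials make attribution unambiguous.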
Analyzing and Enhancing Routing Protocols for Friend-to-Friend Overlays
The threat of surveillance by governmental and industrial parties is more imminent than ever. As communication moves into the digital domain, advances in the automatic assessment and interpretation of enormous amounts of data enable the tracking of millions of people, recording and monitoring their private lives with unprecedented accuracy. The knowledge of such an all-encompassing loss of privacy affects the behavior of individuals, inducing various degrees of (self-)censorship and anxiety. Furthermore, the monopoly of a few large-scale organizations on digital communication enables global censorship and manipulation of public opinion. Thus, the current situation undermines the freedom of speech to a detrimental degree and threatens the foundations of modern society.
Anonymous and censorship-resistant communication systems are hence of utmost importance to circumvent constant surveillance. However, existing systems are highly vulnerable to infiltration and sabotage. In particular, Sybil attacks, i.e., powerful parties inserting a large number of fake identities into the system, enable malicious parties to observe and possibly manipulate a large fraction of the communication within the system. Friend-to-friend (F2F) overlays, which restrict direct communication to parties sharing a real-world trust relationship, are a promising countermeasure to Sybil attacks, since the requirement of establishing real-world trust increases the cost of infiltration drastically. Yet, existing F2F overlays suffer from low performance, are vulnerable to denial-of-service attacks, or fail to provide anonymity.
Our first contribution in this thesis is concerned with an in-depth analysis of the concepts underlying the design of state-of-the-art F2F overlays. In the course of this analysis, we first extend the existing evaluation methods considerably, hence providing tools for both our and future research in the area of F2F overlays and distributed systems in general. Based on the novel methodology, we prove that existing approaches are inherently unable to offer acceptable delays without either requiring exhaustive maintenance costs or enabling denial-of-service attacks and de-anonymization.
Consequently, our second contribution lies in the design and evaluation of a novel concept for F2F overlays based on insights from the preceding in-depth analysis. That analysis revealed that greedy embeddings allow highly efficient communication in arbitrary connectivity-restricted overlays by addressing participants through coordinates and adapting these coordinates to the overlay structure. However, greedy embeddings in their original form reveal the identity of the communicating parties and fail to provide the necessary resilience in the presence of dynamic and possibly malicious users. Therefore, we present a privacy-preserving communication protocol for greedy embeddings based on anonymous return addresses rather than identifying node coordinates. Furthermore, we enhance the communication's robustness and attack resistance by using multiple parallel embeddings and alternative algorithms for message delivery. We show that our approach achieves a low communication complexity.
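A greedy embedding in its basic (non-anonymized) form can be sketched as follows: nodes receive tree coordinates, distance is measured in tree hops, and each node forwards a message to whichever neighbor is closest to the destination. The tiny topology below is a toy example; the protocol described above replaces these identifying coordinates with anonymous return addresses.

```python
# Greedy routing over a tree embedding. A node's coordinate is its
# path from the root (a tuple of child indices); the root is ().

def cpl(a, b):
    # Length of the common prefix of two coordinates.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def tree_dist(a, b):
    # Hop distance in the spanning tree.
    return len(a) + len(b) - 2 * cpl(a, b)

# Adjacency of the overlay, keyed by coordinate (toy topology).
neighbors = {
    (): [(0,), (1,)],
    (0,): [(), (0, 0)],
    (0, 0): [(0,)],
    (1,): [(), (1, 0)],
    (1, 0): [(1,)],
}

def greedy_route(src, dst):
    # Forward greedily: always move to the neighbor closest to dst.
    path = [src]
    while path[-1] != dst:
        cur = path[-1]
        nxt = min(neighbors[cur], key=lambda n: tree_dist(n, dst))
        if tree_dist(nxt, dst) >= tree_dist(cur, dst):
            raise RuntimeError("stuck in local minimum")
        path.append(nxt)
    return path

print(greedy_route((0, 0), (1, 0)))
```

On a tree embedding the distance strictly decreases along the tree path, so greedy forwarding cannot get stuck; the thesis's resilience concern arises when nodes fail or lie, which is what the multiple parallel embeddings address.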
By replacing the coordinates with anonymous addresses, we furthermore provably achieve anonymity in the form of plausible deniability against an internal local adversary. Complementing this, our simulation study on real-world data indicates that our approach is highly efficient and effectively mitigates the impact of failures as well as powerful denial-of-service attacks. Our fundamental results open new possibilities for anonymous and censorship-resistant applications.
A systematic survey of online data mining technology intended for law enforcement
As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based survey-making practices to produce a replicable analysis which can be methodologically examined for deficiencies.
A taxonomy of network threats and the effect of current datasets on intrusion detection systems
As the world moves towards being increasingly dependent on computers and automation, building secure applications, systems and networks is one of the main challenges of the current decade. The number of threats that individuals and businesses face is rising exponentially due to the increasing complexity of modern networks and their services. To alleviate the impact of these threats, researchers have proposed numerous solutions for anomaly detection; however, current tools often fail to adapt to ever-changing architectures, associated threats and zero-day attacks. This manuscript aims to pinpoint research gaps and shortcomings of current datasets, their impact on building Network Intrusion Detection Systems (NIDS), and the growing number of sophisticated threats. To this end, this manuscript provides researchers with two key pieces of information: a survey of prominent datasets, analyzing their use and impact on the development of the past decade's Intrusion Detection Systems (IDS), and a taxonomy of network threats and the tools used to carry out these attacks. The manuscript highlights that current IDS research covers only 33.3% of our threat taxonomy. Current datasets demonstrate a clear lack of real-network threats and attack representation, and include a large number of deprecated threats, which together limit the detection accuracy of current machine-learning IDS approaches. The unique combination of the taxonomy and the analysis of the datasets provided in this manuscript aims to improve the creation of datasets and the collection of real-world data. As a result, this will improve the efficiency of next-generation IDS and reflect network threats more accurately within new datasets.
Traffic Analysis Attacks and Defenses in Low Latency Anonymous Communication
The recent public disclosure of mass surveillance of electronic communication, involving powerful government authorities, has drawn the public's attention to issues regarding Internet privacy. For almost a decade now, there have been several research efforts towards designing and deploying open source, trustworthy and reliable systems that ensure users' anonymity and privacy. These systems operate by hiding the true network identity of communicating parties against eavesdropping adversaries. Tor, acronym for The Onion Router, is an example of such a system. Such systems relay the traffic of their users through an overlay of nodes that are called Onion Routers and are operated by volunteers distributed across the globe. Such systems have served well as anti-censorship and anti-surveillance tools. However, recent publications have disclosed that powerful government organizations are seeking means to de-anonymize such systems and have deployed distributed monitoring infrastructure to aid their efforts.
Attacks against anonymous communication systems, like Tor, often involve traffic analysis. In such attacks, an adversary capable of observing network traffic statistics in several different networks correlates the traffic patterns in these networks and associates otherwise seemingly unrelated network connections. The process can lead an adversary to the source of an anonymous connection. However, due to their design, consisting of globally distributed relays, the users of anonymity networks like Tor can route their traffic via virtually any network, hiding their tracks and true identities from their communication peers and eavesdropping adversaries. De-anonymization of a random anonymous connection is hard, as the adversary is required to correlate traffic patterns in one network link to those in virtually all other networks. Past research mostly involved reducing the complexity of this process by first reducing the set of relays or network routers to monitor, and then identifying the actual source of anonymous traffic among the network connections routed via this reduced set. A study of various research efforts in this field reveals that there have been many more efforts to reduce the set of relays or routers to be searched than to explore methods for actually identifying an anonymous user amidst the network connections using these routers and relays. Few have tried to comprehensively study a complete attack that involves both reducing the set of relays and routers to monitor and identifying the source of an anonymous connection. Although it is believed that systems like Tor are trivially vulnerable to traffic analysis, there are various technical challenges and issues that can become obstacles to accurately identifying the source of an anonymous connection. It is hard to judge the vulnerability of anonymous communication systems without adequately exploring the issues involved in identifying the source of anonymous traffic.
We take steps to fill this gap by exploring two novel active traffic analysis attacks that rely solely on measurements of network statistics. In these attacks, the adversary tries to identify the source of an anonymous connection arriving at a server from an exit node. This generally involves correlating traffic entering and leaving the Tor network, linking otherwise unrelated connections. To increase the accuracy of identifying the victim connection among several connections, the adversary injects a traffic perturbation pattern into the connection, arriving at the server from a Tor node, that the adversary wants to de-anonymize. One way to achieve this is by colluding with the server and injecting a traffic perturbation pattern using common traffic shaping tools. Our first attack involves a novel remote bandwidth estimation technique to confirm the identity of Tor relays and network routers along the path connecting a Tor client and a server, by observing network bandwidth fluctuations deliberately injected by the server. The second attack involves correlating network statistics for connections entering and leaving the Tor network, available from existing network infrastructure such as Cisco's NetFlow, to identify the source of an anonymous connection. Additionally, we explored a novel technique to defend against the latter attack. Most proposed defenses against traffic analysis attacks involve the transmission of dummy traffic and have not been implemented due to fears of potential performance degradation. Our novel technique involves the transmission of dummy traffic consisting of packets whose IP headers carry small Time-to-Live (TTL) values. Such packets are discarded by routers before they reach their destination. They distort NetFlow statistics without degrading the client's performance.
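The flow-correlation step of the second attack can be illustrated with a minimal sketch: per-interval byte counts, as a NetFlow-style record would provide, for the connection entering the Tor network are compared against candidate exit-side connections, and the highest Pearson correlation links the two. All counts below are synthetic, with the server-injected perturbation represented by the alternating pattern in the victim's series.

```python
# Linking entry- and exit-side flows by correlating byte-count series.
from math import sqrt

def pearson(xs, ys):
    # Plain Pearson correlation coefficient of two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Per-interval byte counts for the victim connection at the entry side;
# the server has injected an alternating low/high perturbation.
entry_flow = [120, 480, 90, 510, 130, 470]

# Candidate connections observed leaving the Tor network (synthetic).
exit_flows = {
    "conn_a": [300, 310, 290, 305, 300, 295],  # steady, unrelated
    "conn_b": [118, 465, 100, 500, 125, 460],  # mirrors the perturbation
    "conn_c": [50, 60, 400, 70, 420, 65],      # unrelated burstiness
}

best = max(exit_flows, key=lambda c: pearson(entry_flow, exit_flows[c]))
print(best)
```

This is also why the low-TTL dummy-packet defense works: the dummies inflate the NetFlow counters on which this correlation is computed without ever reaching the destination, degrading the match without slowing the client.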
Finally, we present a strategy that employs transmission of unique plain-text decoy traffic that appears sensitive, such as fake user credentials, through Tor nodes to decoy servers under our control. Periodic tallying of client and server logs to detect unsolicited connection attempts at the server is used to identify the eavesdropping nodes. Such malicious Tor node operators, eavesdropping on users' traffic, could be potential traffic analysis attackers.
INFERENCE-BASED FORENSICS FOR EXTRACTING INFORMATION FROM DIVERSE SOURCES
Digital forensics is tasked with the examination and extraction of evidence from a diverse set of devices and information sources. While digital forensics has long been synonymous with file recovery, this label no longer adequately describes the science's role in modern investigations. Spurred by evolving technologies and online crime, law enforcement is shifting the focus of digital forensics from its traditional role in the final stages of an investigation to assisting investigators in the earliest phases, often before a suspect has been identified and a warrant served. Investigators need new forensic techniques to investigate online crimes, such as child pornography trafficking on peer-to-peer (p2p) networks, and to extract evidence from new information sources, such as mobile phones. The traditional approach of developing tools tailored specifically to each source is no longer tenable given the diversity, volume of storage, and introduction rate of new devices and network applications. Instead, we propose the adoption of flexible, inference-based techniques to extract evidence from any format. Such techniques can be readily applied to a wide variety of different evidence sources without requiring significant manual work on the investigator's part. The primary contribution of my dissertation is a set of novel forensic techniques for extracting information from diverse data sources. We frame the evaluation using two different, but increasingly important, forensic scenarios: mobile phone triage and network-based investigations.
Via probabilistic descriptions of typical data structures, and using a classic dynamic programming algorithm, our phone triage techniques are able to identify user information in phones across varied models and manufacturers. We also show how to incorporate feedback from the investigator to improve the usability of extracted information.
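The combination of probabilistic structure descriptions with a classic dynamic programming algorithm can be illustrated with a toy Viterbi-style labeller: given emission probabilities describing how field types produce bytes, it recovers the most likely sequence of field labels. The states, probabilities, and input below are illustrative, not the dissertation's actual models.

```python
# Toy Viterbi labelling: tag each character of a recovered byte string
# as part of a phone-number field ("digit") or filler ("text").
from math import log

states = ("digit", "text")
start = {"digit": 0.3, "text": 0.7}
trans = {"digit": {"digit": 0.8, "text": 0.2},
         "text": {"digit": 0.2, "text": 0.8}}

def emit(state, ch):
    # Probabilistic description of how each field type emits characters.
    is_digit = ch.isdigit()
    if state == "digit":
        return 0.9 if is_digit else 0.1
    return 0.2 if is_digit else 0.8

def viterbi(obs):
    # score[s] = best log-probability of any labelling ending in state s
    score = {s: log(start[s]) + log(emit(s, obs[0])) for s in states}
    back = []
    for ch in obs[1:]:
        prev, step = score, {}
        back.append({})
        for s in states:
            p, q = max((prev[q] + log(trans[q][s]), q) for q in states)
            step[s] = p + log(emit(s, ch))
            back[-1][s] = q
        score = step
    # Trace back the most likely label sequence.
    path = [max(score, key=score.get)]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]

print(viterbi("ab5551234cd"))
```

Sticky transition probabilities make runs of the same label cheap, so the digit field is recovered as one contiguous span even if a byte or two were noisy; this is the property that lets one model generalise across phone models.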
For network-based investigations, we quantify and characterize the extent of contraband trafficking on peer-to-peer networks. We suggest various techniques for prioritizing law enforcement's limited resources. We finally investigate techniques that use system logs to generate and then analyze a finite-state model of a protocol's implementation. The objective is to infer behavior that an investigator can leverage to further law enforcement objectives.
We evaluate all of our techniques using the real-world legal constraints and restrictions of investigators.
Are Public Intrusion Datasets Fit for Purpose: Characterising the State of the Art in Intrusion Event Datasets
The file attached to this record is the author's final peer-reviewed version; the publisher's final version can be found by following the DOI link.
In recent years cybersecurity attacks have caused major disruption and information loss for online organisations, with high-profile incidents in the news. One of the key challenges in advancing the state of the art in intrusion detection is the lack of representative datasets. These datasets typically contain millions of time-ordered events (e.g. network packet traces, flow summaries, log entries), subsequently analysed to identify abnormal behavior and specific attacks [1]. Generating realistic datasets has historically required expensive networked assets, specialised traffic generators, and considerable design preparation. Even with advances in virtualisation it remains challenging to create and maintain a representative environment.
Major improvements are needed in the design, quality and availability of datasets to assist researchers in developing advanced detection techniques. With the emergence of new technology paradigms, such as intelligent transport and autonomous vehicles, it is also likely that new classes of threat will emerge [2]. Given the rate of change in threat behavior [3], datasets quickly become obsolete, and some of the most widely cited datasets date back over two decades. Older datasets have limited value: they are often heavily filtered and anonymised, with unrealistic event distributions and an opaque design methodology.
The relative scarcity of Intrusion Detection System (IDS) datasets is compounded by the lack of a central registry and inconsistent information on provenance. Researchers may also find it hard to locate datasets or understand their relative merits. In addition, many datasets rely on simulation, originating from academic or government institutions. The publication process itself often creates conflicts, with the need to de-identify sensitive information in order to meet regulations such as the General Data Protection Regulation (GDPR) [4]. A final issue for researchers is the lack of standardised metrics with which to compare dataset quality.
In this paper we attempt to classify the most widely used public intrusion datasets, providing references to archives and associated literature. We illustrate their relative utility and scope, highlighting threat composition, formats, special features, and associated limitations. We identify best practice in dataset design and describe potential pitfalls of designing anomaly detection techniques based on data that may be either inappropriate or compromised due to unrealistic threat coverage. The contributions made in this paper are expected to facilitate continuous research and development for effectively combating the constantly evolving cyber threat landscape.