4 research outputs found

    Using Visual Analytics to Discover Bot Traffic

    With advances in technology, the Internet has become a medium for many malicious activities. Bot traffic has increased greatly and causes significant problems for businesses and organisations; examples include spam bots, scraper bots, distributed denial-of-service bots, and adaptive bots that aim to exploit the vulnerabilities of a website. Discriminating bot traffic from legitimate flash crowds remains an open challenge. To address these issues and enhance security awareness, this thesis proposes an interactive visual analytics system for discovering bot traffic. The system provides an interactive visualisation with details-on-demand capabilities, which enables knowledge discovery from very large datasets and lets an analyst examine comprehensive details without being constrained by dataset size. The system has a dashboard view that represents legitimate and bot traffic using a Quadtree data structure and Voronoi diagrams. The main contribution of this thesis is a novel visual analytics system that is capable of discovering bot traffic. A literature review was conducted to gain a systematic understanding of the research area. The research then used experiment and simulation approaches. The experiment consisted of capturing website traffic, identifying browser fingerprints, simulating bot attacks, and analysing the mouse dynamics of participants, such as movements and events. Data were captured as the participants performed a list of tasks, such as responding to a banner. The data collection is transparent to the participants and only requires JavaScript to be enabled on the client side. The study involved 10 participants who are familiar with the Internet. To analyse the data, Weka 3.6.10 was used to perform classification based on a training dataset. The test dataset of all participants was evaluated using a built-in decision tree algorithm.
The results of classifying the test dataset were promising: the model was able to identify ten participants and six simulated bot attacks with an accuracy of 86.67%. Finally, the visual analytics design was formulated to assist an analyst in discovering bot presence.
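The abstract names a Quadtree as one of the structures behind the dashboard view but does not describe how it is used. As a rough illustration only, the following is a minimal point quadtree in Python. The class name, the `capacity` and `max_depth` parameters, and the idea of inserting 2D points (for example, sessions placed by two hypothetical traffic features) are assumptions made for this sketch, not details taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


@dataclass
class Quadtree:
    """Point quadtree over the square [x, x + size) x [y, y + size)."""
    x: float
    y: float
    size: float
    capacity: int = 4      # points a leaf holds before it splits (assumed value)
    max_depth: int = 8     # stops endless splitting on duplicate points
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: List["Quadtree"] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        return (self.x <= p[0] < self.x + self.size
                and self.y <= p[1] < self.y + self.size)

    def insert(self, p: Point) -> bool:
        """Insert p; return False if it lies outside this tree's square."""
        if not self.contains(p):
            return False
        self._insert(p)
        return True

    def _insert(self, p: Point) -> None:
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return
            self._split()
        self._child_for(p)._insert(p)

    def _split(self) -> None:
        # Create four equal quadrants and push the stored points down.
        half = self.size / 2
        self.children = [
            Quadtree(self.x + dx, self.y + dy, half,
                     self.capacity, self.max_depth, self.depth + 1)
            for dx in (0, half) for dy in (0, half)
        ]
        old, self.points = self.points, []
        for q in old:
            self._child_for(q)._insert(q)

    def _child_for(self, p: Point) -> "Quadtree":
        half = self.size / 2
        ix = 1 if p[0] >= self.x + half else 0
        iy = 1 if p[1] >= self.y + half else 0
        return self.children[2 * ix + iy]

    def count(self) -> int:
        return len(self.points) + sum(c.count() for c in self.children)

    def leaves(self) -> Iterator["Quadtree"]:
        if not self.children:
            yield self
        else:
            for c in self.children:
                yield from c.leaves()
```

Regions that receive many points are subdivided into smaller cells, so the sizes of the leaves give a coarse picture of where the data is dense. That property is one plausible reason to pair a quadtree with a dashboard over very large datasets.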

    Visuelle Suchanfragen auf graphbasierten Datenstrukturen

    The total amount of available data is steadily increasing. Standardized data formats allow different data sources to be connected, which can include merging different data items depending on the use case. This creates even more comprehensive datasets in which a particular piece of information is difficult to find. If the data consist of unstructured or homogeneous information, searching can only be done by matching patterns against data items or parts thereof, for instance character strings or regular expressions that match parts of textual data items. However, the availability of structured data is increasing. This kind of data is either stored as distinct facets of each data item from the outset, or originally unstructured data has been enriched to form a structure. As each facet can indicate a link to another data item, the entire dataset forms a graph that is suitable for faceted search concepts. Here, some interoperability across data sources can be achieved by employing Semantic Web concepts and techniques. Numerous works have attempted to visualize an overview of the entire dataset, or details of a particular excerpt of the dataset. Finding specific data remains a problem, however, because specifying search criteria precisely is difficult. As these criteria can be connected in complex ways, this issue lends itself to visual representations, just as the overview of datasets does. A special trait of this application of visualization is the possible absence of any data. Instead, the visualization must be capable of conveying the conceptual idea of a search query without displaying any data.
Earlier works related to this problem focused on the visual representation of search queries and filter expressions for relational and object-oriented databases. More recent works increasingly address a Semantic Web context. Several of these concepts, however, lack a clear abstract definition. Scalability issues also appear in the case of complex queries. Furthermore, little attention has been paid to how several concepts can be connected in order to combine the advantages of different query visualizations. This dissertation addresses the described problems and presents six concepts for query visualization. It covers both generic visualizations, that is, for filtering any kind of structured data, and domain-specific or type-specific visualizations. In part, existing approaches have been adapted to the particularities of graph-based data structures. Likewise, several new approaches specifically designed for this kind of data are presented. For each of these concepts, the extent to which it can be used without a dataset to filter is discussed. Moreover, options for providing a preview of query results from such a dataset, if available, are considered. Finally, ways of connecting the query visualization concepts are presented. This connection approach is suitable for systematically combining arbitrary query visualizations. By means of the connection approach, users can express different parts of a query using different visualization concepts, in order to benefit from the strengths of several query visualizations at once. In this way, queries that include complex criteria as well as complex relations between criteria can now be defined and displayed visually without losing the visual overview of any of these aspects.
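To make the data model concrete, the sketch below treats facets as labelled edges in a graph and a query as a list of patterns with variables. It is not one of the six visualization concepts from the dissertation. The triple representation, the `?name` variable convention, the function `match`, and all sample data are assumptions chosen for illustration. Note that the query object exists on its own, without any data, and can later be run against a dataset to preview results.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (item, facet, value or linked item)


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables (a convention assumed here)."""
    return term.startswith("?")


def match(patterns: List[Triple], graph: Set[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all patterns in the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


# The query is defined without reference to any dataset.
query: List[Triple] = [
    ("?book", "author", "?person"),
    ("?person", "bornIn", "Stuttgart"),
]

# Hypothetical sample data used only to preview the query's results.
sample_graph: Set[Triple] = {
    ("book1", "author", "alice"),
    ("book2", "author", "bob"),
    ("alice", "bornIn", "Stuttgart"),
    ("bob", "bornIn", "Ulm"),
}

preview = sorted(b["?book"] for b in match(query, sample_graph))
```

Because the second pattern reuses `?person`, the two criteria are linked through a shared node. That is the kind of relation between criteria that the abstract says is hard to express precisely in text.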

    A dynamic visual analytics framework for complex temporal environments

    Introduction: Data streams are produced by sensors that sample an external system at a periodic interval. As the cost of developing sensors continues to fall, an increasing number of data stream acquisition systems have been deployed to take advantage of the volume and velocity of data streams. An overabundance of information in complex environments has been linked to information overload, a state of exposure to overwhelming and excessive information. Visual analytics provides leverage over potential information overload challenges. Apart from automated online analysis, interactive visual tools provide significant leverage for human-driven trend analysis and pattern recognition. To facilitate analysis and knowledge discovery in the space of multidimensional big data, research is warranted into an online visual analytic framework that supports human-driven exploration and consumption of complex data streams. Method: A novel framework was developed, called the temporal Tri-event parameter based Dynamic Visual Analytics (TDVA). The TDVA framework was instantiated in two case studies: one involving a hypothesis generation scenario, and one involving a cohort-based hypothesis testing scenario. Two evaluations were conducted for each case study involving expert participants. The framework is demonstrated in a neonatal intensive care unit case study. The hypothesis generation phase of the pipeline is conducted through a multidimensional and in-depth single-subject study using PhysioEx, a novel visual analytic tool for physiologic data stream analysis. The cohort-based hypothesis testing component of the analytic pipeline is validated through CoRAD, a visual analytic tool for performing case-controlled studies. Results: The results of both evaluations show improved task performance and subjective satisfaction with the use of PhysioEx and CoRAD.
Results from the evaluation of PhysioEx reveal insights into current limitations in supporting single-subject studies in complex environments, and areas for future research in that space. Results from CoRAD also support the need for additional research to explore complex multi-dimensional patterns across multiple observations. From an information systems perspective, the efficacy and feasibility of the TDVA framework are demonstrated by the instantiation and evaluation of PhysioEx and CoRAD. Conclusion: This research introduces the TDVA framework and provides results to validate the deployment of online dynamic visual analytics in complex environments. The TDVA framework was instantiated in two case studies derived from an environment where dynamic and complex data streams were available. The first instantiation enabled the end-user to rapidly extract information from complex data streams to conduct in-depth analysis. The second allowed the end-user to test emerging patterns across multiple observations. To both ends, this thesis provides knowledge that can be used to improve the visual analytic pipeline in dynamic and complex environments.

    Real-time visual analytics for event data streams

    Real-time analysis of data streams has become an important factor for success in many domains, such as server and system administration, news analysis, and finance, to name just a few. Introducing real-time visual analytics into such application areas promises many benefits: the rate of incoming information often exceeds human perceptual limits when displayed linearly in raw formats such as lines of text, and automatic aggregation often hides important details. This paper presents a system that tackles some of the visualization challenges of analyzing such dynamic event data streams. In particular, we introduce the Event Visualizer, a loosely coupled modular system for collecting, processing, analyzing, and visualizing dynamic real-time event data streams. Because analysis tasks vary widely, the system provides an extensible framework with several interactive linked visualizations that focus on different aspects of the event data stream. Data streams with logging data from a computer network are used as a case study to demonstrate the advantages of visual exploration.
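The remark that automatic aggregation can hide important details can be shown with a small example. The sketch below groups events into fixed-width time windows and counts them per event type. The event format, the function name `window_counts`, and the sample log are assumptions made for this illustration and are not taken from the Event Visualizer.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event type); assumed format


def window_counts(events: Iterable[Event], width: float) -> Dict[int, Counter]:
    """Count events per type in consecutive windows of `width` seconds."""
    buckets: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, kind in events:
        buckets[int(timestamp // width)][kind] += 1
    return dict(buckets)


# Hypothetical log excerpt: two one-minute windows with three events each.
events = [
    (0.5, "login"), (1.2, "login"), (1.9, "error"),
    (61.0, "error"), (61.5, "error"), (62.0, "error"),
]

per_type = window_counts(events, width=60.0)
totals = {window: sum(counts.values()) for window, counts in per_type.items()}
```

Here `totals` reports three events in each window, so a chart of totals looks flat, while `per_type` shows that the second window consists entirely of errors. Linked views over the same stream let an analyst see both levels at once.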