57 research outputs found
Predictive Cyber Situational Awareness and Personalized Blacklisting: A Sequential Rule Mining Approach
Cybersecurity adopts data mining for its ability to extract concealed and indistinct patterns in the data, such as for the needs of alert correlation. Inferring common attack patterns and rules from the alerts helps in understanding the threat landscape for the defenders and allows for the realization of cyber situational awareness, including the projection of ongoing attacks. In this paper, we explore the use of data mining, namely sequential rule mining, in the analysis of intrusion detection alerts. We employed a dataset of 12 million alerts from 34 intrusion detection systems in 3 organizations gathered in an alert sharing platform, and processed it using our analytical framework. We execute the mining of sequential rules that we use to predict security events, which we utilize to create a predictive blacklist. Thus, the recipients of the data from the sharing platform will receive only a small number of alerts of events that are likely to occur instead of a large number of alerts of past events. The predictive blacklist has the size of only 3 % of the raw data, and more than 60 % of its entries are shown to be successful in performing accurate predictions in operational, real-world settings
Malicious Changeload for the Resilience Evaluation of Self-adaptive Authorisation Infrastructures
Self-adaptive systems are able to modify their behaviour and/or structure in response to changes that occur to the system, its environment, or even its goals. In terms of authorisation infrastructures, self-adaptation has shown to be a promising solution for enforcing access control policies and subject access privileges when mitigating insider threat. This paper describes the resilience evaluation of a self-adaptive authorisation infrastructure by simulating a case study related to insider threats. As part of this evaluation, a malicious changeload has been formally defined in order to describe scenarios of abuse in access control. This malicious changeload was then used to stimulate self-adaptation within a federated authorisation infrastructure.
The evaluation confirmed the resilience of a self-adaptive authorisation infrastructure in handling abuse of access under repeatable conditions by consistently mitigating abuse under normal and high loads. The evaluation has also shown that self-adaptation had a minimal impact on the authorisation infrastructure, even when adapting authorisation policies while mitigating abuse of access
BiobankUniverse:Automatic matchmaking between datasets for biobank data discovery and integration
Motivation: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. Results: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers their desired study variables, BiobankUniverse automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes
Toward High Performance Computing Education
High Performance Computing (HPC) is the ability to process data and perform complex calculations at extremely high speeds. Current HPC platforms can achieve calculations on the order of quadrillions of calculations per second with quintillions on the horizon. The past three decades witnessed a vast increase in the use of HPC across different scientific, engineering and business communities, for example, sequencing the genome, predicting climate changes, designing modern aerodynamics, or establishing customer preferences. Although HPC has been well incorporated into science curricula such as bioinformatics, the same cannot be said for most computing programs. This working group will explore how HPC can make inroads into computer science education, from the undergraduate to postgraduate levels. The group will address research questions designed to investigate topics such as identifying and handling barriers that inhibit the adoption of HPC in educational environments, how to incorporate HPC into various curricula, and how HPC can be leveraged to enhance applied critical thinking and problem solving skills. Four deliverables include: (1) a catalog of core HPC educational concepts, (2) HPC curricula for contemporary computing needs, such as in artificial intelligence, cyberanalytics, data science and engineering, or internet of things, (3) possible infrastructures for implementing HPC coursework, and (4) HPC-related feedback to the CC2020 project
Energy efficiency of large scale graph processing platforms
A number of graph processing platforms have emerged recently as a result of the growing demand on graph data analytics with complex and large-scale graph structured datasets. These platforms have been tailored for iterative graph computations and can offer an order of magnitude performance gain over generic data-flow frameworks like Apache Hadoop and Spark. Nevertheless, the increasing availability of such platforms and their functionality overlap necessitates a comparative study on various aspects of the platforms, including applications, performance and energy efficiency. In this work, we focus on the energy efficiency aspect of some large scale graph processing platforms. Specifically, we select two representatives, e.g., Apache Giraph and Spark GraphX, for the comparative study. We compare and analyze the energy consumption of these two platforms with PageRank, Strongly Connected Component and Single Source Shortest Path algorithms over five different realistic graphs. Our experimental results demonstrate that GraphX outperforms Giraph in terms of energy consumption. Specifically, Giraph consumes 1.71 times more energy than GraphX on average for the mentioned algorithms
- …