4 research outputs found

    Ranking sets of morbidities using hypergraph centrality

    Multi-morbidity, the health state of having two or more concurrent chronic conditions, is becoming more common as populations age, but is poorly understood. Identifying and understanding commonly occurring sets of diseases is important to inform clinical decisions to improve patient services and outcomes. Network analysis has previously been used to investigate multi-morbidity, but a classic application only allows information on binary sets of diseases to contribute to the graph. We propose the use of hypergraphs, which allow for the incorporation of data on people with any number of conditions, and also allow us to obtain a quantitative understanding of the centrality (a measure of how well connected items in the network are to each other) of both single diseases and sets of conditions. We illustrate this framework with the set of conditions described in the Charlson morbidity index, using data extracted from routinely collected population-scale, patient-level electronic health records (EHR) for a cohort of adults in Wales, UK. Stroke and diabetes were found to be the most central single conditions. The most central pairs of diseases all featured diabetes: diabetes with Chronic Pulmonary Disease, Renal Disease, Congestive Heart Failure and Cancer. We investigated the differences between results obtained from the hypergraph and a classic binary graph and found that the centrality of diseases such as paraplegia, which are connected strongly to a single other disease, is exaggerated in binary graphs compared to hypergraphs. The measure of centrality is derived from the weighting metrics calculated for disease sets, and further investigation is needed to better understand the effect of the metric used in identifying the clinical significance and ranked centrality of grouped diseases.
These initial results indicate that hypergraphs can be used as a valuable tool for analysing previously poorly understood relationships and information available in EHR data.
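
The idea of ranking single conditions by hypergraph centrality can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not give the weighting metric or the exact centrality definition, so the sketch assumes eigenvector centrality on a weighted adjacency in which two conditions are linked by the summed weights of the hyperedges containing both. The function name `hypergraph_centrality`, the hyperedges, and the weights are all hypothetical toy values, not results from the paper.

```python
from itertools import combinations


def hypergraph_centrality(hyperedges, iterations=200):
    """Eigenvector-style centrality for the nodes of a weighted hypergraph.

    hyperedges: list of (set_of_nodes, weight) pairs. Each hyperedge may
    contain any number of nodes, not just two.
    Assumption: adjacency[u][v] is the sum of the weights of all
    hyperedges that contain both u and v.
    """
    nodes = sorted({n for members, _ in hyperedges for n in members})
    adjacency = {u: {v: 0.0 for v in nodes} for u in nodes}
    for members, weight in hyperedges:
        for u, v in combinations(sorted(members), 2):
            adjacency[u][v] += weight
            adjacency[v][u] += weight

    # Power iteration on (A + I): same eigenvectors as A, but the
    # identity shift prevents oscillation on bipartite-like structures.
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        updated = {
            u: scores[u] + sum(adjacency[u][v] * scores[v] for v in nodes)
            for u in nodes
        }
        total = sum(updated.values())
        scores = {u: value / total for u, value in updated.items()}
    return scores


# Toy hyperedges with made-up weights, for illustration only.
toy_edges = [
    ({"diabetes", "renal disease"}, 1.0),
    ({"diabetes", "CHF"}, 1.0),
    ({"diabetes", "renal disease", "CHF"}, 1.0),
    ({"diabetes", "stroke"}, 1.0),
]
centrality = hypergraph_centrality(toy_edges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

Ranking sets of conditions, as the abstract describes, would need an analogous computation over the hyperedges themselves; that step is not sketched here because the abstract does not specify it.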

    Distances between sets based on set commonality

    We construct a new family of normalised metrics for measuring the dissimilarity of finite sets in terms of the sizes of the sets and of their intersection. The family normalises a set-based analogue of the Minkowski metric family. It is parametrised by a real variable p ≥ 1, is monotonic decreasing in p, equals the normalised set difference metric when p = 1 and equals the normalised maximum difference metric in the limit p → ∞. These metrics are suitable for comparison of finite sets in any context. Several applications to comparison of finite graphs are described.
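
A small sketch may help make the limiting cases concrete. The abstract does not state the formula, so the form below is an assumption chosen to match the properties it lists: the Minkowski-style term (|A\B|^p + |B\A|^p)^(1/p) is divided by itself plus |A∩B|. Under that assumption, p = 1 gives |A△B| / |A∪B| and p → ∞ gives 1 − |A∩B| / max(|A|, |B|). The function name `set_distance` is hypothetical.

```python
import math


def set_distance(a, b, p=1.0):
    """Normalised dissimilarity of two finite sets for a parameter p >= 1.

    Assumed form (not taken from the paper):
        diff = (|A\\B|**p + |B\\A|**p) ** (1/p)
        d_p  = diff / (|A & B| + diff)
    Pass p=math.inf for the limiting maximum-difference case.
    """
    a, b = set(a), set(b)
    only_a, only_b, common = len(a - b), len(b - a), len(a & b)
    if math.isinf(p):
        diff = float(max(only_a, only_b))
    else:
        diff = (only_a ** p + only_b ** p) ** (1.0 / p)
    total = common + diff
    # Two empty sets are treated as identical.
    return 0.0 if total == 0 else diff / total


A = {1, 2, 3, 4}
B = {3, 4, 5}
d1 = set_distance(A, B, p=1)            # normalised set difference
d2 = set_distance(A, B, p=2)
d_inf = set_distance(A, B, p=math.inf)  # normalised maximum difference
```

For these two sets the values decrease as p grows, in line with the monotonicity stated in the abstract.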

    Mathematical models for insider threat mitigation

    The world is rapidly undergoing a massive digital transformation in which every human will have no choice but to rely on the confidentiality, integrity, and availability of information systems. At the same time, increasing numbers of malicious attackers are trying to compromise information systems for financial or political gain. Given the threat landscape and its sophistication, the traditional approach of fortifying the castle will not provide sufficient protection to information systems. This formidable threat can only be restrained by a new approach, which looks both inwards and outwards for potential attacks. It is well established that humans are the weakest link when it comes to information security controls, although the same humans are considered the most valued assets. A trusted custodian with malicious intent can inflict enormous damage on critical information assets. Often these attacks go unnoticed for a considerable period and will have caused irreversible damage to the organisation by the time they are discovered. In the recent past, there have been well publicised data compromises in the media which have damaged the reputations of governments and organisations and in some cases endangered human life. While some of these leaks can be classified as whistleblowing in the public interest, they are very real examples of information compromises in the context of information security. High profile leaks by Edward Snowden and Bradley (Chelsea) Manning are perfect examples of the potential damage from an insider. Furthermore, most malicious insider activities go unnoticed or unpublicised as a damage control measure by the affected organisations. While a great deal of research and investment is going into insider threat prevention, these attacks are on the rise at an alarming rate.
A comprehensive study of publicly available insider threat cases, academic literature, and technical reports reveals the need for a multifaceted view of the problem. The insider threat problem can no longer be treated only as a technical, data-driven problem but requires the analysis of associated factors, a combination of technical and human behavioural aspects going beyond the traditional technology-driven approaches. Furthermore, there is no universally agreed comprehensive feature set, as the majority of the proposed models are bound to a single threat scenario or conducted on a specific system. In order to overcome this limitation, this thesis introduces a precise user profile model integrating insider threat related parameters from technical, behavioural, psychological, and organisational paradigms. The proposed user profile model is a combination of: a comprehensive insider threat detection and prediction feature set; a collection of various techniques for feature-specific user behaviour comparisons; and a framework for quantifying user behaviour as a numerical value. The unpredictability of malicious attackers and the complexity of malicious actions necessitate the careful analysis of network, system and user parameters correlated with the insider threat problem. Also, unearthing the hidden evidence requires the analysis of an enormous amount of data generated from heterogeneous input streams. This creates a high-dimensional, heterogeneous data analysis problem for distinguishing suspicious users from benign users, and with it the need to identify an appropriate means for data representation and feature extraction. Since traditional graph theory and new approaches in the field of complex networks provide the means of representing high-dimensional, heterogeneous data, the feasibility of using graphs for data representation and feature extraction is investigated, going beyond traditional data mining techniques.
Unattributed graphs are introduced to represent users’ device usage data, web access data, and organisational hierarchy. A graph-based feature extraction technique based on subgraphs generated on different orders of neighbourhood is introduced. A graph-based approach to capture inter-user relationships using web access data is presented. Various insider threat models proposed in the literature, including intrusion detection based approaches, system call based approaches, honeypot based approaches and stream mining approaches, end up with high false positive rates. More recently, machine learning approaches for distinguishing suspicious users from normal users have become more common. However, the application of graph-based anomaly detection techniques to the insider threat problem is relatively rare in the academic literature as well as uncommon in the commercial world. Therefore, we focused our attention on graph-based anomaly detection techniques for differentiating suspicious users from benign users. This thesis introduces two distinct insider threat detection frameworks. The first is a hybrid insider threat detection framework based on a graph-theoretic feature extraction mechanism and an unsupervised anomaly detection algorithm. The second is built on an attributed graph clustering mechanism integrated with an outlier ranking mechanism. Finally, a comprehensive theoretical and commercially viable framework for insider threat mitigation integrating user profiling and threat detection is introduced.