10 research outputs found

    Isolation based anomaly detection: a re-examination

    Anomalies are instances that do not conform to the norm of a dataset. They are often indicators of interesting events such as deliberate human actions, system faults, or sudden changes in the environment. Detecting anomalies can provide information about such events. Anomaly detection is therefore an important data mining task, utilised in many application domains such as intrusion detection, fraud detection, detection of disease conditions, and fault diagnosis. With improvements in data collection and processing technologies, databases are growing explosively in both size and number of attributes. Such growth is challenging for anomaly detection approaches because of the efficiency required to handle such datasets. iForest is a recently introduced anomaly detector which is unique in the literature because it uses an isolation mechanism to identify anomalies without any distance or density calculations. The core strength of iForest is its exceptional efficiency, which enables it to scale up to very large datasets. It has been shown to perform competitively with the existing state-of-the-art anomaly detectors in datasets with several attributes. This thesis re-examines iForest to identify its strengths and weaknesses in different application settings. Three key weaknesses of iForest are identified: a deficiency in detecting local anomalies, anomalies masked by axis-parallel normal clusters, and anomalies in multi-modal datasets. A novel isolation method is designed that employs an alternative isolation mechanism. The proposed mechanism uses nearest-neighbour distance to perform isolation and is designed to overcome the identified weaknesses of iForest. Subsequently, a hybrid isolation method, which combines the proposed isolation mechanism with that of iForest, is designed to harness the strengths of both mechanisms. Empirical evidence is provided to show that the proposed methods can overcome the identified weaknesses of iForest and that they also scale up efficiently to datasets of a large size and with a large number of attributes. The performance on benchmark datasets shows that the proposed methods are competitive with state-of-the-art anomaly detectors.
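
The tree-based isolation mechanism that this abstract attributes to iForest can be illustrated with a minimal sketch. It assumes the commonly described scheme (split on a random attribute at a random cut value, then score each point by its average path length over many trees). The function names `build_tree`, `path_length` and `anomaly_scores`, and the default parameter values, are illustrative assumptions and are not taken from the thesis.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-parallel cuts."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant attribute
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus an adjustment for the unsplit leaf size."""
    if "dim" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["cut"] else node["right"]
    return path_length(x, child, depth + 1)

def anomaly_scores(data, queries, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1]; higher means easier to isolate. Assumes len(data) >= 2."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in queries:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c_factor(psi)))
    return scores
```

No distance or density is computed anywhere in this sketch: a point far from the bulk of the data tends to be separated after only a few random cuts, so its average path is short and its score is high.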

    Efficient anomaly detection by isolation using Nearest Neighbour Ensemble

    This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. iNNE runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. Compared with the existing tree-based isolation method iForest, the proposed isolation method overcomes three weaknesses of iForest that we have identified, i.e., its inability to detect local anomalies, anomalies with a low number of relevant attributes, and anomalies that are surrounded by normal instances.

    Isolation-based anomaly detection using nearest-neighbor ensembles

    The first successful isolation-based anomaly detector, i.e., iForest, uses trees as a means to perform isolation. Although it has been shown to have advantages over existing anomaly detectors, we have identified four weaknesses, i.e., its inability to detect local anomalies, anomalies with a high percentage of irrelevant attributes, anomalies that are masked by axis-parallel clusters, and anomalies in multimodal data sets. To overcome these weaknesses, this paper shows that an alternative isolation mechanism is required and thus presents iNNE, or isolation using Nearest Neighbor Ensemble. Although relying on nearest neighbors, iNNE runs significantly faster than the existing nearest neighbor–based methods such as the local outlier factor, especially in data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity. © 2018 Wiley Periodicals, Inc.
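
A rough sketch of how isolation with nearest-neighbour distances might work is given below. The abstract does not spell out the mechanism, so the details are assumptions: each ensemble member draws a small subsample, places a hypersphere on every sampled point with radius equal to the distance to its nearest neighbour in the subsample, and scores a query by comparing the radius of the smallest hypersphere covering it with the radius of that hypersphere's nearest neighbour. The names `build_model`, `isolation_score` and `inne_scores`, and the default parameters, are hypothetical.

```python
import math
import random

def nearest(idx, sample):
    """Return (index, distance) of the nearest other point in the subsample."""
    best_j, best_d = None, math.inf
    for j, q in enumerate(sample):
        if j != idx:
            d = math.dist(sample[idx], q)
            if d < best_d:
                best_j, best_d = j, d
    return best_j, best_d

def build_model(sample):
    """One ensemble member: (centre, radius, radius of the centre's nearest neighbour)."""
    nn = [nearest(i, sample) for i in range(len(sample))]
    return [(sample[i], nn[i][1], nn[nn[i][0]][1]) for i in range(len(sample))]

def isolation_score(x, model):
    """1.0 if no hypersphere covers x; otherwise a ratio-based score in [0, 1)."""
    covering = [(r, r_nn) for c, r, r_nn in model if math.dist(x, c) <= r]
    if not covering:
        return 1.0
    r, r_nn = min(covering)  # smallest covering hypersphere
    if r == 0.0:             # duplicate points in the subsample
        return 0.0
    return 1.0 - r_nn / r

def inne_scores(data, queries, n_models=100, sample_size=16, seed=0):
    """Average isolation score over the ensemble. Assumes len(data) >= 2."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    models = [build_model(rng.sample(data, psi)) for _ in range(n_models)]
    return [sum(isolation_score(x, m) for m in models) / n_models
            for x in queries]
```

Because each model is built from a fixed-size subsample, scoring cost in this sketch grows linearly with the number of queries and does not depend on the size of the full data set, which is in line with the complexity claims in the abstract.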

    Machine learning to support social media empowered patients in cancer care and cancer treatment decisions

    Background: A primary variant of social media, online support groups (OSG) extend beyond the standard definition to incorporate a dimension of advice, support and guidance for patients. OSG are a complementary yet significant adjunct to patient journeys. Machine learning and natural language processing techniques can be applied to the large volumes of unstructured text discussions accumulated in OSG for intelligent extraction of patient-reported demographics, behaviours, decisions, treatment, side effects and expressions of emotions. New insights from the fusion and synthesis of such diverse patient-reported information, as expressed throughout the patient journey from diagnosis to treatment and recovery, can contribute towards informed decision-making on personalized healthcare delivery and the development of healthcare policy guidelines.
    Methods and findings: We have designed and developed an artificial intelligence based analytics framework using machine learning and natural language processing techniques for intelligent analysis and automated aggregation of patient information and interaction trajectories in online support groups. Alongside the social interactions aspect, patient behaviours, decisions, demographics, clinical factors and emotions, as expressed over time, are extracted and analysed. More specifically, we utilised this platform to investigate the impact of online social influences on the intimate decision scenario of selecting a treatment type, recovery after treatment, side effects and emotions expressed over time, using prostate cancer as a model. The results reveal three major decision-making behaviours among patients: the Paternalistic group, the Autonomous group and the Shared group. Furthermore, each group demonstrated diverse behaviours in post-decision discussions on clinical outcomes, advice and expressions of emotion during the twelve months following treatment. Over time, the transition of patients from seeking information and emotional support to providing information and emotional support to other patients was also observed.
    Conclusions: Findings from this study are a rigorous indication of the expectations of social media empowered patients, their potential for individualised decision-making, and their clinical and emotional needs. The increasing popularity of OSG further confirms that it is timely for clinicians to consider patient voices as expressed in OSG. We have successfully demonstrated that the proposed platform can be utilised to investigate, analyse and derive actionable insights from patient-reported information on prostate cancer, in support of patient-focused healthcare delivery. The platform can be extended and applied just as effectively to any other medical condition.