134 research outputs found

    Predictive Cyber Situational Awareness and Personalized Blacklisting: A Sequential Rule Mining Approach

    Cybersecurity adopts data mining for its ability to extract concealed and indistinct patterns in data, for example for alert correlation. Inferring common attack patterns and rules from alerts helps defenders understand the threat landscape and enables cyber situational awareness, including the projection of ongoing attacks. In this paper, we explore the use of data mining, namely sequential rule mining, in the analysis of intrusion detection alerts. We used a dataset of 12 million alerts from 34 intrusion detection systems in 3 organizations, gathered in an alert sharing platform, and processed it using our analytical framework. We mine sequential rules and use them to predict security events, from which we create a predictive blacklist. Thus, recipients of data from the sharing platform receive only a small number of alerts about events that are likely to occur instead of a large number of alerts about past events. The predictive blacklist is only 3% of the size of the raw data, and more than 60% of its entries are shown to make accurate predictions in operational, real-world settings.
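As a rough illustration of the idea in the abstract above, the following minimal sketch mines single-item sequential rules (antecedent seen, consequent seen later in the same sequence) with support and confidence thresholds, then uses them to project likely next events. It is not the paper's analytical framework: the alert names, the helper functions `rule_occurs`, `mine_rules`, and `predict`, and the restriction to one-item rules are all assumptions made for the example.

```python
from itertools import permutations


def rule_occurs(seq, antecedent, consequent):
    """True if `antecedent` appears in `seq` and `consequent` appears after its first occurrence."""
    if antecedent not in seq:
        return False
    first = seq.index(antecedent)
    return consequent in seq[first + 1:]


def mine_rules(sequences, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for every one-item rule above both thresholds."""
    items = sorted({item for seq in sequences for item in seq})
    rules = []
    for a, b in permutations(items, 2):
        with_a = sum(1 for s in sequences if a in s)           # sequences containing the antecedent
        both = sum(1 for s in sequences if rule_occurs(s, a, b))  # sequences where a is followed by b
        support = both / len(sequences)
        confidence = both / with_a if with_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def predict(rules, observed):
    """Consequents of rules whose antecedent was observed and whose consequent was not seen yet."""
    seen = set(observed)
    return sorted({b for a, b, _, _ in rules if a in seen and b not in seen})


# Hypothetical alert sequences, one per attacker source (toy data, not from the paper).
sequences = [
    ["scan", "bruteforce", "login"],
    ["scan", "bruteforce"],
    ["scan", "login"],
    ["bruteforce", "scan"],
]

rules = mine_rules(sequences, min_support=0.5, min_confidence=0.5)
predicted = predict(rules, ["scan"])
```

In this toy setting, a source that has only been seen scanning would be listed with the events the mined rules project for it, which is the spirit of a predictive blacklist; production miners handle multi-item rules and far larger data.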

    Roses Have Thorns: Understanding the Downside of Oncological Care Delivery Through Visual Analytics and Sequential Rule Mining

    Personalized head and neck cancer therapeutics have greatly improved survival rates for patients, but often lead to understudied, long-lasting symptoms that affect quality of life. Sequential rule mining (SRM) is a promising unsupervised machine learning method for predicting longitudinal patterns in temporal data; however, it can output many repetitive patterns that are difficult to interpret without the assistance of visual analytics. We present a data-driven, human-machine visual analysis system, developed in collaboration with SRM model builders in cancer symptom research, which facilitates mechanistic knowledge discovery in large-scale, multivariate cohort symptom data. Our system supports multivariate predictive modeling of post-treatment symptoms based on during-treatment symptoms. It supports this goal through an SRM, clustering, and aggregation back end, and a custom front end to help develop and tune the predictive models. The system also explains the resulting predictions in the context of therapeutic decisions typical in personalized care delivery. We evaluate the resulting models and system with an interdisciplinary group of modelers and head and neck oncology researchers. The results demonstrate that our system effectively supports clinical and symptom research.

    On the Sequential Pattern and Rule Mining in the Analysis of Cyber Security Alerts

    Data mining is well known for its ability to extract concealed and indistinct patterns in data, which is a common task in the field of cyber security. However, data mining is not always used to its full potential in the cyber security community. In this paper, we discuss the usability of sequential pattern and rule mining, a subset of data mining methods, in the analysis of cyber security alerts. First, we survey the use cases of data mining, namely alert correlation and attack prediction. Subsequently, we evaluate sequential pattern and rule mining methods to find one that is both fast and provides valuable results while dealing with the peculiarities of security alerts. An experiment was performed using a dataset of real alerts from an alert sharing platform. Finally, we present lessons learned from the experiment and a comparison of the selected methods based on their performance and the soundness of the results.

    E-commerce Recommendation by an Ensemble of Purchase Matrices with Sequential Patterns

    In E-commerce recommendation systems, integrating collaborative filtering (CF) with sequential pattern mining (SPM) of purchase history data can improve the accuracy of recommendations and mitigate the limitations of relying only on explicit user ratings. Existing E-commerce recommendation systems that combine CF with some form of sequences from purchase history are those referred to as LiuRec09, ChoiRec12, and HPCRec18. The ChoiRec12 system, HOPE, first derives implicit ratings from the purchase frequency of users in the transaction data, which it uses to create the user-item rating matrix that is the input to CF. Then, it computes the CFPP, the CF-based predicted preference of each target user_u on an item_i, as its output from the CF process. Similarly, it derives sequential patterns from the historical purchase database, from which it obtains a second output matrix, SPAPP, the sequential pattern analysis predicted preference of each user for each item. The final predicted preference (FPP) of each user for each item is obtained by integrating these two matrices, giving a weight of 90% to SPAPP and 10% to CFPP, so that the items with the highest ratings can be recommended to users. A limitation of the HOPE system is that, in the user-item matrix of CF, it does not distinguish between purchase frequency and the ratings used for CF. Also, in SPM, it recommends items regardless of whether the user has purchased them before. This thesis proposes an E-commerce recommendation system, SEERs (Stacking Ensemble E-commerce Recommendation system), which improves on the HOPE system to make better recommendations in the following two ways: (i) learning the best minimum support for SPA, the best k similar users for CF, and the best weights for integrating the four matrices used; (ii) separating the two intermediate matrices of CFPP and SPAPP into four intermediate matrices (CF not purchased, SPM purchased, SPM not purchased, and the purchase history matrix), which are obtained and merged with the better-learned parameters from (i). Experimental results show that, by using the best weights discovered in the training phase and by separating purchased and not-purchased items in the CF and sequential pattern mining methods, SEERs provides better precision, recall, F1 score, and accuracy than the tested systems.
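The weighted integration that the abstract attributes to HOPE (90% SPAPP, 10% CFPP) can be sketched as below. This is only an illustration under stated assumptions: the function names `integrate_preferences` and `top_item`, the toy matrices, and the premise that both matrices are already on the same 0 to 1 scale are hypothetical and not taken from the thesis.

```python
def integrate_preferences(cfpp, spapp, w_cfpp=0.1, w_spapp=0.9):
    """Combine two user-by-item preference matrices into a final predicted preference (FPP).

    Both inputs are lists of rows (one row per user, one column per item) and are
    assumed to be on the same scale. The default weights follow the 10% / 90%
    split mentioned in the abstract.
    """
    return [
        [w_cfpp * c + w_spapp * s for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(cfpp, spapp)
    ]


def top_item(fpp_row, item_names):
    """Name of the item with the highest final predicted preference for one user."""
    best = max(range(len(fpp_row)), key=fpp_row.__getitem__)
    return item_names[best]


# Toy data: two users, two items (hypothetical values).
items = ["item_a", "item_b"]
cfpp = [[0.2, 0.8], [0.5, 0.5]]    # CF-based predicted preferences
spapp = [[1.0, 0.0], [0.0, 1.0]]   # sequential-pattern-based predicted preferences

fpp = integrate_preferences(cfpp, spapp)
recommendations = [top_item(row, items) for row in fpp]
```

With fixed weights like these, the sequential-pattern matrix dominates the ranking; learning the weights instead of fixing them, as the abstract describes for SEERs, would replace the two default arguments with values tuned on training data.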

    Correlating contexts and NFR conflicts from event logs

    In the design of autonomous systems, it is important to consider the preferences of the interested parties to improve the user experience. These preferences are often associated with the contexts in which each system is likely to operate. The operational behavior of a system must also meet various non-functional requirements (NFRs), which can present different levels of conflict depending on the operational context. This work aims to model correlations between the individual contexts and the consequent conflicts between NFRs. The proposed approach is based on analyzing the system event logs, tracing them back to the leaf elements at the specification level, and providing a contextual explanation of the system’s behavior. The traced contexts and NFR conflicts are then mined to produce Context-Context and Context-NFR conflict sequential rules. The proposed Contextual Explainability (ConE) framework uses BERT-based pre-trained language models and sequential rule mining libraries to derive these correlations. Extensive evaluations are performed to compare existing state-of-the-art approaches, and the best-fit solutions are chosen for integration within the ConE framework. Based on experiments, an accuracy of 80%, a precision of 90%, a recall of 97%, and an F1-score of 88% are recorded for the ConE framework on the sequential rules that were mined.

    Segmentation-Based Sequential Rules For Product Promotion Recommendations As Sales Strategy (Case Study: Dayra Store)

    One of the problems in promotion is its high cost. By identifying the customer segments that have made transactions, sellers can better target product promotions to potential consumers. The segmentation of potential consumers can be integrated with the products that consumers tend to buy. This relationship can be found through pattern analysis using the Association Rule Mining (ARM) method. ARM generates rule patterns from past transaction data, and the rules can be used for recommendations. This study uses a segmentation-based sequential rule method that generates sequential rules from each customer segment to drive product promotions for potential consumers. The method was tested by comparing rule-based product promotions with product promotions that were not based on rules. Based on the test results, the average percentage of transactions from rule-based product promotion is 2.622%, higher than promotion of the latest products, which had an average transaction rate of only 0.315%. The hypothesis tested in each segment on the sample supports the statement that rule-based product promotion in all segments can be more effective in increasing sales than promotion of the latest products without rule-based recommendations.

    Discovering Utility-driven Interval Rules

    In artificial intelligence, high-utility sequential rule mining (HUSRM) is a knowledge discovery method that can reveal the associations between events in sequences. Recently, many methods have been proposed to discover high-utility sequential rules. However, the existing methods all address point-based sequences, whereas interval events that persist for some time are common. Traditional knowledge discovery tasks on interval-event sequences mainly focus on pattern discovery, but patterns cannot reveal the correlation between interval events well. Moreover, the existing HUSRM algorithms cannot be directly applied to interval-event sequences, since the relations in interval-event sequences are much more intricate than those in point-based sequences. In this work, we propose a utility-driven interval rule mining (UIRMiner) algorithm that extracts all utility-driven interval rules (UIRs) from an interval-event sequence database. In UIRMiner, we first introduce a numeric encoding of relations, which saves considerable time on relation computation and storage on relation representation. Furthermore, to shrink the search space, we propose a complement pruning strategy, which incorporates the utility upper bound with the relation. Finally, extensive experiments on both real-world and synthetic datasets verify that UIRMiner is an effective and efficient algorithm. Comment: Preprint. 11 figures, 5 tables.

    Towards Correlated Sequential Rules

    The goal of high-utility sequential pattern mining (HUSPM) is to efficiently discover profitable or useful sequential patterns in a large number of sequences. However, simply being aware of utility-eligible patterns is insufficient for making predictions. To compensate for this deficiency, high-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequent sequential patterns based on the appearance of premise sequential patterns. It has numerous applications, such as product recommendation and weather prediction. However, the existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. To address this issue, we propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM. The proposed algorithm requires not only that each rule be correlated but also that the patterns in the antecedent and consequent of the high-utility sequential rule be correlated. The algorithm adopts a utility-list structure to avoid multiple database scans. Additionally, several pruning strategies are used to improve the algorithm's efficiency and performance. Experiments on several real-world datasets demonstrate that CoUSR is effective and efficient in terms of running time and memory consumption. Comment: Preprint. 7 figures, 6 tables.