30 research outputs found

    Improving mining efficiency: A new scheme for extracting association rules

    In the age of information technology, the amount of accumulated data is tremendous. Extracting association rules from this data is one of the important tasks in data mining. Most existing association rule mining algorithms assume that the data set fits in memory. In this paper, we propose a practical and effective scheme for mining association rules from frequent patterns, called the Prefixfoldtree scheme (PFT scheme). The original dataset is divided into folds, and the frequent patterns in each fold are mined using the tree projection approach. These frequent patterns are combined into one set, and finally interestingness constraints are used to extract the association rules. Experiments will be conducted to illustrate the efficiency of our scheme.
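
The fold-then-merge idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' PFT implementation: it uses brute-force itemset counting inside each fold as a stand-in for tree projection, adds a global recount of the merged candidates, and uses plain confidence as the only interestingness constraint. All function names, parameters, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force local miner (assumed stand-in for tree projection)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s for s, c in counts.items() if c / n >= min_support}


def mine_by_folds(transactions, n_folds, min_support, max_size=3):
    """Mine each fold separately, merge the results, then recount globally."""
    folds = [transactions[i::n_folds] for i in range(n_folds)]
    candidates = set()
    for fold in folds:
        if fold:  # skip empty folds when there are few transactions
            candidates |= frequent_itemsets(fold, min_support, max_size)
    n = len(transactions)
    support = {
        s: sum(1 for t in transactions if s <= set(t)) / n for s in candidates
    }
    # Keep only itemsets that are frequent over the whole dataset.
    return {s: v for s, v in support.items() if v >= min_support}


def extract_rules(support, min_confidence):
    """Turn frequent itemsets into rules (antecedent, consequent, confidence)."""
    rules = []
    for itemset, sup in support.items():
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                a = frozenset(ante)
                conf = sup / support[a]  # subsets of frequent sets are present
                if conf >= min_confidence:
                    rules.append((a, itemset - a, conf))
    return rules


data = [["a", "b"], ["a", "b", "c"], ["a"], ["b", "c"]]
support = mine_by_folds(data, n_folds=2, min_support=0.5)
rules = extract_rules(support, min_confidence=0.9)
```

The global recount is a design choice of this sketch: an itemset that is frequent over the whole dataset must be frequent in at least one fold, so merging the per-fold results gives a complete candidate set, and the recount removes itemsets that were only locally frequent.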

    An Approach of Data Mining Techniques Using Firewall Detection for Security and Event Management System

    Security has been one of the most important issues driving research and development effort in recent decades. We introduce mining techniques, namely firewall detection and frequent itemset selection, to enhance system security in an event management system. In addition, we extend the deduction techniques with which we try to counter attackers, using data mining rules in our SIEM project. The proposed work aims to significantly improve attack detection and mitigate attack consequences. We also propose an advanced decision-making system that supports domain experts in targeting events based on the individual characteristics of the exposed IWIs. Furthermore, we consider applying aggregation functions other than minimum and maximum to the itemsets. Frequent and infrequent weighted itemsets represent correlations that frequently hold in data in which items may be weighted differently. However, what we need is to discover rare or frequent data correlations while minimizing a cost function using data mining techniques. Discovering rare data raises several issues: processing larger data takes more time, and existing approaches are not applicable to discovering data such as the minimum of certain values. We therefore address the problem of discovering rare and weighted itemsets, that is, the frequent weighted itemset (WI) mining problem. Two novel quality measures are proposed to drive the WI mining process and to perform minimal WI mining efficiently in a SIEM system.

    Towards a theory unifying implicative interestingness measures and critical values consideration in MGK

    The present paper shows the possibility and the benefit of computing a statistical threshold for the so-called Guillaume-Khenchaff interestingness measure MGK of association rules, and compares it with other measures such as Confidence, Lift and Loevinger's. It then proposes a theory of normalized interestingness measures that unifies a set of rule quality measures in a binary context and is, surprisingly, centered on MGK.
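
For orientation, the measures named in this abstract can be computed from three probabilities: P(X), P(Y) and P(X and Y). The sketch below uses the definitions commonly given in the literature; the abstract itself does not state them, so they are assumptions here, and the function and parameter names are hypothetical.

```python
def confidence(p_x, p_xy):
    """Confidence of X -> Y: P(Y | X)."""
    return p_xy / p_x


def lift(p_x, p_y, p_xy):
    """Lift of X -> Y: P(X and Y) / (P(X) * P(Y))."""
    return p_xy / (p_x * p_y)


def loevinger(p_x, p_y, p_xy):
    """Loevinger's measure: (P(Y | X) - P(Y)) / (1 - P(Y))."""
    return (confidence(p_x, p_xy) - p_y) / (1.0 - p_y)


def mgk(p_x, p_y, p_xy):
    """MGK as commonly defined (assumed): normalized to [-1, 1].

    When X favors Y (P(Y | X) > P(Y)), the value coincides with
    Loevinger's measure; otherwise the gap is divided by P(Y).
    """
    conf = confidence(p_x, p_xy)
    if conf > p_y:
        return (conf - p_y) / (1.0 - p_y)
    return (conf - p_y) / p_y


# X favors Y: P(Y | X) = 0.6 is larger than P(Y) = 0.4.
favoring = mgk(p_x=0.5, p_y=0.4, p_xy=0.3)
# X disfavors Y: P(Y | X) = 0.2 is smaller than P(Y) = 0.4.
disfavoring = mgk(p_x=0.5, p_y=0.4, p_xy=0.1)
```

Under these definitions MGK is 0 when X and Y are independent, 1 when X logically implies Y, and -1 when X and Y are incompatible.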

    Measuring Positive and Negative Association of Apriori Algorithm with Cosine Correlation Analysis

    This work aims to identify the positive and negative association rules in the Apriori algorithm by using cosine correlation analysis. The default and the modified Association Rule Mining algorithms are run against the mushroom database to compare their results. The experimental results show that the modified Association Rule Mining algorithm can generate negative association rules. Adding cosine correlation analysis returns fewer association rules than the default Association Rule Mining algorithm. The top ten association rules show that the default and the modified Apriori algorithms produce different rules. The rules obtained from positive and from negative association rules reinforce each other with a fairly good confidence score.
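
As a rough illustration of a cosine correlation check on a rule, the sketch below uses the usual cosine definition, P(X and Y) / sqrt(P(X) * P(Y)), and applies it to a negative rule X -> not Y by substituting the complement of Y. The abstract does not give the exact formulation used in the paper, so both functions and their names are assumptions.

```python
from math import sqrt


def cosine(p_x, p_y, p_xy):
    """Cosine measure for X -> Y: P(X and Y) / sqrt(P(X) * P(Y))."""
    return p_xy / sqrt(p_x * p_y)


def cosine_negative(p_x, p_y, p_xy):
    """Cosine measure for the negative rule X -> not Y (assumed form).

    Uses P(not Y) = 1 - P(Y) and P(X and not Y) = P(X) - P(X and Y).
    """
    return cosine(p_x, 1.0 - p_y, p_x - p_xy)


positive_score = cosine(p_x=0.4, p_y=0.9, p_xy=0.36)
negative_score = cosine_negative(p_x=0.4, p_y=0.9, p_xy=0.36)
```

A rule would then be kept only if its score passes a chosen threshold, which is one way a correlation filter can reduce the number of rules returned.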

    On the Selection of Meaningful Association Rules


    Applying negative rule mining to improve genome annotation

    Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.

    Techniques for improving the labelling process of sentiment analysis in the Saudi stock market

    Sentiment analysis is utilised to assess users' feedback and comments. Recently, researchers have shown an increased interest in this topic due to the spread and expansion of social networks. Users' feedback and comments are written in unstructured formats, usually with informal language, which presents challenges for sentiment analysis. For the Arabic language, further challenges exist due to the complexity of the language and the lack of an available sentiment lexicon. Therefore, labelling carried out by hand can lead to mislabelling and misclassification. Consequently, inaccurate classification creates the need to construct a relabelling process for Arabic documents to remove noise in labelling. The aim of this study is to improve the labelling process of sentiment analysis. Two approaches were utilised. First, a neutral class was added to create a framework of reliable Twitter tweets with positive, negative, or neutral sentiments. The second approach was improving the labelling process by relabelling. In this study, the relabelling process was applied to only seven random features (positive or negative): "earnings" (Arabic source), "losses" (Arabic source), "green colour" (Arabic source:Arabic source), "growing" (Arabic source), "distribution" (Arabic source), "decrease" (Arabic source), "financial penalty" (Arabic source), and "delay" (Arabic source). Of the 48 tweets documented and examined, 20 tweets were relabelled and the classification error was reduced by 1.34%.