62,401 research outputs found

    SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis

    Full text link
    In this paper, we propose a novel approach, called SENATUS, for joint traffic anomaly detection and root-cause analysis. Inspired from the concept of a senate, the key idea of the proposed approach is divided into three stages: election, voting and decision. At the election stage, a small number of \nop{traffic flow sets (termed as senator flows)}senator flows are chosen\nop{, which are used} to represent approximately the total (usually huge) set of traffic flows. In the voting stage, anomaly detection is applied on the senator flows and the detected anomalies are correlated to identify the most possible anomalous time bins. Finally in the decision stage, a machine learning technique is applied to the senator flows of each anomalous time bin to find the root cause of the anomalies. We evaluate SENATUS using traffic traces collected from the Pan European network, GEANT, and compare against another approach which detects anomalies using lossless compression of traffic histograms. We show the effectiveness of SENATUS in diagnosing anomaly types: network scans and DoS/DDoS attacks

    Automatic domain ontology extraction for context-sensitive opinion mining

    Get PDF
    Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline

    Market basket analysis of library circulation data

    Get PDF
    “Market Basket Analysis” algorithms have recently seen widespread use in analyzing consumer purchasing patterns-specifically, in detecting products that are frequently purchased together. We apply the Apriori market basket analysis tool to the task of detecting subject classification categories that co-occur in transaction records of book borrowed form a university library. This information can be useful in directing users to additional portions of the collection that may contain documents relevant to their information need, and in determining a library’s physical layout. These results can also provide insight into the degree of “scatter” that the classification scheme induces in a particular collection of documents

    Improving Recommendation Novelty Based on Topic Taxonomy

    Get PDF
    Clustering has been a widely applied approach to improve the computation efficiency of collaborative filtering based recommendation systems. Many techniques have been suggested to discover the item-to-item, user-to- user, and item-to-user associations within user clusters. However, there are few systems utilize the cluster based topic-to-topic associations to make recommendations. This paper suggests a taxonomy-based recommender system that utilizes cluster based topic-to-topic associations to improve its recommendation quality and novelty

    Evaluation and optimization of frequent association rule based classification

    Get PDF
    Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database commonly occur. Works on sustaining the interestingness of rules generated by data mining algorithms are actively and constantly being examined and developed. In this paper, a systematic way to evaluate the association rules discovered from frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriated sequence of usage is presented. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and detailed evaluation of rule sets is provided. Empirical results show that with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high accuracy and coverage rules when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets, and the impact of confidence measure for the classification task
    • 

    corecore