62,401 research outputs found
SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis
In this paper, we propose a novel approach, called SENATUS, for joint traffic
anomaly detection and root-cause analysis. Inspired from the concept of a
senate, the key idea of the proposed approach is divided into three stages:
election, voting and decision. At the election stage, a small number of
\nop{traffic flow sets (termed as senator flows)}senator flows are chosen\nop{,
which are used} to represent approximately the total (usually huge) set of
traffic flows. In the voting stage, anomaly detection is applied on the senator
flows and the detected anomalies are correlated to identify the most possible
anomalous time bins. Finally in the decision stage, a machine learning
technique is applied to the senator flows of each anomalous time bin to find
the root cause of the anomalies. We evaluate SENATUS using traffic traces
collected from the Pan European network, GEANT, and compare against another
approach which detects anomalies using lossless compression of traffic
histograms. We show the effectiveness of SENATUS in diagnosing anomaly types:
network scans and DoS/DDoS attacks
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizationsâ business strategy development and individual consumersâ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline
Market basket analysis of library circulation data
âMarket Basket Analysisâ algorithms have recently seen widespread use in analyzing consumer purchasing patterns-specifically, in detecting products that are frequently purchased together. We apply the Apriori market basket analysis tool to the task of detecting subject classification categories that co-occur in transaction records of book borrowed form a university library. This information can be useful in directing users to additional portions of the collection that may contain documents relevant to their information need, and in determining a libraryâs physical layout. These results can also provide insight into the degree of âscatterâ that the classification scheme induces in a particular collection of documents
Improving Recommendation Novelty Based on Topic Taxonomy
Clustering has been a widely applied approach to improve the computation efficiency of collaborative filtering based recommendation systems. Many techniques have been suggested to discover the item-to-item, user-to- user, and item-to-user associations within user clusters. However, there are few systems utilize the cluster based topic-to-topic associations to make recommendations. This paper suggests a taxonomy-based recommender system that utilizes cluster based topic-to-topic associations to improve its recommendation quality and novelty
Evaluation and optimization of frequent association rule based classification
Deriving useful and interesting rules from a data mining system is an essential and important task. Problems
such as the discovery of random and coincidental patterns or patterns with no significant values, and the
generation of a large volume of rules from a database commonly occur. Works on sustaining the interestingness
of rules generated by data mining algorithms are actively and constantly being examined and developed. In this
paper, a systematic way to evaluate the association rules discovered from frequent itemset mining algorithms,
combining common data mining and statistical interestingness measures, and outline an appropriated sequence of usage is presented. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and detailed evaluation of rule sets is provided. Empirical results show that with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high accuracy and coverage rules when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets, and the impact of confidence measure for the classification task
- âŠ