79,167 research outputs found

    A Hybrid Web Recommendation System based on the Improved Association Rule Mining Algorithm

    Growing interest in web recommendation systems, which deliver customized data to their users, motivated this work. Recommendation systems generally fall into two major categories: collaborative and content-based. Collaborative recommendation systems seek out users who share the tastes of a given user and recommend websites to that user accordingly. Content-based recommendation systems instead recommend websites similar to those the user has already liked. Recent research has proposed an efficient technique based on association rule mining to solve the web page recommendation problem. Its major drawback is that all web pages are given equal importance, whereas the importance of a page actually varies with how frequently it is visited and how much time the user spends on it. In addition, newly added pages and pages not yet visited by users are not included in the recommendation set. To overcome this problem, we have used the web usage log in adaptive association rule based web mining, where the association rules were applied to personalization. That algorithm relied purely on the Apriori data mining algorithm to generate the association rules. However, this method also suffers from some unavoidable drawbacks. In this paper we present and investigate a new approach based on a weighted association rule mining algorithm and text mining. The improved algorithm adds semantic knowledge to the results, is more efficient, and hence gives better quality and performance than existing approaches. Comment: 9 pages, 7 figures, 2 table
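    The abstract above says that page importance should depend on visit frequency and on time spent, but it does not give a formula. The following is only a minimal sketch of one way such weights could feed into itemset support. The session data, the 50/50 weighting formula, and the function names `page_weights` and `weighted_support` are all illustrative assumptions, not the authors' method.

```python
# Hypothetical browsing sessions: each maps a page to seconds spent on it.
sessions = [
    {"home": 10, "news": 120, "sports": 60},
    {"home": 5, "news": 90},
    {"home": 8, "sports": 200, "shop": 30},
    {"news": 150, "sports": 45},
]

def page_weights(sessions):
    """Assumed weighting: average of normalized visit count and normalized dwell time."""
    visits, dwell = {}, {}
    for session in sessions:
        for page, seconds in session.items():
            visits[page] = visits.get(page, 0) + 1
            dwell[page] = dwell.get(page, 0) + seconds
    max_visits, max_dwell = max(visits.values()), max(dwell.values())
    return {
        page: 0.5 * visits[page] / max_visits + 0.5 * dwell[page] / max_dwell
        for page in visits
    }

def weighted_support(itemset, sessions, weights):
    """Plain support of the itemset scaled by the mean weight of its pages."""
    count = sum(1 for session in sessions if all(p in session for p in itemset))
    mean_weight = sum(weights[p] for p in itemset) / len(itemset)
    return mean_weight * count / len(sessions)

weights = page_weights(sessions)
# Both pairs occur in 2 of 4 sessions, so their plain support is equal,
# but the pair made of longer-viewed pages gets the higher weighted support.
ws_news_sports = weighted_support(("news", "sports"), sessions, weights)
ws_home_news = weighted_support(("home", "news"), sessions, weights)
```

    In this toy data, equal plain support no longer means equal rank: briefly viewed pages such as `home` pull an itemset down, which is the effect the abstract argues an unweighted Apriori pass cannot capture.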

    Data Mining with Linguistic Thresholds

    Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. In the past, minimum supports and minimum confidences were set as numerical values. Linguistic minimum support and minimum confidence values are, however, more natural and understandable for human beings. This paper therefore proposes a new mining approach for extracting interesting weighted association rules from transactions when the parameters needed in the mining process are given in linguistic terms. Managers also evaluate items in linguistic terms to reflect their importance, and these evaluations are then transformed into fuzzy sets of weights. Fuzzy operations, including fuzzy ranking, are then used to find weighted large itemsets and association rules.

    On the Complexity of Rule Discovery from Distributed Data

    This paper analyses the complexity of rule selection for supervised learning in distributed scenarios. The selection of rules is usually guided by a utility measure such as predictive accuracy or weighted relative accuracy. Other examples are support and confidence, known from association rule mining. A common strategy to tackle rule selection from distributed data is to evaluate rules locally on each dataset. While this works well for homogeneously distributed data, this work proves limitations of this strategy if distributions are allowed to deviate. To identify those subsets for which local and global distributions deviate may be regarded as an interesting learning task of its own, explicitly taking the locality of data into account. This task can be shown to be basically as complex as discovering the globally best rules from local data. Based on the theoretical results, some guidelines for algorithm design are derived.
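    To make the limitation of purely local evaluation concrete, here is a small sketch with invented counts; it is not taken from the paper. It scores two hypothetical rules by confidence (correct / covered) on two sites whose distributions deviate, and shows that the rule preferred at every site can still be the worse rule on the pooled data. The names `site_counts`, `local_winners` and `global_confidence` are assumptions made for illustration.

```python
# Hypothetical per-site counts for each rule:
# (examples covered by the rule, covered examples it classifies correctly)
site_counts = {
    "site_1": {"rule_A": (10, 9), "rule_B": (100, 85)},
    "site_2": {"rule_A": (100, 30), "rule_B": (10, 2)},
}

def confidence(covered, correct):
    """Fraction of covered examples that the rule gets right."""
    return correct / covered

def local_winners(site_counts):
    """Pick the highest-confidence rule separately at each site."""
    return {
        site: max(rules, key=lambda name: confidence(*rules[name]))
        for site, rules in site_counts.items()
    }

def global_confidence(site_counts, rule):
    """Confidence of a rule on the union of all sites' data."""
    covered = sum(rules[rule][0] for rules in site_counts.values())
    correct = sum(rules[rule][1] for rules in site_counts.values())
    return correct / covered

winners = local_winners(site_counts)
global_a = global_confidence(site_counts, "rule_A")  # 39 / 110
global_b = global_confidence(site_counts, "rule_B")  # 87 / 110
```

    Here `rule_A` wins at both sites (0.90 vs 0.85 and 0.30 vs 0.20), yet `rule_B` has the higher pooled confidence, because each rule covers very different numbers of examples at each site.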

    Digging deep into weighted patient data through multiple-level patterns

    Large data volumes have been collected by healthcare organizations at an unprecedented rate. Today both physicians and healthcare system managers are very interested in extracting value from such data. Nevertheless, the increasing data complexity and heterogeneity prompt the need for new efficient and effective data mining approaches to analyzing large patient datasets. Generalized association rule mining algorithms can be exploited to automatically extract hidden multiple-level associations among patient data items (e.g., examinations, drugs) from large datasets equipped with taxonomies. However, current approaches assume that all data items are equally relevant within each transaction, even though this assumption rarely holds. This paper presents a new data mining environment targeted at patient data analysis. It tackles the issue of extracting generalized rules from weighted patient data, where items may be weighted differently according to their importance within each transaction. To this end, it proposes a novel type of association rule, namely the Weighted Generalized Association Rule (W-GAR). The usefulness of the proposed pattern has been evaluated on real patient datasets equipped with a taxonomy built over examinations and drugs. The achieved results demonstrate the effectiveness of the proposed approach in mining interesting and actionable knowledge in a real medical care scenario.

    Associative pattern mining for supervised learning

    The Internet era has revolutionized computational sciences and automated data collection techniques, made large amounts of previously inaccessible data available and, consequently, broadened the scope of exploratory computing research. As a result, data mining, which is still an emerging field of research, has gained importance because of its ability to analyze and discover previously unknown, hidden, and useful knowledge from these large amounts of data. One aspect of data mining, known as frequent pattern mining, has recently gained importance due to its ability to find associative relationships among the parts of data, thereby aiding a type of supervised learning known as associative learning. The purpose of this dissertation is two-fold: to develop and demonstrate supervised associative learning in non-temporal data for multi-class classification, and to develop a new frequent pattern mining algorithm for time-varying (temporal) data which alleviates the current issues in analyzing this data for knowledge discovery. In order to use associative relationships for classification, we have to algorithmically learn their discriminatory power. While it is well known that multiple sets of features work better for classification, we claim that the isomorphic relationships among the features work even better and, therefore, can be used as higher order features. To validate this claim, we exploit these relationships as input features for classification instead of using the underlying raw features. The next part of this dissertation focuses on building a new classifier using associative relationships as a basis for the multi-class classification problem. Most of the existing associative classifiers represent the instances from a class in a row-based format, wherein one row represents the features of one instance, and extract association rules from the entire dataset. The rules formed in this way are known as class constrained rules, as they have class labels on the right side of the rules. We argue that this class constrained representation schema lacks important information that is necessary for multi-class classification. Further, most existing works use either the intra-class or the inter-class importance of the association rules, and both sets of techniques offer empirical benefits. We hypothesize that both intra-class and inter-class variations are important for fast and accurate multi-class classification. We also present a novel weighted association rule-based classification mechanism that uses frequent relationships among raw features from an instance as the basis for classifying the instance into one of the many classes. The relationships are weighted according to both their intra-class and inter-class importance. The final part of this dissertation concentrates on mining time-varying data. This problem is known as inter-transaction association rule mining in the data mining field. Most of the existing work transforms the time-varying data into a static format and then uses multiple scans over the new data to extract patterns. We present a unique index-based algorithmic framework for inter-transaction association rule mining. Our proposed technique requires only one scan of the original database. Further, the proposed technique can also provide the location information of each extracted pattern. We use mathematical induction to prove that the new representation scheme captures all underlying frequent relationships.

    Construction of Weighted Temporal Association Rules in Data Mining

    Traditional association rules seldom consider the temporal applicability of a rule. Temporal association rules improve on this to some extent, because every rule carries the time interval in which it holds. Building on this work, this paper constructs a weighted temporal association rule that reflects the time value of data, so that the discovered rules reflect a time trend. The results of different association rule mining methods on the same data set are compared, and the proposed rules perform well. Supported by the Program for New Century Excellent Talents of the Ministry of Education of China (NCET-04-0608) and the Social Science Research Planning Fund of the Ministry of Education of China (06JA910003

    Guided review by frequent itemset mining: additional evidence for plaque detection

    Purpose: A guided review process to support manual coronary plaque detection in computed tomography coronary angiography (CTCA) data sets is proposed. The method learns the spatial plaque distribution patterns by using the frequent itemset mining algorithm and uses this knowledge to predict potentially missed plaques during detection. Materials and methods: Plaque distribution patterns from 252 manually labeled patients who underwent CTCA were included. For various cross-validations, a labeling with missing plaques was created from the initial manual ground truth labeling. Frequent itemset mining was used to learn the spatial plaque distribution patterns in the form of association rules from a training set. These rules were then applied to a testing set to search for segments in the coronary tree showing evidence of containing unlabeled plaques. The segments with potentially missed plaques were finally reviewed for the existence of plaques. The proposed guided review was compared to a weighted random approach that considered only the probability of occurrence for a plaque in a specific segment and not its spatial correlation to other plaques. Results: Guided review by frequent itemset mining performed significantly better (p<0.001) than the reference weighted random approach in predicting coronary segments with initially missed plaques. Up to 47% of the initially removed plaques were found again by reviewing only 4.4% of all possible segments. Conclusions: The spatial distribution patterns of atherosclerosis in coronary arteries can be used to predict potentially missed plaques by a guided review with frequent itemset mining. It shows potential to reduce the intra- and inter-observer variabilit