
    Finding Temporal Patterns in Noisy Longitudinal Data: A Study in Diabetic Retinopathy

    This paper describes an approach to temporal pattern mining that uses the concept of user-defined temporal prototypes to define the nature of the trends of interest. The temporal patterns are defined in terms of sequences of support values associated with identified frequent patterns. The prototypes are defined mathematically so that they can be mapped onto the temporal patterns. The focus of the advocated temporal pattern mining process is a large longitudinal patient database collected as part of a diabetic retinopathy screening programme. The data set is also of interest in itself, because it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The paper describes the diabetic retinopathy application, the data warehousing and cleaning process, and the frequent pattern mining procedure (together with the application of the prototype concept). An evaluation of the frequent pattern mining process is also presented.
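As a rough illustration of the prototype idea, the sketch below computes a pattern's support in each time epoch and compares the resulting sequence with a prototype shape. It rests on several assumptions, none of which come from the paper: the records are already split into epochs, the sequence is min-max normalised, the prototype is a linear "rising" trend, and a match means falling within a mean-absolute-deviation tolerance. The functions `support_sequence`, `rising_prototype` and `matches_prototype` are hypothetical and do not reproduce the paper's mathematical prototype definitions.

```python
from typing import FrozenSet, List, Sequence, Set


def support_sequence(itemset: FrozenSet[str], epochs: Sequence[List[Set[str]]]) -> List[float]:
    """Support of `itemset` in each time epoch: the fraction of records that contain it."""
    seq = []
    for records in epochs:
        hits = sum(1 for r in records if itemset <= r)
        seq.append(hits / len(records) if records else 0.0)
    return seq


def normalise(seq: List[float]) -> List[float]:
    """Min-max scale to [0, 1] so that the comparison is about shape, not magnitude."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]


def rising_prototype(n: int) -> List[float]:
    """Hypothetical prototype: a steady linear rise over n epochs."""
    if n == 1:
        return [1.0]
    return [i / (n - 1) for i in range(n)]


def matches_prototype(seq: List[float], prototype: List[float], tol: float = 0.2) -> bool:
    """Hypothetical match rule: mean absolute deviation from the prototype is within tol."""
    norm = normalise(seq)
    dist = sum(abs(a - b) for a, b in zip(norm, prototype)) / len(seq)
    return dist <= tol
```

A support sequence such as `[0.25, 0.5, 0.75]` matches the rising prototype, and its reversal does not. Normalising first means the prototype captures the direction of a trend and ignores its absolute support level.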

    Text mining with exploitation of user's background knowledge: discovering novel association rules from text

    The goal of text mining is to find interesting and non-trivial patterns or knowledge in unstructured documents. The literature proposes both objective and subjective measures for evaluating the interestingness of discovered patterns. Objective measures alone are insufficient, however, because they do not consider the knowledge and interests of users. Subjective measures require explicit input of user expectations, which is difficult or even impossible to obtain in text mining environments. This study proposes a user-oriented text mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, has two major components: a background knowledge developer and a novel association rule miner. The background knowledge developer learns a user's background knowledge by extracting keywords from documents the user already knows (background documents) and by developing a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents). It then compares the rules with the background knowledge to predict how novel each rule is to the particular user (user-oriented novelty). The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It has two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more often they co-occur, the shorter the distance. The latter considers the common connections of the two keywords with other keywords in the concept hierarchy. It is defined as the length of the path connecting the two keywords in the concept hierarchy: the longer the path, the larger the distance. The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the user-oriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in terms of predicting novel rules and identifying useful rules.
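The two distance components can be sketched for a pair of single keywords. This is an illustration under assumptions, not uMining's actual formulas. Occurrence distance is assumed to be `1 / (1 + co-occurrences)`, which preserves the stated rule that more co-occurrence means a shorter distance. Connection distance is the shortest path length in the concept hierarchy, found by breadth-first search. The weighted sum in `novelty`, its `alpha` parameter, and every function name here are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def occurrence_distance(a: str, b: str, background_docs: List[Set[str]]) -> float:
    """More co-occurrences -> shorter distance (assumed form: 1 / (1 + count))."""
    co = sum(1 for doc in background_docs if a in doc and b in doc)
    return 1.0 / (1.0 + co)


def connection_distance(a: str, b: str, hierarchy: Iterable[Tuple[str, str]]) -> float:
    """Shortest path length between a and b in the concept hierarchy (BFS on parent/child edges)."""
    adj: Dict[str, List[str]] = {}
    for parent, child in hierarchy:
        adj.setdefault(parent, []).append(child)
        adj.setdefault(child, []).append(parent)
    if a == b:
        return 0.0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt == b:
                return float(d + 1)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")  # keywords not connected in the hierarchy


def novelty(a: str, b: str, background_docs: List[Set[str]],
            hierarchy: List[Tuple[str, str]], alpha: float = 0.5) -> float:
    """Hypothetical combination of the two components as a weighted sum."""
    return (alpha * occurrence_distance(a, b, background_docs)
            + (1 - alpha) * connection_distance(a, b, hierarchy))
```

Under these assumptions, a rule linking two keywords that sit in distant branches of the hierarchy and rarely co-occur (for example "genetics" and "physics" under a shared "science" root) scores as more novel than a rule linking a keyword to its direct parent.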

    Association Rule Based Classification

    In this thesis, we focused on the construction of classification models based on association rules. Association rules have been used predominantly for data exploration and description, but interest in using them for prediction has increased rapidly in the data mining community. To mine only rules that can be used for classification, we modified the well-known association rule mining algorithm Apriori to handle user-defined input constraints. We considered constraints that require the presence or absence of particular items, or that limit the number of items, in the antecedents and/or the consequents of the rules. We developed a characterization of the itemsets that can potentially form rules satisfying the given constraints. This characterization allows us to prune, during itemset construction, any itemset such that neither it nor any of its supersets can form a valid rule, which reduces the time needed for itemset construction. Using this characterization, we implemented a classification system based on association rules and compared the performance of several model construction methods, including CBA, and several model deployment modes for making predictions. Although the data mining community has dealt only with the classification of single-valued attributes, there are several domains in which the classification target is set-valued. We therefore enhanced our classification system with a novel approach to predicting set-valued class attributes. Because the traditional classification accuracy measure is inappropriate in this context, we developed an evaluation method for set-valued classification based on the E-Measure. We further enhanced our algorithm so that it does not rely on the typical support/confidence framework. Instead, it mines for the best possible rules above a user-defined minimum confidence and within a desired range for the number of rules. This avoids long mining times that might produce large collections of rules with low predictive power. For this purpose, we developed a heuristic function to determine an initial minimum support and then adjusted it using a binary search strategy until the number of rules fell within the given range. We implemented all of the techniques described above in WEKA, an open-source suite of machine learning algorithms, and used several datasets from the UCI Machine Learning Repository to test and evaluate them.
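The binary-search idea can be sketched as follows, with some simplifications. The thesis searches until the number of rules falls in range and starts from a heuristic initial support. Here the number of frequent itemsets stands in for the rule count, the search covers the full range of absolute support counts, and a small level-wise Apriori counts the itemsets. The names `frequent_itemsets` and `tune_min_count` are hypothetical. The search is valid because raising the minimum support can never increase the count, so the count is monotone in the threshold.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def frequent_itemsets(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: returns every itemset whose support count is >= min_count."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    result: Dict[FrozenSet[str], int] = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        result.update(level)
        k += 1
        # Candidate generation: size-k unions whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                candidates.add(u)
        current = list(candidates)
    return result


def tune_min_count(transactions: List[Set[str]], lo_target: int,
                   hi_target: int) -> Optional[Tuple[int, int]]:
    """Binary search for a support count yielding between lo_target and hi_target itemsets."""
    lo, hi = 1, len(transactions)
    while lo <= hi:
        mid = (lo + hi) // 2
        n = len(frequent_itemsets(transactions, mid))
        if n > hi_target:
            lo = mid + 1   # too many patterns: raise the threshold
        elif n < lo_target:
            hi = mid - 1   # too few patterns: lower the threshold
        else:
            return mid, n
    return None            # no threshold hits the requested range exactly
```

Here is an example. With the transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{a}` and `{b,c}`, `tune_min_count(T, 3, 4)` returns `(3, 3)`: at a support count of 3, only `{a}`, `{b}` and `{c}` are frequent. It returns `None` when no threshold gives a count inside the range.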

    FGC: an efficient constraint-based frequent set miner

    Despite advances in algorithmic design, association rule mining remains problematic from a performance viewpoint when the underlying transaction database is large. The well-known Apriori approach reduces the computational effort involved but still suffers from poor scalability, because it relies on generating candidate itemsets. In this paper we present a novel approach that combines preprocessing with user-defined constraints to prune the itemset space before a compact FP-tree is built. Experiments show that our algorithm significantly outperforms the current state-of-the-art algorithm, FP-bonsai.
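The general pattern of pruning before building the tree can be sketched as follows. This is not FGC itself. The constraint is a hypothetical item-membership constraint (`allowed`), chosen because it is simple and anti-monotone. Items that violate it are stripped from every transaction, and then infrequent items are removed. Only the surviving items are inserted into a standard FP-tree, ordered by descending frequency with ties broken by name. `FPNode` and `build_constrained_fptree` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class FPNode:
    item: Optional[str]
    count: int = 0
    parent: Optional["FPNode"] = None
    children: Dict[str, "FPNode"] = field(default_factory=dict)


def build_constrained_fptree(transactions: List[Set[str]], min_count: int,
                             allowed: Set[str]) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    """Prune by a (hypothetical) item constraint and by frequency, then build an FP-tree."""
    # 1. Constraint pruning: drop items the user-defined constraint rules out.
    pruned = [t & allowed for t in transactions]
    # 2. Frequency pruning: drop items that cannot appear in any frequent itemset.
    freq = Counter(i for t in pruned for i in t)
    keep = {i for i, c in freq.items() if c >= min_count}

    root = FPNode(None)
    header: Dict[str, List[FPNode]] = {}  # item -> all tree nodes carrying that item
    for t in pruned:
        # Descending frequency order maximises prefix sharing; names break ties.
        ordered = sorted((i for i in t if i in keep), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, 0, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header
```

Pruning before construction keeps the tree smaller than building a full FP-tree and filtering patterns afterwards, which matches the motivation stated in the abstract.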

    Association Rules for Web Data Mining in WHOWEDA

    The authors discuss association rules that can be discovered from Web data, within the scope of our WHOWEDA (warehouse of Web data) project. WHOWEDA is supported by a Web data model and a set of algebraic operators. The Web data model allows a uniform and integrated view of Web data gathered using a user's query graph. A user's query graph describes the query by example (what the user perceives as the query). The Web coupling query gathers instances of such a query graph from the Web and stores them as subgraphs (called Web tuples) in a Web table. We discuss association rules within this domain. An association rule defines an association between the node and link attributes of Web tuples within a Web table. Two different classes of association rules can be developed from data in a Web table. Node-to-node associations are rules that relate the content (defined by metadata attributes) of two or more nodes within a Web tuple. Link associations are rules that show the connectivity of different URLs. Distinguishing the two types of associations provides a view of the structure of the Web data. The goal of association mining on Web data is to better organize searching patterns through hyperlinked documents.
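To make the idea of node-to-node associations concrete, the sketch below computes the standard support and confidence of one such rule over a Web table. The table layout is a hypothetical simplification and not WHOWEDA's data model: each Web tuple is a dictionary that maps a node identifier to its metadata attributes, and each side of the rule is a `(node, attribute, value)` condition. `node_rule_stats` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: node id -> metadata attributes of that node.
WebTuple = Dict[str, Dict[str, str]]
Condition = Tuple[str, str, str]  # (node id, attribute name, required value)


def node_rule_stats(table: List[WebTuple], antecedent: Condition,
                    consequent: Condition) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent => consequent over the Web tuples."""
    def holds(t: WebTuple, cond: Condition) -> bool:
        node, attr, val = cond
        return t.get(node, {}).get(attr) == val

    n_a = sum(1 for t in table if holds(t, antecedent))
    n_ab = sum(1 for t in table if holds(t, antecedent) and holds(t, consequent))
    support = n_ab / len(table) if table else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence
```

For example, consider four Web tuples in which node `x` has title "ML" three times. In two of those tuples, node `y` is a PDF. The rule "x.title = ML => y.format = pdf" then has support 0.5 and confidence 2/3.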

    P-CSREC: A New Approach for Personalized Cloud Service Recommendation

    With the rapidly growing number of cloud services and the wide variety of service types, it is becoming challenging for users to choose a suitable service that fits their needs. This paper proposes an effective cloud service recommendation approach, named personalized cloud service recommendation (P-CSREC). It is based on the characterization of a heterogeneous information network, association rule mining, and the modeling and clustering of user interests. First, a similarity measure is defined that improves the average similarity (AvgSim) measure by including the subjective evaluation of users' interests. Based on the improved AvgSim, a new model for measuring user interest is established. Second, the traditional K-Harmonic Means (KHM) clustering algorithm is improved by involving multiple meta-paths, which helps avoid convergence to a local optimum. Then, a frequent pattern growth (FP-Growth) association rule algorithm is proposed to address the limited personalization that traditional association rule algorithms offer in recommendation. A new method for defining the support value of nodes is developed using the weight of each user's score. In addition, a multi-level FP-tree is defined based on multi-level association rule theory to extract relationships at higher levels. Finally, user interest is combined with the improved KHM clustering algorithm and the improved FP-Growth algorithm to improve the accuracy of cloud service recommendation for target users. The experimental results demonstrate that the proposed approach improves both computational efficiency and recommendation accuracy.
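One way to picture support weighted by user scores is shown below. The abstract does not give P-CSREC's exact formula, so this sketch assumes that each transaction (the set of services a user interacted with) carries a weight equal to that user's score divided by a maximum score of 5. Weighted support is then the summed weight of the transactions that contain the itemset, divided by the total weight. `score_weights` and `weighted_support` are hypothetical names.

```python
from typing import FrozenSet, List, Set


def score_weights(scores: List[float], max_score: float = 5.0) -> List[float]:
    """Hypothetical weighting: each transaction's weight is its user score scaled to [0, 1]."""
    return [s / max_score for s in scores]


def weighted_support(itemset: FrozenSet[str], transactions: List[Set[str]],
                     weights: List[float]) -> float:
    """Summed weight of transactions containing the itemset, divided by the total weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for t, w in zip(transactions, weights) if itemset <= t) / total
```

With uniform weights, this reduces to ordinary relative support. With score-based weights, service combinations favoured by highly scoring users gain support. For transactions `{s1,s2}`, `{s1}` and `{s2}` scored 5, 1 and 4, `{s1,s2}` has weighted support 0.5 compared with a plain support of 1/3.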

    Robust and distributed top-n frequent-pattern mining with SAP BW accelerator

    Mining for association rules and frequent patterns is a central activity in data mining. However, most existing algorithms are only moderately suitable for real-world scenarios. Most strategies use parameters such as minimum support, for which a suitable value can be very difficult to define on unknown datasets. Since most untrained users are unable or unwilling to set such technical parameters, we address the problem of replacing the minimum-support parameter with top-n strategies. We start by extending a top-n implementation of the ECLAT algorithm with heuristic search strategy optimizations to improve its performance. In addition, real-world datasets are often distributed, and modern database architectures are moving from expensive SMPs to cheaper shared-nothing blade servers. Most mining queries therefore require distribution handling. Because partitioning can be dictated by user-defined semantics, transforming the data is often forbidden. We therefore developed an adaptive top-n frequent-pattern mining algorithm that simplifies mining on real distributions by relaxing some requirements on the results. We first combine the PARTITION and TPUT algorithms to handle distributed top-n frequent-pattern mining. We then extend this new algorithm to distributions with real-world data characteristics. Frequent-pattern mining algorithms depend on equal distributions, and tiny partitions can cause performance bottlenecks. We therefore implemented an approach called MAST that defines a minimum absolute-support threshold. MAST prunes patterns that have a low chance of reaching the global top-n result set and high computing costs. Overall, our approach simplifies frequent-pattern mining for real customer scenarios and data sets, which may make it accessible to entirely new user groups. Finally, we present results of our algorithms run on the SAP NetWeaver BW Accelerator with standard and real business datasets.
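The distributed top-n step can be illustrated with the classic three-phase TPUT scheme, which the abstract names as one of the combined building blocks. The sketch assumes that each partition has already computed local support counts for its candidate patterns, for example in a PARTITION-style local pass. These counts are stored as a dictionary from pattern to count, a layout that is hypothetical, as is the name `tput_top_n`. MAST and the paper's extensions for skewed real-world partitions are not modelled.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple


def _kth_largest(values: Iterable[float], k: int) -> float:
    vals = sorted(values, reverse=True)
    return vals[k - 1] if len(vals) >= k else 0


def tput_top_n(partitions: List[Dict[Hashable, int]], n: int) -> List[Tuple[Hashable, int]]:
    """Three-phase TPUT over partitions mapping pattern -> local support count (assumed >= 0)."""
    m = len(partitions)
    received: Dict[Hashable, Dict[int, int]] = {}  # pattern -> {partition index: local count}

    # Phase 1: each partition reports its local top-n. Missing counts are treated as 0,
    # so tau1 is a lower bound on the n-th largest global count.
    for idx, part in enumerate(partitions):
        for pat, c in heapq.nlargest(n, part.items(), key=lambda kv: kv[1]):
            received.setdefault(pat, {})[idx] = c
    tau1 = _kth_largest((sum(d.values()) for d in received.values()), n)

    # Phase 2: each partition reports every pattern with local count >= T = tau1 / m.
    T = tau1 / m
    for idx, part in enumerate(partitions):
        for pat, c in part.items():
            if c >= T:
                received.setdefault(pat, {})[idx] = c
    partial = {p: sum(d.values()) for p, d in received.items()}
    tau2 = _kth_largest(partial.values(), n)

    # Unreported local counts are < T, so partial + (#missing partitions) * T bounds the total.
    candidates = [p for p, d in received.items() if partial[p] + (m - len(d)) * T >= tau2]

    # Phase 3: fetch exact global counts for the surviving candidates only.
    exact = {p: sum(part.get(p, 0) for part in partitions) for p in candidates}
    return heapq.nlargest(n, exact.items(), key=lambda kv: kv[1])
```

Patterns that no partition reports in phase 2 have a global count below `m * T = tau1 <= tau2`, so they can never enter the top-n. This is why only the candidate set needs exact counts. Take the three partitions `{A:10, B:8, C:6, D:1}`, `{A:2, B:9, C:7, D:1}` and `{A:3, B:1, C:8, D:9}`. The top 2 are C (21) and B (18), even though C is not the local top pattern of any partition.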

    An Approach to Develop Security Aspect of MANET using NS2 Field

    A mobile network is an open network that any user can join, which increases network traffic. A large amount of useless traffic causes congestion at the network nodes. Because data is transferred over these nodes, this congestion increases network delay and data loss. To identify a safe path over the network, we have defined an adaptive approach based on association mining under different parameters. A mobile network is subject to different kinds of external and internal attacks. One such internal attack is the denial-of-service (DoS) attack. A DoS attack generally consists of efforts to temporarily or indefinitely interrupt or suspend the services of a host connected to the network. In this type of attack, a particular user floods the bandwidth with useless traffic and disrupts the flow of data to other users. A reliable communication path with minimal delay and loss is therefore required. A data mining approach is used to solve this problem with effective throughput and minimal loss over the network. DOI: 10.17762/ijritcc2321-8169.15064