34 research outputs found

    Mining XML documents with association rule algorithms

    Thesis (Master), Izmir Institute of Technology, Computer Engineering, Izmir, 2008. Includes bibliographical references (leaves: 59-63). Text in English; abstract in Turkish and English. x, 63 leaves.
    With the increasing use of XML for data storage and data exchange between applications, mining XML documents has become an increasingly important research topic. In this study, we consider the problem of mining association rules between items in XML documents. The principal purpose of the study is to apply association rule algorithms directly to XML documents using XQuery, a functional expression language for querying and processing XML data. We used three algorithms: Apriori, AprioriTid and High Efficient AprioriTid. We compare the mining times of these three Apriori-like algorithms on XML documents at different support levels, with different datasets and different dataset sizes.

    Mining fuzzy association rules in large databases with quantitative attributes.

    by Kuok, Chan Man. Thesis (M.Phil.), Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 74-77).
    Contents:
    Abstract
    Acknowledgments
    Chapter 1: Introduction
        1.1 Data Mining
        1.2 Association Rule Mining
    Chapter 2: Background
        2.1 Framework of Association Rule Mining
            2.1.1 Large Itemsets
            2.1.2 Association Rules
        2.2 Association Rule Algorithms For Binary Attributes
            2.2.1 AIS
            2.2.2 SETM
            2.2.3 Apriori, AprioriTid and AprioriHybrid
            2.2.4 PARTITION
        2.3 Association Rule Algorithms For Numeric Attributes
            2.3.1 Quantitative Association Rules
            2.3.2 Optimized Association Rules
    Chapter 3: Problem Definition
        3.1 Handling Quantitative Attributes
            3.1.1 Discrete intervals
            3.1.2 Overlapped intervals
            3.1.3 Fuzzy sets
        3.2 Fuzzy association rule
        3.3 Significance factor
        3.4 Certainty factor
            3.4.1 Using significance
            3.4.2 Using correlation
            3.4.3 Significance vs. Correlation
    Chapter 4: Steps For Mining Fuzzy Association Rules
        4.1 Candidate itemsets generation
            4.1.1 Candidate 1-Itemsets
            4.1.2 Candidate k-Itemsets (k > 1)
        4.2 Large itemsets generation
        4.3 Fuzzy association rules generation
    Chapter 5: Experimental Results
        5.1 Experiment One
        5.2 Experiment Two
        5.3 Experiment Three
        5.4 Experiment Four
        5.5 Experiment Five
            5.5.1 Number of Itemsets
            5.5.2 Number of Rules
        5.6 Experiment Six
            5.6.1 Varying Significance Threshold
            5.6.2 Varying Membership Threshold
            5.6.3 Varying Confidence Threshold
    Chapter 6: Discussions
        6.1 User guidance
        6.2 Rule understanding
        6.3 Number of rules
    Chapter 7: Conclusions and Future Works
    Bibliography

    A Study on Data Filtering Techniques for Event-Driven Failure Analysis

    Engineering & Systems Design.
    High-performance sensors and modern data logging technology with real-time telemetry make precise system failure analysis possible. Fault detection, isolation and identification are the typical steps used to analyze the root causes of failures. This systematic failure analysis provides not only useful clues for rectifying abnormal system behavior, but also key information for redesigning the current system for retrofit. The main barriers to effective failure analysis are: (i) the gathered sensor data logs, usually event logs containing massive datasets, are too large, and (ii) noise and redundant information in the gathered sensor data make precise analysis difficult. The objective of this thesis is therefore to develop an event-driven failure analysis method that takes into account both functional interactions between subsystems and diverse user behaviors. To do this, we first apply various data filtering techniques for data cleaning and reduction, and then convert the filtered data into a new format of event sequence information (called "eventization"). Four eventization strategies (equal-width binning, entropy, domain expert knowledge, and probability distribution estimation) are examined for data filtering, in order to extract only the important information from the raw sensor data while minimizing information loss. We identify the optimal values of the eventization parameters by numerical simulation. Finally, the event sequence information, which contains the time gap between event occurrences, is decoded to investigate the correlation between specific event sequence patterns and various system failures. The extracted patterns are stored in a failure pattern library, which then serves as the main reference source for predicting failures in real time during the failure prognosis phase. The efficiency of the developed procedure is examined with a terminal box data log of marine diesel engines.
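As a rough illustration of the simplest of the four strategies named above, equal-width binning, the sketch below maps numeric sensor readings to discrete event labels. It is a minimal sketch under stated assumptions, not the thesis's procedure: the function name `equal_width_events`, the `E<index>` label format, and the sample readings are all hypothetical.

```python
def equal_width_events(readings, n_bins):
    """Map each numeric reading to an event label using equal-width bins.

    Assumption: labels take the hypothetical form "E<bin index>".
    """
    lo, hi = min(readings), max(readings)
    width = (hi - lo) / n_bins
    events = []
    for value in readings:
        if width == 0:
            # All readings are identical, so everything falls in one bin.
            index = 0
        else:
            # Clamp so the maximum reading lands in the last bin.
            index = min(int((value - lo) / width), n_bins - 1)
        events.append(f"E{index}")
    return events


# Hypothetical sensor readings, split into 4 equal-width bins over [0, 10].
readings = [0.0, 2.5, 5.0, 7.5, 10.0]
events = equal_width_events(readings, n_bins=4)
print(events)
```

Choosing `n_bins` is the kind of eventization parameter the abstract says is tuned by simulation; fewer bins filter more aggressively but lose more information.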

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. A more realistic target is therefore to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis because they are easy to interpret. This dissertation also proposes a fuzzy-granular method for selecting informative and discriminative genes from large microarray gene expression datasets. With fuzzy granulation, information loss during gene selection is reduced. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be built. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. We therefore expect the selected genes to be more helpful for further biological studies.

    Extracção de regras de associação com itens raros e frequentes (Extraction of association rules with rare and frequent items)

    Over the past few years, association rules have taken on an important role in extracting information and knowledge from databases, thereby supporting the decision-making process. Most research on association rules is based on the support and confidence model. This model yields association rules that mainly involve frequent itemsets. In recent years, however, there has been growing interest in exploring less frequent itemsets, which give rise to what are called rare or infrequent association rules. Many of the rules that involve infrequent items are of particular interest to the user. Current research on association rules aims to generate the largest possible number of interesting rules by combining rare and frequent items. Accordingly, this study begins with a survey of the main data mining algorithms that address association rules. An association rule is considered rare when it is formed by both frequent and infrequent items, or by infrequent items only. The purpose of this study is to examine the existing techniques and algorithms for extracting association rules, to identify the main advantages and disadvantages of these algorithms, and finally to develop an algorithm whose objective is to generate association rules that involve rare and frequent items.
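For readers unfamiliar with the support and confidence model mentioned in this abstract, the following is a minimal sketch using the standard definitions: the support of an itemset is the fraction of transactions containing it, and the confidence of a rule A -> B is support(A ∪ B) / support(A). The function names and the toy transactions are hypothetical and are not taken from the thesis.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical toy data: "caviar" and "vodka" are rare but always co-occur.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"caviar", "vodka"},
]

frequent_support = support(transactions, {"bread", "milk"})
rare_support = support(transactions, {"caviar", "vodka"})
rare_confidence = confidence(transactions, {"caviar"}, {"vodka"})
print(frequent_support, rare_support, rare_confidence)
```

In this toy data the rare rule has lower support than the frequent itemset yet the highest possible confidence, which is why a single high minimum-support threshold can discard rules of the kind the abstract describes as interesting.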

    Fuzzy association rules: general model and applications
