4 research outputs found

    Experiences in building a tool for navigating association rule result sets

    Practical knowledge discovery is an iterative process. First, the experiences gained from one mining run are used to inform the parameter setting and the dataset and attribute selection for subsequent runs. Second, additional data, either incremental additions to existing datasets or the inclusion of additional attributes, means that the mining process is reinvoked, perhaps numerous times. Reducing the number of iterations, improving the accuracy of parameter setting and making the results of the mining run more clearly understandable can thus significantly speed up the discovery process. In this paper we discuss our experiences in this area and present a system that helps the user to navigate through association rule result sets in a way that makes it easier to find useful results from a large result set. We present several techniques that experience has shown us to be useful. The prototype system, IRSetNav, is discussed; it has capabilities in redundant rule reduction, subjective interestingness evaluation, item and itemset pruning, related information searching, text-based itemset and rule visualisation, hierarchy-based searching and tracking changes between data sets using a knowledge base. Techniques also discussed in the paper, but not yet accommodated into IRSetNav, include input schema selection, longitudinal ruleset analysis and graphical visualisation techniques. Adelaide, S

    Data wars over data stores: challenges in medical data linkage

    A primary concern of the medical e-research community is the availability of suitable data sets for their analysis requirements. The quantity and dubious quality of data present significant barriers to the application of many automated analysis technologies, including data mining, to the medical and health domain. Publicly available data is frequently poorly coded, incomplete, out-of-date or simply not applicable to the analysis or algorithm being applied. Work has been done to overcome these issues through the application of data linking processes, but further complications have been encountered, resulting in slow progress. The use of locally held medical data is difficult enough due to its structural complexity and non-standardised language; however, linking data from disparate electronic sources adds the challenges of privacy, security, semantic compatibility, provenance, and governance, each with its own inherent issues. A focal requirement is a mechanism for the sharing of medical and health data across multiple sites which incorporates careful management of the semantics and limitations of the data sets whilst maintaining functional relevance for the end user. Our paper addresses this requirement by exploring recent conceptual modeling and data evaluation methodologies that facilitate effective data linking whilst ensuring the semantics of the data are maintained and the individual needs of the end user are met.

    A Communication Model that Bridges Knowledge Delivery between Data Miners and Domain Users

    Findings generated from data mining are sometimes not interesting to the domain users. The problem is that data miners and the domain users do not speak the same language, so human subjectivity towards the domain users' own fields of knowledge affects the understanding of knowledge generated from data mining. This paper proposes a communication model based on the reference services model in the field of library science in order to bridge the communications between data miners and domain users. The creation of a data liaison specialist role in the data mining team aims at understanding the subjectivity as well as the thinking process of both parties in order to translate knowledge between the two fields and deliver findings to domain users. Through five steps (data interview, pre-mid evaluation, post-mid evaluation, knowledge delivery, and follow up), the data liaison specialist can achieve effective knowledge synthesis and delivery to the domain users.

    Semi-Automatic Method to Assist Expert for Association Rules Validation

    To help the expert validate association rules extracted from data, several quality measures have been proposed in the literature. We distinguish two categories: objective and subjective measures. The first depends on a fixed threshold and on the quality of the data from which the rules are extracted. The second consists of providing the expert with tools to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high, so mining the rules manually is very hard. To solve this problem, we propose, in this paper, a semi-automatic method to assist the expert during association rule validation. Our method uses rule-based classification as follows: (i) we transform association rules into classification rules (classifiers); (ii) we use the generated classifiers for data classification; (iii) we visualize association rules with their quality classification to give an idea to the expert and to assist him during the validation process.
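
The three steps named in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rule` type, the function names, the first-match-by-confidence prediction strategy, the accuracy-style quality score, and the toy records are all assumptions chosen only to make the idea concrete, and the "visualization" of step (iii) is reduced to printing each rule with its score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical association rule: antecedent items -> one consequent item."""
    antecedent: frozenset
    consequent: str
    confidence: float


def to_classifiers(rules, class_labels):
    """Step (i): keep rules whose consequent is a class label, most confident first."""
    kept = [r for r in rules if r.consequent in class_labels]
    return sorted(kept, key=lambda r: r.confidence, reverse=True)


def classify(record, classifiers, default=None):
    """Step (ii): predict with the first rule whose antecedent the record contains.

    First-match ordering is an assumed strategy, not one stated in the abstract.
    """
    for rule in classifiers:
        if rule.antecedent <= record:
            return rule.consequent
    return default


def rule_quality(rule, labelled):
    """Input to step (iii): share of covered records that the rule labels correctly."""
    covered = [label for items, label in labelled if rule.antecedent <= items]
    if not covered:
        return None
    return sum(label == rule.consequent for label in covered) / len(covered)


# Toy data, invented purely for illustration.
rules = [
    Rule(frozenset({"fever", "cough"}), "flu", 0.9),
    Rule(frozenset({"cough"}), "cold", 0.6),
    Rule(frozenset({"fever"}), "cough", 0.7),  # consequent is not a class label
]
labelled = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"cough"}), "cold"),
    (frozenset({"cough", "fever", "rash"}), "cold"),
]

classifiers = to_classifiers(rules, {"flu", "cold"})
for rule in classifiers:
    # A text stand-in for step (iii): show each rule next to its quality score.
    print(sorted(rule.antecedent), "->", rule.consequent, rule_quality(rule, labelled))
```

In such a setup an expert would only need to inspect rules whose score falls below some chosen cut-off, which is one plausible reading of how the classification result could reduce the manual validation effort.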