152 research outputs found

    Categorization of interestingness measures for knowledge extraction

    Finding interesting association rules is an important and active research field in data mining. The algorithms of the Apriori family are based on two rule extraction measures, support and confidence. Although these two measures have the virtue of being algorithmically fast, they generate a prohibitive number of rules, most of which are redundant or irrelevant. It is therefore necessary to use further measures that filter out uninteresting rules. Many survey studies have since examined interestingness measures from several points of view. Several reported studies have sought to identify "good" properties of rule extraction measures, and these properties have been assessed on 61 measures. The purpose of this paper is twofold: first, to extend the number of measures and properties studied and to formalize the properties proposed in the literature; second, in the light of this formal study, to categorize the studied measures. The paper thus identifies categories of measures to help users efficiently select one or more appropriate measures during the knowledge extraction process. Evaluating the properties on the 61 measures enabled us to identify 7 classes of measures, which we obtained using two different clustering techniques. Comment: 34 pages, 4 figures
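
    As a rough illustration of the two Apriori measures named in this abstract, the sketch below computes support and confidence for a rule A -> B over a toy transaction list. Lift is included only as one widely used example of a "further measure"; it is not confirmed to be among the paper's 61. The function names and the toy data are assumptions made for illustration, not taken from the paper.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate of P(consequent | antecedent) for the rule A -> B."""
    a = set(antecedent)
    s_a = support(transactions, a)
    if s_a == 0:
        return 0.0  # the rule never fires, so confidence is taken as 0
    return support(transactions, a | set(consequent)) / s_a


def lift(transactions: List[Set[str]],
         antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """Confidence divided by the support of the consequent (illustrative)."""
    s_b = support(transactions, consequent)
    if s_b == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / s_b


# Hypothetical toy data, for illustration only.
TOY_TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
```

    On this toy data the rule bread -> milk has a lift below 1, so a filter based on lift would discard it even though it passes moderate support and confidence thresholds.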

    Metacognitive planning: Development and validation of an online measure.

    Planning is the critical first stage of metacognition. Although it has long been emphasized theoretically, it has not been the subject of much empirical study due to the lack of a valid assessment tool. Because planning is a metacognitive process, online methods that collect data during task performance would much better capture it. The present study was conducted to develop an online measure of metacognitive planning. Researchers designed a puzzle task that took the form of the popular game Sokoban, and the ratio between planning time and total time of each item was chosen as the metacognitive planning index. The task was administered to a heterogeneous sample of 440 participants composed of college students as well as 5th-, 7th-, and 10th-grade students. The results showed that valid inference could be made from the time ratio score. Cronbach’s alpha and test–retest correlation provided robust evidence of reliability of the time ratio score. Confirmatory factor analysis further confirmed its unidimensionality. Validity evidence also supported the use of the time ratio score. After controlling for demographic variables, intelligence, and motivation, the time ratio score still accounted for a significant proportion of variance of Sokoban performance, the Tower of London performance, and academic achievement. The time ratio score was also found to increase with age. Taken together, the results of the study revealed that the time ratio is a psychometrically sound online measure of metacognitive planning.
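
    The planning index described in this abstract is a per-item ratio of planning time to total time. A minimal sketch of that arithmetic follows. The abstract does not say how item ratios are combined into one score, so averaging them is an assumption here, as are the function names.

```python
from typing import Iterable, Tuple


def time_ratio(planning_seconds: float, total_seconds: float) -> float:
    """Planning time divided by total time for a single item."""
    if total_seconds <= 0:
        raise ValueError("total time must be positive")
    if not 0 <= planning_seconds <= total_seconds:
        raise ValueError("planning time must lie between 0 and total time")
    return planning_seconds / total_seconds


def mean_time_ratio(items: Iterable[Tuple[float, float]]) -> float:
    """Average the per-item ratios (the aggregation rule is an assumption)."""
    ratios = [time_ratio(planning, total) for planning, total in items]
    if not ratios:
        raise ValueError("at least one item is required")
    return sum(ratios) / len(ratios)
```

    For example, an item with 12 seconds of planning out of 48 seconds in total has a ratio of 0.25.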

    Development of an Explainability Scale to Evaluate Explainable Artificial Intelligence (XAI) Methods

    Explainable Artificial Intelligence (XAI) is an area of research that develops methods and techniques to make the results of artificial intelligence understandable to humans. In recent years, demand for XAI methods has grown as model architectures have become more complicated and government regulations have begun to require transparency in machine learning models. With this increased demand has come an increased need for instruments to evaluate XAI methods. However, there are few, if any, valid and reliable instruments that take human opinion into account and cover all aspects of explainability. Therefore, this study developed an objective, human-centred questionnaire to evaluate all types of XAI methods. The questionnaire consists of 15 items: 5 items asking about the user’s background and 10 items, based on the notions of explainability, that evaluate the explainability of the XAI method. An experiment was conducted (n = 38) in which participants evaluated one of two XAI methods using the questionnaire. The results were used for exploratory factor analysis, which showed that the 10 items related to explainability constitute one factor (Cronbach’s α = 0.81). The results were also used to gather evidence of the questionnaire’s construct validity. It is concluded that this 15-item questionnaire has one factor, has acceptable validity and reliability, and can be used to evaluate and compare XAI methods.
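
    The reliability figure quoted in this abstract is Cronbach's α. The sketch below shows the standard formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores), applied to a respondents-by-items score table. The function name and data layout are assumptions for illustration; the study's own data and tooling are not given in the abstract.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent
    and one column per questionnaire item."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents.
    item_vars = [float(variance([row[j] for row in scores]))
                 for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = float(variance([sum(row) for row in scores]))
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

    Items that rise and fall together across respondents push α toward 1, while unrelated items push it toward 0.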

    Mining time-series data using discriminative subsequences

    Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a comprehensible method of understanding both data and models. For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers. We use two methods to increase interpretability: First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length, reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular shapelets, a representation that is straightforward to interpret and understand. We supplement the ensemble classifier with partial classification. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model. Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically and in an unsupervised manner using a local-shape-based approach.
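
    A shapelet transform, as commonly defined, maps each series to its distances from a set of shapelets, where the distance to one shapelet is the smallest Euclidean distance over all sliding windows of the same length. The sketch below illustrates that idea together with a thresholded binary "presence" view. It is not the thesis's implementation: normalisation is omitted, and the fixed threshold and all function names are assumptions.

```python
import math
from typing import List, Sequence


def subsequence_distance(series: Sequence[float],
                         shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any
    same-length window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and fit in the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        squared = sum((series[start + i] - shapelet[i]) ** 2
                      for i in range(m))
        best = min(best, math.sqrt(squared))
    return best


def shapelet_transform(dataset: List[Sequence[float]],
                       shapelets: List[Sequence[float]]) -> List[List[float]]:
    """One row per series, one column per shapelet distance."""
    return [[subsequence_distance(series, s) for s in shapelets]
            for series in dataset]


def binarize(features: List[List[float]],
             threshold: float) -> List[List[bool]]:
    """Mark a shapelet as present when its distance is within the
    threshold (a single fixed threshold is an assumption)."""
    return [[d <= threshold for d in row] for row in features]
```

    Any ordinary classifier can then be trained on the rows returned by `shapelet_transform`, and rule learners on the rows returned by `binarize`.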

    Generating High Precision Classification Rules for Screening of Irrelevant Studies in Systematic Review Literature Searches

    Systematic reviews aim to produce repeatable, unbiased, and comprehensive answers to clinical questions. Systematic reviews are an essential component of modern evidence-based medicine; however, due to the risks of omitting relevant research, they are highly time-consuming to create and are largely conducted manually. This thesis presents a novel framework for partial automation of systematic review literature searches. We exploit the ubiquitous multi-stage screening process by training the classifier using annotations made by reviewers in previous screening stages. Our approach has the benefit of integrating seamlessly with the existing screening process, minimising disruption to users. Ideally, classification models for systematic reviews should be easily interpretable by users. We propose a novel, rule-based algorithm for use with our framework. A new approach for identifying redundant associations when generating rules is also presented. The proposed approach to redundancy seeks to exclude both redundant specialisations of existing rules (those with additional terms in their antecedent) and redundant generalisations (those with fewer terms in their antecedent). We demonstrate the ability of the proposed approach to improve the usability of the generated rules. The proposed rule-based algorithm is evaluated by simulated application to several existing systematic reviews. Workload savings of up to 10% are demonstrated. There is an increasing demand for systematic reviews related to a variety of clinical disciplines, such as diagnosis. We examine reviews of diagnosis and contrast them against more traditional systematic reviews of treatment. We demonstrate that existing challenges such as target class heterogeneity and high data imbalance are even more pronounced for this class of reviews. The described algorithm accounts for this by seeking to label subsets of non-relevant studies with high precision, avoiding the need to generate a high-recall model of the minority class.
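
    The two ideas this abstract leans on, the specialisation/generalisation relation between rule antecedents and the precision of a rule that labels studies as non-relevant, can be sketched as below. This is only a structural illustration: the thesis's actual redundancy criterion and rule-generation algorithm are not described in the abstract, and the function names and data layout are assumptions.

```python
from typing import FrozenSet, List, Optional, Tuple

# A document is modelled as (set of terms, True if judged non-relevant).
Document = Tuple[FrozenSet[str], bool]


def is_specialisation(rule: FrozenSet[str], other: FrozenSet[str]) -> bool:
    """True when `rule` has all of `other`'s antecedent terms plus more."""
    return other < rule


def is_generalisation(rule: FrozenSet[str], other: FrozenSet[str]) -> bool:
    """True when `rule` has strictly fewer antecedent terms than `other`."""
    return rule < other


def rule_precision(antecedent: FrozenSet[str],
                   documents: List[Document]) -> Optional[float]:
    """Share of matched documents that really are non-relevant.
    Returns None when the rule matches nothing."""
    matched = [non_relevant for terms, non_relevant in documents
               if antecedent <= terms]
    if not matched:
        return None
    return sum(matched) / len(matched)
```

    A high-precision rule of this kind can exclude a subset of non-relevant studies without needing a model that recalls every relevant one.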

    Mining and modeling graphs using patterns and priors
