
    Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining

    We present theoretical analysis and a suite of tests and procedures for addressing a broad class of redundant and misleading association rules we call specious rules. Specious dependencies, also known as spurious, apparent, or illusory associations, refer to a well-known phenomenon where marginal dependencies are merely products of interactions with other variables and disappear when conditioned on those variables. The most extreme example is Yule-Simpson's paradox, where two variables present positive dependence in the marginal contingency table but negative dependence in all partial tables defined by different levels of a confounding factor. It is accepted wisdom that in data of any nontrivial dimensionality it is infeasible to control for all of the exponentially many possible confounds of this nature. In this paper, we consider the problem of specious dependencies in the context of statistical association rule mining. We define specious rules and show they offer a unifying framework which covers many types of previously proposed redundant or misleading association rules. After theoretical analysis, we introduce practical algorithms for detecting and pruning out specious association rules efficiently under many key goodness measures, including mutual information and exact hypergeometric probabilities. We demonstrate that the procedure greatly reduces the number of associations discovered, providing an elegant and effective solution to the problem of association mining discovering large numbers of misleading and redundant rules. Comment: Note: This is a corrected version of the paper published in SDM'17. In the equation on page 4, the range of the sum has been corrected.
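The Yule-Simpson reversal mentioned in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the paper's algorithm: the counts are the often-quoted kidney-stone treatment example, and the helper names (`rate`, `marginal`, `is_simpson_reversal`) are hypothetical.

```python
from fractions import Fraction

# Illustrative counts (not from the paper): (successes, trials) for two
# treatments, split by a confounding factor (stone size).
strata = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def marginal(strata, arm):
    """Success rate of one arm after pooling all strata."""
    successes = sum(table[arm][0] for table in strata.values())
    trials = sum(table[arm][1] for table in strata.values())
    return rate(successes, trials)


def is_simpson_reversal(strata):
    """True if B vs. A has the same direction in every partial table
    but the opposite direction in the pooled (marginal) table."""
    partial = {rate(*t["B"]) > rate(*t["A"]) for t in strata.values()}
    pooled = marginal(strata, "B") > marginal(strata, "A")
    return len(partial) == 1 and pooled not in partial


if __name__ == "__main__":
    for name, table in strata.items():
        print(name, float(rate(*table["A"])), float(rate(*table["B"])))
    print("pooled", float(marginal(strata, "A")), float(marginal(strata, "B")))
    print("reversal:", is_simpson_reversal(strata))
```

With these counts, treatment A has the higher success rate within each stratum while treatment B has the higher pooled rate, so the marginal association disappears (and reverses) once the confounder is conditioned on.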

    On the Optimization of Visualizations of Complex Phenomena

    The problem of perceptually optimizing complex visualizations is a difficult one, involving perceptual as well as aesthetic issues. In our experience, controlled experiments are quite limited in their ability to uncover interrelationships among visualization parameters, and thus may not be the most useful way to develop rules-of-thumb or theory to guide the production of high-quality visualizations. In this paper, we propose a new experimental approach to optimizing visualization quality that integrates some of the strong points of controlled experiments with methods more suited to investigating complex, highly coupled phenomena. We use human-in-the-loop experiments to search through visualization parameter space, generating large databases of rated visualization solutions. This is followed by data mining to extract results such as exemplar visualizations, guidelines for producing visualizations, and hypotheses about strategies leading to strong visualizations. The approach can easily address both perceptual and aesthetic concerns, and can handle complex parameter interactions. We suggest a genetic algorithm as a valuable way of guiding the human-in-the-loop search through visualization parameter space. We describe our methods for using clustering, histogramming, principal component analysis, and neural networks for data mining. The experimental approach is illustrated with a study of the problem of optimal texturing for viewing layered surfaces so that both surfaces are maximally observable.

    Dietary garlic and hip osteoarthritis: evidence of a protective effect and putative mechanism of action

    Background: Patterns of food intake and prevalent osteoarthritis (OA) of the hand, hip, and knee were studied using the twin design to limit the effect of confounding factors. Compounds found in associated food groups were further studied in vitro. Methods: Cross-sectional study conducted in a large population-based volunteer cohort of twins. Food intake was evaluated using the Food Frequency Questionnaire; OA was determined using plain radiographs. Analyses were adjusted for age, BMI and physical activity. Subsequent in vitro studies examined the effects of allium-derived compounds on the expression of matrix-degrading proteases in SW1353 chondrosarcoma cells. Results: Data were available, depending on phenotype, for 654-1082 of 1086 female twins (median age 58.9 years; range 46-77). Trends in dietary analysis revealed that a specific pattern of dietary intake, one high in fruit and vegetables, showed an inverse association with hip OA (p = 0.022). Consumption of 'non-citrus fruit' (p = 0.015) and 'alliums' (p = 0.029) had the strongest protective effect. Alliums contain diallyl disulphide, which was shown to abrogate cytokine-induced matrix metalloproteinase expression. Conclusions: Studies of diet are notorious for their confounding by lifestyle effects. While taking account of BMI, the data show an independent effect of a diet high in fruit and vegetables, suggesting it to be protective against radiographic hip OA. Furthermore, diallyl disulphide, a compound found in garlic and other alliums, represses the expression of matrix-degrading proteases in chondrocyte-like cells, providing a potential mechanism of action.

    Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research.

    Background: Juvenile idiopathic arthritis is the most common rheumatic disease in children. Chronic uveitis is a common and serious comorbid condition of juvenile idiopathic arthritis, with insidious presentation and potential to cause blindness. Knowledge of clinical associations will improve risk stratification. Based on clinical observation, we hypothesized that allergic conditions are associated with chronic uveitis in juvenile idiopathic arthritis patients. Methods: This study is a retrospective cohort study using Stanford's clinical data warehouse containing data from Lucile Packard Children's Hospital from 2000-2011 to analyze patient characteristics associated with chronic uveitis in a large juvenile idiopathic arthritis cohort. Clinical notes in patients under 16 years of age were processed via a validated text analytics pipeline. Bivariate-associated variables were used in a multivariate logistic regression adjusted for age, gender, and race. Previously reported associations were evaluated to validate our methods. The main outcome measure was presence of terms indicating allergy or allergy medication use overrepresented in juvenile idiopathic arthritis patients with chronic uveitis. Residual text features were then used in unsupervised hierarchical clustering to compare clinical text similarity between patients with and without uveitis. Results: Previously reported associations with uveitis in juvenile idiopathic arthritis patients (earlier age at arthritis diagnosis, oligoarticular-onset disease, antinuclear antibody status, history of psoriasis) were reproduced in our study. Use of allergy medications and terms describing allergic conditions were independently associated with chronic uveitis. The association with allergy drugs when adjusted for known associations remained significant (OR 2.54, 95% CI 1.22-5.4). Conclusions: This study shows the potential of using a validated text analytics pipeline on clinical data warehouses to examine practice-based evidence for evaluating hypotheses formed during patient care. Our study reproduces four known associations with uveitis development in juvenile idiopathic arthritis patients, and reports a new association between allergic conditions and chronic uveitis in juvenile idiopathic arthritis patients.

    Review of the occupational health and safety of Britain’s ethnic minorities

    This report sets out an evidence-based review on work-related health and safety issues relating to black and minority ethnic groups. Data included available statistical materials and a systematic review of published research and practice-based reports. UK South Asians are generally under-represented within the most hazardous occupational groups. They have lower accident rates overall, while Black Caribbean workers' rates are similar to the general population; Bangladeshi and Chinese workers report the lowest workplace injury rates. UK South Asian people exhibit higher levels of limiting long-term illness (LLI) and self-reported poor health than the general population, while Black Africans and Chinese report lower levels. Ethnic minority workers with LLI are more likely than whites to withdraw from the workforce, or to experience lower wage rates. Some of these findings conflict with evidence of differentials from USA, Europe and Australasia, but there is a dearth of effective primary research or reliable monitoring data from UK sources. There remains a need to improve monitoring and data collection relating to black and ethnic minority populations and migrant workers. Suggestions are made relating to workshops on occupational health promotion programmes for ethnic minorities, and ethnic minority health and safety 'Beacon' sites.

    Finding Statistically Significant Interactions between Continuous Features

    The search for higher-order feature interactions that are statistically significantly associated with a class variable is of high relevance in fields such as Genetics or Healthcare, but the combinatorial explosion of the candidate space makes this problem extremely challenging in terms of computational efficiency and proper correction for multiple testing. While recent progress has been made regarding this challenge for binary features, we here present the first solution for continuous features. We propose an algorithm which overcomes the combinatorial explosion of the search space of higher-order interactions by deriving a lower bound on the p-value for each interaction, which enables us to massively prune interactions that can never reach significance and to thereby gain more statistical power. In our experiments, our approach efficiently detects all significant interactions in a variety of synthetic and real-world datasets. Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019)
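The idea of pruning by a lower bound on the p-value can be sketched for the simpler binary-feature setting that the abstract refers to as prior work. This is not the paper's continuous-feature method: it assumes a one-sided Fisher exact test, and the function names (`min_attainable_p`, `can_prune_supersets`) are hypothetical.

```python
from math import comb


def min_attainable_p(support, n_pos, n_total):
    """Smallest one-sided Fisher exact p-value that any binary pattern
    occurring in `support` of `n_total` samples could reach, given
    `n_pos` positive-class samples (the most extreme table possible)."""
    if support <= n_pos:
        # Best case: every sample containing the pattern is positive.
        return comb(n_pos, support) / comb(n_total, support)
    # Best case: every positive sample contains the pattern.
    return comb(n_total - n_pos, support - n_pos) / comb(n_total, support)


def can_prune_supersets(support, n_pos, n_total, alpha):
    """A conjunction of more features can only have smaller support.
    For support <= n_pos the bound grows as support shrinks, so if it
    already exceeds alpha, no superset can reach significance."""
    return support <= n_pos and min_attainable_p(support, n_pos, n_total) > alpha


if __name__ == "__main__":
    # Hypothetical dataset: 10 samples, 5 of them positive.
    for support in (2, 5):
        bound = min_attainable_p(support, n_pos=5, n_total=10)
        print(support, bound, can_prune_supersets(support, 5, 10, alpha=0.05))
```

The bound depends only on the margins of the contingency table, so it can be evaluated before any actual test is run, which is what makes it usable as a pruning criterion during the search.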