Specious rules: an efficient and effective unifying method for removing misleading and uninformative patterns in association rule mining
We present theoretical analysis and a suite of tests and procedures for
addressing a broad class of redundant and misleading association rules we call
\emph{specious rules}. Specious dependencies, also known as \emph{spurious},
\emph{apparent}, or \emph{illusory associations}, refer to a well-known
phenomenon where marginal dependencies are merely products of interactions with
other variables and disappear when conditioned on those variables.
The most extreme example is Yule-Simpson's paradox, where two variables
show positive dependence in the marginal contingency table but negative
dependence in all partial tables defined by different levels of a confounding factor. It is
accepted wisdom that in data of any nontrivial dimensionality it is infeasible
to control for all of the exponentially many possible confounds of this nature.
In this paper, we consider the problem of specious dependencies in the context
of statistical association rule mining. We define specious rules and show they
offer a unifying framework which covers many types of previously proposed
redundant or misleading association rules. After theoretical analysis, we
introduce practical algorithms for detecting and pruning out specious
association rules efficiently under many key goodness measures, including
mutual information and exact hypergeometric probabilities. We demonstrate that
the procedure greatly reduces the number of associations discovered, providing
an elegant and effective solution to the problem of association mining
discovering large numbers of misleading and redundant rules.
Comment: Note: This is a corrected version of the paper published in SDM'17.
In the equation on page 4, the range of the sum has been corrected.
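The Yule-Simpson reversal described in the abstract above can be checked on a small table. The following is a minimal sketch, not the paper's procedure: the counts, the group names "A" and "B", and the strata "small" and "large" are all hypothetical values chosen only to make the reversal visible. It compares the success rate of group B against group A, first in the marginal table and then within each level of the confounder.

```python
# Hypothetical (successes, trials) counts per confounder level and group.
# These numbers are illustrative assumptions, not data from the paper.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate_difference(table):
    """Success rate of group B minus success rate of group A."""
    b_success, b_total = table["B"]
    a_success, a_total = table["A"]
    return b_success / b_total - a_success / a_total


# Marginal table: sum successes and trials over all confounder levels.
marginal = {
    group: (
        sum(counts[level][group][0] for level in counts),
        sum(counts[level][group][1] for level in counts),
    )
    for group in ("A", "B")
}

marginal_diff = rate_difference(marginal)
partial_diffs = {level: rate_difference(counts[level]) for level in counts}

print(f"marginal difference (B - A): {marginal_diff:+.3f}")
for level, diff in partial_diffs.items():
    print(f"difference within '{level}': {diff:+.3f}")
```

With these counts the marginal difference is positive while the difference within every stratum is negative, so a rule "B implies success" mined from the marginal table alone would be specious in the sense the abstract describes.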
On the Optimization of Visualizations of Complex Phenomena
The problem of perceptually optimizing complex visualizations is a difficult one, involving perceptual as well as aesthetic issues. In our experience, controlled experiments are quite limited in their ability to uncover interrelationships among visualization parameters, and thus may not be the most useful way to develop rules-of-thumb or theory to guide the production of high-quality visualizations. In this paper, we propose a new experimental approach to optimizing visualization quality that integrates some of the strong points of controlled experiments with methods more suited to investigating complex highly-coupled phenomena. We use human-in-the-loop experiments to search through visualization parameter space, generating large databases of rated visualization solutions. This is followed by data mining to extract results such as exemplar visualizations, guidelines for producing visualizations, and hypotheses about strategies leading to strong visualizations. The approach can easily address both perceptual and aesthetic concerns, and can handle complex parameter interactions. We suggest a genetic algorithm as a valuable way of guiding the human-in-the-loop search through visualization parameter space. We describe our methods for using clustering, histogramming, principal component analysis, and neural networks for data mining. The experimental approach is illustrated with a study of the problem of optimal texturing for viewing layered surfaces so that both surfaces are maximally observable
Dietary garlic and hip osteoarthritis: evidence of a protective effect and putative mechanism of action
Background: Patterns of food intake and prevalent osteoarthritis of the hand, hip, and knee were studied using the twin design to limit the effect of confounding factors. Compounds found in associated food groups were further studied in vitro. Methods: Cross-sectional study conducted in a large population-based volunteer cohort of twins. Food intake was evaluated using the Food Frequency Questionnaire; OA was determined using plain radiographs. Analyses were adjusted for age, BMI and physical activity. Subsequent in vitro studies examined the effects of allium-derived compounds on the expression of matrix-degrading proteases in SW1353 chondrosarcoma cells. Results: Data were available, depending on phenotype, for 654-1082 of 1086 female twins (median age 58.9 years; range 46-77). Trends in dietary analysis revealed that a specific pattern of dietary intake, one high in fruit and vegetables, showed an inverse association with hip OA (p = 0.022). Consumption of 'non-citrus fruit' (p = 0.015) and 'alliums' (p = 0.029) had the strongest protective effect. Alliums contain diallyl disulphide, which was shown to abrogate cytokine-induced matrix metalloproteinase expression. Conclusions: Studies of diet are notorious for their confounding by lifestyle effects. While taking account of BMI, the data show an independent effect of a diet high in fruit and vegetables, suggesting it to be protective against radiographic hip OA. Furthermore, diallyl disulphide, a compound found in garlic and other alliums, represses the expression of matrix-degrading proteases in chondrocyte-like cells, providing a potential mechanism of action.
Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research.
Background: Juvenile idiopathic arthritis is the most common rheumatic disease in children. Chronic uveitis is a common and serious comorbid condition of juvenile idiopathic arthritis, with insidious presentation and potential to cause blindness. Knowledge of clinical associations will improve risk stratification. Based on clinical observation, we hypothesized that allergic conditions are associated with chronic uveitis in juvenile idiopathic arthritis patients. Methods: This study is a retrospective cohort study using Stanford's clinical data warehouse containing data from Lucile Packard Children's Hospital from 2000-2011 to analyze patient characteristics associated with chronic uveitis in a large juvenile idiopathic arthritis cohort. Clinical notes in patients under 16 years of age were processed via a validated text analytics pipeline. Bivariate-associated variables were used in a multivariate logistic regression adjusted for age, gender, and race. Previously reported associations were evaluated to validate our methods. The main outcome measure was presence of terms indicating allergy or allergy medication use overrepresented in juvenile idiopathic arthritis patients with chronic uveitis. Residual text features were then used in unsupervised hierarchical clustering to compare clinical text similarity between patients with and without uveitis. Results: Previously reported associations with uveitis in juvenile idiopathic arthritis patients (earlier age at arthritis diagnosis, oligoarticular-onset disease, antinuclear antibody status, history of psoriasis) were reproduced in our study. Use of allergy medications and terms describing allergic conditions were independently associated with chronic uveitis.
The association with allergy drugs when adjusted for known associations remained significant (OR 2.54, 95% CI 1.22-5.4). Conclusions: This study shows the potential of using a validated text analytics pipeline on clinical data warehouses to examine practice-based evidence for evaluating hypotheses formed during patient care. Our study reproduces four known associations with uveitis development in juvenile idiopathic arthritis patients, and reports a new association between allergic conditions and chronic uveitis in juvenile idiopathic arthritis patients.
Review of the occupational health and safety of Britain’s ethnic minorities
This report sets out an evidence-based review on work-related health and safety issues relating to black and
minority ethnic groups. Data included available statistical materials and a systematic review of published research
and practice-based reports.
UK South Asians are generally under-represented within the most hazardous occupational groups. They have
lower accident rates overall, while Black Caribbean workers' rates are similar to the general population;
Bangladeshi and Chinese workers report the lowest workplace injury rates.
UK South Asian people exhibit higher levels of limiting long-term illness (LLI) and self-reported poor health than the
general population while Black Africans and Chinese report lower levels. Ethnic minority workers with LLI are more
likely than whites to withdraw from the workforce, or to experience lower wage rates.
Some of these findings conflict with evidence of differentials from USA, Europe and Australasia, but there is a
dearth of effective primary research or reliable monitoring data from UK sources.
There remains a need to improve monitoring and data collection relating to black and ethnic minority populations
and migrant workers. Suggestions are made relating to workshops on occupational health promotion programmes
for ethnic minorities, and ethnic minority health and safety 'Beacon' sites.
Finding Statistically Significant Interactions between Continuous Features
The search for higher-order feature interactions that are statistically
significantly associated with a class variable is of high relevance in fields
such as Genetics or Healthcare, but the combinatorial explosion of the
candidate space makes this problem extremely challenging in terms of
computational efficiency and proper correction for multiple testing. While
recent progress has been made regarding this challenge for binary features, we
here present the first solution for continuous features. We propose an
algorithm which overcomes the combinatorial explosion of the search space of
higher-order interactions by deriving a lower bound on the p-value for each
interaction, which enables us to massively prune interactions that can never
reach significance and to thereby gain more statistical power. In our
experiments, our approach efficiently detects all significant interactions in a
variety of synthetic and real-world datasets.
Comment: 13 pages, 5 figures, 2 tables, accepted to the 28th International
Joint Conference on Artificial Intelligence (IJCAI 2019)
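The pruning idea in the abstract above, discarding candidates whose p-value can never fall below the significance threshold, can be illustrated with the classical bound for binary patterns. This sketch is a swapped-in technique and not the paper's continuous-feature bound, whose form is not given in the abstract: it computes the smallest one-sided Fisher's exact test p-value attainable by a pattern that occurs in x of n samples when n1 samples belong to the positive class. The function name and the example sizes are hypothetical.

```python
from math import comb


def min_attainable_p(n, n1, x):
    """Smallest one-sided Fisher p-value for a pattern with support x.

    n  : total number of samples
    n1 : number of samples in the positive class
    x  : number of samples in which the pattern occurs

    The most extreme table places as many occurrences as possible in the
    positive class, so the upper tail reduces to one hypergeometric term.
    """
    a_max = min(x, n1)
    return comb(n1, a_max) * comb(n - n1, x - a_max) / comb(n, x)


# Hypothetical setting: 20 samples, 10 positive, threshold 0.05.
n, n1, alpha = 20, 10, 0.05
for x in range(1, 7):
    bound = min_attainable_p(n, n1, x)
    verdict = "prune" if bound > alpha else "keep"
    print(f"support {x}: lower bound {bound:.4f} -> {verdict}")
```

A candidate whose bound exceeds the threshold can be dropped without computing its actual p-value, and it does not need to count toward the multiple-testing correction, which is how such bounds recover statistical power.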