Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals, ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy pâté also buy salted butter and sour bread." Unfortunately, sifting through a large number of buying patterns is not useful in practice, because popular products dominate the top rules. As a result, over 30 "interestingness" measures have been proposed to rank rules. However, there is no agreement on which measures are most appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, an analyst's ability to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that lets analysts compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
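The abstract does not list the 34 measures CAPA compares, but three of the most widely used ones, support, confidence, and lift, show why the choice of measure matters when popular products are involved. Below is a minimal Python sketch on a hypothetical five-basket toy dataset; the item names and the helper functions `support`, `confidence`, and `lift` are illustrative assumptions, not CAPA's code or Intermarché data.

```python
from fractions import Fraction

# Hypothetical toy baskets for illustration only (not Intermarché data).
TRANSACTIONS = [
    {"pate", "salted butter", "sour bread"},
    {"pate", "salted butter"},
    {"pate", "milk"},
    {"milk", "sour bread"},
    {"milk", "salted butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = supp(A ∪ C) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support; 1 means independence."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


if __name__ == "__main__":
    rule = ({"pate"}, {"salted butter"})
    print("confidence:", confidence(*rule, TRANSACTIONS))  # 2/3
    print("lift:", lift(*rule, TRANSACTIONS))              # 10/9
```

Here the rule pâté → salted butter has a confidence of 2/3 but a lift of only 10/9, because salted butter already appears in 3 of the 5 baskets; ranking by lift rather than confidence is one way to discount such popular products.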
arules - A Computational Environment for Mining Association Rules and Frequent Item Sets
Mining frequent itemsets and association rules is a popular and well-researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
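arules itself is an R package that delegates mining to Borgelt's C implementations. As a language-neutral illustration only, here is a minimal Python sketch of the level-wise Apriori idea for frequent itemsets. The function name `apriori` and the absolute `min_count` threshold are assumptions of this sketch; it does not reproduce the arules API and does not derive maximal or closed itemsets.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset itemset: count} for every itemset contained in
    at least `min_count` transactions (level-wise Apriori search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(frequent)

    k = 1
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every k-subset of a candidate must be frequent.
                if len(union) == k + 1 and all(
                    frozenset(sub) in frequent for sub in combinations(union, k)
                ):
                    candidates.add(union)
        # Count step: one scan of the data per level.
        counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, n in sorted(apriori(data, min_count=3).items(), key=lambda x: (len(x[0]), sorted(x[0]))):
        print(sorted(itemset), n)
```

Each pass builds (k+1)-item candidates only from frequent k-itemsets and discards any candidate with an infrequent k-subset, which is safe because every subset of a frequent itemset must itself be frequent.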
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, focusing mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of each methodology is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical mining; we summarize its different uses in dermatology, where it is one of the most important methods for diagnosing erythemato-squamous diseases, using approaches such as neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining: its purpose is to find structure in the given data by identifying similarities between records according to their characteristics, and it has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.
Mining High Impact Combinations of Conditions from the Medical Expenditure Panel Survey
Multimorbidity (the presence of two or more medical conditions in an individual) is a growing phenomenon worldwide. In the United States, multimorbid patients represent more than a third of the population, and the trend is steadily increasing in an already aging population. There is thus a pressing need to understand the patterns in which multimorbidity occurs and the nature of the care such patients require.
In this thesis, we use data from the Medical Expenditure Panel Survey (MEPS) from 2011 to 2015 to identify combinations of multiple chronic conditions (MCCs). We first quantify the significant heterogeneity observed in these combinations and how often they are observed across the five years. Next, using two criteria associated with each combination, (a) the annual prevalence and (b) the annual median expenditure, along with the concept of non-dominated Pareto fronts, we determine the degree of impact each combination has on the healthcare system. Our analysis reveals that combinations of four or more conditions are often mixtures of diseases that belong to different clinically meaningful groupings, such as metabolic disorders (diabetes, hypertension, hyperlipidemia); musculoskeletal conditions (osteoarthritis, spondylosis, back problems, etc.); respiratory disorders (asthma, COPD, etc.); heart conditions (atherosclerosis, myocardial infarction); and mental health conditions (anxiety disorders, depression, etc.).
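The thesis ranks condition combinations with non-dominated Pareto fronts over prevalence and median expenditure. A minimal Python sketch of that idea follows, assuming both criteria are to be maximized and that lower-impact tiers are obtained by repeatedly peeling off the current front. The combination labels, the numbers, and the helpers `dominates` and `pareto_fronts` are hypothetical and are not MEPS results or the thesis's code.

```python
# Hypothetical (prevalence, median annual expenditure) per combination; illustrative numbers only.
COMBOS = {
    "combo_A": (0.10, 5000),
    "combo_B": (0.05, 9000),
    "combo_C": (0.04, 4000),
    "combo_D": (0.08, 6000),
    "combo_E": (0.02, 3000),
}


def dominates(a, b):
    """True if `a` is >= `b` on both criteria and strictly > on at least one (both maximized)."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_fronts(points):
    """Peel successive non-dominated fronts; front 0 holds the highest-impact combinations."""
    remaining = dict(points)
    fronts = []
    while remaining:
        # A combination is on the current front if nothing still remaining dominates it.
        front = sorted(
            name
            for name, p in remaining.items()
            if not any(dominates(q, p) for other, q in remaining.items() if other != name)
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


if __name__ == "__main__":
    for rank, front in enumerate(pareto_fronts(COMBOS)):
        print(rank, front)
```

combo_A (most prevalent) and combo_B (most expensive) share the first front with combo_D, which no other combination beats on both criteria at once; a finite set always has at least one non-dominated element, so the loop terminates.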
Next, we use unsupervised learning techniques such as association rule mining and hierarchical clustering to visually explore the strength of the relationships/associations between different conditions and condition groupings. This interactive framework gives epidemiologists and clinicians (in particular primary care physicians) a systematic approach to understanding the relationships between conditions and building a strategy with regard to screening, diagnosis, and treatment over the longer term, especially for individuals at risk of further complications. The findings from this study aim to lay a foundation for future work in which a more holistic view of multimorbidity is possible.
Rule Induction-Based Knowledge Discovery for Energy Efficiency
Rule induction is a practical approach to knowledge discovery. Given a well-defined problem, rule induction can return knowledge that addresses the goal of that problem in the form of if-then rules. The primary goals of knowledge discovery are prediction and description, and knowledge represented as rules is easy to understand, which helps users make decisions. This paper presents the potential of rule induction for energy efficiency. In particular, three rule induction techniques are applied to derive knowledge from a dataset of thousands of Irish electricity customers' time-series power consumption records, socio-demographic details, and other information, in order to address four problems: 1) discovering mathematically interesting knowledge that could prove useful; 2) estimating power consumption features for customers so that personalized tariffs can be assigned; 3) targeting a subgroup of customers with high potential for peak demand shifting; and 4) identifying customer attitudes that dominate energy conservation.
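The abstract does not name its three rule induction techniques. To illustrate how a rule learner turns tabular customer records into if-then rules, here is a minimal Python sketch of OneR, a deliberately simple learner swapped in for illustration. The attribute names (`heating`, `household`, `usage`), the seven records, and the helper `one_r` are hypothetical and are not drawn from the Irish dataset.

```python
from collections import Counter, defaultdict

# Hypothetical customer records; attribute names and values are illustrative only.
ROWS = [
    {"heating": "electric", "household": "small", "usage": "high"},
    {"heating": "electric", "household": "large", "usage": "high"},
    {"heating": "electric", "household": "large", "usage": "high"},
    {"heating": "gas", "household": "large", "usage": "low"},
    {"heating": "gas", "household": "small", "usage": "low"},
    {"heating": "gas", "household": "small", "usage": "low"},
    {"heating": "gas", "household": "large", "usage": "high"},
]


def one_r(rows, attributes, target):
    """OneR: for each attribute, map each value to its majority class and keep the
    attribute whose rule set makes the fewest errors on `rows`.
    Returns (attribute, {value: predicted class}, training errors)."""
    best = None
    for attr in attributes:
        # Count target classes for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Predict the majority class for each value.
        rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Errors = records whose class differs from their value's majority class.
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


if __name__ == "__main__":
    attr, rules, errors = one_r(ROWS, ["heating", "household"], "usage")
    for value, label in rules.items():
        print(f"IF {attr} = {value} THEN usage = {label}")
    print("training errors:", errors)
```

On these made-up records the learner picks `heating` (1 training error) over `household` (2 errors). Training error is an optimistic estimate, and ties in the majority vote are not handled specially in this sketch; a real study would validate rules on held-out data.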