2,475 research outputs found

Post-processing of association rules.

Author: Baesens Bart
Vanthienen Jan
Viaene Stijn

In this paper, we situate and motivate the need for a post-processing phase to the association rule mining algorithm when plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. When it comes to effectively extrapolating the most interesting knowledge nuggets from the standard output of these algorithms, one is faced with an extreme challenge, since it is not uncommon to be confronted with a vast amount of association rules after running the algorithms. The sheer multitude of generated rules often clouds the perception of the interpreters. Rightful assessment of the usefulness of the generated output introduces the need to effectively deal with different forms of data redundancy and data being plainly uninteresting. In order to do so, we will give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.

Research Papers in Economics

Would you like to add a weight after this blood pressure, doctor? Discovery of potentially actionable associations between the provision of multiple screens in primary care

Author: Aliarzadeh Babak
Greiver Michelle
Grunfeld Eva
Kalia Sumeet
Meaney Christopher
Moineddin Rahim
Sullivan Frank
Zhao Xu
Publication venue: 'Wiley'
Publication date: 19/01/2018

The CPCSSN was funded through a contribution agreement with the Public Health Agency of Canada. Rationale, aims, and objective: Guidelines recommend screening for risk factors associated with chronic diseases but current electronic prompts have limited effects. Our objective was to discover and rank associations between the presence of screens to plan more efficient prompts in primary care. Methods: Risk factors with the greatest impact on chronic diseases are associated with blood pressure, body mass index, waist circumference, glycaemic and lipid levels, smoking, alcohol use, diet, and exercise. We looked for associations between the presence of screens for these in electronic medical records. We used association rule mining to describe relationships among items, factor analysis to find latent categories, and Cronbach α to quantify consistency within latent categories. Results: Data from 92 140 patients in or around Toronto, Ontario, were included. We found positive correlations (lift >1) between the presence of all screens. The presence of any screen was associated with confidence greater than 80% that other data on items with high prevalence (blood pressure, glycaemic and lipid levels, or smoking) would also be present. A cluster of rules predicting the presence of blood pressure were ranked highest using measures of interestingness such as standardized lift. We found 3 latent categories using factor analysis; these were laboratory tests, vital signs, and lifestyle factors; Cronbach α ranged between .58 for lifestyle factors and .88 for laboratory tests. Conclusions: Associations between the provision of important screens can be discovered and ranked. Rules with promising combinations of associated screens could be used to implement data-driven alerts.

University of St. Andrews - Pure

St Andrews Research Repository
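
The abstract above reports rule quality in terms of confidence and lift. As a minimal sketch of how these standard measures are computed (not the authors' code; the function name `rule_metrics` and the toy patient records are hypothetical), assuming each record is the set of screens present for one patient:

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    # Count records containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data, invented for illustration only.
records = [
    frozenset({"blood_pressure", "weight", "smoking"}),
    frozenset({"blood_pressure", "weight"}),
    frozenset({"blood_pressure"}),
    frozenset({"smoking"}),
    frozenset(),
]
support, confidence, lift = rule_metrics(records, {"weight"}, {"blood_pressure"})
```

A lift above 1 means the consequent appears more often alongside the antecedent than its overall prevalence would suggest, which is the sense of "positive correlations (lift >1)" in the abstract.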

Knowledge Discovery from data of spare parts transactions in the energy industry

Author: Jovelson Aguilar Sabino Junior
Publication date: 14/09/2022

Repositório Aberto da Universidade do Porto

Exploring Decomposition for Solving Pattern Mining Problems

Author: Djenouri Youcef
Lin Jerry Chun-Wei
Nørvåg Kjetil
Ramampiaro Heri
Yu Philip S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of the CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results from the experimental evaluation show that the CBPM provides a reduction in both the runtime and memory usage. Also, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves significant speedup of up to 552× on a single GPU using big transaction databases.

SINTEF Open
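
The cluster-then-mine idea described in the abstract above can be illustrated with a rough sketch. This is not the CBPM algorithm itself: the greedy Jaccard clustering, the brute-force itemset enumeration, and all function names are assumptions chosen for brevity. It loosely mirrors only the approximation-based strategy, in which patterns are mined inside each cluster and items shared between clusters are ignored.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_transactions(transactions, threshold):
    """Greedy one-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed transaction, member list)
    for t in transactions:
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((t, [t]))
    return [members for _, members in clusters]


def frequent_itemsets(transactions, min_count):
    """Brute-force search for itemsets contained in at least min_count transactions."""
    items = sorted(set().union(*transactions)) if transactions else []
    found = {}
    for size in range(1, len(items) + 1):
        level = {}
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                level[itemset] = count
        if not level:  # no frequent itemset of this size, so none larger
            break
        found.update(level)
    return found


def mine_per_cluster(transactions, threshold, min_count):
    """Mine each cluster separately and add up the per-cluster counts."""
    patterns = {}
    for cluster in cluster_transactions(transactions, threshold):
        for itemset, count in frequent_itemsets(cluster, min_count).items():
            patterns[itemset] = patterns.get(itemset, 0) + count
    return patterns


# Hypothetical toy database, invented for illustration only.
db = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"},
)]
patterns = mine_per_cluster(db, threshold=0.3, min_count=2)
```

Because each cluster is mined in isolation, a pattern whose occurrences are spread across clusters can be undercounted or missed. That is the accuracy trade-off an approximate strategy accepts in exchange for searching smaller subproblems.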

"CHARACTERIZATION OF SLAUGHTERED AND NON-SLAUGHTERED GOAT MEAT AT LOW FREQUENCIES"

Author: A. MOHMMAD ROAA
Publication date: 01/01/2012

The electrical stimulation of meat has shown high potential for use in the quality control of meat tissues over the past two decades. Dielectric spectroscopy is the most widely used technique for measuring the electrical properties of tissues. Open-ended coaxial cables or two parallel plates integrated with a network analyzer, impedance analyzer, or LCZ meter have been used to measure the dielectric properties of meat for different purposes. The purpose of this research is to construct a capacitive device capable of differentiating slaughtered and non-slaughtered goat meat by determining the dielectric properties of goat meat at various frequencies and storage times. The detector cell has two circular platinum plates assembled on the micrometer barrel and encased within a Perspex box to form the capacitor. The test rig was validated to ensure it was working well. Two goats were killed in the same environment. One of the goats was slaughtered properly (Islamic method) and the second one was killed by garrote. The measurements were done on the hindlimb muscles. The samples were 2 cm in diameter and 5 mm thick. The slaughtered and non-slaughtered meat samples were separately placed between the capacitor plates. The capacitance and dissipation factor were measured across the capacitor device, which was connected to an LCR meter. The experiment was repeated for various frequencies (from 100 Hz to 2 kHz) and at different storage times (from 1 day to 10 days after slaughtering). The Maxwell Garnett mixing rule was applied to obtain the theoretical value of the effective permittivity by using goat muscle and blood permittivity. The results show that the device is able to differentiate slaughtered and non-slaughtered goat meat. At all applied frequencies, the relative permittivity of the non-slaughtered meat was clearly higher than the relative permittivity of the slaughtered meat, which agrees with the simulation results.
The dissipation factor of the non-slaughtered meat was less than the dissipation factor of the slaughtered meat.

UTPedia

The Minimum Description Length Principle for Pattern Mining: A Survey

Author: Galbrun Esther
Publication date: 28/07/2021

This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems.

arXiv.org e-Print Archive

Data mining techniques for complex application domains

Author: Mahoto NAEEM AHMED
Publication venue: Politecnico di Torino

The emergence of advanced communication techniques has increased the availability of large collections of data in electronic form in a number of application domains, including healthcare, e-business, and e-learning. Every day a large number of records are stored electronically. However, finding useful information in such large data collections is a challenging issue. Data mining technology aims at automatically extracting hidden knowledge from large data repositories by exploiting sophisticated algorithms. The hidden knowledge in the electronic data may potentially be utilized to improve the procedures, productivity, and reliability of several application domains. The PhD activity has been focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: healthcare data analysis and textual data analysis. The research activity, in the context of healthcare data, addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information to deal with the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to the discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining technique to support domain experts in making decisions has been investigated.
Both research activities are focused on adapting widely used exploratory data mining techniques to textual data analysis, which requires overcoming intrinsic limitations of traditional algorithms in handling textual documents efficiently and effectively.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)