A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Owing to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, and data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules; it has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used for this purpose include Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.
Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
Clustering a set of objects into homogeneous groups is a fundamental operation
in data mining. Recently, attention has turned to categorical data clustering,
where data objects are made up of non-numerical attributes. The implementation of
several existing categorical clustering techniques is challenging as some are unable
to handle uncertainty and others have stability issues. In the process of dealing
with categorical data and handling uncertainty, rough set theory has become
a well-established mechanism in a wide variety of applications including databases.
Recent techniques such as Information-Theoretic Dependency Roughness (ITDR),
Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA)
have outperformed predecessor approaches such as Bi-Clustering (BC), Total Roughness
(TR), Min-Min Roughness (MMR), and standard-deviation roughness (SDR). This
work explores the limitations and issues of the ITDR, MDA and MSA techniques on
data sets where these techniques fail to select, or have difficulty selecting, their
best clustering attribute. Accordingly, two alternative techniques named Rough Purity
Approach (RPA) and Maximum Value Attribute (MVA) are proposed. The novelty
of the proposed approaches is that RPA presents a new uncertainty definition
based on the purity of a rough relational database, whereas MVA, unlike other rough
set theory techniques, uses domain knowledge such as the value set combined with
the number of clusters (NoC). To show the significance and the mathematical and theoretical
basis of the proposed approaches, several propositions are illustrated. Moreover, the
recent rough categorical techniques such as MDA, MSA and ITDR and a classical clustering
technique, simple K-means, are used for comparison, and the results are presented
in tabular and graphical forms. For the experiments, data sets from previously studied
research cases, a real supply base management (SBM) data set and the UCI repository
are used. The results reveal significant improvement by the proposed techniques for
categorical clustering in terms of purity (21%), entropy (9%), accuracy (16%), rough
accuracy (11%), iterations (99%) and time (93%).
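As a rough illustration of two of the evaluation measures named in the abstract above, the sketch below computes clustering purity and entropy using their common textbook definitions. It is an assumption that the paper uses these same formulas; the function names `purity` and `entropy` and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def _group_labels(clusters, labels):
    """Collect the true class labels of the objects assigned to each cluster."""
    groups = {}
    for cluster_id, label in zip(clusters, labels):
        groups.setdefault(cluster_id, []).append(label)
    return groups


def purity(clusters, labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    groups = _group_labels(clusters, labels)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in groups.values()
    )
    return majority_total / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) inside each cluster."""
    groups = _group_labels(clusters, labels)
    total = 0.0
    for members in groups.values():
        n = len(members)
        h = -sum((k / n) * log2(k / n) for k in Counter(members).values())
        total += (n / len(labels)) * h
    return total


# Toy example (made-up data): two clusters of three objects each.
example_clusters = [0, 0, 0, 1, 1, 1]
example_labels = ["a", "a", "b", "b", "b", "b"]
print(purity(example_clusters, example_labels))
print(entropy(example_clusters, example_labels))
```

Higher purity and lower entropy both indicate clusters that agree more closely with the known classes, which is presumably the sense in which the abstract reports improvements on these measures.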
Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology
This paper presents some experiments in clustering homogeneous XML documents
to validate an existing classification or, more generally, an organisational
structure. Our approach integrates techniques for extracting knowledge from
documents with unsupervised classification (clustering) of documents. We focus
on the feature selection used for representing documents and its impact on the
emerging classification. We mix the selection of structured features with fine
textual selection based on syntactic characteristics. We illustrate and evaluate
this approach with a collection of Inria activity reports for the year 2003.
The objective is to cluster projects into larger groups (Themes), based on the
keywords or different chapters of these activity reports. We then compare the
results of clustering using different feature selections, with the official
theme structure used by Inria.
Comment: (postprint); This version corrects a couple of errors in authors'
names in the bibliograph