An Approach to Find Missing Values in Medical Datasets
Mining medical datasets is a challenging problem for data mining
researchers, as these datasets present several hidden challenges compared to
conventional datasets. From the collection of samples through field
experiments and clinical trials to classification, there are numerous
challenges at every stage of the mining process. Even the preprocessing phase
is a challenging issue when we work on medical datasets.
One of the prime challenges in mining medical datasets is handling missing
values, which is part of the preprocessing phase. In this paper, we address the
issue of handling missing values in medical datasets consisting of categorical
attribute values. The main contribution of this research is a proposed
imputation measure used to estimate and fill in the missing values. We discuss a
case study to demonstrate how the proposed measure works.
Comment: 7 pages, ACM Digital Library, ICEMIS September 201
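The abstract does not define the proposed imputation measure, so the following is only a minimal Python sketch of one common way to impute a missing categorical value, assuming a simple nearest-match vote: among the records where the attribute is known, take the value most frequent among those that agree with the incomplete record on the largest number of other attributes. The names `impute_categorical` and `matches` and the dictionary-based record layout are hypothetical and are not the paper's method.

```python
from collections import Counter
from typing import Optional

# Hypothetical record layout: attribute name -> categorical value (None = missing).
Record = dict[str, Optional[str]]


def matches(a: Record, b: Record, skip: str) -> int:
    """Count attributes (other than `skip`) known in `a` that have the same value in `b`."""
    return sum(1 for k, v in a.items() if k != skip and v is not None and b.get(k) == v)


def impute_categorical(records: list[Record], idx: int, attr: str) -> Optional[str]:
    """Suggest a value for records[idx][attr] with a simple nearest-match vote
    (an illustrative stand-in, NOT the paper's imputation measure)."""
    target = records[idx]
    # Donors: every other record in which `attr` is known.
    donors = [r for i, r in enumerate(records) if i != idx and r.get(attr) is not None]
    if not donors:
        return None
    # Keep only the donors agreeing with the target on the most attributes.
    best = max(matches(target, r, attr) for r in donors)
    votes = Counter(r[attr] for r in donors if matches(target, r, attr) == best)
    # Majority vote among the closest donors (ties: first value encountered).
    return votes.most_common(1)[0][0]
```

For instance, a patient record with `diagnosis` missing and `fever` and `cough` both set to `"yes"` receives the diagnosis held by the complete records matching it on both symptoms.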
Applications of Data Mining Techniques for Vehicular Ad hoc Networks
Due to recent advances in vehicular ad hoc networks (VANETs), smart
applications have been incorporating the data generated by these networks to
provide quality-of-life services. In this paper, we propose a taxonomy of the
data mining techniques that have been applied in this domain, together with a
classification of these techniques. Our contribution is to highlight the
research methodologies in the literature and to allow them to be compared
along different characteristics. The proposed taxonomy covers elementary data
mining techniques such as preprocessing, outlier detection, clustering, and
classification of data. In addition, it covers centralized, distributed,
offline, and online techniques from the literature.
Contributions to Biclustering of Microarray Data Using Formal Concept Analysis
Biclustering is an unsupervised data mining technique that aims to unveil
patterns (biclusters) in gene expression data matrices. In this thesis, we
propose new biclustering algorithms for microarray data based on data mining
techniques. The objective is to identify positively and negatively correlated
biclusters.
This thesis is divided into two parts. In the first part, we present an
overview of pattern-mining techniques and of the biclustering of microarray
data. In the second part, we present our proposed biclustering algorithms,
which are organized along two axes. In the first axis, we focus on extracting
biclusters of positive correlations, using both Formal Concept Analysis and
association rules. In the second axis, we focus on the extraction of
negatively correlated biclusters.
The experimental studies highlight the promising results offered by the
proposed algorithms, which are evaluated and compared both statistically and
biologically.
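To illustrate how Formal Concept Analysis can yield biclusters, the following minimal sketch binarizes a gene-by-condition expression matrix with a fixed threshold (an assumption; the thesis's discretization is not described here) and naively enumerates the formal concepts. Each concept pairs a maximal set of genes with the maximal set of conditions under which all of them are over-expressed, a simplified form of a positively co-regulated bicluster. The functions `binarize`, `formal_concepts` and `biclusters` and the `min_genes`/`min_conds` filters are hypothetical; the association-rule step and the negative-correlation axis are not reproduced.

```python
from itertools import combinations


def binarize(matrix: dict[str, list[float]], threshold: float) -> dict[str, set[int]]:
    """Map each gene to the set of condition indices where its expression exceeds `threshold`."""
    return {g: {c for c, v in enumerate(row) if v > threshold} for g, row in matrix.items()}


def formal_concepts(context: dict[str, set[int]], n_conditions: int) -> set[tuple[frozenset, frozenset]]:
    """Naively enumerate every formal concept (extent, intent) of a binary
    gene x condition context by closing each subset of conditions."""
    all_conds = frozenset(range(n_conditions))
    concepts = set()
    for k in range(n_conditions + 1):
        for conds in combinations(range(n_conditions), k):
            # Extent: genes over-expressed under all chosen conditions.
            extent = frozenset(g for g, cs in context.items() if set(conds) <= cs)
            # Intent: conditions shared by every gene of the extent (the closure).
            intent = all_conds
            for g in extent:
                intent &= context[g]
            concepts.add((extent, intent))
    return concepts


def biclusters(matrix, threshold, min_genes=2, min_conds=2):
    """Keep the concepts large enough to report as biclusters (hypothetical size filter)."""
    n = len(next(iter(matrix.values())))
    context = binarize(matrix, threshold)
    return [
        (sorted(extent), sorted(intent))
        for extent, intent in formal_concepts(context, n)
        if len(extent) >= min_genes and len(intent) >= min_conds
    ]
```

Enumerating every subset of conditions is exponential, so this is only suitable for toy matrices; practical FCA-based approaches use dedicated closed-pattern miners instead.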
Characterization and extraction of condensed representation of correlated patterns based on formal concept analysis
Correlated pattern mining has become an increasingly important task in data
mining, since these patterns convey knowledge about meaningful and surprising
relations among data. Frequent correlated patterns have been thoroughly
studied in the literature. In this thesis, we propose to exploit both
frequent correlated and rare correlated patterns, as defined by the bond
correlation measure. We propose to extract a lossless subset of the sets of
frequent correlated and rare correlated patterns; this subset is called a
"condensed representation". To this end, we rely on notions from Formal
Concept Analysis (FCA), specifically the equivalence classes associated with a
closure operator f_bond dedicated to the bond measure, to introduce new
concise representations of both frequent correlated and rare correlated
patterns.
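For reference, the bond of a non-empty itemset X is its conjunctive support (transactions containing every item of X) divided by its disjunctive support (transactions containing at least one item of X). The following Python sketch computes it and labels an itemset as frequent correlated or rare correlated, assuming an absolute minimum support `minsupp` and a minimum bond `minbond`; the function names are hypothetical, and neither the closure operator f_bond nor the condensed representations are implemented.

```python
from fractions import Fraction

# Hypothetical helpers: itemsets and transactions are frozensets of items.


def conj_support(itemset: frozenset, transactions: list[frozenset]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def disj_support(itemset: frozenset, transactions: list[frozenset]) -> int:
    """Number of transactions containing at least one item of `itemset`."""
    return sum(1 for t in transactions if itemset & t)


def bond(itemset: frozenset, transactions: list[frozenset]) -> Fraction:
    """bond(X) = conjunctive support / disjunctive support (0 if no item of X occurs)."""
    d = disj_support(itemset, transactions)
    return Fraction(conj_support(itemset, transactions), d) if d else Fraction(0)


def classify(itemset, transactions, minsupp: int, minbond: Fraction) -> str:
    """Label an itemset using the (assumed) absolute thresholds `minsupp` and `minbond`."""
    if bond(itemset, transactions) < minbond:
        return "not correlated"
    if conj_support(itemset, transactions) >= minsupp:
        return "frequent correlated"
    return "rare correlated"
```

With transactions {a,b}, {a,b,c}, {c}, {a,b}, {d}, the itemset {a,b} has bond 3/3 = 1, whereas {a,c} appears together once but has at least one of its items in four transactions, giving a bond of 1/4.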
Selection of BJI configuration: Approach based on minimal transversals
Decision support systems deal with large volumes of data stored in dedicated
databases called data warehouses. Data warehouses are typically modeled by a
star schema, which consists of a central fact table and a set of dimension
tables. Queries over this type of model are therefore very complex. To reduce
the cost of executing these complex queries, which contain very expensive
joins, one solution is to ensure a good physical design of the data
warehouse. Binary join indexes (BJIs) are well suited to reducing the cost of
executing these joins. In this work, we propose a binary join index selection
approach based on the notion of minimal transversals. The resulting
configuration consists of several indexes that reduce the execution cost of
the set of queries.
Comment: Master's thesis (2017) supervised by Sadok Ben Yahia and Mohamed
Nidhal Jelassi, in French. arXiv admin note: text overlap with
arXiv:1902.00911 by other authors
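The notion of a minimal transversal can be made concrete with a small sketch. Assuming, hypothetically, that each query is described by the set of dimension attributes on which a binary join index could serve it, each such set is a hyperedge; a minimal transversal is then an inclusion-minimal set of attributes that intersects every hyperedge, i.e. a candidate index configuration covering every query. The brute-force enumeration below is exponential in the number of attributes and is only an illustration, not the thesis's selection algorithm; the attribute names are invented.

```python
from itertools import combinations

# Hypothetical modelling: each hyperedge = dimension attributes usable by one query.


def is_transversal(candidate: frozenset, hyperedges: list[frozenset]) -> bool:
    """True if `candidate` intersects every hyperedge."""
    return all(candidate & edge for edge in hyperedges)


def minimal_transversals(hyperedges: list[frozenset]) -> list[frozenset]:
    """Brute-force enumeration of all minimal transversals, smallest sets first."""
    vertices = sorted(set().union(*hyperedges))
    found: list[frozenset] = []
    for k in range(1, len(vertices) + 1):
        for combo in combinations(vertices, k):
            s = frozenset(combo)
            # A superset of an already found transversal cannot be minimal.
            if any(t <= s for t in found):
                continue
            if is_transversal(s, hyperedges):
                found.append(s)
    return found
```

Because candidates are tried by increasing size, any transversal that contains no previously found one is necessarily minimal. For three queries over {region, month}, {month, product} and {region, product}, every pair of attributes is a minimal transversal, and a cost model would then decide which configuration to materialize.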
Une nouvelle approche de complétion des valeurs manquantes dans les bases de données (A new approach to completing missing values in databases)
When tackling real-life datasets, it is common to encounter missing values
scattered throughout the data. Considered 'dirty data', these values are
usually removed during a pre-processing step. Starting from the premise that
'making up this missing data is better than throwing it away', we present a new
approach to completing missing data. The main originality of the approach is
that it highlights a fruitful synergy between generic bases of association
rules and missing-value handling. Beyond their attractive compactness rate,
such generic association rules considerably reduce conflicts during the
completion step. We also introduce a new metric, called 'Robustness', which
selects the most robust association rule for completing a missing value
whenever a conflict arises. Experiments on benchmark datasets confirm the
soundness of our approach: it reduces conflicts during the completion step
while achieving a high rate of correct completions.
Comment: in French
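The completion step with conflict resolution can be sketched as follows. Each rule maps a premise (a set of attribute/value pairs) to a value for the missing attribute; when several applicable rules propose different values, the rule with the highest score wins. The abstract does not give the definition of 'Robustness', so the `score` field is a hypothetical placeholder (for instance, the rule's confidence), and the `Rule` class and `complete` function are illustrative names rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    premise: frozenset  # (attribute, value) pairs that must all hold
    attr: str           # attribute the rule completes
    value: str          # value it proposes
    score: float        # hypothetical stand-in for the 'Robustness' metric


def complete(record: dict[str, Optional[str]], attr: str, rules: list[Rule]) -> Optional[str]:
    """Propose a value for the missing `attr` of `record`, or None if no rule applies."""
    known = frozenset((a, v) for a, v in record.items() if v is not None)
    applicable = [r for r in rules if r.attr == attr and r.premise <= known]
    if not applicable:
        return None
    proposed = {r.value for r in applicable}
    if len(proposed) == 1:
        return proposed.pop()
    # Conflict: applicable rules disagree, so keep the value of the best-scored rule.
    return max(applicable, key=lambda r: r.score).value
```

Fewer, more generic rules mean fewer applicable rules per record, which is why a compact rule basis tends to produce fewer conflicts to resolve.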
Motifs corrélés rares : Caractérisation et nouvelles représentations concises (Rare correlated patterns: Characterization and new concise representations)
Rare pattern mining has recently proven valuable in various data mining
applications, since these patterns convey knowledge about rare and unexpected
events. However, the extraction of rare patterns suffers from two main
limitations: the large number of mined patterns in real-life applications, and
the low informativeness of many rare patterns. To address this, we integrate
the bond correlation measure into the mining process in order to retain only
those rare patterns whose items exhibit a certain degree of correlation. The
resulting set of rare correlated patterns is then characterized by studying
the constraints of distinct types induced by rarity and correlation. In
addition, based on the equivalence classes associated with a closure operator
dedicated to the bond measure, we propose concise representations of rare
correlated patterns. We then design a new algorithm, CRP_Miner, dedicated to
extracting the whole set of rare correlated patterns. We also introduce the
CRPR_Miner algorithm, which efficiently extracts the proposed concise
representations. In addition, we design two other algorithms for querying and
regenerating the whole set of rare correlated patterns. Experimental studies
show the effectiveness of CRPR_Miner and confirm the compactness rate offered
by the proposed concise representations.
Comment: in French. Master's thesis 201
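Because bond is anti-monotone (adding items can only lower the conjunctive support and raise the disjunctive support) while rarity is monotone (every superset of a rare itemset is rare), rare correlated patterns can be found with a level-wise, Apriori-style search that prunes on bond. The sketch below is a naive illustration of that idea under an assumed absolute threshold `minsupp` and a bond threshold `minbond`; it is not CRP_Miner, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sketch only, not the CRP_Miner algorithm; names are hypothetical.


def support(itemset: frozenset, transactions: list[frozenset]) -> int:
    """Conjunctive support: transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def bond(itemset: frozenset, transactions: list[frozenset]) -> Fraction:
    """Conjunctive support divided by disjunctive support (0 if no item occurs)."""
    disj = sum(1 for t in transactions if itemset & t)
    return Fraction(support(itemset, transactions), disj) if disj else Fraction(0)


def rare_correlated(transactions, minsupp: int, minbond: Fraction) -> list[frozenset]:
    """Level-wise search for itemsets with bond >= minbond and support < minsupp."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if bond(frozenset([i]), transactions) >= minbond]
    result = []
    while level:
        # Every itemset in `level` is correlated; keep those that are also rare.
        result.extend(x for x in level if support(x, transactions) < minsupp)
        correlated = set(level)
        k = len(level[0])
        # Join: unions of two correlated k-itemsets that form a (k+1)-itemset.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune: bond is anti-monotone, so every k-subset must itself be correlated.
        level = [
            c for c in candidates
            if all(frozenset(s) in correlated for s in combinations(c, k))
            and bond(c, transactions) >= minbond
        ]
    return result
```

The search is driven only by the anti-monotone bond constraint, and rarity is used as a filter on each level; exploiting the monotone rarity constraint as well, and producing concise representations instead of the full set, is what the dedicated algorithms described above are for.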