7 research outputs found

    An Approach to Find Missing Values in Medical Datasets

    Mining medical datasets is a challenging problem for data mining researchers, as these datasets pose several hidden challenges compared with conventional datasets. From the collection of samples through field experiments and clinical trials to performing classification, there are numerous challenges at every stage of the mining process. The preprocessing phase is itself a challenge when working with medical datasets. One of the prime challenges in mining medical datasets is handling missing values, which is part of the preprocessing phase. In this paper, we address the issue of handling missing values in medical datasets consisting of categorical attribute values. The main contribution of this research is to use the proposed imputation measure to estimate and fix the missing values. We discuss a case study to demonstrate the working of the proposed measure. Comment: 7 pages, ACM Digital Library, ICEMIS September 201

    Applications of Data Mining Techniques for Vehicular Ad hoc Networks

    Due to recent advances in vehicular ad hoc networks (VANETs), smart applications have been incorporating the data generated by these networks to provide quality-of-life services. In this paper, we propose a taxonomy of the data mining techniques that have been applied in this domain, together with a classification of these techniques. Our contribution is to highlight the research methodologies in the literature and to allow them to be compared along different characteristics. The proposed taxonomy covers elementary data mining techniques such as preprocessing, outlier detection, clustering, and classification of data. In addition, it covers centralized, distributed, offline, and online techniques from the literature.

    Contributions to Biclustering of Microarray Data Using Formal Concept Analysis

    Biclustering is an unsupervised data mining technique that aims to unveil patterns (biclusters) in gene expression data matrices. In this thesis, we propose new biclustering algorithms for microarray data, built on data mining techniques. The objective is to identify positively and negatively correlated biclusters. The thesis is divided into two parts. In the first part, we present an overview of pattern-mining techniques and of the biclustering of microarray data. In the second part, we present our proposed biclustering algorithms, which follow two axes. In the first axis, we focus on extracting biclusters of positive correlations, using both Formal Concept Analysis and association rules. In the second axis, we focus on extracting negatively correlated biclusters. The experimental studies highlight the very promising results offered by the proposed algorithms. Our biclustering algorithms are evaluated and compared both statistically and biologically.

    Characterization and extraction of condensed representation of correlated patterns based on formal concept analysis

    Correlated pattern mining has become an increasingly important task in data mining, since these patterns convey knowledge about meaningful and surprising relations among data. Frequent correlated patterns have been thoroughly studied in the literature. In this thesis, we propose to benefit from both frequent correlated and rare correlated patterns according to the bond correlation measure. We propose to extract, without information loss, a subset of the sets of frequent correlated and of rare correlated patterns; this subset is called a "condensed representation". To this end, we rely on notions derived from Formal Concept Analysis (FCA), specifically the equivalence classes associated with a closure operator fbond dedicated to the bond measure, to introduce new concise representations of both frequent correlated and rare correlated patterns.
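
The bond measure named in the abstract above is usually defined as the ratio of an itemset's conjunctive support (transactions containing all of its items) to its disjunctive support (transactions containing at least one of its items). The following is a minimal sketch under that assumption, not the thesis's algorithm: the function names, the thresholds `min_supp` and `min_bond`, and the toy transactions are all hypothetical.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def bond(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Conjunctive support divided by disjunctive support (assumed definition)."""
    items = frozenset(itemset)
    # Transactions that contain every item of the itemset.
    conjunctive = sum(1 for t in transactions if items <= t)
    # Transactions that contain at least one item of the itemset.
    disjunctive = sum(1 for t in transactions if items & t)
    return conjunctive / disjunctive if disjunctive else 0.0


def classify(itemset: Iterable[str], transactions: List[Transaction],
             min_supp: int, min_bond: float) -> str:
    """Label an itemset using hypothetical support and bond thresholds."""
    items = frozenset(itemset)
    support = sum(1 for t in transactions if items <= t)
    if bond(items, transactions) < min_bond:
        return "uncorrelated"
    return "frequent correlated" if support >= min_supp else "rare correlated"


# Toy data, invented for illustration only.
TRANSACTIONS: List[Transaction] = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"d"}),
    frozenset({"b", "c", "d"}),
]

if __name__ == "__main__":
    for candidate in ({"a"}, {"a", "b"}, {"a", "d"}):
        print(sorted(candidate), bond(candidate, TRANSACTIONS),
              classify(candidate, TRANSACTIONS, min_supp=3, min_bond=0.5))
```

In this sketch an itemset counts as rare when its support falls below `min_supp`, which is one common convention; the thesis may define rarity differently.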

    Selection of BJI configuration: Approach based on minimal transversals

    Decision systems deal with large volumes of data stored in databases called data warehouses. Data warehouses are typically modeled by a star schema, which conventionally consists of a central fact table and a set of dimension tables. Queries over this type of model are therefore very complex. To reduce the cost of executing complex queries, which contain very expensive joins, the envisaged solution is to guarantee a good physical design of the data warehouse. Binary join indexes are well suited to reducing the cost of executing these joins. In this work, we propose a binary join index selection approach based on the notion of minimal transversal. The final configuration is composed of several indexes, which make it possible to optimize the execution cost of the query set. Comment: Masters thesis (2017) supervised by Sadok Ben Yahia and Mohamed Nidhal Jelassi, in French. arXiv admin note: text overlap with arXiv:1902.00911 by other author
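
A minimal transversal of a hypergraph is a set of vertices that intersects every hyperedge and has no proper subset with the same property. The abstract above does not say how the hypergraph is built from the query workload, so the sketch below simply assumes that each query contributes one hyperedge of candidate indexable attributes; the attribute names are hypothetical, and the incremental (Berge-style) enumeration shown is a textbook method, not necessarily the one used in the thesis.

```python
from typing import FrozenSet, Iterable, List, Set


def minimal_transversals(hyperedges: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Enumerate all minimal transversals by adding one hyperedge at a time."""
    transversals: List[FrozenSet[str]] = [frozenset()]
    for raw_edge in hyperedges:
        edge = frozenset(raw_edge)
        candidates: Set[FrozenSet[str]] = set()
        for t in transversals:
            if t & edge:
                # Already hits the new hyperedge: keep unchanged.
                candidates.add(t)
            else:
                # Extend with each vertex of the new hyperedge in turn.
                for vertex in edge:
                    candidates.add(t | {vertex})
        # Discard any candidate that strictly contains another candidate.
        transversals = [c for c in candidates
                        if not any(other < c for other in candidates)]
    return set(transversals)


# Hypothetical workload: one hyperedge of candidate attributes per query.
QUERY_EDGES = [
    {"city", "month"},
    {"month", "category"},
    {"city", "category"},
]

if __name__ == "__main__":
    for transversal in sorted(minimal_transversals(QUERY_EDGES), key=sorted):
        print(sorted(transversal))
```

Under this assumed encoding, each minimal transversal is a smallest-by-inclusion set of attributes that touches every query, which is why it is a natural candidate for an index configuration. The enumeration is exponential in the worst case, so it only suits small workloads.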

    Une nouvelle approche de complétion des valeurs manquantes dans les bases de données (A new approach to completing missing values in databases)

    When tackling real-life datasets, it is common to encounter missing values scattered throughout the data. Such values are considered 'dirty data' and are usually removed during a pre-processing step. Starting from the premise that 'making up this missing data is better than throwing it away', we present a new approach that tries to complete missing data. The main originality of the approach is that it highlights a fruitful synergy between generic bases of association rules and the handling of missing values. Beyond their interesting compactness rate, such generic association rules make it possible to considerably reduce conflicts during the completion step. A new metric called 'Robustness' is also introduced; it selects the most robust association rule for completing a missing value whenever a conflict appears. Experiments carried out on benchmark datasets confirm the soundness of our approach: it reduces conflicts during the completion step while offering a high percentage of correct completions. Comment: in French

    Motifs corrélés rares : Caractérisation et nouvelles représentations concises (Rare correlated patterns: characterization and new concise representations)

    Recently, rare pattern mining has proved to add value in various data mining applications, since rare patterns convey knowledge about rare and unexpected events. However, the extraction of rare patterns suffers from two main limitations, namely the large number of patterns mined in real-life applications and the low informativeness of many rare patterns. To address this, we propose to use the bond correlation measure in the mining process in order to retain only those rare patterns whose items exhibit a certain degree of correlation. The resulting set of rare correlated patterns is then characterized by studying the constraints of distinct types induced by rarity and correlation. In addition, based on the equivalence classes associated with a closure operator dedicated to the bond measure, we propose concise representations of rare correlated patterns. We then design a new algorithm, CRP_Miner, dedicated to extracting the whole set of rare correlated patterns. We also introduce the CRPR_Miner algorithm, which allows efficient extraction of the proposed concise representations. In addition, we design two other algorithms that allow the querying and the regeneration of the whole set of rare correlated patterns. The experimental studies show the effectiveness of the CRPR_Miner algorithm and demonstrate the compactness rate offered by the proposed concise representations. Comment: in French. Master's thesis 201