
    A combined data mining approach using rough set theory and case-based reasoning in medical datasets

    Case-based reasoning (CBR) solves new cases by retrieving the most relevant ones from an existing knowledge base. Since irrelevant or redundant features markedly increase both the memory requirements and the time complexity of case retrieval, reducing the number of dimensions is an issue worth considering. This paper uses rough set theory (RST) to reduce the number of dimensions in a CBR classifier, with the aim of increasing accuracy and efficiency. The CBR component measures the similarity of cases with a co-occurrence-based distance for categorical data, built on the proportional distribution of the different categorical values of the features; the weight used for a feature is the average of its co-occurrence values. The combination of RST and CBR is applied to the real categorical datasets Wisconsin Breast Cancer, Lymphography, and Primary cancer, and evaluated with 5-fold cross-validation. The results show that the combined approach lowers computational cost and improves performance metrics, including accuracy and interpretability, compared to other approaches developed in the literature.
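The paper's exact co-occurrence measure is not reproduced in the abstract; as a hedged sketch of the general idea, the Value Difference Metric (VDM) family likewise compares two categorical values by the gap between their conditional class distributions. All names below are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict

def vdm_tables(rows, labels):
    """Per-feature class-count tables: tables[f][v] counts the classes
    seen together with value v of feature f."""
    n_feats = len(rows[0])
    tables = [defaultdict(Counter) for _ in range(n_feats)]
    for row, y in zip(rows, labels):
        for f, v in enumerate(row):
            tables[f][v][y] += 1
    return tables

def value_distance(table, v1, v2, classes):
    """Distance between two categorical values of one feature:
    L1 gap between their conditional class distributions."""
    c1, c2 = table[v1], table[v2]
    n1, n2 = sum(c1.values()) or 1, sum(c2.values()) or 1
    return sum(abs(c1[c] / n1 - c2[c] / n2) for c in classes)

def case_distance(tables, classes, a, b):
    """Unweighted sum of per-feature value distances between two cases."""
    return sum(value_distance(tables[f], a[f], b[f], classes)
               for f in range(len(a)))

def retrieve_nearest(rows, labels, query):
    """CBR retrieval step: index of the stored case closest to the query."""
    tables = vdm_tables(rows, labels)
    classes = set(labels)
    return min(range(len(rows)),
               key=lambda i: case_distance(tables, classes, rows[i], query))
```

A feature-weighting step (the paper averages co-occurrence values per feature) would simply scale each term of `case_distance`.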

    Construction and optimization of partial decision rules

    [Translated from Polish.] The topic of this work is the study of greedy algorithms for the construction and optimization of partial (approximate) decision rules. The research on partial decision rules presented here builds on results obtained for the partial set cover problem. It is proven that, under certain assumptions about the class NP, the greedy algorithm yields results close to those of the best approximate polynomial algorithms for minimizing the length of partial decision rules and for minimizing the total weight of the attributes forming a partial decision rule. Based on data gathered during the run of the greedy algorithm, the best upper and lower bounds on the minimal complexity of partial decision rules are estimated, and theoretical and experimental results show that these bounds can be used in practical applications. A bound on the precision of the greedy algorithm for generating partial decision rules is also obtained that does not depend on the number of rows in the decision table under consideration. Under certain assumptions about the number of rows and columns in decision tables, it is proven that for most binary decision tables only short irreducible partial decision rules exist. The experimental results confirm the 0.5-hypothesis: for most decision tables, at each iteration of generating a partial rule the greedy algorithm selects an attribute that separates at least 50% of the rows not yet separated. In classification, the accuracy of classifiers based on partial decision rules turns out to be often better than that of classifiers based on exact decision rules.
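A minimal sketch of the greedy construction of a partial decision rule, assuming a consistent decision table where every conflicting row can be separated by some attribute (parameter names and the stopping rule via `alpha` are illustrative):

```python
def greedy_partial_rule(table, decisions, row_idx, alpha=0.1):
    """Greedily build a partial decision rule for row `row_idx`:
    repeatedly pick the attribute whose condition separates the most
    still-unseparated rows carrying a different decision, stopping once
    at most an alpha fraction of them remain unseparated."""
    target = table[row_idx]
    to_separate = {i for i, d in enumerate(decisions)
                   if d != decisions[row_idx]}
    budget = int(alpha * len(to_separate))
    rule = []
    while len(to_separate) > budget:
        # the 0.5-hypothesis says this choice usually separates
        # at least half of the remaining rows
        best = max(range(len(target)),
                   key=lambda a: sum(1 for i in to_separate
                                     if table[i][a] != target[a]))
        rule.append((best, target[best]))
        to_separate = {i for i in to_separate
                       if table[i][best] == target[best]}
    return rule
```

With `alpha=0` this degenerates to an exact decision rule; larger `alpha` trades separation for shorter rules, which is the length/accuracy trade-off the abstract discusses.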

    Feature Selection Inspired Classifier Ensemble Reduction

    Classifier ensembles constitute one of the main research directions in machine learning and data mining. The use of multiple classifiers generally allows better predictive performance than that achievable with a single model. Several approaches exist in the literature that provide means to construct and aggregate such ensembles. However, these ensemble systems contain redundant members that, if removed, may further increase group diversity and produce better results. Smaller ensembles also relax the memory and storage requirements, reducing the system's run-time overhead while improving overall efficiency. This paper extends ideas developed for feature selection problems to support classifier ensemble reduction, by transforming ensemble predictions into training samples and treating classifiers as features. The global heuristic harmony search is then used to select a reduced subset of such artificial features while attempting to maximize the feature subset evaluation. The resulting technique is systematically evaluated using high-dimensional and large benchmark datasets, showing superior classification performance against both the original, unreduced ensembles and randomly formed subsets. © 2013 IEEE
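The core reformulation — each classifier's prediction vector becomes one "feature" column, and subset search maximizes an evaluation score — can be sketched as follows. The paper uses harmony search; plain forward selection stands in for it here, and `majority_accuracy` is an illustrative evaluation function, not the paper's:

```python
from collections import Counter

def majority_accuracy(preds, labels):
    """Fraction of samples where the majority vote of the kept
    classifiers matches the true label."""
    hits = 0
    for j, y in enumerate(labels):
        vote = Counter(p[j] for p in preds).most_common(1)[0][0]
        hits += (vote == y)
    return hits / len(labels)

def ensemble_reduce(pred_matrix, labels, evaluate):
    """Treat each row of pred_matrix (one classifier's predictions) as an
    artificial feature and greedily grow the subset that maximizes the
    evaluation score."""
    remaining = set(range(len(pred_matrix)))
    chosen, best_score = [], -1.0
    improved = True
    while improved and remaining:
        improved = False
        for c in sorted(remaining):
            score = evaluate([pred_matrix[i] for i in chosen + [c]], labels)
            if score > best_score:
                best_score, best_c, improved = score, c, True
        if improved:
            chosen.append(best_c)
            remaining.remove(best_c)
    return chosen, best_score
```

Swapping the greedy loop for harmony search only changes how candidate subsets are generated; the evaluation-driven reduction is the same.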

    Rough set based ensemble classifier for web page classification

    Combining the results of a number of individually trained classification systems to obtain a more accurate classifier is a widely used technique in pattern recognition. In this article, we introduce a rough set based meta classifier to classify web pages. The proposed method consists of two parts. In the first part, the output of every individual classifier is used to construct a decision table. In the second part, rough set attribute reduction and rule generation are applied to the decision table to construct the meta classifier. It is shown that (1) the performance of the meta classifier is better than that of every constituent classifier, and (2) the meta classifier is optimal with respect to a quality measure defined in the article. Experimental studies show that the meta classifier improves classification accuracy uniformly over some benchmark corpora and beats other ensemble approaches in accuracy by a decisive margin, thus demonstrating the theoretical results. Apart from this, it reduces the CPU load compared to other ensemble classification techniques by removing redundant classifiers from the combination.
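The two parts map naturally onto a decision table whose condition attributes are the classifier outputs and whose decision attribute is the true label; a reduct then drops classifiers that the remaining ones already render redundant. A minimal sketch (the simple one-pass reduct below is an illustration, not the article's reduction algorithm):

```python
def build_decision_table(classifier_preds, labels):
    """One row per sample: each classifier's output as a condition
    attribute, the true label as the decision attribute (last column)."""
    return [tuple(p[j] for p in classifier_preds) + (labels[j],)
            for j in range(len(labels))]

def is_consistent(rows, attrs):
    """The kept attribute columns determine the decision iff no two rows
    agree on all kept attributes but disagree on the decision."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if seen.setdefault(key, r[-1]) != r[-1]:
            return False
    return True

def reduct(rows, n_attrs):
    """Try dropping each attribute in turn; keep the drop whenever the
    remaining attributes still determine the decision."""
    attrs = list(range(n_attrs))
    for a in list(attrs):
        trial = [x for x in attrs if x != a]
        if trial and is_consistent(rows, trial):
            attrs = trial
    return attrs
```

Classifiers whose columns fall outside the reduct are exactly the "redundant classifiers" the abstract says get removed, which is where the CPU saving comes from.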

    Active Sample Selection Based Incremental Algorithm for Attribute Reduction with Rough Sets

    Attribute reduction with rough sets is an effective technique for obtaining a compact and informative attribute set from a given dataset. However, traditional algorithms have no explicit provision for handling dynamic datasets where data present themselves in successive samples. Incremental algorithms for attribute reduction with rough sets have recently been introduced to handle dynamic datasets with large samples, though they have high complexity in time and space. To address this time/space complexity issue, this paper presents a novel incremental algorithm for attribute reduction with rough sets based on an active sample selection process and an insight into the attribute reduction process. The algorithm first decides, via the active sample selection process, whether each incoming sample is useful with respect to the current dataset. A useless sample is discarded, while a useful sample is selected to update a reduct. At the arrival of a useful sample, the attribute reduction process then guides how to add and/or delete attributes in the current reduct. The two processes thus constitute the theoretical framework of our algorithm. The proposed algorithm is experimentally shown to be efficient in time and space.
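The abstract does not define its usefulness criterion, so the sketch below assumes a simple one: a sample is useful if its projection onto the current reduct is new, or if it collides with a stored projection carrying a different decision (meaning the current reduct no longer discerns it). Both the criterion and all names are assumptions for illustration:

```python
def is_useful(sample, decision, seen, reduct):
    """Assumed criterion: useful iff the reduct-projection is unseen or
    conflicts with an already-stored decision."""
    key = tuple(sample[a] for a in reduct)
    return key not in seen or seen[key] != decision

def process_stream(stream, reduct):
    """Active sample selection: keep only useful samples; a useless
    sample is discarded without touching the reduct."""
    seen, kept = {}, []
    for sample, decision in stream:
        if is_useful(sample, decision, seen, reduct):
            kept.append((sample, decision))
            seen[tuple(sample[a] for a in reduct)] = decision
            # in the full algorithm, a decision conflict here would also
            # trigger adding/deleting attributes in the reduct
    return kept
```

The efficiency claim rests on exactly this filter: most stream samples are discarded in O(|reduct|) time, so the expensive reduct update runs only on the kept ones.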

    Recent advances in the theory and practice of logical analysis of data

    Logical Analysis of Data (LAD) is a data analysis methodology introduced by Peter L. Hammer in 1986. LAD distinguishes itself from other classification and machine learning methods by analyzing a significant subset of combinations of variables to describe the positive or negative nature of an observation, and by using combinatorial techniques to extract models defined in terms of patterns. In recent years, the methodology has advanced tremendously through numerous theoretical developments and practical applications. In the present paper, we review the methodology and its recent advances, describe novel applications in engineering, finance, and health care, as well as algorithmic techniques for some stochastic optimization problems, and provide a comparative description of LAD with well-known classification methods.
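The central LAD object, a pattern, is a conjunction of literals over binarized variables that covers some observations of one class and none of the other. A minimal sketch of that definition (the representation of a term as `(index, required_bit)` pairs is an illustrative choice):

```python
def covers(term, point):
    """A term is a conjunction of literals, each a pair
    (variable index, required bit); it covers a 0/1 point iff
    every literal is satisfied."""
    return all(point[i] == b for i, b in term)

def is_positive_pattern(term, positives, negatives):
    """A positive pattern covers at least one positive observation
    and no negative observation."""
    return (any(covers(term, p) for p in positives)
            and not any(covers(term, q) for q in negatives))
```

A LAD model aggregates many such patterns (positive and negative) into a discriminant; the combinatorial work the abstract mentions lies in enumerating or optimizing the patterns themselves.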