
    Performance Analysis of Quickreduct, Quick Relative Reduct Algorithm and a New Proposed Algorithm

    Feature selection is the process of selecting a subset of relevant features from a large dataset that satisfies method-dependent criteria, minimizing cardinality without sacrificing accuracy or precision, so that the selected features approximate the original class distribution of the data. Feature selection and feature extraction are the two problems faced when choosing the best and most important attributes from a given dataset. Feature selection is a preprocessing step in data mining that is very effective at removing unimportant attributes, improving both the storage efficiency and the accuracy of the dataset. From the huge pool of available data we want to extract useful and relevant information; the problem is not the unavailability of data but its quality. Rough set theory is very useful for extracting relevant attributes and increasing the value of the information system. It works on the principle of classifying similar objects into classes with respect to some features, and a minimal such set of features is termed a reduct.
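
    To make the reduct idea concrete, here is a minimal sketch of the greedy, dependency-driven search that QuickReduct-style algorithms perform. It is an illustration under simplifying assumptions (categorical data held in memory as a list of dicts), not the paper's implementation; all names are ours.

        from collections import defaultdict

        def partition(rows, attrs):
            """Group row indices into equivalence classes induced by `attrs`."""
            blocks = defaultdict(list)
            for i, row in enumerate(rows):
                blocks[tuple(row[a] for a in attrs)].append(i)
            return list(blocks.values())

        def gamma(rows, cond_attrs, dec_attr):
            """Dependency degree: fraction of rows in the positive region."""
            pos = 0
            for block in partition(rows, cond_attrs):
                # A block is in the positive region if it is pure in the decision.
                if len({rows[i][dec_attr] for i in block}) == 1:
                    pos += len(block)
            return pos / len(rows)

        def quick_reduct(rows, cond_attrs, dec_attr):
            """Greedy forward selection: add the attribute that raises gamma most."""
            reduct, full = set(), gamma(rows, cond_attrs, dec_attr)
            while gamma(rows, reduct, dec_attr) < full:
                best = max((a for a in cond_attrs if a not in reduct),
                           key=lambda a: gamma(rows, reduct | {a}, dec_attr))
                reduct.add(best)
            return reduct

    The greedy loop stops as soon as the selected subset discerns the decision as well as the full attribute set, which is why the result approximates, but does not always equal, a minimal reduct.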

    Positive region: An enhancement of partitioning attribute based rough set for categorical data

    Datasets containing multi-valued attributes arise in several domains, such as pattern recognition, machine learning and data mining, and often require data partitioning. Partitioning attributes is the clustering process that divides the whole dataset for further processing. Prominent rough set-based approaches already exist for grouping objects and handling uncertain data; they use the indiscernibility relation and a mean roughness measure to select the partitioning attribute. Nevertheless, most partitioning-attribute selection algorithms for categorical data are incapable of optimal partitioning. The indiscernibility and mean roughness measures, moreover, require computing the lower approximation, which is both less accurate and expensive to compute; this limits the growth of the attribute set and neglects the data falling within the boundary region. This paper presents a new concept, "Positive Region Based Mean Dependency (PRD)", which calculates attribute dependency. PRD defines a method for determining the mean dependency of attributes that is suitable for categorical datasets, using a positive region-based mean dependency measure. By avoiding the lower approximation, PRD is an optimal substitute for the conventional dependency measure in partitioning-attribute selection. In contrast to traditional RST partitioning methods, the proposed method can be employed as a measure of uncertainty in the output data and scales to larger and multiple data clusterings. The performance of the presented method is evaluated and compared with the Information-Theoretical Dependence Roughness (ITDR) and Maximum Indiscernible Attribute (MIA) algorithms.
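
    As an illustration of the general idea, the sketch below selects a partitioning attribute by the mean positive-region dependency of the remaining attributes on each candidate. The exact PRD formula is not reproduced from the paper; this is a hedged approximation, and all names are ours.

        from collections import defaultdict

        def dependency(rows, a, b):
            """Fraction of rows whose a-equivalence class is pure in attribute b."""
            blocks = defaultdict(list)
            for i, row in enumerate(rows):
                blocks[row[a]].append(i)
            pos = sum(len(ix) for ix in blocks.values()
                      if len({rows[i][b] for i in ix}) == 1)
            return pos / len(rows)

        def partitioning_attribute(rows, attrs):
            """Pick the attribute on which the others depend most, on average."""
            def mean_dep(a):
                others = [b for b in attrs if b != a]
                return sum(dependency(rows, a, b) for b in others) / len(others)
            return max(attrs, key=mean_dep)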

    Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions

    Conditional functional dependencies (CFDs) have been used to improve data quality, including detecting and repairing data inconsistencies. Approximation measures are significant for data dependencies in data mining: to accommodate exceptions in real data, they relax the strictness of CFDs into more general dependencies, called approximate conditional functional dependencies (ACFDs). This paper analyzes the weaknesses of the dependency degree, confidence and conviction measures for general (constant and variable) CFDs. A new measure for general CFDs based on incomplete knowledge granularity is proposed, capturing both the approximation of these dependencies and the distribution of data tuples across the conditional equivalence classes. Finally, the effectiveness of stripped conditional partitions and the new measure is evaluated on synthetic and real data sets. These results are important to the study of the theory of approximate dependencies and to the improvement of discovery algorithms for CFDs and ACFDs.
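
    For readers unfamiliar with stripped partitions, the sketch below shows the classic construction (equivalence classes with singletons removed, as in TANE-style FD discovery) and a confidence-style measure for a plain functional dependency X -> A; the paper's CFD-specific measures build on, but are not identical to, this.

        from collections import Counter, defaultdict

        def stripped_partition(rows, attrs):
            """Equivalence classes of row indices under `attrs`, singletons removed."""
            blocks = defaultdict(list)
            for i, row in enumerate(rows):
                blocks[tuple(row[a] for a in attrs)].append(i)
            return [ix for ix in blocks.values() if len(ix) > 1]

        def fd_confidence(rows, lhs, rhs):
            """1 minus the g3 error: within each lhs class, rows not carrying the
            majority rhs value count as violations; singletons never violate."""
            violations = 0
            for block in stripped_partition(rows, lhs):
                counts = Counter(rows[i][rhs] for i in block)
                violations += len(block) - max(counts.values())
            return 1 - violations / len(rows)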

    Rough sets, their extensions and applications

    Rough set theory provides a useful mathematical foundation for developing automated computational systems that can help understand and make use of imperfect knowledge. Despite its recency, the theory and its extensions have been widely applied to many problems, including decision analysis, data mining, intelligent control and pattern recognition. This paper presents an outline of the basic concepts of rough sets and their major extensions, covering variable precision, tolerance and fuzzy rough sets. It also shows the diversity of successful applications these theories have enabled, ranging from finance and business, through biology and medicine, to the physical sciences, art, and meteorology.
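
    The basic constructs the survey outlines fit in a few lines: the lower approximation collects the equivalence classes certainly inside a target set, the upper approximation those possibly overlapping it. A minimal sketch, with illustrative names of our choosing:

        from collections import defaultdict

        def approximations(rows, attrs, target):
            """Return (lower, upper) approximations of `target`, a set of row indices."""
            blocks = defaultdict(list)
            for i, row in enumerate(rows):
                blocks[tuple(row[a] for a in attrs)].append(i)
            lower, upper = set(), set()
            for ix in map(set, blocks.values()):
                if ix <= target:
                    lower |= ix   # class certainly inside the target
                if ix & target:
                    upper |= ix   # class possibly inside the target
            return lower, upper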

    Rough approximation quality revisited

    In rough set theory, the approximation quality γ is the traditional measure used to evaluate the classification success of attributes, as a numerical evaluation of the dependency properties generated by these attributes. In this paper we re-interpret the classical γ in terms of a classic set-based measure, the Marczewski–Steinhaus metric, and also in terms of "proportional reduction of errors" (PRE) measures. We also exhibit infinitely many possibilities to define γ-like statistics which are meaningful in situations different from the classical one, and provide tools to ascertain the statistical significance of the proposed measures, which are valid for any kind of sample.
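
    For reference, the classical quantities involved are (in our notation, which may differ in detail from the paper's):

        \gamma_C(D) = \frac{|\mathrm{POS}_C(D)|}{|U|},
        \qquad
        \mathrm{POS}_C(D) = \bigcup_{X \in U/D} \underline{C}\,X,

    and the Marczewski–Steinhaus distance between sets A and B is

        D(A, B) = \frac{|A \,\triangle\, B|}{|A \cup B|}.

    Since \mathrm{POS}_C(D) \subseteq U, one connection is immediate:

        D(\mathrm{POS}_C(D), U)
        = \frac{|U \setminus \mathrm{POS}_C(D)|}{|U|}
        = 1 - \gamma_C(D).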

    Adaptive quick reduct for feature drift detection

    Data streams are ubiquitous, owing to the proliferation of low-cost mobile devices, sensors, wireless networks and the Internet of Things. While it is well known that complex phenomena are not stationary and exhibit concept drift when observed for a sufficiently long time, relatively few studies have addressed the related problem of feature drift. In this paper, a variation of the QuickReduct algorithm suited to processing data streams is proposed and tested: it builds an evolving reduct that dynamically selects the relevant features in the stream, removing redundant features and adding newly relevant ones as soon as they become so. Tests on five publicly available datasets with an artificially injected drift confirm the effectiveness of the proposed method.
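
    A hedged sketch of the overall scheme: keep a sliding window over the stream, periodically refresh a QuickReduct-style reduct on the window, and flag a feature drift whenever the selected subset changes. It reuses quick_reduct from the first sketch above; the window size and refresh period are our assumptions, not the paper's settings.

        from collections import deque

        def monitor_stream(stream, cond_attrs, dec_attr, window=500, period=100):
            """Yield (row_count, new_reduct) whenever the relevant subset changes."""
            buf, reduct, seen = deque(maxlen=window), None, 0
            for row in stream:
                buf.append(row)
                seen += 1
                if seen % period == 0 and len(buf) == window:
                    new = quick_reduct(list(buf), cond_attrs, dec_attr)
                    if reduct is not None and new != reduct:
                        yield seen, new   # feature drift: relevant features changed
                    reduct = new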