
    Performance Analysis of Quickreduct, Quick Relative Reduct Algorithm and a New Proposed Algorithm

    Feature selection is the process of selecting a subset of relevant features from a large dataset that satisfies method-dependent criteria, thereby minimizing cardinality while ensuring that accuracy and precision are not affected, and hence approximating the original class distribution of the data from the selected features. Feature selection and feature extraction are the two problems we face when we want to select the best and most important attributes from a given dataset. Feature selection is a data mining step performed before the others, and it is very effective at removing unimportant attributes, thereby increasing the storage efficiency and accuracy of the dataset. From the huge pool of available data we want to extract useful and relevant information; the problem is not the unavailability of data but its quality. Rough set theory is very useful for extracting relevant attributes and helps to increase the value of the information system we have. It works on the principle of classifying similar objects into classes with respect to some features, and such features may collectively be termed reducts.
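The reduct idea sketched in the abstract can be illustrated in a few lines of Python: enumerate attribute subsets of a toy decision table and keep the minimal ones that classify objects as consistently as the full attribute set. The table, attribute names and values below are invented for illustration and are not from the paper.

```python
from itertools import combinations

# Toy decision table: each row is an object; "d" is the decision class.
objects = {
    1: {"a": 0, "b": 0, "c": 1, "d": "yes"},
    2: {"a": 0, "b": 1, "c": 1, "d": "yes"},
    3: {"a": 1, "b": 1, "c": 0, "d": "no"},
    4: {"a": 1, "b": 0, "c": 0, "d": "no"},
}
conditions = ["a", "b", "c"]

def partition(attrs):
    """Group objects that are indiscernible on the given attributes."""
    blocks = {}
    for x, row in objects.items():
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(x)
    return list(blocks.values())

def preserves_decision(attrs):
    """True if every indiscernibility block is pure w.r.t. the decision."""
    return all(len({objects[x]["d"] for x in blk}) == 1
               for blk in partition(attrs))

# A reduct is a minimal attribute subset that still classifies consistently.
reducts = []
for r in range(1, len(conditions) + 1):
    for subset in combinations(conditions, r):
        if preserves_decision(subset) and \
                not any(set(s) <= set(subset) for s in reducts):
            reducts.append(subset)
print(reducts)  # here both {a} and {c} alone separate "yes" from "no"
```

On this toy table the search finds two single-attribute reducts, so either `a` or `c` could be dropped without losing classification power.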

    Positive region: An enhancement of partitioning attribute based rough set for categorical data

    Datasets containing multi-valued attributes arise in many domains, such as pattern recognition, machine learning and data mining, and data partitioning is required in such cases. Partitioning attributes is the clustering process that divides the whole data set for further processing. Several prominent rough set-based approaches already exist for grouping objects and handling uncertain data; they use attribute indiscernibility and a mean roughness measure to perform attribute partitioning. Nevertheless, most partitioning-attribute selection algorithms for categorical data are incapable of optimal partitioning. The indiscernibility and mean roughness measures, moreover, require computing the lower approximation, which is less accurate and expensive to compute; this limits the growth of the attribute set and neglects the data found within the boundary region. This paper presents a new concept called "Positive Region Based Mean Dependency (PRD)", which calculates attribute dependency. PRD defines a method for determining the mean dependency of attributes, suitable for categorical datasets, using a positive region-based mean dependency measure. By avoiding the lower approximation, PRD is an optimal substitute for the conventional dependency measure in partitioning-attribute selection. Unlike traditional RST partitioning methods, the proposed method can be employed as a measure of output uncertainty and as a basis for larger and multiple data clusterings. The performance of the presented method is evaluated and compared against the Information-Theoretical Dependence Roughness (ITDR) and Maximum Indiscernible Attribute (MIA) algorithms.
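The abstract does not give PRD's exact formula, so the sketch below shows only the conventional positive-region dependency degree that PRD is said to replace; the data and attribute names are illustrative, not from the paper.

```python
# Conventional positive-region dependency degree gamma(B, d):
# the fraction of objects whose B-indiscernibility block is pure
# with respect to the decision attribute. Toy data for illustration.
data = [
    {"colour": "red",  "shape": "round",  "cls": "A"},
    {"colour": "red",  "shape": "square", "cls": "A"},
    {"colour": "blue", "shape": "round",  "cls": "B"},
    {"colour": "blue", "shape": "square", "cls": "A"},
]

def blocks(attrs):
    """Indiscernibility classes (object indices) under attrs."""
    out = {}
    for i, row in enumerate(data):
        out.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return out.values()

def positive_region(attrs, decision="cls"):
    """Objects whose attrs-block lies wholly inside one decision class."""
    pos = set()
    for blk in blocks(attrs):
        if len({data[i][decision] for i in blk}) == 1:
            pos |= blk
    return pos

def dependency(attrs):
    """gamma(attrs, decision) = |POS(attrs)| / |U|."""
    return len(positive_region(attrs)) / len(data)

print(dependency(["colour"]))            # 0.5: the blue objects disagree
print(dependency(["colour", "shape"]))   # 1.0: all blocks become pure
```

The cost the paper criticises is visible here: every candidate attribute set requires recomputing its blocks and their purity, which is what a lower-approximation-free measure such as PRD aims to avoid.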

    Rough sets, their extensions and applications

    Rough set theory provides a useful mathematical foundation for developing automated computational systems that help to understand and make use of imperfect knowledge. Despite its recency, the theory and its extensions have been widely applied to many problems, including decision analysis, data mining, intelligent control and pattern recognition. This paper presents an outline of the basic concepts of rough sets and their major extensions, covering variable precision, tolerance and fuzzy rough sets. It also shows the diversity of successful applications these theories have enabled, ranging from finance and business, through biology and medicine, to physics, art and meteorology.
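The basic concepts the survey outlines, the lower and upper approximations of a set and the boundary region between them, can be sketched in a few lines; the universe, equivalence classes and target set below are invented for illustration.

```python
# Rough approximation of a concept X by equivalence classes of an
# indiscernibility relation. All values are illustrative.
universe = set(range(8))
classes = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]  # indiscernibility classes
X = {1, 2, 3, 6}                            # the concept to describe

lower = set().union(*(c for c in classes if c <= X))   # certainly in X
upper = set().union(*(c for c in classes if c & X))    # possibly in X
boundary = upper - lower                               # the uncertain zone

print(sorted(lower), sorted(upper), sorted(boundary))
# -> [2, 3] [0, 1, 2, 3, 6, 7] [0, 1, 6, 7]
```

X is "rough" precisely because the boundary is non-empty: objects 0, 1, 6 and 7 cannot be classified as definitely inside or outside X with the available attributes.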

    New rough set based maximum partitioning attribute algorithm for categorical data clustering

    Clustering a set of data into homogeneous groups is a fundamental operation in data mining. Recently, attention has turned to categorical data clustering, where the data set consists of non-numerical attributes. However, several existing categorical clustering algorithms are challenging to apply, as some cannot handle uncertainty while others have stability issues. Rough Set theory (RST) is a mathematical tool for dealing with categorical data and handling uncertainty; it is also used to identify cause-effect relationships in databases as a form of learning and data mining. This study therefore addresses the issues of uncertainty and stability in categorical clustering and proposes an improved algorithm centred on RST. The proposed method employs a partitioning measure to calculate the positive and boundary regions of the information system's attributes. First, an attribute-partitioning method called Positive Region-based Indiscernibility (PRI) is developed to address the uncertainty issue in attribute partitioning for categorical data; PRI relies on a partitioning calculation based on the positive and boundary regions. Next, to address the computational complexity of the clustering process, a clustering-attribute selection method called Maximum Mean Partitioning (MMP) is introduced: a mean partitioning value is computed for each attribute, and the attribute with the maximum value is chosen as the best clustering attribute. Integrating the proposed PRI and MMP methods yields a new rough set hybrid clustering algorithm for categorical data, named the Maximum Partitioning Attribute (MPA) algorithm. This hybrid algorithm is an all-inclusive solution for uncertainty, computational complexity, cluster purity and accuracy in partitioning attributes and selecting a clustering attribute.
The proposed MPA algorithm is compared against four baseline algorithms: Maximum Significance Attribute (MSA), Information-Theoretic Dependency Roughness (ITDR), Maximum Indiscernibility Attribute (MIA) and classical K-Means. Seven small data sets from previously studied research cases and 21 UCI repository and benchmark datasets are used for validation. The results, presented in tabular and graphical form, show that the proposed MPA algorithm outperforms the baselines on all data sets. MPA improves rough accuracy over MSA, ITDR and MIA by 54.42%, and reduces computational complexity compared to MSA, ITDR and MIA, with 77.11% less time and 58.66% fewer iterations. Similarly, an improvement of up to 97.35% in overall purity is observed for MPA against MSA, ITDR and MIA, and MPA increases the overall accuracy of simple K-Means by up to 34.41%. Hence, the proposed MPA offers a promising solution to the categorical data clustering problem.
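The "pick the attribute with the maximum mean partitioning value" step can be sketched as follows. The scoring function here is a generic stand-in (the mean positive-region dependency of each candidate on the remaining attributes), not the paper's exact PRI/MMP formulas, and the data are invented.

```python
# Illustrative clustering-attribute selection in the spirit of MMP:
# score each attribute by how purely it partitions the others, on average,
# and pick the maximum. Scoring function and data are stand-ins.
data = [
    {"colour": "red",  "size": "big",   "tex": "soft"},
    {"colour": "red",  "size": "big",   "tex": "hard"},
    {"colour": "blue", "size": "small", "tex": "soft"},
    {"colour": "blue", "size": "small", "tex": "hard"},
]
attrs = ["colour", "size", "tex"]

def blocks(a):
    """Objects (by index) grouped by their value of attribute a."""
    out = {}
    for i, row in enumerate(data):
        out.setdefault(row[a], set()).add(i)
    return list(out.values())

def dependency(a, b):
    """Fraction of objects whose a-block is pure w.r.t. attribute b."""
    pos = set()
    for blk in blocks(a):
        if len({data[i][b] for i in blk}) == 1:
            pos |= blk
    return len(pos) / len(data)

def mean_score(a):
    others = [b for b in attrs if b != a]
    return sum(dependency(a, b) for b in others) / len(others)

best = max(attrs, key=mean_score)
print(best, {a: round(mean_score(a), 2) for a in attrs})
```

On this toy table `colour` and `size` partition each other perfectly while `tex` cuts across both, so the selection prefers `colour` as the clustering attribute.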

    Neural Techniques for Improving the Classification Accuracy of Microarray Data Set using Rough Set Feature Selection Method

    Abstract---Classification, a data mining task, is an effective method for classifying data in the process of knowledge discovery. Classification algorithms are widely used in the medical field to classify medical data for diagnosis. Feature selection increases the accuracy of a classifier because it eliminates irrelevant attributes. This paper analyses the performance of neural network classifiers with and without feature selection, in terms of accuracy and efficiency, in building a model on four different datasets. It provides a rough set feature selection scheme and evaluates the relative performance of four neural network classification procedures incorporating it: the Learning Vector Quantisation (LVQ) variants LVQ1, LVQ3 and optimized-learning-rate LVQ1 (OLVQ1), and the Self-Organizing Map (SOM). Experimental results show that LVQ3 is an appropriate classification method and makes it possible to construct high-performance classification models for microarray data.
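The LVQ1 update rule underlying the compared classifiers can be sketched briefly: move the winning prototype toward a training sample of the same class and away from one of a different class. The data, prototypes and learning rate below are illustrative, not the paper's setup.

```python
import numpy as np

# Minimal LVQ1 sketch on a two-class toy problem.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
protos = np.array([[0.2, 0.1], [0.8, 0.9]])  # one prototype per class
proto_cls = np.array([0, 1])
lr = 0.1                                     # fixed learning rate

for epoch in range(20):
    for xi, yi in zip(X, y):
        j = np.argmin(np.linalg.norm(protos - xi, axis=1))  # winner
        sign = 1.0 if proto_cls[j] == yi else -1.0          # attract/repel
        protos[j] += sign * lr * (xi - protos[j])

pred = [proto_cls[np.argmin(np.linalg.norm(protos - xi, axis=1))]
        for xi in X]
print(pred)
```

OLVQ1 differs mainly in giving each prototype its own adaptive learning rate, and LVQ3 updates two winning prototypes at once under a window condition; both refine this same attract/repel step.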

    Port throughput influence factors based on neighborhood rough sets: an exploratory study

    Purpose: The purpose of this paper is to devise an efficient method for analysing the importance of port throughput influence factors. Design/methodology/approach: Neighborhood rough sets are applied to the factor selection problem. First, the throughput index system is established. Then we build the attribute reduction model using an updated numerical attribute reduction algorithm based on neighborhood rough sets, optimized for high efficiency. Finally, the article performs an empirical validation using Guangzhou Port throughput and influencing-factor data for the years 2000 to 2013. Findings: Through the model and algorithm, port enterprises can identify the importance of port throughput factors, which can support their decisions. Research limitations: The empirical data are historical data from 2000 to 2013, so the amount of data is small. Practical implications: The results support port business investment, decision-making and risk control, and also assist port enterprises' or other researchers' throughput forecasting. Originality/value: We establish a throughput index system and optimize the algorithm for efficient performance. Peer Reviewed
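Neighborhood rough sets extend the classical model to numeric attributes by replacing equivalence classes with δ-neighborhoods under a distance. A minimal sketch of that idea follows; the data, threshold and labels are invented and are not the Guangzhou Port data.

```python
import numpy as np

# Neighborhood rough set sketch: one numeric influence factor,
# a binary outcome, and a delta-neighborhood relation.
X = np.array([[0.1], [0.15], [0.85], [0.9], [0.95]])  # factor values
y = np.array([0, 0, 0, 1, 1])                          # e.g. low/high throughput
delta = 0.1

def neighborhood(i):
    """Objects within distance delta of object i."""
    return {j for j in range(len(X))
            if np.linalg.norm(X[i] - X[j]) <= delta}

# Positive region: objects whose whole neighborhood shares their label.
pos = {i for i in range(len(X))
       if {y[j] for j in neighborhood(i)} == {y[i]}}
gamma = len(pos) / len(X)  # neighborhood dependency of y on this factor
print(sorted(pos), gamma)
```

A factor with a high dependency `gamma` separates the outcome cleanly and would survive attribute reduction; here the overlapping values around 0.85 to 0.95 fall into the boundary and pull the dependency down.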

    The Resemblance Structure of Natural Kinds: A Formal Model for Resemblance Nominalism

    278 p. The aim of this thesis is to better understand the ways natural kinds are related to each other by species-genus relations, and the ways the members of a kind are related to each other by resemblance relations, by making use of formal models of kinds. This is done by first analysing a Minimal Conception of Natural Kinds and then reconstructing it from the ontological assumptions of Resemblance Nominalism. The questions addressed are: (1) What is the external structure of kinds? In what ways are kinds related to each other by species-genus relations? (2) What is the internal structure of kinds? In what sense are the instances of a kind similar enough to each other? According to the Minimal Conception of Kinds, kinds have two components: a set of members of the kind (the extension) and a set of natural attributes common to these objects (the intension). Several interesting features of this conception are discussed by making use of the mathematical theory of concept lattices. First, such structures provide a model for contemporary formulations of syllogistic logic. Second, kinds are ordered to form a complete lattice that follows Kant's law of the duality between extension and intension, according to which the extension of a kind is inversely related to its intension. Finally, kinds are shown to have Aristotelian definitions in terms of genera and specific differences. Overall, this results in a description of the specificity relations of kinds as an algebraic calculus. According to Resemblance Nominalism, attributes or properties are classes of similar objects. Such an approach faces Goodman's companionship and imperfect community problems. In order to deal with these, a specific nominalism, namely Aristocratic Resemblance Nominalism, is chosen; according to it, attributes are classes of objects resembling a given paradigm.
A model for it is introduced by making use of the mathematical theory of similarity structures and of some results on the topic of quasianalysis. Two other models (the polar model and an order-theoretic model) are considered and shown to be equivalent to the previous one. The main result is that the class of lattices of kinds that a nominalist can recover uniquely from these assumptions is that of complete coatomistic lattices. Several other related results are obtained, including a generalization of the similarity model that allows for paradigms with several properties and properties with several paradigms. The conclusion is that, under nominalist assumptions, the internal structure of kinds is fixed by paradigmatic objects, and the external structure of kinds is that of a coatomistic lattice satisfying the Minimal Conception of Kinds.
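The extension/intension duality the thesis builds on is formalised by the two derivation operators of formal concept analysis: one maps a set of objects to their shared attributes, the other maps a set of attributes to the objects carrying all of them. A minimal sketch with an invented object-attribute context:

```python
# Formal concept analysis derivation operators on a toy context.
# A formal concept (a "kind" in the thesis's model) is a fixed point:
# extent(intent(A)) == A. Objects and attributes are illustrative.
context = {
    "sparrow": {"flies", "feathered"},
    "penguin": {"feathered"},
    "bat":     {"flies"},
}
all_attrs = {"flies", "feathered"}

def intent(objs):
    """Attributes shared by all the given objects."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set(all_attrs)

def extent(attrs):
    """Objects possessing all the given attributes."""
    return {o for o, a in context.items() if attrs <= a}

A = {"sparrow"}
print(intent(A), extent(intent(A)))  # a fixed point, hence a concept
```

The inverse relationship between extension and intension mentioned above falls out directly: enlarging the object set can only shrink the shared-attribute set, and vice versa, which is what orders the concepts into a complete lattice.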