34 research outputs found

    Propositional Satisfiability Method in Rough Classification Modeling for Data Mining

    The fundamental problem in data mining is whether the whole of the available information is always necessary to represent the information system (IS). The goal of data mining is to find rules that model the world sufficiently well. These rules consist of conditions over attribute-value pairs, called the description, and a classification of the decision attribute. However, the set of all decision rules generated from all conditional attributes can be too large and can contain many chaotic rules that are not appropriate for classifying unseen objects. A search for the best rules must therefore be performed, because it is not possible to determine the quality of all rules generated from the information system. In the rough set approach to data mining, the set of interesting rules is determined using the notion of a reduct. Rules are generated from reducts by binding the condition attribute values of the object class from which the reduct originated to the corresponding attributes. It is important for the reducts to be minimal in size. Minimal reducts decrease the number of conditional attributes used to generate rules. Rules of smaller size are expected to classify new cases more accurately because of their larger support in the data, and in some sense the most stable and frequently appearing reducts give the best decision rules. The main work of the thesis is the generation of a classification model that contains a smaller number of rules, of shorter length, with good accuracy. A propositional satisfiability method for rough classification modeling is proposed in this thesis. Two models, Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP), are proposed to represent the minimal reduct computation problem. The models translate a theoretical formalism of the discernibility relation of a decision system (DS) into an Integer Programming (IP) model. 
The proposed models were embedded within the default rules generation framework, yielding a new rough classification method. An improved branch and bound strategy that prunes part of the search is proposed to solve the SIP and DRIP models. The strategy uses a conflict analysis procedure to remove unnecessary attribute assignments and to determine the branch level to which the search backtracks in a non-chronological manner. Experiments were run on five data sets from the UCI machine learning repository and domain theories. The total number of rules generated for the best classification model was recorded, where 30% of the data were used for training and 70% were kept as test data. The classification accuracy, the number of rules and the maximum rule length obtained from the SIP/DRIP method were compared with other rough set methods such as the Genetic Algorithm (GA), Johnson, Holte 1R, Dynamic and Exhaustive methods. Four of the data sets were then chosen for further experiments. The improved search strategy implements non-chronological backtracking, which potentially prunes a large portion of the search space. The experimental results showed that the proposed SIP/DRIP method is a successful method in rough classification modeling. The outstanding feature of this method is the reduced number of rules in all classification models. SIP/DRIP generated shorter rules than the other methods on most data sets. The proposed search strategy indicated that the best performance can be achieved at a lower level, or shorter path, of the search tree. The SIP/DRIP method also showed promising results compared with other commonly used classifiers such as neural networks and statistical methods. This model is expected to be able to represent the knowledge of the system efficiently.
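
The minimal reduct problem described in this abstract is commonly read as a set-covering integer program: one binary variable per condition attribute, one constraint per pair of objects with different decisions (at least one attribute that discerns the pair must be selected), and the objective of minimizing the number of selected attributes. The following is a minimal sketch of that reading only. It uses exhaustive search rather than the branch and bound with conflict analysis described in the thesis, and the function names and the toy decision table are hypothetical, not taken from the source.

```python
from itertools import combinations


def discernibility_clauses(rows, decisions):
    """For each pair of objects with different decisions, collect the set of
    condition attribute indices on which the two objects differ."""
    clauses = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(k for k in range(len(a)) if a[k] != b[k])
            if diff:  # skip inconsistent pairs that no attribute can separate
                clauses.append(diff)
    return clauses


def minimal_reduct(rows, decisions):
    """Return a smallest attribute subset that intersects every clause.

    Equivalent to: minimize sum(x_k) subject to sum(x_k for k in clause) >= 1
    for every clause, with x_k in {0, 1}. Solved here by brute force
    (an assumption for illustration; exponential in the number of attributes).
    """
    clauses = discernibility_clauses(rows, decisions)
    n = len(rows[0])
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if all(chosen & clause for clause in clauses):
                return subset
    return tuple(range(n))


# Hypothetical toy decision table: three condition attributes, one decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
reduct = minimal_reduct(rows, decisions)
```

In this toy table attribute 0 alone separates every pair of objects with different decisions, so the search stops at subsets of size one.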

    Problem Restructuring in Integer Programming for Reduct Searching

    Standard Integer Programming / Decision Related Integer Programming (SIP/DRIP) is a reduct searching system that finds the reducts in an information system. These reducts are the minimal sets of attributes of the information system that are useful in classification tasks. They can describe the whole information system when discerning between objects. In effect, they are very useful for generating rules when solving the classification problem that is inherent in data mining. The thesis focuses mainly on improving the performance of the original SIP/DRIP algorithm. By using problem restructuring, search time and memory use are reduced. At the same time, the approach adheres to an essential criterion of the original method: to improve performance without sacrificing the quality of the reduct. Problem restructuring changes the input of SIP/DRIP without losing any of the input's essential properties. In other words, no loss of knowledge occurs with problem restructuring. Only the structural order changes, with the contents kept intact. In principle, this ensures that no adverse distortion occurs within SIP/DRIP. Restructuring is done by speculating on a promising structure for the input to SIP/DRIP based on the potential of the attributes in the information system. It uses an inexpensive approach to predict that potential, based simply on the total covering of each attribute within the information system. Although this measure is only an approximation, it is shown to work. For the experiments, five data sets were taken from the UCI machine learning repository. Results are measured by comparing the performance of SIP/DRIP with and without problem restructuring. In addition, the lengths of the reducts generated by both approaches are compared to ensure that no deterioration in quality occurred along the way. Experimental results have shown that problem restructuring generally improves SIP/DRIP for all the data sets. 
This means that, on average, it enhances the performance of SIP/DRIP. The consumption of time and space was reduced quite significantly. Furthermore, the quality of the solutions was successfully maintained: there was no increase in reduct length when using it. The concept offered by the approach is an additional component of SIP/DRIP that complements its search process. By giving more consideration to the initial problem space, and not just to the search for the solution, the performance of SIP/DRIP has been modestly improved.
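
One way to picture the restructuring step is as a reordering of attributes by how many discernibility constraints each one covers, so that the search considers the most promising attributes first. The sketch below assumes that "total covering" means this count, which is an interpretation and not a definition given in the abstract; `order_by_coverage` and the sample clauses are hypothetical.

```python
from collections import Counter


def order_by_coverage(clauses):
    """Order attribute indices by the number of clauses they appear in,
    most-covering first; ties are broken by attribute index."""
    counts = Counter(attr for clause in clauses for attr in clause)
    return sorted(counts, key=lambda attr: (-counts[attr], attr))


# Hypothetical clauses: each set lists the attributes that discern one pair.
clauses = [{0, 2}, {0, 1, 2}, {1}, {2}]
ordering = order_by_coverage(clauses)  # attribute 2 appears in three clauses
```

Only the order in which attributes are presented changes; the clauses themselves are untouched, which matches the abstract's point that the contents of the input are kept intact.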

    Combining rough and fuzzy sets for feature selection
