
    Propositional Satisfiability Method in Rough Classification Modeling for Data Mining

    The fundamental problem in data mining is whether all of the available information is always necessary to represent the information system (IS). The goal of data mining is to find rules that model the world sufficiently well. These rules consist of conditions over attribute-value pairs, called the description, and a classification of the decision attribute. However, the set of all decision rules generated from all condition attributes can be too large and can contain many chaotic rules that are not appropriate for classifying unseen objects. A search for the best rules must therefore be performed, because it is not feasible to assess the quality of every rule generated from the information system. In the rough set approach to data mining, the set of interesting rules is determined using the notion of a reduct. Rules are generated from reducts by binding the condition attribute values of the object class from which the reduct originates to the corresponding attributes. It is important for reducts to be minimal in size: minimal reducts decrease the number of condition attributes used to generate rules, and shorter rules are expected to classify new cases better because of their larger support in the data; in this sense, the most stable and most frequently appearing reducts give the best decision rules.

    The main work of the thesis is the generation of a classification model with fewer rules, shorter rule length, and good accuracy. A propositional satisfiability method for rough classification modeling is proposed. Two models, Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP), are introduced to represent the minimal reduct computation problem. The models involve a theoretical formalization of the discernibility relation of a decision system (DS) as an Integer Programming (IP) model. The proposed models are embedded within a default rules generation framework, yielding a new rough classification method. An improved branch and bound strategy is proposed to solve the SIP and DRIP models, pruning part of the search: it uses a conflict analysis procedure to remove unnecessary attribute assignments and to determine the branch level to which the search backtracks in a non-chronological manner.

    Experiments were conducted on five datasets from the UCI machine learning repository and domain theories. The total number of rules generated for the best classification model was recorded, with 30% of the data used for training and 70% kept as test data. The classification accuracy, number of rules, and maximum rule length obtained with the SIP/DRIP method were compared with those of other rough set methods such as Genetic Algorithm (GA), Johnson, Holte 1R, Dynamic, and Exhaustive. Four of the datasets were then chosen for further experiments. The improved search strategy implements non-chronological backtracking, which can prune a large portion of the search space. The experimental results show that the proposed SIP/DRIP method is a successful rough classification modeling method. Its outstanding feature is the reduced number of rules in all classification models, and SIP/DRIP generated shorter rules than the other methods on most datasets. The results for the proposed search strategy indicate that the best performance can be achieved at a lower level, i.e., along a shorter path, of the search tree. The SIP/DRIP method also showed promise against other commonly used classifiers such as neural networks and statistical methods. This model is expected to be able to represent the knowledge of the system efficiently.
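    The reduct computation sketched in this abstract can be made concrete with a small example: collect, for each pair of objects with different decisions, the set of condition attributes that discern them, then find a smallest attribute set hitting every such set. The sketch below is an illustrative reading of this rough set idea, not the thesis's SIP/DRIP implementation; all names are stand-ins, and the exhaustive subset search stands in for the branch and bound strategy.

    ```python
    # A minimal sketch, assuming a decision table given as (conditions, decision)
    # rows. Not the thesis code: names and the search are illustrative only.
    from itertools import combinations

    def discernibility_clauses(rows, n_attrs):
        """For each pair of objects with different decisions, collect the set
        of condition attributes that discern them (one clause per pair)."""
        clauses = set()
        for (x, dx), (y, dy) in combinations(rows, 2):
            if dx != dy:
                clause = frozenset(a for a in range(n_attrs) if x[a] != y[a])
                if clause:
                    clauses.add(clause)
        return clauses

    def minimal_reduct(clauses, n_attrs):
        """Smallest attribute set intersecting every clause, found by growing
        subset size -- a plain exhaustive stand-in for branch and bound."""
        for k in range(1, n_attrs + 1):
            for subset in combinations(range(n_attrs), k):
                s = set(subset)
                if all(s & c for c in clauses):
                    return s
        return set(range(n_attrs))

    # Toy decision table: (condition attribute values, decision)
    rows = [((0, 1, 0), 'yes'), ((1, 1, 0), 'no'), ((0, 0, 1), 'yes')]
    print(minimal_reduct(discernibility_clauses(rows, 3), 3))   # -> {0}
    ```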

    Twofold Integer Programming Model for Improving Rough Set Classification Accuracy in Data Mining.

    The fast-growing size of databases has resulted in great demand for tools capable of analyzing data with the aim of discovering new knowledge and patterns. These tools, known as Data Mining (DM) tools, will hopefully close the gap between the steady growth of information and the escalating demand to understand and exploit the value of such knowledge. One aim of DM is to discover decision rules for extracting meaningful knowledge. These rules consist of conditions over attribute-value pairs, called the descriptions, and decision attributes. Generating a good decision or classification model is therefore a major component of much data mining research. The classification approach produces a function that maps a data item into one of several predefined classes by taking a training dataset as input and building a model of the class attribute based on the remaining attributes.

    This research undertakes three main tasks. The first is to introduce a new rough model for minimum reduct selection and default rules generation, known as Twofold Integer Programming (TIP). The second is to enhance rule accuracy based on the first task, and the third is to classify new objects or cases. The TIP model is based on translating the discernibility relation of a Decision System (DS) into an Integer Programming (IP) model, which is solved using the branch and bound search method to generate the full reduct of the DS. The TIP model is then applied to the reduct to generate default rules, which in turn are used to classify unseen objects with satisfying accuracy. Apart from introducing the TIP model, this research also addresses the issues of missing values, discretization, and extraction of minimum rules. The treatment of missing values and discretization is carried out during the preprocessing stage, while the extraction of minimum rules is conducted after the default rules have been generated, in order to obtain the most useful discovered rules.

    Eight datasets from machine learning repositories and domain theories were tested with the TIP model, and the total number of rules, rule length, and rule accuracy of the generated rules were recorded. The rule and classification accuracy of the TIP method was compared with that of other methods: Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP) from rough sets, Genetic Algorithm (GA), the Johnson reducer, the Holte 1R method, Multiple Regression (MR), Neural Network (NN), the ID3 decision tree induction algorithm, and the C4.5 base learning algorithm; all classifiers commonly used in classification tasks. Based on the experimental results, the classification method using the TIP approach successfully performed the rule generation and classification tasks required of a classification operation. The considerably good accuracy obtained is mainly due to the right selection of relevant attributes. This research has shown that the TIP method can cater for different kinds of datasets and obtain a good rough classification model with promising results compared with other commonly used classifiers. It also opens a wide range of future work, including applying the proposed method in other areas such as web mining, text mining, or multimedia mining, and extending the proposed approach to parallel computing in data mining.
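    As a rough illustration of the translation this abstract describes, the sketch below encodes toy discernibility constraints as a 0-1 integer program whose optimum is a minimal reduct. It uses the PuLP library as a stand-in solver; the TIP formulation itself and its branch and bound details are not reproduced here, and the clause data is assumed for the example.

    ```python
    # A minimal sketch (assumes PuLP; not the TIP model from the thesis):
    # encode discernibility constraints of a decision system as a 0-1 IP
    # whose optimal solution is a minimal reduct.
    from pulp import LpProblem, LpVariable, LpMinimize, lpSum

    # Toy discernibility clauses: each set lists the condition attributes
    # that discern one pair of objects with different decisions (assumed data).
    clauses = [{0}, {0, 1, 2}, {1, 3}]
    n_attrs = 4

    prob = LpProblem("minimal_reduct", LpMinimize)
    x = [LpVariable(f"a{i}", cat="Binary") for i in range(n_attrs)]
    prob += lpSum(x)                       # objective: keep as few attributes as possible
    for clause in clauses:                 # each object pair must remain discernible
        prob += lpSum(x[a] for a in clause) >= 1
    prob.solve()
    print({i for i in range(n_attrs) if x[i].value() == 1})   # e.g. {0, 1}
    ```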

    An encompassing framework for Paraconsistent Logic Programs

    We propose a framework which extends Antitonic Logic Programs [Damásio and Pereira, in: Proc. 6th Int. Conf. on Logic Programming and Nonmonotonic Reasoning, Springer, 2001, p. 748] to an arbitrary complete bilattice of truth-values, where belief and doubt are explicitly represented. Inspired by Ginsberg and Fitting's bilattice approaches, this framework allows a precise definition of important operators found in logic programming, such as explicit and default negation. In particular, it leads to a natural semantic integration of explicit and default negation through the Coherence Principle [Pereira and Alferes, in: European Conference on Artificial Intelligence, 1992, p. 102], according to which explicit negation entails default negation. We then define Coherent Answer Sets and the Paraconsistent Well-Founded Model semantics, generalizing many paraconsistent semantics for logic programs, in particular Paraconsistent Well-Founded Semantics with eXplicit negation (WFSXp) [Alferes et al., J. Automated Reas. 14 (1) (1995) 93–147; Damásio, PhD thesis, 1996]. The framework is an extension of Antitonic Logic Programs for most cases, and is general enough to capture Probabilistic Deductive Databases, Possibilistic Logic Programming, Hybrid Probabilistic Logic Programs, and Fuzzy Logic Programming. We thus obtain a powerful mathematical formalism for dealing simultaneously with default, paraconsistent, and uncertainty reasoning. Results are provided on how our semantic framework deals with inconsistent information and with its propagation by the rules of the program.
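    The bilattice setting can be illustrated with its simplest instance, Belnap's four-valued bilattice FOUR, where each truth value is a (belief, doubt) pair ordered two ways. The sketch below is an informal illustration of that intuition only; its names are assumptions, not the paper's formal definitions.

    ```python
    # A minimal sketch of the bilattice FOUR: each value is a (belief, doubt)
    # pair of booleans. Illustrative only; not the paper's formalism.
    NONE  = (False, False)   # no information
    TRUE  = (True,  False)
    FALSE = (False, True)
    BOTH  = (True,  True)    # contradictory information (paraconsistent)

    def leq_knowledge(u, v):
        """Knowledge ordering <=_k: v carries at least the information of u."""
        return (u[0] <= v[0]) and (u[1] <= v[1])

    def leq_truth(u, v):
        """Truth ordering <=_t: v has at least as much belief, at most as much doubt."""
        return (u[0] <= v[0]) and (u[1] >= v[1])

    def explicit_neg(v):
        """Explicit negation swaps the belief and doubt components."""
        return (v[1], v[0])

    # NONE is the <=_k bottom and BOTH the <=_k top of FOUR:
    assert all(leq_knowledge(NONE, v) and leq_knowledge(v, BOTH)
               for v in (NONE, TRUE, FALSE, BOTH))
    ```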

    Rule Discovery


    Comparing the knowledge quality in rough classifier and decision tree classifier

    This paper presents a comparative study of two rule-based classifiers: the rough set classifier (Rc) and the decision tree classifier (DTc). The two techniques apply different approaches to classification but produce the same output structure with comparable results. In theory, different classifiers generate different sets of rules, i.e., different knowledge, even when applied to the same classification problem. The aim of this paper is therefore to investigate the quality of the knowledge produced by Rc and DTc when they are presented with similar problems. Four important performance metrics are used for comparison: classification accuracy, rule quantity, rule length, and rule coverage. Five datasets from the UCI Machine Learning repository are chosen and mined using the Rc toolkit ROSETTA, while the C4.5 algorithm in the WEKA application is chosen as the DTc rule generator. The experimental results show that both Rc and DTc are capable of generating quality knowledge, since most of the results are comparable. Rc stands out as the more accurate classifier and produces shorter, simpler rules with higher coverage, while DTc generates significantly fewer rules.
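    The four comparison metrics named in this abstract can be illustrated with a small sketch that evaluates a toy rule set over labelled rows. The rule representation and function names below are assumptions made for the example and are not those of ROSETTA or WEKA.

    ```python
    # A minimal sketch of the four metrics: accuracy, rule quantity,
    # rule length, and coverage. Illustrative representation only.
    def matches(rule, row):
        """A rule is a dict of attribute->value conditions plus a 'class' key."""
        return all(row.get(a) == v for a, v in rule.items() if a != "class")

    def evaluate(rules, data):
        covered, correct = 0, 0
        for row in data:
            firing = [r for r in rules if matches(r, row)]
            if firing:
                covered += 1
                if firing[0]["class"] == row["class"]:   # first matching rule wins
                    correct += 1
        return {
            "accuracy":   correct / len(data),
            "rule_count": len(rules),
            "avg_length": sum(len(r) - 1 for r in rules) / len(rules),
            "coverage":   covered / len(data),
        }

    rules = [{"outlook": "sunny", "class": "no"},
             {"outlook": "rain",  "class": "yes"}]
    data  = [{"outlook": "sunny",    "class": "no"},
             {"outlook": "rain",     "class": "yes"},
             {"outlook": "overcast", "class": "yes"}]
    print(evaluate(rules, data))   # accuracy 2/3, coverage 2/3, avg_length 1.0
    ```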

    The VEX-93 environment as a hybrid tool for developing knowledge systems with different problem solving techniques

    The paper describes VEX-93, a hybrid environment for developing knowledge-based and problem solver systems. It integrates methods and techniques from artificial intelligence, image and signal processing, and data analysis, which can be mixed. An intelligent toolbox provides two hierarchical levels of reasoning: one upper strategic inference engine and four lower ones containing specific reasoning models: truth-functional (rule-based), probabilistic (causal networks), fuzzy (rule-based), and case-based (frames). Image/signal processing and analysis capabilities are provided in the form of programming languages with more than one hundred primitive functions, and user-made programs can be embedded within knowledge bases, allowing the combination of perception and reasoning. The data analyzer toolbox contains a collection of numerical classification, pattern recognition, and ordination methods, with neural network tools and a database query language at the inference engines' disposal. VEX-93 is an open system able to communicate with external computer programs relevant to a particular application. Metaknowledge can be used to elaborate conclusions, and man-machine interaction includes, besides windows and graphical interfaces, acceptance of voice commands and production of speech output. The system was conceived for real-world applications in general domains; a concrete medical diagnostic support system, at present under completion as a Cuban-Spanish project, is mentioned as an example. The present version of VEX-93 is a large system of about one and a half million lines of C code and runs on microcomputers under Windows 3.1.
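    A minimal sketch of the two-level control idea described above, with an upper strategic engine routing a query to one of the four lower reasoning models; this only illustrates the architecture as described, and none of it is VEX-93 code.

    ```python
    # Illustrative two-level dispatch: the four lower engines are stubs.
    def rule_based(q):    return f"truth-functional answer to {q!r}"
    def probabilistic(q): return f"causal-network answer to {q!r}"
    def fuzzy(q):         return f"fuzzy answer to {q!r}"
    def case_based(q):    return f"case-based answer to {q!r}"

    ENGINES = {"truth": rule_based, "prob": probabilistic,
               "fuzzy": fuzzy, "cases": case_based}

    def strategic_engine(query, strategy):
        """Upper level: route the query to the lower engine the strategy names."""
        return ENGINES[strategy](query)

    print(strategic_engine("diagnosis?", "prob"))
    ```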

    Zero-one laws with respect to models of provability logic and two Grzegorczyk logics

    It was shown in the late 1960s that each formula of first-order logic without constants and function symbols obeys a zero-one law: as the number of elements of finite models increases, every formula holds either in almost all or in almost no models of that size. Therefore, many properties of models, such as having an even number of elements, cannot be expressed in the language of first-order logic. Halpern and Kapron proved zero-one laws for classes of models corresponding to the modal logics K, T, S4, and S5, and for frames corresponding to S4 and S5. In this paper, we prove zero-one laws for provability logic and its two siblings, Grzegorczyk logic and weak Grzegorczyk logic, with respect to model validity. Moreover, we axiomatize validity in almost all relevant finite models, leading to three different axiom systems.
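    Formally, the zero-one law referred to above can be stated as follows (a standard formulation; the notation is assumed here, not taken from the paper):

    ```latex
    % Zero-one law: for every sentence \varphi (here, of first-order logic
    % without constants and function symbols), the fraction of n-element
    % models satisfying \varphi tends to 0 or to 1.
    \[
      \mu_n(\varphi) \;=\;
      \frac{\lvert \{\, \mathfrak{M} : |\mathfrak{M}| = n,\ \mathfrak{M} \models \varphi \,\} \rvert}
           {\lvert \{\, \mathfrak{M} : |\mathfrak{M}| = n \,\} \rvert},
      \qquad
      \lim_{n \to \infty} \mu_n(\varphi) \in \{0, 1\}.
    \]
    ```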