3 research outputs found

    DOMINANT ATTRIBUTE AND MULTIPLE SCANNING APPROACHES FOR DISCRETIZATION OF NUMERICAL ATTRIBUTES

    The rapid development of high-throughput technologies and database management systems has made it possible to produce and store large amounts of data. However, making sense of big data and discovering knowledge from it is a compounding challenge. Generally, data mining techniques search for information in data sets and express the knowledge gained in the form of trends, regularities, patterns or rules. Rules are frequently identified automatically by a technique called rule induction, which is the most important technique in data mining and machine learning and was developed primarily to handle symbolic data. However, real-life data often contain numerical attributes. To fully utilize the power of rule induction techniques, data mining therefore employs an essential preprocessing step called discretization, which converts numerical data into symbolic data. Here we present two entropy-based discretization techniques, known as the dominant attribute approach and the multiple scanning approach. These approaches were implemented as two explicit algorithms in the Java programming language, and experiments were conducted by applying each algorithm separately to seventeen well-known numerical data sets. The resulting discretized data sets were used for rule induction by the LEM2 (Learning from Examples Module 2) algorithm. For each data set in the multiple scanning approach, experiments were repeated with an increasing number of scans until the interval counts stabilized. Preliminary results from this study indicate that the multiple scanning approach performed better than the dominant attribute approach in that it produced comparatively smaller and simpler rule sets.
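The basic operation behind entropy-based discretization can be sketched as follows. This is a minimal illustration, not the dominant attribute or multiple scanning algorithm from the work above: it only shows how a single cut point can be chosen for one numerical attribute by minimizing the conditional entropy of the decision. The function names and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, conditional_entropy) for the best binary split.

    Candidate cuts are midpoints between consecutive distinct values.
    The score of a cut is the size-weighted entropy of the two blocks
    'value < cut' and 'value >= cut'; the lowest score wins.
    Returns (None, entropy(labels)) if no cut is possible.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_cut, best_h = None, entropy(labels)
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        left = [lab for v, lab in pairs if v < cut]
        right = [lab for v, lab in pairs if v >= cut]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best_cut is None or h < best_h:
            best_cut, best_h = cut, h
    return best_cut, best_h


# Hypothetical toy attribute: the decision changes between 2.0 and 3.0.
cut, h = best_cut_point([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
print(cut, h)
```

A full discretizer would apply such a step repeatedly, across attributes and sub-tables, until the discretized data are as consistent as the original; how that repetition is organized is what distinguishes the two approaches named in the abstract.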

    Dwuetapowa metoda eksploracji danych pozyskiwanych z obrazów cyfrowych (A two-step method for mining data obtained from digital images)

    The main aim of this work is to develop a two-step method of extracting knowledge from digital images. The method integrates digital image analysis, aimed at extracting quantitative and qualitative features, with knowledge extraction methods. It makes possible a new kind of exploratory research on digital images and allows rule-based knowledge bases to be created for decision support systems. Analysis of a coherent series of images allows interesting qualitative and quantitative features to be extracted, and further exploratory analysis of these features may reveal regularities, generalizations, connections and relationships. The work reviews the literature on available image analysis and processing algorithms as well as on knowledge discovery methods. The computer program that was developed implements the proposed method and provides an environment for its experimental verification. The correctness of the system was checked on real data: computed tomography images of tooth surfaces, dermatoscopic images of skin cancer and microscopic images of Friction Stir Welding joints. The proposed method of mining data from digital images is divided into two steps. The first step uses selected image analysis methods focused on extracting quantitative and qualitative features of the objects shown in the images. The basic input for these methods is a series of images depicting the analyzed objects. At the feature extraction step, (1) the number and type of the features, the names of the attributes and the names or ranges of the feature values are determined, and (2) a set of graphical transformations is established that is applied to the images to standardize them and to obtain the required features. The result of this stage is a data table, that is, an information system.
These data are then pre-processed, which covers the handling of missing values and outliers and the digitization of the data. The pre-processed data form a decision table, in which the decision attribute and the conditional attributes are identified. The decision table is the input to the second step of the proposed method. This step involves data mining and ends with the generation of decision rules. It is preceded by a consistency analysis, for which two methods are available: qualitative and quantitative. Data mining follows an approach based on rough set theory. The results of the exploratory research are decision rules generated by various methods. Cross-validation is used to assess the quality of the method. At each stage, domain experts can be involved; using a dedicated system, they can verify the results obtained against the input images. The last element of the method is the ability to use the resulting knowledge base to implement a domain-specific decision support system. This is done by a forward inference module. Inference can be used both for practical and experimental verification of the resulting rule base and for building a system that is ready for deployment to users. The proposed method maximizes the automation of knowledge acquisition from images while still making use of the knowledge and competence of domain experts.
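The forward inference step mentioned above can be illustrated with a short sketch. It is a generic forward-chaining loop over decision rules, written under assumptions: the rule representation, the function name and the attribute names are invented for this example and are not taken from the described system.

```python
def forward_inference(facts, rules):
    """Fire decision rules until no new attribute value can be derived.

    facts: dict mapping attribute name -> known value for one case.
    rules: list of (conditions, (attribute, value)) pairs, where
           conditions is a dict attribute -> required value.
    Returns a new dict containing the facts plus all derived values.
    """
    known = dict(facts)
    changed = True
    while changed:
        changed = False
        for conditions, (attribute, value) in rules:
            if attribute in known:
                continue  # already given or derived; do not overwrite
            if all(known.get(a) == v for a, v in conditions.items()):
                known[attribute] = value
                changed = True
    return known


# Hypothetical rule base: attribute names and values are illustrative only.
rules = [
    ({"class": "defect"}, ("action", "inspect")),
    ({"texture": "rough", "shape": "irregular"}, ("class", "defect")),
]
case = {"texture": "rough", "shape": "irregular"}
result = forward_inference(case, rules)
print(result)
```

The second rule fires first and derives `class`, after which the first rule can fire and derive `action`; this chaining of conclusions into later conditions is what makes the inference "forward".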

    A semantical and computational approach to covering-based rough sets
