
    Extending Attribute-Oriented Induction as a Key-Preserving Data Mining Method


    Attribute oriented induction with star schema

    This paper proposes star schema attribute induction as a new attribute induction paradigm and as an improvement over current attribute-oriented induction. The novel star schema attribute induction is compared against current attribute-oriented induction based on characteristic rules and a non-rule-based concept hierarchy, by implementing both approaches. The novel approach introduces several improvements: the threshold number used as the maximum-tuple control on the generalization result is eliminated, there is no ANY as the most general concept, the concept hierarchy is replaced with a concept tree, the generalization strategy steps are simplified, and the attribute-oriented induction algorithm is eliminated. The novel star schema attribute induction is more powerful than current attribute-oriented induction, since it produces a small number of final generalized tuples and no ANY values in the results. Comment: 23 pages, IJDM
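
    A minimal sketch of the classical attribute-oriented induction generalization step that the paper sets out to improve (not the proposed star schema variant); the concept hierarchy, attribute values and tuple threshold below are illustrative assumptions.

```python
# Minimal sketch of classical attribute-oriented induction (AOI):
# tuples are generalized by climbing a concept hierarchy until the
# number of distinct generalized tuples falls below a threshold.
# The hierarchy, values and threshold here are illustrative only.

from collections import Counter

# Hypothetical concept hierarchy: value -> more general concept.
CITY_HIERARCHY = {
    "Jakarta": "Indonesia", "Bandung": "Indonesia",
    "Tokyo": "Japan", "Osaka": "Japan",
    "Indonesia": "Asia", "Japan": "Asia",
}

def generalize(value, hierarchy):
    """Climb one level in the concept hierarchy, or keep the value."""
    return hierarchy.get(value, value)

def aoi(values, hierarchy, threshold):
    """Generalize a single attribute until <= threshold distinct values remain."""
    current = list(values)
    while len(set(current)) > threshold:
        generalized = [generalize(v, hierarchy) for v in current]
        if generalized == current:   # no further generalization possible
            break
        current = generalized
    # Merge identical generalized tuples and keep a count, as AOI does.
    return Counter(current)

data = ["Jakarta", "Bandung", "Tokyo", "Osaka", "Jakarta"]
print(aoi(data, CITY_HIERARCHY, threshold=2))
# e.g. Counter({'Indonesia': 3, 'Japan': 2})
```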

    Privacy Preserving Data Mining, Evaluation Methodologies

    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. Because of the large number of possible techniques that can be used to achieve this goal, it is necessary to provide standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. Moreover, because sanitization modifies the data, an important issue, especially for critical data, is to preserve the quality of the data. However, to the best of our knowledge, no approaches have been developed that deal with the issue of data quality in the context of PPDM algorithms. This report explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. Moreover, because of the "environment related" nature of data quality, a structure to represent constraints and information relevance related to data is presented. The resulting evaluation core process is then presented as part of a more general three-step evaluation framework, which also takes into account other aspects of algorithm evaluation such as efficiency, scalability and level of privacy. JRC.G.6 - Sensors, radar technologies and cybersecurity
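
    A toy sketch of how a sanitized dataset might be compared with the original using two simple evaluation parameters, one for data quality and one for privacy level; the metric definitions below are illustrative assumptions, not the report's formal evaluation parameters.

```python
# Illustrative comparison of a sanitized dataset against the original
# using two toy evaluation parameters: information loss (data quality)
# and hiding rate (privacy level). Both definitions are assumptions
# made for illustration only.

def information_loss(original, sanitized):
    """Fraction of cells modified by sanitization (lower = better quality)."""
    changed = sum(
        1
        for row_o, row_s in zip(original, sanitized)
        for a, b in zip(row_o, row_s)
        if a != b
    )
    total = sum(len(row) for row in original)
    return changed / total if total else 0.0

def hiding_rate(sensitive_items, sanitized):
    """Fraction of sensitive items no longer present (higher = more private)."""
    remaining = {item for row in sanitized for item in row}
    hidden = [item for item in sensitive_items if item not in remaining]
    return len(hidden) / len(sensitive_items) if sensitive_items else 1.0

original = [["bread", "milk"], ["bread", "beer"], ["milk", "beer"]]
sanitized = [["bread", "milk"], ["bread", "*"], ["milk", "*"]]

print(information_loss(original, sanitized))   # 2/6 ~= 0.33
print(hiding_rate(["beer"], sanitized))        # 1.0
```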

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its growing popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving the problems discussed in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant developments in the field of data mining.

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
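
    A minimal sketch of the general idea of using a blog's RSS feed to anchor HTML content extraction: feed entries provide the title and permalink, and the post body is located in the linked HTML page. The feed URL and the body-finding heuristic below are illustrative assumptions, not the BlogForever D2.6 methodology itself.

```python
# Sketch: use RSS feed entries to locate blog posts, then pull a candidate
# post body from the linked HTML page. Feed URL and heuristic are assumptions.

import feedparser                 # pip install feedparser
import requests
from bs4 import BeautifulSoup     # pip install beautifulsoup4

FEED_URL = "https://example.com/blog/feed"   # hypothetical blog feed

def extract_posts(feed_url):
    feed = feedparser.parse(feed_url)
    posts = []
    for entry in feed.entries:
        html = requests.get(entry.link, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        # Crude heuristic: take the longest <div> text as the candidate body.
        body = max(
            (div.get_text(" ", strip=True) for div in soup.find_all("div")),
            key=len,
            default="",
        )
        posts.append({"title": entry.title, "url": entry.link, "body": body})
    return posts

for post in extract_posts(FEED_URL):
    print(post["title"], len(post["body"]))
```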

    Simplification logic for the management of unknown information

    This paper aims to contribute to the extension of classical Formal Concept Analysis (FCA) to allow the management of unknown information. In a preliminary paper, we defined a new kind of attribute implication to represent the knowledge derivable from the information currently available. The whole FCA framework has to be appropriately extended to manage unknown information. This paper introduces a new logic for reasoning with this kind of implication, which belongs to the family of logics with an underlying Simplification paradigm. Specifically, we introduce a new algebra, named weak dual Heyting algebra, that allows us to extend the Simplification logic to these new implications. To provide a solid framework, we also prove its soundness and completeness and show the advantages of the Simplification paradigm. Finally, to allow further use of this extension of FCA in applications, an algorithm for automated reasoning, built directly from the logic, is defined. Funding for open access charge: Universidad de Málaga / CBUA. This article is supported by Grants TIN2017-89023-P, PRE2018-085199 and PID2021-127870OB-I00 of the Ministry of Science and Innovation of Spain and UMA2018-FEDERJA-001 of the Junta de Andalucía and the European Social Fund.
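
    For orientation, a short sketch of the classical closure of an attribute set under a set of attribute implications in standard FCA (complete information); this illustrates the kind of automated reasoning the paper extends, and is not the paper's Simplification-logic algorithm for unknown information.

```python
# Classical closure of an attribute set under implications A -> B in
# standard FCA (complete-information setting). Shown only as background;
# the paper's algorithm handles unknown information, which this does not.

def closure(attributes, implications):
    """Return the closure of `attributes` under (premise, conclusion) pairs."""
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed

implications = [
    ({"a"}, {"b"}),        # a -> b
    ({"b", "c"}, {"d"}),   # b c -> d
]
print(closure({"a", "c"}, implications))   # {'a', 'b', 'c', 'd'}
```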