4,513 research outputs found

    Web Usage Mining with Evolutionary Extraction of Temporal Fuzzy Association Rules

    Get PDF
    In Web usage mining, fuzzy association rules that have a temporal property can provide useful knowledge about when associations occur. However, there is a problem with traditional temporal fuzzy association rule mining algorithms. Some rules occur at the intersection of fuzzy sets' boundaries where there is less support (lower membership), so the rules are lost. A genetic algorithm (GA)-based solution is described that uses the flexible nature of the 2-tuple linguistic representation to discover rules that occur at the intersection of fuzzy set boundaries. The GA-based approach is enhanced from previous work by including a graph representation and an improved fitness function. A comparison of the GA-based approach with a traditional approach on real-world Web log data discovered rules that were lost with the traditional approach. The GA-based approach is recommended as complementary to existing algorithms, because it discovers extra rules. (C) 2013 Elsevier B.V. All rights reserved

    Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics

    Get PDF
    National statistical agencies around the world publish tabular summaries based on combined employer-employee (ER-EE) data. The privacy of both individuals and business establishments that feature in these data are protected by law in most countries. These data are currently released using a variety of statistical disclosure limitation (SDL) techniques that do not reveal the exact characteristics of particular employers and employees, but lack provable privacy guarantees limiting inferential disclosures. In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy. We show that state-of-the-art differentially private algorithms add too much noise for the output to be useful. Instead, we identify the privacy requirements mandated by current interpretations of the relevant laws, and formalize them using the Pufferfish framework. We then develop new privacy definitions that are customized to ER-EE data and satisfy the statutory privacy requirements. We implement the experiments in this paper on production data gathered by the U.S. Census Bureau. An empirical evaluation of utility for these data shows that for reasonable values of the privacy-loss parameter ϵ≥1, the additive error introduced by our provably private algorithms is comparable, and in some cases better, than the error introduced by existing SDL techniques that have no provable privacy guarantees. For some complex queries currently published, however, our algorithms do not have utility comparable to the existing traditiona

    Упрощение транзакционных баз данных на основе четких продукций

    Get PDF
    Рассмотрена задача упрощения транзакционных баз данных. Предложен метод сокращения баз транзакций на основе четких продукций. Разработанный метод позволяет исключить неинформативные признаки и избыточные экземпляры из заданных массивов данных, что, в свою очередь, позволяет понизить структурную и параметрическую сложность синтезируемых диагностических моделей

    Spatial Data Preprocessing for Mining Spatial Association Rule with Conventional Association Mining Algorithms

    Get PDF
    The increasing usage of Geographical Information Systems (GIS) for various problems makes the volume of spatial data is growing fast. Spatial data mining is one of the several ways to find the new knowledge from data collection. One of spatial data mining tasks is spatial association rule. There are numerous association rule algorithms have been developed for mining association. Unfortunately, the most algorithms can only used for mining non-spatial and specific formatted data. Therefore, spatial data preprocessing is needed in order conventional association algorithms can be used for spatial data

    Mining complex structured data: Enhanced methods and applications

    Get PDF
    Conventional approaches to analysing complex business data typically rely on process models, which are difficult to construct and use. This thesis addresses this issue by converting semi-structured event logs to a simpler flat representation without any loss of information, which then enables direct applications of classical data mining methods. The thesis also proposes an effective and scalable classification method which can identify distinct characteristics of a business process for further improvements

    An overview of decision table literature 1982-1995.

    Get PDF
    This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference, an author supplied abstract, a number of keywords and a classification are provided. In some cases own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country or origin (not necessarily country of publication) and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.
    corecore