29 research outputs found

    Using unknowns to prevent discovery of association rules

    Data mining technology has given us new capabilities to identify correlations in large data sets. This introduces risks when the data is to be made public but the correlations must remain private. We introduce a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. The efficacy and complexity of this method are discussed. We also present an experiment that illustrates the methodology.

    Information driven evaluation of data hiding algorithms

    Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database so as to prevent the discovery of sensitive information. Because many different techniques can be used to achieve this goal, standard evaluation metrics are needed to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. This paper explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored to the context of PPDM algorithms, a set of evaluation parameters, and an evaluation algorithm. The resulting core evaluation process is then presented as part of a more general three-step evaluation framework that also takes into account other aspects of algorithm evaluation, such as efficiency, scalability, and level of privacy.

    Distortion-Based Heuristic Sensitive Rule Hiding Method – The Greedy Way

    LUWF - Workflow Editor, Konfigurationsmanagement für OCS basiertes Contact-Center (LUWF - Workflow Editor: Configuration Management for an OCS-Based Contact Center)

    Existing contact centers often face the problem that creating and maintaining call flows (a call flow represents the path of a call through the organizational processes and the technology of the contact center) is very time-consuming. Small changes to company policy must be applied manually to every call flow, which leads to considerable additional maintenance costs. Furthermore, the user interfaces (UIs) for creating call flows were designed by technicians and are often complicated and confusing to use. The main task of the configuration interface developed in this thesis is the graphical creation and management of call flows, with which chains of elementary events (e.g. playing a particular announcement or collecting user input) can be modeled. Configuration is to be done through an intuitive GUI. Key functions of the graphical user interface (GUI) are drag and drop and the connecting of elementary operations. To speed up configuration, the built-in policy editor can be used to predefine a wide range of call flow properties, such as opening hours. The individual elements can be assigned simply via a drop-down list. A service entry can be added to a workflow instance. It consists of a name, a telephone number, and a SIP URI, and represents the relationship to an actual call in a service, for example a real call to a hotline.

    k-Anonymization Without Q-S Associations

    Data pipelines for educational data mining in distance education

    New challenges in education demand effective solutions. Although Learning Analytics (LA), Educational Data Mining (EDM) and the use of Big Data are often presented as a panacea, there is a lot of ground to be covered before EDM can answer the real questions of educators. An important step toward this goal is to implement holistic solutions that allow educational stakeholders to engage in the core of the EDM processes. The effectiveness of such an attempt relies on (a) having access to data arranged in an organized and meaningful way and (b) setting up a sequence of processes that are flexible and reusable. Therefore, a data pipeline that imports data from a specially developed data warehouse is designed and created. Additionally, it is tested on real-life data, and the results are discussed. © 2023 Informa UK Limited, trading as Taylor & Francis Group

    A novel approach for handling semantic trajectories on data warehouses

    A trajectory is a set of traces left by a moving object. It contains spatio-temporal information about where and when that object was, as well as other semantically relevant information. It is described by a continuation of movement. Data concerning moving objects and their trajectories can be stored in a Trajectory Data Warehouse for organization, management, and analysis purposes. This work is dedicated to semantic trajectory data warehouses. A logical schema called S-TrODW is proposed, in which an object-relational framework is used. The main novelty of the S-TrODW model is the integration of trajectories and their segments in the fact table by means of a nested relation. An algorithm is presented for transforming the flat star schema (with non-nested trajectory segments) into the S-TrODW schema. The proposal is validated through a case study dealing with freight transportation. More natural modelling and query formulation, as well as improved query execution time, are among the contributions of this paper compared to other works. © 2022 - IOS Press. All rights reserved
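
The idea of moving from a flat fact table to a nested relation can be illustrated with a minimal Python sketch. This is not the S-TrODW transformation algorithm itself, which the abstract does not spell out. It only assumes that each flat row describes one segment and carries the identifier of its trajectory. The field names (`trajectory_id`, `start_time`, `end_time`, `distance_km`) and the aggregated measure are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def nest_segments(flat_rows):
    """Group flat per-segment rows into one nested row per trajectory.

    Assumption: every row is a dict with the hypothetical keys
    trajectory_id, start_time, end_time and distance_km.
    """
    # groupby only merges adjacent rows, so sort by trajectory first,
    # then by start time to keep segments in travel order.
    ordered = sorted(flat_rows, key=itemgetter("trajectory_id", "start_time"))
    nested = []
    for trajectory_id, rows in groupby(ordered, key=itemgetter("trajectory_id")):
        segments = [
            {
                "start_time": row["start_time"],
                "end_time": row["end_time"],
                "distance_km": row["distance_km"],
            }
            for row in rows
        ]
        nested.append(
            {
                "trajectory_id": trajectory_id,
                # A trajectory-level measure derived from the nested segments.
                "total_distance_km": sum(s["distance_km"] for s in segments),
                "segments": segments,
            }
        )
    return nested


# Illustrative data only: three segments belonging to two trajectories.
flat_rows = [
    {"trajectory_id": "T2", "start_time": 5, "end_time": 9, "distance_km": 40},
    {"trajectory_id": "T1", "start_time": 3, "end_time": 6, "distance_km": 20},
    {"trajectory_id": "T1", "start_time": 0, "end_time": 3, "distance_km": 10},
]
nested = nest_segments(flat_rows)
```

With the nested form, a query about a whole trajectory reads one row instead of joining or aggregating several segment rows, which is the kind of simplification the abstract alludes to.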

    Efficient algorithms for distortion and blocking techniques in association rule hiding

    Data mining provides the opportunity to extract useful information from large databases. Various techniques have been proposed in this context in order to extract this information in the most efficient way. However, efficiency is not our only concern in this study. The security and privacy issues concerning the extracted knowledge must be seriously considered as well. Taking this into consideration, we study the procedure of hiding sensitive association rules in binary data sets by blocking some data values, and we present an algorithm for solving this problem. We also provide a fuzzification of the support and the confidence of an association rule in order to accommodate the existence of blocked/unknown values. In addition, we quantitatively compare the proposed algorithm with other already published algorithms by running experiments on binary data sets, and we also qualitatively compare the efficiency of the proposed algorithm in hiding association rules. We utilize the notion of border rules, by assigning a weight to each rule, and we use effective data structures for the representation of the rules so as (a) to minimize the side effects created by the hiding process and (b) to speed up the selection of the victim transactions. Finally, we study the overall security of the modified database, using the C4.5 decision tree algorithm of the WEKA data mining tool, and we discuss the advantages and the limitations of blocking.
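
To make the effect of blocked values concrete, here is a minimal Python sketch of how support and confidence turn into intervals once some cells are unknown. It is an illustration under stated assumptions, not the algorithm of the paper: the `UNKNOWN` marker, the function names, and the choice of bounds (minimum support counts only certain 1s, maximum support also counts unknowns; the confidence bounds are the simple ratios of those quantities) are assumptions made here.

```python
from fractions import Fraction

UNKNOWN = None  # hypothetical marker for a blocked (unknown) cell


def support_interval(db, items):
    """Return (min, max) support of an itemset in a binary database.

    db is a list of rows; items is a tuple of column indices.
    Minimum support counts rows where every item is certainly 1;
    maximum support also counts rows where an item may be 1 (unknown).
    """
    n = len(db)
    certain = sum(all(row[i] == 1 for i in items) for row in db)
    possible = sum(all(row[i] in (1, UNKNOWN) for i in items) for row in db)
    return Fraction(certain, n), Fraction(possible, n)


def confidence_interval(db, lhs, rhs):
    """Return simple (min, max) bounds on the confidence of lhs -> rhs."""
    lo_both, hi_both = support_interval(db, lhs + rhs)
    lo_lhs, hi_lhs = support_interval(db, lhs)
    lo = lo_both / hi_lhs if hi_lhs else Fraction(0)
    hi = min(Fraction(1), hi_both / lo_lhs) if lo_lhs else Fraction(1)
    return lo, hi


# Illustrative data: columns are items A (index 0) and B (index 1).
original = [[1, 1], [1, 1], [1, 0], [0, 1]]
# Block item B in the first transaction to weaken the rule A -> B.
blocked = [[1, UNKNOWN], [1, 1], [1, 0], [0, 1]]

before = confidence_interval(original, (0,), (1,))
after = confidence_interval(blocked, (0,), (1,))
```

In this toy database the confidence of A -> B is exactly 2/3 before blocking. After one value is blocked it can only be bounded between 1/3 and 2/3, so a miner that requires the minimum confidence to exceed a threshold above 1/3 would no longer report the rule.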

    Meteorological Data Warehousing and Analysis for Supporting Air Navigation

    Data analysis of weather phenomena to either predict or control human imprint on the environment requires the collection of various forms of observational data ranging from historical and longitudinal to forecast. The objective of this research paper is the development of a data warehouse (DW) based on a new hybrid logical schema, concerning the assimilation and maintenance of historical meteorological data from all operating airports in Greece, along with data in the Greek Flight Information Region related to flight delays and cancellations. SQL is used to query these data, making them easily accessible and manageable. The data from the DW are collected and used as training data for the induction of predictive models. In this study, the prediction problem is cast as a classification task, and different decision tree induction techniques are applied to build accurate models that allow flexible scheduling and planning for the minimization of waiting time and inconvenience of passengers. © 2022 by the authors

    A tutorial on blocking methods for privacy-preserving record linkage

    In this paper, we first present five state-of-the-art private blocking methods which rely mainly on random strings, clustering, and public reference sets. We emphasize the drawbacks of these methods, and then we present our L-fold redundant blocking scheme, which relies on the Locality-Sensitive Hashing technique for identifying similar records. These records have undergone an anonymization transformation using a Bloom filter-based encoding technique. Finally, we perform an experimental evaluation of all these methods and present the results. © Springer International Publishing Switzerland 2016
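
The combination of Bloom filter encoding and Locality-Sensitive Hashing described above can be sketched in a few lines of Python. This is a simplified illustration, not the scheme evaluated in the paper: the bigram tokenization, the filter size, the salted SHA-256 hashing, and the parameters `num_tables` and `bits_per_key` are all assumptions chosen for readability. Each of the redundant tables samples a few bit positions of the filter (Hamming LSH), and records that agree on those bits in at least one table become candidate pairs.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations


def bigrams(value):
    """Split a string into its set of lowercase character bigrams."""
    value = value.lower()
    return {value[i:i + 2] for i in range(len(value) - 1)}


def bloom_encode(value, size=64, num_hashes=2):
    """Encode a string as a Bloom filter bit list (assumed parameters)."""
    bits = [0] * size
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % size] = 1
    return bits


def lsh_candidate_pairs(encoded, num_tables=8, bits_per_key=6, seed=0):
    """Return record-id pairs that share a bucket in at least one table.

    encoded maps record ids to equal-length Bloom filter bit lists.
    Each table uses its own random sample of bit positions as the key.
    """
    rng = random.Random(seed)
    size = len(next(iter(encoded.values())))
    pairs = set()
    for _ in range(num_tables):
        positions = rng.sample(range(size), bits_per_key)
        buckets = defaultdict(list)
        for record_id, bits in encoded.items():
            buckets[tuple(bits[p] for p in positions)].append(record_id)
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


# Illustrative records only; "a" and "b" hold the same value.
records = {"a": "john smith", "b": "john smith", "c": "maria papadopoulou"}
encoded = {rid: bloom_encode(value) for rid, value in records.items()}
candidates = lsh_candidate_pairs(encoded)
```

Identical values always produce identical filters and therefore always land in the same bucket. Similar values differ in only a few bits, so using several tables raises the chance that at least one sampled key matches, which is the purpose of the redundancy.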