Performance Evaluation of Anonymized Data Stream Classifiers

Author: Bhatnagar Divya
Nyati Aradhana
Publication venue: DLAR LABS
Publication date: 01/04/2016

A data stream is a continuous, changing sequence of data that arrives at a system to be stored or processed. It is vital to extract useful information from the enormous volume of data streams generated by applications such as organization records, call center records, sensor data, network traffic, and web searches. Privacy-preserving data mining techniques allow data to be released for mining while preserving the private information of individuals. In this paper, classification algorithms were applied to the original data set as well as to the privacy-preserved data set. Results were compared to evaluate the performance of various classification algorithms on data streams that had been privacy preserved using anonymization techniques. The paper proposes an effective approach for classification of anonymized data streams. Intensive experiments were performed using appropriate data mining and anonymization tools. Experimental results show that the proposed approach improves the accuracy of classification, and thus its utility, while minimizing the mean absolute error. The proposed work presents an anonymization technique that is effective in terms of information loss and classifiers that are efficient in terms of response time and data usability.

E-LIS

Directory of Open Access Journals

Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices

Author: Kaosar Md. Golam
Yi Xun
Publication date: 01/01/2010

In this paper, a homomorphic privacy-preserving association rule mining algorithm is proposed that can be deployed on resource-constrained devices (RCD). Privacy-preserving exchange of itemset counts among distributed mining sites is a vital part of the association rule mining process. Existing cryptography-based privacy-preserving solutions consume a lot of computation because of the complex mathematical equations involved. Less computation-intensive privacy solutions are therefore necessary to deploy mining applications on RCD. In this algorithm, a semi-trusted mixer is used to unify the counts of itemsets encrypted by all mining sites without revealing individual values. The proposed algorithm builds on a well-known, communication-efficient association rule mining algorithm named count distribution (CD). Security proofs, along with performance analysis and comparison, show the acceptability and effectiveness of the proposed algorithm. Its efficient and straightforward privacy model and satisfactory performance make the protocol one of the initiatives for deploying data mining applications on RCD.

Comment: IEEE Publication format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis

arXiv.org e-Print Archive

Research Repository

Victoria University Eprints Repository

Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

Author: Geambasu Roxana
Huang Tzu-Kuo
Lecuyer Mathias
Sen Siddhartha
Spahn Riley
Publication date: 21/05/2017

Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
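
To make the count featurization idea mentioned in this abstract concrete, here is a minimal sketch of the general technique: each categorical value is replaced by a smoothed estimate of how often it co-occurred with the positive label, so a model can be trained from compact count tables instead of the full raw history. This is not Pyramid's implementation. The function names, the binary-label setup, and the `prior` and `smoothing` parameters are assumptions made for illustration.

```python
from collections import defaultdict


def build_count_tables(rows, labels, feature_names):
    """For each feature, count how often each value co-occurs with each label."""
    tables = {
        name: defaultdict(lambda: defaultdict(int)) for name in feature_names
    }
    for row, label in zip(rows, labels):
        for name in feature_names:
            tables[name][row[name]][label] += 1
    return tables


def count_featurize(row, tables, feature_names,
                    positive_label=1, prior=0.5, smoothing=1.0):
    """Map each categorical value to a smoothed P(positive_label | value).

    `prior` and `smoothing` are hypothetical knobs: an unseen value falls
    back to `prior`, and rarely seen values are pulled toward it.
    """
    features = []
    for name in feature_names:
        counts = tables[name].get(row[name], {})
        total = sum(counts.values())
        positives = counts.get(positive_label, 0)
        features.append((positives + smoothing * prior) / (total + smoothing))
    return features
```

Once the tables are built, `count_featurize` needs only the counts, not the original rows, which is what lets the raw data be moved out of the accessible store in a setup like the one the abstract describes.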

arXiv.org e-Print Archive

Crossref

On content-based recommendation and user privacy in social-tagging systems

Author: Forné Muñoz Jorge
Parra-Arnau Javier
Puglisi Silvia
Rebollo Monedero David
Publication venue: Elsevier BV
Publication date: 01/09/2015

Recommendation systems and content filtering approaches based on annotations and ratings essentially rely on users expressing their preferences and interests through their actions in order to provide personalised content. This activity, in which users engage collectively, has been named social tagging. It is one of the most popular activities in which users engage online, and although it has opened new possibilities for application interoperability on the semantic web, it also poses new privacy threats. Social tagging consists of describing online or offline resources using free-text labels (i.e. tags), thereby exposing the user profile and activity to privacy attacks. Users, as a result, may wish to adopt a privacy-enhancing strategy in order not to reveal their interests completely. Tag forgery is a privacy-enhancing technology consisting of generating tags for categories or resources that do not reflect the user's actual preferences. By modifying the user's profile, tag forgery may have a negative impact on the quality of the recommendation system, thus protecting user privacy to a certain extent but at the expense of utility. The impact of tag forgery on content-based recommendation is therefore investigated in a real-world application scenario where different forgery strategies are evaluated, and the consequent loss in utility is measured and compared.

Peer Reviewed. Postprint (author’s final draft)

arXiv.org e-Print Archive

Elsevier - Publisher Connector

UPCommons. Portal del coneixement obert de la UPC

FLAIM: A Multi-level Anonymization Framework for Computer and Network Logs

Author: Lakkaraju Kiran
Luo Katherine
Slagell Adam
Publication date: 01/01/2006

FLAIM (Framework for Log Anonymization and Information Management) addresses two important needs not well addressed by current log anonymizers. First, it is extremely modular and not tied to the specific log being anonymized. Second, it supports multi-level anonymization, allowing system administrators to make fine-grained trade-offs between information loss and privacy/security concerns. In this paper, we examine anonymization solutions to date and note the above limitations in each. We further describe how FLAIM addresses these problems, and we describe FLAIM's architecture and features in detail.

Comment: 16 pages, 4 figures, in submission to USENIX Lis

arXiv.org e-Print Archive

CiteSeerX

Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

Author: Chen Xin
Crandall David J
Sharghi Aidean
Xu Mingze
Publication date: 11/01/2018

A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled two-stream spatiotemporal architecture for reliable human action recognition on extremely low resolution (e.g., 12x16 pixel) videos. We provide an efficient method to extract spatial and temporal features and to aggregate them into a robust feature representation for an entire action video sequence. We also consider how to incorporate high resolution videos during training in order to build better low resolution action recognition models. We evaluate on two publicly-available datasets, showing significant improvements over the state-of-the-art.

Comment: 9 pages, 5 figures, published in WACV 201

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)