
    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads in order to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and a proof of concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
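    To make the training set minimization idea concrete, below is a minimal sketch of count featurization (count-based encoding) for a toy click-prediction task: high-cardinality raw identifiers are replaced by compact count statistics, so the model never needs the raw records at training time. The column names, smoothing scheme, and helper functions are illustrative assumptions, not Pyramid's actual implementation.

```python
# Sketch of count featurization: replace raw categorical values with
# (occurrence count, smoothed label rate) statistics.
from collections import defaultdict

def build_count_tables(rows, categorical_cols, label_col="clicked"):
    """Accumulate per-value (total, positive) counts for each categorical column."""
    tables = {col: defaultdict(lambda: [0, 0]) for col in categorical_cols}
    for row in rows:
        for col in categorical_cols:
            stats = tables[col][row[col]]
            stats[0] += 1                    # total occurrences of this value
            stats[1] += int(row[label_col])  # positive-label occurrences
    return tables

def featurize(row, tables, smoothing=1.0):
    """Replace each raw categorical value with (count, smoothed positive rate)."""
    features = []
    for col, table in tables.items():
        total, positives = table.get(row[col], [0, 0])
        rate = (positives + smoothing) / (total + 2 * smoothing)
        features.extend([total, rate])
    return features

# Usage: the model trains on compact count features instead of raw identifiers.
rows = [
    {"user_id": "u1", "ad_id": "a9", "clicked": 1},
    {"user_id": "u2", "ad_id": "a9", "clicked": 0},
    {"user_id": "u1", "ad_id": "a3", "clicked": 1},
]
tables = build_count_tables(rows, ["user_id", "ad_id"])
print(featurize({"user_id": "u1", "ad_id": "a9", "clicked": 0}, tables))
# -> [2, 0.75, 2, 0.5]
```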

    Anonymizing cybersecurity data in critical infrastructures: the CIPSEC approach

    Cybersecurity logs are continuously generated by network devices to describe security incidents. With modern computing technology, such logs can be exploited to counter threats in real time or before they gain a foothold. To improve these capabilities, logs are usually shared with external entities. However, since cybersecurity logs might contain sensitive data, serious privacy concerns arise, all the more so when critical infrastructures (CIs), which handle strategic data, are involved. We propose a tool that protects privacy by anonymizing sensitive data included in cybersecurity logs. We implement anonymization mechanisms grouped through the definition of a privacy policy. We adapt this approach to the context of the EU project CIPSEC, which builds a unified security framework to orchestrate security products and thus offer better protection to a group of CIs. Since this framework collects and processes security-related data from multiple devices of the CIs, our work is devoted to protecting privacy by integrating our anonymization approach.
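    As an illustration of policy-driven log anonymization, the sketch below maps sensitive log fields to anonymization mechanisms (pseudonymization, generalization, suppression) declared in a small privacy policy. The field names, policy format, and the anonymize_log function are assumptions for illustration only, not the CIPSEC tool's actual API.

```python
# Sketch of policy-driven anonymization of security-log records.
import hashlib
import ipaddress

def pseudonymize(value, salt="example-salt"):
    """Replace a value with a salted hash so records stay linkable but not readable."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def truncate_ip(value):
    """Generalize an IPv4 address to its /24 network."""
    net = ipaddress.ip_network(value + "/24", strict=False)
    return str(net.network_address) + "/24"

def suppress(_value):
    """Drop the value entirely."""
    return "<redacted>"

# The privacy policy maps sensitive log fields to anonymization mechanisms.
POLICY = {
    "username": pseudonymize,
    "src_ip": truncate_ip,
    "email": suppress,
}

def anonymize_log(record, policy=POLICY):
    """Apply the policy's mechanism to each sensitive field; pass others through."""
    return {k: policy[k](v) if k in policy else v for k, v in record.items()}

# Usage with a toy security-log entry.
event = {"timestamp": "2023-05-01T12:00:00Z", "username": "alice",
         "src_ip": "192.168.1.57", "email": "alice@example.org", "action": "login_failed"}
print(anonymize_log(event))
```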