8,918 research outputs found
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
A privacy-preserving model to control social interaction behaviors in social network sites
Social Network Sites (SNSs) served as an invaluable platform to transfer information across a large number of users. SNSs also disseminate users data to third-parties to provide more interesting services for users as well as gaining profits. Users grant access to third-parties to use their services, although they do not necessarily protect users’ data privacy. Controlling social network data diffusion among users and third-parties is difficult due to the vast amount of data. Hence, undesirable users’ data diffusion to unauthorized parties in SNSs may endanger users’ privacy. This paper highlights the privacy breaches on SNSs and emphasizes the most significant privacy issues to users. The goals of this paper are to i) propose a privacy-preserving model for social interactions among users and third-parties; ii) enhance users’ privacy by providing access to the data for appropriate third-parties. These advocate to not compromising the advantages of SNSs information sharing functionalities
Privacy-Preserving Design of Data Processing Systems in the Public Transport Context
The public transport network of a region inhabited by more than 4 million people is run by a complex interplay of public and private actors. Large amounts of data are generated by travellers, buying and using various forms of tickets and passes. Analysing the data is of paramount importance for the governance and sustainability of the system. This manuscript reports the early results of the privacy analysis which is being undertaken as part of the analysis of the clearing process in the Emilia-Romagna region, in Italy, which will compute the compensations for tickets bought from one operator and used with another. In the manuscript it is shown by means of examples that the clearing data may be used to violate various privacy aspects regarding users, as well as (technically equivalent) trade secrets regarding operators. The ensuing discussion has a twofold goal. First, it shows that after researching possible existing solutions, both by reviewing the literature on general privacy-preserving techniques, and by analysing similar scenarios that are being discussed in various cities across the world, the former are found exhibiting structural effectiveness deficiencies, while the latter are found of limited applicability, typically involving less demanding requirements. Second, it traces a research path towards a more effective approach to privacy-preserving data management in the specific context of public transport, both by refinement of current sanitization techniques and by application of the privacy by design approach.
Available at: https://aisel.aisnet.org/pajais/vol7/iss4/4
From Social Data Mining to Forecasting Socio-Economic Crisis
Socio-economic data mining has a great potential in terms of gaining a better
understanding of problems that our economy and society are facing, such as
financial instability, shortages of resources, or conflicts. Without
large-scale data mining, progress in these areas seems hard or impossible.
Therefore, a suitable, distributed data mining infrastructure and research
centers should be built in Europe. It also appears appropriate to build a
network of Crisis Observatories. They can be imagined as laboratories devoted
to the gathering and processing of enormous volumes of data on both natural
systems such as the Earth and its ecosystem, as well as on human
techno-socio-economic systems, so as to gain early warnings of impending
events. Reality mining provides the chance to adapt more quickly and more
accurately to changing situations. Further opportunities arise by individually
customized services, which however should be provided in a privacy-respecting
way. This requires the development of novel ICT (such as a self- organizing
Web), but most likely new legal regulations and suitable institutions as well.
As long as such regulations are lacking on a world-wide scale, it is in the
public interest that scientists explore what can be done with the huge data
available. Big data do have the potential to change or even threaten democratic
societies. The same applies to sudden and large-scale failures of ICT systems.
Therefore, dealing with data must be done with a large degree of responsibility
and care. Self-interests of individuals, companies or institutions have limits,
where the public interest is affected, and public interest is not a sufficient
justification to violate human rights of individuals. Privacy is a high good,
as confidentiality is, and damaging it would have serious side effects for
society.Comment: 65 pages, 1 figure, Visioneer White Paper, see
http://www.visioneer.ethz.c
RoboChain: A Secure Data-Sharing Framework for Human-Robot Interaction
Robots have potential to revolutionize the way we interact with the world
around us. One of their largest potentials is in the domain of mobile health
where they can be used to facilitate clinical interventions. However, to
accomplish this, robots need to have access to our private data in order to
learn from these data and improve their interaction capabilities. Furthermore,
to enhance this learning process, the knowledge sharing among multiple robot
units is the natural step forward. However, to date, there is no
well-established framework which allows for such data sharing while preserving
the privacy of the users (e.g., the hospital patients). To this end, we
introduce RoboChain - the first learning framework for secure, decentralized
and computationally efficient data and model sharing among multiple robot units
installed at multiple sites (e.g., hospitals). RoboChain builds upon and
combines the latest advances in open data access and blockchain technologies,
as well as machine learning. We illustrate this framework using the example of
a clinical intervention conducted in a private network of hospitals.
Specifically, we lay down the system architecture that allows multiple robot
units, conducting the interventions at different hospitals, to perform
efficient learning without compromising the data privacy.Comment: 7 pages, 6 figure
Learning structure and schemas from heterogeneous domains in networked systems: a survey
The rapidly growing amount of available digital documents of various formats and the possibility to access these through internet-based technologies in distributed environments, have led to the necessity to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections make it impossible to manually organize such documents. Additionally, most of the document sexist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically infer structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked system. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely, bio-informatics, sensor networks, social networks, P2Psystems, automation and control, transportation and privacy preserving for which we analyze the recent developments on dealing with unstructured data in such domains.Peer ReviewedPostprint (published version
- …