6,967 research outputs found
Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data
User-generated data is crucial to predictive modeling in many applications.
With a web/mobile/wearable interface, a data owner can continuously record data
generated by distributed users and build various predictive models from the
data to improve their operations, services, and revenue. Due to the large size
and evolving nature of users data, data owners may rely on public cloud service
providers (Cloud) for storage and computation scalability. Exposing sensitive
user-generated data and advanced analytic models to Cloud raises privacy
concerns. We present a confidential learning framework, SecureBoost, for data
owners that want to learn predictive models from aggregated user-generated data
but offload the storage and computational burden to Cloud without having to
worry about protecting the sensitive data. SecureBoost allows users to submit
encrypted or randomly masked data to designated Cloud directly. Our framework
utilizes random linear classifiers (RLCs) as the base classifiers in the
boosting framework to dramatically simplify the design of the proposed
confidential boosting protocols, yet still preserve the model quality. A
Cryptographic Service Provider (CSP) is used to assist the Cloud's processing,
reducing the complexity of the protocol constructions. We present two
constructions of SecureBoost: HE+GC and SecSh+GC, using combinations of
homomorphic encryption, garbled circuits, and random masking to achieve both
security and efficiency. For a boosted model, Cloud learns only the RLCs and
the CSP learns only the weights of the RLCs. Finally, the data owner collects
the two parts to get the complete model. We conduct extensive experiments to
understand the quality of the RLC-based boosting and the cost distribution of
the constructions. Our results show that SecureBoost can efficiently learn
high-quality boosting models from protected user-generated data
Symmetric and Synchronous Communication in Peer-to-Peer Networks
Motivated by distributed implementations of game-theoretical algorithms, we
study symmetric process systems and the problem of attaining common knowledge
between processes. We formalize our setting by defining a notion of
peer-to-peer networks(*) and appropriate symmetry concepts in the context of
Communicating Sequential Processes (CSP), due to the common knowledge creating
effects of its synchronous communication primitives. We then prove that CSP
with input and output guards makes common knowledge in symmetric peer-to-peer
networks possible, but not the restricted version which disallows output
statements in guards and is commonly implemented.
(*) Please note that we are not dealing with fashionable incarnations such as
file-sharing networks, but merely use this name for a mathematical notion of a
network consisting of directly connected peers "treated on an equal footing",
i.e. not having a client-server structure or otherwise pre-determined roles.)Comment: polished, modernized references; incorporated referee feedback from
MPC'0
Sensor networks and distributed CSP: communication, computation and complexity
We introduce SensorDCSP, a naturally distributed benchmark based on a real-world application that arises in the context of networked distributed systems. In order to study the performance of Distributed CSP (DisCSP) algorithms in a truly distributed setting, we use a discrete-event network simulator, which allows us to model the impact of different network traffic conditions on the performance of the algorithms. We consider two complete DisCSP algorithms: asynchronous backtracking (ABT) and asynchronous weak commitment search (AWC), and perform performance comparison for these algorithms on both satisfiable and unsatisfiable instances of SensorDCSP. We found that random delays (due to network traffic or in some cases actively introduced by the agents) combined with a dynamic decentralized restart strategy can improve the performance of DisCSP algorithms. In addition, we introduce GSensorDCSP, a plain-embedded version of SensorDCSP that is closely related to various real-life dynamic tracking systems. We perform both analytical and empirical study of this benchmark domain. In particular, this benchmark allows us to study the attractiveness of solution repairing for solving a sequence of DisCSPs that represent the dynamic tracking of a set of moving objects.This work was supported in part by AFOSR (F49620-01-1-0076, Intelligent Information Systems Institute and MURI F49620-01-1-0361), CICYT (TIC2001-1577-C03-03 and TIC2003-00950), DARPA (F30602-00-2- 0530), an NSF CAREER award (IIS-9734128), and an Alfred P. Sloan Research Fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the US Government
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort
Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners
The k-nearest neighbors (k-NN) algorithm is a popular and effective
classification algorithm. Due to its large storage and computational
requirements, it is suitable for cloud outsourcing. However, k-NN is often run
on sensitive data such as medical records, user images, or personal
information. It is important to protect the privacy of data in an outsourced
k-NN system.
Prior works have all assumed the data owners (who submit data to the
outsourced k-NN system) are a single trusted party. However, we observe that in
many practical scenarios, there may be multiple mutually distrusting data
owners. In this work, we present the first framing and exploration of privacy
preservation in an outsourced k-NN system with multiple data owners. We
consider the various threat models introduced by this modification. We discover
that under a particularly practical threat model that covers numerous
scenarios, there exists a set of adaptive attacks that breach the data privacy
of any exact k-NN system. The vulnerability is a result of the mathematical
properties of k-NN and its output. Thus, we propose a privacy-preserving
alternative system supporting kernel density estimation using a Gaussian
kernel, a classification algorithm from the same family as k-NN. In many
applications, this similar algorithm serves as a good substitute for k-NN. We
additionally investigate solutions for other threat models, often through
extensions on prior single data owner systems
CSP channels for CAN-bus connected embedded control systems
Closed loop control system typically contains multitude of sensors and actuators operated simultaneously. So they are parallel and distributed in its essence. But when mapping this parallelism to software, lot of obstacles concerning multithreading communication and synchronization issues arise. To overcome this problem, the CT kernel/library based on CSP algebra has been developed. This project (TES.5410) is about developing communication extension to the CT library to make it applicable in distributed systems. Since the library is tailored for control systems, properties and requirements of control systems are taken into special consideration. Applicability of existing middleware solutions is examined. A comparison of applicable fieldbus protocols is done in order to determine most suitable ones and CAN fieldbus is chosen to be first fieldbus used. Brief overview of CSP and existing CSP based libraries is given. Middleware architecture is proposed along with few novel ideas
A comprehensive meta-analysis of cryptographic security mechanisms for cloud computing
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.The concept of cloud computing offers measurable computational or information resources as a service over the Internet. The major motivation behind the cloud setup is economic benefits, because it assures the reduction in expenditure for operational and infrastructural purposes. To transform it into a reality there are some impediments and hurdles which are required to be tackled, most profound of which are security, privacy and reliability issues. As the user data is revealed to the cloud, it departs the protection-sphere of the data owner. However, this brings partly new security and privacy concerns. This work focuses on these issues related to various cloud services and deployment models by spotlighting their major challenges. While the classical cryptography is an ancient discipline, modern cryptography, which has been mostly developed in the last few decades, is the subject of study which needs to be implemented so as to ensure strong security and privacy mechanisms in todayâs real-world scenarios. The technological solutions, short and long term research goals of the cloud security will be described and addressed using various classical cryptographic mechanisms as well as modern ones. This work explores the new directions in cloud computing security, while highlighting the correct selection of these fundamental technologies from cryptographic point of view
A group membership algorithm with a practical specification
Presents a solvable specification and gives an algorithm for the group membership problem in asynchronous systems with crash failures. Our specification requires processes to maintain a consistent history in their sequences of views. This allows processes to order failures and recoveries in time and simplifies the programming of high level applications. Previous work has proven that the group membership problem cannot be solved in asynchronous systems with crash failures. We circumvent this impossibility result building a weaker, yet nontrivial specification. We show that our solution is an improvement upon previous attempts to solve this problem using a weaker specification. We also relate our solution to other methods and give a classification of progress properties that can be achieved under different models
Resilient Blocks for Summarising Distributed Data
Summarising distributed data is a central routine for parallel programming,
lying at the core of widely used frameworks such as the map/reduce paradigm. In
the IoT context it is even more crucial, being a privileged mean to allow
long-range interactions: in fact, summarising is needed to avoid data explosion
in each computational unit.
We introduce a new algorithm for dynamic summarising of distributed data,
weighted multi-path, improving over the state-of-the-art multi-path algorithm.
We validate the new algorithm in an archetypal scenario, taking into account
sources of volatility of many sorts and comparing it to other existing
implementations. We thus show that weighted multi-path retains adequate
accuracy even in high-variability scenarios where the other algorithms are
diverging significantly from the correct values.Comment: In Proceedings ALP4IoT 2017, arXiv:1802.0097
- âŠ