
    Computing on Masked Data to Improve the Security of Big Data

    Organizations that make use of large quantities of information require the ability to store and process data in central locations so that the product can be shared or distributed across a heterogeneous group of users. However, recent events underscore the need to improve the security of data stored in such untrusted servers or databases. Advances in cryptographic techniques and database technologies provide the necessary security functionality, but rely on a computational model in which the cloud is used solely for storage and retrieval. Much of big data computation and analytics makes use of signal processing fundamentals. As the trend of moving data storage and computation to the cloud continues, homeland security missions should understand the impact of security on key signal processing kernels such as correlation or thresholding. In this article, we propose a tool called Computing on Masked Data (CMD), which combines advances in database technologies and cryptographic tools to provide a low-overhead mechanism for securely offloading certain mathematical operations to the cloud. This article describes the design and development of the CMD tool. Comment: 6 pages; accepted to the IEEE HST Conference.
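
    The abstract names correlation and thresholding as target kernels but gives no code; as a rough illustration of the masking idea, the hedged Python sketch below uses deterministic masking (here an HMAC, an assumption of this example rather than CMD's actual scheme) so that an untrusted server can compute equality-based aggregates, such as a histogram, without seeing plaintext. Ordered comparisons such as thresholding would additionally need an order-preserving scheme, which is not shown.

```python
# Illustrative sketch only, not the CMD implementation: deterministic
# masking lets an untrusted server group and count values it cannot read.
# mask() and the key below are assumptions made for this example.
import hmac
import hashlib
from collections import Counter

KEY = b"client-secret-key"  # held only by the data owner

def mask(value: str) -> str:
    """Deterministic mask: equal plaintexts map to equal masks."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# The client masks records before uploading them.
records = ["alice", "bob", "alice", "carol", "alice"]
masked = [mask(r) for r in records]

# The untrusted server can still compute equality-based aggregates
# (here a histogram) without ever seeing a plaintext value.
histogram = Counter(masked)

# The client unmasks the result locally with its own lookup table.
lookup = {mask(r): r for r in set(records)}
print({lookup[m]: c for m, c in histogram.items()})  # {'alice': 3, 'bob': 1, 'carol': 1}
```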

    How to Optimally Allocate Resources for Coded Distributed Computing?

    Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute the computations across many nodes to take advantage of parallel processing. However, as we allocate more and more computing resources to a task and distribute the computations further, large amounts of (partially) computed data must be moved between consecutive stages of the computation, and the communication load can become the bottleneck. In this paper, we study the optimal allocation of computing resources in distributed computing, in order to minimize the total execution time, accounting for the durations of both the computation and communication phases. In particular, we consider a general MapReduce-type distributed computing framework, in which the computation is decomposed into three stages: \emph{Map}, \emph{Shuffle}, and \emph{Reduce}. We focus on a recently proposed \emph{Coded Distributed Computing} approach for MapReduce and study the optimal allocation of computing resources in this framework. For all values of the problem parameters, we characterize the optimal number of servers that should be used for distributed processing, provide the optimal placement of the Map and Reduce tasks, and propose an optimal coded data-shuffling scheme, in order to minimize the total execution time. To prove the optimality of the proposed scheme, we first derive a matching information-theoretic converse on the execution time, and then prove that, among all possible resource allocation schemes that achieve the minimum execution time, our proposed scheme uses exactly the minimum possible number of servers.
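
    To make the resource-allocation trade-off concrete, the toy Python sketch below sweeps over the number of servers q and the computation load r (each Map task replicated at r servers) and minimizes a simple map-plus-shuffle time model; the shuffle term uses the coded shuffling load (1/r)(1 - r/q) from the Coded Distributed Computing literature, while the cost constants and the per-server overhead are assumptions of this sketch, not the paper's model.

```python
# Toy model, not the paper's exact formulation: brute-force the number of
# servers q and computation load r that minimize total execution time.
N = 10_000       # units of input data (arbitrary)
c_map = 1e-3     # time to map one unit on one server (assumed)
c_comm = 5e-3    # time to ship one unit of intermediate data (assumed)
c_over = 0.01    # fixed startup overhead per allocated server (assumed)

def exec_time(q: int, r: int) -> float:
    t_map = c_map * r * N / q              # each server maps r*N/q units
    load = (1.0 / r) * (1.0 - r / q)       # coded shuffle communication load
    t_shuffle = c_comm * load * N
    return t_map + t_shuffle + c_over * q  # overhead makes "use every server" suboptimal

best_q, best_r = min(((q, r) for q in range(2, 201) for r in range(1, q + 1)),
                     key=lambda p: exec_time(*p))
print(f"q = {best_q}, r = {best_r}, time = {exec_time(best_q, best_r):.2f}")
```

    With these constants the optimum lands strictly inside the search range: adding servers speeds up the Map phase but eventually the per-server overhead dominates, which is the qualitative effect the paper formalizes.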

    Secure and efficient multiparty computation on genomic data

    © ACM 2016. Large-scale biomedical research projects involve the analysis of huge amounts of genomic data owned by different data owners. Collecting and storing genomic data is sometimes beyond the capability of a single organization, and genomic data sharing is a feasible way to overcome this problem. These scenarios can be generalized into the problem of aggregating data distributed among multiple databases and owned by different data owners. However, we must guarantee that an adversary cannot learn anything about the data or about the individual contribution of each party to the final output of the computation. In this paper, we propose a practical solution for secure sharing and computation of genomic data. We adopt the Paillier cryptosystem and order-preserving encryption to securely execute the count query and the ranked query. Experimental results demonstrate that the computation time is realistic enough to make our system adoptable in the real world.
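
    The count query rests on the additive homomorphism of the Paillier cryptosystem: Enc(a) * Enc(b) mod n^2 decrypts to a + b, so each data owner can submit an encrypted 0 or 1 and only the final tally is ever decrypted. Below is a minimal textbook Paillier sketch in Python with deliberately tiny, insecure toy primes; it illustrates the mechanism only and is not the paper's implementation (the ranked query, which the paper handles with order-preserving encryption, is not shown).

```python
# Textbook Paillier with toy parameters -- illustration only, NOT secure.
import math
import random

def keygen(p=17, q=19):                 # tiny toy primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                           # standard simple choice of generator
    mu = pow(lam, -1, n)                # valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:          # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n

pk, sk = keygen()
bits = [1, 0, 1, 1, 0]                      # each owner's private bit
cts = [encrypt(pk, b) for b in bits]
aggregate = math.prod(cts) % (pk[0] ** 2)   # homomorphic addition
print(decrypt(pk, sk, aggregate))           # -> 3, the secure count
```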

    Resolution strategies for serverless computing in information centric networking

    Named Function Networking (NFN) offers to compute and deliver the results of computations in the context of Information Centric Networking (ICN). While ICN offers data delivery without specifying where the data are stored, NFN offers the production of results without specifying where the actual computation is executed. In NFN, computation workflows are encoded in (ICN-style) Interest messages using the lambda calculus, and based on these workflows the network distributes computations and finds execution locations. Depending on the use case of the actual network, the decision of where to execute a computation can differ: a resolution strategy running on each node decides whether a computation should be forwarded, split into subcomputations, or executed locally. This work focuses on the design of resolution strategies for selected scenarios and on the online derivation of "execution plans" based on network status and history. Starting with a simple resolution strategy suitable for data centers, we focus on improving load distribution within a data center or even between multiple data centers. We have designed resolution strategies that consider the size of input data and the load on nodes, leading to priced execution plans from which one can select the ones with the least cost. Moreover, we use these plans to create execution templates: templates can be used to create a resolution strategy, tailored to the specific use case at hand, by simulating the execution with the planning system. Finally, we designed a resolution strategy for edge computing that is able to handle the mobile scenarios typical of vehicular networking. This “mobile edge computing resolution strategy” handles the problem of frequent handovers across a sequence of road-side units without creating additional overhead for the non-mobile use case. All these resolution strategies were evaluated using a simulation system and compared to the state-of-the-art behavior of data center execution environments and/or cloud configurations. For the vehicular networking strategy, we enhanced existing road-side units and implemented our NFN-based system and plan derivation such that we were able to run and validate our solution in real-world tests for mobile edge computing.
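
    As a sketch of what a priced execution plan might look like, the Python fragment below scores candidate execution locations by the cost of moving missing input data plus a penalty for node load, and picks the cheapest; node names, cost weights, and the plan structure are assumptions of this illustration, not the NFN implementation.

```python
# Hedged illustration of a cost-based resolution strategy: price each
# candidate plan, then execute where the plan is cheapest.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    load: float        # current utilization in [0, 1]
    cached: dict       # data name -> bytes already present locally

def plan_cost(node, data_name, data_bytes,
              transfer_cost=1e-6, load_penalty=10.0):
    """Pay to move missing input data; pay more to use a busy node."""
    missing = 0 if data_name in node.cached else data_bytes
    return transfer_cost * missing + load_penalty * node.load

def resolve(nodes, data_name, data_bytes):
    """Pick the cheapest execution location among candidate nodes."""
    return min(nodes, key=lambda n: plan_cost(n, data_name, data_bytes))

nodes = [Node("edge-a", load=0.9, cached={"/video/seg1": 5_000_000}),
         Node("dc-b", load=0.2, cached={})]
print(resolve(nodes, "/video/seg1", 5_000_000).name)  # -> dc-b: moving data beats queuing
```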

    IST Austria Thesis

    The scalability of concurrent data structures and distributed algorithms strongly depends on reducing contention for shared resources and the costs of synchronization and communication. We show how such cost reductions can be attained by relaxing the strict consistency conditions required by sequential implementations. In the first part of the thesis, we consider relaxation in the context of concurrent data structures. Specifically, in data structures such as priority queues, imposing strong semantics renders scalability impossible, since a correct implementation of the remove operation must return only the element with the highest priority. Intuitively, attempting to invoke remove operations concurrently creates a race condition. This bottleneck can be circumvented by relaxing the semantics of the affected data structure, thus allowing removal of elements that are no longer required to have the highest priority. We show that randomized implementations of relaxed data structures provide provable guarantees on the priority of the removed elements even under concurrency. Additionally, we show that in some cases relaxed data structures can be used to scale classical algorithms that are usually implemented with exact ones. In the second part, we study parallel variants of the stochastic gradient descent (SGD) algorithm, which distribute computation among multiple processors, thus reducing the running time. Unfortunately, for standard parallel SGD to succeed, each processor has to maintain a local copy of the necessary model parameters that is identical to the local copies of the other processors; the overheads of this perfect consistency, in terms of communication and synchronization, can negate the speedup gained by distributing the computation. We show that the consistency conditions required by SGD can be relaxed, allowing the algorithm to tolerate quantized communication, asynchrony, or even crash faults, while its convergence remains asymptotically the same.
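
    One well-known randomized relaxation of the kind the first part studies is the MultiQueue-style priority queue: keep several sequential heaps, insert into a random one, and delete-min from the better of two randomly sampled ones, so removals are only approximately minimal but threads rarely contend on the same heap. The single-threaded Python sketch below illustrates the idea only; it is not the thesis's construction and elides all synchronization.

```python
# Sketch of a MultiQueue-style relaxed priority queue (illustration only).
import heapq
import random

class RelaxedPQ:
    def __init__(self, num_queues=8):
        self.queues = [[] for _ in range(num_queues)]

    def insert(self, priority, item):
        # Inserting into a random sub-queue spreads contention.
        heapq.heappush(random.choice(self.queues), (priority, item))

    def delete_min(self):
        # Sample two sub-queues; pop from the one with the smaller top.
        candidates = [q for q in random.sample(self.queues, 2) if q]
        if not candidates:
            candidates = [q for q in self.queues if q]  # fallback scan
        if not candidates:
            raise IndexError("empty relaxed priority queue")
        return heapq.heappop(min(candidates, key=lambda q: q[0]))

pq = RelaxedPQ()
for p in random.sample(range(100), 20):
    pq.insert(p, f"task-{p}")
print([pq.delete_min()[0] for _ in range(5)])  # small, roughly ascending priorities
```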

    PARTE : automatic program partitioning for efficient computation over encrypted data

    Thesis (S.M.) by Meelap Shah, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Includes bibliographical references (p. 45-47). Many modern applications outsource their data storage and computation needs to third parties. Although this lifts many infrastructure burdens from the application developer, it brings an increased risk of data leakage (there are more distributed copies of the data, and the third party may be insecure and/or untrustworthy). Oftentimes, the most practical option is to tolerate this risk; this is far from ideal, and in the case of highly sensitive data (e.g., medical records, location history) it is unacceptable. We present PARTE, a tool to aid application developers in lowering the risk of data leakage. PARTE statically analyzes a program's source, annotated to indicate the types that will hold sensitive data (i.e., data that should not be leaked), and outputs a partitioned version of the source. One partition operates only on encrypted copies of sensitive data, lowering the risk of data leakage, and can safely be run by a third party or in an otherwise untrusted environment. The second partition must have plaintext access to sensitive data and therefore should be run in a trusted environment. Program execution flows between the partitions, leveraging third-party resources when the risk of data leakage is low. Further, we identify operations which, if efficiently supported by some encryption scheme, would improve the performance of partitioned execution. To demonstrate the feasibility of these ideas, we implement PARTE in Haskell and run it on a web application, hpaste, which allows users to upload and share text snippets. The partitioned hpaste serves web requests 1.2-2.5x slower than the original hpaste. We find this overhead to be moderately high. Moreover, the partitioning does not allow much code to run on encrypted data. We discuss why we feel our techniques did not produce an attractive partitioning and offer insight into new research directions that could yield better results.
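
    To illustrate the partitioning idea at the coarsest level, the toy Python sketch below assigns each function to the trusted or untrusted partition depending on whether it touches a value of an annotated sensitive type; PARTE's actual analysis operates on Haskell source and tracks data flow, so everything here, including the type names, is a simplifying assumption.

```python
# Toy function-level partitioner -- not PARTE's analysis, just the idea.
SENSITIVE_TYPES = {"MedicalRecord", "LocationHistory"}   # developer annotations

functions = {                        # function -> types it manipulates
    "render_page":   {"Html"},
    "store_record":  {"MedicalRecord", "DbHandle"},
    "count_entries": {"Int"},
}

def partition(funcs):
    """Trusted partition gets plaintext access; the rest runs untrusted."""
    trusted, untrusted = [], []
    for name, types in funcs.items():
        (trusted if types & SENSITIVE_TYPES else untrusted).append(name)
    return trusted, untrusted

trusted, untrusted = partition(functions)
print("trusted (plaintext):", trusted)       # ['store_record']
print("untrusted (encrypted):", untrusted)   # ['render_page', 'count_entries']
```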

    Optimization-based energy-efficient control in mobile communication networks

    In this work we consider how best to control mobility and transmission for the purpose of data transfer and aggregation in a network of mobile autonomous agents. In particular, we consider networks containing unmanned aerial vehicles (UAVs). We first consider a single link between a mobile transmitter-receiver pair, and show that the total amount of transmittable data is bounded. For certain special, but not overly restrictive cases, we can determine closed-form expressions for this bound as a function of relevant mobility and communication parameters. We then use nonlinear model predictive control (NMPC) to jointly optimize the mobility and transmission schemes of all networked nodes for the purpose of minimizing the energy expenditure of the network. This yields a novel nonlinear optimal control problem for arbitrary networks of autonomous agents, which we solve with state-of-the-art nonlinear solvers. Numerical results demonstrate increased network capacity and significant communication energy savings compared to more naïve policies. All energy expenditure of an autonomous agent is due to communication, computation, or mobility, and the actual computation of the NMPC solution may be a significant cost in both time and computational resources. Furthermore, frequent broadcasting of control policies throughout the network can require significant transmit and receive energies. Motivated by this, we develop an event-triggering scheme which accounts for the accuracy of the optimal control solution and provides guarantees on the minimum time between successive control updates. Solution accuracy should be accounted for in any triggered NMPC scheme where the system may run in open loop for extended times based on possibly inaccurate state predictions. We use this analysis to trade off the cost of updating our transmission and locomotion policies against the frequency with which they must be updated. This gives a method to trade off the computation, communication, and mobility related energies of the mobile autonomous network.
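
    The boundedness claim for a single link can be checked numerically: integrating a Shannon-type rate along a straight-line fly-by past a static receiver, the total transferable data stays finite because the rate decays like 1/d^alpha once the SNR is small. The Python sketch below does this with assumed constants (bandwidth, power, noise, path-loss exponent, geometry); none of the numbers come from the thesis.

```python
# Hedged numeric illustration: total data transferable over one fly-by.
import math

B, P, N0, alpha = 1e6, 1.0, 1e-6, 2.0   # Hz, W, W, path-loss exponent (assumed)
v, d_min = 20.0, 50.0                    # speed (m/s), closest approach (m)

def rate(t):
    d2 = d_min**2 + (v * t)**2           # squared transmitter-receiver distance
    return B * math.log2(1 + P / (N0 * d2 ** (alpha / 2)))

# Trapezoidal integration over a long symmetric window; for alpha = 2 the
# rate falls off like 1/t^2 far from the receiver, so the total converges.
dt, T = 0.1, 600.0
steps = int(T / dt)
total_bits = sum(0.5 * (rate(-T / 2 + i * dt) + rate(-T / 2 + (i + 1) * dt)) * dt
                 for i in range(steps))
print(f"~{total_bits / 8e6:.1f} MB transferable in one pass")
```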