8,449 research outputs found

    Privacy-Preserving Secret Shared Computations using MapReduce

    Full text link
    Data outsourcing allows data owners to keep their data at \emph{untrusted} clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant data processing in a distributed fashion is MapReduce, which was developed for \emph{trusted} private clouds. This paper presents algorithms for data outsourcing based on Shamir's secret-sharing scheme and for executing privacy-preserving SQL queries such as count, selection including range selection, projection, and join while using MapReduce as an underlying programming model. Our proposed algorithms prevent an adversary from knowing the database or the query while also preventing output-size and access-pattern attacks. Interestingly, our algorithms do not involve the database owner, which only creates and distributes secret-shares once, in answering any query, and hence, the database owner also cannot learn the query. Logically and experimentally, we evaluate the efficiency of the algorithms on the following parameters: (\textit{i}) the number of communication rounds (between a user and a server), (\textit{ii}) the total amount of bit flow (between a user and a server), and (\textit{iii}) the computational load at the user and the server.\BComment: IEEE Transactions on Dependable and Secure Computing, Accepted 01 Aug. 201

    Ranking Large Temporal Data

    Full text link
    Ranking temporal data has not been studied until recently, even though ranking is an important operator (being promoted as a firstclass citizen) in database systems. However, only the instant top-k queries on temporal data were studied in, where objects with the k highest scores at a query time instance t are to be retrieved. The instant top-k definition clearly comes with limitations (sensitive to outliers, difficult to choose a meaningful query time t). A more flexible and general ranking operation is to rank objects based on the aggregation of their scores in a query interval, which we dub the aggregate top-k query on temporal data. For example, return the top-10 weather stations having the highest average temperature from 10/01/2010 to 10/07/2010; find the top-20 stocks having the largest total transaction volumes from 02/05/2011 to 02/07/2011. This work presents a comprehensive study to this problem by designing both exact and approximate methods (with approximation quality guarantees). We also provide theoretical analysis on the construction cost, the index size, the update and the query costs of each approach. Extensive experiments on large real datasets clearly demonstrate the efficiency, the effectiveness, and the scalability of our methods compared to the baseline methods.Comment: VLDB201

    Stochastic Block Coordinate Frank-Wolfe Algorithm for Large-Scale Biological Network Alignment

    Get PDF
    With increasingly "big" data available in biomedical research, deriving accurate and reproducible biology knowledge from such big data imposes enormous computational challenges. In this paper, motivated by recently developed stochastic block coordinate algorithms, we propose a highly scalable randomized block coordinate Frank-Wolfe algorithm for convex optimization with general compact convex constraints, which has diverse applications in analyzing biomedical data for better understanding cellular and disease mechanisms. We focus on implementing the derived stochastic block coordinate algorithm to align protein-protein interaction networks for identifying conserved functional pathways based on the IsoRank framework. Our derived stochastic block coordinate Frank-Wolfe (SBCFW) algorithm has the convergence guarantee and naturally leads to the decreased computational cost (time and space) for each iteration. Our experiments for querying conserved functional protein complexes in yeast networks confirm the effectiveness of this technique for analyzing large-scale biological networks
    • …
    corecore