2,691 research outputs found

    SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

    The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, multiple data (SIMD) vectors within each core (vectorize). By searching against the large UniProtKB/TrEMBL protein database, SWAPHI achieves a performance of up to 58.8 billion cell updates per second (GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors. Furthermore, it demonstrates good parallel scalability on a varying number of coprocessors, and, when using four coprocessors, is also superior to both SWIPE on 16 high-end CPU cores and BLAST+ on 8 cores, with maximum speedups of 1.52 and 1.86, respectively. SWAPHI is written in C++ (with a set of SIMD intrinsics), and is freely available at http://swaphi.sourceforge.net. Comment: A short version of this paper has been accepted by the IEEE ASAP 2014 conference
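To make the quadratic cost and the GCUPS metric in the abstract above concrete, here is a minimal, scalar sketch of Smith-Waterman scoring. It is not SWAPHI's code: the function names, the match/mismatch scores, and the linear gap penalty are assumptions chosen for brevity, whereas protein search tools typically use a substitution matrix and affine gap penalties, and SWAPHI itself is vectorized C++.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score under an assumed linear gap penalty.

    Every (query, target) residue pair costs one cell update, which is
    where the quadratic running time comes from.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Billions of cell updates per second for one database search."""
    return query_len * database_residues / seconds / 1e9
```

Under these assumptions, a query of 1,000 residues searched against a database of 2 billion residues in 100 seconds corresponds to 20 GCUPS. The inner loop is the part that a SIMD implementation would spread across vector lanes.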

    Cost-Effective Resource Allocation and Throughput Maximization in Mobile Cloudlets and Distributed Clouds

    With advances in communication networks and the explosive growth in the use of mobile devices, distributed clouds consisting of many small and medium datacenters in different geographical locations, and cloudlets defined as "mini" datacenters, are envisioned as the next-generation cloud computing platform. In particular, distributed clouds enable disaster-resilient and scalable services by scaling the services across multiple datacenters, while cloudlets allow pervasive and continuous services with low access delay by further enabling mobile users to access the services within their proximity. To realize the promises of distributed clouds and mobile cloudlets, it is urgent to optimize their system performance, such as system throughput and operational cost, by developing efficient solutions. In this thesis, we aim to devise novel solutions to maximize the system throughput of mobile cloudlets and minimize the operational costs of distributed clouds, while meeting the resource capacity constraints and users' resource demands. This, however, poses great challenges: (1) how to maximize the system throughput of a mobile cloudlet, considering that a mobile cloudlet has limited resources to serve energy-constrained mobile devices, (2) how to efficiently and effectively manage and evaluate big data in distributed clouds, and (3) how to efficiently allocate the resources of a distributed cloud to meet the resource demands of various users. Existing studies mainly focused on implementing systems and lacked systematic methods to optimize the performance of distributed clouds and mobile cloudlets. Novel techniques and approaches for performance optimization of distributed clouds and mobile cloudlets are therefore needed. To address these challenges, this thesis makes the following contributions.
We first study online request admissions in a cloudlet with the aim of maximizing the system throughput, assuming that future user requests are not known in advance. We propose a novel admission cost model to accurately model dynamic resource consumption, and devise efficient algorithms for online request admissions. Second, we study a novel collaboration- and fairness-aware big data management problem in a distributed cloud to maximize the system throughput while minimizing the operational cost of service providers, subject to resource capacities and users' fairness constraints. For this problem, we propose a novel optimization framework and devise a fast yet scalable approximation algorithm with an approximation ratio. Third, we investigate online query evaluation for big data analysis in a distributed cloud to maximize the query acceptance ratio while minimizing the query evaluation cost. For this problem, we propose a novel metric to model the costs of different resource consumptions in datacenters, and devise efficient online algorithms under both unsplittable and splittable source data assumptions. Fourth, we address the problem of community-aware data placement of online social networks into a distributed cloud, with the aim of minimizing the operational cost of the cloud service provider, and devise a fast yet scalable algorithm for the problem by leveraging the close community concept, which considers both user read rates and update rates. We also deal with social network evolutions by developing a dynamic evaluation algorithm for the problem. Finally, we evaluate the performance of all proposed algorithms in this thesis through experimental simulations, using real and/or synthetic datasets. Simulation results show that the proposed algorithms significantly outperform existing algorithms.

    Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data

    Clustering as a service is being offered by many cloud service providers. It helps enterprises discover hidden patterns and extract knowledge from the large volumes of data they generate. Though it brings a lot of value to enterprises, it also exposes the data to various security and privacy threats. Privacy-preserving clustering has been proposed as a solution to this problem. However, the privacy-preserving clustering as an outsourced service model imposes too much overhead on the querying user, lacks adaptivity to incremental data, and involves frequent interaction between the service provider and the querying user. It also offers the querying user no way to personalize the clustering. This work, "Locality Sensitive Hashing for Transformed Dataset (LSHTD)", proposes a hybrid cloud-based clustering as a service model for streaming data that addresses the problems in existing models such as privacy preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC). The solution combines a hybrid cloud with the LSHTD clustering algorithm as an outsourced service model. In experiments, the proposed solution is found to reduce the computation cost by 23% and the communication cost by 6%, and to provide better clustering accuracy, with ARI greater than 4.59% compared to existing works.

    Computing at massive scale: Scalability and dependability challenges

    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainties and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradations. In this paper we report our latest understanding of the current and future challenges in this particular area, and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability and dependability. We first introduce a data-driven analysis methodology for characterizing the resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including for example incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications

    OS2: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain

    © 2017 Pervez et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Public cloud storage services are becoming prevalent, and myriad data sharing, archiving and collaborative services have emerged which harness the pay-as-you-go business model of the public cloud. To ensure privacy and confidentiality, data is often encrypted before being outsourced to such services, which further complicates the process of accessing relevant data by using search queries. Search over encrypted data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and search criteria. The few schemes that extend the notion of exact matching to similarity-based search lack realism, as they either rely on trusted third parties or incur increased storage and computational complexity. In this paper we propose Oblivious Similarity based Search (OS2) for encrypted data. It enables authorized users to model their own encrypted search queries which are resilient to typographical errors. Unlike conventional methodologies, OS2 ranks the search results by using a similarity measure, offering a better search experience than exact matching. It utilizes an encrypted bloom filter and probabilistic homomorphic encryption to enable authorized users to access relevant data without revealing the results of the search query evaluation process to the untrusted cloud service provider. Encrypted bloom filter based search enables OS2 to reduce the search space to potentially relevant encrypted data, avoiding unnecessary computation on the public cloud. The efficacy of OS2 is evaluated on Google App Engine for various bloom filter lengths on different cloud configurations.
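As a rough illustration of how a bloom filter can support typo-tolerant, ranked matching of the kind the OS2 abstract describes, the sketch below indexes a keyword by its character bigrams and scores a query by the fraction of its bigrams found in the filter. This is an assumption-laden toy, not the OS2 scheme: it is entirely unencrypted, and the bigram encoding, the `BloomFilter` class, and the `index_keyword` and `similarity` helpers are hypothetical names introduced here.

```python
import hashlib


class BloomFilter:
    """Plain (unencrypted) bloom filter backed by a Python integer."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bigrams(word):
    """Character bigrams of a word, padded so the ends count too."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def index_keyword(word, num_bits=256, num_hashes=3):
    """Build a filter holding the bigrams of one keyword."""
    bf = BloomFilter(num_bits, num_hashes)
    for gram in bigrams(word):
        bf.add(gram)
    return bf


def similarity(query, bf):
    """Fraction of the query's bigrams that the filter reports as present."""
    grams = bigrams(query)
    return sum(g in bf for g in grams) / len(grams)
```

With a filter built by `index_keyword("cloud")`, the exact query `"cloud"` scores 1.0, while the misspelling `"clould"` still shares five of its seven bigrams and so scores at least 5/7. Shorter filters raise the false-positive rate, which is one reason the filter length is a natural parameter to vary in an evaluation.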