24 research outputs found

    Parallelization of Ant System for GPU under the PRAM Model

    Get PDF
    We study the parallelized ant system algorithm solving the traveling salesman problem on n cities. First, following the series of recent results for the graphics processing unit, we show that they translate to the PRAM (parallel random access machine) model. In addition, we develop a novel pheromone matrix update method under the PRAM CREW (concurrent-read exclusive-write) model and translate it to the graphics processing unit without atomic instructions. As a consequence, we give new asymptotic bounds for the parallel ant system, resulting in step complexities O(n łg łg n) on CRCW (concurrent-read concurrent-write) and O(n łg n) on CREW variants of PRAM using n2 processors in both cases. Finally, we present an experimental comparison with the currently known pheromone matrix update methods on the graphics processing unit and obtain encouraging results

    Parallelization, scalability, and reproducibility in next generation sequencing analysis

    Get PDF
    The analysis of next-generation sequencing (NGS) data is a major topic in bioinformatics: short reads obtained from DNA, the molecule encoding the genome of living organisms, are processed to provide insight into biological or medical questions. This thesis provides novel solutions to major topics within the analysis of NGS data, focusing on parallelization, scalability and reproducibility. The read mapping problem is to find the origin of the short reads within a given reference genome. We contribute the q-group index, a novel data structure for read mapping with particularly small memory footprint. The q-group index comes with massively parallel build and query algorithms targeted towards modern graphics processing units (GPUs). On top, the read mapping software PEANUT is presented, which outperforms state of the art read mappers in speed while maintaining their accuracy. The variant calling problem is to infer (i.e., call) genetic variants of individuals compared to a reference genome using mapped reads. It is usually solved in a Bayesian way. Often, variant calling is followed by filtering variants of different biological samples against each other. With state of the art solutions, the filtering is decoupled from the calling, leading to difficulties in controlling the false discovery rate. In this work, we show how to integrate the filtering into the calling with an algebraic approach and provide an intuitive solution for controlling the false discovery rate along with solving other challenges of variant calling like scaling with a growing set of biological samples. For this, a hierarchical index data structure for storage of preprocessing results is presented and compression strategies are provided. The developed methods are implemented in the software ALPACA. Depending on the research question, the analysis of NGS data entails many other steps, typically involving diverse tools, data transformations and aggregation of results. These steps can be orchestrated by work ow management. We present the general purpose work ow system Snakemake, which provides an easy to read domain-specific language for defining and documenting work ows, thereby ensuring reproducibility of analyses. The language is complemented by an execution environment that allows to scale a work ow to available resources, including parallelization across CPU cores or cluster nodes, restricting memory usage or the number of available coprocessors like GPUs. The benefits of using Snakemake are exemplified by combining the presented approaches for read mapping and variant calling to a complete, scalable and reproducible NGS analysis

    Efficient Data-Oblivious Computation

    Get PDF
    The rapid increase in the amount of data stored by cloud servers has resulted in growing privacy concerns for users. First, although keeping data encrypted at all times is an attractive approach to privacy, encryption may preclude mining and learning useful patterns from data. Second, companies are unable to distribute proprietary programs to other parties without risking the loss of their private code when those programs are reverse engineered. A challenge underlying both those problems is that how data is accessed — even when that data is encrypted — can leak secret information. Oblivious RAM is a well studied cryptographic primitive that can be used to solve the underlying challenge of hiding data-access patterns. In this dissertation, we improve Oblivious RAMs and oblivious algorithms asymptotically. We then show how to apply our novel oblivious algorithms to build systems that enable privacy-preserving computation on encrypted data and program obfuscation. Specifically, the first part of this dissertation shows two efficient Oblivious RAM algorithms: 1) The first algorithm achieves sub-logarithmic bandwidth blowup while only incurring an inexpensive XOR computation for performing Private Information Retrieval operations, and 2) The second algorithm is the first perfectly-secure Oblivious Parallel RAM with O(log3N)O(\log^3 N ) bandwidth blowup, O((logm+loglogN)logN)O((\log m + \log \log N)\log N) depth blowup, and O(1)O(1) space blowup when the PRAM has mm CPUs and stores NN blocks of data. The second part of this dissertation describes two systems — HOP and GraphSC — that address the problem of computing on private data and the distribution of proprietary programs. HOP is a system that achieves simulation-secure obfuscation of RAM programs assuming secure hardware. It is the first prototype implementation of a provably secure virtual black-box (VBB) obfuscation scheme in any model under any assumptions. GraphSC is a system that allows cloud servers to run a class of data-mining and machine-learning algorithms over users’ data without learning anything about that data. GraphSC brings efficient, parallel secure computation to programmers by allowing them to express computation tasks using the GraphLab abstraction. It is backed by the first non-trivial parallel oblivious algorithms that outperform generic Oblivious RAMs

    Anonymizing large transaction data using MapReduce

    Get PDF
    Publishing transaction data is important to applications such as marketing research and biomedical studies. Privacy is a concern when publishing such data since they often contain person-specific sensitive information. To address this problem, different data anonymization methods have been proposed. These methods have focused on protecting the associated individuals from different types of privacy leaks as well as preserving utility of the original data. But all these methods are sequential and are designed to process data on a single machine, hence not scalable to large datasets. Recently, MapReduce has emerged as a highly scalable platform for data-intensive applications. In this work, we consider how MapReduce may be used to provide scalability in large transaction data anonymization. More specifically, we consider how setbased generalization methods such as RBAT (Rule-Based Anonymization of Transaction data) may be parallelized using MapReduce. Set-based generalization methods have some desirable features for transaction anonymization, but their highly iterative nature makes parallelization challenging. RBAT is a good representative of such methods. We propose a method for transaction data partitioning and representation. We also present two MapReduce-based parallelizations of RBAT. Our methods ensure scalability when the number of transaction records and domain of items are large. Our preliminary results show that a direct parallelization of RBAT by partitioning data alone can result in significant overhead, which can offset the gains from parallel processing. We propose MR-RBAT that generalizes our direct parallel method and allows to control parallelization overhead. Our experimental results show that MR-RBAT can scale linearly to large datasets and to the available resources while retaining good data utility

    Generating and auto-tuning parallel stencil codes

    Get PDF
    In this thesis, we present a software framework, Patus, which generates high performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and performance), and achieving a high performance on the target platform. A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation. The proposed key ingredients to achieve the goals of productivity, portability, and performance are domain specific languages (DSLs) and the auto-tuning methodology. The Patus stencil specification DSL allows the programmer to express a stencil computation in a concise way independently of hardware architecture-specific details. Thus, it increases the programmer productivity by disburdening her or him of low level programming model issues and of manually applying hardware platform-specific code optimization techniques. The use of domain specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Constructing the language to be geared towards a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance. Auto-tuning provides performance and performance portability by automated adaptation of implementation-specific parameters to the characteristics of the hardware on which the code will run. By automating the process of parameter tuning — which essentially amounts to solving an integer programming problem in which the objective function is the number representing the code's performance as a function of the parameter configuration, — the system can also be used more productively than if the programmer had to fine-tune the code manually. We show performance results for a variety of stencils, for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment and a seismic application. These examples demonstrate the framework's flexibility and ability to produce high performance code

    Privacy-Aware and Secure Decentralized Air Quality Monitoring

    Get PDF
    Indoor Air Quality monitoring is a major asset to improving quality of life and building management. Today, the evolution of embedded technologies allows the implementation of such monitoring on the edge of the network. However, several concerns need to be addressed related to data security and privacy, routing and sink placement optimization, protection from external monitoring, and distributed data mining. In this paper, we describe an integrated framework that features distributed storage, blockchain-based Role-based Access Control, onion routing, routing and sink placement optimization, and distributed data mining to answer these concerns. We describe the organization of our contribution and show its relevance with simulations and experiments over a set of use cases

    Energy-Performance Optimization for the Cloud

    Get PDF

    Distributed and Lightweight Meta-heuristic Optimization method for Complex Problems

    Get PDF
    The world is becoming more prominent and more complex every day. The resources are limited and efficiently use them is one of the most requirement. Finding an Efficient and optimal solution in complex problems needs to practical methods. During the last decades, several optimization approaches have been presented that they can apply to different optimization problems, and they can achieve different performance on various problems. Different parameters can have a significant effect on the results, such as the type of search spaces. Between the main categories of optimization methods (deterministic and stochastic methods), stochastic optimization methods work more efficient on big complex problems than deterministic methods. But in highly complex problems, stochastic optimization methods also have some issues, such as execution time, convergence to local optimum, incompatible with distributed systems, and dependence on the type of search spaces. Therefore this thesis presents a distributed and lightweight metaheuristic optimization method (MICGA) for complex problems focusing on four main tracks. 1) The primary goal is to improve the execution time by MICGA. 2) The proposed method increases the stability and reliability of the results by using the multi-population strategy in the second track. 3) MICGA is compatible with distributed systems. 4) Finally, MICGA is applied to the different type of optimization problems with other kinds of search spaces (continuous, discrete and order based optimization problems). MICGA has been compared with other efficient optimization approaches. The results show the proposed work has been achieved enough improvement on the main issues of the stochastic methods that are mentioned before.Maailmasta on päivä päivältä tulossa yhä monimutkaisempi. Resurssit ovat rajalliset, ja siksi niiden tehokas käyttö on erittäin tärkeää. Tehokkaan ja optimaalisen ratkaisun löytäminen monimutkaisiin ongelmiin vaatii tehokkaita käytännön menetelmiä. Viime vuosikymmenien aikana on ehdotettu useita optimointimenetelmiä, joilla jokaisella on vahvuutensa ja heikkoutensa suorituskyvyn ja tarkkuuden suhteen erityyppisten ongelmien ratkaisemisessa. Parametreilla, kuten hakuavaruuden tyypillä, voi olla merkittävä vaikutus tuloksiin. Optimointimenetelmien pääryhmistä (deterministiset ja stokastiset menetelmät) stokastinen optimointi toimii suurissa monimutkaisissa ongelmissa tehokkaammin kuin deterministinen optimointi. Erittäin monimutkaisissa ongelmissa stokastisilla optimointimenetelmillä on kuitenkin myös joitain ongelmia, kuten korkeat suoritusajat, päätyminen paikallisiin optimipisteisiin, yhteensopimattomuus hajautetun toteutuksen kanssa ja riippuvuus hakuavaruuden tyypistä. Tämä opinnäytetyö esittelee hajautetun ja kevyen metaheuristisen optimointimenetelmän (MICGA) monimutkaisille ongelmille keskittyen neljään päätavoitteeseen: 1) Ensisijaisena tavoitteena on pienentää suoritusaikaa MICGA:n avulla. 2) Lisäksi ehdotettu menetelmä lisää tulosten vakautta ja luotettavuutta käyttämällä monipopulaatiostrategiaa. 3) MICGA tukee hajautettua toteutusta. 4) Lopuksi MICGA-menetelmää sovelletaan erilaisiin optimointiongelmiin, jotka edustavat erityyppisiä hakuavaruuksia (jatkuvat, diskreetit ja järjestykseen perustuvat optimointiongelmat). Työssä MICGA-menetelmää verrataan muihin tehokkaisiin optimointimenetelmiin. Tulokset osoittavat, että ehdotetulla menetelmällä saavutetaan selkeitä parannuksia yllä mainittuihin stokastisten menetelmien pääongelmiin liittyen

    Optimal program variant generation for hybrid manycore systems

    Get PDF
    Field Programmable Gate Arrays promise to deliver superior energy efficiency in heterogeneous high performance computing, as compared to multicore CPUs and GPUs. The rate of adoption is however hampered by the relative difficulty of programming FPGAs. High-level synthesis tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a Hardware Description Languages representation from a high-level specification of the application, given in programming languages such as OpenCL C, typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they fail to also lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, meaning that traditional HLS solutions do not offer a solution to the issue of performance portability. This state of fact prompted the development of compiler frameworks such as TyTra that operate at an even higher level of abstraction that is amenable to the use of Design Space Exploration (DSE). With DSE the initial program specification can be seen as the starting location in a search-space of correct-by-construction program transformations. In TyTra the search-space is generated from the transitive-closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. They however suffer from the very practical issue that the generated space is often too large to fully explore. As a consequence, the globally optimal solution may be overlooked. In this work we provide a novel solution to issue performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework. We make use of categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run-time for our DSE strategy has practically negligible coefficients leading to sub-second exploration times for realistic applications
    corecore