
59 research outputs found

Speculative Execution Resilient Cryptography

Author: Rui Pedro Gomes Fernandes
Publication date: 28/07/2023

Repositório Aberto da Universidade do Porto

Frequent itemset mining on multiprocessor systems

Author: Schlegel Benjamin
Publication date: 30/05/2013

Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed; however, they (1) often have high memory requirements and (2) do not exploit the high degree of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is also required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets). In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their cost by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
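As an illustration of hot spot (4) above, the following is a minimal serial sketch of intersecting two sorted integer lists. It assumes the lists are transaction-ID lists of two itemsets, so the length of the intersection is the support of their union; the function name `intersect_sorted` and the sample data are hypothetical and are not taken from the thesis, whose parallel and vectorized variants are not reproduced here.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two ascending lists of integers."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a[i] cannot appear in b from position j onward
        elif a[i] > b[j]:
            j += 1          # b[j] cannot appear in a from position i onward
        else:
            out.append(a[i])  # common element found
            i += 1
            j += 1
    return out


# Hypothetical transaction-ID lists for two items
tids_bread = [1, 3, 5, 7]
tids_milk = [3, 4, 5, 8]
common = intersect_sorted(tids_bread, tids_milk)
support = len(common)  # number of transactions containing both items
```

The merge visits each element at most once, so the cost is linear in the combined list length; this is only the scalar baseline that a parallel version would start from.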

Technische Universität Dresden: Qucosa

Análise de canais laterais de tempo em tradutores dinâmicos de binários (Timing side-channel analysis in dynamic binary translators)

Author: Napoli Otávio Oliveira, 1994-
Publication venue: [s.n.]
Publication date: 16/08/2019

Advisors: Edson Borin, Diego de Freitas Aranha. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.

Abstract: Timing side-channel attacks are an important issue for cryptographic algorithms. If the execution time of an implementation depends on secret information, an adversary may recover the latter by measuring the former. Different approaches have recently emerged to exploit information leakage in cryptographic implementations and to protect them against these attacks. Constant-time cryptography is therefore a widely adopted practice that aims to remove the correlation between secret data and its timing samples. Although these countermeasures are effective at keeping algorithms free of timing side channels when they run directly on a system, emulators can modify the code and reintroduce leakage points during execution. Recent work discusses the impact of Just-In-Time (JIT) compilers for high-level languages on leakage through execution time. However, little has been said about cross-ISA emulation through dynamic binary translation (DBT) and its impact on timing leakage. In this work, we investigate the impact of emulators (dynamic binary translators) on the constant-time property of cryptographic implementations. Using statistical methods and cryptographic routines, we establish the feasibility of timing leaks in code generated by a dynamic binary translator, even when using different region formation techniques. We show that emulation may have a significant impact by inserting non-constant-time constructions during translation, leading to significant timing leakage. This leakage is observed in dynamic binary translation systems such as QEMU and HQEMU when emulating routines from well-known cryptographic libraries such as mbedTLS, and it can be quickly verified. Finally, to guarantee the constant-time property, we implemented a compiler transformation based on if-conversion in the dynamic binary translators, mitigating the inserted timing side channels.

Master's degree, Computer Science (Mestre em Ciência da Computação). 2014/50704-7 FAPES
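The idea behind if-conversion, named in the abstract above, can be sketched as replacing a secret-dependent branch with a branch-free masked selection. The sketch below is an assumption-laden illustration, not the dissertation's transformation: the function names and the 32-bit word size are hypothetical, and Python integers give no constant-time guarantee, so the code only shows the shape of the rewrite that a translator would apply at the machine-code level.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit word size, for illustration only


def select_branchy(cond, a, b):
    """Secret-dependent branch: which path runs depends on cond."""
    if cond:
        return a
    return b


def select_branchless(cond, a, b):
    """If-converted form: both inputs are always read, no branch on cond."""
    mask = (-(cond & 1)) & MASK32   # 0xFFFFFFFF if cond is 1, else 0
    return (a & mask) | (b & ~mask & MASK32)


# Both forms compute the same result for 32-bit inputs
x = select_branchless(1, 0xAAAA, 0x5555)
y = select_branchless(0, 0xAAAA, 0x5555)
```

In compiled code the masked form executes the same instruction sequence for either value of `cond`, which is the property a constant-time implementation relies on and which a translator can break by reintroducing a conditional jump.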

Repositorio da Producao Cientifica e Intelectual da Unicamp

Object-oriented implementations of the MPDATA advection equation solver in C++, Python and Fortran

Authors: Arabas Sylwester; Fijałkowski Maciej; Jarecka Dorota; Jaruga Anna
Publication venue: 'IOS Press'
Publication date: 19/03/2013

Three object-oriented implementations of a prototype solver of the advection equation are introduced. The presented programs are based on Blitz++ (C++), NumPy (Python), and Fortran's built-in array containers. The solvers include an implementation of the Multidimensional Positive-Definite Advective Transport Algorithm (MPDATA). The introduced codes exemplify how object-oriented programming (OOP) techniques make it possible to reproduce, within the program code, the mathematical notation used in the literature. A discussion of the tradeoffs of the programming language choice is presented. The main angles of comparison are code brevity and syntax clarity (and hence maintainability and auditability) as well as performance. In the case of Python, a significant performance gain is observed when switching from the standard interpreter (CPython) to the PyPy implementation of Python. The entire source code of all three implementations is embedded in the text and is licensed under the terms of the GNU GPL.

arXiv.org e-Print Archive

Directory of Open Access Journals

Compiler-Driven Cache Policy (Known Reference String)

Authors: Chi Chi Hung; Dietz Henry G.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/06/1987

Increasing cache hit ratios has proved instrumental in improving the performance of cache-based computers. This is particularly true for computers that have a high ratio of cache-miss to cache-hit memory reference delay. Although software policies are often used for main vs. secondary memory caching, the speed required of a CPU vs. main memory cache policy has so far prompted investigation only of policies that can be implemented directly in hardware. Based on compile-time analysis, it is possible to predict program behavior, thereby increasing the hit ratio beyond the capability of pure run-time (hardware) techniques. In this report, compiler-driven techniques for this kind of cache policy are described. The SCP Model (software cache policy model) provides an optimal cache prefetch and placement/replacement policy when given an arbitrary memory reference string. In addition to suggesting a simplified cache hardware model, the SCP Model can be applied to various cache organizations such as direct-mapped, set-associative, and fully associative. Analytic results demonstrate significant improvements in cache performance. The current work discusses an optimal cache policy that applies where the string of references is known at compile time. However, this constraint can be relaxed to encompass reference strings that are known only statistically, i.e., reference strings in which data aliases make the target of some references ambiguous. Companion reports, currently in preparation, detail the extension of the SCP Model to incorporate aliases, code containing loops, and conditional branches.

Purdue E-Pubs


Efficient analysis and storage of large-scale genomic data

Author: Klarqvist Marcus
Publication venue: University of Cambridge
Publication date: 01/09/2019

The impending advent of population-scale sequencing cohorts involving tens of millions of individuals with matched phenotypic measurements will produce unprecedented volumes of genetic data. Storing and analysing such gargantuan datasets places computational performance at a pivotal position in medical genomics. In this thesis, I explore the potential for accelerating and parallelizing standard genetics workflows, file formats, and algorithms using hardware-accelerated vectorization, parallel and distributed algorithms, and heterogeneous computing. First, I describe a novel bit-counting operation termed the positional population-count, which can be used together with succinct representations and standard efficient operations to accelerate many genetic calculations. To enable the use of this new operator and the canonical population count on any target machine, I developed a unified low-level library that uses CPU dispatching to select, at run-time, the optimal method for the available instruction set architecture and the given input size. As a proof-of-principle application, I apply the positional population-count operator to computing quality control-related summary statistics for terabyte-scale sequencing readsets with >3,800-fold speed improvements. As another application, I describe a framework for efficiently computing the cardinality of set intersection using these operators and apply this framework to efficiently compute genome-wide linkage-disequilibrium in datasets with up to 67 million samples, resulting in up to >60-fold improvements in speed for dense genotypic vectors and up to >250,000-fold savings in memory and >100,000-fold improvement in speed for sparse genotypic vectors. I next describe a framework for handling the terabytes of compressed output data and describe graphical routines for visualizing long-range linkage-disequilibrium blocks as seen over many human centromeres. Finally, I describe efficient algorithms for storing and querying very large genetic datasets and specialized algorithms for the genotype component of such datasets with >10,000-fold savings in memory compared to the current interchange format.

Wellcome Trust
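To make the positional population-count concrete, here is a minimal scalar sketch under one common reading of the term: for a stream of fixed-width words, count how many words have each bit position set. The function names, the 16-bit default width, and the sample values are assumptions for illustration; the thesis's vectorized library and its CPU dispatching are not reproduced here.

```python
def positional_popcount(words, width=16):
    """Return counts[k] = number of words whose bit k is set."""
    counts = [0] * width
    for w in words:
        for bit in range(width):
            counts[bit] += (w >> bit) & 1
    return counts


def intersection_cardinality(a, b):
    """Cardinality of the intersection of two sets stored as integer bitmaps."""
    return bin(a & b).count("1")  # canonical population count of the AND


# Hypothetical 4-bit flag words
flags = [0b0101, 0b0001, 0b0100]
per_bit = positional_popcount(flags, width=4)
shared = intersection_cardinality(0b1101, 0b0111)
```

The ordinary population count collapses a word to one number, while the positional variant keeps one counter per bit position, which is why it suits per-flag summary statistics.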

Apollo (Cambridge)

Finance for Food

Author: Köhn Doris
Publication venue: 'Springer Science and Business Media LLC'

Development Economics; Agricultural Economics; Finance, general; Economic Growth; Macroeconomics/Monetary Economics//Financial Economics; Microfinance; Rural Finance; Agricultural Finance; Rural Development; Developing Countries

OAPEN Library

Timing Analysis of General Purpose Graphics Processing Units for Real-Time Systems: Models and Analyses

Author: Kostiantyn Berezovskyi
Publication date: 20/04/2016

Repositório Aberto da Universidade do Porto

Efficient software implementation of elliptic curves and bilinear pairings

Author: Aranha Diego de Freitas, 1982-
Publication venue: [s.n.]
Publication date: 19/08/2018

Advisor: Júlio César Lopez Hernández. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.

Abstract: The development of asymmetric, or public-key, cryptography made possible new applications of cryptography such as digital signatures and electronic commerce. Cryptography is now a vital component for providing confidentiality and authentication in communication infrastructures. Elliptic Curve Cryptography is among the most efficient public-key methods because of its low storage and computational requirements. The relatively recent advent of Pairing-Based Cryptography further allowed the construction of flexible and innovative cryptographic solutions such as Identity-Based Cryptography and its variants. However, the computational cost of pairing-based cryptosystems remains significantly higher than that of traditional public-key cryptosystems and is thus an important obstacle to adoption, especially on resource-constrained devices. The main contributions of this work aim to improve the performance of curve-based cryptosystems and consist of: (i) efficient implementation of binary fields on 8-bit microcontrollers embedded in sensor network nodes; (ii) efficient formulation of binary field arithmetic in terms of the vector instructions present in 64-bit architectures, and of the recently introduced native support for binary field multiplication in the latest Intel microarchitecture families; (iii) techniques for serial and parallel implementation of binary elliptic curves and of symmetric and asymmetric pairings defined over prime and binary fields. These contributions produced important performance improvements and, consequently, several speed records for computing relevant cryptographic algorithms on modern computer architectures ranging from embedded 8-bit microcontrollers to 8-core processors.

Doctorate, Computer Science (Doutor em Ciência da Computação)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Implementace algoritmů pro zpracovaní obrazu na IBM Cell (Implementation of image processing algorithms on IBM Cell)

Author: Klecanda Václav
Publication venue: Univerzita Karlova, Matematicko-fyzikální fakulta
Publication date: 01/01/2009

This work summarizes available information about the IBM Cell/B.E. architecture so that the reader can quickly gain the overview needed to program for it. The practical information is drawn from the development of an application that implements a nontrivial image processing algorithm, sparse field level set segmentation. The next part describes the development of this application and the solutions to problems that may arise along the way. The work also compares the conventional and Cell architectures and describes the conditions necessary for creating an efficient Cell/B.E. application. It further contains a brief installation procedure for the most important development tools. This procedure aims to get everything ready as quickly as possible and so shorten the preparation phase, letting the reader start developing for Cell/B.E.

Department of Software and Computer Science Education (Katedra softwaru a výuky informatiky), Faculty of Mathematics and Physics (Matematicko-fyzikální fakulta)

CU Digital Repository