Superscalar Sample Queue: Engineering a Distribution-Based Priority Queue

Author: Grün Raphael von der
Publication date: 05/09/2022

KITopen

Heap-construction programs

Authors: Edelkamp Stefan, Elmasry Amr, Katajainen Jyrki
Publication venue: Department of Computer Science, University of Copenhagen
Publication date: 04/11/2016

Copenhagen University Research Information System

Simple Symmetric Sustainable Sorting -- the greeNsort article

Author: Oehlschlägel Jens
Publication date: 02/02/2024

We explored an uncharted part of the solution space for sorting algorithms: the role of symmetry in divide-and-conquer algorithms. We found and designed novel, simple binary Quicksort and Mergesort algorithms operating in contiguous space which achieve improved trade-offs between worst-case CPU-efficiency, best-case adaptivity and RAM-requirements. The 'greeNsort' algorithms need less hardware (RAM) and/or less energy (CPU) than the prior art. The new algorithms fit a theoretical framework: 'Footprint' KPIs allow comparison of algorithms with different RAM-requirements, a new 'definition' of sorting API-targets simplifies construction of stable algorithms with mirrored scan directions, and our ordinal machine model encourages robust algorithms that minimize access 'distance'. Unlike earlier 'Quicksorts', our 'Zacksort', 'Zucksort' and 'Ducksort' algorithms optimally marry CPU-efficiency and tie-adaptivity. Unlike earlier 'Mergesorts', which required 100% distant buffer, our 'Frogsort' and 'Geckosort' algorithms achieve similar CPU-efficiency with 50% or less local buffer. Unlike natural Mergesorts such as 'Timsort', which are optimized for the best case of full presorting, our 'Octosort' and 'Squidsort' algorithms achieve excellent bi-adaptivity to presorted best cases without sacrificing worst-case efficiency in real sorting tasks. Our 'Walksort' and 'Jumpsort' have lower Footprint than the impressive low-memory 'Grailsort' and 'Sqrtsort' of Astrelin. Given the current climate emergency, this is a call to action for all maintainers of sorting libraries, all software engineers using custom sorting code, all professors teaching algorithms, and all IT professionals designing programming languages, compilers and CPUs: check for better algorithms and consider symmetric code-mirroring.

Comment: 50 pages, 6 figures, latest version under https://github.com/greeNsort/greeNsort.article, see also https://greensort.or

arXiv.org e-Print Archive

Parallel Sparse Matrix-Matrix Multiplication

Author: Alexandrov Luben
Publication venue: Karlsruher Institut für Technologie
Publication date: 25/01/2021

The thesis investigates the BLAS-3 routine of sparse matrix-matrix multiplication (SpGEMM) based on the outer product method. Several algorithmic approaches have been implemented and empirically analyzed. The experiments have shown that an algorithm presented by Gustavson [22] outperforms the other alternatives. In this work we propose optimization techniques that improve the scalability and the cache efficiency of Gustavson's algorithm for large matrices. Our approach reduced the cache misses by more than a factor of five and improved the net running time by 30% on some instances. The thesis also presents an algorithm for flops estimation, which can be used to determine an upper bound for the density of the result matrix. Furthermore, the work analyzes and empirically evaluates techniques for parallelizing the multiplication in a shared-memory model using Intel TBB and OpenMP. We investigate the cache efficiency of the algorithm in a parallel setting and compare several approaches for load balancing of the computation.
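As an illustration of the two ideas named in this abstract, the following is a minimal sketch of a row-wise sparse product with a sparse accumulator (the formulation commonly attributed to Gustavson) together with a per-row flops count. It is not the thesis's implementation: the list-of-dicts storage, the names `spgemm_rowwise` and `flops_per_row`, and the absence of any cache or parallel optimization are assumptions made to keep the example short.

```python
from typing import Dict, List

# Assumed storage for this sketch: row i is a dict {column index: value}.
SparseRows = List[Dict[int, float]]


def spgemm_rowwise(a: SparseRows, b: SparseRows) -> SparseRows:
    """Compute C = A * B one output row at a time."""
    c: SparseRows = []
    for a_row in a:
        acc: Dict[int, float] = {}          # sparse accumulator for row i of C
        for k, a_ik in a_row.items():       # every nonzero A[i, k] ...
            for j, b_kj in b[k].items():    # ... scales row k of B
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        c.append(acc)
    return c


def flops_per_row(a: SparseRows, b: SparseRows) -> List[int]:
    """Count the scalar multiplications needed for each row of C.

    Each multiplication can create at most one new entry, so this count
    is also an upper bound on the number of nonzeros in that row of C.
    """
    return [sum(len(b[k]) for k in a_row) for a_row in a]
```

For example, with `a = [{0: 1.0, 2: 2.0}, {1: 3.0}]` and `b = [{1: 1.0}, {0: 4.0}, {0: 5.0, 1: 6.0}]`, the product has rows `{0: 10.0, 1: 13.0}` and `{0: 12.0}`, while `flops_per_row(a, b)` gives `[3, 1]`, which bounds the row sizes 2 and 1 from above.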

KITopen

A Simple Deterministic Algorithm for Systems of Quadratic Polynomials over $\mathbb{F}_2$

Authors: Charles Bouillaguet, Claire Delaplace, Monika Trimoska
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 17/12/2021

This article discusses a simple deterministic algorithm for solving quadratic Boolean systems which is essentially a special case of more sophisticated methods. The main idea fits in a single sentence: guess enough variables so that the remaining quadratic equations can be solved by linearization (i.e. by considering each remaining monomial as an independent variable and solving the resulting linear system) and restart until the solution is found. Under strong heuristic assumptions, this finds all the solutions of $m$ quadratic polynomials in $n$ variables with $\tilde{\mathcal{O}}(2^{n-\sqrt{2m}})$ operations. Although the best known algorithms require exponentially less time, the present technique has the advantage of being simpler to describe and easy to implement. In strong contrast with the state-of-the-art, it is also quite efficient in practice.
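The guess-and-linearize idea described in this abstract can be sketched as follows. This is an illustrative toy, not the authors' implementation: the polynomial encoding, the function names, the choice to guess the first `k` variables, and the exhaustive enumeration of any free columns left after elimination are all assumptions of this sketch. The exponent in the stated bound is consistent with guessing about $n - \sqrt{2m}$ variables, since roughly $\sqrt{2m}$ remaining variables produce about $m$ quadratic monomials.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Monomial = FrozenSet[int]   # variable indices; frozenset() is the constant 1
Poly = Set[Monomial]        # sum over F_2 of monomials; the equation is poly = 0


def evaluate(poly: Poly, x: Tuple[int, ...]) -> int:
    """Evaluate a polynomial over F_2 at the point x."""
    return sum(all(x[v] for v in mono) for mono in poly) % 2


def substitute(poly: Poly, guess: Dict[int, int]) -> Poly:
    """Plug guessed bits into a polynomial and cancel equal monomials."""
    out: Poly = set()
    for mono in poly:
        if any(guess.get(v) == 0 for v in mono):
            continue  # a factor is 0, so the monomial vanishes
        out ^= {frozenset(v for v in mono if v not in guess)}
    return out


def eliminate(rows: List[Tuple[int, int]]) -> Optional[Dict[int, Tuple[int, int]]]:
    """Gauss-Jordan over F_2 on (bitmask, rhs) rows; None if inconsistent."""
    pivots: Dict[int, Tuple[int, int]] = {}  # pivot column -> (mask, rhs)
    for mask, rhs in rows:
        for col, (pmask, prhs) in pivots.items():
            if mask >> col & 1:
                mask, rhs = mask ^ pmask, rhs ^ prhs
        if mask == 0:
            if rhs:
                return None  # 0 = 1: this guess is impossible
            continue
        col = mask.bit_length() - 1
        for c, (pmask, prhs) in list(pivots.items()):
            if pmask >> col & 1:
                pivots[c] = (pmask ^ mask, prhs ^ rhs)
        pivots[col] = (mask, rhs)
    return pivots


def guess_and_linearize(system: List[Poly], n: int, k: int) -> Set[Tuple[int, ...]]:
    """Return all solutions by guessing variables 0..k-1 and linearizing the rest."""
    solutions: Set[Tuple[int, ...]] = set()
    rest = list(range(k, n))
    for bits in product((0, 1), repeat=k):
        reduced = [substitute(p, dict(enumerate(bits))) for p in system]
        # Columns: remaining single variables first, then quadratic monomials.
        monos = [frozenset([v]) for v in rest]
        monos += sorted({m for p in reduced for m in p if len(m) == 2}, key=sorted)
        col = {m: i for i, m in enumerate(monos)}
        rows = [(sum(1 << col[m] for m in p if m), int(frozenset() in p))
                for p in reduced]
        pivots = eliminate(rows)
        if pivots is None:
            continue
        free = [c for c in range(len(monos)) if c not in pivots]
        for fbits in product((0, 1), repeat=len(free)):
            fv = sum(b << c for b, c in zip(fbits, free))
            val = fv
            for c, (mask, rhs) in pivots.items():
                val |= (rhs ^ (bin(mask & fv).count("1") & 1)) << c
            x = bits + tuple(val >> i & 1 for i in range(len(rest)))
            # Linearization ignores that x_i * x_j must equal the product,
            # so every candidate is checked against the original system.
            if all(evaluate(p, x) == 0 for p in system):
                solutions.add(x)
    return solutions
```

As a usage example, the system $x_0 x_1 + x_2 + 1 = 0$, $x_1 x_2 + x_0 = 0$, $x_0 x_2 + x_1 = 0$ is encoded as `[{frozenset({0, 1}), frozenset({2}), frozenset()}, {frozenset({1, 2}), frozenset({0})}, {frozenset({0, 2}), frozenset({1})}]`; calling `guess_and_linearize(system, 3, 1)` on it returns `{(0, 0, 1)}`. The final check against the original polynomials is what keeps the sketch correct even when too few variables are guessed, at the cost of enumerating free columns.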

HAL Descartes

Cryptology ePrint Archive

OpenISA, um conjunto de instruções híbrido (OpenISA, a hybrid instruction set)

Author: Auler Rafael, 1986-
Publication venue: [s.n.]
Publication date: 31/08/2018

Advisor: Edson Borin. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.

Abstract: OpenISA is designed as the interface of processors that aim to be highly flexible. This is achieved by means of three strategies. First, the ISA is empirically chosen to be easily translated to others, providing software flexibility in case a physical OpenISA processor is not available. In that case, there is no need to run a virtual OpenISA processor in software. The ISA is designed to be statically translated to other ISAs. Second, the ISA is neither a concrete ISA nor a virtual ISA, but a hybrid one with the capability of admitting modifications to opcodes without impacting backwards compatibility. This mechanism allows future versions of the ISA to have real changes instead of simple extensions of previous versions, a common problem with concrete ISAs such as the x86. Third, the use of a permissive license allows the ISA to be freely used by any party interested in the project. In this PhD thesis, we focus on the user-level instructions of OpenISA. The thesis discusses (1) ISA alternatives, program distribution alternatives and the impact of each choice, (2) important features of OpenISA to achieve its goals and (3) provides a thorough evaluation of the chosen ISA with respect to emulation performance on two popular host CPUs, one from Intel and another from ARM. We conclude that the version of OpenISA presented here can preserve close-to-native performance when translated to other hosts, working as a promising model for next-generation, flexible ISAs that can be easily extended while preserving backwards compatibility. Furthermore, we show how this can also serve as a program distribution format at user level.

Degree: Doctorate, Computer Science (Doctor in Computer Science). 2011/09630-1 FAPES

Repositorio da Producao Cientifica e Intelectual da Unicamp

Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

Publication venue: TU Wien Academic Press
Publication date: 18/10/2021

The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing.

Directory of Open Access Books (DOAB)

Automated application-specific optimisation of interconnects in multi-core systems

Author: Almer Oscar Erik Gabriel
Publication venue: Association for Computing Machinery (ACM)
Publication date: 29/11/2012

In embedded computer systems there are often tasks, implemented as stand-alone devices, that are both application-specific and compute-intensive. A recurring problem in this area is to design these application-specific embedded systems as close to the power and efficiency envelope as possible. Work has been done on optimizing single-core systems and memory organisation, but current methods for achieving system design goals are proving limited as system capabilities and system size increase in the multi- and many-core era. To address this problem, this thesis investigates machine learning approaches to managing the design space presented in the interconnect design of embedded multi-core systems. The design space is large due to the system scale and level of interconnectivity, and it also features interdependent parameters, further complicating analysis. The results presented in this thesis demonstrate that machine learning approaches, particularly wkNN and random forest, work well in handling the complexity of the design space. The benefits of this approach are in automation, saving time and effort in the system design phase as well as energy and execution time in the finished system.

Edinburgh Research Archive