Simple Symmetric Sustainable Sorting -- the greeNsort article
We explored an uncharted part of the solution space for sorting algorithms:
the role of symmetry in divide&conquer algorithms. We found/designed novel
simple binary Quicksort and Mergesort algorithms operating in contiguous space
which achieve improved trade-offs between worst-case CPU-efficiency, best-case
adaptivity and RAM-requirements. The 'greeNsort' algorithms need less hardware
(RAM) and/or less energy (CPU) compared to the prior art. The new algorithms
fit a theoretical framework: 'Footprint' KPIs allow comparing algorithms with
different RAM-requirements, a new 'definition' of sorting API-targets
simplifies construction of stable algorithms with mirrored scan directions, and
our ordinal machine model encourages robust algorithms that minimize access
'distance'. Unlike earlier 'Quicksorts', our 'Zacksort', 'Zucksort' and
'Ducksort' algorithms optimally marry CPU-efficiency and tie-adaptivity. Unlike
earlier 'Mergesorts' which required 100% distant buffer, our 'Frogsort' and
'Geckosort' algorithms achieve similar CPU-efficiency with 50% or less local
buffer. Unlike natural Mergesorts such as 'Timsort' which are optimized for the
best case of full-presorting, our 'Octosort' and 'Squidsort' algorithms achieve
excellent bi-adaptivity to presorted best-cases without sacrificing worst-case
efficiency in real sorting tasks. Our 'Walksort' and 'Jumpsort' have lower
Footprint than the impressive low-memory 'Grailsort' and 'Sqrtsort' of
Astrelin. Given the current climate-emergency, this is a call to action for all
maintainers of sorting libraries, all software-engineers using custom sorting
code, all professors teaching algorithms, all IT professionals designing
programming languages, compilers and CPUs: check for better algorithms and
consider symmetric code-mirroring.
Comment: 50 pages, 6 figures, latest version under
https://github.com/greeNsort/greeNsort.article, see also
https://greensort.or
Parallel Sparse Matrix-Matrix Multiplication
The thesis investigates the BLAS-3 routine of sparse matrix-matrix multiplication (SpGEMM) based on the outer product method. Several algorithmic approaches have been implemented and empirically analyzed. The experiments have shown that an algorithm presented by Gustavson [22] outperforms the alternatives. In this work we propose optimization techniques that improve the scalability and the cache efficiency of Gustavson's algorithm for large matrices. Our approach reduced cache misses by more than a factor of five and improved the net running time by 30% on some instances. The thesis also presents an algorithm for flops estimation, which can be used to determine an upper bound for the density of the result matrix. Furthermore, the work analyzes and empirically evaluates techniques for parallelizing the multiplication in a shared-memory model using Intel TBB and OpenMP. We investigate the cache efficiency of the algorithm in a parallel setting and compare several approaches for load balancing of the computation.
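To make the two ideas in this abstract concrete, here is a minimal sequential sketch of a Gustavson-style row-by-row SpGEMM and of a flops count that bounds the number of nonzeros in the result. It is an illustration only, not the thesis's implementation: the CSR layout, the function names `spgemm_gustavson` and `estimate_flops`, and the dense-accumulator-with-marker scheme are assumptions chosen for brevity, and none of the cache or parallel optimizations are shown.

```python
def spgemm_gustavson(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_cols):
    """Compute C = A * B row by row; A, B and C are in CSR form.

    n_cols is the number of columns of B (and therefore of C).
    """
    c_ptr, c_idx, c_val = [0], [], []
    acc = [0.0] * n_cols      # dense accumulator for the current row of C
    marker = [-1] * n_cols    # marker[j] == i: column j already seen in row i
    for i in range(len(a_ptr) - 1):
        cols = []             # columns that are nonzero in row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and add it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                if marker[j] != i:
                    marker[j] = i
                    acc[j] = 0.0
                    cols.append(j)
                acc[j] += a_ik * b_val[q]
        cols.sort()           # keep column indices ordered within the row
        for j in cols:
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


def estimate_flops(a_idx, b_ptr):
    """Count scalar multiplications: sum of nnz(B[k, :]) over all nonzeros A[i, k].

    Every nonzero of C needs at least one multiplication, so this count is
    an upper bound on nnz(C).
    """
    return sum(b_ptr[k + 1] - b_ptr[k] for k in a_idx)
```

For A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], both stored in CSR, the flops count is 3 and the product has exactly 3 nonzeros, so the bound is tight in that small case; in general several multiplications may land on the same entry of C and the bound is loose.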
A Simple Deterministic Algorithm for Systems of Quadratic Polynomials over
This article discusses a simple deterministic algorithm for solving quadratic
Boolean systems which is essentially a special case of more sophisticated
methods. The main idea fits in a single sentence: guess enough variables so
that the remaining quadratic equations can be solved by linearization
(i.e. by considering each remaining monomial as an independent
variable and solving the resulting linear system) and restart until the solution
is found. Under strong heuristic
assumptions, this finds all the solutions of quadratic polynomials in
variables with operations. Although the best
known algorithms require exponentially less time, the present technique has
the advantage of being simpler to describe and easy to implement. In strong
contrast with the state-of-the-art, it is also quite efficient in practice.
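The guess-then-linearize idea can be sketched on a toy scale as follows. This is a hedged illustration, not the article's algorithm as published: the representation (a polynomial is a set of monomials, a monomial is a frozenset of variable indices, the empty frozenset is the constant 1), all function names, the exhaustive loop over guesses of the first `g` variables, and the enumeration of free columns when the linearized system is underdetermined are assumptions made to keep the sketch short and correct. It says nothing about the running time claimed in the abstract.

```python
from itertools import product


def evaluate(poly, x):
    """Evaluate a GF(2) polynomial (a set of frozenset monomials) at the bit tuple x."""
    return sum(all(x[i] for i in mono) for mono in poly) % 2


def substitute(poly, guess):
    """Plug guessed bits (dict: variable index -> bit) into poly."""
    out = set()
    for mono in poly:
        if any(guess.get(i) == 0 for i in mono):
            continue  # the monomial contains a variable guessed as 0
        out ^= {frozenset(i for i in mono if i not in guess)}  # toggle mod 2
    return out


def solve_linear(rows, ncols):
    """Yield every solution of a GF(2) linear system.

    Each row is an int bitmask: bit c is the coefficient of column c and
    bit ncols is the right-hand side.
    """
    rows, pivot_cols, rank = list(rows), [], 0
    for c in range(ncols):
        p = next((i for i in range(rank, len(rows)) if rows[i] >> c & 1), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] >> c & 1:
                rows[i] ^= rows[rank]
        pivot_cols.append(c)
        rank += 1
    if any(r == 1 << ncols for r in rows[rank:]):
        return  # a row reads 0 = 1: no solution
    free = [c for c in range(ncols) if c not in pivot_cols]
    for bits in product((0, 1), repeat=len(free)):
        sol = [0] * ncols
        for c, b in zip(free, bits):
            sol[c] = b
        for r, c in zip(rows, pivot_cols):
            sol[c] = ((r >> ncols & 1) + sum(sol[f] for f in free if r >> f & 1)) % 2
        yield sol


def solve_quadratic_system(polys, n, g):
    """Return all x in {0,1}^n that make every polynomial 0, guessing x[0..g-1]."""
    solutions, rest = set(), list(range(g, n))
    for bits in product((0, 1), repeat=g):
        reduced = [substitute(p, dict(enumerate(bits))) for p in polys]
        # Linearization: one column per remaining variable and one per
        # surviving quadratic monomial, each treated as an independent unknown.
        monos = [frozenset([i]) for i in rest]
        monos += sorted({m for p in reduced for m in p if len(m) == 2}, key=sorted)
        col = {m: c for c, m in enumerate(monos)}
        rows = [sum(1 << (col[m] if m else len(monos)) for m in p) for p in reduced]
        for sol in solve_linear(rows, len(monos)):
            x = bits + tuple(sol[: len(rest)])
            # Linearization can produce spurious candidates, so always verify.
            if all(evaluate(p, x) == 0 for p in polys):
                solutions.add(x)
    return solutions
```

As a usage example, the system x0·x1 + x2 + 1 = 0, x1·x2 + x0 = 0, x0 + x1 + x2 + 1 = 0 can be written as `[{frozenset({0, 1}), frozenset({2}), frozenset()}, {frozenset({1, 2}), frozenset({0})}, {frozenset({0}), frozenset({1}), frozenset({2}), frozenset()}]` and passed to `solve_quadratic_system(polys, 3, 1)`. Guessing more variables shrinks the linear system; guessing fewer leaves more monomials than equations, which is where the free-column fallback in this sketch becomes expensive.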
OpenISA, a hybrid instruction set
Advisor: Edson Borin
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: OpenISA is designed as the interface of processors that aim to be highly flexible. This is achieved by means of three strategies. First, the ISA is empirically chosen to be easily translated to others, providing software flexibility in case a physical OpenISA processor is not available. In that case there is no need to run an OpenISA virtual processor in software: the ISA is designed to be statically translated to other ISAs. Second, the ISA is neither a concrete ISA nor a virtual ISA, but a hybrid one with the capability of admitting modifications to opcodes without impacting backwards compatibility. This mechanism allows future versions of the ISA to have real changes instead of simple extensions of previous versions, a common problem with concrete ISAs such as the x86. Third, the use of a permissive license allows the ISA to be freely used by any party interested in the project. In this PhD thesis, we focus on the user-level instructions of OpenISA. The thesis (1) discusses ISA alternatives, program distribution alternatives and the impact of each choice, (2) presents important features of OpenISA to achieve its goals and (3) provides a thorough evaluation of the chosen ISA with respect to emulation performance on two popular host CPUs, one from Intel and another from ARM. We conclude that the version of OpenISA presented here can preserve close-to-native performance when translated to other hosts, working as a promising model for next-generation, flexible ISAs that can be easily extended while preserving backwards compatibility. Furthermore, we show how this can also be a program distribution format at user level.
Doctorate; Computer Science; Doctor in Computer Science; 2011/09630-1; FAPES
Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021
The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing.
Automated application-specific optimisation of interconnects in multi-core systems
In embedded computer systems there are often tasks, implemented as stand-alone devices,
that are both application-specific and compute intensive. A recurring problem
in this area is to design these application-specific embedded systems as close to the
power and efficiency envelope as possible. Work has been done on optimizing single-core
systems and memory organisation, but current methods for achieving system design
goals are proving limited as the system capabilities and system size increase in the
multi- and many-core era. To address this problem, this thesis investigates machine
learning approaches to managing the design space presented in the interconnect design
of embedded multi-core systems. The design space presented is large due to the
system scale and level of interconnectivity, and also features interdependent parameters,
further complicating analysis. The results presented in this thesis demonstrate
that machine learning approaches, particularly wkNN and random forest, work well
in handling the complexity of the design space. The benefits of this approach are in
automation, saving time and effort in the system design phase as well as energy and
execution time in the finished system.