
    Engineering faster sorters for small sets of items

    Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as efficiently as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. A relatively simple sorting algorithm that is often used as a base case sorter is insertion sort, because it has a small code size and small constant factors influencing its execution time. We aim to determine whether there is a faster way to sort small sets of items and thus provide a more efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting few elements, and at how to implement them efficiently using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code size. To also enable the sorting of slightly larger base cases, we adapted sample sort into Register Sample Sort, which breaks those larger sets down into sizes that can in turn be sorted by sorting networks. From our experiments we found that when sorting only small sets of integers, the sorting networks outperform insertion sort by a factor of at least 1.76 for every array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements over insertion sort, probably because the networks have a larger code size and clutter the L1 instruction cache. The same effect occurs when including Register Sample Sort as a base case sorter in IPS4o. But for x86 machines that have a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as the base case sorter in std::sort, and of 5%–6% when integrating Register Sample Sort as the base case sorter into IPS4o, each in comparison to using insertion sort as the base case sorter. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms.
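    The core building block of such a network is a branchless compare-exchange. The sketch below is illustrative rather than the authors' exact implementation: on integers, std::min/std::max typically compile to conditional-move (cmov) instructions, and a fixed 4-input network of five comparators then sorts without any data-dependent branches.

        #include <algorithm>
        #include <array>
        #include <cstdio>

        // Branchless compare-exchange: after the call, a <= b. On integers,
        // compilers typically lower std::min/std::max to conditional moves
        // (cmov) rather than branches, which is the effect exploited here.
        inline void compareExchange(int& a, int& b) {
            int lo = std::min(a, b);
            int hi = std::max(a, b);
            a = lo;
            b = hi;
        }

        // A 4-input sorting network (5 comparators). Every set size needs its
        // own explicitly coded comparator sequence, which is why code size
        // grows with the largest supported size.
        inline void sort4(std::array<int, 4>& v) {
            compareExchange(v[0], v[1]);
            compareExchange(v[2], v[3]);
            compareExchange(v[0], v[2]);
            compareExchange(v[1], v[3]);
            compareExchange(v[1], v[2]);
        }

        int main() {
            std::array<int, 4> v{42, 7, 19, 3};
            sort4(v);
            for (int x : v) std::printf("%d ", x);  // prints: 3 7 19 42
            std::printf("\n");
        }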

    Automatic Algorithm Recognition Based on Programming Schemas and Beacons - A Supervised Machine Learning Classification Approach

    In this thesis, we present techniques to recognize basic algorithms covered in computer science education from source code. The techniques use various software metrics, language constructs and other characteristics of source code, as well as the concept of schemas and beacons from program comprehension models. Schemas are high-level programming knowledge with detailed knowledge abstracted out. Beacons are statements that imply specific structures in a program. Moreover, roles of variables constitute an important part of the techniques. Roles are concepts that describe the behavior and usage of variables in a program. They were originally introduced to help novices learn programming. We discuss two methods for algorithm recognition. The first is a classification method based on a supervised machine learning technique. It uses the vectors of characteristics and beacons automatically computed from the algorithm implementations of a training set to learn which characteristics and beacons best describe each algorithm. Based on these observed instance-class pairs, the system assigns a class to each new input algorithm implementation according to its characteristics and beacons. We use the C4.5 algorithm to generate a decision tree that performs the task. In the second method, the schema detection method, algorithms are defined as schemas that exist in the knowledge base of the system. To identify an algorithm, the method searches the source code for schemas that correspond to those predefined schemas. Moreover, we present a method that combines these two methods: it first applies the schema detection method to extract algorithmic schemas from the given program and then applies the classification method to the schema parts only. This enhances the reliability of the classification method, as the characteristics and beacons are computed only from the algorithm implementation code instead of the whole given program. We discuss several empirical studies conducted to evaluate the performance of the methods. Some results are as follows: evaluated by leave-one-out cross-validation, the estimated classification accuracy is 98.1% for sorting algorithms, 97.3% for searching, heap, basic tree traversal and graph algorithms, and 97.0% for the combined method (on sorting algorithms and their variations from real student submissions). For the schema detection method, the accuracies are 88.3% and 94.1%, respectively. In addition, we present a study that categorizes student-implemented sorting algorithms and their variations in order to find problematic solutions that would allow us to give feedback on them. We also explain how these variations can be automatically recognized.
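    As a rough illustration of the classification method (with hypothetical features, not the thesis's actual metric set), the sketch below encodes a few characteristics and one beacon as a feature vector and classifies it with a hand-written decision tree; in the thesis the tree is induced automatically by C4.5 from labeled training implementations.

        #include <iostream>
        #include <string>

        // Illustrative feature vector for one algorithm implementation. The
        // thesis computes many more characteristics (software metrics,
        // language constructs, roles of variables) plus beacons; these three
        // are hypothetical stand-ins chosen for the example.
        struct Features {
            bool usesRecursion;    // language construct
            int  nestedLoopDepth;  // software metric
            bool hasSwapBeacon;    // beacon: tmp = a[i]; a[i] = a[j]; a[j] = tmp
        };

        // Hand-written stand-in for a learned decision tree. In the thesis the
        // tree is generated by C4.5 from observed instance-class pairs.
        std::string classify(const Features& f) {
            if (f.usesRecursion)
                return f.hasSwapBeacon ? "quicksort" : "mergesort";
            if (f.nestedLoopDepth >= 2)
                return f.hasSwapBeacon ? "insertion/bubble sort family" : "selection sort";
            return "unknown";
        }

        int main() {
            Features candidate{false, 2, true};
            std::cout << classify(candidate) << "\n";  // insertion/bubble sort family
        }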

    Measuring the Energy Consumption of Software written in C on x86-64 Processors

    In 2016, German data centers consumed 12.4 terawatt-hours of electrical energy, which accounted for about 2% of Germany's total energy consumption in that year. By 2020 this had risen to 16 terawatt-hours, or 2.9% of Germany's total energy consumption. The ever-increasing energy consumption of computers consequently leads to efforts to reduce it in order to save energy and money and to protect the environment. This thesis aims to answer fundamental questions about the energy consumption of software, e.g. how and how precisely a measurement can be taken, or whether CPU load and energy consumption are correlated. An overview of measurement methods and the related software tooling was created. The most promising approach, using the software 'Scaphandre', was chosen as the main basis and developed further. Different sorting algorithms were benchmarked to study their behavior regarding energy consumption. The resulting dataset was also used to answer the fundamental questions stated at the beginning. A replication and reproduction package is provided to enable the reproducibility of the results.
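    On Linux, Scaphandre builds on the CPU's RAPL energy counters exposed through the powercap interface. The sketch below shows the measurement idea directly, assuming an x86-64 machine that exposes the intel-rapl:0 zone; the exact path varies per machine, and reading it usually requires elevated privileges.

        #include <algorithm>
        #include <cstdint>
        #include <fstream>
        #include <iostream>
        #include <random>
        #include <vector>

        // Read the cumulative package energy counter (in microjoules) from
        // the Linux powercap interface. Zone 0 is assumed here for brevity.
        uint64_t readEnergyUj() {
            std::ifstream f("/sys/class/powercap/intel-rapl:0/energy_uj");
            uint64_t uj = 0;
            f >> uj;
            return uj;
        }

        int main() {
            std::vector<int> data(1'000'000);
            std::mt19937 rng(42);
            for (int& x : data) x = static_cast<int>(rng());

            uint64_t before = readEnergyUj();
            std::sort(data.begin(), data.end());
            uint64_t after = readEnergyUj();

            // Caveats: the counter wraps at max_energy_range_uj, and this
            // naive difference includes energy used by other processes.
            std::cout << "approx. package energy: "
                      << (after - before) / 1e6 << " J\n";
        }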

    Engineering Faster Sorters for Small Sets of Items

    Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as fast as possible. But the more sophisticated and complex the algorithms become, the less efficient they are for small sets of items due to large constant factors. We aim to determine whether there is a faster way than insertion sort to sort small sets of items, to provide a more efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting few elements, and at how to implement them efficiently by using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code size. To also enable the sorting of slightly larger base cases, we adapted sample sort into Register Sample Sort, which breaks those larger sets down into sizes that can in turn be sorted by sorting networks. From our experiments we found that when sorting only small sets, the sorting networks outperform insertion sort by a factor of at least 1.76 for every array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements, probably because the networks have a larger code size and clutter the L1 instruction cache. But for x86 machines with a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as the base case sorter in std::sort. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms.
    Comment: arXiv admin note: substantial text overlap with arXiv:1908.0811
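    The classification step of Register Sample Sort can be sketched as follows; the three hard-coded splitters and the bucket layout are illustrative assumptions, not the paper's implementation. Keeping the splitters in registers lets each element be placed branch-free, after which each bucket is small enough for a sorting network.

        #include <array>
        #include <cstddef>
        #include <iostream>
        #include <vector>

        // Classify elements into 4 buckets using 3 sorted splitters
        // (requires s0 <= s1 <= s2). Summing the comparison results compiles
        // to setcc/add instructions rather than unpredictable branches.
        void classify(const std::vector<int>& in,
                      std::array<std::vector<int>, 4>& buckets,
                      int s0, int s1, int s2) {
            for (int x : in) {
                int b = (x > s0) + (x > s1) + (x > s2);
                buckets[static_cast<size_t>(b)].push_back(x);
            }
        }

        int main() {
            std::vector<int> in{9, 2, 14, 7, 1, 11, 5};
            std::array<std::vector<int>, 4> buckets;
            classify(in, buckets, /*s0=*/4, /*s1=*/8, /*s2=*/12);
            for (size_t b = 0; b < buckets.size(); ++b) {
                std::cout << "bucket " << b << ":";
                for (int x : buckets[b]) std::cout << ' ' << x;
                std::cout << '\n';
            }
        }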

    Analysis and design of parallel algorithms

    The present state of electronic technology is such that factors affecting computation speed have almost been minimised; switching, for instance, is almost instantaneous. Electronic components are so good, in fact, that the time taken for a logic signal to travel between two points is now a significant fraction of instruction times. Clearly, with the actual physical size of components being very small and circuit density high, there is little scope for improving computation speed significantly by such means as even denser circuitry or still faster electronic components. Thus, the development of faster computers will require a new approach that depends on the imaginative use of existing knowledge. One such approach is to increase computation speed through parallelism. Obviously, a parallel computer with p identical processors is potentially p times as fast as a single computer, although this limit can rarely be achieved.
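    The p-fold ideal can be illustrated by splitting an associative computation, such as a summation, across p threads; the sketch below is a minimal illustration of that idea, and the overheads it still carries (thread start-up, memory bandwidth, the remaining sequential part) are exactly why the limit is rarely reached in practice.

        #include <algorithm>
        #include <cstddef>
        #include <iostream>
        #include <numeric>
        #include <thread>
        #include <vector>

        // Split a summation over p threads: each thread sums one chunk into
        // its own slot, and the partial sums are combined sequentially. The
        // ideal speedup over one thread is p.
        int main() {
            const unsigned p = std::max(1u, std::thread::hardware_concurrency());
            std::vector<long long> data(8'000'000, 1);
            std::vector<long long> partial(p, 0);
            std::vector<std::thread> workers;

            const size_t chunk = data.size() / p;
            for (unsigned t = 0; t < p; ++t) {
                size_t begin = t * chunk;
                size_t end = (t + 1 == p) ? data.size() : begin + chunk;
                workers.emplace_back([&, t, begin, end] {
                    auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
                    auto last  = data.begin() + static_cast<std::ptrdiff_t>(end);
                    partial[t] = std::accumulate(first, last, 0LL);
                });
            }
            for (auto& w : workers) w.join();

            long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
            std::cout << "sum = " << total << " using " << p << " threads\n";
        }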