    Solving the Simple Offset Assignment Problem as a Traveling Salesman

    In this paper, we present an exact approach to the Simple Offset Assignment problem arising in the domain of address code generation for digital signal processors. It is based on transformations to weighted Hamiltonian cycle problems and integer linear programming. To the best of our knowledge, it is the first approach capable of solving all instances of the established OffsetStone benchmark set to optimality within reasonable time. It therefore makes it possible, for the first time, to evaluate the quality of several heuristics relative to the optimum solutions. Further, using the same transformations, we present a simple and effective improvement heuristic. In addition, we include in our experiments an existing heuristic that has so far not been evaluated with OffsetStone.
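As a rough illustration of the problem itself (not of the paper's Hamiltonian cycle or ILP approach), the sketch below evaluates a memory layout under the commonly assumed Simple Offset Assignment cost model: a single address register with free post-increment and post-decrement by one, so that every pair of consecutive accesses whose offsets differ by more than one costs one explicit address instruction. The names `soa_cost` and `brute_force_soa` are hypothetical, the initial register setup is ignored, and the exhaustive search is only practical for a handful of variables.

```python
from itertools import permutations


def soa_cost(access_sequence, layout):
    """Count consecutive accesses that need an explicit address instruction.

    Assumed model: one address register, free auto-increment/decrement by 1.
    """
    offset = {var: i for i, var in enumerate(layout)}
    return sum(
        1
        for a, b in zip(access_sequence, access_sequence[1:])
        if abs(offset[a] - offset[b]) > 1
    )


def brute_force_soa(access_sequence):
    """Try every layout and return (minimum cost, one optimal layout)."""
    variables = sorted(set(access_sequence))
    return min(
        (soa_cost(access_sequence, layout), layout)
        for layout in permutations(variables)
    )


if __name__ == "__main__":
    sequence = list("abca")
    print(soa_cost(sequence, ("a", "b", "c")))
    print(brute_force_soa(sequence))
```

Because the number of layouts grows factorially with the number of variables, enumeration of this kind only serves to check small cases by hand; it says nothing about how the exact method in the paper scales.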

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization

    Compilation and Scheduling Techniques for Embedded Systems

    Embedded applications are constantly increasing in size, which has resulted in increasing demand on designers of digital signal processors (DSPs) to meet tight memory, size and cost constraints. With this trend, memory requirement reduction through code compaction and variable coalescing techniques is gaining ground. Also, as the use of multiprocessor systems-on-chip (MPSoCs) in complex embedded systems grows, problems like mapping, memory management and scheduling are gaining more attention. The first part of the dissertation deals with problems related to digital signal processors. Most modern DSPs provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel to instruction execution. A careful placement of variables in memory is important in decreasing the number of address arithmetic instructions, leading to compact and efficient code. Chapters 2 and 3 present effective heuristics for the simple and the general offset assignment problems with variable coalescing. A solution based on simulated annealing is also presented. Chapter 4 presents an optimal integer linear programming (ILP) solution to the offset assignment problem with variable coalescing and operand permutation. A new approach to the general offset assignment problem is introduced. Chapter 5 presents an optimal ILP formulation and a genetic algorithm solution to the address register allocation (ARA) problem with code transformation techniques. The ARA problem is used to generate compact code for array-intensive embedded applications. In the second part of the dissertation, we study problems related to MPSoCs. MPSoCs provide the flexibility to meet the performance requirements of multimedia applications while respecting the tight embedded system constraints. MPSoC-based embedded systems often employ software-managed memories called scratch-pad memories (SPM).
Scheduling the tasks of an application on the processors and partitioning the available SPM budget among those processors are two critical issues in reducing the overall computation time. Traditionally, the step of task scheduling is applied separately from the memory partitioning step. Such a decoupled approach may miss better-quality schedules. Chapters 6 and 7 present effective heuristics that integrate task allocation and SPM partitioning to further reduce the execution time of embedded applications for single and multi-application scenarios.

    Address optimizations for embedded processors

    Embedded processors that are common in electronic devices perform a limited set of tasks compared to general-purpose processor systems. They have limited resources, which have to be used efficiently. Optimal utilization of program memory requires a reduction in code size, which can be achieved by eliminating unnecessary address computations, i.e., by generating an optimal offset assignment that utilizes built-in addressing modes. Single offset assignment (SOA) solutions, used for processors with one address register, start with the access sequence of variables to determine the optimal assignment. This research uses the basic block to commutatively transform statements to alter the access sequence. Edges in the access graphs are classified into breakable and unbreakable edges. Unbreakable edges are preferred when selecting edges for the assignment. Breakable edges are used to commutatively transform statements such that the assignment cost is reduced. The use of a modify register (MR) in some processors allows the address to be modified by a value in MR in addition to post-increment/decrement modes. Though finding the most beneficial value of MR is a common practice, this research shows that modifying the access sequence using edge fold, node swap, and path interleave techniques for an MR value of two has significant benefit. General offset assignment requires variables in the access sequence to be partitioned among the address registers. Use of the node degree in the access graph demonstrates greater benefit than using edge weights and frequency of variables. The Static Single Assignment (SSA) form of the basic block introduces new variables to an access graph, making it sparser. Sparser access graphs usually have lower assignment costs. The SSA form allows reuse of variable space based on variable lifetimes. Offset assignment solutions may be improved by incremental assignment based on uncovered edges, providing the best cost improvement.
This heuristic considers improvements due to all uncovered edges. Optimization techniques have primarily been edge-based. A node-based SOA technique has been tested for use with commutative transformations and shown to be better than edge-based heuristics. The heuristics developed in this research perform address optimizations for embedded processors, employing new techniques that lower address computation costs.

    Shiftsreduce: Minimizing shifts in racetrack memory 4.0

    Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This article presents data-placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime, thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and we revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%
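A minimal sketch of the quantity being minimized, under a simplified model that the abstract does not spell out: a single track with one access port, where moving from one variable to the next costs as many shifts as the distance between their positions. `total_shifts` is a hypothetical helper for illustration only; it is neither the article's ILP formulation nor its heuristic.

```python
def total_shifts(access_sequence, placement):
    """Sum the shift distances between consecutively accessed variables.

    Assumed model: one track, one access port, and the port already
    aligned with the first accessed variable.
    """
    position = {var: i for i, var in enumerate(placement)}
    return sum(
        abs(position[a] - position[b])
        for a, b in zip(access_sequence, access_sequence[1:])
    )


if __name__ == "__main__":
    sequence = list("acac")
    # Placing the two alternating variables next to each other
    # reduces the shift count compared with separating them.
    print(total_shifts(sequence, ("a", "b", "c")))
    print(total_shifts(sequence, ("a", "c", "b")))
```

Unlike the offset assignment cost, which only distinguishes between neighbouring and non-neighbouring offsets, this cost grows with distance, which is one reason the offset assignment heuristics have to be revisited for this setting.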

    Exact Integer Programming Approaches to Sequential Instruction Scheduling and Offset Assignment

    The dissertation at hand presents the main concepts and results derived when studying the optimal solution of two NP-hard compiler optimization problems, namely instruction scheduling and offset assignment, by means of integer programming. It is the outcome of several years of research as an assistant at Michael Jünger's computer science chair in Cologne, with the particular aim to apply exact mathematical optimization techniques to real-world problems arising in the domain of technical computer science. The two problems studied are rather unrelated apart from the fact that they both take place during the machine code generation phase of a compiler and deal with the handling of limited resources. Instruction scheduling is about the assignment of issue clock cycles to instructions in the presence of precedence, latency, and resource constraints such that the total time needed to execute all the instructions is minimized. Offset assignment deals with storage layouts of program variables and the efficient use of address registers for accesses to these variables. The objective is to employ specialized instructions in order to minimize the overhead caused by address computations. While instruction scheduling needs to be carried out by almost every present compiler irrespective of the processor architecture, the offset assignment problem occurs mainly in compilers for highly specialized processor designs. Instruction scheduling is a well-studied field where several exact and heuristic approaches have been developed and experimentally evaluated in the past. In this thesis, we concentrate on the basic-block instruction scheduling problem for single-issue processors. Basic blocks are program fragments with no side-entrances and -exits, i.e., every instruction of a basic block needs to be executed before the control flow may leave it and enter another basic block. Single-issue processors are capable of starting the execution of exactly one instruction per clock cycle. 
A number of techniques to preprocess instances of the basic-block instruction scheduling problem were proposed in the literature and are, with emphasis on the more recent ones that arose since the year 2000, thoroughly reviewed in this thesis. They finally led to a constraint programming approach in 2006 that was shown to solve about 350,000 instances to optimality and where some of these instances comprised up to about 2,500 instructions. The last attempt to tackle the problem using integer programming however dates to a time prior to the publication of the latest preprocessing advances. While being successful on a set of instances that impose very restrictive latency constraints, it was shown to be unable to solve hundreds of instances from the aforementioned benchmark set that comprises also large and varying latencies. In addition, the previous integer programming models were almost all based on so-called time-indexed formulations where decision variables model an explicit assignment of instructions to clock cycles. In this thesis, a completely different and novel approach is taken based on the linear ordering problem, a well-studied combinatorial optimization problem. The new models lead to alternative characterizations of the feasible solutions to the basic-block instruction scheduling problem. These facilitate the employment of advanced integer programming methodologies, in particular the design of branch-and-cut algorithms that can handle larger instances. The formulations are further extended by additional inequalities that can be used as cutting planes. Combined with the preprocessing routines that are partially extended and improved as well, the respective solver implementation eventually turned out to be competitive to the constraint programming method. 
Reaching this point has taken some years, and this thesis presents not only the derived models but also several ideas and byproducts that arose in the meantime and that can help and inspire researchers even if they aim at the application of different solution methodologies. The starting point regarding the offset assignment problem was a different one because exact solution approaches in particular were rare prior to the models presented in this thesis. The offset assignment problem arose in the 1990s and is considered in several variants that are of theoretical and practical interest. In the simplest one, a processor is assumed to provide only a single address register and only very restricted possibilities to avoid address computation overhead. However, even this simplest variant, which may serve as a building block for the more complex ones, is already NP-hard and has been studied mainly from a heuristic point of view. The few existing exact solution approaches were not capable of solving moderately sized instances, so that the quality of heuristic solutions relative to the optimum was hardly known at all. Again, the inspection of the combinatorial structure of the various problem variants turned out to be the key for designing branch-and-cut implementations that can profit from knowledge about related combinatorial optimization problems. The implementation targeting the simple problem variant was the first capable of optimally solving the majority of about 3,000 instances collected in a standard benchmark set. The method could then be further generalized in two steps. First, in a collaboration with Roberto Castañeda Lozano, additional techniques could be incorporated into the approach in order to handle multiple address registers. Fortunately, the methods could then be extended even further to also deal with more flexible addressing capabilities.
In this way, the thesis at hand not only answers the question of how large the address computation overhead can be when using heuristics, but also presents first results that allow analyzing the impact of the mentioned increased addressing capabilities on the runtime performance and size of real-world programs.
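To make the single-issue basic-block scheduling problem described above concrete, here is a small sketch that, given a fixed issue order and pairwise latency constraints, computes the earliest issue cycle of each instruction. It only evaluates one ordering; it is not the thesis's linear-ordering model or its branch-and-cut algorithm, and the name `issue_cycles` and the dictionary encoding of latencies are assumptions made for illustration.

```python
def issue_cycles(order, latency):
    """Return the earliest issue cycle of each instruction for a fixed order.

    order:   instructions in the sequence they are issued (one per cycle at most).
    latency: maps (pred, succ) to the minimum number of cycles between
             the issue of pred and the issue of succ.
    """
    issue = {}
    next_free = 0  # single-issue: at most one instruction starts per cycle
    for instr in order:
        earliest = next_free
        for (pred, succ), cycles in latency.items():
            if succ == instr:
                if pred not in issue:
                    raise ValueError(f"{pred} must be issued before {instr}")
                earliest = max(earliest, issue[pred] + cycles)
        issue[instr] = earliest
        next_free = earliest + 1
    return issue


if __name__ == "__main__":
    latency = {("a", "b"): 3, ("a", "c"): 1}
    # Issuing c in the latency shadow of a -> b shortens the schedule.
    print(issue_cycles(["a", "b", "c"], latency))
    print(issue_cycles(["a", "c", "b"], latency))
```

The two orderings in the usage example respect the same precedence constraints but finish at different cycles, which is the kind of difference an exact method has to search over.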

    Memory optimization techniques for embedded systems

    Embedded systems have become ubiquitous, and as a result, optimizing the design and performance of programs that run on these systems remains a significant challenge for the computer systems research community. This dissertation addresses several key problems in the optimization of programs for embedded systems that use digital signal processors as the core processor. Chapter 2 develops an efficient and effective algorithm to construct a worm partition graph by repeatedly finding the currently longest worm while maintaining the legality of scheduling. Proper assignment of offsets to variables in embedded DSPs plays a key role in determining the execution time and amount of program memory needed. Chapter 3 proposes a new approach that introduces a weight adjustment function and shows that its experimental results are slightly better than, or at least as good as, those of previous work. Our solutions address several problems such as handling fragmented paths resulting from graph-based solutions, dealing with modify registers, and the effective utilization of multiple address registers. In addition to offset assignment, address register allocation is important for embedded DSPs. Chapter 4 develops a lower bound and an algorithm that can eliminate the explicit use of address register instructions in loops with array references. Scheduling of computations and the associated memory requirement are closely inter-related for loop computations. In Chapter 5, we develop a general framework for studying the trade-off between scheduling and storage requirements in nested loops that access multi-dimensional arrays. Tiling has long been used to improve the memory performance of loops. Only a sufficient condition for the legality of tiling was known previously. While it was conjectured that the sufficient condition would also become necessary for large enough tiles, there had been no precise characterization of what is large enough.
Chapter 6 develops a new framework for characterizing tiling by viewing tiles as points on a lattice. This also leads to the development of conditions under which the legality condition for tiling is both necessary and sufficient.