6 research outputs found

    Heuristics for offset assignment in embedded processors

    Get PDF
    This thesis deals with the optimization of program size and performance in current generation embedded digital signal processors (DSPs) by the design of optimal memory layouts for data. Given the tight constraints on the size, power consumption, cost and performance of these processors, the minimization of code size in terms of the number of instructions required and the associated reduction in execution time are important. Several DSPs provide limited addressing modes and the layout of data, known as offset assignment, plays a critical role in determining the code size and performance. Even the simplest variant of the offset assignment problem is NP-complete. Research effort in this area has focused on the design, implementation and evaluation of effective heuristics for several variants of the offset assignment problem. One of the most important factors in the determination of the size, and hence, execution time of a code is the number of instructions required to access the variables stored in the processor memory. The indirect addressing mode common in DSPs requires memory accesses to be realized through address registers that hold the address of the memory location to be accessed. The architecture provides instructions for adding to and subtracting from the values of the address registers to compute the addresses of subsequent data that need to be accessed. In addition, some DSP processors include multiple memory banks that allow increased parallelism in memory access. Proper partitioning of variables across memory banks is critical to effectively using the increased parallelism. The work reported in this thesis aims to evolve efficient methods for designing memory layouts under the conditions of availability of one address register (SOA) or of multiple address registers (GOA). It also proposes a novel technique for choosing the assignment of variables to the memory banks. This thesis motivates, proposes and evaluates heuristics for all these three problems. For the SOA and GOA problems, the heuristics are implemented and tested on different random sample inputs, and the results obtained are compared to those obtained by prior heuristics. In addition, this thesis provides some insight into the SOA, GOA and the variable partitioning problems

    Address optimizations for embedded processors

    Get PDF
    Embedded processors that are common in electronic devices perform a limited set of tasks compared to general-purpose processor systems. They have limited resources which have to be efficiently used. Optimal utilization of program memory needs a reduction in code size which can be achieved by eliminating unnecessary address computations i.e., generate optimal offset assignment that utilizes built-in addressing modes. Single offset assignment (SOA) solutions, used for processors with one address register; start with the access sequence of variables to determine the optimal assignment. This research uses the basic block to commutatively transform statements to alter the access sequence. Edges in the access graphs are classified into breakable and unbreakable edges. Unbreakable edges are preferred when selecting edges for the assignment. Breakable edges are used to commutatively transform statements such that the assignment cost is reduced. The use of a modify register in some processors allows the address to be modified by a value in MR in addition to post-increment/decrement modes. Though finding the most beneficial value of MR is a common practice, this research shows that modifying the access sequence using edge fold, node swap, and path interleave techniques for an MR value of two has significant benefit. General offset assignment requires variables in the access sequence to be partitioned to various address registers. Use of the node degree in the access graph demonstrates greater benefit than using edge weights and frequency of variables. The Static Single Assignment (SSA) form of the basic block introduces new variables to an access graph, making it sparser. Sparser access graphs usually have lower assignment costs. The SSA form allows reuse of variable space based on variable lifetimes. Offset assignment solutions may be improved by incrementally assignment based on uncovered edges, providing the best cost improvement. This heuristic considers improvements due to all uncovered edges. Optimization techniques have primarily been edge-based. Node-based SOA technique has been tested for use with commutative transformations and shown to be better than edge-based heuristics. Heuristics developed in this research perform address optimizations for embedded processors, employing new techniques that lower address computation costs

    Memory optimization techniques for embedded systems

    Get PDF
    Embedded systems have become ubiquitous and as a result optimization of the design and performance of programs that run on these systems have continued to remain as significant challenges to the computer systems research community. This dissertation addresses several key problems in the optimization of programs for embedded systems which include digital signal processors as the core processor. Chapter 2 develops an efficient and effective algorithm to construct a worm partition graph by finding a longest worm at the moment and maintaining the legality of scheduling. Proper assignment of offsets to variables in embedded DSPs plays a key role in determining the execution time and amount of program memory needed. Chapter 3 proposes a new approach of introducing a weight adjustment function and showed that its experimental results are slightly better and at least as well as the results of the previous works. Our solutions address several problems such as handling fragmented paths resulting from graph-based solutions, dealing with modify registers, and the effective utilization of multiple address registers. In addition to offset assignment, address register allocation is important for embedded DSPs. Chapter 4 develops a lower bound and an algorithm that can eliminate the explicit use of address register instructions in loops with array references. Scheduling of computations and the associated memory requirement are closely inter-related for loop computations. In Chapter 5, we develop a general framework for studying the trade-off between scheduling and storage requirements in nested loops that access multi-dimensional arrays. Tiling has long been used to improve the memory performance of loops. Only a sufficient condition for the legality of tiling was known previously. While it was conjectured that the sufficient condition would also become necessary for large enough tiles, there had been no precise characterization of what is large enough. Chapter 6 develops a new framework for characterizing tiling by viewing tiles as points on a lattice. This also leads to the development of conditions under the legality condition for tiling is both necessary and sufficient

    SOFTWARE UPDATE MANAGEMENT IN WIRELESS SENSOR NETWORKS

    Get PDF
    Wireless sensor networks (WSNs) have recently emerged as a promising platform for many non-traditional applications, such as wildfire monitoring and battlefield surveillance. Due to bug fixes, feature enhancements and demand changes, the code running on deployed wireless sensors often needs to be updated, which is done through energy-consuming wireless communication. Since the energy supply of battery-powered sensors is limited, the network lifetime is reduced if more energy is consumed for software update, especially at the early stage of a WSN’s life when bug fixes and feature enhancements are frequent, or in WSNs that support multiple applications, and frequently demand a subset of sensors to fetch and run different applications. In this dissertation, I propose an energy-efficient software update management framework for WSNs. The diff-based software update process can be divided into three phases: new binary generation, diff-patch generation, and patch distribution. I identify the energy-saving opportunities in each phase and develop a set of novel schemes to achieve overall energy efficiency. In the phase of generating new binary after source code changes, I design an update-conscious compilation approach to improve the code similarity between the new and old binaries. In the phase of generating update patch, I adopt simple primitives in the literature and develop a set of advanced primitives. I then study the energy-efficient patch distribution in WSNs and develop a multicast-based code distribution protocol to effectively disseminate the patch to individual sensors. In summary, this dissertation successfully addresses an important problem in WSNs. Update-conscious compilation is the first work that compiles the code with the goal of improving code similarity, and proves to be effective. The other components in the proposed framework also advance the state of the art. The proposed software update management framework benefits all WSN users, as software update is indispensable in WSNs. The techniques developed in this framework can also be adapted to other platforms such as the smart phone network

    Exact Integer Programming Approaches to Sequential Instruction Scheduling and Offset Assignment

    Get PDF
    The dissertation at hand presents the main concepts and results derived when studying the optimal solution of two NP-hard compiler optimization problems, namely instruction scheduling and offset assignment, by means of integer programming. It is the outcome of several years of research as an assistant at Michael Jünger's computer science chair in Cologne, with the particular aim to apply exact mathematical optimization techniques to real-world problems arising in the domain of technical computer science. The two problems studied are rather unrelated apart from the fact that they both take place during the machine code generation phase of a compiler and deal with the handling of limited resources. Instruction scheduling is about the assignment of issue clock cycles to instructions in the presence of precedence, latency, and resource constraints such that the total time needed to execute all the instructions is minimized. Offset assignment deals with storage layouts of program variables and the efficient use of address registers for accesses to these variables. The objective is to employ specialized instructions in order to minimize the overhead caused by address computations. While instruction scheduling needs to be carried out by almost every present compiler irrespective of the processor architecture, the offset assignment problem occurs mainly in compilers for highly specialized processor designs. Instruction scheduling is a well-studied field where several exact and heuristic approaches have been developed and experimentally evaluated in the past. In this thesis, we concentrate on the basic-block instruction scheduling problem for single-issue processors. Basic blocks are program fragments with no side-entrances and -exits, i.e., every instruction of a basic block needs to be executed before the control flow may leave it and enter another basic block. Single-issue processors are capable of starting the execution of exactly one instruction per clock cycle. A number of techniques to preprocess instances of the basic-block instruction scheduling problem were proposed in the literature and are, with emphasis on the more recent ones that arose since the year 2000, thoroughly reviewed in this thesis. They finally led to a constraint programming approach in 2006 that was shown to solve about 350,000 instances to optimality and where some of these instances comprised up to about 2,500 instructions. The last attempt to tackle the problem using integer programming however dates to a time prior to the publication of the latest preprocessing advances. While being successful on a set of instances that impose very restrictive latency constraints, it was shown to be unable to solve hundreds of instances from the aforementioned benchmark set that comprises also large and varying latencies. In addition, the previous integer programming models were almost all based on so-called time-indexed formulations where decision variables model an explicit assignment of instructions to clock cycles. In this thesis, a completely different and novel approach is taken based on the linear ordering problem, a well-studied combinatorial optimization problem. The new models lead to alternative characterizations of the feasible solutions to the basic-block instruction scheduling problem. These facilitate the employment of advanced integer programming methodologies, in particular the design of branch-and-cut algorithms that can handle larger instances. The formulations are further extended by additional inequalities that can be used as cutting planes. Combined with the preprocessing routines that are partially extended and improved as well, the respective solver implementation eventually turned out to be competitive to the constraint programming method. Reaching this point has taken some years and this thesis presents not only the derived models but also several ideas and byproducts that arose in the meantime, and that can help and inspire researchers even if they aim at the application of different solution methodologies. The starting point regarding the offset assignment problem was a different one because especially exact solution approaches were rather rare prior to the models presented in this thesis. The offset assignment problem arose in the 1990s and is considered in several variants that are of theoretical and practical interest. In the simplest one, a processor is assumed to provide only a single address register and only very restricted possibilities to avoid address computation overhead. However, even this simplest variant, that may serve as a building block for the more complex ones, is already NP-hard and has been studied mainly from a heuristic point of view. The few existing exact solution approaches were not capable to solve moderately sized instances so that the quality of heuristic solutions relative to the optimum was hardly known at all. Again, the inspection of the combinatorial structure of the various problem variants turned out to be the key for designing branch-and-cut implementations that can profit from knowledge about related combinatorial optimization problems. The implementation targeting the simple problem variant was the first capable to optimally solve the majority of about 3,000 instances collected in a standard benchmark set. The method could then be further generalized in two steps. First, in a collaboration with Roberto Castañeda Lozano, additional techniques could be incorporated into the approach in order to handle multiple address registers. Fortunately, the methods could then even be further extended to as well deal with more flexible addressing capabilities. In this way, the thesis at hand does not only answer the question how large the address computation overhead can be when using heuristics, but as well presents first results that allow to analyze the impact of the mentioned increased addressing capabilities on the runtime performance and size of real-world programs
    corecore