8 research outputs found

    Optimierende Compiler für DSPs: Was ist verfügbar?

    Get PDF
    Die Softwareentwicklung für eingebettete Prozessoren findet heute größtenteils noch auf Assemblerebene statt. Der Grund für diesen langfristig wohl unhaltbaren Zustand liegt in der mangelnden Verfügbarkeit von guten C-Compilern. In den letzten Jahren wurden allerdings wesentliche Fortschritte in der Codeoptimierung - speziell für DSPs - erzielt, welche bisher nur unzureichend in kommerzielle Produkte umgesetzt wurden. Dieser Beitrag zeigt die prinzipiellen Optimierungsquellen auf und faßt den Stand der Technik zusammen. Die zentralen Methoden hierbei sind komplexe Optimierungsverfahren, welche über die traditionelle Compilertechnologie hinausgehen, sowie die Ausnutzung der DSP-spezifischen Hardware-Architekturen zur effizienten Übersetzung von C-Sprachkonstrukten in DSP-Maschinenbefehle. Die genannten Verfahren lassen sich teilweise auch allgemein auf (durch Compiler generierte oder handgeschriebene) Assemblerprogramme anwenden

    Power aware data and memory management for dynamic applications

    Get PDF
    In recent years, the semiconductor industry has turned its focus towards heterogeneous multiprocessor platforms. They are an economically viable solution for coping with the growing setup and manufacturing cost of silicon systems. Furthermore, their inherent flexibility perfectly supports the emerging market of interactive, mobile data and content services. The platform’s performance and energy depend largely on how well the data-dominated services are mapped on the memory subsystem. A crucial aspect thereby is how efficient data is transferred between the different memory layers. Several compilation techniques have been developed to optimally use the available bandwidth. Unfortunately, they do not take the interaction between multiple threads into account and do not deal with the dynamic behaviour of these novel applications. The main limitations of current techniques are outlined and an approach for dealing with them is introduced

    Geração e vetorização de instruções de multiplicação e acumulação para processadores DSP SIMD

    Get PDF
    Orientador : Guido Costa Souza de AraujoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Processadores que são projetados para executar aplicações específicas - em oposição a processadores de propósito geral- representam uma porcentagem cada vez maior do total de processadores vendidos anualmente. Esses processadores são utilizados em aparelhos eletrônicos como telefones celulares e câmeras digitais, dispositivos médicos de monitoração, modems, sistemas militares de radar, componentes eletrônicos de automóveis, set-top boxes, etc. As aplicações que são executadas por esses processadores tipicamente demandam um alto desempenho, combinado com reduzido tamanho de código e dissipação de energia. Esta dissertação aborda um dos problemas presentes durante a geração de código para uma classe desses processadores, os processadores de sinais digitais (DSPs): como o compilador pode utilizar as instruções especializadas desses processadores a fim de aumentar a densidade e melhorar o desempenho do código gerado. É proposto um procedimento que permite a detecçãoj geração de instruções de multiplicação e acumulação (muito comuns nas aplicações desses processadores). É ainda apresentado um método que permite explorar a possibilidade de execução de código em paralelo por duas ou mais unidades funcionais quando essas são capazes de operar simultaneamente sobre diferentes dados. Os métodos aqui apresentados permitem uma exploração bastante agressiva das instruções de multiplicação e acumulação, e se utilizam de algoritmos de análise de fluxo de dados e técnicas de reestruturação de laços. Não é conhecido nenhum trabalho que aborde esse problema da maneira como é apresentada nesteAbstract: Application specific processors - as opposed to general purpose processors - account for an ever increasing percentage of the processors sold each year. These processors are widely used in electronic devices such as cellular phones and digital cameras, medical monitoring devices, modems, military radar systems, electronic components in vehicles and set-top boxes, to name a few. The applications that usually run on these processors demand high performance, reduced code size and low power consuption. This thesis addresses one of the issues that arise when generating code for a class of these processors, the digital signal processors (DSPs): how the compiler can take advantage of their specialized instructions in order to reduce the size and improve performance of the code generated. A method is proposed that allows for the detectionj generation of multiply and accumulate instructions (typically present in these processors' applications). AIso presented in this work is a method that makes it possible to explore the possibility of running code in parallel on two or more functional units when these are capable of operating simultaneously on different data. The methods herein presented allow for an aggressive harnessing of multiply and accumulate instructions; to accomplish this goal they rely on data flow analysis algorithms and on loop restructuring techniques. No other work is known of that addresses this problem the way it is dealt with in this thesisMestradoMestre em Ciência da Computaçã

    Memory optimization techniques for embedded systems

    Get PDF
    Embedded systems have become ubiquitous and as a result optimization of the design and performance of programs that run on these systems have continued to remain as significant challenges to the computer systems research community. This dissertation addresses several key problems in the optimization of programs for embedded systems which include digital signal processors as the core processor. Chapter 2 develops an efficient and effective algorithm to construct a worm partition graph by finding a longest worm at the moment and maintaining the legality of scheduling. Proper assignment of offsets to variables in embedded DSPs plays a key role in determining the execution time and amount of program memory needed. Chapter 3 proposes a new approach of introducing a weight adjustment function and showed that its experimental results are slightly better and at least as well as the results of the previous works. Our solutions address several problems such as handling fragmented paths resulting from graph-based solutions, dealing with modify registers, and the effective utilization of multiple address registers. In addition to offset assignment, address register allocation is important for embedded DSPs. Chapter 4 develops a lower bound and an algorithm that can eliminate the explicit use of address register instructions in loops with array references. Scheduling of computations and the associated memory requirement are closely inter-related for loop computations. In Chapter 5, we develop a general framework for studying the trade-off between scheduling and storage requirements in nested loops that access multi-dimensional arrays. Tiling has long been used to improve the memory performance of loops. Only a sufficient condition for the legality of tiling was known previously. While it was conjectured that the sufficient condition would also become necessary for large enough tiles, there had been no precise characterization of what is large enough. Chapter 6 develops a new framework for characterizing tiling by viewing tiles as points on a lattice. This also leads to the development of conditions under the legality condition for tiling is both necessary and sufficient

    Instruction and data cache modeling for timing analysis in real-time systems

    Get PDF
    Master'sMASTER OF ENGINEERIN

    Memory Bank and Register Allocation in Software Synthesis for ASIPs

    No full text
    An architectural feature commonly found in digital signal processors(DSPs) is multiple data-memory banks. This feature increases memory bandwidth by permitting multiple memory accesses to occur in parallel when the referenced variables belong to different memory banks and the registers involved are allocated according to a strict set of conditions. Unfortunately, current compiler technology is unable to take advantage of the potential increase in parallelism offered by such architectures. Consequently, most application software for DSP systems is hand-written -- a very time-consuming task. We present an algorithm which attempts to maximize the benefit of this architectural feature. While previous approaches have decoupled the phases of register allocation and memory bank assignment, our algorithm performs these two phasessimultaneously. Experimental results demonstrate that our algorithm substantially improves the code quality of many compiler-generated and even hand-written programs...

    Compiler-managed memory system for software-exposed architectures

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (p. 155-161).Microprocessors must exploit both instruction-level parallelism (ILP) and memory parallelism for high performance. Sophisticated techniques for ILP have boosted the ability of modern-day microprocessors to exploit ILP when available. Unfortunately, improvements in memory parallelism in microprocessors have lagged behind. This thesis explains why memory parallelism is hard to exploit in microprocessors and advocate bank-exposed architectures as an effective way to exploit more memory parallelism. Bank exposed architectures are a kind of software-exposed architecture: one in which the low level details of the hardware are visible to the software. In a bank-exposed architecture, the memory banks are visible to the software, enabling the compiler to exploit a high degree of memory parallelism in addition to ILP. Bank-exposed architectures can be employed by general-purpose processors, and by embedded chips, such as those used for digital-signal processing. This thesis presents Maps, an enabling compiler technology for bank-exposed architectures. Maps solves the problem of bank-disambiguation, i.e., how to distribute data in sequential programs among several banks to best exploit memory parallelism, while retaining the ability to disambiguate each data reference to a particular bank. Two methods for bank disambiguation are presented: equivalence-class unification and modulo unrolling. Taking a sequential program as input, a bank-disambiguation method produces two outputs: first, a distribution of each program object among the memory banks; and second, a bank number for every reference that can be proven to access a single, known bank for that data distribution. Finally, the thesis shows why non-disambiguated accesses are sometimes desirable. Dependences between disambiguated and non-disambiguated accesses are enforced through explicit synchronization and software serial ordering. The MIT Raw machine is an example of a software-exposed architecture. Raw exposes its ILP, memory and communication mechanisms. The Maps system has been implemented in the Raw compiler. Results on Raw using sequential codes demonstrate that using bank disambiguation in addition to ILP improves performance by a factor of 3 to 5 over using ILP alone.by Rajeev Barua.Ph.D
    corecore