1,822 research outputs found

    Power estimation on functional level for programmable processors

    Get PDF
    In diesem Beitrag werden verschiedene AnsĂ€tze zur VerlustleistungsschĂ€tzung von programmierbaren Prozessoren vorgestellt und bezĂŒglich ihrer Übertragbarkeit auf moderne Prozessor-Architekturen wie beispielsweise Very Long Instruction Word (VLIW)-Architekturen bewertet. Besonderes Augenmerk liegt hierbei auf dem Konzept der sogenannten Functional-Level Power Analysis (FLPA). Dieser Ansatz basiert auf der Einteilung der Prozessor-Architektur in funktionale Blöcke wie beispielsweise Processing-Unit, Clock-Netzwerk, interner Speicher und andere. Die Verlustleistungsaufnahme dieser Blšocke wird parameterabhĂ€ngig durch arithmetische Modellfunktionen beschrieben. Durch automatisierte Analyse von Assemblercodes des zu schĂ€tzenden Systems mittels eines Parsers können die Eingangsparameter wie beispielsweise der erzielte ParallelitĂ€tsgrad oder die Art des Speicherzugriffs gewonnen werden. Dieser Ansatz wird am Beispiel zweier moderner digitaler Signalprozessoren durch eine Vielzahl von Basis-Algorithmen der digitalen Signalverarbeitung evaluiert. Die ermittelten SchĂ€tzwerte fĂŒr die einzelnen Algorithmen werden dabei mit physikalisch gemessenen Werten verglichen. Es ergibt sich ein sehr kleiner maximaler SchĂ€tzfehler von 3%.</p><p style=&quot;line-height: 20px;&quot;> In this contribution different approaches for power estimation for programmable processors are presented and evaluated concerning their capability to be applied to modern digital signal processor architectures like e.g. Very Long InstructionWord (VLIW) -architectures. Special emphasis will be laid on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on the separation of the processor architecture into functional blocks like e.g. processing unit, clock network, internal memory and others. The power consumption of these blocks is described by parameter dependent arithmetic model functions. By application of a parser based automized analysis of assembler codes of the systems to be estimated the input parameters of the Correspondence to: H. Blume ([email protected]) arithmetic functions like e.g. the achieved degree of parallelism or the kind and number of memory accesses can be computed. This approach is exemplarily demonstrated and evaluated applying two modern digital signal processors and a variety of basic algorithms of digital signal processing. The resulting estimation values for the inspected algorithms are compared to physically measured values. A resulting maximum estimation error of 3% is achieved

    Implementation of arithmetic primitives using truly deep submicron technology (TDST)

    Get PDF
    The invention of the transistor in 1947 at Bell Laboratories revolutionised the electronics industry and created a powerful platform for emergence of new industries. The quest to increase the number of devices per chip over the last four decades has resulted in rapid transition from Small-Scale-Integration (SSI) and Large-Scale-lntegration (LSI), through to the Very-Large-Scale-Integration (VLSI) technologies, incorporating approximately 10 to 100 million devices per chip. The next phase in this evolution is the Ultra-Large-Scale-Integration (ULSI) aiming to realise new application domains currently not accessible to CMOS technology. Although technology is continuously evolving to produce smaller systems with minimised power dissipation, the IC industry is facing major challenges due to constraints on power density (W/cm2) and high dynamic (operating) and static (standby) power dissipation. Mobile multimedia communication and optical based technologies have rapidly become a significant area of research and development challenging a variety of technological fronts. The future emergence or 4G (4th Generation) wireless communications networks is further driving this development, requiring increasing levels of media rich content. The processing requirements for capture, conversion, compression, decompression, enhancement and display of higher quality multimedia, place heavy demands on current ULSI systems. This is also apparent for mobile applications and intelligent optical networks where silicon chip area and power dissipation become primary considerations. In addition to the requirements for very low power, compact size and real-time processing, the rapidly evolving nature of telecommunication networks means that flexible soft programmable systems capable of adaptation to support a number of different standards and/or roles become highly desirable. In order to fully realise the capabilities promised by the 4G and supporting intelligent networks, new enabling technologies arc needed to facilitate the next generation of personal communications devices. Most of the current solutions to meet these challenges are based on various implementations of conventional architectures. For decades, silicon has been the main platform of computing, however it is slow, bulky, runs too hot, and is too expensive. Thus, new approaches to architectures, driving multimedia and future telecommunications systems, are needed in order to extend the life cycle of silicon technology. The emergence of Truly Deep Submicron Technology (TDST) and related 3-D interconnection technologies have provided potential alternatives from conventional architectures to 3-D system solutions, through integration of IDST, Vertical Software Mapping and Intelligent Interconnect Technology (IIT). The concept of Soft-Chip Technology (SCT) entails integration of Soft‱ Processing Circuits with Soft-Configurable Circuits . This concept can effectively manipulate hardware primitives through vertical integration of control and data. Thus the notion of 3-D Soft-Chip emerges as a new design algorithm for content-rich multimedia, telecommunication and intelligent networking system applications. 3‱D architectures (design algorithms used suitable for 3-D soft-chip technology), are driven by three factors. The first is development of new device technology (TDST) that can support new architectures with complexities of 100M to 1000M devices. The second is development of advanced wafer bonding techniques such as Indium bump and the more futuristic optical interconnects for 3-D soft-chip mapping. The third is related to improving the performance of silicon CMOS systems as devices continue to scale down in dimensions. One of the fundamental building blocks of any computer system is the arithmetic component. Optimum performance of the system is determined by the efficiency of each individual component, as well as the network as a whole entity. Development of configurable arithmetic primitives is the fundamental focus in 3-D architecture design where functionality can be implemented through soft configurable hardware elements. Therefore the ability to improve the performance capability of a system is of crucial importance for a successful design. Important factors that predict the efficiency of such arithmetic components are: ‱ The propagation delay of the circuit, caused by the gate, diffusion and wire capacitances within !he circuit, minimised through transistor sizing. and ‱ Power dissipation, which is generally based on node transition activity. [2] Although optimum performance of 3-D soft-chip systems is primarily established by the choice of basic primitives such as adders and multipliers, the interconnecting network also has significant degree of influence on !he efficiency of the system. 3-D superposition of devices can decrease interconnect delays by up to 60% compared to a similar planar architecture. This research is based on development and implementation of configurable arithmetic primitives, suitable to the 3-D architecture, and has these foci: ‱ To develop a variety of arithmetic components such as adders and multipliers with particular emphasis on minimum area and compatible with 3-D soft-chip design paradigm. ‱ To explore implementation of configurable distributed primitives for arithmetic processing. This entails optimisation of basic primitives, and using them as part of array processing. In this research the detailed designs of configurable arithmetic primitives are implemented using TDST O.l3”m (130nm) technology, utilising CAD software such as Mentor Graphics and Cadence in Custom design mode, carrying through design, simulation and verification steps

    Design-time performance analysis of component-based real-time systems

    Get PDF
    In current real-time systems, performance metrics are one of the most challenging properties to specify, predict and measure. Performance properties depend on various factors, like environmental context, load profile, middleware, operating system, hardware platform and sharing of internal resources. Performance failures and not satisfying related requirements cause delays, cost overruns, and even abandonment of projects. In order to avoid these performancerelated project failures, the performance properties should be obtained and analyzed already at the early design phase of a project. In this thesis we employ principles of component-based software engineering (CBSE), which enable building software systems from individual components. The advantage of CBSE is that individual components can be modeled, reused and traded. The main objective of this thesis is to develop a method that enables to predict the performance properties of a system, based on the performance properties of the involved individual components. The prediction method serves rapid prototyping and performance analysis of the architecture or related alternatives, without performing the usual testing and implementation stages. The involved research questions are as follows. How should the behaviour and performance properties of individual components be specified in order to enable automated composition of these properties into an analyzable model of a complete system? How to synthesize the models of individual components into a model of a complete system in an automated way, such that the resulting system model can be analyzed against the performance properties? The thesis presents a new framework called DeepCompass, which realizes the concept of predictable assembly throughout all phases of the system design. The cornerstones of the framework are the composable models of individual software components and hardware blocks. The models are specified at the component development time and shipped in a component package. At the component composition phase, the models of the constituent components are synthesized into an executable system model. Since the thesis focuses on performance properties, we introduce performance-related types of component models, such as behaviour, performance and resource models. The dynamics of the system execution are captured in scenario models. The essential advantage of the introduced models is that, through the behaviour of individual components and scenario models, the behaviour of the complete system is synthesized in the executable system model. Further simulation-based analysis of the obtained executable system model provides application-specific and system-specific performance property values. To support the performance analysis, we have developed a CARAT software toolkit that provides and automates the algorithms for model synthesis and simulation. Besides this, the toolkit provides graphical tools for designing alternative architectures and visualization of obtained performance properties. We have conducted an empirical case study on the use of scenarios in the industry to analyze the system performance at the early design phase. It was found that industrial architects make extensive use of scenarios for performance evaluation. Based on the inputs of the architects, we have provided a set of guidelines for identification and use of performance-critical scenarios. At the end of this thesis, we have validated the DeepCompass framework by performing three case studies on performance prediction of real-time systems: an MPEG-4 video decoder, a Car Radio Navigation system and a JPEG application. For each case study, we have constructed models of the individual components, defined the SW/HW architecture, and used the CARAT toolkit to synthesize and simulate the executable system model. The simulation provided the predicted performance properties, which we later compared with the actual performance properties of the realized systems. With respect to resource usage properties and average task latencies, the variation of the prediction error showed to be within 30% of the actual performance. Concerning the pick loads on the processor nodes, the actual values were sometimes three times larger than the predicted values. As a conclusion, the framework has proven to be effective in rapid architecture prototyping and performance analysis of a complete system. This is valid, as in the case studies we have spent not more than 4-5 days on the average for the complete iteration cycle, including the design of several architecture alternatives. The framework can handle different architectural styles, which makes it widely applicable. A conceptual limitation of the framework is that it assumes that the models of individual components are already available at the design phase

    High-level Modelling and Exploration of Coarse-grained Re-configurable Architectures

    Full text link
    • 

    corecore