ABSTRACT
Operating systems and development tools can impose overly general requirements that prevent an embedded system from achieving its hardware performance entitlement. It is time for embedded processor designers to become more involved with system software and tools.
INTRODUCTION
Embedded processors are typically evaluated on instruction execution speed, I/O bandwidth, and power consumption. The actual performance of an embedded system depends on many other factors, including how well the instruction set architecture (ISA) can be exploited by software.
Efficiency is particularly important in specialized real-time computers such as digital signal processors, where all available performance is needed for new applications such as AC-3 audio decoding or DSL data communications. Efficiency directly impacts total system cost when, for example, vendors compete on how many V.90 modems can be supported by a single DSP. This level of efficiency is less important on general-purpose processors such as PCs, where programmers are less exposed to the hardware.
There are several obstacles to realizing the available performance. First, as system software is increasingly supplied by independent vendors, hardware designers and software designers rarely work together, and their technologies develop independently rather than synergistically. Second, there are many embedded processors and few common software design frameworks, so system software vendors must support many designs and processor families with general solutions. Finally, product designers worry more about large software building blocks; there is little mind-share and few royalty dollars left for system software.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 99, New Orleans, Louisiana ©1999 ACM 1-58113-092-9/99/0006..$5.00
This paper looks at some ways in which hardware designers can improve system performance using system software and tools. This is not an exhaustive list, but may stimulate ideas.
INSTRUCTION SET ISSUES
With software being written mostly in high-level programming languages, instruction set design must be coordinated with optimizing compiler technology. New ISA technologies may in fact be gated by compiler research. For example, the market acceptance of TI's VLIW architecture in its TMS320C6000 series of DSPs is due in part to the success of software pipelining optimizations in TI's compilers.
Compilers can also enable ISA features that would have been unacceptable when software was written entirely in assembly language. For example, computers with non-interlocked pipelines are extremely difficult to program in assembly language, because the programmer must keep track of the execution progress of many instructions at the same time. TI addressed this issue by providing a "linear assembly language" that presents the same instructions as the C6000, but hides pipeline parallelism from the programmer. The assembler optimizes linear code for the actual ISA.
Specialized instructions on DSPs also present compiler issues. There are several cases:
Instructions used only in a few algorithms should be isolated in specialized assembly code libraries. They don't affect software tools.
Optimized instructions, such as multiply-accumulate, should be generated by an enhanced compiler from common programming idioms such as array indexing.
ISA features such as circular addressing or fixed-point arithmetic may be best accommodated with language extensions. Alternatively, the compiler can provide direct access to the instructions through assembly inserts or intrinsic functions.
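As an illustration of the features named in these cases, the following is a minimal host-side model, not target code: a FIR filter built from multiply-accumulate steps, a circular delay line, and Q15 fixed-point arithmetic. The Q15 format, the class name CircularFir, and the helper names are assumptions chosen for the sketch. On a DSP, the modulo indexing would be handled by circular addressing hardware and each loop iteration by a single multiply-accumulate instruction.

```python
"""Model of a fixed-point FIR filter using multiply-accumulate and a circular delay line."""

Q15_ONE = 1 << 15          # scale factor for Q15 fractions
Q15_MAX = Q15_ONE - 1      # largest Q15 value, 0x7FFF
Q15_MIN = -Q15_ONE         # smallest Q15 value, -0x8000


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return max(Q15_MIN, min(Q15_MAX, int(round(x * Q15_ONE))))


def saturate_q15(acc: int) -> int:
    """Scale a Q30 accumulator back to Q15 and clamp to the 16-bit range."""
    return max(Q15_MIN, min(Q15_MAX, acc >> 15))


class CircularFir:
    """Hypothetical FIR filter model; coefficients and samples are Q15 integers."""

    def __init__(self, coeffs_q15):
        self.coeffs = list(coeffs_q15)
        self.delay = [0] * len(self.coeffs)   # circular delay line
        self.head = 0                          # index of the newest sample

    def step(self, sample_q15: int) -> int:
        n = len(self.coeffs)
        # Circular addressing: the write index wraps around with a modulo.
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample_q15
        acc = 0
        for k in range(n):
            # Multiply-accumulate: coefficient k times the sample from k steps ago.
            acc += self.coeffs[k] * self.delay[(self.head + k) % n]
        return saturate_q15(acc)
```

With coefficients 0.5 and 0.25, feeding a single sample of 0.5 followed by zeros yields 0.25 and then 0.125 in Q15 (8192 and 4096), which is the scaled impulse response.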
OPERATING SYSTEM ISSUES
With so many embedded processor families and so many ways of designing real-time software, a real-time operating system (RTOS) must mediate between two sets of abstractions. On the hardware side, it must match the architectures of embedded processors from several different manufacturers. On the software side, it must support real-time applications with wildly differing requirements and software architectures. The result is often a generalized RTOS that is a poor fit to either side.
Software Configuration
Most general-purpose operating systems are designed to support dynamic work loads, but embedded processors traditionally run a fixed application. Placing one-time RTOS initialization code on the processor uses memory and cycles without purpose. What is needed is a "resource editor": a tool that supports the static configuration of software objects when the application is built. These tools are commonly available in PC software development to support fixed graphical objects.
For DSPs, TI's DSP/BIOS(tm) software is designed to be statically configured as part of the application development process. A configuration editor is used to define and configure the tasks, device drivers, and communication channels needed. Pre-allocated data structures are linked with the target application, and a corresponding collection of debug and analysis information remains on the host to help in debugging or monitoring the application.
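The idea of static configuration can be sketched with a small host-side generator. This is only an illustration under stated assumptions: the task list, the field names, and the C type struct task_cfg are hypothetical and do not represent the DSP/BIOS configuration format. The sketch emits pre-allocated C tables to be linked with the target application and keeps a name map on the host for debugging.

```python
"""Host-side sketch: turn a static configuration into pre-allocated C tables."""

# Hypothetical application configuration; names and fields are illustrative only.
TASKS = [
    {"name": "audio_decode", "priority": 3, "stack_words": 256},
    {"name": "host_link", "priority": 1, "stack_words": 128},
]


def emit_c_tables(tasks) -> str:
    """Return C source that statically allocates a stack and a table entry per task."""
    lines = ["/* Generated at build time; no run-time creation code is needed. */"]
    for t in tasks:
        lines.append(f"static unsigned {t['name']}_stack[{t['stack_words']}];")
    # struct task_cfg is an assumed type that the target code would declare.
    lines.append("const struct task_cfg task_table[] = {")
    for t in tasks:
        name, prio, words = t["name"], t["priority"], t["stack_words"]
        lines.append(f'    {{ "{name}", {prio}, {name}_stack, {words} }},')
    lines.append("};")
    return "\n".join(lines)


def host_debug_info(tasks) -> dict:
    """Information that stays on the host for debugging, keyed by table index."""
    return {i: t["name"] for i, t in enumerate(tasks)}


if __name__ == "__main__":
    print(emit_c_tables(TASKS))
```

Because the tables are generated when the application is built, the target carries no code to create these objects at start-up, and the host can translate a table index back to a task name without storing strings for that purpose on the target.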
Addressing and Protection
Despite what was just said about fixed applications, designers are planning more ambitious embedded computer products. Customers will be able to customize their video centers and cellular phones by adding new software applications. The success of this approach depends in part on making it easy for independent software vendors to write such added-value applications, while protecting the products' basic functionality from bugs in those applications.
The traditional hardware solution is to provide protected virtual memory on the embedded processor, simplifying programming and isolating applications from each other. Microsoft WinCE(tm) and other general-purpose operating systems being adapted for embedded use depend on such hardware support.
The performance cost of additional logic for hardware-assisted virtual memory is unacceptable in digital signal processors. TI's approach has been to combine a DSP with a general-purpose processor on a single chip. The general processor provides a protected environment for software applications and the larger OS. The DSP handles the hard real-time tasks with higher performance and lower total power consumption than the general processor could achieve alone. Supporting such a system requires a careful hardware system architecture, plus investments in tooling and system software for I/O and communication. The hardware vendor is in the best position to provide this support.
Devices and Drivers
Peripheral devices are another area in which hardware opportunities and application requirements are negotiated by an independent RTOS vendor. Hardware designers often include advanced features in their peripheral devices to get higher throughput. These features will be useless if the operating system's device drivers will not take advantage of them, either because the driver model is too simple or because the device design is incompatible with higher-level operating system abstractions (e.g., multiprocessing).
Designing the drivers and devices together, including the application interfaces to the drivers, yields the best performance. Properly designed drivers can also demonstrate how a device ought to be used, something normally relegated to application notes.
REAL-TIME ANALYSIS AND DEBUG
Hardware designers can also help application developers get their products to market more quickly. Integration, debugging, and analysis of real-time systems can be very difficult, especially in the final product stages.
The simplest hardware tool needed to support performance analysis is an instruction or cycle counter. Unfortunately, many embedded processors supply only a general-purpose, periodic timer, which is often too narrow (e.g., 16 bits) and too difficult to share with other applications. (Stopping and starting the timer can degrade applications with clock skew.) Hardware designers could support measurements by providing a simple, nonstop, read-only cycle counter at least 32 bits wide. Using this device, programmers can collect good performance data with minimal overhead.
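The arithmetic behind this recommendation can be sketched as follows. The function names are hypothetical, and the model assumes a free-running counter that is read once before and once after the measured code. Subtracting the two reads modulo the counter width gives the correct elapsed count as long as the true interval is shorter than one full wrap of the counter, which is why width matters.

```python
"""Elapsed-cycle arithmetic for a free-running, read-only counter of a given width."""


def elapsed_cycles(start: int, end: int, width_bits: int = 32) -> int:
    """Cycles between two counter reads, valid if the interval is under one full wrap."""
    mask = (1 << width_bits) - 1
    # Modular subtraction absorbs a single wraparound between the two reads.
    return (end - start) & mask


def max_interval_seconds(width_bits: int, clock_hz: float) -> float:
    """Longest interval that can be measured before the counter wraps a second time."""
    return (1 << width_bits) / clock_hz
```

For example, at an assumed 100 MHz clock, a 16-bit counter wraps after roughly 0.66 milliseconds, while a 32-bit counter runs for about 43 seconds before wrapping.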
More extensive monitoring of real-time systems depends on the ability of a development host to send and receive data from the target system unobtrusively. Software techniques using low-priority tasks to transfer data over serial ports are not useful in high-performance DSP systems; they are too intrusive at the data rates needed to monitor real-time signals. TI's approach is to provide RTDX(tm), hardware-supported data transfers using the JTAG debug port.
Finally, hardware-assisted, stop-mode debugging of real-time systems is not useful when the target cannot fail to respond to interrupts controlling external actions. Real-time debug facilities are needed that can halt the DSP while still allowing designated interrupts to be processed. TI is integrating this capability into its DSPs and is supporting it with extensions to the debugging and analysis tools.
ACKNOWLEDGEMENTS

Panel: Functional Verification - Real Users, Real Problems, Real Opportunities
The panelists will begin by dissecting the bottlenecks in their verification processes. For example, are simulators too slow? Or do test vector generation and coverage analysis consume the most time? The panelists will present their ideas for new EDA products which might accelerate verification. Finally, the panelists will discuss what compromises they would accept in order to achieve this acceleration. Would they learn a new HDL? Restrict their design styles? Forsake legacy designs?
Position Statements
Nozar Azarakhsh, Cisco Systems
Design verification of multi-ASIC, large-scale, and complex systems is an extremely challenging proposition since the current tool set and traditional approaches do not scale. It is important that each ASIC in the system is verified in a test bench that is flexible and powerful, and capable of creating directed test cases, corner test cases, stress test cases, and random test cases. It is also equally important that the whole system which incorporates these ASICs has an equally powerful test bench that leverages the ASIC-in-isolation tests and their test benches. There are two other major objectives in system level verification projects: board-level verification and high-level architectural verification. With the right verification methodology, it is possible to accomplish all of these objectives, but powerful test bench and model development tools are required.
Glen Ewing, Hughes Space and Communications
Functional and performance verification of a digital communications subsystem requires simulation with a high level language such as C to obtain the capacity, speed and flexibility required. Verification of the ASIC designs that will implement the subsystem, however, requires VHDL simulation and possibly gate level accelerated simulation.
Bottlenecks occur in trying to link the VHDL and gate simulations to the C simulation. Engineers need to learn differing simulator environments, usually mastering only one. Analysis tools developed for one environment are seldom ported to the others for lack of time. Simulation models are not written in the most natural language (e.g. C for signal processing, VHDL for control) due to lack of interoperability. Custom test benches have to be laboriously coded to integrate the C simulation into each of the other simulators.
Adding openness to EDA tools to allow extension and interoperation would alleviate many of our problems and provide synergistic opportunities. For example, simulator control could be based on the use of the freely available Tcl scripting language, allowing all operations across multiple tools to be put under the control of one command line and automated through scripting. Simulators could provide Foreign Language Interface calls to allow them to be used as slaves to other simulators, allowing, for example, a top level simulation to use C for most of its components, with the remaining components modeled with VHDL, an accelerator, or an emulator. Being able to freely extract results from simulations as they run would be key to being able to build one set of analysis tools that would work with all simulator engines.
Finding EDA tools that meet all one's needs perfectly is impossible, but by making tools modular and easy to extend via open interfaces, one could integrate best-in-class tools to fit one's needs. The compromises to be made would be reduced to accepting the simulation speed hit from handshaking the various tools.
Paul Gingras, Sun Microsystems
One of the future challenges in functional verification is in developing new tools and techniques to improve coverage and feedback within a given simulation environment. Existing methodologies consist of writing "bus functional models" and/or "drivers" to stimulate and verify a well defined protocol, which is a very important aspect of verification. However, the challenge for any verification effort is knowing the extent to which the design space has been sufficiently covered. Random simulations are a means for expanding design coverage, but without an intelligent feedback mechanism, the likelihood that all design states and permutations are exposed is extremely low, and therefore, the likelihood of having hardware problems in the ASIC is very high.
Part of the problem with random testing is that the distribution of random events tends to follow a traditional bell-shaped curve; adding more and more random tests does little to guarantee that "new" states of the design are covered. In fact, random tests typically exercise the same design space over and over again, with only slight variations. Rather than blindly running random tests, it would be desirable to have tools and technologies that could intelligently steer random tests into new directions and avoid the overlap of coverage. By doing this, tests would no longer duplicate paths that have already been covered and the bell curve for random test coverage would widen dramatically. In addition, this intelligent random testing would greatly improve testing efficiency and, consequently, verification completeness could be measured more accurately.
Scott Reedstrom, Guidant
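The contrast between blind and steered random testing can be illustrated with a toy model. Everything here is an assumption made for the sketch: design states are reduced to numbered coverage bins, blind stimulus is made bell-shaped by averaging four uniform draws, and the steered generator is assumed to be able to constrain a test to hit a chosen uncovered bin. Real tools would have to derive such constraints from the design.

```python
"""Toy comparison of blind random testing and coverage-steered random testing."""
import random


def bell_stimulus(rng: random.Random, num_bins: int) -> int:
    """Average of four uniform draws: bins near the middle are hit most often."""
    return sum(rng.randrange(num_bins) for _ in range(4)) // 4


def blind_random(num_tests: int, num_bins: int, rng: random.Random) -> set:
    """Run random tests with no feedback; return the set of bins covered."""
    covered = set()
    for _ in range(num_tests):
        covered.add(bell_stimulus(rng, num_bins))
    return covered


def steered_random(num_tests: int, num_bins: int, rng: random.Random) -> set:
    """Use coverage feedback so that every test lands in a bin not yet covered."""
    covered = set()
    for _ in range(num_tests):
        uncovered = [b for b in range(num_bins) if b not in covered]
        if not uncovered:
            break  # full coverage reached; further tests would only repeat bins
        # Assumption: the generator can be constrained to hit a chosen bin.
        covered.add(rng.choice(uncovered))
    return covered


if __name__ == "__main__":
    rng = random.Random(0)
    print("blind bins covered:  ", len(blind_random(64, 64, rng)))
    print("steered bins covered:", len(steered_random(64, 64, rng)))
```

In this model the steered generator covers all 64 bins in exactly 64 tests by construction, while the blind generator typically revisits the middle bins and leaves the tails of the distribution unexercised.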
Full verification is not an option in the design of implantable medical devices; it is an ethical and legal necessity. Verification of these devices requires managing the full range of embedded system elements, including specialized analog amplifiers and filters; processors, specialized datapath hardware, and firmware; and external programming devices and software. All these elements interface to a biological system that may have inherent delays in responding to stimulus in the tens of seconds. Traditional verification methods could not support both the long simulation times required to model the biological portion of these systems and the accuracy to model the analog and digital components correctly. Until fairly recently, the only viable option was to build a fully functional prototype with which to start verification. This forced concentration of the verification effort into the back end of the product development cycle, causing the traditionally long design cycles for these products.
To attack this bottleneck, Guidant has utilized several different approaches to lower the amount of time required for the final verification steps. One of these is requirement tracking and partitioning. Requirement tracking and partitioning is a system engineering approach that tightly tracks system level requirements and assigns specific portions of the design to meet all or some of those requirements. The assignment allows specific IC vectors or tests to be traced directly back to a system requirement, thus eliminating that requirement from the final system verification. Another approach is broad use of system level emulation. Although it took significant effort, by modeling portions of the analog devices in hardware, utilizing digitized interfaces into our biological system simulators, and automating the entire system, we were able to provide a hardware/software co-verification environment early in the design process. This also provided an environment for system level regression tests that could be run in hours instead of weeks.
We see great hope for further reductions as two specific areas develop. At the top level, executable system requirements will allow early verification of designs with system behavior. At the bottom level, simulation systems that couple behavioral, cycle based and analog simulations may reach performance levels that could allow hardware/software co-verification without the need for any type of prototype hardware.
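The bookkeeping behind requirement tracking can be sketched in a few lines. The requirement identifiers, test names, and the rule used here are hypothetical: the sketch treats a requirement as closed as soon as one traced block-level test passes, which is a simplification of a process in which a design portion may meet only part of a requirement.

```python
"""Sketch of requirement tracking: which requirements remain for final system verification."""

# Hypothetical trace from block-level tests to the system requirements they verify.
TRACE = {
    "ic_vec_017": {"REQ-001"},
    "ic_vec_042": {"REQ-002", "REQ-003"},
}


def remaining_for_system_test(all_requirements, trace, passed_tests):
    """Return the requirements not yet closed by a passing block-level test."""
    closed = set()
    for test in passed_tests:
        # A test with no trace entry closes nothing.
        closed |= trace.get(test, set())
    return sorted(set(all_requirements) - closed)
```

For instance, with four requirements REQ-001 through REQ-004 and only ic_vec_042 passing, REQ-001 and REQ-004 would remain for the final system verification.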
Chris Rowen, Tensilica
In the regime of system-on-a-chip, verification becomes both harder and more critical. It is harder not only because of the sheer hardware complexity of 0.25 and 0.15 micron designs, but also because true system-on-a-chip includes rapidly growing software content. It is more critical because system development cycles are shorter, the urgency for application-specific optimization of power, performance and cost is higher, and the market penalty for system bugs is more severe.
Embedded processors will play the central role in system-on-a-chip verification. The construction and refinement of system-level and detailed chip logic models all tend to start from the processor and software models. Comprehensive support of fast cycle-accurate simulation, system test benches, diagnostic suites, embedded debug, accurate emulation, co-simulation, and authentic RTL simulation make rapid, reliable system verification possible. Moreover, easy extension of the processor instruction set, memories, peripherals and interfaces allows much more of the total system design to be achieved within this improved processor-centric verification environment.
Embedded processors also lead the way in establishing new block-level verification methods. Tensilica has successfully implemented a highly automated and thorough methodology for application-specific processors. This methodology gives high coverage and confidence of processor correctness across a broad range of environments. Most importantly, it ensures system-level correctness for each possible processor configuration, even as the processor is tuned and adapted to fit the needs of each system design. The complete methodology includes innovations in formal description of processor semantics; hardware/software co-simulation; monitors and checkers; automated generation of test benches, bus functional models, and diagnostics; and coupled generation of software tools and processor hardware. These methods point the way for much more automated system verification.
