The continued growth of microprocessor performance and the need for better CPU utilization have led to the introduction of the 'software peripherals' approach. By this term we refer to software modules that can successfully emulate peripherals that, until now, were traditionally implemented in hardware. Software implementations offer great flexibility in product design and in functional upgrades, and they contribute substantially to optimizing the cost/performance ratio. We focus on embedded applications, where cost and short time to market are the leading issues. In this paper, we study the hardware and software requirements for developing a generic microprocessor with support for software peripherals. Additionally, we present three software peripherals, a Universal Asynchronous Receiver Transmitter (UART), a keypad controller and a dot matrix LCD controller, and we analyze their impact on CPU occupation. Finally, we explore the impact of using a software UART on system power dissipation.
INTRODUCTION
Embedded microprocessors are used in a wide range of applications, from automotive control systems to palmtops and communication devices. These different markets have a common point: the need for low cost microprocessors with a high level of integration and performance. The growth of the embedded applications market has brought an increasing migration from application specific logic to application specific code running on embedded processors [9]. The main reasons for this transition from hardware to software are the lower cost, flexibility and reduced time to market that software solutions can provide.
The current embedded processor market includes a number of different core CPU architectures, such as Motorola's 68000, Intel's i960, Sun's SPARC and Hitachi's SuperH. There are two strategies for integrating peripherals on such microprocessors. According to the first strategy, a basic core and additional logic for a custom device are integrated onto the same die. The second approach uses a standard microprocessor together with a companion chip that serves the application's specific needs [1]. These strategies, and especially the first, lead to chips that are produced in relatively small volumes, because they serve only a small range of embedded applications (usually one).
Small production volumes translate into higher final cost.
Additionally, the developers' choice becomes quite difficult when they have to select, from the available variety, a microprocessor that covers exactly their needs. This means more time spent searching and learning, which entails a higher cost for the final product and a longer time to market.
A solution to the problem stated above is to produce more generic microprocessor chips that are software configurable and implement several peripherals, allowing the resultant generic microprocessor to be tailored to many application areas. In this paper, we study a possible structure for such a microprocessor, which will provide the appropriate flexibility and will be able to constitute a common platform for application designers.
The remainder of the paper is organized as follows: in the next section we present the current state of the art in the domains of hardware to software migration and reconfigurable architectures.
In section 3 we give a brief description of the microprocessor's schema and in section 4 we present the expected benefits of this approach. The utilization of the CPU is the key issue in such a design, thus in section 5 we present the performance analysis concerning a system with three software peripherals. In section 6 we study the effect that a software peripheral may have on power consumption, presenting a comparison between a software and a hardware implemented UART. Conclusions are presented in section 7.
RELATED WORK
Advanced RISC Machines (ARM), a large embedded processor manufacturer, claims that many modern 32-bit RISC processors can be used to implement many functions in software, including signal processing [3]. Recently, software modems have appeared in the market, trying to replace modems traditionally implemented in hardware. Similarly, Motorola Semiconductors has developed the SM56 PCI software modem [14] and aims to establish a 'software communication' market. By this term, they mean that in the future everything except the physical interface will be implemented in software, including control, error correction, compression/decompression and modulation. These efforts are focused on high performance desktop processors. In our approach we deal exclusively with embedded processors, taking into account their inherent limitations.
Ubicom Inc. (formerly Scenix) [15] has introduced the concept of the 'Virtual Peripheral™', a method of using a portion of the processor's power to perform peripheral functions in software. Their 8-bit RISC-based microprocessor is the platform for running the virtual peripheral modules. By combining advanced architectural features, the device is able, in spite of its small data bus, to implement hard real-time functions as software modules that replace traditional hardware functions.
DSP processors [4] with the appropriate software routines can replace hardware modules of a design (e.g. modems). This category of processors has a special architecture, which helps them execute signal processing software quite fast. DSPs can handle, in real time, tasks that demand high processing power. However, such devices need to be programmed individually before they are used in the field. This limits the scope of such microprocessors to small volumes and thus higher cost.
In our approach, we propose a careful combination of both hardware and software methods to develop peripherals. Our primary concern remains the hardware to software migration.
Nevertheless, there are functions that cannot be implemented efficiently in software. Additionally, a purely software approach can prove inadequate when we have to deal with demanding peripherals. In this work we present a case where a minor addition in hardware can have a beneficial effect on system performance.
SYSTEM ARCHITECTURE
In this section we present the hardware and software requirements for developing a generic microprocessor with support for software peripherals. High performance and fast interrupt response are two important requirements, as software peripherals are individual tasks that always need to be executed on time, and the microprocessor must be capable of satisfying this demand. Another important issue is the definition of the minimal set of hardware that is essential for efficient system operation. The system must also provide a fast and simple way of upgrading peripherals through a well-specified programmable interface, and must be capable of achieving optimal synchronization of all tasks running concurrently, with a robust scheduling algorithm. We suggest that in an embedded system, software peripherals should not occupy more than 20% of the CPU time. By setting this limit, we ensure that software peripherals never introduce great overhead to the system, leaving at least 80% of CPU time for the main application.
In our analysis, which is presented in section 5, we will show that this threshold is adequate to emulate our peripherals.
CPU Architecture Requirements
The functional blocks required by a microprocessor to make it an ideal platform for 'software peripherals' have been defined as hardware functions in the CPU. Figure 1 shows the resultant CPU architecture. As mentioned above, the embedded microprocessor must include:
High Performance Core: High performance is a critical issue for the described microprocessor. Common techniques to achieve high performance are high clock frequencies, a RISC core and pipelining. A high performance CPU can meet our goal of being able to emulate several peripherals without disrupting the main application.
Fast Interrupt Response: Implementing peripherals in software increases the number of software contexts in an embedded system. We use banked registers to reduce context switch time and achieve fast interrupt response.
As previously mentioned, there are functions that cannot be implemented efficiently in software. In this work, we use basic hardware functions like timers and interrupt handlers to implement software peripherals. In other embedded areas like multimedia applications, additional functions in hardware, for example digital to analog converters, are necessary to keep the processor's performance at a high level. Furthermore, additions to the CPU core, like Multiply and Accumulate units, should also be included in the requirements to support performance demanding DSP applications.
Reconfigurable Pins: Reconfigurable pins correspond to a common Programmable Peripheral Interface. According to the peripheral set that is loaded to the system, these pins obtain the appropriate functionality.
Sufficient Amount of Memory: The microprocessor should include sufficient internal memory (RAM & EPROM) to satisfy the increased system demands due to software peripherals. Through a programmable interface the appropriate peripherals and application code will be loaded or updated. Nevertheless, an external memory interface is necessary for more demanding applications.
A chip designed in this way allows efficient implementation of software based peripherals and permits its integration in any embedded system. External peripherals can still be employed if necessary.
Implications on System Software Design
Software peripherals and the main application program must execute concurrently. We consider software peripherals as tasks that are waiting to be serviced. A complex software peripheral may consist of several simpler ones. We can therefore build a software peripheral based on a hierarchy of simpler functions. These peripherals can be combined at a second level to construct a new peripheral, and so on.
Software peripherals introduce extra tasks to the system software design. Scheduling these tasks on the processor so that all the critical constraints are met is a difficult problem. A great deal of work has been done on scheduling of embedded systems [6], including those with mixed workloads [7]. We can classify the scheduling policies for real time systems into two categories: static or pre-runtime, where the scheduling algorithm runs offline and the tasks are well known in advance, and dynamic or runtime, where the scheduling is decided online. Each policy suits specific cases. In our case, the workload introduced by software peripherals is highly dependent on the target application, and so is the choice of scheduling technique. Static scheduling can offer very good optimization when the times at which events occur are well known in advance. The round-robin method is probably the simplest solution to our problem. Going a step further, we can use more sophisticated algorithms such as the interval scheduling described in [8]. In the scenario described in section 5, software peripherals are implemented as timer routines with well known occurrences. Thus, static scheduling is applied. On the other hand, when we cannot predict the arrival and execution times of tasks, dynamic scheduling gives us great flexibility by providing online scheduling, at the price of increased system complexity.
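As an illustration of static (pre-runtime) scheduling for timer routines with known occurrences, the following minimal Python sketch builds an offline dispatch table over one hyperperiod. The task names and their periods in timer ticks are hypothetical values chosen for the example, not figures from our implementation, and the table-driven scheme is only one possible way to realize a static schedule.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def build_static_schedule(periods):
    """Build an offline dispatch table for periodic timer routines.

    periods: dict mapping a task name to its period in timer ticks
             (hypothetical values, supplied by the caller).
    Returns a list with one entry per tick of the hyperperiod; each
    entry lists the tasks to run at that tick, in dictionary order.
    """
    hyperperiod = reduce(lcm, periods.values())
    return [
        [name for name, period in periods.items() if tick % period == 0]
        for tick in range(hyperperiod)
    ]


# Hypothetical periods, in ticks of a common hardware timer.
periods = {"lcd": 1, "uart": 2, "keypad": 4}
table = build_static_schedule(periods)

for tick, tasks in enumerate(table):
    print(tick, tasks)
```

At run time the timer interrupt would only index this table with the current tick modulo the hyperperiod, so no scheduling decision is taken online.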
System Approach Rationale
A system designed and implemented as above offers the following advantages to the system designer:
+Fast Upgrade: Software peripherals introduce a new, fast and simple method of adding peripherals to a system or upgrading the existing ones through a programmable interface.
+Multiple Configurations: The microprocessor in the described schema has a set of reconfigurable pins. According to the application, the peripheral set is loaded to the processor and the reconfigurable pins obtain the appropriate properties. In this way, multiple configurations of the same chip are possible.
+Common Development Environment: Application developers will have one processor for all the different applications that they design. This means less time for learning, large savings on the expense of buying different evaluation boards, and shorter time to market for the final product.
+Gain in Space: The microprocessor designers can use the saved silicon area to enrich the features of the main CPU core and increase its performance, while at the same time unused functions are eliminated.
+CPU Utilization: CPU power is fully exploited, since it is now also used for the execution of peripheral functions and does not remain inactive for long periods of time.
+Chip Count Reduction: The processor will be able to substitute external chips, simplifying the PCB design and reducing the critical time to market for the final product.
+Low Power Consumption: The overall power consumption of the application depends on the main processor utilization and the minimal set of hardware functions, and not on external chips and circuits. In section 6, we present a case where the software solution is competitive with the hardware solution.
Despite the referred advantages, there are also open issues that need to be resolved:
+Performance: Software, of course, cannot replace hardware without trade-offs in performance. Emulated peripherals are expected to have lower performance than their hardware counterparts. Nevertheless, the effect of slower peripherals is expected to be minimal as processors become faster.
+Synchronization: In a complex application with several peripherals running in parallel, the synchronization of all tasks is a critical issue. The scheduler's operation and optimization should be carefully studied.
PERFORMANCE ANALYSIS
In this section we study the performance of software peripherals implemented in our lab. We chose to implement a combination of three software peripherals that are used in a wide range of embedded applications.
For the CPU occupation, we use the relation:
From the implemented UART program, the instruction count is equal to 16. Assuming that the CPI is equal to 1, that the clock frequency is in the range from 30 to 100 MHz and that the interrupt response takes 8 clock cycles, we can conclude that the CPU time for one bit lies in the range from 800 nsecs to 240 nsecs, where CPU time is calculated from (5.1).
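The figures above can be checked with a short Python sketch. It assumes that CPU time is the instruction count times the CPI, plus the interrupt response cycles, divided by the clock frequency; this form is an assumption that reproduces the 800 nsecs and 240 nsecs values quoted in the text. The occupation formula (CPU time multiplied by the number of interrupts per second), the function names and the 9600 bit/s rate are likewise illustrative assumptions.

```python
def cpu_time_s(instruction_count, cpi, interrupt_cycles, clock_hz):
    """CPU time of one service routine in seconds (assumed form of the relation)."""
    return (instruction_count * cpi + interrupt_cycles) / clock_hz


def cpu_occupation(cpu_time, events_per_second):
    """Fraction of CPU time used, assuming one routine execution per event."""
    return cpu_time * events_per_second


def within_budget(occupations, budget=0.20):
    """Check the suggested limit of 20% of CPU time for all software peripherals."""
    return sum(occupations) <= budget


# UART routine: 16 instructions, CPI = 1, 8 cycles of interrupt response.
t_30 = cpu_time_s(16, 1, 8, 30e6)    # about 800 ns at 30 MHz
t_100 = cpu_time_s(16, 1, 8, 100e6)  # about 240 ns at 100 MHz

# Hypothetical bit rate, chosen only to illustrate the occupation formula.
uart_load = cpu_occupation(t_30, 9600)

print(f"{t_30 * 1e9:.0f} ns, {t_100 * 1e9:.0f} ns, load {uart_load:.4%}")
print("within 20% budget:", within_budget([uart_load]))
```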
The calculated CPU occupation of this peripheral is extremely low: from 0.013% for a 30 MHz CPU to 0.004% for a 100 MHz CPU. The main reason for these low values is that the interrupt service routine is executed rarely. Adding more features to the keypad controller, and thus increasing its code complexity and length, will have little impact on CPU occupation, which remains far below 1%. The disadvantage of the described solution is the large number of pins that must be used (4+8=12 pins). This problem can be moderated if we use encoder/decoder circuits at the columns/rows of the keypad respectively, reducing the number of used pins to 2+3=5.
The software LCD controller can be implemented efficiently enough as an internal timer routine. To calculate the exact frequency at which this timer interrupt should occur, we take into consideration the LCD refresh rate and the total size of the display in dots:
$$\text{Timer frequency} = \text{Refresh rate} \times X \times Y \qquad (5.4)$$

where X and Y are the numbers of dots in the horizontal and in the vertical dimension respectively. We set the refresh rate of the LCD to 60 Hz. Thus, for a 2x16 character display, where each character has 8x5 dots, the calculated timer frequency is:

$$\text{Timer frequency} = 60 \times (2 \times 8) \times (16 \times 5) = 76800\ \text{Hz}$$
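The calculation in (5.4) can be reproduced with a few lines of Python. The function and parameter names are illustrative only; the numbers are those of the 2x16 character display with 8x5 dot characters described above.

```python
def lcd_timer_frequency(refresh_hz, text_lines, chars_per_line,
                        char_height, char_width):
    """Timer interrupt frequency needed to shift every dot once per refresh."""
    dots_vertical = text_lines * char_height        # 2 * 8 = 16
    dots_horizontal = chars_per_line * char_width   # 16 * 5 = 80
    return refresh_hz * dots_vertical * dots_horizontal


freq_hz = lcd_timer_frequency(60, 2, 16, 8, 5)
period_us = 1e6 / freq_hz  # time between two timer interrupts

print(freq_hz, "Hz, one interrupt every", round(period_us, 2), "us")
```

The resulting interrupt period of roughly 13 microseconds is the time budget within which each invocation of the shifting routine has to complete.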
The software LCD controller should shift a bit to the output every 1/(timer frequency) seconds. This bit will be shifted into an external LCD driver like the MSM5260 [19] from OKI Semiconductor, which will be responsible for the interfacing between the software controller and the target display. The software controller should execute the following operations in order to successfully emulate the functionality of a hardware implementation: a) read a character from display RAM, b) find the correct character pattern in the character generator ROM, c) load the appropriate 5-bit value that corresponds to the currently displayed horizontal dot line, d) shift one bit out and e) occasionally, proceed to the next character, load a new horizontal dot line, or go to the next character line.
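Operations a) to e) can be sketched as a Python generator that yields the bits of one frame in shift-out order. This is a simplified model under stated assumptions, not our timer routine: the scan order (character line, then dot line, then character), the most-significant-bit-first shifting, the one-glyph font table and all names are hypothetical, and the real order depends on the external LCD driver.

```python
def scan_frame(display_ram, cg_rom, char_rows=8, char_cols=5):
    """Yield the bits of one frame, one bit per timer interrupt.

    display_ram: list of text lines, each a list of character codes
    cg_rom: dict mapping a character code to char_rows integers,
            each holding one char_cols-bit horizontal dot line
    """
    for line in display_ram:                      # e) next character line
        for dot_row in range(char_rows):          # e) next horizontal dot line
            for code in line:                     # a) read character from display RAM
                pattern = cg_rom[code]            # b) find its pattern in the CG ROM
                row_bits = pattern[dot_row]       # c) load the 5-bit dot line value
                for bit in range(char_cols - 1, -1, -1):
                    yield (row_bits >> bit) & 1   # d) shift one bit out, MSB first


# Hypothetical character generator ROM with a single 8x5 glyph for 'A'.
FONT = {
    ord("A"): [0b01110, 0b10001, 0b10001, 0b11111,
               0b10001, 0b10001, 0b10001, 0b00000],
}

# 2x16 character display filled with 'A'.
display = [[ord("A")] * 16 for _ in range(2)]
frame_bits = list(scan_frame(display, FONT))

print(len(frame_bits), "bits per frame")
```

One frame contains 2*8*16*5 = 1280 bits, so shifting one bit per interrupt at 60 frames per second corresponds to the 76800 Hz timer frequency derived from (5.4).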
Shifting is the only operation that is always executed at the timer frequency. All the other operations occur less often than the shifting operation in the same time interval. For example, the read-from-RAM and the corresponding 5-bit load from ROM occur every 5/(timer frequency) seconds, and the dot-line change occurs every 5×Y/(timer frequency) seconds, where Y is the number of horizontal dots. Although in our case the total number of instructions is about 35, the average number of instructions executed per interrupt is less than 10 (9.3 in our implementation).
Consumed CPU occupation due to this software peripheral is:
where CPU time is calculated from (5.1), assuming that the CPI is equal to 1, as in the case of the UART.
As we can see in figure 3, the software implementation consumes less power than the hardware implementation up to the point where the baud rate reaches close to 70 Kbit/sec. To obtain these results we used a simple power estimation model. We also made assumptions about the way the microprocessor operates. For example, we assumed that there is no time or consumption penalty during a transition from sleep mode to operational mode and vice versa. More detailed power estimation models are described in [9], [10].
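To make the shape of such a comparison concrete, the following Python sketch uses a duty-cycle power model. It is not the estimation model used to obtain figure 3: the model itself, the function names and every numeric value below are hypothetical placeholders. The only element taken from the text is the assumption of no time or consumption penalty for transitions between sleep and operational mode.

```python
def software_uart_power(baud, t_bit_s, p_active_w, p_sleep_w):
    """Average power when the CPU wakes once per bit and sleeps otherwise."""
    duty = min(1.0, baud * t_bit_s)  # fraction of time spent in the routine
    return p_sleep_w + (p_active_w - p_sleep_w) * duty


def hardware_uart_power(p_sleep_w, p_uart_w):
    """Average power with a sleeping CPU plus an external UART chip."""
    return p_sleep_w + p_uart_w


def crossover_baud(t_bit_s, p_active_w, p_sleep_w, p_uart_w):
    """Baud rate at which both solutions consume the same average power."""
    return p_uart_w / ((p_active_w - p_sleep_w) * t_bit_s)


# Hypothetical placeholder values, chosen only for illustration.
T_BIT = 240e-9    # CPU time per bit in seconds
P_ACTIVE = 0.200  # CPU power in operational mode, watts
P_SLEEP = 0.002   # CPU power in sleep mode, watts
P_UART = 0.010    # external UART chip power, watts

limit = crossover_baud(T_BIT, P_ACTIVE, P_SLEEP, P_UART)
p_sw = software_uart_power(limit / 2, T_BIT, P_ACTIVE, P_SLEEP)
p_hw = hardware_uart_power(P_SLEEP, P_UART)

print("software cheaper below about", round(limit), "bit/s")
```

In this model the software solution draws less power at low baud rates because the CPU spends most of its time asleep, and the advantage disappears once the per-bit processing cost outweighs the constant consumption of the external chip.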
CONCLUSIONS AND FUTURE WORK
We have presented a systematic approach to peripherals for embedded systems that are implemented in software. We tried to exploit the extra performance that modern processors offer by replacing traditional hardware peripherals with equivalent software ones. The basic idea that led us in this direction of 'software migration' was to produce flexible embedded systems without any 'glue logic'. We constructed three popular peripherals in software: a UART, a keypad controller and a dot matrix LCD controller. We investigated their efficiency and the load that they introduce to the main processor. In the case of the UART we also studied its behavior in terms of power consumption, comparing it with that of an external hardware UART. We conclude that we can have an equivalent system using software peripherals, at an acceptable performance. In particular:
+Software peripherals can provide a feasible alternative, offering great flexibility and simplifying the microprocessor design as well as the design of the final embedded system.
+They can dramatically reduce the final cost of an embedded application and keep the overall performance at a satisfactory level, giving an excellent cost/performance ratio.
+Software peripherals can follow the rapid advances in microprocessors. As microprocessors get faster, the performance of software peripherals will also increase.
All three peripherals that we studied had little impact on CPU performance, and this impact decreases as the clock frequency of the processor increases. We should also point out that when a software peripheral exceeds the desired threshold of CPU occupation, small hardware additions, like the addition of a shift register in the LCD controller case, might have a catalytic impact on system performance.
Future work will focus on the thorough definition of a minimal set of hardware peripherals that are used by a wide range of embedded applications and cannot be implemented in software. Additionally, more complicated software peripherals will be implemented and studied. Finally, we will turn to the domain of embedded scheduling and study the feasibility of systems with a substantial number of software peripherals and mixed workloads.
ACKNOWLEDGEMENTS
This work was supported by the Caratheodory Programme of the University of Patras.
