Abstract. For a programmable I/O device controller, the allocation of device parameters on I/O registers affects the code size and execution time of its associated I/O device driver. In the traditional design flow, the development of device drivers cannot begin until the allocation is fixed. This paper presents a new design methodology that allows a designer to seek an allocation that reduces the software or hardware cost concurrently with developing device drivers. The software cost is the code size or execution time, and the hardware cost is the number of I/O registers. The exact allocation with the minimum cost under constraints is formulated as a zero-one integer linear programming problem. Heuristic algorithms based on iterative refinement are also proposed. The proposed design methodology was implemented in C. Compared with current industrial designs, the approach can obtain design alternatives that reduce both software and hardware costs. Furthermore, the experimental results explore the design spaces for various application features. The results indicate that the HW/SW codesign approach is favorable in the development of embedded systems.
Introduction
A programmable I/O controller manages a peripheral physical I/O device. A device driver is a software layer that lies between an operating system and the I/O controller. It configures the operation modes of the device, observes its status, and transfers data by accessing registers in the I/O controller. Figure 1 shows such a hardware and software interface. Usually, a register contains several fields (each occupying several consecutive bits), each of which represents an operation mode, a status, or a datum. Each field is referred to as a device parameter [3] (or device variable, as defined in [8]). For example, the length of the stop bits is a device parameter of a UART controller. To configure a data frame for a UART transfer, one should set, besides the stop bits, the parity mode, word length, and baud rate. Parameters associated with a common purpose form a parameter group. To minimize the number of registers and the number of register accesses, hardware designers often allocate the device parameters of a group to the same register. However, this may increase the number of I/O accesses and bit operations (such as shift and logic instructions) when a driver manipulates individual parameters. Consider the examples in Fig. 2, which shows various allocations for two parameters A and B. Fig. 3 shows the C code of several access functions in the device driver (or HAL), including set_B( ), get_A( ), set_A_B( ) (modifying A and B simultaneously), and get_A_B( ), for allocation (2). Fig. 4 shows the C code of the same access functions for allocation (1). As these codes show, the allocation of device parameters on I/O registers affects the code size and execution time of the associated I/O device driver. As reported in [8], such low-level code has been found to represent up to 30% of a device driver. Its size and performance inevitably become important issues for I/O-intensive embedded systems.
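To make the effect of allocation on access functions concrete, the following C sketch contrasts a shared register with one register per parameter. It is only an illustration under stated assumptions, not the code of Fig. 3 or Fig. 4: the bit layout (A in bits 0-3, B in bits 4-7), the simulated registers, the `io_read`/`io_write` helpers, and the `io_accesses` counter are all hypothetical.

```c
#include <stdint.h>

/* Simulated 8-bit I/O registers (assumption: a real driver would use
   memory-mapped or port I/O instead of plain variables). */
static uint8_t reg_shared;    /* hypothetical layout: A in bits 0-3, B in bits 4-7 */
static uint8_t reg_a, reg_b;  /* alternative allocation: one parameter per register */
static unsigned io_accesses;  /* counts simulated I/O reads and writes */

static uint8_t io_read(const uint8_t *reg) { io_accesses++; return *reg; }
static void io_write(uint8_t *reg, uint8_t v) { io_accesses++; *reg = v; }

/* Shared register: writing B alone needs a read, bit operations, and a write. */
void shared_set_B(uint8_t b) {
    uint8_t v = io_read(&reg_shared);
    v = (uint8_t)((v & 0x0Fu) | ((b & 0x0Fu) << 4));
    io_write(&reg_shared, v);
}

/* Shared register: reading A needs one read and a mask. */
uint8_t shared_get_A(void) { return (uint8_t)(io_read(&reg_shared) & 0x0Fu); }

/* Shared register: writing A and B together needs a single write. */
void shared_set_A_B(uint8_t a, uint8_t b) {
    io_write(&reg_shared, (uint8_t)((a & 0x0Fu) | ((b & 0x0Fu) << 4)));
}

/* Single registers: writing B alone needs one write and no read. */
void single_set_B(uint8_t b) { io_write(&reg_b, (uint8_t)(b & 0x0Fu)); }

/* Single registers: reading A needs one read. */
uint8_t single_get_A(void) { return io_read(&reg_a); }

/* Single registers: writing A and B together needs two writes. */
void single_set_A_B(uint8_t a, uint8_t b) {
    io_write(&reg_a, (uint8_t)(a & 0x0Fu));
    io_write(&reg_b, (uint8_t)(b & 0x0Fu));
}
```

In this sketch, updating B alone costs two I/O accesses plus bit operations with the shared register but one I/O access with single registers, whereas updating A and B together costs one access with the shared register and two with single registers.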
In the traditional design flow, the development of device drivers cannot begin until the device parameter allocation is fixed. In the proposed HW/SW codesign flow, a software programmer can write the device driver using pre-defined parameter-access functions such as set_B( ). However, the actual C code of these access functions depends on the physical allocation of parameters. This paper presents a novel design methodology that allows a designer to seek an allocation that reduces the code of parameter-access functions or the number of I/O registers while developing device drivers. The exact allocation with the minimum cost under constraints is formulated as a zero-one integer linear programming problem. The HW/SW codesign approach is favorable in the development of embedded systems. The formulations of an allocation with minimum software or hardware costs are derived. Heuristic algorithms based on iterative refinement are proposed to explore the optimization. The proposed design methodology was implemented in C and evaluated with a set of real devices. Compared with current industrial designs, we can obtain design alternatives that reduce both software and hardware costs. Furthermore, the results also indicate the design spaces for various application features.
To the best of our knowledge, [5] is the first work to investigate how the parameter allocation affects the performance of drivers and to try to find an optimal allocation. However, it does not address hardware optimization. Furthermore, it does not allow parameters belonging to different groups to share a register. Other related works on the synthesis of device drivers or on interface codesign all assume that the allocation of device parameters is fixed in advance. F. Merillon et al. [8] address the automatic synthesis of such low-level code from a higher-level specification written in the Devil language. Recent works [1, 9, 10] extend the language's descriptive capability and try to automatically synthesize more dedicated parts of a device driver. P. Chou et al. [2] propose an HW/SW interface cosynthesis approach, which synthesizes driver code as well as glue hardware logic to connect I/O controllers. G. Gognoit et al. propose communication synthesis and HW/SW integration for embedded system design in [4].

[Figure 1: Hardware and software interface. Operating system; device driver (or HAL), containing the codes for accessing and manipulating device parameters; programmable device controller, with registers holding device parameters; physical devices.]
The next section defines the software and hardware cost models. Section 3 formulates the exact solution of a parameter allocation with minimum costs. The whole design methodology and tools are presented in Section 4. Experimental results are shown in Section 5. The final section draws conclusions. 
Cost Models
The proposed system allows users to trade off the HW cost of an I/O controller against the SW cost of its associated device driver. The HW and SW cost models are defined in this section. Based on observations of the C code in Fig. 3 and Fig. 4, we summarize the software costs for different accesses and allocations in Table 1. A parameter can be allocated in three types of registers: (1) single: the register contains the parameter only; (2) shared: the register contains more than one parameter, and all parameters in it belong to the same group; (3) group-shared: the register contains more than one parameter, and the parameters in it are from different groups. Allocation (3) in Fig. 2 is an example of the last type. The I/O access functions include reading (writing) an individual parameter and reading (writing) K parameters of the same group simultaneously. Most access functions for a shared register and a group-shared register are the same. The exception is writing data to some parameters of a group: accessing a group-shared register needs one more I/O read access to retain the values of the parameters belonging to other groups.
$C_1$: Execution time (or instruction length) of an I/O access instruction
$C_2$: Execution time (or instruction length) of a bit-manipulation instruction
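The three register types can be told apart mechanically from the group membership of the parameters a register holds. The sketch below is a minimal illustration; the names `register_type` and `classify_register`, the use of integer group identifiers, and the extra `REG_UNUSED` case are assumptions made for the example.

```c
#include <stddef.h>

typedef enum {
    REG_UNUSED,       /* no parameter allocated (case added for completeness) */
    REG_SINGLE,       /* exactly one parameter */
    REG_SHARED,       /* several parameters, all from the same group */
    REG_GROUP_SHARED  /* several parameters from different groups */
} register_type;

/* group_of[i] is the (hypothetical) group identifier of the i-th parameter
   allocated in the register; count is the number of such parameters. */
register_type classify_register(const int *group_of, size_t count) {
    if (count == 0) return REG_UNUSED;
    if (count == 1) return REG_SINGLE;
    for (size_t i = 1; i < count; i++) {
        if (group_of[i] != group_of[0]) return REG_GROUP_SHARED;
    }
    return REG_SHARED;
}
```

For instance, a register holding two parameters of group 1 is classified as shared, while a register holding one parameter of group 1 and one of group 2 is classified as group-shared.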
The total software cost under consideration also depends on how many times these I/O accesses are executed. These numbers are application-specific. The number of times a driver reads (writes) an individual parameter $X_i$ in a period is given as the read frequency $f_i^r$ (write frequency $f_i^w$). The number of times a driver reads (writes) the whole group is given as the group-read frequency $f_g^r$ (group-write frequency $f_g^w$). Given these access profiles, the software cost is determined by the allocation of device parameters. In this work, the software cost can be code size or execution time. If code size is the main concern, the execution number means the number of times a parameter-access function is used as an in-line function. The first two rows of the software cost equation calculate the costs in the first column of Table 2. The third and fourth rows calculate the costs in the second column. The final two rows calculate the costs in the third column. Generally, the proposed system handles each parameter group separately, since register sharing between different groups always increases the software cost. A separate software cost function covers the case in which a group does not share any register with other groups, that is, an allocation $\Phi(V_g, m, 0, p, 0)$. The hardware cost is defined as the number of registers used to allocate device parameters, as described in the following definitions.
[Table columns: register index; variables allocated in this register]
For a parameter group $V_g = \{X_1, X_2, \ldots, X_n\}$, the software cost $C_s(\Phi)$ of an allocation $\Phi(V_g, m_p, m_q, p, q)$ equals
[Equation: six-row sum in $C_1$, $C_2$, the read and write frequencies, $n$, $m_p$, $m_q$, $p$, and $q$; each pair of rows corresponds to one column of the software cost table.]
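Definition 3: ($C_s(\Phi)$ for $\Phi(V_g, m, 0, p, 0)$)
[Equation: software cost for the case without group-shared registers, in terms of $C_1$, $C_2$, the read and write frequencies, $n$, $m$, and $p$.]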
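The general shape of the software cost, a frequency-weighted sum of per-access costs built from $C_1$ and $C_2$, can be sketched in C as follows. This is not the cost function of the definitions above: the `access_profile` structure and the per-call instruction counts are hypothetical inputs that a user would take from the cost table for a given allocation.

```c
#include <stddef.h>

/* One access function under a given allocation (hypothetical structure). */
typedef struct {
    double frequency;      /* executions per period, from the access profile */
    unsigned io_accesses;  /* I/O access instructions per call */
    unsigned bit_ops;      /* bit-manipulation instructions per call */
} access_profile;

/* Frequency-weighted cost: sum over k of f_k * (io_k * C1 + bit_k * C2). */
double software_cost(const access_profile *acc, size_t count,
                     double c1, double c2) {
    double total = 0.0;
    for (size_t k = 0; k < count; k++) {
        total += acc[k].frequency *
                 (acc[k].io_accesses * c1 + acc[k].bit_ops * c2);
    }
    return total;
}
```

With $C_1 = 4$ and $C_2 = 1$, an access executed 10 times with one I/O access and no bit operations, plus an access executed 5 times with two I/O accesses and three bit operations, gives 10·4 + 5·(8 + 3) = 95.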
Formulation of Exact Minimization
This work seeks an allocation with the minimal SW or HW cost. The exact solution is formulated under the assumption that no group-shared register is used. The software cost minimization can be formulated as a zero-one integer linear programming problem by using binary decision variables with two indices: $Y = \{y_{i,j};\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, \lambda\}$, where $n$ is the number of parameters and $\lambda$ is the upper bound on the number of registers used. $y_{i,j} = 1$ (0) means that the device parameter $X_i$ is (is not) allocated in register $j$. First, the number of shared registers, $p$, is assumed to be fixed. In other words, at most $\lambda - p$ single registers are used. General cases are discussed later. Let $B_i$ be the bit length of device parameter $X_i$, and $RL$ the width of a register. The registers numbered $1, 2, \ldots, p$ denote shared registers. The problem can now be stated as follows:
Minimize software: $C_s(\Phi)$, where $\Phi$ is the allocation described by $Y$, subject to
(a) $\sum_{j=1}^{\lambda} y_{i,j} = 1$ for $i = 1, 2, \ldots, n$;
(b) $\sum_{i=1}^{n} y_{i,j} \ge 2$ for $j = 1, 2, \ldots, p$;
(c) $\sum_{i=1}^{n} y_{i,j} \le 1$ for $j = p+1, \ldots, \lambda$;
(d) $\sum_{i=1}^{n} B_i \, y_{i,j} \le RL$ for $j = 1, 2, \ldots, \lambda$.
Condition (a) means that every parameter must be allocated in exactly one register. Condition (b) means that the number of parameters allocated in a shared register must be at least 2. Condition (c) means that the number of parameters allocated in a single register must not be greater than 1 (but could be zero). Condition (d) means that the sum of the bit lengths of the parameters in a register must not exceed the width of a register.
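Conditions (a) to (d) can be checked mechanically for a candidate assignment. The C sketch below does so for a zero-one matrix stored in row-major order; the function name, the argument layout, and the zero-based register indices (registers 0 to p−1 are the shared ones) are assumptions of this example, and it checks feasibility only, without evaluating the cost.

```c
#include <stdbool.h>
#include <stddef.h>

/* y is an n x lambda zero-one matrix in row-major order:
   y[i * lambda + j] != 0 means parameter i is allocated in register j.
   bits[i] is the bit length B_i; rl is the register width RL. */
bool allocation_is_feasible(size_t n, size_t lambda, size_t p,
                            const unsigned char *y,
                            const unsigned *bits, unsigned rl) {
    /* (a) every parameter is allocated in exactly one register */
    for (size_t i = 0; i < n; i++) {
        unsigned count = 0;
        for (size_t j = 0; j < lambda; j++) {
            if (y[i * lambda + j]) count++;
        }
        if (count != 1) return false;
    }
    for (size_t j = 0; j < lambda; j++) {
        unsigned members = 0, width = 0;
        for (size_t i = 0; i < n; i++) {
            if (y[i * lambda + j]) { members++; width += bits[i]; }
        }
        if (j < p && members < 2) return false;   /* (b) shared register */
        if (j >= p && members > 1) return false;  /* (c) single register */
        if (width > rl) return false;             /* (d) register width */
    }
    return true;
}
```

For example, with n = 3, λ = 2, p = 1, bit lengths {2, 3, 8}, and RL = 8, placing the first two parameters in register 0 and the third in register 1 satisfies all four conditions.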
On the hardware side, the exact cost minimization without any constraint is just a bin-packing problem [7]. Under a software constraint, the exact minimization can also be formulated as a zero-one ILP problem. The decision variables Y are still needed.
Furthermore, we define binary decision variables $Z = \{z_j;\ j = 1, 2, \ldots, \lambda\}$, where $\lambda = n - p$ is the maximum number of registers that may be used. $z_j = 1$ (0) means that the $j$-th register is (is not) used. Let $SC$ be the given software-cost constraint. The symbols $B_i$, $X_i$, and $RL$ are used as in the software minimization formulation. The problem can now be stated as follows:
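Minimize hardware: the number of registers used, $\sum_{j=1}^{\lambda} z_j$, subject to conditions (a), (b), and (c) (the same as the ones for the software part) and conditions (e) and (f).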
The first three conditions are the same as those for software minimization. Condition (e) constrains the total bit length of the parameters in a register. Condition (f) is the software constraint. Based on the above formulations, an ILP solver can be used to obtain an exact minimization when p shared registers are used. To obtain a global exact minimization, the ILP solver must be run n/2 times, with p = 1, 2, …, n/2, and the best result is then chosen.
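The outer loop over p can be sketched in C as follows. The solver interface `solve_for_p_fn` is hypothetical, and `toy_solver` is a stand-in with made-up costs so that the example is self-contained; a real flow would call an ILP solver at that point.

```c
#include <stdbool.h>
#include <float.h>

/* Hypothetical solver interface: returns true and stores the minimum cost
   for a fixed number p of shared registers, or returns false when no
   feasible allocation exists for that p. */
typedef bool (*solve_for_p_fn)(unsigned n, unsigned p, double *cost);

/* Runs the solver for p = 1, 2, ..., n/2 and keeps the best result. */
bool global_minimum(unsigned n, solve_for_p_fn solve,
                    unsigned *best_p, double *best_cost) {
    bool found = false;
    *best_cost = DBL_MAX;
    for (unsigned p = 1; p <= n / 2; p++) {
        double cost;
        if (solve(n, p, &cost) && cost < *best_cost) {
            *best_cost = cost;
            *best_p = p;
            found = true;
        }
    }
    return found;
}

/* Stand-in solver with made-up costs: infeasible for p == 3,
   otherwise cost = (p - 2)^2 + 10. Not an ILP solver. */
bool toy_solver(unsigned n, unsigned p, double *cost) {
    (void)n;
    if (p == 3) return false;
    double d = (double)p - 2.0;
    *cost = d * d + 10.0;
    return true;
}
```

With n = 8, the loop tries p = 1 to 4; the stand-in solver reports costs 11, 10, infeasible, and 14, so the loop selects p = 2 with cost 10.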
The flow of the proposed design methodology is shown in Fig. 6. The bit length of each parameter and the members of each group are fixed, while the access profiles are determined by the target application. Given the device-parameter specification, the system first derives two allocations on the assumption that the software (hardware) constraint is unlimited when minimizing the hardware (software) cost. One has a minimal software cost and the other a minimal hardware cost. They are the two extreme solutions of the design space. Then a user can give a hardware constraint (or a software constraint) and obtain an allocation with a minimal software (or hardware) cost. The inner loop allows a user to iteratively refine the solution to meet a certain purpose. The accepted allocation for all device parameters can then be fed into a software synthesis tool [6, 8], which generates the low-level code of the parameter-access functions. Furthermore, the allocation is used by a hardware synthesis tool to generate the I/O registers.
In both HW and SW optimizations, we first handle each device parameter group separately. Then we allow parameters in different groups to share registers if either of the following situations occurs: (1) no solution exists that meets the hardware constraint, or (2) the hardware cost can be further reduced while still meeting the software constraint.
Although an ILP solver can derive the solution with the exact minimum cost, it is time-consuming for large cases. Heuristic approaches are therefore presented here. The heuristic algorithm starts from an initial allocation with the smallest number of registers, which corresponds to an exact solution of a bin-packing problem. Then, the allocation is iteratively refined to obtain a lower cost until the constraint is violated or no further improvement can be obtained. The strategy is applied to both software and hardware optimization. Each refinement step relies on proven lemmas, which have been presented in [5]. The proofs of the lemmas exploited in our approach can be derived directly by subtracting $C_s(\Phi)$ from $C_s(\Phi')$. The SW-optimization algorithm can be found in [5]. The HW-optimization part is shown in Fig. 7. An initial allocation with the minimum number of registers is derived by an exact bin-packing procedure, which uses a branch-and-bound approach as proposed by Martello and Toth [7]. Starting from the initial allocation, the solution is iteratively refined. If the software cost of the initial allocation is larger than the given software constraint (SC), the algorithm runs bin_packing_maximize_single(), which tries to derive a better allocation according to Lemma 1. Then, if the constraint is still not met, Lemma 2 is applied.
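The initial allocation step, finding the smallest number of registers that can hold all parameters, can be illustrated with a small exact search. The sketch below is a plain backtracking search with a simple bound. It is not the Martello and Toth procedure, and it is practical only for small inputs. The names `min_registers` and `MAX_PARAMS` are assumptions of this example.

```c
#include <stddef.h>

#define MAX_PARAMS 16  /* arbitrary limit chosen for this sketch */

static unsigned best_count;  /* fewest registers found so far */

/* Places parameter i into each open register that still has room,
   then tries opening a new register; prunes branches that cannot
   beat the best solution found so far. */
static void pack(const unsigned *bits, size_t n, size_t i,
                 unsigned *load, unsigned used, unsigned rl) {
    if (used >= best_count) return;             /* bound: no improvement possible */
    if (i == n) { best_count = used; return; }  /* all parameters placed */
    for (unsigned j = 0; j < used; j++) {
        if (load[j] + bits[i] <= rl) {
            load[j] += bits[i];
            pack(bits, n, i + 1, load, used, rl);
            load[j] -= bits[i];
        }
    }
    load[used] = bits[i];                       /* open a new register */
    pack(bits, n, i + 1, load, used + 1, rl);
    load[used] = 0;
}

/* Returns the minimum number of registers of width rl needed for the given
   bit lengths, or 0 if the input is empty, too large for this sketch,
   or contains a parameter wider than a register. */
unsigned min_registers(const unsigned *bits, size_t n, unsigned rl) {
    unsigned load[MAX_PARAMS] = {0};
    if (n == 0 || n > MAX_PARAMS) return 0;
    for (size_t i = 0; i < n; i++) {
        if (bits[i] > rl) return 0;
    }
    best_count = (unsigned)n + 1;
    pack(bits, n, 0, load, 0, rl);
    return best_count;
}
```

For bit lengths {3, 3, 2, 6, 2} and 8-bit registers, the search finds that two registers suffice, for example {6, 2} and {3, 3, 2}.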
