Knapsack Model and Algorithm for Hardware/Software Partitioning Problem by Ray, Abhijit et al.
Computing and Informatics, Vol. 23, 2004, 557–569
KNAPSACK MODEL AND ALGORITHM
FOR HARDWARE/SOFTWARE
PARTITIONING PROBLEM
Abhijit Ray, Wu Jigang, Thambipillai Srikanthan
Centre for High Performance Embedded Systems




e-mail: {PA8760452, asjgwu, astsrikan}@ntu.edu.sg
Manuscript received 12 October 2004; revised 17 December 2004
Communicated by Hong Zhu
Abstract. Efficient hardware/software partitioning is crucial to realizing optimal
solutions for constraint-driven embedded systems. The size of the total solution
space is typically quite large for this problem. In this paper, we show that the
knapsack model can be employed for the rapid identification of hardware components
that provide time-efficient implementations. In particular, we propose
a method to split the problem into standard 0-1 knapsack problems in order to
leverage classical approaches. The proposed method relies on tight lower
and upper bounds for each of these knapsack problems for the rapid elimination of
sub-problems that are guaranteed not to give optimal results. Experimental
results show that, for problem sizes ranging from 30 to 3000, the optimal solution
of the whole problem can be obtained by solving only 1 sub-problem, except for one
case where it required the solution of 3 sub-problems.
Keywords: Hardware/software partitioning, embedded systems, algorithm, knapsack
problem
558 A. Ray, W. Jigang, T. Srikanthan
1 INTRODUCTION
The hardware/software partitioning (HSP) problem is that of deciding, for each
subsystem, whether the required functionality is to be implemented in hardware or
software so as to get the desired performance in terms of running time and power
at the least cost. HSP is one of the crucial steps in embedded system design.
Satisfaction of performance requirements for embedded systems can frequently be
achieved only by hardware implementation of some parts of the application. Selec-
tion of the appropriate parts of the system for hardware and software implementation
has a crucial impact both on the cost and overall performance of the final product.
At the same time, hardware area minimization and latency constraints present con-
tradictory objectives to be achieved through hardware/software partitioning.
Most of the existing approaches to HSP are based on either hardware-oriented
partitioning or software-oriented partitioning. A software-oriented approach means
that initially the whole application is allotted to software and during partitioning
system parts are moved to hardware until constraints are met. On the other hand,
in a hardware-oriented approach the whole application is implemented in hardware
and during partitioning the parts are moved to software until constraints are
violated. Software-oriented approaches have been proposed by Ernst et al. [1] and
Vahid et al. [2]. Hardware-oriented approaches have been proposed by Gupta and
De Micheli [3] and Niemann and Marwedel [4]. In [5], the authors proposed a flexible
granularity approach for hardware/software partitioning. Chatha and Vemuri [6]
propose partitioning schemes for transformative applications, i.e. multimedia and
digital signal processing applications. The authors try to optimize the number of
pipeline stages and memory required for pipelining. The partitioning is done in an
iterative manner. Rakhmatov and Vrudhula [7] modeled hardware/software
partitioning as an unconstrained bipartitioning problem. Cost functions were used
to model the computation and the communication costs. The disadvantage of
hardware-oriented partitioning is that the partitioning process is stopped as soon
as constraints are met. This can easily result in a non-optimal solution. Similarly,
for software-oriented partitioning the algorithm can stop in a local minimum.
Our proposed method is different in the sense that we do not start with an
all-hardware or all-software solution; rather, we model the partitioning problem as
a set of standard knapsack problems, which can be solved independently to arrive at
the solution. For every subproblem we also calculate lower and upper bounds, which
helps in rejecting subproblems that cannot give optimal results. Hence
not all subproblems need to be solved. This paper is the first work to solve the
hardware/software partitioning problem using the knapsack problem. The advantage of
our work is that it provides the optimal solution. Moreover, many subproblems are
rejected based on their lower and upper bounds, and this reduces the number
of subproblems that need to be solved; hence the algorithm is quite fast.
The outline of this paper is as follows: In Section 2 we give the mathematical
model of the hardware/software partitioning problem. In Section 3 we show
how to split the partitioning problem into standard independent 0-1 knapsack
problems. In Section 4 we describe the proposed algorithm. In Section 5 we give
our experimental results. In Section 6 we give the conclusion.
2 MODEL OF THE PHYSICAL PROBLEM
We consider a basic case which can be extended later. In our case the application can
be broken down into parts that can all be run simultaneously; in other words, the
parts do not have any data dependency between them. So we
have a set of items S = {p1, p2, . . . , pn} to be partitioned into hardware and software.
Let hi and si be the time required for the part pi to be run in hardware and software,
respectively. Also let ai be the area required for hardware implementation of part pi,
and let A be the total area available for hardware implementation. Our goal is to
allot each part to hardware or software so that the combined running time of the
whole application is minimized while the area constraint is satisfied. Let us denote
a solution of the problem as a vector X = [x1, x2, . . . , xn] such that xi ∈ {0, 1},
where xi = 0 (1) implies that the part pi is implemented in software (hardware).
Since the hardware and software can be run in parallel, the total running time of
the application is given by
T (X) = max{H(X), S(X)} (1)
where H(X) is the total running time of the parts running in hardware and S(X) is
the total running time of the parts in software. Since all the parts that are
implemented in hardware can be run in parallel with each other and all the software
parts have to be run serially, we have
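$$H(X) = \max_{1 \le i \le n} x_i \cdot h_i \quad \text{and} \quad S(X) = \sum_{i=1}^{n} (1 - x_i) \cdot s_i.$$
The HSP problem P can therefore be stated as
$$\text{minimize } T(X) \quad \text{subject to} \quad \sum_{i=1}^{n} x_i \cdot a_i \le A, \quad x_i \in \{0, 1\}. \tag{2}$$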
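The model above can be illustrated with a minimal sketch that evaluates T(X) and the area constraint for a given partition vector. The function names and the three-part instance are hypothetical and serve only to make equations (1) and (2) concrete.

```python
def total_time(x, h, s):
    """T(X) = max{H(X), S(X)}: hardware parts run in parallel, software parts serially."""
    hw = max((h_i for x_i, h_i in zip(x, h) if x_i == 1), default=0)
    sw = sum(s_i for x_i, s_i in zip(x, s) if x_i == 0)
    return max(hw, sw)


def is_feasible(x, a, area):
    """Area constraint of (2): the hardware parts must fit into the available area A."""
    return sum(a_i for x_i, a_i in zip(x, a) if x_i == 1) <= area


# Hypothetical instance: parts 1 and 3 in hardware, part 2 in software.
h = [4, 3, 2]    # hardware running times
s = [10, 6, 5]   # software running times
a = [5, 4, 3]    # hardware areas
x = [1, 0, 1]
print(total_time(x, h, s), is_feasible(x, a, 8))
```

Here H(X) = max{4, 2} = 4 and S(X) = 6, so the running time of this partition is 6 and it uses exactly the 8 units of area available.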
3 PROBLEM SPLITTING
First of all, we review the knapsack problem. We are given a knapsack of capacity C
and a set of items S = {1, . . . , n}, where each item i has a weight wi and a benefit bi.
The problem is to find a subset S′ ⊆ S that maximizes the total benefit
$\sum_{i \in S'} b_i$ under the constraint that $\sum_{i \in S'} w_i \le C$, i.e., all
the selected items fit in a knapsack of carrying capacity C. This problem is called
the knapsack problem. The 0-1 knapsack problem is a special case of the general
knapsack problem defined above, where each item can either be selected or not
selected, but cannot be selected fractionally. It can be formulated as
$$\text{maximize } \sum_{i=1}^{n} b_i \cdot x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i \cdot x_i \le C, \quad x_i \in \{0, 1\}, \; i = 1, \ldots, n, \tag{3}$$
where xi is a binary variable equalling 1 if item i should be included in the knapsack
and 0 otherwise. It is well known that this problem is NP-hard [8]. However,
many large-scale instances can be solved optimally in fractions of a second in
spite of the exponential worst-case solution time of all knapsack algorithms [8, 9, 10].
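For readers who want to experiment with formulation (3), the following is a minimal sketch of a 0-1 knapsack solver. It uses the textbook dynamic programming recurrence, which is not the algorithm of [9] used in the experiments, and it assumes non-negative integer weights and an integer capacity. The function name is hypothetical.

```python
def knapsack_01(benefits, weights, capacity):
    """Solve (3) by dynamic programming; returns (best total benefit, vector x)."""
    n = len(benefits)
    # best[j][c] = maximum benefit using only the first j items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        b, w = benefits[j - 1], weights[j - 1]
        for c in range(capacity + 1):
            best[j][c] = best[j - 1][c]              # item j left out
            if w <= c and best[j - 1][c - w] + b > best[j][c]:
                best[j][c] = best[j - 1][c - w] + b  # item j packed
    # Walk back through the table to recover which items were packed.
    x, c = [0] * n, capacity
    for j in range(n, 0, -1):
        if best[j][c] != best[j - 1][c]:
            x[j - 1] = 1
            c -= weights[j - 1]
    return best[n][capacity], x


print(knapsack_01([60, 100, 120], [10, 20, 30], 50))
```

The table has (n + 1)(C + 1) entries, so this sketch is only practical when the capacity is a moderately sized integer.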
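Let us assume that the items are ordered by their efficiencies in non-increasing
order, i.e.,
$$\frac{b_1}{w_1} \ge \frac{b_2}{w_2} \ge \cdots \ge \frac{b_n}{w_n},$$
and let the accumulated benefits and weights be defined as
$$\bar{b}_j = \sum_{i=1}^{j} b_i, \quad j = 1, 2, \ldots, n,$$
$$\bar{w}_j = \sum_{i=1}^{j} w_i, \quad j = 1, 2, \ldots, n. \tag{6}$$
Packing a knapsack in a greedy way means putting the items into it in decreasing
order of their efficiency as long as $w_j \le C - \bar{w}_{j-1}$, i.e., as long as the
next item fits in the unused capacity of the knapsack. According to the definition
given in [8], the break item is the first item which cannot be included in the
knapsack. Thus the break item t satisfies
$$\bar{w}_{t-1} \le C < \bar{w}_t. \tag{7}$$
Let the residual capacity r be defined as
$$r = C - \bar{w}_{t-1}. \tag{8}$$
By linear relaxation, [8] showed that an upper bound on the total benefit
of 0-1 KP is
$$u = \bar{b}_{t-1} + r \cdot \frac{b_t}{w_t}, \tag{9}$$
and the lower bound is given by
$$l = \bar{b}_{t-1}. \tag{10}$$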
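The greedy packing, the break item and the two bounds can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, strictly positive weights are assumed, and the break item is reported by its position in the input lists.

```python
def greedy_bounds(benefits, weights, capacity):
    """Return (l, u, t): lower bound, upper bound and index of the break item.

    t is None when every item fits, in which case l == u.
    """
    # Order the items by efficiency b_i / w_i, most efficient first.
    order = sorted(range(len(benefits)),
                   key=lambda i: benefits[i] / weights[i], reverse=True)
    acc_b = 0   # accumulated benefit of the items packed so far
    acc_w = 0   # accumulated weight of the items packed so far
    for i in order:
        if acc_w + weights[i] > capacity:       # item i is the break item
            r = capacity - acc_w                # residual capacity, as in (8)
            u = acc_b + r * benefits[i] / weights[i]
            return acc_b, u, i
        acc_b += benefits[i]
        acc_w += weights[i]
    return acc_b, acc_b, None


print(greedy_bounds([60, 100, 120], [10, 20, 30], 50))
```

For this instance the first two items are packed (l = 160), the third is the break item with r = 20, and u = 160 + 20 · 120/30 = 240. The optimal benefit, 220, lies between the two bounds.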
In the HSP problem, we sort all the items p1, p2, . . . , pn in decreasing order of
their hardware running time. We sort them so that, once the item with the
highest hardware running time is allocated to hardware, the running time of the
whole hardware part is fixed to that item's hardware running time, because all the
hardware parts can be run in parallel. After sorting we have the items ordered as
p′1, p′2, . . . , p′n. Let h′i (s′i) be the time required for the part p′i to be run in
hardware (software). Thus, after sorting the following condition is satisfied:
$$h'_i \ge h'_j \quad \text{for all } i \le j \text{ and } 1 \le i, j \le n. \tag{11}$$
In what follows, xi and ai refer to the sorted part p′i. Let
$ST = \sum_{i=1}^{n} s'_i$ and $R_i = ST - s'_i$. Now we split the problem P into the
following n subproblems P1, P2, . . . , Pn.
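The preprocessing step can be sketched in a few lines. The function name and the sample data are hypothetical; the sketch only sorts the parts by decreasing hardware running time and computes ST and the values Ri.

```python
def sort_and_residuals(h, s, a):
    """Sort parts by decreasing hardware time and compute R_i = ST - s'_i."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    h_sorted = [h[i] for i in order]
    s_sorted = [s[i] for i in order]
    a_sorted = [a[i] for i in order]
    st = sum(s_sorted)                       # ST: total software running time
    residuals = [st - s_i for s_i in s_sorted]
    return order, h_sorted, s_sorted, a_sorted, residuals


order, h_sorted, s_sorted, a_sorted, residuals = sort_and_residuals(
    [2, 4, 3], [5, 10, 6], [3, 5, 4])
print(order, h_sorted, residuals)
```

Keeping `order` makes it possible to map a solution found on the sorted parts back to the original numbering.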
3.1 Subproblem P1
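Let p′1 be implemented in hardware, i.e., x1 = 1, so the total time T that we have
to minimize becomes
$$T(X) = \max\left\{ h'_1, \; \sum_{i=2}^{n} (1 - x_i) \cdot s'_i \right\} = \max\left\{ h'_1, \; R_1 - \sum_{i=2}^{n} x_i \cdot s'_i \right\}.$$
Our goal is to minimize the total running time T (X), i.e.,
$$\text{minimize } \max\left\{ h'_1, \; R_1 - \sum_{i=2}^{n} x_i \cdot s'_i \right\} \quad \text{subject to} \quad \sum_{i=2}^{n} x_i \cdot a_i \le A - a_1. \tag{12}$$
Since h′1 is fixed, T (X) is minimized by making the sum of the software times moved
to hardware as large as possible, so (12) is equivalent to the problem
$$P_1: \quad \text{maximize } \sum_{i=2}^{n} x_i \cdot s'_i \quad \text{subject to} \quad \sum_{i=2}^{n} x_i \cdot a_i \le A - a_1, \quad x_i \in \{0, 1\}, \; i = 2, \ldots, n. \tag{13}$$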
It is clear that P1 is the standard 0-1 knapsack problem, and the solution of P1
is a feasible solution of the problem P . Let
$$L_1 = \max\{h'_1, R_1 - u_1\}, \qquad U_1 = \max\{h'_1, R_1 - l_1\},$$
where l1 and u1 are the lower bound and the upper bound on the total benefit of P1,
respectively. We call [L1, U1] the bounded interval of P1 in the sense that the
running time of the optimal solution of P1 lies in the range [L1, U1].
Similarly we have subproblem Pk for k > 1.
3.2 Subproblem Pk
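We fix p′k to be implemented in hardware, i.e., xk = 1, and all the items 1, 2, . . . , k−1
are in software, because if any of them, say j, is in hardware then the resulting
partition is already covered by subproblem Pj. That is, we have x1 = 0, x2 =
0, . . . , xk−1 = 0. The total time is
$$T(X) = \max\left\{ h'_k, \; \sum_{i=1}^{k-1} s'_i + \sum_{i=k+1}^{n} (1 - x_i) \cdot s'_i \right\} = \max\left\{ h'_k, \; R_k - \sum_{i=k+1}^{n} x_i \cdot s'_i \right\}.$$
We have to minimize the total running time T (X), i.e.,
$$\text{minimize } \max\left\{ h'_k, \; R_k - \sum_{i=k+1}^{n} x_i \cdot s'_i \right\} \quad \text{subject to} \quad \sum_{i=k+1}^{n} x_i \cdot a_i \le A - a_k. \tag{14}$$
Since h′k is fixed, this is equivalent to the standard 0-1 knapsack problem
$$P_k: \quad \text{maximize } \sum_{i=k+1}^{n} x_i \cdot s'_i \quad \text{subject to} \quad \sum_{i=k+1}^{n} x_i \cdot a_i \le A - a_k, \quad x_i \in \{0, 1\}, \; i = k+1, \ldots, n. \tag{15}$$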
The bounded interval of subproblem Pk is given by
$$L_k = \max\{h'_k, R_k - u_k\}, \qquad U_k = \max\{h'_k, R_k - l_k\},$$
and the running time of the optimal solution of Pk lies in the range [Lk, Uk], where
lk and uk are the lower bound and the upper bound on the total benefit of Pk,
respectively.
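As an illustration, the bounded interval of a subproblem can be computed directly from the greedy bounds on its knapsack. The sketch below assumes that the parts are already sorted by decreasing hardware running time, that all areas are strictly positive, and that k is a 0-based index. The function name and the sample data are hypothetical.

```python
def bounded_interval(k, h, s, a, area):
    """Return (L_k, U_k) for subproblem P_k, or None if part k does not fit."""
    capacity = area - a[k]                  # area left once part k is in hardware
    if capacity < 0:
        return None
    r_k = sum(s) - s[k]                     # R_k = ST - s'_k
    # Only the parts after k are free; rank them by efficiency s'_i / a_i.
    free = sorted(range(k + 1, len(h)), key=lambda i: s[i] / a[i], reverse=True)
    lower = 0                               # l_k: benefit of the greedy packing
    used = 0
    upper = None                            # u_k: linear relaxation bound
    for i in free:
        if used + a[i] > capacity:          # break item reached
            upper = lower + (capacity - used) * s[i] / a[i]
            break
        lower += s[i]
        used += a[i]
    if upper is None:                       # every free part fits
        upper = lower
    return max(h[k], r_k - upper), max(h[k], r_k - lower)


h, s, a = [4, 3, 2], [10, 6, 5], [5, 4, 3]
print([bounded_interval(k, h, s, a, 10) for k in range(3)])
```

With A = 10 the first subproblem has the interval [4, 6], while the other two collapse to the single values 10 and 16, so only the first one can contain the optimum of the whole problem.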
Theorem 1. The optimal solution of P is the solution Xk of Pk, which gives the
maximum benefit amongst all the solutions of subproblems Pi, 1 ≤ i ≤ n.
To prove the above, let Xi be the solution of the subproblem Pi, i = 1, 2, . . . , n. We
should prove that
1. Any Xi is a feasible solution of the problem.
2. Any optimal solution of the problem P belongs to {X1, X2, . . . , Xn}.
Proof.
1. Each of Pi, i = 1, 2, . . . , n is formed from the original problem P by fixing the
part i as being included in the knapsack; for each Pi, the capacity is Ai = A−ai.
Hence every optimal solution of Pi is a feasible solution of P with part i in the
knapsack.
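2. Let X be any feasible solution of P , and let p′i be the first part in the sorted
order that is implemented in hardware, i.e., the part with the highest hardware
running time amongst the parts included in the knapsack. All parts before p′i are
then in software and part i is included in the knapsack, so X is a feasible
solution of the subproblem Pi. Hence the optimal solution Xi of Pi is at least as
good as X, and if X is optimal for P then so is Xi.
□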
Theoretically, we can create as many subproblems as there are items to be
partitioned. However, a closer look at the derivations indicates that we need not
create n subproblems if we have n items to be partitioned. For example, suppose
that after fixing item p′i in hardware we have
$$\sum_{j=i+1}^{n} a_j \le A - a_i. \tag{16}$$
This means that after we have fixed item p′i to be implemented in hardware
and p′1, p′2, . . . , p′i−1 in software, the rest of the items left to be partitioned
can all be implemented in hardware as there is enough hardware space left. In most
cases, hardware runs faster than software and the items in hardware can also
run in parallel, so we want to implement as much as possible in hardware. Thus,
since enough space is available, we need not solve the subproblem to find a partition,
as all the remaining items can now be implemented in hardware. Hence, we can
stop creating more subproblems as soon as equation (16) is satisfied.
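Condition (16) is cheap to test with a running sum of the areas. The following minimal sketch, with a hypothetical function name and 0-based indices, returns the first subproblem for which all remaining items fit in hardware.

```python
def first_trivial_subproblem(a, area):
    """Smallest index i with sum(a[i+1:]) <= area - a[i], or None if there is none.

    The areas must be listed in the sorted order of the parts.
    """
    remaining = sum(a)                      # a_i plus the areas of all later parts
    for i, a_i in enumerate(a):
        if remaining - a_i <= area - a_i:   # condition (16)
            return i
        remaining -= a_i
    return None


print(first_trivial_subproblem([5, 4, 3], 8))
```

For the areas 5, 4, 3 and A = 8 the condition first holds for the second part (index 1), since the remaining area 3 does not exceed 8 − 4.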
A point to be noted is that the subproblems are not all of the same size. This
follows from the fact that in the first subproblem we have fixed item 1 to be
implemented in hardware, while it is not yet known whether the rest of the items
are in hardware or software. However, in the second subproblem we have fixed item 2
to be implemented in hardware and item 1 is no longer implemented in hardware. This
is because if item 1 were implemented in hardware in the second subproblem, the
resulting partition would already be covered by subproblem 1. Similarly, for
subproblem i all the items 1, 2, . . . , i − 1 are automatically fixed to be in
software. Hence, the size of the problem decreases from subproblem 1 to n.
Similarly we can create subproblems for each of the items p1, p2, . . . , pn. The
optimal solutions obtained for each of the subproblems will be different from each
other, but the optimal solution for the whole problem is the optimal solution from
only one of the subproblems. The best solution of all the subproblems in this case
will be the optimal solution for the original problem.
4 ALGORITHM DESCRIPTION
Following is a brief description of the proposed algorithm. The first step is to
sort all the items that are to be partitioned in decreasing order of their hardware
running time. Then, create a subproblem corresponding to each of the items to be
partitioned, i.e., there are n subproblems to be solved. For each of the subproblems
calculate the upper bound and lower bound on the benefits. Find the subproblem
with the highest lower bound, lbmax. All subproblems whose upper bound is smaller
than lbmax are guaranteed to give a solution whose benefit is less than lbmax. Hence
all such subproblems can be ignored and need not be solved. The subproblem with
the maximum lower bound should be solved first. Then the subproblems whose
upper bound is smaller than the benefit of this solution should be rejected. These
steps should be repeated until all subproblems have been solved or rejected. The
outline of the algorithm for solving the HSP problem is given below:
BOUND := 0;
sort all the items to be partitioned in decreasing order of
    their hardware running time;
form the subproblems P(i), i = 1, 2, . . . , n;
for (i := 1; i ≤ n; i++) {
    calculate the upper bound U(i) and the lower bound L(i) for P(i);
    if (L(i) > BOUND) {
        BOUND := L(i);
    }
}
while (there are subproblems left to be solved) {
    select the unsolved subproblem P(i) with the highest lower bound;
    if (U(i) < BOUND) {
        reject this subproblem;
    } else {
        solve this subproblem;
        B(i) := benefit of the above solution;
        if (B(i) > BOUND) {
            BOUND := B(i);
        }
    }
}
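The outline can be turned into a small runnable sketch. This is not the authors' implementation: it swaps in a textbook dynamic programming solver for the algorithm of [9], assumes positive integer areas, and prunes on the running-time interval [Lk, Uk] of Section 3 rather than on raw benefits, which is assumed here to be the intended comparison. All function names are hypothetical.

```python
def knapsack_01(benefits, weights, capacity):
    """Textbook DP for the 0-1 knapsack; returns (best benefit, chosen positions)."""
    n = len(benefits)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        b, w = benefits[j - 1], weights[j - 1]
        for c in range(capacity + 1):
            best[j][c] = best[j - 1][c]
            if w <= c:
                best[j][c] = max(best[j][c], best[j - 1][c - w] + b)
    chosen, c = [], capacity
    for j in range(n, 0, -1):               # backtrack through the table
        if best[j][c] != best[j - 1][c]:
            chosen.append(j - 1)
            c -= weights[j - 1]
    return best[n][capacity], chosen


def interval(k, h, s, a, area):
    """Bounded interval (L_k, U_k) of P_k; parts sorted by decreasing h, 0-based k."""
    cap, r_k = area - a[k], sum(s) - s[k]
    free = sorted(range(k + 1, len(h)), key=lambda i: s[i] / a[i], reverse=True)
    low, used, up = 0, 0, None
    for i in free:
        if used + a[i] > cap:               # break item: linear relaxation bound
            up = low + (cap - used) * s[i] / a[i]
            break
        low, used = low + s[i], used + a[i]
    if up is None:
        up = low
    return max(h[k], r_k - up), max(h[k], r_k - low)


def partition(h, s, a, area):
    """Return (T, x, number of subproblems solved); x is in the original order."""
    n = len(h)
    order = sorted(range(n), key=lambda i: h[i], reverse=True)
    hs, ss, as_ = ([v[i] for i in order] for v in (h, s, a))
    best_t, best_hw = sum(s), []            # all-software partition is always feasible
    cands = [(k, *interval(k, hs, ss, as_, area))
             for k in range(n) if as_[k] <= area]
    solved = 0
    for k, lo, up in sorted(cands, key=lambda c: c[2]):   # most promising first
        if lo >= best_t:
            continue                        # rejected: cannot beat the incumbent
        free = list(range(k + 1, n))
        benefit, chosen = knapsack_01([ss[i] for i in free],
                                      [as_[i] for i in free], area - as_[k])
        solved += 1
        t = max(hs[k], sum(ss) - ss[k] - benefit)
        if t < best_t:
            best_t, best_hw = t, [k] + [free[j] for j in chosen]
    x = [0] * n
    for i in best_hw:
        x[order[i]] = 1                     # map back to the original numbering
    return best_t, x, solved


print(partition([2, 4, 3], [5, 10, 6], [3, 5, 4], 8))
```

On this three-part instance the sketch solves a single subproblem and rejects the other two from their intervals alone, which mirrors the behaviour reported in the experiments.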





5 EXPERIMENTAL RESULTS
The proposed algorithm was implemented on a 500 MHz Pentium system running
Linux. We used random data for partitioning. For solving the individual
0-1 knapsack problems we used the algorithm for the 0-1 knapsack problem given in [9].
All results are averages over 1000 runs of the algorithm. The experiment was
performed for different problem sizes and area constraints.
Table 1 shows the execution time of our algorithm for different problem sizes.
In this table, the columns denote the different values for the area constraint.
Specifically, we have used values of 10, 20, . . . , 90 percent of the total area
required if all the items were to be implemented in hardware. The different rows are
for different problem sizes. For example, for a problem size of 60 and an area
constraint of 30 percent of the total area required if all the 60 parts were to be
implemented in hardware, we have a running time of 0.195 ms (from Table 1).
Size    Fraction of total area used as the constraint
n       0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
30      0.156    0.461    0.180    0.180    0.180    0.102    0.117    0.102    0.117
60      0.203    0.211    0.195    0.195    0.188    0.117    0.195    0.164    0.109
90      0.125    0.195    0.211    0.188    0.203    0.203    0.188    0.203    0.180
300     0.312    0.336    0.375    0.281    0.352    0.383    0.344    0.359    0.289
500     0.555    0.578    0.594    0.547    0.617    0.508    0.594    0.516    0.516
700     0.906    0.906    0.891    0.945    0.906    0.930    0.859    0.852    0.883
1000    1.688    1.852    1.695    1.633    1.625    1.688    1.758    1.680    1.672
2000    8.078    8.008    8.086    8.078    8.133    7.969    7.977    8.000    7.914
3000    22.102   22.172   22.406   22.141   22.688   22.203   22.195   22.281   22.242
Table 1. Running time (ms) of the algorithm for different problem sizes
Table 2 gives a count of the number of subproblems that needed to be solved to
arrive at the optimal solution for the whole problem. For example, for a problem
size of 30 and area constraint of 20 percent of total area required if all the 30 parts
were to be implemented in hardware, the algorithm needed to solve 3 subproblems
to arrive at the optimal solution. The result shows that we need to solve very
few subproblems to arrive at the solution. In fact only for one instance we had to
solve three subproblems to arrive at the solution, while the rest needed only one
subproblem to be solved.
Size    Fraction of total area used as the constraint
n       0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
30      1     3     1     1     1     1     1     1     1
60      1     1     1     1     1     1     1     1     1
90      1     1     1     1     1     1     1     1     1
300     1     1     1     1     1     1     1     1     1
500     1     1     1     1     1     1     1     1     1
700     1     1     1     1     1     1     1     1     1
1000    1     1     1     1     1     1     1     1     1
2000    1     1     1     1     1     1     1     1     1
3000    1     1     1     1     1     1     1     1     1
Table 2. The number of subproblems solved for different problem sizes and area constraints
We also provide a plot of running time of the algorithm for the different problem
sizes (Figure 1). We used the area constraint as 50 percent of the area required if
all parts were to be implemented in hardware. The Y-axis denotes the running time.
[Figure 1: plot of running time against problem size]
Fig. 1. Running time vs. Problem size
From the results, we can see that, for small problem sizes (e.g. 30, 60, 90), the
execution time fluctuates. As the problem size increases, the running time increases,
as expected for the knapsack problem. This can also be seen from
the graph (Figure 1). Also, for the same problem size the execution time is stable
across different area constraints. This is because the number of subproblems solved
is 1 for almost all the cases (see Table 2). Since the same number of subproblems of
the same size is solved in all these cases, the execution time is similar.
6 CONCLUSION
We have proposed an efficient hardware/software partitioning technique based on
the knapsack model. It was shown that by examining the upper and lower bounds
of the subproblems, we could rapidly eliminate a large number of subproblems
that do not contribute to optimal solutions. Our investigations demonstrate that
a substantial reduction in the number of subproblems that require processing is
possible, thereby providing an efficient means of partitioning hardware and
software. This is of particular significance when the problem size is large. Our
simulations, based on problem sizes ranging from 30 to 3000, confirm that the number
of subproblems that must be solved to compute the optimal solution is one, except
for one case where it increased to just 3. We are currently extending
our knapsack model based approach to include communication overheads so as to
represent a more accurate partitioning scheme.
Acknowledgement
A part of the work titled “Knapsack Model and Algorithm for HW/SW Partitioning
Problem” appeared in the 4th International Conference in Computational
Sciences (ICCS), 2004, Lecture Notes in Computer Science 3036, pp. 200–205.
REFERENCES
[1] Ernst, R.—Henkel, J.—Benner, T.: Hardware-Software Cosynthesis for Micro-
controllers. IEEE Design and Test of Computers, 1993, pp. 64–75.
[2] Vahid, F.—Gajski, D. D.—Jong, J.: A Binary-Constraint Search Algorithm for
Minimizing Hardware During Hardware/Software Partitioning. IEEE/ACM Proceed-
ings European Conference on Design Automation (EuroDAC), 1994, pp. 214–219.
[3] Gupta, R. K.—Micheli, G. D.: System-Level Synthesis Using Reprogrammable
Components. Proceedings. [3rd] European Conference on Design Automation, 1992,
pp. 2–7.
[4] Niemann, R.—Marwedel, P.: Hardware/Software Partitioning Using Integer Pro-
gramming. Proceedings European Design and Test Conference, 1996. ED&TC 96,
pp. 473–479.
[5] Henkel, J.—Ernst, R.: An Approach to Automated Hardware/Software Parti-
tioning Using a Flexible Granularity That Is Driven by High-Level Estimation Tech-
nique. Very Large Scale Integration (VLSI) Systems, IEEE Transactions, Vol. 9, 2001,
No. 2, pp. 273–289.
[6] Chatha, K. S.—Vemuri, R.: Hardware-Software Partitioning and Pipelined
Scheduling of Transformative Applications. Very Large Scale Integration (VLSI) Sys-
tems, IEEE Transactions, Vol. 10, 2002, No. 3, pp. 193–208.
[7] Rakhmatov, D. N.—Vrudhula, S. B. K.: Hardware-Software Bipartitioning
for Dynamically Reconfigurable Systems. Hardware/Software Codesign, 2002. Codes
2002. Proceedings of the Tenth International Symposium on, pp. 145–150.
[8] Pisinger, D.: Algorithms for Knapsack Problems. Ph.D. Thesis, 1995, pp. 1–200.
[9] Pisinger, D.: A Minimal Algorithm for the 0-1 Knapsack Problem. Operations
Research, 1997, pp. 758–767.
[10] Jigang, W.—Yunfei, L.—Schroeder, H.: A Minimal Reduction Approach
for the Collapsing Knapsack Problem. Computing and Informatics, Vol. 20, 2001,
pp. 359–369.
Abhijit Ray received his B. Tech (Bachelor of Technology) de-
gree in Electrical Engineering from Regional Engineering Col-
lege, Kurukshetra, India (now National Institute of Technology,
Kurukshetra) in 1998. He is currently pursuing his PhD in Computer
Engineering at Nanyang Technological University (Singa-
pore). His research interests are in processor selection, hard-
ware/software co-design.
Wu Jigang received his B.S. degree in computational mathe-
matics from Lanzhou University (China) in 1983. He received his
PhD in computer software and theory from University of Science
and Technology of China. He was successively an assistant pro-
fessor, lecturer in Lanzhou University from 1983 to 1993. He was
an associate professor in Yantai University (China) from 1993
to 2000. He has been with Nanyang Technological University
(NTU), Singapore, since 2000. He is currently a research fel-
low in Centre for High Performance Embedded Systems, NTU.
Dr. Wu has published more than 70 technical papers. His re-
search interests include hardware/software co-design, reconfigurable computing and pa-
rallel computing.
Thambipillai Srikanthan has been with Nanyang Techno-
logical University (NTU), Singapore, since 1991, where he holds
a joint appointment as Associate Professor and Director of the
Centre for High Performance Embedded Systems. He received
his B.Sc. (Hons) in Computer and Control Systems and PhD
in System Modelling and Information Systems Engineering from
Coventry University, United Kingdom. His research interests
include system integration methodologies, architectural transla-
tions of compute intensive algorithms, high-speed techniques for
image processing and dynamic routing. Dr. Srikanthan has pub-
lished more than 180 technical papers and has served a number of administrative roles
during his academic career. He is the founder and Director of the Centre for High Perfor-
mance Embedded Systems, which is now a University level research centre at NTU. He is
a corporate member of the IEE and a senior member of the IEEE.
