Abstract. This paper presents a hardware/software (HW/SW) codesign framework (TECSCE) which enables software developers to easily design complex embedded systems such as massive data-parallel systems. TECSCE is implemented by integrating TECS and SCE: TECS is a component technology for embedded software, and SCE provides an environment for system-on-a-chip designs. Since TECS is based on standard C language, it allows the developers to start the design process easily and fast. SCE is a rapid design exploration tool capable of efficient MPSoC implementation. TECSCE utilizes all these advantages since it supports transformation from component descriptions and component sources to SpecC specification, and lets the developers decide data partitioning and parallelization at a software component level. Moreover, TECSCE effectively duplicates software components, depending on their degree of data parallelizing, to generate multiple SpecC specification models. An application for creating a panoramic image removing objects, such as people, is illustrated as a case study. The evaluation of the case study demonstrates the effectiveness of the proposed framework.
Introduction
Increasing complexities of embedded system and strict schedules in time-tomarket are critical issues in the today's system-level design. Currently, various embedded systems incorporate multimedia applications, which are required more and more complex functionalities. Meanwhile, the semiconductor technology progress has placed a great amount of hardware resources on one chip, enabling to implement more functionalities as hardware in order to realize efficient systems. This widens design space to be explored and makes system-level designs further complicated -to improve the design productivity, designing systems at a higher abstraction level is necessary [1] .
Hardware/software (HW/SW) codesign of these systems mainly relies on the following challenging issues: (1) data parallelism to improve performance, (2) support for software developers to implement such complicated systems without knowing system-level languages such as SystemC and SpecC, (3) implementation to directly use existing code without modification, and (4) management of communication between functionalities. To the best of our knowledge, there is no work addressing all of the above issues.
This paper presents a system-level framework (TECSCE) to cope with the preceding issues. This framework aims at enabling even software developers to easily design complicated systems such as multimedia applications which are rich in data parallelism. For this, we integrate a component technology for embedded software, TECS (TOPPERS Embedded Component System [2] ), and the systemon-a-chip environment SCE [3] , which is based on SpecC language. Since TECS is based on conventional C language, it allows the developers to start the design process easily and fast. SCE is a rapid design exploration tool capable of efficient MPSoC implementation.
The contribution of this work is to present a system-level design method for software developers to deal with massively parallel embedded systems using TECS. In existing HW/SW codesign technologies, a designer needs to manually add or modify HW/SW communication sources (e.g., their size, direction, and allocator) in input behavioral descriptions, which is complex to specify and errorprone. In contrast, in the proposed framework, the developer can design the overall system at a software component level and has no need to specify the HW/SW communication in the input description because TECS defines the interface between components, and the communication sources are automatically generated. Moreover, a new mechanism of duplicating components realizes data partitioning at the software component level for an effective speedup of the applications.
The rest of this paper is organized as follows. Section 2 explains TECS, SCE, and the overview of the proposed framework. Section 3 depicts a case study of adapting the proposed framework. The evaluation of the case study is shown in Section 4. Related work is described in Section 5. Finally, Section 6 concludes this paper.
TECSCE
In this section, the overviews of TECS, SCE, and a system-level design framework (TECSCE) integrating TECS and SCE are presented.
TECS
In embedded software domains, software component technologies have become popular to improve the productivity [2, 4, 5] . It has many advantages such as increasing reusability, reducing time-to-market, reducing software production cost, and hence, improving productivity [6] .
TECS adopts a static model that statically instantiates and connects components. The attributes of the components and interface sources for connecting the components are statically generated by the interface generator. Furthermore, TECS optimizes the interface sources. Hence, no instantiation overhead is introduced at runtime, and the runtime overhead of the interface code is minimized [7] . Therefore, these attributes of TECS are suitable for system-level designs. Furthermore, in system-level designs, parallelism and pipeline processing should be considered. TECS supports parallelism and pipeline processing on a real-time OS for multi-processors in embedded software [8] . The oneway calling is provided to support the parallelism. It means that a caller component does not need to wait until a callee component finishes executing. At a software level for multiprocessors environment, the parallelism has been already supported in TECS. Therefore, it is possible to adapt the feature for system-level designs.
Component Model in TECS A cell is an instance of component in TECS.
Cells are properly connected in order to develop an appropriate application. A cell has entry port and call port interfaces. The entry port is an interface to provide services (functions) to other cells. Each service of the entry port called the entry function is implemented in C language. The call port is an interface to use the services of other cells. A cell communicates in this environment through these interfaces. To distinguish call ports of caller cells, an entry port array is used. A subscript is utilized to identify the entry port array. A developer decides the size of an entry port array. The entry port and the call port have signatures (sets of services). A signature is the definition of interfaces in a cell. A celltype is the definition of a cell, as well as the Class of an object-oriented language. A cell is an entity of a celltype. Figure 1 shows an example of a component diagram. Each rectangle represents a cell. The dual rectangle depicts a active cell that is the entry point of a program such as a task and an interrupt handler. The left cell is a TaskA cell, and the right cell is a B cell. Here, each of tTask and tB represents the celltype name. The triangle in the B cell depicts an entry port. The connection of the entry port in the cells describes a call port.
Component Description in TECS The description of a component in TECS can be classified into three descriptions: a signature description, a celltype description, and a build description. An example for component descriptions is presented in Section 3 to briefly explain these three descriptions 5 .
SCE
SCE implements a top-down system design flow based on a specify-explore-refine paradigm with support for heterogeneous target platforms consisting of custom hardware components, embedded software processors, dedicated IP blocks, and complex communication bus architectures. The rest of features and design flow is explained in the next subsection. Figure 2 represents the design flow using the proposed framework. The circled numbers in Figure 2 represent the order of design steps.
Overview of TECSCE
-Step1: A framework user (hereafter, a developer) defines signatures (interface definitions) and celltype (component definitions). -Step2: The developer implements celltype source (component source code) in C language. They can use the template code based on signatures and celltype descriptions. -Step3: The developer describes an application structure including definitions of cells (instances of component) and the connection between cells. In this step, the developer decides the degree of data partitioning. If it is possible to use existing source code (i.e., legacy code), the developer can start from Step3. -Step4: The SpecC specification model based on the component description, including definitions of behaviors and channels, is generated by a TECS generator. The specification model is a functional and abstract model that is free of any implementation details. -Step5: The designer can automatically generate system models (Transactionlevel models) based on design decisions (i.e. mapping the behaviors of the specification model onto the allocated PEs). -Step6: The hardware and software parts in the system model are implemented by hardware and software synthesis phases, respectively. SCE supports generating a new model by integrating the design decisions into the previous model.
Case study for proposed framework
In this section, the proposed framework is explained through a case study. First, a target application is described. Then, two kinds of mechanism to generate specification models (Step4 in Figure 2 ) are depicted.
Target Application
The target application named MovingObjectRemoral for a case study of the framework is an application for generating a panoramic image removing objects, such as people. In the panoramic image view system, such as Google Street View, a user can see images from the street using omnidirectional images. Figure  3 illustrates the target application. The application creates the image without people as shown in the right image of Figure 3 based on the algorithm [9] by using a set of panoramic images which are taken at the same position. Since creating an image by removing obstacles needs a number of original images, each of which has too many pixels, the original program is designed only for off-line use. Because the output image depends on the place and environment, we do not know how many source images are needed to create the output image. Therefore, currently, we need enormously long time to take images at each place. Our final goal is to create the output image in real-time by using our framework. Figure 4 shows a TECS component diagram for the target application. Each rectangle represents a cell which is a component in TECS. The left, middle, and right cells are a Reader cell, an MOR (MovingObjectRemoral) cell, and a Writer cell, respectively. The Reader cell reads image files, slices the image, and sends the sliced image data to the MOR cells. The MOR cell collects background colors (RGB) of each pixel based on the input images. The Writer cell creates the final image based on the data collected by the MOR cell. Here, tReader, tMOR, and tWriter represent the celltype name. Figure 5 shows a signature description between tReader and tMOR, and between tMOR and tWriter. The signature description is used to define a set of function heads. A signature name, such as sSliceImage, follows a signature keyword to define the signature. The initial character ("s") of the signature name sSliceImage represents the signature. A set of function heads is enumerated in the body of this keyword. TECS provides the in, out, and inout keywords to distinguish whether a parameter is an input and/or an output. The in keyword is used to transfer data from a caller cell to a callee cell. The oneway keyword means that a caller cell does not need to wait for finishing a callee cell. Namely, the oneway keyword is useful when a caller cell and a callee cell are executed in parallel. Figure 6 describes a celltype description. The celltype description is used to define the entry ports, call ports, attributes, and variables of each celltype. The singleton keyword (Line 1 in Figure 6 ) represents that a singleton celltype is a particular cell, only one of which exists in a system to reduce the overhead. The active keyword (Line 1 in Figure 6 ) represents the entry point of a program such as a task and an interrupt handler. A celltype name, such as tReader, follows a celltype keyword to define celltype. The initial character ("t") of the celltype name tReader represents the celltype. To declare an entry port, an entry keyword is used (Line 6 and 9 in Figure 6 ). Two words follow the entry keyword: a signature name, such as sSliceImage, and an entry port name, such as eSliceImage. The initial character ("e") of the entry port name eSliceImage represents an entry port. Likewise, to declare a call port, a call keyword is used (Line 3 and 10 in Figure 6 ). The initial character ("c") of the call port name cSliceImage represents a call port.
TECS components for the target application
The attr and var keywords that are used to increase the number of different cells are attached to the celltype and are initialized when each cell is created. The set of attributes or variables is enumerated in the body of these keywords. These keywords can be omitted when a celltype does not have an attribute and/or a variable. Figure 7 shows a build description. The build description is used to declare cells and to connect between cells for constructing an application. To declare a cell, the cell keyword is used. Two words follow the cell keyword: a celltype name, such as tReader, and a cell name, such as Reader (Lines 3-5, Lines 7-9, and Lines 10-11 in Figure 7 ). In this case, eSliceImage (entry port name) of MOR 000 (cell name) is connected to cSliceImage (call port name) of Reader (cell name). The signatures of the call port and the entry port must be the same in order to connect the cells. Figure 2 ), the proposed framework realizes data partitioning. A new plugin named cellPlugin is proposed to duplicate cells for data partitioning and connect the cells. There are two types of cellPlugin: RepeatCellPlugin and RepeatJoinPlugin.
cellPlugin At the component level (Step 3 in
RepeatCellPlugin supports duplication of cells depending on the slice count i.e. the number of data partitions. and connection between the call port of the duplicated cells and the entry ports of the connected cell in the original build description (Line 6 in Figure 7 ). RepeatJoinPlugin provides connection between the call port of the duplicated cells generated by RepeatCellPlugin (Line 2 in Figure 7) . Note that it is easy to duplicate MOR cells for realizing data partitioning and parallelization as shown in Figure 4 .
cd2specc
In this subsection, policies of transformation from a component description to a specification model in SpecC language are described. A basic policy of transformation is that a cell and an argument of function of signature correspond to a behavior and a channel in SpecC language, respectively. The tReader, tMOR, and tWriter celltypes correspond to tReader, tMOR, tWriter behaviors generated by cd2specc, respectively. The following pseudo code describes the examples of generated SpecC code. Pseudo Code 1 shows a tMOR behavior. If a behavior has an entry function, the behavior receives parameters to call the entry function. In this case, tMOR behavior receives sliced images by using channels to call entry function (eSliceImage sendBlock) in Pseudo Code 1. Although there are several ways to realize tMOR, here we show in the pseudo code an algorithm to do so easily. This is often used for sorting algorithm based on brightness of each pixel to find the background color for each pixel. In this case, the brighter color depending on the rate value (Line 12 in Figure 6 ) is selected.
Pseudo Code 1 tMOR behavior
A SpecC program starts with execution of the main function of the root behavior which is named Main as shown in Pseudo Code 2. The roles of the main behavior are instantiation of behaviors, initialization of channels, connection of channels between behaviors, and management of execution of the other behaviors.
All behavioral synthesis tools typically do not support all possible C language constructs, such as recursion, dynamic memory allocation. Thus, TECS component source obeying these restrictions can be synthesized. Since recursion and dynamic memory allocation are not usually used for embedded software, these restrictions are not critical. Figure 8 shows a specification model of SpecC language when slice count is two. The model consists of four behaviors and four communication channels. The numbers of channels and MOR instances are depended on the number of slice count.
Evaluation
For the experimental evaluation of the TECSCE design flow, we used the application described in Section 3 to show effectiveness of cellPlugin and cd2specc for improving design productivity.
First, we measured the number of lines of each component description generated by cellPlugin and each SpecC code generated by cd2specc. The values in Figure 9 represent the total number of lines of generated code. When the number of data partitioning is zero, the value shows the lines of common code, e.g., def- initions of channel types, template code of behaviors, and implementation code based on entry functions. As can be seen from Figure 9 , the lines of the code proportionally grow to slice count. In TECSCE, the developers only change the parameter for slice count in order to manage the data partitioning. The results indicate that the communication code between behaviors have a significant impact on productivity. Therefore, it can be concluded that cellPlugin and cd2specc are useful, particularly for large slice count. Next, we evaluated four algorithms to realize the MOR: Bubble, Insert, Average, and Bucket. Bubble is a basic algorithm for MOR based on a bubble sort to decide the background color. Insert is based on an insertion sort. Average is assumed that the background color is the average color value. Bucket is based on a bucket sort.
Each MOR behavior was mapped onto different cores based on ARM7TDMI (100MHz). The execution time of processing 50 images with 128x128 pixels on every core is measured when slice count was eight. An ISS (Instruction Set Simulator) supported by SCE was used to measure the cycle counts for estimation of the execution time. Table 1 shows the results of execution time for each core when slice count is eight. These results indicate that the generated SpecC descriptions are accurately simulatable.
All of the series of images are not necessary to collect the background color for the target application because the series of images are almost the same. Therefore, if a few input images can be obtained per second, it is enough to generate the output image. In our experiments, two images per second were enough to generate an output one. It is possible to use this application in realtime when each input image with 256x512 is used on this configuration (eight cores, ARM 100MHz, and Bucket algorithm). If the developers want to deal with bigger images in real-time, there are several options: to use higher clock frequency, to increase the number of data partitioning, to use hardware IPs, and so forth.
Related Work
HW/SW codesign frameworks have been studied for more than a decade.
Daedalus [10] framework supports a codesign for multimedia systems. It starts from a sequential program in C, and converts the sequential program into a parallel KPN (Kahn Process Network) specification through a KPNgen tool.
SystemBuilder [11] is a codsign tool which automatically synthesizes target implementation of a system from a functional description. It starts with system specification in C language, in which a designer manually specifies the system functionalities as a set of concurrent processes communicating with each other through channels.
SystemCoDesigner [12] supports a fast design space exploration and rapid prototyping of behavioral SystemC models by using an actor-oriented approach.
The system-on-chip environment (SCE) [3] is based on the influential SpecC language and methodology. SCE implements a top-down system design flow based on a specify-explore-refine paradigm with support for heterogeneous target platforms consisting of custom hardware components, embedded software processors, dedicated IP blocks, and complex communication bus architectures.
System-level designs based UML [13, 14] are proposed to improve the productivity. One [13] is for exploring partial and dynamic reconfiguration of modern FPGAs. The other [14] is for closing the gap between UML-based modeling and SystemC-based simulation.
To the best of our knowledge, there is no work addressing all of the issues mentioned in Section 1. TECSCE solve all of the issues since cd2specc, which a part of TECSCE, makes the overall system at a software component level in order to hide the many implementation details such as communication between functionalities. The framework users do not need to specify the HW/SW communication in the input description because the communication sources are automatically generated from component descriptions TECS specifically defines the interface between components. Therefore, TECSCE realizes that existing code can be used without modification and without knowing system-level languages such as SystemC and SpecC. Moreover, cellPlugin, which is a part of TECSCE, supports that duplication of components realizes data partitioning at a component level for an effective speedup of the applications.
Conclusions
This paper proposed a new codesign framework integrating TECS and SCE, which enables software developers to deal with massive parallel computing for multimedia embedded systems. The advantage of our framework is that developers can directly exploit software components for system-level design without modifying input C sources (component sources). Moreover, since TECS supports data partitioning and SCE supports MPSoCs as target architectures, our framework can deal with more complex applications (such as MOR) and can help parallelize them for efficient implementation. The evaluation demonstrated the effectiveness of the proposed framework including cellPlugin and cd2specc and the capability of operating the MOR application in real-time. Furthermore, almost all multimedia applications can be adapted to the same model of our framework. cellPlugin and cd2specc are open-source software, and will be avaibale to download from the website at [15] .
