Abstract. The complexity of hardware/software codesign of embedded real-time signal processing systems can be reduced by rapid system prototyping (RSP). However, existing RSP frameworks do not provide a sound specification and design methodology (SDM): they require the designer to choose the implementation target before specification and design exploration, and they do not work together coherently across development stages. This paper presents a new SDM, called MAGIC, that allows the designer to capture an executable specification model and use it in design exploration to find the optimal multiprocessor technology before committing to that technology. MAGIC uses a technique called "virtual benchmarking" for early validation of promising architectures. The MAGIC SDM also exploits emerging open-standard computation and communication middleware to establish model continuity between RSP frameworks. The methodology has been validated through the specification and design of a moderately complex system representative of the signal processing domain: the RASSP Synthetic Aperture Radar benchmark. In this case study, MAGIC achieves a speedup of three orders of magnitude over existing virtual prototyping approaches and demonstrates the ability to evaluate competitive technologies prior to implementation. Transfer of this methodology to the system-on-a-chip domain using Cadence's Virtual Component Codesign infrastructure is also discussed, with promising results.
1. Lack of model continuity. Existing frameworks tend to support only one phase of the development process, and they do not work together coherently; that is, the output of a framework used in one phase cannot readily be consumed by a different framework used in the next phase. This lack of coherence usually leads to errors that are not caught until well into the implementation phase. Since the cost of redesign increases as the design moves through the development stages, redesign is most expensive when performed in the implementation phase, which makes the current incoherent methodology costly [1]. Designs targeting COTS MP technologies can be improved by providing a coherent coupling between these frameworks, a quality known as "model continuity."
2. Premature commitment to implementation technology before design exploration. In the codesign of embedded MP systems, many design decisions have to be made, including the choice of hardware technology, distribution topology, software algorithms and processes, and software-to-hardware mapping. Current RSP frameworks require the designer to prematurely choose the hardware implementation target (i.e., COTS multiprocessor hardware technology and software process distribution topology) before exploring the full design space of hardware and software options. Thus, competitive hardware technologies are not evaluated as part of the early design space exploration. Instead, one technology and topology are chosen prematurely, and their effects are not evaluated until the implementation phase, when it is too late and too costly to reconsider other options. As a result, important constraints are neglected until late in the design process.
We have developed a new SDM known as MAGIC that allows the designer to capture the specification in an executable model that can then be used in design exploration to find the optimal COTS MP technology before committing to that technology. The design exploration uses "virtual benchmarking," based on performance modeling, to determine the optimal software/hardware configuration for a given technology. Performance modeling is the simulation of a processor network that accounts for interprocessor communication (IPC) delays in addition to data movement overheads and computation delays. The data movement and computation delays are deterministic and are determined through experimentation, i.e., benchmarking. The IPC delays depend on the processor topology and architecture and can be simulated at the hardware level using pre-existing hardware description models (e.g., VHDL¹) of well-defined, open-standard IPC connections. Consequently, candidate architectures can be virtually benchmarked with a high degree of confidence, because the evaluation combines measured experimental data with well-defined IPC models. This provides early validation of promising architectures and the ability to determine the optimal architecture with confidence. After the optimal architecture is determined, the RSP frameworks can consume the design model to generate deployable code, configuration files, and runtime scripts.
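To make the delay decomposition concrete, the sketch below combines deterministic, benchmark-derived stage delays with an IPC term for a simple two-stage pipeline and reports end-to-end latency and pipeline period. It is a static, back-of-the-envelope model under stated assumptions (each stage on its own node, the sender blocked until its transfer completes, an idealized latency-plus-bandwidth link), not the simulation-based performance modeling MAGIC performs; `Stage`, `ipc_delay_us`, `evaluate`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage mapped to its own compute node (hypothetical model)."""
    name: str
    compute_us: float   # benchmarked computation delay (deterministic)
    overhead_us: float  # benchmarked data-movement overhead (deterministic)
    out_bytes: int      # bytes forwarded to the next stage


def ipc_delay_us(nbytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Idealized link model: fixed latency plus transfer time.

    1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) yields microseconds.
    """
    return latency_us + nbytes / bandwidth_mb_s


def evaluate(stages, latency_us: float, bandwidth_mb_s: float):
    """Return (end-to-end latency, pipeline period), both in microseconds.

    The period is the slowest stage; a new input frame can be accepted
    at most once per period.
    """
    total = 0.0
    period = 0.0
    for i, s in enumerate(stages):
        t = s.compute_us + s.overhead_us
        if i < len(stages) - 1:  # the last stage has no downstream transfer
            t += ipc_delay_us(s.out_bytes, latency_us, bandwidth_mb_s)
        total += t
        period = max(period, t)
    return total, period


stages = [Stage("range", 800.0, 50.0, 400_000), Stage("azimuth", 900.0, 50.0, 0)]
print(evaluate(stages, latency_us=10.0, bandwidth_mb_s=160.0))  # (4310.0, 3360.0)
```

Even in this toy case the IPC term (2510 us) dominates the first stage, which is the kind of effect that compute benchmarks alone would not reveal.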
The MAGIC SDM possesses strong model continuity (the ability to pass an executable model between the specification, design, and implementation phases), enabled in large measure by emerging open-standard middleware: the Vector Signal Image Processing Library (VSIPL) computation middleware and the Message Passing Interface (MPI) communications middleware. Middleware here refers to an application programming interface (API) that is supported by multiple vendors with libraries of components for computation and communication.
This paper is organized as follows: Section 2 gives background on the application domain and supporting hardware technologies. Section 3 follows with a design scenario that illustrates the state of the practice of specification and design in this domain and highlights the need for model continuity and early validation through virtual benchmarking. Section 4 describes the MAGIC methodology, and Section 5 discusses a case study in which MAGIC is applied to a representative signal processing benchmark to validate the methodology. This case study demonstrates MAGIC's support for model continuity, its efficiency in making early exploration of competitive technologies viable, and its ability to incorporate important technology constraints into early stages of development to search the space of potential architectures more effectively. Ongoing work on transferring this research to the system-on-a-chip (SoC) domain, with promising results, is described in Section 6, and related work is discussed in Section 7.
APPLICATION AND TECHNOLOGY DOMAIN
Codesign methodology is driven and constrained by specific application and technology domains. Domain specificity is critical because the performance and nonperformance constraints determine what the hardware options will be, while the control and data flow determine which models of computation [2], [3], [4] are appropriate. This in turn determines which languages and tools can be brought to bear [2], [3], [4], [5]. Our methodology addresses the application space of real-time embedded signal processing, along with the technologies required to implement such applications. This includes radar and sonar signal processing, medical imaging, and other applications characterized by high-bandwidth signal input, on the order of tens to hundreds of megabytes per second, and high-throughput computation requirements, on the order of tens to hundreds of gigaflops. After system initialization, these systems usually have a single data transformational state, but some systems are modal, requiring multiple reactive states. The use of COTS technology in our application domain means using a PCI or VME chassis containing multiprocessor boards with high interprocessor bandwidth and C language support. Table 1 shows the leading technologies used for implementing these systems.
THE NEED FOR MODEL CONTINUITY AND VIRTUAL BENCHMARKING
The lack of model continuity has a variety of negative impacts, as can be seen in the following scenario. Suppose a signal processor is to be implemented with COTS MP hardware and software. Algorithms are developed, modeled, and specified in some pseudocode, perhaps in MATLAB. System constraints (e.g., size, weight, and power) for the processor are specified by a system engineer. In this traditional methodology, there is no way to use this pseudocode in the design analysis phase. The specification is partially executable; the behavioral part of the specification (signal processing algorithms) is typically written in MATLAB, which is executable. However, this partial specification model cannot be used in the design analysis phase (which means there is no model continuity). Due to the absence of system-level design and analysis tools that can provide a model of the overall system behavior and especially interconnectivity constraints, only a few local, low-precision calculations can be made, which are based on published specifications of competing vendors' interprocessor bandwidths and algorithm benchmarks, and adjusted based on experience. A vendor is chosen and design begins. Even though margins have been included via some heuristic rules of thumb, it is only when the detailed design is complete that the designer realizes that the architecture chosen cannot meet system throughput requirements. Unfortunately, in this scenario, the allotted chassis space is already full with a technology that will not meet specifications because the low-precision analysis was unable to predict complex interconnectivity between compute elements used to implement the dataflow model. Despite the project schedule being at great risk, the design process must start over.
If, on the other hand, the requirements model were executable and passed along with nonperformance constraints to a design tool framework, the system engineer would have been equipped to consider and evaluate alternative architectures and implementation technologies before implementation proceeded. The system engineer could have made certain that the alternatives would at least satisfy requirements and could then have achieved a near-optimal design solution. This could have been accomplished before prematurely committing the design to a particular technology that could not satisfy requirements and constraints, a failure that went unnoticed because the complexity of the design hid subtle technology limitations (such as the latencies resulting from interprocessor communication, which are difficult to predict a priori). Instead, the system engineer would be able to specify the technology, software processes, hardware configuration, and a software-to-hardware mapping.
This unnecessary and costly redesign could have been avoided if model continuity and the appropriate integration of tools had been present in the engineers' specification and design methodology. Important system information revealed by an executable specification, such as interprocessor communication and nonperformance constraints, would not have been lost in the design phase. Similarly, important information revealed in the design phase, such as the software-to-hardware mapping as well as software functions and parameter arguments, would have been leveraged in the implementation phase. The necessary flow of information is illustrated in Fig. 1.
Unfortunately, the ill-fated design scenario just described is not uncommon. In recent years, both market forces and technological requirements have been driving the design process to limit hardware options to COTS hardware. In the radar signal processing domain, this means using RISC²-based and DSP³-based multiprocessor boards with high interprocessor bandwidth and C language support. Despite limiting the design space to a finite number of hardware options, the design process has remained challenging. Compressed development cycles raise software productivity requirements and implicitly require that software be portable, so that previous design and development efforts can be reused. This productivity and portability must be achieved without an appreciable loss of system performance.
RSP Frameworks
A partial response to the design challenge posed by real-time multiprocessor digital signal processing systems has been the development of various graphically based frameworks, i.e., integrated sets of tools, such as GEDAE,⁴ RIPPEN,⁵ PGM ACT,⁶ and PeakWare for RACE⁷ [4], that provide computer-aided system engineering (CASE) support for system implementation [5], [6], [7]. In particular, these frameworks offer code generation that reduces the complexity of system configuration and communication coding, a quality known as "complexity control." Yet no single framework or language can cover the entire design process. These RSP frameworks are effective implementation tools, since they can generate deployable application code, but they are weak in capturing requirements and difficult to use for exploring architectural design alternatives. Some languages, such as MATLAB, are clear, concise, and unambiguous in capturing computational requirements but are difficult to convert into distributed, parallelized, real-time C code.
Ideal COTS MP SDM Flow
Ideally, the information flow for specification and design would occur as shown in Fig. 1. In this ideal SDM, the executable requirements specification is a representation of the desired system behavior that will execute on, and can be manipulated by, a computer. It is passed into the design analysis phase, which produces the design model in the form of an executable design specification: a complete set of software processes, a hardware configuration description, and a software-to-hardware map. This is then passed into the implementation phase, where the physical implementation occurs.
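The three parts of an executable design specification can be pictured as a small data structure. The sketch below is a hypothetical illustration, not the format of any RSP framework: `HardwareConfig`, `DesignSpec`, the `board.cnN` node naming, and the `launch ... on ...` line format are all invented for the example. It checks that every software process is mapped onto a node that exists in the hardware configuration before emitting launch lines.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HardwareConfig:
    """Hypothetical hardware configuration: board name -> number of compute nodes."""
    boards: dict[str, int]

    def nodes(self) -> list[str]:
        return [f"{b}.cn{i}" for b, n in sorted(self.boards.items()) for i in range(n)]


@dataclass
class DesignSpec:
    """Software processes, hardware configuration, and software-to-hardware map."""
    hw: HardwareConfig
    processes: list[str]
    mapping: dict[str, str] = field(default_factory=dict)  # process -> node

    def validate(self) -> None:
        """Reject unmapped processes and mappings onto nonexistent nodes."""
        known = set(self.hw.nodes())
        unmapped = [p for p in self.processes if p not in self.mapping]
        bad_nodes = [p for p, n in self.mapping.items() if n not in known]
        if unmapped or bad_nodes:
            raise ValueError(f"unmapped={unmapped}, unknown node for={bad_nodes}")

    def launch_lines(self) -> list[str]:
        """Emit one illustrative launch line per process, in process order."""
        self.validate()
        return [f"launch {p} on {self.mapping[p]}" for p in self.processes]


spec = DesignSpec(
    HardwareConfig({"board0": 2}),
    ["range", "azimuth"],
    {"range": "board0.cn0", "azimuth": "board0.cn1"},
)
print(spec.launch_lines())
```

Keeping the mapping as data rather than burying it in source files is what lets a later phase consume it unchanged.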
Current COTS MP SDM Flow
Unfortunately, the current state of practice in this domain (illustrated in Fig. 2) does not support the flow of information that is needed for model continuity. While the different frameworks noted in Section 3.1 have evolved to support the design and rapid system prototyping of signal processing systems using COTS MP technology, they are fundamentally sophisticated software development environments (SDEs) providing a framework for the codesign of a COTS MP-based system. That is, they primarily support the detailed stages of design by providing a graphical environment in which to capture the implementation details of software process coding, hardware configuration description, and scripts that launch the specific processes onto their respective hardware nodes. In particular, codesign in this application and technology domain consists of the following:
• Board-level hardware description capture,
• Software process and function description capture,
• Software-to-hardware mapping,
• Generation of code, makefiles, and runtime scripts.
Consequently, despite having these RSP frameworks to support codesign, specification and design still suffer from model discontinuity, as depicted in Fig. 2.
² RISC: Reduced Instruction Set Computing architecture, e.g., the superscalar PowerPC from the Motorola/IBM/Apple consortium.
³ DSP: Digital Signal Processor, e.g., microprocessors tuned for 1D signal processing numeric computation, such as the SHARC (Super Harvard Architecture) from Analog Devices.
⁴ GEDAE: Graphical Entry, Distributed Application Environment from Lockheed Martin ATL.
⁵ RIPPEN: Real-time Interactive Programming and Processing Environment from ORINCON.
⁶ PGM ACT: Autocoding Toolset (ACT) using the Processing Graph Method (PGM).
⁷ PeakWare for RACE: codesign framework from Mercury Computer Systems that is layered on "Talaris," their configuration middleware.
THE MAGIC SDM
We have developed design rules and integrated commercial prototyping tools in a methodologically unique way that provides a coherent coupling between these frameworks, allowing a correct requirements model to be passed directly into a design analysis environment. Early validation of candidate implementation architectures can be accomplished using virtual benchmarking, i.e., performance modeling that relies on well-defined hardware models and benchmark data for high-fidelity hardware-software codesign simulations. The new SDM that we have developed is the MAGIC SDM, which stands for the "methodology applying generation, integration, and continuity" [8], [9]. The origin of this name will become clear as we discuss how the MAGIC SDM is used.
Any SDM starts with a natural language text requirements specification document. The goal of an SDM is to go from this inexact document to a design and implementation in a manner that minimizes the propagation of specification and/or design errors. We do this with a unique integration of tools, guided by sound rules, to capture the requirements and then proceed through a vendor-independent design phase. Without first committing to a vendor, alternative architectures can be considered and an optimal one chosen. We then take code generated in the specification phase and the software-to-hardware mapping determined in the design phase as inputs to an implementation framework.
The starting point for specification and design in our domain is the set of computation requirements. These are algorithms and data "specified" by MATLAB code, including different scenarios of inputs and their associated outputs ("test vectors"). The MATLAB code serves well as input to a framework that can use it to create an executable specification. The scenarios provide valuable inputs for generating test data to be used downstream in the implementation phase. Communication and control requirements typically refer to data I/O rates as well as the signal processor modes and the control signals that determine the processor's mode (state). Processors in our domain have few states, often only two: one state for initialization and setup ("outer loop") and one for steady-state data transformation ("inner loop"). These modes must be defined and described, preferably in an executable model. Constraints include SWaP (size, weight, and power), latencies, reliability, and other "ilities," which are usually tabulated. It would be useful to have these data encapsulated in a way that allows their verification to be included in the specification and design iterations. We now redraw the ideal SDM of Fig. 1 as our new SDM, shown as a conceptual diagram of the specification and design flow in Fig. 3.
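The two-mode control requirement described above lends itself to a small executable model. The following is a minimal sketch, assuming two hypothetical control signals (`setup_done` and `reconfigure`); a real requirements document would define its own signals, and the paper captures such behavior in Simulink rather than Python.

```python
from enum import Enum


class Mode(Enum):
    OUTER = "initialization and setup"          # outer loop
    INNER = "steady-state data transformation"  # inner loop


# Hypothetical control signals; a real specification defines its own.
TRANSITIONS = {
    (Mode.OUTER, "setup_done"): Mode.INNER,
    (Mode.INNER, "reconfigure"): Mode.OUTER,
}


def step(mode: Mode, signal: str) -> Mode:
    """Next mode for a control signal; unrecognized signals leave the mode unchanged."""
    return TRANSITIONS.get((mode, signal), mode)


def run(signals, mode: Mode = Mode.OUTER) -> list:
    """Replay a sequence of control signals and return the mode trace."""
    trace = [mode]
    for s in signals:
        mode = step(mode, s)
        trace.append(mode)
    return trace


print([m.name for m in run(["setup_done", "frame", "reconfigure"])])
# ['OUTER', 'INNER', 'INNER', 'OUTER']
```

Replaying recorded control sequences against such a model is one way the scenarios mentioned above could double as test data.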
Our methodology uses what we have termed "virtual benchmarking." The requirements specification is converted to an executable model, which is translated into computation and communication middleware (industry-standard APIs) that is well characterized on different COTS MP vendors' targets. Computation and communication middleware benchmark data are then used in a performance modeling framework that can provide high-fidelity simulation of the nondeterministic MP data traffic during an architectural trade-off evaluation phase; this is what we refer to as the virtual benchmarking of an application for a given configuration. Thus, before committing to a particular COTS MP target, there is a high degree of confidence that the architecture can meet requirements. After arriving at a satisficing architecture, the computation and communication middleware can be consumed by an implementation framework in developing the application software for the detailed design.
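Why simulation rather than a single static estimate? When several nodes share a link, the IPC delay any one message sees depends on what else is in flight. The sketch below is a deliberately simple stand-in for that effect: first-come, first-served queuing on one shared link with randomly jittered send times. It is not eArchitect's hardware-level model; `simulate_shared_link`, `worst_ipc_delay`, and all parameters are hypothetical.

```python
import random


def simulate_shared_link(messages, latency_us, bandwidth_mb_s):
    """FIFO service of (ready_time_us, nbytes) messages on one shared link.

    Returns delivery times in ready-time order. Each message occupies the link
    for latency + size / bandwidth (1 MB/s == 1 byte/us).
    """
    link_free = 0.0
    finished = []
    for ready, nbytes in sorted(messages):
        start = max(ready, link_free)  # wait if the link is still busy
        link_free = start + latency_us + nbytes / bandwidth_mb_s
        finished.append(link_free)
    return finished


def worst_ipc_delay(n_senders, nbytes, jitter_us, latency_us, bandwidth_mb_s,
                    trials=1000, seed=0):
    """Mean and maximum of the per-trial worst IPC delay under random send jitter."""
    rng = random.Random(seed)
    worst = []
    for _ in range(trials):
        msgs = sorted((rng.uniform(0.0, jitter_us), nbytes) for _ in range(n_senders))
        done = simulate_shared_link(msgs, latency_us, bandwidth_mb_s)
        worst.append(max(d - r for d, (r, _) in zip(done, msgs)))
    return sum(worst) / trials, max(worst)


print(worst_ipc_delay(1, 1000, 0.0, 10.0, 100.0))  # (20.0, 20.0)
print(worst_ipc_delay(4, 1000, 0.0, 10.0, 100.0))  # (80.0, 80.0)
print(worst_ipc_delay(4, 1000, 50.0, 10.0, 100.0))
```

A single sender always sees 20 us, but four simultaneous senders push the worst case to 80 us, and jitter spreads the result between those bounds; a per-message bandwidth figure alone cannot capture this.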
A specification and design methodology comprises "tools and rules," so we now lay out the tool frameworks (Section 4.1) and middleware (Section 4.2) used to implement the MAGIC SDM, and then delineate the rules (Section 4.3).
MAGIC SDM Tools
We have chosen the following frameworks to integrate into the MAGIC SDM. We did not choose them because they are perfect; almost all frameworks targeting complex systems are more accurately described not as "frameworks" but as "frameworks-in-progress." We have chosen the frameworks described in this section because they are well matched to our application and technology domain, as well as stable commercial products.
For requirements capture and modeling we have chosen the DSP Workshop from The MathWorks [10] , [11] as well as Excel and Excel Link [12] (also from The MathWorks). For design exploration we chose eArchitect [13] from Innoveda (formerly, Viewlogic), which provides a performance modeling framework with the necessary COTS MP component models. Characterization and features driving the selection of The MathWorks and Innoveda frameworks are given in the following paragraphs where we discuss these frameworks.
The MathWorks DSP Workshop
MATLAB is the de facto lingua franca of algorithm developers, including radar signal processing system analysts. Simulink is a system modeling framework strongly tied to MATLAB, allowing MATLAB expressions to be used explicitly in Simulink blocks. Supplemented by the DSP Blockset, Simulink has become a viable rapid prototyping environment for DSP applications. The Real-Time Workshop (RTW) [11] is the C code generation facility complementing Simulink. It is especially useful because The MathWorks is moving toward adding VSIPL computational middleware support to RTW, an effort we are supporting.
Excel and MATLAB Excel Link
Excel provides the framework needed for requirements tabulation and analysis. The Excel spreadsheet is a commodity productivity application and, in the same way that many analysis and design frameworks provide support for MATLAB, tabular data-oriented frameworks provide support for Excel. Excel Link is a facility rather than a framework: it allows Excel to copy data into MATLAB and operate on it there while the user remains in Excel. Many requirements are easily captured in a spreadsheet, and, depending on the sophistication of the computation required to iterate between requirements modeling and design analysis, either Excel or MATLAB may be required. Excel Link allows the specifier to remain in a single framework.
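The spreadsheet role described here, tabulated requirements that can be checked automatically against a candidate design, can be sketched in a few lines. This is an illustration under assumptions: the rows, limits, and metric names are hypothetical and are not the SAR benchmark requirements, and the paper itself uses Excel with Excel Link rather than Python.

```python
import operator

# Hypothetical requirement rows (name, comparator, limit, units); illustrative
# values only, not the SAR benchmark requirements.
REQUIREMENTS = [
    ("latency", operator.le, 3.0, "s"),
    ("power", operator.le, 500.0, "W"),
    ("chassis_slots", operator.le, 6, "slots"),
]


def check(metrics: dict) -> dict:
    """Return {requirement: (value, passed)}; a missing metric counts as a failure."""
    report = {}
    for name, cmp, limit, _units in REQUIREMENTS:
        value = metrics.get(name)
        report[name] = (value, value is not None and cmp(value, limit))
    return report


def satisfies(metrics: dict) -> bool:
    """True only if every tabulated requirement passes."""
    return all(ok for _value, ok in check(metrics).values())


print(satisfies({"latency": 2.5, "power": 420.0, "chassis_slots": 6}))  # True
print(check({"latency": 3.4, "power": 420.0}))
```

Treating a missing metric as a failure is a deliberate choice: a candidate that has not been evaluated against a constraint should not pass it by default.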
eArchitect
Performance modeling was chosen for design exploration and analysis since it supports architectural trade-off analysis without prematurely committing to a given vendor's hardware and software. The COTS performance modeling framework that is best matched to our domain will provide support for the technologies most likely to be used for implementing the signal processor. There are few embedded multiprocessing performance modeling frameworks available commercially. Innoveda's eArchitect is the only one we are aware of that supports VME and at least two of the COTS MP interconnection technologies (RACEway and Myrinet).
Early Validation via Virtual Benchmarking with Tokens Valuated with Deterministic Middleware
Model continuity is achieved in large part through the use of middleware for computation and communication, where middleware in this context refers to a standard API to component libraries supported by multiple library providers. Open-standard middleware supports computation and communication software portability: code written to the middleware API for one vendor's hardware should run on another vendor's platform. Consequently, the middleware code that constitutes the inner-loop software implementation can be used on different vendors' platforms for design analysis using performance modeling. In MAGIC, we chose to use the Vector Signal Image Processing Library (VSIPL) computation middleware and the Message Passing Interface (MPI) communications middleware.
VSIPL. VSIPL is an API supporting portability for COTS users of real-time embedded multicomputers that has been produced by a national forum of government, academia, and industry participants [14]. The VSIPL API standard provides hundreds of functions to support computation on scalars, vectors, or dense rectangular arrays, including linear algebra functions and signal processing computations, such as filtering, correlation, FFTs, and convolution. VSIPL is computational middleware that also supports interoperability with interprocessor communication (IPC) middleware such as MPI and MPI/RT (real-time MPI, i.e., MPI with a guaranteed quality of service). Commercial implementations have recently become available. Earnest consideration by various defense programs as well as other commercial projects is underway, and early adoption has begun, both domestically and abroad.
MPI. Message passing is a powerful and very general method of expressing parallelism and can be used to create extremely efficient parallel applications. The Message Passing Interface (MPI) standard specifies an API for common message-passing functions, such as point-to-point communication and collective operations. High-performance implementations of MPI are now available, including implementations for COTS MP platforms. The leading vendor is MPI Software Technology, Inc. (MSTI), which provides high-performance implementations of MPI under the commercial trademark MPI/Pro, including implementations for two of the three leading COTS MP interconnect technologies in our technology space (RACEway and Myrinet). Another standards effort is underway to specify a real-time version of MPI with guaranteed quality of service (QoS), called MPI/RT [15]. At least one commercial implementation of MPI/RT has appeared since completion of version 1.1 of this specification [16].
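Point-to-point message benchmarks are often summarized with a latency-bandwidth ("alpha-beta") model, t = alpha + n / bandwidth. The sketch below fits that model by ordinary least squares; it is a swapped-in, common simplification offered for illustration, not the paper's method, which feeds measured benchmark values directly into the workbook. The function names and the synthetic data points are hypothetical.

```python
def fit_latency_bandwidth(sizes_bytes, times_us):
    """Least-squares fit of t = alpha + n / bandwidth to message benchmarks.

    Returns (alpha_us, bandwidth_mb_s); 1 byte/us == 1 MB/s.
    """
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_y = sum(times_us) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_bytes, times_us))
    slope = sxy / sxx            # microseconds per byte
    alpha = mean_y - slope * mean_x
    return alpha, 1.0 / slope


def predict_us(nbytes, alpha_us, bandwidth_mb_s):
    """Predicted one-way transfer time for a message of nbytes."""
    return alpha_us + nbytes / bandwidth_mb_s


# Synthetic benchmark points generated from alpha = 20 us, 100 MB/s.
sizes = [0, 1_000, 10_000, 100_000]
times = [20.0, 30.0, 120.0, 1020.0]
alpha, bw = fit_latency_bandwidth(sizes, times)
print(round(alpha, 6), round(bw, 6))  # 20.0 100.0
```

Real benchmark curves often change slope where an implementation switches protocols for large messages, so a single line is at best a coarse summary; interpolating the measured table avoids that loss of fidelity.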
Critical to making middleware a strong thread of model continuity is the automatic generation of middleware code: when software is generated automatically by a framework from a specification that is known to be correct, the chance of error in design and implementation is reduced. Simulink's RTW can generate middleware for computation using VSIPL, MPI for communication, and/or MPI/RT for communication and control. The C code that RTW produces can be used for both design and implementation. The generated middleware is then used to quantify process delays in the performance modeling framework and as the core of the signal processing application software in the implementation.
Our reasons for choosing VSIPL and MPI are very similar to our reasons for choosing the frameworks discussed above. They are stated here in order of importance with the most important reason stated first:
• Acceptable performance: These middlewares deliver high performance because they are tightly integrated with the vendors' computation and communication libraries.
• Standards-based: Since all the COTS MP vendors in our domain support these middlewares and actively participate in their standardization processes, frameworks that generate VSIPL and MPI code will be consumable by all of the hardware vendors' SDEs in the design phase.
• COTS: They are becoming commercially available and are therefore stable and supported.
To generate the steady-state, inner-loop, middleware-based C code from Simulink, DSP Blockset blocks are translated into VSIPL and MPI function calls, with the arguments determined by the parameters contained in the Simulink blocks. Basically, Simulink "boxes" are transformed into VSIPL computation function calls, while the "arrows" are transformed into MPI communication function calls.
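The boxes-to-VSIPL, arrows-to-MPI translation can be sketched as a tiny code emitter. This is a hypothetical illustration, not RTW's generator: the block-type table, buffer naming, and one-input-per-block restriction are assumptions. The emitted text uses real VSIPL (`vsip_vmul_f`, `vsip_vadd_f`) and MPI (`MPI_Send`, `MPI_Recv`) names, but it is schematic: one identifier stands for both a VSIPL view and its bound user-data array, which a real generator would have to keep separate.

```python
# Hypothetical block-type -> VSIPL call templates (schematic arguments).
VSIPL_TEMPLATES = {
    "Product": "vsip_vmul_f({i}, w_{b}, {o});",
    "Sum": "vsip_vadd_f({i}, bias_{b}, {o});",
}


def generate(blocks, arrows):
    """Translate a block diagram into per-rank C statements.

    blocks: {name: (block_type, rank)} in dataflow (topological) order.
    arrows: [(src, dst, count)]; each block is assumed to have at most one input.
    Boxes become VSIPL calls; arrows that cross ranks become MPI_Send/MPI_Recv pairs,
    tagged with the arrow's index so each send matches exactly one receive.
    """
    code = {rank: [] for _btype, rank in blocks.values()}
    for b, (btype, rank) in blocks.items():
        in_buf = f"in_{b}"
        for tag, (src, dst, count) in enumerate(arrows):
            if dst != b:
                continue
            src_rank = blocks[src][1]
            if src_rank == rank:
                in_buf = f"out_{src}"  # same node: read the upstream buffer directly
            else:
                code[rank].append(
                    f"MPI_Recv({in_buf}, {count}, MPI_FLOAT, {src_rank}, {tag}, "
                    "MPI_COMM_WORLD, MPI_STATUS_IGNORE);")
        code[rank].append(VSIPL_TEMPLATES[btype].format(b=b, i=in_buf, o=f"out_{b}"))
        for tag, (src, dst, count) in enumerate(arrows):
            dst_rank = blocks[dst][1]
            if src == b and dst_rank != rank:
                code[rank].append(
                    f"MPI_Send(out_{b}, {count}, MPI_FLOAT, {dst_rank}, {tag}, MPI_COMM_WORLD);")
    return code


out = generate({"scale": ("Product", 0), "offset": ("Sum", 1)}, [("scale", "offset", 1024)])
for rank, stmts in out.items():
    print(rank, stmts)
```

Moving a block to another rank changes only the emitted MPI statements, not the VSIPL calls, which mirrors why the same computation code can be reused across candidate mappings.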
The integration of the frameworks and the flow of information are shown in Fig. 4 . Note how constraints are checked in both the Executable Design Specification and during Design Analysis through the use of the Executable Workbook. Also, note how the computation and communication middleware code generated by Simulink is used to provide functions and architecture parameters that serve as indexes in the Executable Workbook that then provides token delays in the eArchitect performance modeler. The performance modeling simulations provide early validation of which architectures (software/hardware configurations on given technologies) satisfy performance and nonperformance constraints by checking both constraints against the requirements embodied in the Executable Workbook. These simulations have a high fidelity because experimentally determined benchmarks are used for valuating the tokens. The token delays in a compute node are the sum of data input and output overheads (MPI functions) in addition to the computation delay (VSIPL functions).
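The token-delay rule stated above (MPI input overhead plus VSIPL computation delay plus MPI output overhead) amounts to a few table lookups. The sketch below assumes hypothetical benchmark rows, not the values of Table 2 or Table 3, uses the same MPI table for input and output, and interpolates linearly between benchmarked sizes; the paper does not say whether the workbook interpolates.

```python
from bisect import bisect_left

# Hypothetical benchmark rows (NOT the values of Table 2 or Table 3).
MPI_US = [(1_024, 30.0), (65_536, 450.0)]          # message bytes -> microseconds
VSIPL_US = {"fft": [(256, 20.0), (4_096, 400.0)]}  # vector length -> microseconds


def interp(points, x):
    """Piecewise-linear interpolation in sorted (size, time) benchmark points."""
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the benchmarked range {xs[0]}..{xs[-1]}")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def token_delay_us(func, length, in_bytes, out_bytes):
    """Compute-node token delay: MPI input + VSIPL computation + MPI output."""
    return (interp(MPI_US, in_bytes)
            + interp(VSIPL_US[func], length)
            + interp(MPI_US, out_bytes))


print(token_delay_us("fft", 4_096, 1_024, 65_536))  # 880.0
```

Refusing to extrapolate outside the benchmarked range is intentional: a delay outside the measured data would undercut the "measured, not guessed" basis of the model's fidelity.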
Representative samples of these benchmarks are shown in Table 2 and Table 3 for the PowerPC (PPC) compute node from Mercury Computer Systems; the benchmarks were provided in October 1999. The MPI benchmarks were provided by MPI Software Technology, Inc., and the VSIPL benchmarks by Mercury Computer Systems, Inc. [9].
This same middleware code and associated API function parameters can be used in the RSP frameworks to create an executable design specification and implementation code for deployment in the fielded system. 
MAGIC SDM Rules
This section lays out the specification and design rules of the MAGIC SDM. We assume that a natural language (e.g., English) requirements specification document exists that contains the system requirements, interfaces, data rates, etc. We do not assume that the algorithms have been coded in MATLAB, though it would be very unusual for them not to be.
1. Tabulate requirements: Identify and cull details of the requirements from the requirements specification document.
2. Capture nonperformance constraint requirements in an executable model: Describe computation, communication, and control requirements in an executable model.
3. Build executable workbook with performance constraint requirements: Put all the requirements into a tabular form to facilitate computational manipulation, e.g., in a worksheet/workbook environment such as Excel.
4. Gather benchmarks for tokens: Gather benchmarks of the middleware functions that are likely to be used in design and implementation, and enter them into the executable workbook.
5. Explore alternative architectures and technologies: Use performance modeling to explore potential architectures for a given technology, then determine the best architecture for that technology. Repeat as necessary for other candidate technologies.
6. Make design decisions: Decide which technology and architecture to use in implementing the signal processor.
7. Create implementation specification: Pass along architectural details to the system implementation specification based on the design exploration.
CASE STUDY: APPLYING MAGIC SDM TO A SAR BENCHMARK
We have validated the MAGIC SDM by applying it to the design of a system representative of our domain, the RASSP⁸ [17] synthetic aperture radar (SAR) benchmark. We chose this benchmark because it provides a level playing field on which to assess how MAGIC performs compared to other specification and design methodologies (SDMs) currently used in our domain. In particular, there are currently three main approaches to specifying and designing real-time embedded signal processing systems implemented with COTS MP hardware:
1. Traditional: Natural language (e.g., English) specification, rules-of-thumb hardware selection, and implementation in C code using vendor libraries for computation and communication. This approach is methodologically weak; it lacks both model continuity and complexity control, as exemplified by how the configuration information and design detail are buried in the C code and header files. This widens the gap between specification and design and also makes implementation, debugging, and software maintenance difficult.
2. Virtual prototyping (VP): Specification and design of a digital system using an executable language such as VHDL. VP possesses some model continuity but very little complexity control, since specification is done using a detailed RTL design language rather than an appropriately abstract language and/or visual framework. This approach has been found to be unwieldy for larger, more complex applications using COTS MP, because simulation runtimes have been painfully long and only activities near the beginning of the hardware initialization cycle have been explored [17], [18].
3. Deployable RSP frameworks: Specification and design are done within a single monolithic framework that can generate C code that uses vendor-specific computation and communication libraries, makefiles, and runtime scripts. These frameworks possess some model continuity and complexity control, but they require the developer to commit to a hardware target before starting the design phase, the reverse of what the specification and design process should do.
Applying our SDM to a real-world application of moderate complexity allowed us to refine our SDM rules and exercise our tools with a domain-representative, realistic application. We found that the MAGIC SDM can simulate complex system performance for whatever period is necessary. (On the SAR benchmark, it can simulate a period at least 20 times longer than a comparable VP simulation, and in a small fraction of the time.) This enables the designer to obtain a high-fidelity assessment of how well a candidate architecture and technology will meet latency requirements. The MAGIC SDM also provides the framework to evaluate competitive technologies prior to implementation, which the RSP SDMs cannot do at all.
The SAR benchmark was a design exercise undertaken to assess the performance of a RASSP-developed system. The application areas for these benchmarks were intended both to present realistic challenges to RASSP and to be of interest to a broad community of users. The COTS MP technologies in our domain are typically the technology of choice for implementing SAR image processors, so the SAR benchmark is quite representative of our domain of interest.
The requirements were published and made available in the public domain in a variety of formats, formally in [19] and informally in [17], [18], among many others. The MIT Lincoln Laboratory RASSP web site [20] has been a rich source of relevant material, including a C-based executable specification and real-world data. We also obtained a corresponding MATLAB version of the executable specification, both code and data. The requirements are summarized in Table 4.
Following the rules of the MAGIC SDM, we captured the requirements in Excel (cf. Table 4) and Simulink. We then manipulated the requirements model to represent different parallel architectures, with the executable workbook linked to generate token delays for use in eArchitect for design analysis. The original Simulink model is shown in Fig. 5 and one particular parallelized iteration is shown in Fig. 6 .
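One way to picture the link between the executable workbook and the performance model is as a table lookup. Each processing stage of the requirements model carries a single-processor benchmark time, and a candidate parallel architecture divides that work among the processors assigned to the stage. The minimal sketch below computes per-stage token delays under that reading. The stage names and timings are hypothetical placeholders, not SAR or Mercury data. The perfect-load-balancing rule (delay = single-processor time / processor count) is our simplifying assumption and is not a documented property of the workbook or of eArchitect.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stage:
    """One processing stage of the requirements model (hypothetical values)."""
    name: str
    single_pe_time_s: float  # benchmarked time for one frame on one processing element
    processors: int          # processing elements assigned in this candidate architecture

    def token_delay_s(self) -> float:
        # Assumption: the stage's work divides evenly across its processors.
        return self.single_pe_time_s / self.processors


def token_delays(stages: List[Stage]) -> Dict[str, float]:
    """Per-stage token delays that a performance model could consume."""
    return {s.name: s.token_delay_s() for s in stages}


# Two hypothetical partitionings of the same SAR-like pipeline.
serial = [Stage("range_compression", 1.2, 1), Stage("azimuth_compression", 2.4, 1)]
parallel = [Stage("range_compression", 1.2, 4), Stage("azimuth_compression", 2.4, 8)]

for label, stages in (("serial", serial), ("parallel", parallel)):
    delays = token_delays(stages)
    # Summing stage delays gives a frame's latency if it flows through stages in order.
    print(label, delays, "pipeline latency:", round(sum(delays.values()), 3))
```

Re-running the same stage list with different processor counts mirrors the idea of manipulating one requirements model to represent many parallel architectures.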
We were able to gather only one set of COTS MP benchmarks for VSIPL, from Mercury Computer Systems. Because these were the only VSIPL benchmarks available, we obtained MPI benchmarks for the same platform only: Mercury PowerPC compute elements running MSTI's MPI/Pro.
For design analysis, we used "generated" middleware code and performance modeling to estimate the throughput and latency of candidate architectures, iterating over architectures for one candidate technology to find the optimum architecture for that technology. Normally, we would repeat this for other potential technologies, compare the resulting optimum architectures, and choose the "best of the best" as the architecture and technology for implementation. In our case study, we were limited to one technology (Mercury Computer Systems), which was more than adequate for exercising the MAGIC SDM. The results are shown in Table 5. Architectures that did not satisfy the requirements before accounting for interprocessor communication (IPC) are shaded in black, while those that did not satisfy them after accounting for IPC are shaded in gray. Table 5 illustrates that MAGIC allows the designer to search the space of potential architectures more comprehensively than existing RSP frameworks do. It brings IPC constraints into the early design process to quickly disqualify inappropriate architectures, rather than committing prematurely to a technology and finding out later that it does not satisfy IPC constraints. Because the design need not be implemented to determine IPC effects, the designer can rapidly search the design space, eliminate unsatisfactory architectures, and identify acceptable ones.
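The two-step screening behind the black and gray shading can be sketched as follows. First, a candidate is rejected if computation alone exceeds the latency budget. Otherwise, IPC time on the critical path is added and the candidate is checked again. In this minimal sketch, the linear message-cost model (fixed startup time plus size divided by bandwidth) is our assumption. The startup and bandwidth values are hypothetical placeholders, not MPI/Pro measurements, and the candidate architectures are illustrative. Only the three-second latency requirement comes from the case study.

```python
from dataclasses import dataclass

LATENCY_REQUIREMENT_S = 3.0  # SAR latency requirement cited in the case study


@dataclass(frozen=True)
class Candidate:
    """A candidate architecture summarized for screening (hypothetical fields)."""
    name: str
    compute_latency_s: float  # end-to-end latency from computation benchmarks alone
    transfers: int            # interprocessor messages on the critical path
    bytes_per_transfer: int


def message_time_s(n_bytes: int, startup_s: float, bandwidth_Bps: float) -> float:
    """Assumed linear model: fixed startup cost plus size divided by bandwidth."""
    return startup_s + n_bytes / bandwidth_Bps


def screen(c: Candidate, startup_s: float, bandwidth_Bps: float,
           requirement_s: float = LATENCY_REQUIREMENT_S) -> str:
    if c.compute_latency_s > requirement_s:
        return "fails before IPC"  # corresponds to black shading in Table 5
    ipc_s = c.transfers * message_time_s(c.bytes_per_transfer, startup_s, bandwidth_Bps)
    if c.compute_latency_s + ipc_s > requirement_s:
        return "fails after IPC"   # corresponds to gray shading in Table 5
    return "meets requirement"


candidates = [
    Candidate("4 PEs", 3.6, 6, 16_000_000),
    Candidate("8 PEs", 2.4, 24, 8_000_000),
    Candidate("16 PEs", 1.3, 48, 2_000_000),
]
for c in candidates:
    print(c.name, screen(c, startup_s=20e-6, bandwidth_Bps=160e6))
```

Checking computation first is cheap and rejects clearly infeasible candidates before any communication modeling. This mirrors the order of the two shading categories.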
Finally, design decisions were made based primarily upon these results and other nonperformance constraints. More details on this are contained in Chapter 7 of [9] .
This case study shows that the MAGIC SDM accomplishes three important goals:
1. The MAGIC SDM works as postulated. The rules can be followed and the tools work, especially in providing model continuity. Examples of model continuity included passing requirements model information back and forth to our design analysis performance modeling via our executable workbook, which also ensured that nonperformance constraints were satisfied. Also, once a design was chosen, our requirements model was used to generate inner-loop computation and communication C code, as well as test vectors, all of which can be used in the processor's implementation. The performance model provided hardware configuration and software-to-hardware mapping information to the implementation.
2. The MAGIC SDM yields benchmarks of a full frame of data with simulated durations beyond the three-second latency requirement, which is itself 20 times the longest VP-based VHDL simulation. We have run the VHDL-based virtual benchmarks for anywhere from 1.5 seconds to 4.0 seconds of simulated time, well over the 150 ms achieved in other RASSP SAR virtual prototyping-based VHDL simulations [17]. The simulations were also orders of magnitude faster. The VP simulations averaged 4.5 hours to simulate 11 milliseconds on a Sparc20 workstation at 50 percent utilization [18], whereas the MAGIC simulations averaged 20 minutes to simulate 4.0 seconds on a comparable machine with comparable loading. This is a speedup of three orders of magnitude (almost 5,000 times).
It is important to consider how applicable the tenets of the MAGIC SDM are beyond the domain of real-time streaming-data transformation applications implemented with embedded COTS multiprocessing technology. By constraining our focus to our application domain, we were able to identify the frameworks and middleware that would be viable for integration into the MAGIC SDM. We are now adapting the MAGIC SDM to other application and technology domains.
Frameworks exist for other application and technology domains, such as system-on-a-chip (SoC), usually under the heading of "EDA" (electronic design automation). While we were developing MAGIC with a focus on our embedded real-time signal processing application domain and the "SiB" ("system-in-a-box") technology domain, independent development was occurring at Cadence Design Systems, Inc., on a strikingly similar SDM for SoC, dubbed VCC for "Virtual Component Codesign." This framework implements a design flow very similar to MAGIC's and shows great promise for sound, model-continuous, and pragmatic SoC specification, design, and implementation. The lead author has joined Cadence to help evolve and deploy VCC commercially, bringing in those elements of MAGIC that will aid in this evolution. It is compelling to note that two independent efforts, on two coasts, focusing on two different implementation technologies and unaware of one another, resulted in strikingly similar methodologies, each novel in its own domain and in its own right.
TABLE 5 Latencies of SAR Processor Architectures Accounting for IPC
Architectures that do not meet the scalability requirement are shaded.
VCC can be described as a hardware-software codesign methodology and supporting tools framework that enables intellectual property reuse and reduces time-to-market for complex SoCs (see Fig. 7 from [21]). The designer specifies a model of a complete embedded system, incorporating reusable architectural and behavioral blocks, which can be simulated via performance modeling to quickly explore the performance of alternative hardware and software implementations. Separating the behavior from the architecture, and then specifying the mapping between them, maximizes reuse potential.
After the designer settles on a particular mapping, the framework performs communication synthesis to generate the hardware and software "glue" for communication between the reusable blocks, and then exports the complete hardware and software implementation. The framework's implementation export can be customized for different targets and tools, for example, different processors, real-time operating systems, debuggers, and hardware implementation tools. The designer can compare the results of running the implementation code in a coverification environment with the results of running the original high-level specification model.
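Separating behavior from architecture means the same behavioral connections can be re-evaluated under different mappings. The toy sketch below is not VCC's data model or its communication-synthesis algorithm. It only illustrates the idea: once behaviors are mapped to resources, each connection either stays local to one resource or crosses resources and so needs synthesized glue. The behavior names, resource names, and glue labels are all hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (producer behavior, consumer behavior)


def glue_plan(edges: List[Edge], mapping: Dict[str, str],
              resource_kind: Dict[str, str]) -> Dict[Edge, str]:
    """Label each behavioral connection with the kind of glue a mapping would require."""
    plan = {}
    for src, dst in edges:
        r_src, r_dst = mapping[src], mapping[dst]
        if r_src == r_dst:
            plan[(src, dst)] = "local (same resource)"
        else:
            plan[(src, dst)] = f"{resource_kind[r_src]}-to-{resource_kind[r_dst]} bus glue"
    return plan


# Behavior: a fixed chain of hypothetical blocks.
edges = [("filter", "fft"), ("fft", "detect")]
# Architecture: one processor ("sw") and one hardware block ("hw").
kinds = {"cpu0": "sw", "asic0": "hw"}

# Same behavior under two alternative mappings.
all_software = {"filter": "cpu0", "fft": "cpu0", "detect": "cpu0"}
fft_in_hw = {"filter": "cpu0", "fft": "asic0", "detect": "cpu0"}

print(glue_plan(edges, all_software, kinds))
print(glue_plan(edges, fft_in_hw, kinds))
```

Only the mapping dictionary changes between the two runs. The behavior and architecture descriptions are reused unchanged, which is the reuse benefit the separation is meant to provide.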
In Fig. 7, the VCC framework handles the methodology above the dotted line, while connectivity to other tools below the line is necessary. Work is being done to enable conduits to these types of tools. More details on VCC can be found in [21], [22], [23]. Also noteworthy are illustrative anecdotal data generated by Cadence comparing execution times of cycle-approximate performance modeling simulations with those of cycle-accurate coverification simulations. Eight seconds of GSM traffic were simulated in a VCC performance model in four minutes on an HP Kayak-XU 450 MHz Pentium II class machine, while simulating the same GSM traffic in the VirtualICE coverification simulator, running on a Sun Ultra 60 dual-SPARC (450 MHz; 50-60 percent utilized CPU) class machine, took eighteen hours. While we are admittedly comparing apples to oranges here, in both cycle-simulation accuracy and machine class, these two data points are still quite telling about which methodological approach better supports rapid analysis of codesign architectures. More systematic data are found in [24], which shows that the clock-cycle-approximate simulations of VCC agree in timing with clock-cycle-accurate RTL-level simulations to within 5 to 10 percent, with a large simulation-time speedup. The price of abstraction is small relative to the benefit of reducing design time via rapid prototyping with virtual benchmarking.
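The GSM anecdote can be reduced to two slowdown factors relative to real time, plus their ratio. The sketch below performs only that arithmetic on the quoted figures (8 s of traffic, 4 minutes in VCC, 18 hours in VirtualICE). As the text cautions, the hosts differ, so the ratio mixes simulator and machine effects.

```python
def slowdown(wall_s: float, simulated_s: float) -> float:
    """How many times slower than real time a simulation ran."""
    return wall_s / simulated_s


gsm_traffic_s = 8.0
vcc_slowdown = slowdown(4 * 60, gsm_traffic_s)         # VCC performance model: 4 minutes
coverif_slowdown = slowdown(18 * 3600, gsm_traffic_s)  # VirtualICE coverification: 18 hours

print(f"VCC: {vcc_slowdown:.0f}x slower than real time")                # 30x
print(f"coverification: {coverif_slowdown:.0f}x slower than real time")  # 8100x
print(f"ratio: {coverif_slowdown / vcc_slowdown:.0f}x")                  # 270x
```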
RELATED WORK
Thus far, this paper has discussed related methodologies and RSP frameworks that are specific to the domain of interest (embedded multiprocessor signal processing systems). This section shows how the research is more broadly related to current and emerging trends in software engineering and embedded systems engineering.
View integration. Model continuity is closely related to the concept of view integration. View integration assumes that multiple, overlapping views of a system exist, where a view is an elaboration of some subset of the design model, such as an architectural representation or dataflow diagram. Views overlap when some information is replicated in each view. Because a designer will usually only focus on a small number of design views at any time to control complexity, it becomes very easy for inconsistency errors to appear as design progresses. View integration strives to maintain consistency across all views in a systematic fashion. One view integration approach is ViewPoints [25] , [26] , [27] , which allows a team of designers to integrate their respective views of a system by specifying inter-view rules for consistency management. Unlike ViewPoints, which maintains view integration horizontally among several parallel design processes, model continuity seeks to maintain view consistency vertically, i.e., between development stages. The goal of model continuity is not to focus on inconsistency management or to focus on unifying multiple overlapping views. Rather, the focus is on ensuring that constraints are applied consistently across multiple stages of development and at varying levels of detail.
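An inter-view consistency rule of the kind ViewPoints supports can be illustrated with two overlapping views: a dataflow view that lists processes, and a deployment view that maps processes to processors. The sketch below checks one hypothetical rule: every dataflow process is mapped, and nothing else is. It does not use ViewPoints' own notation or tooling, and the process and processor names are invented.

```python
from typing import Dict, List, Set


def mapping_consistency_errors(dataflow_processes: Set[str],
                               deployment: Dict[str, str]) -> List[str]:
    """Report violations of one inter-view rule: both views must name the same processes."""
    mapped = set(deployment)
    errors = [f"'{p}' is in the dataflow view but is not mapped to a processor"
              for p in sorted(dataflow_processes - mapped)]
    errors += [f"'{p}' is mapped to {deployment[p]} but absent from the dataflow view"
               for p in sorted(mapped - dataflow_processes)]
    return errors


dataflow = {"range_fft", "corner_turn", "azimuth_fft"}
deployment = {"range_fft": "pe0", "azimuth_fft": "pe1", "old_filter": "pe2"}
for e in mapping_consistency_errors(dataflow, deployment):
    print(e)
```

Here the two views are parallel views of one design stage (the horizontal case). The same kind of check could in principle be applied between a model from one stage and a model from the next, which is the vertical direction the text associates with model continuity.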
Unified Modeling. The desire for model continuity in software development has been a key motivation underlying the Unified Modeling Language (UML) [28] , [29] . The UML provides a generic, extensible framework for design modeling and documentation, and it is quickly gaining acceptance as a de facto industry standard. It provides diagramming notations appropriate for specifying many different views of a model, from state and sequence diagrams for modeling system behavior, to class and deployment diagrams for modeling structural aspects of a system. Because of its extensibility, UML can be tailored to complement a large class of non-UML diagrammatic notations, such as architecture description languages [30] . The UML approach differs from the MAGIC SDM in that the UML is a language for informally expressing models from multiple points of view, but not in an executable form that can be used for early validation. MAGIC is a methodology that includes model representations, process rules, and tool support. The UML focuses primarily on general-purpose model representations as opposed to domain-specific representations. Formalizing the UML so that it is amenable to automated verification and validation techniques is an active area of research. In contrast, the MAGIC SDM strives for high-fidelity modeling based on domain-specific benchmark data associated with standard component libraries for accuracy, error detection/recovery, and validation at early stages of development.
Model Interchange. Recent advances in model interchange language design, such as the recently specified XMI (XML Metadata Interchange) standard [31], show promise in facilitating model continuity in general-purpose software development [32], [33]. XMI is a standard for exchanging object data among a variety of tools, such as UML modeling tools, database design tools, and programming environments. Instead of simply providing a mapping between equivalent concepts in separate tools, XMI allows the definition of metadata, which is information describing the structure and semantics of the data being transmitted between tools. Rather than defining a separate mapping for each pair of views (which does not scale well and is difficult to change), XMI provides an intermediate representation that any XMI-compliant view can access. Not only can tools adhering to the XMI standard exchange and understand information easily, but XMI also provides inheritance-based extension mechanisms that allow it to evolve as the nature of the transmitted information evolves. This model interchange language is being considered as an alternative to the flat tool-to-tool mapping currently used to support information flow in MAGIC. This would allow other tools to access the information more easily and allow MAGIC tool support to evolve more easily as RSP framework technology advances.
Executable Specifications. High-level executable specification languages, such as Rapide [34], SpecC [35], and SystemC [36], [37], [38], [39], integrate computational requirements with architectural exploration for early validation. These languages model systems as collections of concurrent, communicating, distributed objects. They are intended to take a requirements-centric approach to design instead of an implementation-centric approach, although SpecC is capable of code generation from its high-level specification.
A similar executable specification language, VSPEC [40], [41], models hardware components with predicate logic constraints (based on the Larch specification language [42]), allowing validation via theorem proving, which can be computationally expensive. MAGIC complements these approaches by augmenting the behavioral simulation of the executable specification with virtual benchmarking; i.e., it makes quantitative benchmark data available in the early stages of development so that higher-fidelity, efficient analysis can be performed.
Enriching Component APIs. Early validation via virtual benchmarking is closely aligned with emerging trends in component-based design to make components "contract-aware" [43] and to create "rich APIs" [44] that include information beyond type signatures in the APIs of standard components. The goal is to annotate components with properties, such as quality of service and synchronization information, so that they can be composed validly. Virtual benchmarking shows how certain types of quantitative data associated with open-standards middleware can be used as a source of API annotations, and what types of analysis can be applied to incorporate this information into early design exploration.
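A "rich API" annotation can be as simple as a latency figure attached to a component's interface next to its type signature. The minimal sketch below checks a composition in two steps: adjacent types must match, and the summed latency must fit a budget. The component names, type names, and latency values are hypothetical. In the spirit of virtual benchmarking, such latencies could come from middleware benchmark data, but nothing here reproduces a specific contract-aware framework from [43] or [44].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AnnotatedComponent:
    """A component whose interface carries a quality-of-service annotation."""
    name: str
    input_type: str
    output_type: str
    latency_s: float  # annotation beyond the type signature (hypothetical value)


def check_composition(pipeline: List[AnnotatedComponent],
                      budget_s: float) -> Tuple[bool, float]:
    """Type-check adjacent connections, then compare summed latency with a budget."""
    for up, down in zip(pipeline, pipeline[1:]):
        if up.output_type != down.input_type:
            raise TypeError(f"{up.name} produces {up.output_type}, "
                            f"but {down.name} expects {down.input_type}")
    total = sum(c.latency_s for c in pipeline)
    return total <= budget_s, total


fft = AnnotatedComponent("fft", "complex_vector", "complex_vector", 0.5)
mag = AnnotatedComponent("magnitude", "complex_vector", "real_vector", 0.25)

print(check_composition([fft, mag], budget_s=1.0))  # (True, 0.75)
print(check_composition([fft, mag], budget_s=0.5))  # (False, 0.75)
```

A type signature alone would accept both compositions. The latency annotation is what lets the second one be rejected before implementation.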
CONCLUSION
Sound specification and design methodology (SDM) is critical for design success as measured by correctness and time-to-market, and it becomes all the more essential as system complexity increases. A specification and design methodology embodies a framework, or combination of tools, with accompanying specification and design rules created to assure correctness; however, unless the methodology is pragmatically accessible and can be applied in a timely way, it will not be used. We have developed MAGIC and shown that a sound SDM can be achieved in the system-in-a-box technology domain targeted by real-time embedded signal processing applications.
