64 research outputs found
Multidimensional model of estimated resource usage for multimedia NoC QoS
Multiprocessor systems are rapidly entering various high-performance computing segments, such as multimedia processing. Instead of increasing the processor clock frequency, the new trend is to use multiple cores for processing, e.g. dual or quadruple CPUs in one subsystem. In this contribution, we address the problem of modeling the resource requirements of multimedia applications for distributed computation on a multiprocessor system. This paper shows that estimating the resource requirements from the input data enables the dynamic activation of tasks and the run-time redistribution of application tasks. We also formally specify the optimal selection of the co-executed applications, with the aim of providing the best end results for such streaming applications within one network-on-chip (NoC) system. We present a new concept for system optimization which involves the major system parameters and resource usage. Experimental results are based on mapping an arbitrary-shaped MPEG-4 video decoder onto a multiprocessor NoC
Adaptive decoding of MPEG-4 sprites for memory-constrained embedded systems
Background sprite decoding is an essential part of object-based video coding. The composition and rendering of a final scene involves placing individual video objects, in a predefined way, on top of the decoded background image. The MPEG-4 standard includes an algorithm for background image decoding, but this algorithm is not suitable for implementation on a memory-constrained platform. In this paper we present a modification of the decoding algorithm that decodes MPEG-4 sequences while fulfilling the requirements of a memory-constrained multiprocessor system, at the cost of only 17% extra computation. Our algorithm reduces the memory cost of such decoding by a factor of four. Additionally, our algorithm offers a high level of data parallelism and consequently contributes to an increase in throughput rate
Bobbin-Tool Friction-Stir Welding of Thick-Walled Aluminum Alloy Pressure Vessels
It was desired to assemble thick-walled Al alloy 2219 pressure vessels by bobbin-tool friction-stir welding. To develop the welding-process, mechanical-property, and fitness-for-service information to support this effort, extensive friction-stir welding-parameter studies were conducted on 2.5 cm. and 3.8 cm. thick 2219 Al alloy plate. The starting conditions of the plate were the fully-heat-treated (-T62) and the annealed (-O) conditions. The former condition was chosen with the intent of using the welds in either the 'as welded' condition or after a simple low-temperature aging treatment. Since preliminary stress-analyses showed that stresses in and near the welds would probably exceed the yield-strength of both 'as welded' and welded and aged weld-joints, a post-weld solution-treatment, quenching, and aging treatment was also examined. Once a suitable set of welding and post-weld heat-treatment parameters was established, the project divided into two parts. The first part concentrated on developing the necessary process information to be able to make defect-free friction-stir welds in 3.8 cm. thick Al alloy 2219 in the form of circumferential welds that would join two hemispherical forgings with a 102 cm. inside diameter. This necessitated going to a bobbin-tool welding-technique to simplify the tooling needed to react the large forces generated in friction-stir welding. The bobbin-tool technique was demonstrated on both flat-plates and plates that were bent to the curvature of the actual vessel. An additional issue was termination of the weld, i.e. closing out the hole left at the end of the weld by withdrawal of the friction-stir welding tool. This was accomplished by friction-plug welding a slightly-oversized Al alloy 2219 plug into the termination-hole, followed by machining the plug flush with both the inside and outside surfaces of the vessel. The second part of the project involved demonstrating that the welds were fit for the intended service.
This involved determining the room-temperature tensile and elastic-plastic fracture-toughness properties of the bobbin-tool friction-stir welds after a post-weld solution-treatment, quenching, and aging heat-treatment. These mechanical properties were used to conduct fracture-mechanics analyses to determine critical flaw sizes. Phased-array and conventional ultrasonic non-destructive examination was used to demonstrate that no flaws that match or exceed the calculated critical flaw-sizes exist in or near the friction-stir welds
The Scrounge-atron: a phased approach to the Advanced Hydrotest Facility utilizing proton radiography
The Department of Energy has initiated its Stockpile Stewardship and Management Program (SSMP) to provide a single, integrated technical program for maintaining the continued safety and reliability of the nation's nuclear weapons stockpile in the absence of nuclear testing. Consistent with the SSMP, the Advanced Hydrotest Facility (AHF) has been conceived to provide improved radiographic imaging with multiple axes and multiple time frames. The AHF would be used to better understand the evolution of nuclear weapon primary implosion shape under normal and accident scenarios. There are three fundamental technologies currently under consideration for use on the AHF. These include linear induction acceleration, inductive-adder pulsed-power technology (both technologies using high current electron beams to produce an intense X-ray beam) and high-energy proton accelerators to produce a proton beam. The Scrounge-atron (a proton synchrotron) was conceived to be a relatively low cost demonstration of the viability of the third technology using bursts of energetic protons, magnetic lenses, and particle detectors to produce the radiographic image. In order for the Scrounge-atron to provide information useful for the AHF technology decision, the accelerator would have to be built as quickly and as economically as possible. These conditions can be met by "scrounging" parts from decommissioned accelerators across the country, especially the Main Ring at Fermilab. The Scrounge-atron is designed to meet the baseline parameters for single axis proton radiography: a 20 GeV proton beam of ten pulses, 10 degrees protons each, spaced 250 ns apart. (2 refs)
Dataflow Analysis for Real-Time Embedded Multiprocessor System Design
Dataflow analysis techniques are key to reducing the number of design iterations and shortening the design time of real-time embedded network-based multiprocessor systems that process data streams. With these analysis techniques, the worst-case end-to-end temporal behavior of hard real-time applications can be derived from a dataflow model in which computation, communication, and arbitration are modeled. For soft real-time applications, these static dataflow analysis techniques are combined with simulation of the dataflow model to test statistical assertions about their temporal behavior. The simulation results, in combination with properties of the dataflow model, are used to derive the sensitivity of design parameters and to estimate parameters such as the capacity of data buffers
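As a hedged illustration of the kind of static analysis a dataflow model allows, the sketch below computes the repetition vector of a synchronous dataflow (SDF) graph by solving its balance equations. This is a standard textbook step and is not claimed to be the method of the paper above; the function name, the edge format, and the example graph are assumptions made for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts that balance an SDF graph.

    actors: list of actor names (hypothetical example format).
    edges:  list of (src, dst, produced, consumed) token rates per firing.
    Raises ValueError if the rates are inconsistent or the graph is not connected.
    """
    # For every edge the balance equation is q[src] * produced == q[dst] * consumed.
    neighbours = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from the first actor through the graph.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in neighbours[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent SDF graph: no balanced schedule")

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractional rates to the smallest integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}


if __name__ == "__main__":
    # A produces 2 tokens, B consumes 3; B produces 1 token, C consumes 2.
    print(repetition_vector(["A", "B", "C"],
                            [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

With the example graph, A fires three times, B twice, and C once per iteration, so that every edge produces exactly as many tokens as it consumes. Buffer capacities and worst-case timing would be derived in later analysis steps that this sketch does not cover.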
Performance and QoS-aware MPEG-4 video-object coding for multiprocessor architecture
The introduction of Arbitrary-Shaped (AS) Video Objects (VO) in the MPEG-4 coding standard has enabled various applications using both natural and synthetic composition of video scenes. The work presented in this thesis aims at realizing an embedded-systems design involving the mapping of this type of application onto a multiprocessor platform, such as a Network-on-Chip (NoC). The research has focused on the upper design layers, dealing with the applications and their control for an efficient execution. The aspects addressed for the mapping are performance modeling of the MPEG-4 decoding, granularity optimization of the algorithm, introduction of task-level scalability, and controlling the quality of the applications by a Quality-of-Service (QoS) manager. The AS VO MPEG-4 decoding algorithm comprises the conventional DCT coding techniques from MPEG-1/2, extended with the coding of object shapes and specific processing for the improvement of the picture quality of object borders, employing padding and block-based filtering. At the system level, the AS VO MPEG-4 coding allows the designer to think in individual planes and objects that together compose the scene. The target platform for such an application should be able to handle the features of MPEG-4 coding: the combination of high-level control-driven operations and streaming-oriented processing at the video-data level. The platform features a tile-based computing network, in which each tile is separated from the network by buffered communication. This allows multiple instantiations of object decoding, each having its own dynamic behavior. The Synchronous Data Flow (SDF) graph is a traditional model of computation for multimedia applications mapped on a multiprocessor system. However, SDF cannot cope with the dynamic behavior of object-based video. Therefore, this research has extended SDF with a linear parametrical model of the required computation resources.
The model is based on the coding parameters of the input stream (BAB-type of the block, number of non-transparent sub-blocks, number of AC coefficients coded by an ESC code, etc.) and weighting coefficients depending on the target processor architecture. Similarly, the thesis proposes a parametrical model for the communication resources. It was found that the obtained parametrical timing model deviates by about 5% from the real execution on an Æthereal NoC with ARM7 cores. Our comparison with the commonly used worst-case approach for communication resource allocation revealed that the parametrical model reduces the required resources by a factor of 2.5. For more efficient system control, the thesis presents a hierarchical Quality-of-Service (QoS) concept in combination with a scalable MPEG-4 decoder. To serve scalable execution, we have classified the tasks involved in the AS VO MPEG-4 decoding into two classes. The first class contains essential tasks that cannot be skipped, while the second class contains the enhancement functions. Scalability of AS VO MPEG-4 decoding was obtained by enabling/disabling optional functions of the non-essential tasks next to the essential tasks. The resource distribution is controlled by a hierarchical QoS management. This QoS is based on two QoS managers. In our experimental implementation, the Local QoS provides the estimation of the resource usage of an application and monitors the real execution. The Global QoS selects the best quality levels of the active applications and reserves resources for the applications. The key contribution of our work on QoS is the design of a heuristic algorithm that searches for suitable combinations of quality levels for individual jobs, so that a set of jobs can be mapped on the available resources. In order to further improve the efficiency of the mapping, we have distinguished reservation-based QoS control and best-effort computing on top of it as an addition.
This combination was studied for controlling the bandwidth of the communication resources. The reservation-based approach guarantees that the video object will always be decoded at least at the lowest quality level, while the best-effort computing improves the quality by using the resources as much as they are available, as controlled by the Global QoS. The complete system was experimentally verified with a network of eight ARM processor cores, using an MPEG-4 Video Object decoder at the ACE profile and at CCIR-601 resolution. The proposed framework showed that adaptation at a finer granularity, e.g. at the VOP level within a GOV, significantly improves the image quality (provided that resources are constrained). The mapping exploration of AS VO MPEG-4 decoding for execution on an NoC addresses a general case of running modern multimedia applications, because of the variability and dynamics of tasks. It has been shown that parametrical models help in planning the execution, and that QoS management and best-effort computing clearly improve the efficiency of multiple tasks executed in parallel
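The two ideas in the abstract above, a linear parametrical resource model and a heuristic that picks quality levels within a resource budget, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the parameter names, weights, job names, and the greedy "best quality gain per unit cost" rule are all hypothetical and chosen only for illustration.

```python
def estimate_cycles(params, weights):
    """Linear parametrical model: the weighted sum of stream coding parameters.

    params:  {parameter name: count observed in the input stream}
    weights: {parameter name: cost per unit on the target processor}
    Both dictionaries use hypothetical names; real weights would be
    calibrated per processor architecture.
    """
    return sum(weights[name] * count for name, count in params.items())


def select_quality_levels(jobs, budget):
    """Greedy stand-in for a Global QoS heuristic (an assumption, not the thesis's).

    jobs:   {job name: [(quality, cost), ...]} ordered from lowest to highest level
    budget: total resource capacity available
    Returns {job name: chosen level index}, or None if even the lowest
    levels do not fit in the budget.
    """
    # Reservation step: every job is first admitted at its lowest quality level.
    level = {name: 0 for name in jobs}
    used = sum(levels[0][1] for levels in jobs.values())
    if used > budget:
        return None

    # Best-effort step: spend the remaining capacity on the upgrade that
    # yields the largest quality gain per unit of extra cost.
    while True:
        best = None
        for name, levels in jobs.items():
            nxt = level[name] + 1
            if nxt >= len(levels):
                continue  # already at the highest level
            dq = levels[nxt][0] - levels[level[name]][0]
            dc = levels[nxt][1] - levels[level[name]][1]
            if used + dc > budget:
                continue  # upgrade does not fit
            gain = dq / dc if dc > 0 else float("inf")
            if best is None or gain > best[0]:
                best = (gain, name, dc)
        if best is None:
            return level
        _, name, dc = best
        level[name] += 1
        used += dc


if __name__ == "__main__":
    print(estimate_cycles({"opaque_blocks": 4, "esc_coefficients": 10},
                          {"opaque_blocks": 100, "esc_coefficients": 5}))
    print(select_quality_levels({"vo1": [(1, 10), (3, 20), (4, 40)],
                                 "vo2": [(1, 10), (2, 25)]}, budget=50))
```

The greedy rule does not guarantee an optimal combination; it only mirrors the split described above between a guaranteed lowest quality level and opportunistic use of leftover resources.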