
    Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs


    Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor while each meets its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for independent timing analysis or relies on run-time timing analysis. This thesis proposes a design flow and software architecture that meets these challenges, while enabling features such as independent transceiver compilation and dynamic loading, and taking into account other challenges such as ease of programming, efficiency, and ease of validation.

    We take data flow as the basic model of computation, as it fits the application domain, and several static variants (such as Single-Rate, Multi-Rate and Cyclo-Static) have been shown to possess strong analytical properties. Traditional temporal analysis of data flow can provide minimum throughput guarantees for a self-timed implementation of a data flow graph. Since transceivers may need to guarantee strictly periodic execution and meet latency requirements, we extend the analysis techniques to show that we can enforce strict periodicity for an actor in the graph; we also provide maximum latency analysis techniques for periodic, sporadic and bursty sources.

    We propose a scheduling strategy and an automatic scheduling flow that enable the simultaneous execution of multiple transceivers with hard real-time requirements, described as Single-Rate Data Flow (SRDF) graphs. Each transceiver has its own execution rate and starts and stops independently from other transceivers, at times unknown at compile time, on a multiprocessor. We show how to combine scheduling and mapping decisions with the input application data flow graph to generate a worst-case temporal analysis graph. We propose algorithms to find a mapping per transceiver in the form of clusters of statically-ordered actors, and a budget for either a Time Division Multiplex (TDM) or Non-Preemptive Non-Blocking Round Robin (NPNBRR) scheduler per cluster per transceiver. The budget is computed such that, if the platform can provide it, the desired minimum throughput and maximum latency of the transceiver are guaranteed, while the required processing resources are minimized. We illustrate the use of these techniques by mapping a combination of WLAN and TDS-CDMA receivers onto a prototype Software-Defined Radio platform.

    The functionality of transceivers for standards with very dynamic behavior, such as WLAN, cannot be conveniently modeled as an SRDF graph, since SRDF cannot express variations of actor firing rules that depend on the values of input data. We therefore propose a restricted, customized data flow model of computation, Mode-Controlled Data Flow (MCDF), that can capture the data-value-dependent behavior of a transceiver while still allowing rigorous temporal analysis and tight resource budgeting. We develop a number of analysis techniques to characterize the temporal behavior of MCDF graphs in terms of maximum latencies and throughput, and we extend our SRDF scheduling strategy to MCDF. The capabilities of MCDF are then illustrated with a WLAN 802.11a receiver model.

    Having computed budgets for each transceiver, we propose a way to use these budgets for run-time resource mapping and admissibility analysis. At transceiver start time, the budget for each cluster of statically-ordered actors is allocated by a resource manager to platform resources. The resource manager enforces strict admission control, to prevent transceivers from interfering with each other’s worst-case temporal behavior. We propose algorithms adapted from Vector Bin-Packing to map transceivers to the multi-processor architecture at start time, also considering the case where the processors are connected by a network-on-chip with resource reservation guarantees, in which case we also find a routing and resource allocation on the network-on-chip. In our experiments, our resource allocation algorithms can keep 95% of the system resources occupied while suffering an allocation failure rate of less than 5%.

    An implementation of the framework was carried out on a prototype board. We present performance and memory utilization figures for this implementation, as they provide insight into the costs of adopting our approach. The scheduling and synchronization overhead of an unoptimized implementation of the framework, with no hardware support for synchronization, is 16.3% of the cycle budget for a WLAN receiver on an EVP processor at 320 MHz. However, this overhead is less than 1% for mobile standards such as TDS-CDMA or LTE, which have lower rates and thus larger cycle budgets. Considering that clock speeds will increase and that the synchronization primitives can be optimized to exploit the addressing modes available in the EVP, these results are very promising.
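    The minimum throughput guarantee mentioned above comes from classical data flow analysis: for a self-timed execution of an SRDF graph, the guaranteed period equals the maximum cycle mean, i.e. the maximum over all cycles of the summed actor execution times divided by the number of initial tokens on the cycle. The sketch below illustrates this standard computation; the graph, actor names, execution times and token counts are invented for the example, and cycles are enumerated naively, which is adequate only for small graphs.

```python
# Worst-case throughput analysis of a small SRDF graph via the maximum
# cycle mean. Graph and numbers are hypothetical, for illustration only.

exec_time = {"src": 2, "filt": 5, "demod": 3}   # worst-case execution times
edges = [                                        # (producer, consumer, initial tokens)
    ("src", "filt", 0),
    ("filt", "demod", 0),
    ("demod", "src", 1),    # feedback edge carrying one initial token
    ("filt", "filt", 1),    # self-edge: the actor cannot fire concurrently with itself
]

def cycles(nodes, edges):
    """Yield (cycle_nodes, tokens) for every simple cycle, each found once."""
    adj = {}
    for s, d, t in edges:
        adj.setdefault(s, []).append((d, t))
    for start in nodes:
        # only report cycles whose smallest node is `start` (avoids duplicates)
        stack = [(start, [start], 0)]
        while stack:
            node, path, tokens = stack.pop()
            for nxt, t in adj.get(node, []):
                if nxt == start:
                    yield path, tokens + t
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt], tokens + t))

period = 0.0
for cyc, tokens in cycles(sorted(exec_time), edges):
    assert tokens > 0, "a cycle without tokens deadlocks"
    period = max(period, sum(exec_time[a] for a in cyc) / tokens)

print(f"guaranteed period: {period} -> minimum throughput {1 / period:.3f} iterations/unit")
```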
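    The run-time mapping step can be pictured concretely as well. The thesis adapts algorithms from the Vector Bin-Packing family; as a rough illustration only, the following first-fit-decreasing sketch maps clusters with multi-dimensional resource demands onto processors and refuses admission when no processor fits. All demand and capacity numbers are invented, and the thesis' actual algorithms are adaptations of this family rather than this plain heuristic.

```python
# First-fit-decreasing sketch of vector bin-packing for start-time
# admission control. Demands and capacities are vectors, here
# (fraction of a TDM cycle budget, KiB of local memory).

def first_fit_decreasing(demands, capacities):
    """Return {cluster: processor} or None if admission must be refused."""
    free = [list(c) for c in capacities]
    # place "big" clusters first: sort by largest capacity-normalized component
    order = sorted(demands,
                   key=lambda k: -max(d / c for d, c in zip(demands[k], capacities[0])))
    mapping = {}
    for name in order:
        for proc, cap in enumerate(free):
            if all(d <= c for d, c in zip(demands[name], cap)):
                for i, d in enumerate(demands[name]):
                    cap[i] -= d                      # reserve the budget
                mapping[name] = proc
                break
        else:
            return None                              # no processor fits: reject
    return mapping

clusters = {"wlan.c0": (0.4, 12), "wlan.c1": (0.3, 20), "tdscdma.c0": (0.5, 8)}
processors = [(1.0, 32), (1.0, 32)]                  # two identical processors
print(first_fit_decreasing(clusters, processors))
# e.g. {'wlan.c1': 0, 'tdscdma.c0': 0, 'wlan.c0': 1}
```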

    An Abstraction-Refinement Theory for the Analysis and Design of Concurrent Real-Time Systems

    Concurrent real-time systems with shared resources belong to the class of safety-critical systems for which it is required to determine both temporally and functionally conservative guarantees. However, the growing complexity of real-time systems makes it more and more challenging to apply standard analysis techniques. Especially the presence of both cyclic data dependencies and cyclic resource dependencies makes many related analysis approaches inapplicable, and the use of Static Priority Preemptive (SPP) scheduling further impedes the employment of many "classical" analysis techniques.

    To address this growing complexity, and to be able to give guarantees nevertheless, we present an abstraction-refinement theory for real-time systems. We introduce a timed component model that is defined in such a generic way that both real-time system implementations and any kind of analysis model for such applications can be expressed in it. Thereafter, we devise three different abstraction-refinement theories for the timed component model: exclusion, inclusion and bounding. Exclusion can be used to remove unconsidered corner cases, inclusion allows for the substitution of uncertainty with non-determinism, and bounding permits the replacement of non-determinism with determinism. The latter enables the creation of efficiently analyzable models that can be used to give temporal or functional guarantees on non-deterministic and non-monotone implementations.

    We use such abstractions to construct analysis models of concurrent real-time systems with shared resources and SPP scheduling. On these models we apply various analysis techniques, with the goal of increasing analysis accuracy. Our first accuracy improvement is achieved by combining the rather coarse state-of-the-art period-and-jitter interference characterization with an explicit consideration of cyclic data dependencies. The interference-limiting effect of such cycles can be exploited even further with an "iterative buffer sizing". Next, we replace period-and-jitter with execution intervals, resulting in an even higher accuracy. In our last approach we increase both accuracy and applicability by supporting real-time systems with tasks that consist of multiple phases and operate at different rates; a modification of this approach further enables the analysis of applications with multiple shared resources. Finally, we present the HAPI simulator, which can simulate any kind of concurrent real-time system with shared resources.
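    To make the bounding idea concrete: a non-deterministic aspect of an implementation, such as an execution time that may fall anywhere within an interval, is replaced by a deterministic abstraction that conservatively dominates it, so any bound proven on the abstraction holds for every behavior of the implementation. The sketch below shows this principle on a simple task chain with invented numbers; it only illustrates the idea and is not the timed component model itself.

```python
# Bounding in miniature: replace non-deterministic execution times
# (anywhere in [bcet, wcet]) by their deterministic worst case.
import random

tasks = [("sample", 1.0, 2.0), ("filter", 3.0, 7.0), ("actuate", 1.0, 1.5)]

def chain_latency(pick):
    """End-to-end latency of the task chain for one execution-time choice."""
    return sum(pick(bcet, wcet) for _, bcet, wcet in tasks)

bound = chain_latency(lambda b, w: w)          # deterministic abstraction: WCETs only
for _ in range(10_000):                        # sample non-deterministic behaviors
    assert chain_latency(lambda b, w: random.uniform(b, w)) <= bound

print(f"latency bound proven on the abstraction: {bound}")  # 10.5
```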

    System-level design of energy-efficient sensor-based human activity recognition systems: a model-based approach

    This thesis contributes an evaluation of state-of-the-art dataflow models of computation regarding their suitability for model-based design and analysis of human activity recognition systems, in terms of expressiveness, analyzability and model accuracy. Different aspects of state-of-the-art human activity recognition systems have been modeled and analyzed. Based on existing methods, novel analysis approaches have been developed to determine extra-functional properties such as processor utilization, data communication rates and, finally, the energy consumption of the system.
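    As a concrete illustration of the kind of extra-functional analysis meant here, the sketch below derives processor utilization and a simple energy estimate from an SDF-style model of a sensing chain. All firing counts, execution times and power numbers are invented; the thesis evaluates such dataflow-based analyses in far more detail.

```python
# Utilization and energy estimate from an SDF-style model of a
# human activity recognition chain. All numbers are illustrative.

# actor: (firings per graph iteration, worst-case execution time in ms)
actors = {"sense": (50, 0.1), "segment": (1, 2.0), "classify": (1, 8.0)}
iteration_period_ms = 1000.0      # one recognition decision per second
p_active_mw, p_idle_mw = 30.0, 1.0

busy_ms = sum(n * t for n, t in actors.values())          # 15 ms per iteration
utilization = busy_ms / iteration_period_ms
# mW * ms = microjoules: active while firing, idle for the rest
energy_uj = busy_ms * p_active_mw + (iteration_period_ms - busy_ms) * p_idle_mw

print(f"utilization = {utilization:.1%}, energy per decision = {energy_uj / 1000:.2f} mJ")
```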

    Indicating Asynchronous Array Multipliers

    Multiplication is an important arithmetic operation that is frequently encountered in microprocessor and digital signal processing applications, and it is physically realized using a multiplier. This paper discusses the physical implementation of many indicating asynchronous array multipliers, which are inherently elastic and modular and are robust to timing, process and parametric variations, using a 32/28nm CMOS technology. The weak-indication array multipliers comprise strong-indication or weak-indication full adders, and strong-indication 2-input AND functions to realize the partial products. The multipliers were synthesized in a semi-custom ASIC design style using standard library cells, including a custom-designed 2-input C-element. 4x4 and 8x8 multiplication operations were considered for the physical implementations. The 4-phase return-to-zero (RTZ) and 4-phase return-to-one (RTO) handshake protocols were utilized for data communication, and the delay-insensitive dual-rail code was used for data encoding. Among the weak-indication array multipliers considered, the one utilizing a biased weak-indication full adder and the strong-indication 2-input AND function is found to have reduced cycle time and power-cycle-time product under both RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further, 4-phase RTO handshaking is found to be preferable to 4-phase RTZ handshaking for achieving enhanced optimization of the design metrics.
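    The C-element mentioned above is the state-holding primitive of such indicating asynchronous circuits: its output rises only when all inputs are high, falls only when all are low, and otherwise holds its previous value. A minimal behavioral model follows, for intuition only; the paper's multipliers are gate-level standard-cell designs, not software.

```python
# Behavioral model of a Muller C-element (hysteresis gate).

class CElement:
    def __init__(self):
        self.out = 0              # RTZ circuits reset to the all-zero spacer state

    def step(self, *inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        # else: hold the previous value (the gate's hysteresis)
        return self.out

c = CElement()
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(f"{a}{b} -> {c.step(a, b)}")   # 00->0, 10->0, 11->1, 01->1 (hold), 00->0
```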

    On hard real-time scheduling of cyclo-static dataflow and its application in system-level design

    This dissertation addresses the problem of automatically designing hard real-time streaming systems that run a set of parallel streaming programs such that the programs provably meet their timing requirements. A scheduling framework is proposed with which it is analytically proven that any streaming program, modeled as an acyclic Cyclo-Static Dataflow (CSDF) graph, can be executed as a set of real-time periodic tasks. The framework computes the parameters of the periodic tasks corresponding to the graph actors, together with the minimum buffer sizes of the communication channels, such that a valid periodic schedule is guaranteed to exist. To demonstrate its effectiveness, a system-level design flow that incorporates the scheduling framework is proposed. This design flow accepts, as input, algorithmic sequential specifications of streaming programs and applies a set of systematic and automated steps that produce, as output, the final system implementation, which provably meets the timing requirements of the programs. The final implementation consists of the parallelized versions of the input streaming programs together with the hardware needed to run them. The proposed scheduling framework and design flow are evaluated through a set of experiments that illustrate their effectiveness.
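    The core scheduling result can be sketched concretely: every actor receives a period such that all actors complete one graph iteration in the same amount of time. The sketch below follows the general strictly periodic scheduling recipe, picking the smallest iteration period that is a multiple of the least common multiple of the repetition vector and no smaller than any actor's total work; the repetition vector and execution times are invented, and this is not the dissertation's exact algorithm.

```python
# Strictly periodic task parameters for an acyclic (C)SDF graph:
# choose periods T_a with q_a * T_a identical for all actors and T_a >= WCET_a.
from math import ceil, lcm

# repetition vector q and worst-case execution times (same actor order)
q    = {"read": 3, "proc": 2, "write": 1}
wcet = {"read": 2, "proc": 5, "write": 4}

L = lcm(*q.values())
H = L * ceil(max(q[a] * wcet[a] for a in q) / L)   # iteration period (here 12)
periods = {a: H // q[a] for a in q}                # q_a * T_a == H for every actor

assert all(periods[a] >= wcet[a] for a in q)       # each task fits in its period
print(f"iteration period H = {H}, actor periods = {periods}")
```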

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check (LDPC) decoders is approached at both the algorithmic and the architectural level. LDPC codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing the effects of finite-precision arithmetic on error correction performance and hardware complexity, and the methodology is employed throughout for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cell library. Results demonstrate an implementation loss below 0.2 dB down to a BER of 10^-8 and complexity savings of up to 59% with respect to other recent works in the literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed that requires as little as 20% of the memory of traditional two-phase flooding decoding; additionally, throughput is doubled and logic complexity is reduced by 12%. These advantages are traded off against error correction performance, making the solution attractive only for long codes, such as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to codes not specifically conceived for this technique. The proposed architectures exhibit complexity savings of around 40% in both area and power consumption, while the implementation loss is smaller than 0.05 dB.

    Most modern communication standards employ Orthogonal Frequency Division Multiplexing (OFDM) as part of their physical layer. The core of OFDM is the Fast Fourier Transform (FFT) and its inverse, in charge of symbol (de)modulation. Requirements on throughput and energy efficiency call for hardware FFT implementations, while the ubiquity of the FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for the implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operand bit-widths and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point), using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communication standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results are presented for two deep sub-micron standard-cell libraries (65 and 90 nm) and for commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and the same system-level performance (throughput, transform size and numerical accuracy).

    The final part of this dissertation focuses on the Network-on-Chip (NoC) design paradigm, whose goal is to build scalable communication infrastructures connecting hundreds of cores. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link relaxes skew constraints in clock-tree synthesis and enables frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz in a 65 nm low-leakage CMOS standard-cell library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction, technology-independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled within an object-oriented framework, integrated with a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the number of configurations to verify by up to three orders of magnitude.
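    For readers unfamiliar with check-node arithmetic, a textbook min-sum update gives a feel for the kind of computation a low-complexity check node performs: each outgoing message carries the sign-product and minimum magnitude of the other incoming messages. This is only the generic baseline; the paper's P-output check node is a different, more refined design.

```python
# Generic min-sum check-node update for LDPC decoding (textbook baseline).
import math

def check_node(msgs):
    """msgs: incoming LLRs; returns one outgoing LLR per edge."""
    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        sign = math.prod(1 if m >= 0 else -1 for m in others)
        out.append(sign * min(abs(m) for m in others))
    return out

print(check_node([2.5, -0.7, 1.3, -3.1]))   # [0.7, -1.3, 0.7, -0.7]
```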
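    The accuracy-driven configuration engine can also be pictured with a much simplified sketch: search for the smallest fixed-point word length whose signal-to-quantization-noise ratio (SQNR) still meets the user's accuracy budget. The sketch below only quantizes the FFT inputs and uses numpy's FFT as the golden model, whereas the real compiler profiles the core's internal arithmetic stage by stage across several arithmetic models; all numbers are illustrative.

```python
# Simplified accuracy-driven word-length search for a quantized FFT.
import numpy as np

def quantize(x, frac_bits):
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale           # round to the fixed-point grid

def sqnr_db(ref, test):
    err = ref - test
    return 10 * np.log10(np.sum(np.abs(ref) ** 2) / np.sum(np.abs(err) ** 2))

rng = np.random.default_rng(seed=1)
x = rng.uniform(-1, 1, 1024) + 1j * rng.uniform(-1, 1, 1024)
reference = np.fft.fft(x)                        # "golden" floating-point model

budget_db = 60.0                                 # user-given accuracy budget
for frac_bits in range(4, 25):
    xq = quantize(x.real, frac_bits) + 1j * quantize(x.imag, frac_bits)
    if sqnr_db(reference, np.fft.fft(xq)) >= budget_db:
        print(f"{frac_bits} fractional bits meet the {budget_db} dB budget")
        break
```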

    Cassini atmospheric chemistry mapper. Volume 1. Investigation and technical plan

    The Cassini Atmospheric Chemistry Mapper (ACM) enables a broad range of atmospheric science investigations for Saturn and Titan by providing high spectral and spatial resolution mapping and occultation capabilities at 3 and 5 microns. ACM can directly address the major atmospheric science objectives for Saturn and for Titan, as defined by the Announcement of Opportunity, with pivotal diagnostic measurements not accessible to any other proposed Cassini instrument. ACM determines mixing ratios for atmospheric molecules from spectral line profiles for an important and extensive volume of the atmosphere of Saturn (and Jupiter).

    Spatial and vertical profiles of disequilibrium species abundances characterize Saturn's deep atmosphere, its chemistry, and its vertical transport phenomena; ACM spectral maps thus provide a unique means to interpret atmospheric conditions in the deep (approximately 1000 bar) atmosphere of Saturn, where deep chemistry and vertical transport are inferred from the vertical and horizontal distribution of a series of disequilibrium species. Solar occultations bridge the altitude range in Saturn's (and Titan's) atmosphere that is not accessible to radio science, thermal infrared, or UV spectroscopy, yielding temperature measurements to plus or minus 2 K from the analysis of molecular line ratios and attaining high sensitivity to low-abundance chemical species through the very large column densities achieved during occultations at Saturn. For Titan, ACM solar occultations yield well-resolved (1/6 scale height) vertical mixing ratio and column abundance profiles for atmospheric molecular constituents. Occultations also allow abundant species to be detected very high in the upper atmosphere and, at greater depths, the isotopes of C and O to be detected, constraining the production mechanisms and/or sources of these species. ACM measures the vertical and horizontal distribution of aerosols via their opacity at 3 microns and, particularly, at 5 microns, and recovers spatially resolved atmospheric temperatures in Titan's troposphere via 3- and 5-micron spectral transitions. Together, the mixing ratio profiles and the aerosol distributions are used to investigate the photochemistry of the stratosphere and the consequent formation processes for aerosols. Finally, ring opacities, observed during solar occultations and in reflected sunlight, provide a measurement of the particle size and distribution of ring material.

    ACM will be the first high-spectral-resolution mapping spectrometer for atmospheric studies on an outer planet mission that retains a high-resolution spatial mapping capability, and it thus opens an entirely new range of orbital scientific studies of the origin, physico-chemical evolution and structure of the Saturn and Titan atmospheres. ACM provides high angular resolution spectral maps, viewing nadir and near-limb thermal radiation and reflected sunlight; sounds planetary limbs, spatially resolving vertical profiles to several atmospheric scale heights; and measures solar occultations, mapping both atmospheres and rings. ACM's high spectral and spatial resolution mapping capability is achieved with a simplified Fourier Transform spectrometer with a no-moving-parts, physically compact design. ACM's simplicity guarantees an inherent stability essential for reliable performance throughout the lengthy Cassini Orbiter mission.

    Response modeling: model refinements for timing analysis of runtime scheduling in real-time streaming systems


    Exploring resource/performance trade-offs for streaming applications on embedded multiprocessors

    Embedded system design is challenged by the gap between ever-increasing customer demands and limited resource budgets, while tough competition demands ever-shorter time-to-market and product lifecycles. To solve, or at least alleviate, these issues, designers and manufacturers need model-based quantitative analysis techniques for early design-space exploration, to study the trade-offs between different implementation candidates. Moreover, modern embedded applications, especially the streaming applications addressed in this thesis, face more and more dynamic input content, and the platforms they run on are more flexible and allow runtime configuration. Quantitative analysis techniques for embedded system design have to be able to handle such dynamically adaptable systems. This thesis makes the following contributions:

    - A resource-aware extension to the Synchronous Dataflow (SDF) model of computation.
    - Trade-off analysis techniques, both in the time domain and in the iteration domain (i.e., on an SDF iteration basis), with support for resource sharing.
    - Bottleneck-driven design-space exploration techniques for resource-aware SDF.
    - A game-theoretic approach to controller synthesis, guaranteeing performance under dynamic input.

    As a first contribution, we propose a new model, an extension of static synchronous dataflow (SDF) graphs, that allows the explicit modeling of resources with consistency checking. The model is called resource-aware SDF (RASDF). The extension enables us to investigate resource sharing and to explore different scheduling options (ways to allocate the resources to the different tasks) using state-space exploration techniques. Consistent SDF and RASDF graphs have the property that execution occurs in so-called iterations. An iteration typically corresponds to the processing of a meaningful piece of data and returns the graph to its initial state. On multiprocessor platforms, iterations may be executed in a pipelined fashion, which makes performance analysis challenging.

    As the second contribution, this thesis develops trade-off analysis techniques for RASDF, both in the time domain and in the iteration domain (i.e., on an SDF iteration basis), to dimension resources on platforms. The time-domain analysis allows interleaving of different iterations, but the size of the explored state space grows quickly. The iteration-based technique trades the potential of interleaving iterations for a compact iteration state space.

    An efficient bottleneck-driven design-space exploration technique for streaming applications, the third main contribution of this thesis, is derived from an analysis of the critical cycle of the state space, revealing the bottleneck resources that limit throughput. All techniques are based on state-space exploration and enable system designers to tailor their platform to the required applications, based on their own specific performance requirements. Pruning techniques for efficient exploration of the state space have been developed: Pareto dominance in terms of performance and resource usage is used for exact pruning, and approximation techniques are used for heuristic pruning.

    Finally, the thesis investigates dynamic scheduling techniques to respond to dynamic changes in input streams. The fourth contribution is a game-theoretic approach to controller synthesis that selects appropriate schedules in response to dynamic inputs from the environment. The approach transforms the explored iteration state space of a scenario- and resource-aware SDF (SARA SDF) graph into a bipartite game graph, and maps the controller synthesis problem to the problem of finding a winning positional strategy in a classical mean payoff game. A winning strategy of the game can be used to synthesize a controller of schedules for the system that is guaranteed to satisfy the throughput requirement given by the designer.
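    The exact pruning step based on Pareto dominance is easy to picture: a design point is discarded as soon as some other point uses no more resources and achieves at least the same throughput. A minimal sketch with invented numbers:

```python
# Pareto pruning of (resource usage, throughput) design points.

def pareto_front(points):
    """Keep points not dominated by any other point.
    q dominates p if q uses <= resources, achieves >= throughput, and differs."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] >= p[1] and q != p for q in points)]

candidates = [(2, 10.0), (3, 10.0), (3, 12.0), (4, 12.0), (4, 15.0)]
print(pareto_front(candidates))   # [(2, 10.0), (3, 12.0), (4, 15.0)]
```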