14 research outputs found
Heuristic Algorithms for Primitive Traversal Acceleration in Tile-Based Rasterization
Abstract. This paper presents a series of hardware algorithms that reduce the computational overhead of locating the first rasterization tile position inside the primitive to be rasterized when tile-based rasterization adopts the classical primitive traversal algorithm. The algorithms can be applied sequentially, in simple-to-complex order, to search for a suitable starting tile rasterization position inside the primitive: check whether any of the vertices is in the tile, check whether the triangle center of gravity (COG) is in the tile, recursive tile quadrant division based on COG attractors, and partial tile boundary scan. The algorithms were modeled in SystemC at the RT level and integrated into a full-fledged OpenGL-compliant hardware rasterizer SystemC model. Simulation results on a benchmark suite of 30 OpenGL applications indicate that the throughput penalty is reduced to about 7% at the expense of an increase of about 10% in hardware area when the entire OpenGL-compliant hardware rasterizer is synthesized in a commercial 0.18 µm process technology. Keywords: 3D graphics architectures; tile-based rasterization; embedded systems; digital logic design.
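The first two stages named in the abstract (vertex-in-tile, then COG-in-tile) can be illustrated with a minimal software sketch. This is not the paper's RT-level design: the function names, the `(x0, y0, width, height)` tile layout, and the half-open tile bounds are assumptions made for illustration, and the later stages (quadrant division and boundary scan) are only signalled by returning `None`.

```python
def point_in_tile(point, tile):
    """True if point lies in the tile; tile = (x0, y0, width, height), half-open bounds (assumed)."""
    x, y = point
    x0, y0, width, height = tile
    return x0 <= x < x0 + width and y0 <= y < y0 + height


def find_start_position(triangle, tile):
    """Try the two cheapest heuristics in simple-to-complex order.

    Returns (stage_name, point) on success, or None when the more complex
    stages (not modeled here) would have to take over.
    """
    # Stage 1: any triangle vertex inside the tile is a valid starting point.
    for vertex in triangle:
        if point_in_tile(vertex, tile):
            return ("vertex", vertex)

    # Stage 2: the center of gravity always lies inside the triangle,
    # so if it falls in the tile it can serve as the starting point.
    cog = (
        sum(v[0] for v in triangle) / 3.0,
        sum(v[1] for v in triangle) / 3.0,
    )
    if point_in_tile(cog, tile):
        return ("cog", cog)

    return None  # fall through to quadrant division / boundary scan
```

Ordering the checks this way means the cheap comparisons resolve the common cases, and the costlier searches run only when both fail.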
High-Level Energy Estimation for ARM-Based SOCs
Abstract. In recent years, power consumption has become a critical concern for many VLSI systems. Whereas several case studies demonstrate that technology-, layout-, and gate-level techniques offer power savings of a factor of two or less, architecture- and system-level optimization can often result in orders of magnitude lower power consumption. Therefore, the energy-efficient design of portable, battery-powered systems demands an early assessment, i.e., at the algorithmic and architectural levels, of the power consumption of the applications they target. Addressing this issue, we developed an energy-aware architectural design exploration and analysis tool for ARM-based system-on-chip designs. The tool integrates the behavior and energy models of several user-defined, custom processing units as an extension to the ARMulator, the cycle-accurate instruction-level simulator for the ARM low-power processor family. The models we implemented take into account the particular class of the hardware unit involved (e.g., datapath, memory, control, or interconnect), its architectural complexity, and the signal activity triggered by the specific algorithm executed on the ARM processor. Our tool can estimate the overall energy consumption at the architectural level of detail or report the energy breakdown among different units. Preliminary experiments indicated that the estimation accuracy is within 25% of what can be accomplished after a circuit-level simulation on the laid-out chip.
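As a rough illustration of how unit class, architectural complexity, and signal activity might combine into an overall figure and a per-unit breakdown, consider the sketch below. The multiplicative model, the coefficient values, and every name in it are hypothetical placeholders; the abstract does not give the tool's actual energy models or its ARMulator interface.

```python
# Hypothetical per-class coefficients in arbitrary energy units
# (placeholders for illustration, not values from the paper).
CLASS_COEFFICIENT = {
    "datapath": 1.0,
    "memory": 2.0,
    "control": 0.5,
    "interconnect": 1.5,
}


def estimate_energy(units):
    """Return (total, breakdown) for a list of (name, unit_class, complexity, activity).

    Assumed model: energy = class coefficient * complexity * activity count.
    """
    breakdown = {}
    for name, unit_class, complexity, activity in units:
        breakdown[name] = CLASS_COEFFICIENT[unit_class] * complexity * activity
    return sum(breakdown.values()), breakdown
```

For example, `estimate_energy([("mac", "datapath", 4, 10), ("sram", "memory", 2, 5)])` yields a total together with one entry per unit, mirroring the overall and breakdown reports the abstract describes.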
An Energy-Aware Architectural Exploration Tool for ARM-Based SOCs
Abstract. In recent years, power consumption has become a critical concern for many VLSI systems. Whereas several case studies demonstrate that technology-, layout-, and gate-level techniques offer power savings of a factor of two or less, architecture and system-level optimization can often result in orders of magnitude lower power consumption. Therefore, the energy-efficient design of portable, battery-powered systems demands an early assessment, i.e., at the algorithmic and architectural levels, of the power consumption of the applications they target. Addressing this issue, we developed an energy-aware architectural design exploration and analysis tool for ARM-based system-on-chip designs. The tool integrates the behavior and energy models of several user-defined, custom processing units as an extension to the cycle-accurate instruction-level simulator for the ARM low-power processor family, called the ARMulator. The models we implemented take into account the particular class, e.g., datapath, memory, control, or interconnect, as well as the architectural complexity of the hardware unit involved and the signal activity triggered by the specific algorithm executed on the ARM processor. Our tool can estimate at the architectural level of detail the overall energy consumption or can report the energy breakdown among different units. Preliminary experiments indicated that the estimation accuracy is within 25% of what can be accomplished after a circuit-level simulation on the laid-out chip. Keywords: ARM CPU core; system-on-chip; ARMulator; energy-aware architectural exploration; battery-powered system.
LOW COST AND LATENCY EMBEDDED 3D GRAPHICS RECIPROCATION
This paper presents a low-cost, low-latency reciprocation scheme for the fixed-point datapath of embedded 3D graphics accelerators. The algorithm exploits the limitations of the human visual system, which allow a reasonable amount of error to be introduced in the computation without inducing noticeable image artifacts. In the example given in the paper, excerpted from the antialiasing datapath of an embedded QVGA graphics hardware accelerator, the reciprocal implementation for a 14-bit operand requires an inexpensive operand prescaler, one 1k lookup table with 10-bit entries, and a 5-bit adder, for a maximum relative error of the result of only 1.5% over the entire range of the operand. Hardware synthesis in a typical 0.18 µm process technology indicated that the hardware implementation requires only 1600 standard cells to achieve a latency of 2.5 ns.
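The general idea of trading a small error for a cheap table lookup can be sketched in software. The code below is a generic normalize-and-lookup reciprocal, not the paper's datapath: treating the prescaler as a full normalizing shift, reading "1k" as 1024 entries, using midpoint table entries, and omitting the 5-bit adder are all assumptions, so its error behavior should not be read as the paper's 1.5% result.

```python
OPERAND_BITS = 14   # operand width from the abstract
INDEX_BITS = 10     # 1024 table entries (assumed reading of "1k")
ENTRY_BITS = 10     # entry width from the abstract


def build_lut():
    """Table of 1/m for mantissas m in [1, 2), quantized to ENTRY_BITS bits."""
    max_entry = (1 << ENTRY_BITS) - 1
    lut = []
    for i in range(1 << INDEX_BITS):
        mantissa = 1.0 + (i + 0.5) / (1 << INDEX_BITS)  # midpoint of bucket i
        lut.append(min(max_entry, round((1 << ENTRY_BITS) / mantissa)))
    return lut


LUT = build_lut()


def reciprocal(x):
    """Approximate 1/x for an integer x with 0 < x < 2**OPERAND_BITS."""
    if not 0 < x < (1 << OPERAND_BITS):
        raise ValueError("operand out of range")
    # Prescale: shift left so the leading one sits in the top operand bit.
    shift = OPERAND_BITS - x.bit_length()
    normalized = x << shift
    # The INDEX_BITS bits just below the leading one select the table entry.
    index = (normalized >> (OPERAND_BITS - 1 - INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    # x = m * 2**(OPERAND_BITS - 1 - shift), so 1/x = (1/m) * 2**(shift - OPERAND_BITS + 1).
    return LUT[index] / (1 << ENTRY_BITS) * 2.0 ** (shift - (OPERAND_BITS - 1))
```

Normalizing first keeps the table small, because only the mantissa range [1, 2) has to be tabulated regardless of the operand's magnitude.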
High-Level Energy Estimation for ARM-Based SOCs
In recent years, power consumption has become a critical concern for many VLSI systems. Whereas several case studies demonstrate that technology-, layout-, and gate-level techniques offer power savings of a factor of two or less, architecture and system-level optimization can often result in orders of magnitude lower power consumption. Therefore, the energy-efficient design of portable, battery-powered systems demands an early assessment, i.e., at the algorithmic and architectural levels, of the power consumption of the applications they target. Addressing this issue, we developed an energy-aware architectural design exploration and analysis tool for ARM-based system-on-chip designs. The tool integrates the behavior and energy models of several user-defined, custom processing units as an extension to the cycle-accurate instruction-level simulator for the ARM low-power processor family, called the ARMulator. The models we implemented take into account the particular class, e.g., datapath, memory, control, or interconnect, as well as the architectural complexity of the hardware unit involved and the signal activity triggered by the specific algorithm executed on the ARM processor. Our tool can estimate at the architectural level of detail the overall energy consumption or can report the energy breakdown among different units. Preliminary experiments indicated that the estimation accuracy is within 25% of what can be accomplished after a circuit-level simulation on the laid-out chip.
Efficient Hardware for Tile-Based Rasterization
Abstract. An efficient logic-enhanced memory architecture is presented that solves existing problems associated with 3D graphics tile-based hardware rasterization algorithms. The memory contains as many bits as there are pixels in the tile. During rasterization, a systolic primitive scan-conversion subsystem fills it in several clock cycles with the stencil of the primitive: ones are written to memory locations that represent tile pixels covered by the primitive, and zeros are stored elsewhere. Once the shape of the primitive has been encoded in the memory, the memory's internal logic can deliver, on request, up to four hit positions (tile positions inside the primitive) per clock cycle to the pixel processing pipelines, and it signals when all hit positions have been consumed. With the proposed memory architecture, no search overhead is needed to find the first hit position inside a primitive. Furthermore, "ghost" primitives are handled efficiently with a small constant delay irrespective of the primitive size. Finally, hit positions (communicated in a spatial pattern to increase texture cache hit ratios) can always be mapped to different memory banks in the Z-buffer or color buffer, breaking the "read-modify-write" dependency associated with depth test and color blending. Hardware implementation in a commercial 0.18 µm process technology for a QVGA 3D graphics hardware accelerator with a tile size of 32 × 16 pixels indicated that the memory can be clocked at 200 MHz and occupies an area of 120,000 µm².
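A behavioral sketch of the hit-position memory described above may help fix the idea. It is a software model under stated assumptions, not the hardware design: the class and method names are hypothetical, the stencil is written through a coverage callback instead of a systolic scan-conversion subsystem, and hits are delivered in plain row-major order because the abstract does not specify the spatial pattern or the bank mapping.

```python
class HitMemory:
    """One bit per tile pixel; a set bit marks a pixel covered by the primitive."""

    def __init__(self, width=32, height=16):
        self.width = width
        self.height = height
        self.bits = [[0] * width for _ in range(height)]

    def write_stencil(self, covered):
        """Store 1 where covered(x, y) is true, 0 elsewhere."""
        for y in range(self.height):
            for x in range(self.width):
                self.bits[y][x] = 1 if covered(x, y) else 0

    def next_hits(self, max_hits=4):
        """Return (hits, done): up to max_hits positions, consumed as they are read.

        done is True once no hit positions remain, which also covers
        "ghost" primitives that set no bits at all.
        """
        hits = []
        for y in range(self.height):
            for x in range(self.width):
                if self.bits[y][x]:
                    self.bits[y][x] = 0  # consume the hit
                    hits.append((x, y))
                    if len(hits) == max_hits:
                        remaining = any(any(row) for row in self.bits)
                        return hits, not remaining
        return hits, True
```

A pixel pipeline model would call `next_hits()` once per simulated clock cycle until `done` is reported; a primitive that covers no pixel of the tile returns an empty list with `done` set on the first request.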