Ultrafast Temperature Profile Calculation in IC Chips
One of the crucial steps in the design of an integrated circuit is the minimization of heating and temperature non-uniformity. Current temperature calculation methods, such as finite element analysis (FEA) and resistor networks, have considerable computation times, making them unsuitable for use in routing and placement optimization algorithms. To reduce the computation time, we have developed a new method, termed power blurring, that calculates temperature distributions using a matrix convolution technique, in analogy with image blurring. For steady-state analysis, power blurring predicted hot-spot temperatures to within 1 degree C with computation times three orders of magnitude shorter than FEA. For transient analysis, the computation times were reduced by a factor of 1000 for a single pulse and by around 100 for multiple-frequency excitation, while still predicting hot-spot temperatures to within about 1 degree C. The main strength of the power blurring technique is that it exploits the dominant heat spreading in the silicon substrate and uses the superposition principle. With one or two finite element simulations, the temperature point spread function for a sophisticated package can be calculated. Additional simulations can be used to improve the accuracy of the point spread function at different locations on the chip. In this calculation, we considered the dominant heat transfer path through the back of the IC chip and the heat sink. Heat transfer from the top of the chip through the metallization layers and the board is usually a small fraction of the total heat dissipation and is neglected in this analysis.
Comment: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions)
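The core of power blurring is a single 2-D convolution of the power-density map with a thermal point spread function obtained from one or two FEA runs. A minimal NumPy sketch under assumed units; the exponential PSF here is purely illustrative, not a fitted package response:

```python
import numpy as np

def temperature_power_blurring(power, psf):
    """Temperature rise via power blurring: 2-D convolution of the
    power map with the thermal point spread function (PSF); valid
    because steady-state heat conduction is linear (superposition)."""
    H, W = power.shape
    ph, pw = psf.shape
    fh, fw = H + ph - 1, W + pw - 1          # full-convolution size
    T = np.fft.irfft2(np.fft.rfft2(power, (fh, fw)) *
                      np.fft.rfft2(psf, (fh, fw)), (fh, fw))
    r0, c0 = ph // 2, pw // 2                # crop to 'same' (PSF centred)
    return T[r0:r0 + H, c0:c0 + W]

# Illustrative PSF: radially decaying spread in the substrate (assumed, not fitted)
y, x = np.mgrid[-7:8, -7:8]
psf = np.exp(-np.hypot(x, y) / 2.0)

power = np.zeros((32, 32))
power[10, 20] = 1.0                          # a single hot spot
T = temperature_power_blurring(power, psf)   # peak lands at the hot spot
```

Because the whole map is one FFT-based convolution, the cost is O(N log N) in the number of grid cells, which is where the orders-of-magnitude speedup over a full FEA solve comes from.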
NASA Space Engineering Research Center for VLSI System Design
This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries for producing flight electronics; the first two flight-qualified chips were designed, fabricated, and tested and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems.
Analogue neuromorphic systems.
This thesis addresses a new area of science and technology, that of neuromorphic
systems, namely the problems and prospects of analogue neuromorphic systems. The
subject is subdivided into three chapters.
Chapter 1 is an introduction. It formulates the emerging problem of creating highly computationally costly systems for nonlinear information processing (such as artificial neural networks and artificial intelligence systems), and shows that analogue technology could make a vital contribution to the creation of such systems. The basic principles for creating analogue neuromorphic systems are formulated, and the importance of the principle of orthogonality for future highly efficient complex information processing systems is emphasised.
Chapter 2 reviews the basics of neural and neuromorphic systems and surveys the present situation in this field of research, including both experimental and theoretical knowledge gained to date. The chapter provides the necessary background for a correct interpretation of the results reported in Chapter 3 and for a realistic decision on the direction of future work.
Chapter 3 describes my own experimental and computational results within the framework of the subject, obtained at De Montfort University. These include the building of (i) an analogue polynomial approximator/interpolator/extrapolator, (ii) a synthesiser of orthogonal functions, (iii) an analogue real-time video filter (performing homomorphic filtration), (iv) an adaptive polynomial compensator of geometrical distortions of CRT monitors, and (v) an analogue parallel-learning neural network (backpropagation algorithm).
Thus, this thesis makes a dual contribution to the chosen field: it summarises the present knowledge on the possibility of utilising analogue technology in current and future computational systems, and it reports new results within the framework of the subject. The main conclusion is that, due to their promising power characteristics, small size, and high tolerance to degradation, analogue neuromorphic systems will play an increasingly important role in future computational systems (in particular in systems of artificial intelligence).
Method of Images for the Fast Calculation of Temperature Distributions in Packaged VLSI Chips
Thermal-aware routing and placement algorithms are important in industry. Currently, there are reasonably fast Green's function based algorithms that calculate the temperature distribution in a chip made from a stack of different materials. However, the layers are all assumed to have the same size, neglecting the important fact that the thermal mounts placed underneath the chip can be significantly larger than the chip itself. In an earlier publication, we showed that the image blurring technique can be used to quickly calculate the temperature distribution in realistic packages. For this method to be effective, the temperature distributions for several point heat sources at the center, corners, and edges of the chip must be calculated using finite element analysis (FEA) or measured. In addition, more accurate results require correction by a weighting function, which needs several additional FEA simulations. In this paper, we introduce the method of images, which takes the symmetry of the thermal boundary conditions into account. Thus, with only "two" finite element simulations, the steady-state temperature distribution for an arbitrarily complex power dissipation profile in a packaged chip can be calculated. Several simulation results are presented. It is shown that the power blurring technique together with the method of images can reproduce the temperature profile with an error of less than 0.5%.
Comment: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions)
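The idea behind the method of images can be sketched in a few lines: for adiabatic (insulated) lateral boundaries, every heat source acquires mirror images reflected across the chip edges, and the bounded-domain temperature is the superposition of unbounded-medium responses of the source and its images. A hypothetical NumPy sketch, in which the even mirror extension and the free-space PSF are illustrative assumptions rather than the paper's package model:

```python
import numpy as np

def mirror_extend(power):
    """3x3 tiling of the power map with even (mirror) reflections.
    The reflected copies act as image sources, enforcing a no-flux
    (adiabatic) condition at each chip edge."""
    ud = np.flipud(power)        # images across the top/bottom edges
    lr = np.fliplr(power)        # images across the left/right edges
    corner = np.flipud(lr)       # corner images (both reflections)
    return np.block([[corner, ud,    corner],
                     [lr,     power, lr],
                     [corner, ud,    corner]])

def temperature_with_images(power, psf_free):
    """Convolve the mirror-extended map with an unbounded-medium PSF,
    then crop back to the chip region (source + image superposition)."""
    ext = mirror_extend(power)
    ph, pw = psf_free.shape
    fh, fw = ext.shape[0] + ph - 1, ext.shape[1] + pw - 1
    T = np.fft.irfft2(np.fft.rfft2(ext, (fh, fw)) *
                      np.fft.rfft2(psf_free, (fh, fw)), (fh, fw))
    H, W = power.shape
    r0, c0 = H + ph // 2, W + pw // 2    # chip block within the 3x3 tiling
    return T[r0:r0 + H, c0:c0 + W]

y, x = np.mgrid[-7:8, -7:8]
psf_free = np.exp(-np.hypot(x, y) / 2.0)  # illustrative free-space PSF
power = np.zeros((16, 16))
power[8, 8] = 1.0                         # interior hot spot
T = temperature_with_images(power, psf_free)
```

One level of reflection suffices when the PSF decays within one chip width; otherwise the tiling would be repeated further out, which is the trade-off the symmetry argument in the abstract replaces with just two FEA runs.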
NASA Space Engineering Research Center for VLSI systems design
This annual review reports the center's activities and findings on very large scale integration (VLSI) systems design for 1990, including project status, financial support, publications, the NASA Space Engineering Research Center (SERC) Symposium on VLSI Design, research results, and outreach programs. Processor chips completed or under development are listed. Research results summarized include a design technique to harden complementary metal oxide semiconductor (CMOS) memory circuits against single event upset (SEU); improved circuit design procedures; and advances in computer aided design (CAD), communications, computer architectures, and reliability design. Also described is a high school teacher program that exposes teachers to the fundamentals of digital logic design.
Energy efficient hardware acceleration of multimedia processing tools
The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents of modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores.
To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature.
The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early-exit mechanism that achieves large search-space reductions. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.
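The appeal of treating a transform as a CMM instance is that each multiplication by a known constant can be replaced by shifts and adds; the space of such decompositions is what a genetic-programming style optimiser searches over. A minimal sketch of one standard decomposition, canonical signed-digit (CSD) recoding, shown here only as an illustration and not as the thesis's actual algorithm:

```python
def csd_digits(c):
    """Canonical signed-digit recoding of a positive integer constant:
    returns digits d_i in {-1, 0, +1} with c == sum(d_i * 2**i) and no
    two adjacent nonzero digits, which minimises the adder count."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)      # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def multiply_by_constant(x, c):
    """c * x realised as shifts and adds/subtracts - the form a CMM
    hardware generator would emit instead of a full multiplier."""
    return sum(d * (x << i) for i, d in enumerate(csd_digits(c)))
```

In hardware terms each nonzero digit costs one adder or subtractor plus wiring for the shift, and CSD guarantees roughly half the bit positions are nonzero; a CMM optimiser goes further by sharing intermediate shift-add terms across all the constants in the matrix.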
A PC/AT-based ICT image archiving system.
by Ringo Wai-kit Lam. Thesis (M.Phil.), Chinese University of Hong Kong, 1991. Includes bibliographical references.
Contents: Chapter 1, Introduction (transform coding theory: image transform coder and decoder, transformation, bit allocation, quantization, entropy coding, error of transform coding). Chapter 2, 2D Integer Cosine Transform Chip Set (the integer cosine transform (ICT), LSI implementation, design considerations, architecture, the 2D ICT system). Chapter 3, A PC/AT-Based Image Archiving System (design considerations, storage format of the coded image, hardware architecture, software structure). Chapter 4, System Performance Evaluation (image display results, computation time requirements, comparison to other transform chips and image transform systems). Chapter 5, Conclusion (further development: JPEG scheme, ICT chip set; summary of the image archiving system). References. Appendix.
Implementation of JPEG compression and motion estimation on FPGA hardware
A hardware implementation of JPEG allows for real-time compression in data-intensive applications, such as high-speed scanning, medical imaging, and satellite image transmission. Implementation options include dedicated DSP or media processors, FPGA boards, and ASICs. Factors that affect the choice of platform include cost, speed, memory, size, power consumption, and ease of reconfiguration. The proposed hardware solution is based on a Very high speed integrated circuit Hardware Description Language (VHDL) implementation of the codec, with the preferred realization being an FPGA board due to speed, cost, and flexibility factors. The VHDL language is commonly used to model hardware implementations from a top-down perspective. The VHDL code may be simulated to correct mistakes and subsequently synthesized into hardware using a synthesis tool, such as the Xilinx ISE suite. The same VHDL code may be synthesized into a number of different hardware architectures based on the constraints given. For example, speed was the major constraint when synthesizing the pipeline of JPEG encoding and decoding, while chip area and power consumption were the primary constraints when synthesizing the on-die memory because of its large area. Thus, there is a trade-off between area and speed in logic synthesis.
Concurrent error detection in 2-D separable linear transform
As process technology continues to scale to smaller geometries and lower supply voltages, the reliability of the resulting semiconductors becomes a greater concern. The effects of deep-submicron noise, soft errors, variation, and aging degradation pose challenges to the functional correctness of VLSI systems and place roadblocks on further scaling. On the other hand, as computing moves toward mobile, the energy efficiency of digital systems becomes one of the most important design metrics. However, reliability and energy efficiency are contradictory design requirements. Adding a voltage guard band is the most common method to mitigate the reliability impact in such instances. Low-power design techniques like voltage over-scaling (VOS) go further, reducing power by scaling the supply voltage to just before data-dependent timing errors start to appear. Concurrent error detection is a solution that tackles reliability and energy efficiency in a unified manner. Fault tolerance can be deployed at different design hierarchies; given its low overhead, algorithm-level error detection is an attractive approach. In this work, a generic weighted checksum code based error detection algorithm targeting the generic 2-D separable linear transform is proposed. This technique encodes the input array at the 2-D linear transformation level, and algorithms are designed to operate on encoded data and produce encoded output data. The proposed error detection technique is a system-level method and can therefore be used in existing hardware or software 2-D linear transformation architectures with low overhead. The mathematical proof of the algorithm is provided within the scope of this dissertation. The checksum weighting vectors for several common transforms are derived as examples, and the error detection cost and algorithm effectiveness are analyzed. In traditional fault tolerance studies, errors are often evaluated at the Boolean level.
Many DSP applications, like the 2-D linear transformation used in multimedia compression systems, do not require exactly correct results, but rather that the quality of the output be within an acceptable range. A generic quality-aware error detection scheme for the 2-D separable linear transform is proposed by extending the above property and defining errors at the functional level. As an example, the quality-aware error detection technique is deployed on a low-power wavelet lifting transform architecture in JPEG2000. A low-cost Signal-to-Noise Ratio (SNR) aware detection logic based on the proposed scheme is integrated into the discrete wavelet lifting transform architecture. This detection logic checks whether the image quality degradation caused by voltage over-scaling induced timing errors is acceptable and determines the optimal voltage set point under the operating conditions at run time. This novel quality-based error detection approach is significantly different from traditional error detection schemes, which look for exact data equivalence. A simulation result for one design shows that the supply voltage can be scaled down to 75% of the nominal voltage in a typical process corner without significant image quality degradation, which translates to 9.15 mW power consumption (a 44% power saving).
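The weighted-checksum idea for a separable transform Y = A·X·B rests on associativity: the checksum of the output, w·Y, can be predicted from the input at vector cost, since w(AXB) = ((wA)X)B. A generic ABFT-style sketch, in which the weights and matrices are illustrative and the dissertation's specific encodings are not reproduced:

```python
import numpy as np

def separable_transform(A, X, B):
    """Generic 2-D separable linear transform: Y = A @ X @ B
    (row transform A, column transform B; e.g. a 2-D DCT)."""
    return A @ X @ B

def checksum_ok(A, X, B, Y, w, tol=1e-8):
    """ABFT-style weighted checksum. By associativity,
    w @ (A @ X @ B) == ((w @ A) @ X) @ B, so the expected output
    checksum is computable from the input with only vector-matrix
    products; compare it against the checksum of the actual output."""
    v = w @ A                    # can be precomputed offline
    expected = (v @ X) @ B       # O(n^2) work instead of O(n^3)
    observed = w @ Y             # checksum of the produced output
    return np.allclose(observed, expected, atol=tol)

rng = np.random.default_rng(0)
n = 8
A, B, X = rng.standard_normal((3, n, n))
w = np.ones(n)                   # unit weights; weighted variants localise errors
Y = separable_transform(A, X, B)
Y_bad = Y.copy()
Y_bad[2, 5] += 0.5               # inject a single-element fault
```

A fault-free Y passes the check while the corrupted Y_bad fails it; with several independent weight vectors the faulty row or column can also be located, at modestly higher cost.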