Introduction to computer image processing
Theoretical backgrounds and digital techniques for a class of image processing problems are presented. Image formation in the context of linear system theory, image evaluation, noise characteristics, and mathematical operations on images and their implementation are discussed. Various techniques for image restoration and image enhancement are presented. Methods for object extraction and the problems of pictorial pattern recognition and classification are discussed
Systems analysis of the space shuttle
Developments in communications systems, computer systems, and power distribution systems for the space shuttle are described. The use of high speed delta modulation for bit rate compression in the transmission of television signals is discussed. Simultaneous Multiprocessor Organization, an approach to computer organization, is presented. Methods of computer simulation and automatic malfunction detection for the shuttle power distribution system are also described
Future Computer Requirements for Computational Aerodynamics
Recent advances in computational aerodynamics are discussed as well as motivations for and potential benefits of a National Aerodynamic Simulation Facility having the capability to solve fluid dynamic equations at speeds two to three orders of magnitude faster than presently possible with general computers. Two contracted efforts to define processor architectures for such a facility are summarized
Analysis and design of parallel algorithms
The present state of electronic technology is such that the factors affecting computation speed have almost been minimised; switching, for instance, is almost instantaneous. Electronic components are so good, in fact, that the time taken for a logic signal to travel between two points is now a significant fraction of instruction times.

Clearly, with the actual physical size of components being very small and circuit density high, there is little scope for improving computation speed significantly by such means as even denser circuitry or still faster electronic components. Thus, the development of faster computers will require a new approach that depends on the imaginative use of existing knowledge.

One such approach is to increase computation speed through parallelism. Obviously, a parallel computer with p identical processors is potentially p times as fast as a single processor, although this limit can rarely be achieved
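The claim that the p-fold limit is rarely achieved is formalised by Amdahl's law, which the abstract does not name but which captures the point: any serial fraction of the work caps the attainable speedup. A minimal sketch:

```python
def amdahl_speedup(p, serial_fraction):
    """Speedup on p identical processors when serial_fraction of the
    work cannot be parallelised (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

print(amdahl_speedup(8, 0.0))  # ideal case: the full factor of 8.0
print(amdahl_speedup(8, 0.1))  # ~4.7 once 10% of the work is serial
```

Even a small serial fraction dominates as p grows: with 5% serial work, no number of processors yields a speedup of 20.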
Research in computerized structural analysis and synthesis
Computer applications in dynamic structural analysis and structural design modeling are discussed
Ames Research Center publications: A continuing bibliography, 1978
This bibliography lists formal NASA publications, journal articles, books, chapters of books, patents and contractor reports issued by Ames Research Center which were indexed by Scientific and Technical Aerospace Abstracts, Limited Scientific and Technical Aerospace Abstracts, and International Aerospace Abstracts in 1978. Citations are arranged by directorate, type of publication and NASA accession numbers. Subject, personal author, corporate source, contract number, and report/accession number indexes are provided
Timing Analysis and Optimization in Logic and Physical Synthesis
Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, August 2020. Advisor: Taewhan Kim.

Timing analysis is one of the necessary steps in the development of a semiconductor circuit, and it is increasingly important in advanced process technologies due to various factors, including the growth of process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analyses today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. In reality, however, setup and hold skews affect the clock-to-Q delay. In this dissertation, I propose a mathematical formulation to solve this problem and apply it, together with a scalable speedup technique, to clock skew scheduling as well as to the analysis of a given circuit. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity with respect to process variations block its proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, asynchronous circuits are one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the problem of synthesizing lightweight two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing correct handshaking operation but incur a considerable area increase
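The flexible flip-flop timing model summarized in the abstract above rests on the observation that clock-to-Q delay varies with setup skew rather than being a constant. As an illustration only — the characterization numbers and the interpolation scheme below are invented, not taken from the dissertation — a sketch of the contrast with the fixed model:

```python
import numpy as np

# Hypothetical characterization table: clock-to-Q delay grows as the
# data arrival edge approaches the flip-flop's setup limit. All
# numbers are invented for illustration; a real table would come
# from circuit-level characterization.
setup_skew_ps = np.array([50.0, 30.0, 20.0, 12.0, 10.0])  # slack before the limit
clk_to_q_ps = np.array([80.0, 81.0, 84.0, 95.0, 110.0])

def flexible_clk_to_q(skew_ps):
    """Clock-to-Q delay as a function of setup skew, in contrast to
    the fixed model, which would return one constant (e.g. 80 ps)."""
    # np.interp needs an ascending x axis, so negate the descending skew axis.
    return float(np.interp(-skew_ps, -setup_skew_ps, clk_to_q_ps))
```

With ample setup slack the fixed and flexible models agree; near the setup limit the fixed model underestimates the delay, which is what makes the skew-aware formulation matter for clock skew scheduling.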
Parallel processors and nonlinear structural dynamics algorithms and software
Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs
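The element-blocking scheme described above can be sketched in NumPy terms. This is a hypothetical miniature with 1-D two-node spring elements standing in for the paper's 3-D one-point-quadrature shell elements; all names are illustrative:

```python
import numpy as np

def internal_force(u, conn, k, block=64):
    """Assemble the internal force vector for 1-D two-node spring
    elements, processing elements block by block so that the work
    inside each block runs in vector mode."""
    f = np.zeros_like(u)
    for start in range(0, conn.shape[0], block):
        c = conn[start:start + block]           # (b, 2) node indices
        stretch = u[c[:, 1]] - u[c[:, 0]]       # vectorized over the block
        fe = k[start:start + block] * stretch   # element axial forces
        np.add.at(f, c[:, 0], -fe)              # scatter-add into the
        np.add.at(f, c[:, 1], fe)               # global force vector
    return f
```

Each block is also a natural unit of concurrency: blocks can be handed to separate processors as tasks, since the scatter-add into the global vector is the only shared-write step.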
Some aspects of the efficient use of multiprocessor control systems
Computer technology, particularly at the circuit level, is fast approaching its physical limitations. As future needs for greater power from computing systems grow, increases in circuit switching speed (and thus instruction speed) will be unable to match these requirements.

Greater power can also be obtained by incorporating several processing units into a single system. This ability to increase the performance of a system by the addition of processing units is one of the major advantages of multiprocessor systems. Four major characteristics of multiprocessor systems have been identified (28) which demonstrate their advantage. These are:

Throughput
Flexibility
Availability
Reliability

The additional throughput obtained from a multiprocessor has been mentioned above. This increase in the power of the system can be obtained in a modular fashion, with extra processors being added as greater processing needs arise. The addition of extra processors also has (in general) the desirable advantage of giving a smoother cost-performance curve (63). Flexibility is obtained from the increased ability to construct a system matching the user's requirements at a given time without placing restrictions upon future expansion. With multiprocessor systems, the potential also exists of making greater use of the resources within the system.

Availability and reliability are inter-related. Increased availability is achieved, in a well designed system, by ensuring that processing capabilities can be provided to the user even if one (or more) of the processing units has failed. The service provided, however, will probably be degraded due to the reduction in processing capacity. Increased reliability is obtained from the ability of the processing units to compensate for the failure of one of their number. This recovery may involve complex software checks and a consequent decrease in available power even when all the units are functioning
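The availability argument above can be made quantitative under a standard independence assumption. The formula is textbook reliability theory rather than anything from this abstract, and the names and numbers are illustrative:

```python
from math import comb

def availability(unit_avail, n_units, n_required=1):
    """Probability that at least n_required of n_units identical,
    independent processing units are up at any given time."""
    a = unit_avail
    return sum(comb(n_units, k) * a**k * (1 - a) ** (n_units - k)
               for k in range(n_required, n_units + 1))

print(availability(0.95, 1))  # single unit: 0.95
print(availability(0.95, 4))  # any one of four suffices: ~0.999994
```

The second figure illustrates the degraded-service point: the system as a whole is almost always available, but in the failure cases it runs with fewer than four units, i.e. at reduced capacity.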