The University of Maine

DigitalCommons@UMaine
Electronic Theses and Dissertations

Fogler Library

5-2006

High-Speed Digital and Mixed-Signal Components for X– and
KU–Band Direct Digital Synthesizers in Indium Phosphide DHBT
Technology
Steven Eugene Turner

Follow this and additional works at: https://digitalcommons.library.umaine.edu/etd
Part of the Computer and Systems Architecture Commons, and the Electronic Devices and
Semiconductor Manufacturing Commons

Recommended Citation
Turner, Steven Eugene, "High-Speed Digital and Mixed-Signal Components for X– and KU–Band Direct
Digital Synthesizers in Indium Phosphide DHBT Technology" (2006). Electronic Theses and Dissertations.
965.
https://digitalcommons.library.umaine.edu/etd/965

This Open-Access Dissertation is brought to you for free and open access by DigitalCommons@UMaine. It has
been accepted for inclusion in Electronic Theses and Dissertations by an authorized administrator of
DigitalCommons@UMaine. For more information, please contact um.library.technical.services@maine.edu.

HIGH–SPEED DIGITAL AND MIXED–SIGNAL COMPONENTS
FOR X– AND KU–BAND DIRECT DIGITAL SYNTHESIZERS IN
INDIUM PHOSPHIDE DHBT TECHNOLOGY
By
Steven Eugene Turner
B.S. University of Maine, 2001
M.S. University of Maine, 2003
A THESIS
Submitted in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
(in Electrical Engineering)
The Graduate School
The University of Maine
May, 2006

Advisory Committee:
David E. Kotecki, Associate Professor of Electrical and Computer Engineering,
Advisor
Donald M. Hummels, Castle Professor of Electrical and Computer Engineering
Richard O. Eason, Associate Professor of Electrical and Computer Engineering
Bruce E. Segee, Associate Professor of Electrical and Computer Engineering
Ali Abedi, Assistant Professor of Electrical and Computer Engineering

LIBRARY RIGHTS STATEMENT
In presenting this thesis in partial fulfillment of the requirements for an advanced
degree at The University of Maine, I agree that the Library shall make it freely available
for inspection. I further agree that permission for “fair use” copying of this thesis for
scholarly purposes may be granted by the Librarian. It is understood that any copying
or publication of this thesis for financial gain shall not be allowed without my written
permission.

Signature:

Date:

HIGH–SPEED DIGITAL AND MIXED–SIGNAL COMPONENTS
FOR X– AND KU–BAND DIRECT DIGITAL SYNTHESIZERS IN
INDIUM PHOSPHIDE DHBT TECHNOLOGY

By Steven Eugene Turner
Thesis Advisor: Dr. David E. Kotecki
An Abstract of the Thesis Presented
in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy
(in Electrical Engineering)
May, 2006
Recently reported double heterojunction bipolar transistor (DHBT) devices manufactured
in Indium Phosphide (InP) technology with ft and fmax both over 300 GHz enable
advanced high-speed digital and mixed-signal circuits. In this thesis, the use of InP
DHBT devices for high-speed accumulator circuits and X– and Ku–band direct digital
synthesizer (DDS) circuits is investigated. At these frequencies, new technological
challenges in the design of digital and mixed-signal circuits arise in areas including
power consumption and clock distribution. This thesis addresses the speed/power
tradeoffs in high-speed accumulator designs, the design of DDS circuits, and clock
distribution simulation. The results of six accumulator circuits and two DDS circuits are
reported as part of this thesis. The fastest 4-bit accumulator at a 41 GHz clock rate is
reported, as well as the fastest DDS circuits operating at 13 GHz and 32 GHz clock rates.
The 13 GHz DDS has a worst-case spurious-free dynamic range (SFDR) of 26.67 dBc
and consumes 5.42 W of power, while the 32 GHz DDS has a worst-case SFDR of
21.56 dBc and consumes 9.45 W of power. In addition to the circuit designs, a
methodology for simulating electrically long clock interconnects and a new figure of
merit for comparing DDS designs are developed.

ACKNOWLEDGMENTS
This work was supported by the U.S. Army Research Laboratory and by the
Defense Advanced Research Projects Agency (DARPA) under Contract DAAD17-02C-0115.
For supporting this work, the author would like to thank Dr. John Zolper and
Dr. Steve Pappert at DARPA, Dr. Alfred Hung at Army Research Lab, and Mr. Frank
Stroili at BAE Systems. For technical advice and insight, the author would like to thank
Mr. Richard B. Elder, Jr., Mr. Douglas Jansen, and Mr. Jeffrey Feng at BAE Systems.
For taking part in the thesis committee, the author would like to thank Dr. Ali Abedi,
Dr. Richard O. Eason, Dr. Donald M. Hummels, and Dr. Bruce E. Segee. Finally, the
author wishes to thank and extend his gratitude to Dr. David E. Kotecki for advising this
thesis and supporting this work.

TABLE OF CONTENTS
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
LIST OF TABLES. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1. Major Design Challenges for High-Speed Digital and Mixed-Signal HBT Circuitry . . . . . . . . . . . . . . . . . . . . 2
1.1.1. Power Consumption Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2. Clock Distribution Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Power/Speed Trade-Off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1. Timing Path Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2. Architecture Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3. Supply Voltage Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3. Motivation for Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4. Review of Previously Reported Work on Digital and Mixed-Signal HBT Circuits . . . . . . . . . . . . . . . . . . . . 10
1.5. Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2. HBT Circuit Design . . . . . . . . . . . . . . . . . . . . 16
2.1. Vitesse VIP-2 Process . . . . . . . . . . . . . . . . . . . . 16
2.2. Circuit Simulation . . . . . . . . . . . . . . . . . . . . 19
2.3. Clock Distribution Simulations . . . . . . . . . . . . . . . . . . . . 20
2.4. Emitter Coupled Logic . . . . . . . . . . . . . . . . . . . . 22
2.4.1. Voltage Levels . . . . . . . . . . . . . . . . . . . . 25
2.4.2. Current Mode Logic . . . . . . . . . . . . . . . . . . . . 27
2.5. Voltage Swing . . . . . . . . . . . . . . . . . . . . 28
2.6. Bias Current . . . . . . . . . . . . . . . . . . . . 30
2.7. Current Source Output Resistance . . . . . . . . . . . . . . . . . . . . 32
2.8. Conclusion . . . . . . . . . . . . . . . . . . . . 34
3. Design of Adders and Accumulators in InP . . . . . . . . . . . . . . . . . . . . 35
3.1. Review of Adder and Accumulator Basics . . . . . . . . . . . . . . . . . . . . 36
3.1.1. Full Adder . . . . . . . . . . . . . . . . . . . . 36
3.1.2. Carry Ripple Adder . . . . . . . . . . . . . . . . . . . . 37
3.1.3. Carry Lookahead Adder . . . . . . . . . . . . . . . . . . . . 38
3.1.4. Pipelined Adders . . . . . . . . . . . . . . . . . . . . 40
3.1.5. Accumulators . . . . . . . . . . . . . . . . . . . . 41
3.2. High-Speed 4-bit Accumulators . . . . . . . . . . . . . . . . . . . . 42
3.2.1. Test Circuits . . . . . . . . . . . . . . . . . . . . 43
3.2.2. Measurements . . . . . . . . . . . . . . . . . . . . 45
3.2.3. Accumulator ACCV1 . . . . . . . . . . . . . . . . . . . . 45
3.2.3.1. Simulation Results . . . . . . . . . . . . . . . . . . . . 47
3.2.3.2. Measurement Results . . . . . . . . . . . . . . . . . . . . 49
3.2.3.2.1. TC6 Measurement Results . . . . . . . . . . . . . . . . . . . . 49
3.2.3.2.2. TC7 Measurement Results . . . . . . . . . . . . . . . . . . . . 54
3.2.4. Accumulator ACCV2 . . . . . . . . . . . . . . . . . . . . 55
3.2.4.1. Simulation Results . . . . . . . . . . . . . . . . . . . . 58
3.2.4.2. Measurement Results . . . . . . . . . . . . . . . . . . . . 61
3.2.5. Accumulator ACCV3 . . . . . . . . . . . . . . . . . . . . 65
3.2.5.1. Simulation Results . . . . . . . . . . . . . . . . . . . . 68
3.2.5.2. Measurement Results . . . . . . . . . . . . . . . . . . . . 68
3.2.6. Accumulator ACCV4 . . . . . . . . . . . . . . . . . . . . 70
3.2.6.1. Simulation Results . . . . . . . . . . . . . . . . . . . . 71
3.2.6.2. Measurement Results . . . . . . . . . . . . . . . . . . . . 73
3.2.7. Summary of High-Speed 4-bit Accumulators . . . . . . . . . . . . . . . . . . . . 74
3.3. Low Power 8-bit Accumulators . . . . . . . . . . . . . . . . . . . . 76
3.3.1. Accumulator ACCV5 . . . . . . . . . . . . . . . . . . . . 76
3.3.1.1. Simulation Results . . . . . . . . . . . . . . . . . . . . 78
3.3.1.2. Measurement Results . . . . . . . . . . . . . . . . . . . . 81
3.3.2. Accumulator ACCV6 . . . . . . . . . . . . . . . . . . . . 81
3.3.2.1. Simulation Results . . . . . . . . . . . . . . . . . . . . 82
3.3.2.2. Measurement Results . . . . . . . . . . . . . . . . . . . . 86
3.3.3. Summary of Low Power 8-bit Accumulators . . . . . . . . . . . . . . . . . . . . 88
3.4. Reduced Power Accumulator Experiments . . . . . . . . . . . . . . . . . . . . 88
3.4.1. Accumulator with Triple-Tail Circuitry . . . . . . . . . . . . . . . . . . . . 90
3.4.2. Accumulator with Resistor-Only Current Sources . . . . . . . . . . . . . . . . . . . . 94
3.4.3. Summary of Reduced Power Accumulator Experiments . . . . . . . . . . . . . . . . . . . . 96
3.5. Summary of Accumulators . . . . . . . . . . . . . . . . . . . . 97

4. Direct Digital Synthesizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.1. DDS Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2. Performance Metrics and Design Tradeoffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.3. Recent DDS Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.4. Direct Digital Synthesizer DDSV1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.4.1. Sine-Weighted Digital to Analog Converter . . . . . . . . . . . . . . . . . . . . . . . . 111
4.4.2. Phase Truncation Spurs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.4.3. Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.4.4. Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.5. Direct Digital Synthesizer DDSV2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.5.1. Coding Scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.5.1.1. Coding with Ideal DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.5.1.2. Coding with Realistic DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.5.1.3. Simplified Coding with Realistic DAC . . . . . . . . . . . . . . . . . . 128
4.5.2. Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.5.3. Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.6. Summary of Direct Digital Synthesizers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.1. Summary of Accomplishments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.2. Recommendations for Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
BIOGRAPHY OF THE AUTHOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

LIST OF TABLES

Table 1.1. Goals of the TFAST program. . . . . . . . . . . . . . . . . . . . . 9
Table 1.2. Reported static frequency divider circuits implemented in
III-V processes. . . . . . . . . . . . . . . . . . . . . 12
Table 1.3. Reported static frequency divider circuits implemented in
Si-bipolar and SiGe processes. . . . . . . . . . . . . . . . . . . . . 12
Table 1.4. Recent commercial and reported direct digital synthesizers. . . . . . . . . . . . . . . . . . . . . 14
Table 3.1. Truth table for the full adder building block. A and B are
the two input bits, Cin is the carry input, S is the sum, and
Cout is the carry output. . . . . . . . . . . . . . . . . . . . . 36
Table 3.2. Simulated power consumption breakdown for the ACCV1
accumulator test circuit. . . . . . . . . . . . . . . . . . . . . 49
Table 3.3. Yield for the four-level series-gated carry divide by two test
circuit in fabrication run TC6. . . . . . . . . . . . . . . . . . . . . 52
Table 3.4. Yield for the ACCV1 accumulator test circuit in fabrication
run TC6. . . . . . . . . . . . . . . . . . . . . 52
Table 3.5. Yield for the four-level series-gated carry divide by two and
ACCV1 accumulator test circuits in fabrication run TC7. A
device is considered to pass if it operates correctly above a
24 GHz clock rate. . . . . . . . . . . . . . . . . . . . . 54
Table 3.6. Simulated power consumption breakdown for the ACCV2
accumulator test circuit. . . . . . . . . . . . . . . . . . . . . 61
Table 3.7. Yield for the four-level series-gated carry divide by two test
circuit in fabrication run TC6. . . . . . . . . . . . . . . . . . . . . 63
Table 3.8. Yield for the ACCV2 accumulator test circuit in fabrication
run TC6. . . . . . . . . . . . . . . . . . . . . 63
Table 3.9. Yield for the ACCV3 accumulator test circuit in fabrication
run TC7. A device is considered to pass if it operates
correctly above a 24 GHz clock rate. . . . . . . . . . . . . . . . . . . . . 68
Table 3.10. Yield for the ACCV4 accumulator test circuit in fabrication
run TC7. A device is considered to pass if it operates
correctly above a 24 GHz clock rate. . . . . . . . . . . . . . . . . . . . . 73
Table 3.11. Comparison of divide by two carry test circuits. The
four-level series-gated and single-level parallel-gated designs in
this work exceed the performance of the previous work by
more than 2.7 times the clock frequency. Both of the divide
by two carry test circuit results are from fabrication run TC6. . . . . . . . . . . . . . . . . . . . . 74
Table 3.12. Comparison of 4-bit accumulator circuits. The differences
in circuit performance are partially due to design differences
and partially due to process variations. Results from
test structures indicate that the TC7 fabrication run was
slower than the TC6 fabrication run. . . . . . . . . . . . . . . . . . . . . 75
Table 3.13. Comparison of 8-bit accumulator circuits. The ACCV6
design uses less pipelining than the ACCV5 design, so it
operates at a lower clock frequency, but it also has lower
power consumption. Overall, speed and power scale about
the same when comparing these two 8-bit designs. . . . . . . . . . . . . . . . . . . . . 89
Table 3.14. Comparison of speed and power between triple-tail and ACCV5
accumulator components. In the comparison, the components
are configured as divide by two circuits and the simulations
are of the schematics without extracted parasitics. . . . . . . . . . . . . . . . . . . . . 94
Table 3.15. Speed and power comparison for 4-bit accumulator designs. . . . . . . . . . . . . . . . . . . . . 99
Table 3.16. Speed and power comparison for 8-bit accumulator designs. . . . . . . . . . . . . . . . . . . . . 99
Table 3.17. Speed and power comparison for accumulators extended to
16-bit bit-widths. . . . . . . . . . . . . . . . . . . . . 101
Table 4.1. Simulated power breakdown for DDSV1. . . . . . . . . . . . . . . . . . . . . 118
Table 4.2. Logic necessary to implement the DDSV2 phase converter
with a realistic DAC. . . . . . . . . . . . . . . . . . . . . 127
Table 4.3. Simplified logic used in the DDSV2 phase converter to drive
the DAC. . . . . . . . . . . . . . . . . . . . . 129
Table 4.4. Comparison of recent InP DDS designs. . . . . . . . . . . . . . . . . . . . . 136
Table 4.5. Recent commercial and reported direct digital synthesizers
compared using FOMDDS. . . . . . . . . . . . . . . . . . . . . 138
Table 5.1. Summary of fabrication runs, dates, and designs in each
fabrication run. . . . . . . . . . . . . . . . . . . . . 140

LIST OF FIGURES

Figure 1.1. Emitter coupled pair, a standard building block for ECL logic. . . . . . . . . . . . . . . . . . . . . 2
Figure 1.2. Typical ft curve for an HBT transistor. . . . . . . . . . . . . . . . . . . . . 6
Figure 1.3. Structure of a 3-input ECL AND gate. . . . . . . . . . . . . . . . . . . . . 8
Figure 1.4. Reported static frequency dividers manufactured in Si-bipolar/SiGe
and III-V HBT technologies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 2.1. Diagram of the self-aligned DHBT device [1] from the Vitesse
VIP-2 InP process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Figure 2.2. Stack-up of the Vitesse VIP-2 InP DHBT process [1] with 4
aluminum metal interconnect levels, thin-film resistors, and
MIM capacitors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Figure 2.3. Example clock tree schematic for a 4-bit accumulator. The
transmission line parameters are determined from the layout. The clock drivers and clock loads are included for the
simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 2.4. Example simulation of the clock distribution interconnects
as microstrip transmission lines with line lengths, characteristic impedances, and loads estimated from the physical
layout. The best case and worst case register inputs are
shown both before and after the addition of series resistors
to the clock distribution paths. Without the series resistors,
there is some potential for overdrive in the worst case register inputs. The series resistors reduce the overdrive while
maintaining a clock bandwidth well above the operating
frequency of the circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 2.5. ECL inverter schematic. The input of the inverter is driven
by the differential inputs Ap and An. It has emitter follower
outputs so it can drive multiple loads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 2.6. ECL two-input logic gate schematic. In the configuration
shown, the logic gate operates as an AND gate. The sense
of the inputs and outputs can be swapped to achieve NAND,
OR, or NOR gates from the two-input topology. . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 2.7. ECL XOR logic gate schematic. The sense of the outputs
can be swapped to achieve an XNOR gate from the topology. . . . . . . . . . 26

Figure 2.8. ECL latch gate schematic. In this configuration, the latch is
transparent when clkp has a higher voltage than clkn, and it
is in latch mode when clkn has a higher voltage than clkp.. . . . . . . . . . . . . 26
Figure 2.9. Test circuit for designing voltage differential. The voltage
differential is determined by Vdiff, and Vbias is used to bias
the differential pair. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Figure 2.10. Simulation of the percentage of current through each leg of
the differential pair as a function of voltage differential. . . . . . . . . . . . . . . . 29
Figure 2.11. Simple current mirror circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Figure 2.12. Current mirror with beta helper and emitter degeneration.
The left side of the circuit generates the bias voltage. The
bias voltage is used by multiple current sources in the circuit. An example current source is shown on the right side
of the figure.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Figure 2.13. DC sweep simulation of the current source current versus
the collector voltage. The output resistance is 3.7 kΩ over
the linear region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Figure 3.1. 4-bit carry ripple adder formed by stringing together full adders.. . . . . . 37
Figure 3.2. 4-bit carry lookahead adder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Figure 3.3. Block diagram of divide by two test circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Figure 3.4. Block diagram of the 4-bit accumulator test circuit. The
DAC combines the four high-speed digital sum outputs of
the accumulator (S(3:0)) into a single high-speed analog
output. This output can be observed on a sampling oscilloscope. . . . . . . 44
Figure 3.5. Test setup for frequencies below 50 GHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Figure 3.6. Test setup for frequencies above 50 GHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Figure 3.7. Four-level series-gated carry and latch circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 3.8. Four-level series-gated sum and latch circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Figure 3.9. Block diagram of the pipelined 4-bit accumulator using 2-bit adders and 2-bit registers. . . . . . . . . . . . . . . . . . . . . 48
Figure 3.10. Block diagram of the 2-bit adder comprised of carry, sum,
and latch circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Figure 3.11. Simulation of the four-level parallel-gated carry test circuit.
The test circuit uses the carry circuit as a divide by two
circuit to estimate an upper bound for the accumulator operating frequency. This simulation includes extracted parasitic capacitors and is shown at the maximum operational
simulated clock frequency of 55 GHz.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Figure 3.12. Simulation of the ACCV1 4-bit accumulator using four-level series-gated logic. The plot shows the outputs of the
four sum bits. The simulation includes extracted parasitic
capacitors and is shown at the maximum operation frequency
of 46 GHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Figure 3.13. High-speed sampling oscilloscope screen capture of the 26 GHz
output from the four-level series-gated carry test circuit operating as a divide by two circuit with a 52 GHz clock. This
output is attenuated compared to the simulation results because of additional parasitic capacitance from the probes
and cables that connect the test chip to the test equipment. . . . . . . . . . . . . 53
Figure 3.14. The single-level parallel-gated carry circuit with cascaded latch. . . . . . . 56
Figure 3.15. Simulation of the output of the single-level parallel-gated
carry circuit. The upper plot shows the areas with reduced
differential for the Xp and Xn outputs for the states where
0, 1, 2, and 3 inputs are high. The lower plot illustrates how
the full differential is regained by buffering from the latch. . . . . . . . . . . . . 57
Figure 3.16. Simulation of the single-level parallel-gated carry test circuit. The test circuit uses the carry circuit as a divide by
two circuit to provide an estimate of the upper bound of the
accumulator operating frequency. This simulation includes
parasitic extracted capacitors and is shown at the maximum
operation frequency of 52 GHz. The simulation shows the
output at the output pad of the extracted test chip layout, so
the extracted parasitic capacitances affect the waveform shape. . . . . . . . 59
Figure 3.17. Simulation of ACCV2 showing the outputs of the four sum
bits. The simulation includes parasitic extracted capacitors
and is shown at the maximum operation frequency of 46 GHz. . . . . . . . . 60
Figure 3.18. Microphotograph of the ACCV2 single-level parallel-gated
carry test chip. The chip is 1220 µm by 1025 µm. . . . . . . . . . . . . . . . . . . . . . . 62
Figure 3.19. Oscilloscope screen capture of the carry test circuit output
at 27.5 GHz with a 55 GHz clock frequency.. . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Figure 3.20. Microphotograph of the ACCV2 test circuit. The chip is
1725 µm by 1025 µm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Figure 3.21. Oscilloscope screen capture of the DAC output of ACCV2
4-bit accumulator test circuit with 41 GHz clock frequency
and input increment of 7. The digital output sequence is
labeled on the waveform. . . . . . . . . . . . . . . . . . . . . 64
Figure 3.22. Oscilloscope screen capture of the DAC output of the ACCV2
4-bit accumulator test circuit with 41 GHz clock frequency
and input increment of 8 acting as a divide by two circuit
with a 20.5 GHz output. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Figure 3.23. Three-level series-gated sum circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 3.24. Layout view of the ACCV3 4-bit accumulator test chip. The
test chip includes DAC output circuitry and is 1725 µm
by 1025 µm. The 4-bit accumulator occupies an area of
510 µm by 575 µm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Figure 3.25. Simulation of the ACCV3 4-bit accumulator using the singlelevel parallel-gated carry circuit and the three-level seriesgated sum. This simulation includes parasitic extracted capacitors and is shown at the maximum operation frequency
of 40 GHz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Figure 3.26. Layout view of the ACCV4 4-bit accumulator test chip. The
test chip includes DAC output circuitry and is 1725 µm
by 1025 µm. The 4-bit accumulator occupies an area of
370 µm by 350 µm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 3.27. Simulation of the ACCV4 4-bit accumulator using the singlelevel parallel-gated carry circuit, three-level series-gated sum
circuit, and additional clock buffer circuitry. This simulation includes parasitic extracted capacitors and is shown at
the maximum operation frequency of 43 GHz.. . . . . . . . . . . . . . . . . . . . . . . . . . 72
Figure 3.28. Two-level parallel-gated sum circuit and separate latch circuit. . . . . . . . . 77
Figure 3.29. Block diagram of the pipelined 8-bit accumulator using 2bit adders and 2-bit registers. The dotted boxes partition the
accumulator into 4-bit accumulator and 4-bit register blocks. . . . . . . . . . . 79
Figure 3.30. Simulation of the ACCV5 8-bit accumulator. This simulation includes parasitic extracted capacitors and is shown at
the maximum operation frequency of 34 GHz. This simulation is optimistic, because it ignores the additional loading
and parasitic capacitance from interconnects and circuitry
that would be connected to the output of the accumulator in
practice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Figure 3.31. Schematic of the ACCV6 8-bit accumulator. It has an architecture similar to a carry ripple adder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Figure 3.32. Schematic of the ACCV6 full adder block.
Figure 3.33. Schematic of the ACCV6 carry circuit. It is similar to the other single-level parallel-gated carry circuits, except that it is not followed by a latch that recovers full differential. Instead it has an emitter follower so that it can drive subsequent carry and sum circuits.
Figure 3.34. Schematic of the ACCV6 sum circuit. It is similar to other two-level parallel-gated sum circuits, except that it has an emitter follower so that it can drive a register circuit.
Figure 3.35. Simulation of the ACCV5 8-bit accumulator. This simulation includes parasitic extracted capacitors and is shown at the maximum operation frequency of 13 GHz.
Figure 3.36. Schematic of the triple-tail latch circuit.
Figure 3.37. Voltage levels in the triple-tail latch. The voltage levels are compatible with other triple-tail gates, where the clock signal is substituted for the logic signal on the third tail.
Figure 3.38. Schematic of the triple-tail sum circuit.
Figure 3.39. Schematic of the triple-tail carry circuit.
Figure 3.40. Schematic of a 12-bit accumulator truncated to 9-bits of output. Since the accumulator output is truncated, registers are eliminated to save power.
Figure 4.1. Block diagram of a general direct digital synthesizer. The components are shown on the top and their outputs are shown on the bottom.
Figure 4.2. Unit circle with phase information from accumulator. In this example, N = 5, so there are 32 different phase values with a symmetry of 8 in each of the 4 quadrants. Selected accumulator outputs n are labelled using the mapping from Equation 4.1 to illustrate the breakdown by quadrants.
Figure 4.3. Frequency spectrum of a DDS output with the desired frequency Ap and largest spurious frequency As labelled.
Figure 4.4. Block diagram of DDSV1 circuit with the outputs of each stage illustrated.
Figure 4.5. Block diagram of the thermometer-coder portion of the sine-weighted DAC.
Figure 4.6. Block diagram of the summing junction and Gilbert multiplier showing the [3 5 5 4 4 3 2 1] tap weighting scheme.
Figure 4.7. Gilbert multiplier schematic. Used to switch the sign of the DAC output in DDSV1 to achieve a full-wave sine output.
Figure 4.8. Time-domain simulation output of DDSV1 with a 34 GHz clock and FCW=1. The output frequency is 132.8125 MHz.
Figure 4.9. Frequency-domain simulation output of DDSV1 with a 34 GHz clock and FCW=1. This output has a 29.91 dBc SFDR.
Figure 4.10. Time-domain simulation output of DDSV1 with a 34 GHz clock and FCW=127. The output frequency is 16.8671875 GHz.
Figure 4.11. Frequency-domain simulation output of DDSV1 with a 34 GHz clock and FCW=1. This output has a 29.91 dBc SFDR.
Figure 4.12. Microphotograph of the DDSV1 test chip, with dimensions of 2700 µm by 1450 µm. The DDSV1 test chip contains 1891 transistors.
Figure 4.13. DDSV1 test setup.
Figure 4.14. SFDR versus DDSV1 output frequency at a 32 GHz clock rate. The SFDR is measured within the Nyquist bandwidth. The worst SFDR is 21.56 dBc at an FCW of 95. Over the whole range of FCWs, the average SFDR is 26.95 dBc.
Figure 4.15. Sampling oscilloscope output of DDSV1 with fclk = 32 GHz and fout = 125 MHz.
Figure 4.16. Frequency spectrum of the DDSV1 output with fclk = 32 GHz and fout = 125 MHz. The largest spur is located at 4.125 GHz and the SFDR is approximately 31 dBc.
Figure 4.17. Frequency spectrum of the DDSV1 output with fclk = 32 GHz and fout = 15.875 GHz at FCW=127. The largest spur is located at 12.125 GHz and the SFDR is 30.44 dBc.
Figure 4.18. Traditional DDS architecture used in DDSV2. This DDS consists of an accumulator, phase to sine converter and a DAC. Representations of the outputs at each stage are shown below the corresponding stage.
Figure 4.19. DAC with coarse and fine sections used for DDSV2.
Figure 4.20. MATLAB simulation of SFDR versus FCW for DDSV2 with an ideal DAC and ideal coding scheme.
Figure 4.21. MATLAB simulation of SFDR versus FCW for DDSV2 with coding scheme with the DAC outputs rounded to discrete output levels.

Figure 4.22. MATLAB simulation of SFDR versus FCW for DDSV2 with simplified coding scheme.
Figure 4.23. Simulated time-domain output of DDSV2 with a 13 GHz clock and FCW=1. The plotted output is the difference of the differential signals.
Figure 4.24. Simulation of the SFDR versus FCW for the DDSV2 design with a 13 GHz clock. The simulations to determine SFDR are done with extracted parasitic capacitors included.
Figure 4.25. Microphotograph of the DDSV2 chip. The chip is 2700 µm by 1450 µm and contains 1646 transistors.
Figure 4.26. Measured time-domain output of DDSV2 with a 13 GHz clock and FCW=1. The output frequency is 50.78125 MHz.
Figure 4.27. Measured frequency-domain output of DDSV2 with a 13 GHz clock and FCW=1. The SFDR is 34 dBc.
Figure 4.28. Measured time-domain output of DDSV2 with a 13 GHz clock and FCW=128. The output frequency is 6.5 GHz.
Figure 4.29. Measured frequency-domain output of DDSV2 with a 13 GHz clock and FCW=128. The SFDR is 50 dBc.
Figure 4.30. Measured SFDR versus FCW for DDSV2 with a 13 GHz clock frequency.

LIST OF ABBREVIATIONS

CML      current mode logic
CMOS     complementary metal-oxide-semiconductor
DAC      digital to analog converter
DDS      direct digital synthesizer
DHBT     double heterojunction bipolar transistor
DTL      diode-transistor logic
DUT      device under test
ECL      emitter coupled logic
f0       fundamental output frequency
fclk     clock frequency
FCW      frequency control word
FPGA     field programmable gate array
GSGSG    ground-signal-ground-signal-ground
HBT      heterojunction bipolar transistor
InP      Indium Phosphide
LPF      low pass filter
LSB      least significant bit
LUT      look up table
MIM      metal-insulator-metal
MSB      most significant bit
MSFOM    mixed-signal figure of merit
RAM      random access memory
RTL      resistor-transistor logic
SFDR     spurious-free dynamic range
SHBT     single heterojunction bipolar transistor
TFAST    Technology for Frequency Agile Digitally Synthesized Transmitters
TTL      transistor-transistor logic
VBIC     Vertical Bipolar Inter-Company

CHAPTER 1
Introduction
Continuing advances in heterojunction bipolar transistor (HBT) technology are
expanding the range of applications for HBT devices. Recently, HBTs have been reported in InP with ft and fmax both over 300 GHz [1]. As the switching time of HBT
devices decreases, they become useful in high-speed digital and mixed-signal circuits.
The best method for taking advantage of the inherent speed of HBTs for digital and
mixed-signal circuits is to use emitter coupled logic (ECL). Complex logic functions can be constructed using the standard emitter coupled pair shown in Figure 1.1 as a current-steering switch building block.
ECL is fast because currents are switched over small
signal swings, typically 200 mV to 300 mV, and node capacitances are charged and
discharged over small ranges [2]. Beyond high-speed operation, ECL is advantageous
because it incorporates differential signals, making it less susceptible to noise [3].
ECL has been around for many years, and it is used in circuits where high-speed
operation is the most critical factor. Examples of applications include microprocessors [4], random access memories (RAMs) [5], radars [6], field programmable gate
arrays (FPGAs) [7], and frequency synthesizers [8]. For the past couple of decades,
most of the work done in high-speed microprocessors and RAMs has moved away
from ECL and into complementary metal-oxide-semiconductor (CMOS) technologies.
In these applications, the lower power consumption of CMOS technologies outweighs
the extra speed provided by ECL and HBTs. However, ECL is still used for niche applications, such as state-of-the-art radars, FPGAs, and frequency synthesizers.
In these applications, higher frequencies are usually desired at the expense of higher
power consumption. Despite the longevity of ECL, the ever-increasing clock frequencies in high-speed digital and mixed-signal applications bring about many challenges
that must be dealt with in areas such as power consumption and clock distribution.

Figure 1.1: Emitter coupled pair, a standard building block for ECL logic. (Node labels in the schematic: Ap, An, Zp, Zn.)

1.1 Major Design Challenges for High-Speed Digital and Mixed-Signal HBT Circuitry

Two of the most difficult challenges that must be dealt with in the design of high-speed digital and mixed-signal HBT circuitry are power consumption and clock distribution. The two are related, and both become increasingly difficult to address as circuit complexity and clock speeds increase. For logic gates operating at the maximum switching speed in the Vitesse VIP-2 process [1], ECL circuits must have current densities around 5 mA/µm². This leads to basic gates (inverters, NANDs, etc.) requiring around 15 mA to 20 mA, or 54 mW to 72 mW of power at a 3.6 V supply voltage. Even at the lower figure, a circuit with 20 basic gates consumes more than 1 W of power. Non-trivial systems for applications such as radars, frequency synthesizers, and FPGAs require gate counts in the hundreds or thousands. For these high power gates, that translates into tens to hundreds of watts of power.
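
The arithmetic behind these figures can be reproduced with a short sketch. It only assumes that every gate draws the same current from the 3.6 V supply; the function names and the example gate counts (200 and 2000) are chosen here for illustration and are not from the original text.

```python
# Illustrative estimate of ECL power using the figures quoted in the text.
SUPPLY_V = 3.6  # supply voltage in volts (from the text)


def gate_power_mw(current_ma: float, supply_v: float = SUPPLY_V) -> float:
    """Static power of one ECL gate in mW (P = I * V, with I in mA)."""
    return current_ma * supply_v


def system_power_w(gate_count: int, current_ma: float,
                   supply_v: float = SUPPLY_V) -> float:
    """Total power in W for gate_count identical gates."""
    return gate_count * gate_power_mw(current_ma, supply_v) / 1000.0


if __name__ == "__main__":
    # 15 mA and 20 mA are the per-gate currents quoted in the text;
    # the gate counts below are illustrative assumptions.
    for gates in (20, 200, 2000):
        low = system_power_w(gates, 15.0)
        high = system_power_w(gates, 20.0)
        print(f"{gates:5d} gates: {low:.2f} W to {high:.2f} W")
```

Under these assumptions, 20 gates already exceed 1 W, and counts in the hundreds to thousands land in the tens to hundreds of watts.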

1.1.1 Power Consumption Issues

In circuits and systems, high power consumption is problematic for a number of reasons. First, high power consumption leads to increased die and device temperatures. Direct measurement of the temperature at the die and wafer level is often difficult to do without expensive equipment. An indirect measure of temperature for wafers and dies is power density, which is defined as the amount of power consumption per unit area. Typically, a conservative maximum power density limit of 100 W/cm² is cited [9] for the ability to remove heat from a die in either HBT or CMOS technologies. As the power density and temperature increase, the performance of the HBT devices and circuits decreases. As a result, steps must be taken to reduce the power density or to remove heat from the die. While gates can be spaced out on the die to decrease junction temperatures, large dies increase manufacturing costs, since more wafers must be processed to produce the same number of chips. When there are large spaces between gates, physically long lines must be used for gate connections. These long lines are prone to the effects of increased interconnect capacitance. Additionally, if the lines are electrically long (see Section 1.1.2), microwave design techniques must be employed, further complicating the design process. While heat sinks, liquid cooling, and fans can be employed to compensate for elevated die temperatures, they can add extra power consumption and physical weight at the system level, decreasing portability.
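
As a rough illustration of the power density definition, the sketch below converts a power budget into a minimum circuit area at the 100 W/cm² limit. The 1 mm by 1 mm block size in the example is a hypothetical value chosen for illustration, and the function names are not from the original text.

```python
# Power density = power / area, compared against the limit cited in the text.
MAX_POWER_DENSITY_W_PER_CM2 = 100.0  # conservative limit cited in the text


def power_density_w_per_cm2(power_w: float, width_um: float,
                            height_um: float) -> float:
    """Power density of a rectangular block, with dimensions in micrometers."""
    area_cm2 = (width_um * 1e-4) * (height_um * 1e-4)  # 1 um = 1e-4 cm
    return power_w / area_cm2


def min_area_mm2(power_w: float,
                 limit_w_per_cm2: float = MAX_POWER_DENSITY_W_PER_CM2) -> float:
    """Smallest area in mm^2 that keeps power_w at or below the limit."""
    return power_w / limit_w_per_cm2 * 100.0  # 1 cm^2 = 100 mm^2


if __name__ == "__main__":
    # 1.08 W corresponds to 20 gates at 54 mW each (figures from Section 1.1).
    print(f"minimum area: {min_area_mm2(1.08):.2f} mm^2")
    # Hypothetical 1 mm x 1 mm block dissipating the same power.
    print(f"density: {power_density_w_per_cm2(1.08, 1000.0, 1000.0):.0f} W/cm^2")
```

The point of the sketch is only that a handful of full-power gates packed into about a square millimeter already sits at the cited limit.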
Even if the heat related problems are ignored, high power consumption limits
the potential applications for high-speed circuits and systems. While it is possible to
supply power on the order of tens or hundreds of Watts in a vehicle or a fixed location,
massive amounts of power are not feasible for portable or hand-held systems. Since
many of the potential applications for high-speed digital and mixed-signal circuits lie
within the field of communications, it is advantageous to keep power consumption low
so that hand-held communication devices are viable.

1.1.2 Clock Distribution Issues

Systems with any sort of complexity require clock distribution to many registers and latches at locations throughout the chip. On highly integrated circuits with high-speed clock signals, the distribution lengths of the clock interconnects can become electrically long, so that they can no longer be modelled as lumped element interconnects. One estimate for when interconnects begin to be electrically long and the lumped element model begins to be insufficient is when their length is greater than 1/6 of the effective length of a rising edge of the signal [10]. This can be expressed as

$$ l = \frac{T_r}{6 \cdot D} \tag{1.1} $$

With an 11 ps rise time (Tr) for a 30 GHz clock signal and a delay (D) of 0.0067 ps/µm in InP, interconnects longer than approximately 300 µm can be considered to be electrically long. Note that the delay figure is an estimate for the top metal layer of the four-metal-layer stack-up in the Vitesse VIP-2 technology [1]. This layer is typically used for clock distribution in the circuits in this thesis. Signal interconnects other than clock lines can also be electrically long.
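
Equation 1.1 can be evaluated directly with the rise time and delay quoted above. This is a minimal sketch; the function names are chosen here for illustration.

```python
# Critical interconnect length from Equation 1.1: l = Tr / (6 * D).


def critical_length_um(rise_time_ps: float, delay_ps_per_um: float) -> float:
    """Length in micrometers above which a line is treated as electrically long."""
    return rise_time_ps / (6.0 * delay_ps_per_um)


def is_electrically_long(length_um: float, rise_time_ps: float,
                         delay_ps_per_um: float) -> bool:
    """True if the interconnect exceeds the Equation 1.1 estimate."""
    return length_um > critical_length_um(rise_time_ps, delay_ps_per_um)


if __name__ == "__main__":
    # Tr = 11 ps and D = 0.0067 ps/um are the values quoted in the text.
    limit = critical_length_um(11.0, 0.0067)
    print(f"critical length: {limit:.1f} um")
```

With these inputs the estimate comes out a little under the 300 µm figure used in the text, which is consistent with treating 300 µm as a rounded threshold.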
The clock signal must also arrive at all of the distribution points with reasonably matched phase, or low clock skew, in order to reduce setup and hold violations in the latches [11]. While all of the clock interconnects can be designed to have approximately the same length to minimize clock skew, the Spectre simulation toolkit does not directly model the effects of electrically long interconnects. Instead, the parasitic extraction of the Spectre toolkit models the electrically long lines as lumped elements with large lumped parasitic capacitors attached. This sort of model ignores the distributed resistance, capacitance, and inductance that would be present in a transmission line model.

In order to work around this issue, separate microwave models of the clock tree
are constructed and simulated with the Spectre toolkit to determine proper line matching and to investigate the behavior of long line effects. Problems such as excessive gain
or attenuation at certain frequencies can be addressed by adding terminations or source
resistors to the long lines. Once it is determined that the interconnects are well matched
and well behaved in a microwave sense, they are assumed to be safe in the circuit simulations, even though no direct models can be simulated for the long interconnects in the
Spectre simulator. While this method gives reasonable results, there are many potential
pitfalls in using two separate simulation environments.
Clock distribution problems tend to require solutions that increase power consumption. These include the addition of buffers in the distribution path or high-power emitter followers for better drive capability. The buffers can be used for driving a large number of clock branches in a distribution tree, or for recovering weak signals. High-power emitter followers are used for driving the clock to compensate for parasitics and transmission line effects. As system complexity increases, the clock distribution can account for a large proportion (40% or more [12]) of the total power for the circuit, and all of the problems related to high power consumption are further aggravated.

1.2 Power/Speed Trade-Off

High power consumption is generally necessary to achieve the high switching speeds required for high-speed circuits. Consequently, lower power designs typically sacrifice speed to save power. Figure 1.2 shows a typical ft curve, which is a general figure of merit in HBT designs. The ft curve shows that the switching speed is reduced for current densities below the peak ft. Since the current is proportional to the power, lower power leads to lower switching speed. However, it is still possible to maintain high clock rates while lowering power by reducing supply voltages, using alternative architectures, or optimizing timing paths.

Figure 1.2: Typical ft curve for an HBT transistor. (Axes: switching frequency versus current density.)

1.2.1 Timing Path Optimization

One method for decreasing the power consumption of a circuit is to take advantage of the excess timing margin in gates that are part of non-critical paths. Excess timing margin exists when some logic gates are faster than needed to complete an operation within a clock cycle. Since the speed of the gates can be reduced by reducing the current density and the corresponding power, the gates with excess timing margin are effectively wasting power. For gates in non-critical paths, the power and resulting speed can be reduced to some extent before the overall circuit performance is impacted. In some cases, gates can be operated at less than half power without a negative impact on the overall circuit performance.
Power can be optimized by analyzing the timing characteristics of all of the data paths and reducing the power in gates which have excess timing margin. Careful analysis and power optimization of the system timing paths can lead to significant power savings. It is not necessary to set the power individually for each instance of a gate. Instead, a few versions of each type of gate with different power and speed characteristics can be created and used for optimization purposes. This scaled-back method of optimization allows for a reduction in power that can be implemented relatively rapidly in terms of engineering time.
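
The idea of choosing among a few pre-built gate versions can be sketched as a simple selection: for each gate, pick the lowest-power version whose delay still fits the time available on its path. The variant names, delays, and powers below are hypothetical placeholders made up for illustration. They are not measured or simulated values from this thesis, and the selection rule is one plausible reading of the method rather than the author's actual flow.

```python
from typing import NamedTuple, Sequence


class GateVariant(NamedTuple):
    name: str
    delay_ps: float   # propagation delay of this version
    power_mw: float   # power of this version


# Hypothetical library with three versions of one gate type.
# All numbers are illustrative assumptions only.
VARIANTS = (
    GateVariant("full_power", 5.0, 54.0),
    GateVariant("half_power", 8.0, 27.0),
    GateVariant("quarter_power", 14.0, 13.5),
)


def pick_variant(variants: Sequence[GateVariant],
                 allowed_delay_ps: float) -> GateVariant:
    """Return the lowest-power variant that meets the allowed delay."""
    feasible = [v for v in variants if v.delay_ps <= allowed_delay_ps]
    if not feasible:
        raise ValueError("no variant is fast enough for this path")
    return min(feasible, key=lambda v: v.power_mw)


if __name__ == "__main__":
    # A critical-path gate keeps full power; gates with slack are scaled back.
    for allowed in (5.0, 10.0, 20.0):
        print(allowed, pick_variant(VARIANTS, allowed).name)
```

Because only a handful of versions exist per gate type, the search is trivial, which matches the point that this approach costs little engineering time.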

1.2.2 Architecture Modifications

Additional reduction in power can be achieved through modifications to the circuit architecture or topology. Typically, high-speed digital systems require many stages of pipelining in order to achieve high clock rates. Pipeline stages increase power consumption by adding registers and the extra clock circuitry that is needed to drive the registers. In systems with the highest clock rates, the combination of registers and clock circuitry easily accounts for a majority of the total system power. Any architecture modifications that can reduce the number of registers will have a large impact on reducing the power consumption.

1.2.3 Supply Voltage Reduction

Power consumption can also be reduced by decreasing the supply voltage. However, this involves the elimination of logic levels in the gates. For example, in order to implement a 3-input ECL AND gate, at least 3 NPNs must be stacked as shown in Figure 1.3. In the AND gate in Figure 1.3, the inputs and outputs are differential. The output differential pair is Zp/Zn, and the input differential pairs are Ap/An, Bp/Bn, and Cp/Cn. This requires a voltage supply of ∼4.5 V in order to accommodate the minimum voltage swing (200 mV to 300 mV), 4 diode drops (0.8 V/diode), and the voltage required for the current source (1.0 V). By removing a level from the logic and using a 2-input AND gate, a diode drop can be removed from the supply voltage, reducing it to about 3.7 V. This helps to decrease the power consumption, since many of the gates, such as registers, are 2-input gates. It is important to note that while using a reduced supply voltage decreases the power consumed by individual gates, it may increase the overall power if more gates are needed to implement a logic function. For example, a 3-input gate must be implemented as two 2-input gates. However, this is compensated for in systems with many pipeline stages, where registers comprise a large portion of the total number of gates, and the registers do not need any additional current to operate with a lower supply voltage. In these systems, the reduction in the power consumption of the registers is much greater than the increase in the power consumption in the logic, so the overall power is reduced.

Figure 1.3: Structure of a 3-input ECL AND gate. (Node labels in the schematic: Ap, An, Bp, Bn, Cp, Cn, Zp1, Zn1.)
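
The supply voltage budget described in this section can be written out as a small sketch. It assumes the number of diode drops is one more than the number of stacked logic levels, which matches the two cases given in the text (4 drops for 3 levels, 3 drops for 2 levels) but is otherwise an assumption made here. The function names are illustrative.

```python
# Supply voltage budget for stacked ECL logic, using the figures in the text.
DIODE_DROP_V = 0.8       # volts per diode drop (from the text)
CURRENT_SOURCE_V = 1.0   # headroom for the current source (from the text)


def supply_voltage(levels: int, swing_v: float = 0.3) -> float:
    """Approximate supply voltage for a gate with the given logic levels.

    Assumes diode_drops = levels + 1, consistent with the 3-level and
    2-level examples in the text.
    """
    diode_drops = levels + 1
    return swing_v + diode_drops * DIODE_DROP_V + CURRENT_SOURCE_V


def relative_power(levels_new: int, levels_old: int) -> float:
    """Power ratio for a gate drawing the same current at the lower supply."""
    return supply_voltage(levels_new) / supply_voltage(levels_old)


if __name__ == "__main__":
    print(f"3-level supply: {supply_voltage(3):.1f} V")
    print(f"2-level supply: {supply_voltage(2):.1f} V")
    print(f"register power ratio: {relative_power(2, 3):.2f}")
```

For a register that draws the same current at either supply, the power scales with the supply voltage, so moving from three levels to two saves a little under a fifth of its power in this model.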

1.3 Motivation for Project

This thesis is supported in part by the Defense Advanced Research Projects Agency (DARPA), which is pursuing high performance mixed-signal circuits through the Technology for Frequency Agile Digitally Synthesized Transmitters (TFAST) program [13].

Metric                          Phase I (mid-2004)    Program Goal (2007)
Emitter width                   0.25 µm               0.15 µm
Current Density                 500 kA/cm²            1000 kA/cm²
ft/fmax (with BVCEO = 4.5 V)    350 GHz / 400 GHz     500 GHz / 500 GHz
Static Divider Clock Speed      150 GHz               200 GHz
Transistor Count                1 k                   20 k

Table 1.1: Goals of the TFAST program.

Direct digital synthesizer (DDS) circuits are of particular interest for applications including radars, communications, and electronic warfare [9]. SiGe has some advantages over InP: it is generally more manufacturable, has better lithography, has smaller devices, and can realize higher levels of circuit complexity. The TFAST program, however, is using InP HBTs because they have much higher breakdown voltages and slightly better ft and fmax, resulting in better mixed-signal performance than SiGe. The TFAST program is also promoting the development of InP HBT technologies to continue to increase their speed advantages while making them in highly manufacturable processes that are commercially viable for large transistor count circuits. The goals of the TFAST program [14] are shown in Table 1.1. The Phase I goals were reached in mid-2004.
This thesis focuses on the high-speed digital and mixed-signal circuit design for direct digital synthesizers implemented in InP HBT technology. Of particular interest are DDS circuits that can synthesize frequencies over the X-band (8 GHz to 12 GHz) and the Ku-band (12 GHz to 18 GHz). In order to maximize the potential applications of the high-speed DDS circuitry, it is also desirable to achieve the lowest power consumption possible. Since power and speed are two conflicting metrics, the trade-offs between them will be explored. Generally, increasing the power of HBT circuits increases the speed, but the increased power reaches a point of diminishing returns where large increases in power yield small benefits for speed performance. Eventually, further power increases actually have a negative impact on the speed after the peak point of the ft curve is reached, as shown in Figure 1.2. This thesis investigates the trade-offs involved in maximizing clock speed while minimizing power through the use of various circuit architectures and design techniques. While there are numerous speed/power design points depending on the specific application, this thesis will provide some guidance for choosing an architecture and using design techniques to yield the lowest possible power at a particular clock frequency in InP HBT technology. This thesis will also develop a figure of merit for comparing DDS designs.

1.4 Review of Previously Reported Work on Digital and Mixed-Signal HBT Circuits

The DDS is a mixed-signal system, since it contains both digital and analog components. For InP and other III-V HBTs, one way to measure the relative performance of mixed-signal circuits is through the mixed-signal figure of merit (MSFOM), defined by [14]:
$$ \mathrm{MSFOM} = \frac{J_C \cdot BV_{CEO}}{C_{CB} \cdot \Delta V_{LOGIC}} \tag{1.2} $$

where JC is the current density, BVCEO is the breakdown voltage, CCB is the extrinsic
capacitance between the base and collector, and ∆VLOGIC is the voltage swing. In Equation 1.2, the bottom term (CCB ·∆VLOGIC ) represents a time constant from parasitics and
voltage swing. As this time constant increases, the MSFOM is reduced. Likewise, increasing JC tends to overcome this time constant. Higher breakdown voltages improve
the MSFOM, since a higher BVCEO contributes to higher linearity [15] in digital to
analog converters (DACs), which are critical components of mixed-signal circuits. In a
purely digital InP HBT system, the major delay term [16] is analogous to the inverse of
the MSFOM without BVCEO .
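
One way to read the last remark, written out as a sketch: dropping BVCEO from Equation 1.2 and inverting what remains gives a quantity that behaves like a delay. The symbol \tau_{digital} is a label introduced here for illustration only; it does not appear in the original text or in [16].

```latex
\tau_{digital} \;\propto\; \frac{C_{CB} \cdot \Delta V_{LOGIC}}{J_C}
```

Assuming CCB is normalized to the same emitter area as JC, the ratio is a charge divided by a current, that is, the time needed to move the logic swing across the base-collector capacitance.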
From Equation 1.2, it is apparent that factors other than the ft and fmax of a
technology play a role in determining its relative usefulness for mixed-signal circuits.

While the MSFOM provides a more complete metric, not all of the components necessary for computing the MSFOM are always available in the literature. For digital systems, one commonly reported figure of merit is the maximum clock frequency for a static frequency divider. While this metric neglects the BVCEO term needed for the MSFOM, it gives a relative estimate for the JC/(CCB·∆VLOGIC) term of the equation. When compared to the ft and fmax, the maximum frequency of the static frequency divider provides an estimate of the level of performance degradation that arises from layout parasitics in simple digital circuits.
While useful as an estimate, the maximum static frequency divider is not always
a realistic metric. Reported static frequency dividers are usually designed to reach a
maximum frequency and are not necessarily practical in more complex circuits. These
frequency dividers are often operated at very high current densities which are only beneficial to circuit performance when the circuit is continuously oscillating. The dividers
sometimes use peaking inductors to improve high frequency response [17]-[19], but in
more complex circuits the peaking inductors are often impractical due to their layout size
and the interconnect routing limitations they introduce.
Reported static dividers implemented in III-V technologies (including InP) are summarized in Table 1.2, and static dividers implemented in Si-bipolar and SiGe technologies are summarized in Table 1.3. The most recently reported dividers in both tables operate at approximately ft/2. Earlier reported static dividers tended to be worse than ft/2 for III-V technologies and better than ft/2 for Si-bipolar/SiGe technologies. It has been noted that divider performance is limited by factors other than ft, such as base resistance (Rb) [20], collector capacitance (CJC) [21], and interconnect parasitics [22]. A graph of the maximum frequency of Si-bipolar/SiGe and III-V static frequency dividers versus the year reported is shown in Figure 1.4.
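
The comparison against ft/2 can be checked by computing the ratio of divider frequency to ft for each entry. The sketch below uses the values transcribed from Tables 1.2 and 1.3; footnoted entries are used at their listed values, and the one row without a reported ft is skipped. The function and variable names are chosen here for illustration.

```python
# (year, divider frequency in GHz, ft in GHz) transcribed from Table 1.2.
III_V = [
    (1989, 34.8, 68), (1991, 36, 110), (1992, 39.5, 130), (1998, 52.9, 134),
    (1999, 66, 164), (1999, 69, 195), (2000, 72.8, 198), (2001, 75, 165),
    (2002, 87, 205), (2002, 100, 135), (2004, 151.2, 405), (2004, 152, 301),
    (2004, 152, 300),
]

# Transcribed from Table 1.3; None marks the entry with ft not reported.
SI_SIGE = [
    (1989, 12.5, 30), (1991, 21, 40), (1993, 25, 45), (1995, 30, None),
    (1996, 35, 50), (1997, 42, 68), (2000, 67, 122), (2001, 71, 123),
    (2003, 86, 200), (2003, 96, 210),
]


def divider_ratios(rows):
    """Return (year, f_div / ft) for every row with a reported ft."""
    return [(year, f_div / ft) for year, f_div, ft in rows if ft is not None]


if __name__ == "__main__":
    for label, rows in (("III-V", III_V), ("Si-bipolar/SiGe", SI_SIGE)):
        print(label)
        for year, ratio in divider_ratios(rows):
            print(f"  {year}: f_div/ft = {ratio:.2f}")
```

A ratio of 0.5 corresponds to a divider running at ft/2, so values below or above 0.5 show on which side of that mark each reported divider falls.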
For a complete comparison of MSFOM and high-speed mixed-signal circuits,
it is more appropriate to study reported DDS circuits. However, many of the reported

Year        Frequency    ft         Company
Reported    (GHz)        (GHz)      or Group         Reference
1989        34.8         68         NTT              [23]
1991        36           110        HRL              [24]
1992        39.5         130        HRL              [25]
1998        52.9         134        HRL              [26]
1999        66           164        UCSB             [17]
1999        69           195        TRW              [18]
2000        72.8         198        HRL              [22]
2001        75           165        UCSB             [27]
2002        87           205        UCSB             [28]
2002        100(a)       135(b)     HRL              [29]
2004        151.2        405(c)     HRL              [30]
2004        152          301        UCSB/Rockwell    [16]
2004        152          300(d)     BAE/Vitesse      [1]

(a) Measurements were limited by available test equipment.
(b) The fmax of this process is over 300 GHz.
(c) The fmax of this process is 370 GHz.
(d) ft is reported in the range of 300 GHz to 350 GHz.

Table 1.2: Reported static frequency divider circuits implemented in III-V processes.

Year        Frequency    ft         Company
Reported    (GHz)        (GHz)      or Group    Reference
1989        12.5         30         Fujitsu     [31]
1991        21           40         NEC         [32]
1993        25           45         Siemens     [33]
1995        30           NR(a)      Siemens     [34]
1996        35           50         Siemens     [35]
1997        42           68         Siemens     [36]
2000        67           122        Hitachi     [37]
2001        71           123        Hitachi     [38]
2003        86           200        Infineon    [39]
2003        96           210        IBM         [40]

(a) ft not reported.

Table 1.3: Reported static frequency divider circuits implemented in Si-bipolar and SiGe processes.

Figure 1.4: Reported static frequency dividers manufactured in Si-bipolar/SiGe and III-V HBT technologies. (Axes: frequency in GHz, 0 to 160, versus year, 1988 to 2004; series: III-V and Si-bipolar/SiGe.)

DDS circuits are implemented in CMOS technologies, and they operate at low frequencies (less than 200 MHz) to either illustrate new design techniques or to achieve high spurious-free dynamic range (SFDR). Also, technologies cannot always be compared, since many of the high-speed DDS circuits are reported in data sheets or manuals that do not specify the technology used. An idea of the current state of the art for high-speed DDS systems can be derived from Table 1.4, which includes both commercial and research DDS circuits. Among the commercial DDS circuits, the AD9858 [41] provides an foutmax of 400 MHz and an SFDR of 50 dBc while consuming 2 W of power. The research
examples from A. Gutierrez-Aitken et al. at TRW [8] and K. Elliott at HRL [42] are
closest to the type of work presented in this thesis. The TRW DDS was reported in an
InP HBT technology in 2001 with an foutmax of 4.56 GHz, an SFDR of 30 dBc, and a
power consumption of 15 W. The HRL DDS was reported in an InP HBT technology in

Part Name     fclk    foutmax  FCW (a)  SFDR     Power   Company          Ref.
              (GHz)   (GHz)    (bits)   (dBc)    (W)     or Group
STEL-2375B    1.0     0.4      32       50       15      ITT              [43]
ADS-432-403   1.6     0.4      30       45 (b)   11      Meret Optical    [44]
ADS-431-403   1.6     0.4      30       40 (b)   6       Meret Optical    [45]
AD9858        1.0     0.4      32       50 (c)   2       Analog Devices   [41]
None          0.8     0.4      32       25       0.174   KAIST/ETRI       [46]
None          2.0     0.992    8        35       0.82    Auburn           [47]
None (d)      9.2     4.56     8        30       15      TRW              [8]
None (d)      12      5.5      10       30       8       HRL              [42]

(a) Frequency Control Word (FCW) or accumulator resolution.
(b) Claims 20 dBc spectral purity for harmonics.
(c) For 360 MHz output.
(d) Uses InP DHBT Technology.

Table 1.4: Recent commercial and reported direct digital synthesizers.

2005. It had an foutmax of 5.5 GHz, an SFDR of 30 dBc, and a power consumption of
8 W.

1.5 Thesis Organization

Chapter 2 gives a general overview of the Vitesse InP technology, design
tools, and ECL design techniques used in the thesis. Circuit design elements that
are common to many of the designs in the thesis, such as current sources, voltage
swings, and voltage levels, are described in detail. An overview of the methods used for
simulation of transmission lines with the simulation toolkit is also given.
High-speed accumulators are discussed in Chapter 3. The chapter is divided
into four major sections. In Section 3.1, the basics of adder and accumulator design
are discussed. Section 3.2 discusses four high-speed 4-bit accumulator designs. These
accumulators operate up to a 41 GHz clock frequency. In Section 3.3, two 8-bit accumulator designs are discussed. These designs sacrifice some high-speed performance to
achieve lower power consumption. The designs are also integrated into DDS circuits
in Chapter 4. Finally, Section 3.4 discusses other techniques for power consumption
reduction, namely triple-tail circuits and resistor-only current sources.
Two full DDS circuits that use the accumulators from Section 3.3 are described
in Chapter 4. The DDSV1 design described in Section 4.4 operates up to a maximum
clock frequency of 32 GHz and consumes 9.45 W of power, while the DDSV2 design
described in Section 4.5 operates up to a maximum clock frequency of 13 GHz and
consumes 5.42 W of power. A new figure of merit for comparing the performance of
DDS designs is also introduced that takes the SFDR and frequency resolution of a design
into account.
The thesis is concluded by a summary of accomplishments and suggestions for
future work that are presented in Chapter 5.


CHAPTER 2
HBT Circuit Design
This chapter provides background information that is relevant to the circuits
designed in this thesis. An overview of the Vitesse VIP-2 InP process is given in Section 2.1. This technology is used for manufacturing the circuits designed in this thesis.
An overview of the simulation tools and methodology is presented in Section 2.2. The
final sections of this chapter cover emitter coupled logic, voltage swing, and current
sources.

2.1 Vitesse VIP-2 Process

Details of the Vitesse VIP-2 process used in this thesis were first reported in [1]
and described in further detail in [48]. The VIP-2 process uses double heterojunction
bipolar transistors (DHBTs). The DHBT devices have an n-InP emitter, a p-InGaAs
base, and an n-InP collector. A diagram of the DHBT device is shown in Figure 2.1.
The process has 4 levels of aluminum metal interconnect, thin-film resistors, and metal-insulator-metal (MIM) capacitors. There are no PNP devices in the process, only NPN
transistors. A stack-up of the process is shown in Figure 2.2. The process is self-aligned,
using dielectric and metal spacers. The use of self-alignment and the elimination of
the traditional lift-off process results in a highly manufacturable process. In the VIP-2
process, ft and fmax are both over 300 GHz, while BVCEO is over 4 V.
Although single heterojunction bipolar transistors (SHBTs) are generally faster
than DHBTs implemented in the same materials, the DHBTs have some advantages
over SHBTs [49], such as higher breakdown voltages and better thermal characteristics.
These advantages make the DHBT devices better suited for highly integrated digital and
mixed-signal circuits.

[Figure 2.1 diagram: device cross-section with labeled regions: emitter metal, emitter, base metal, base, collector metal, collector, and sub-collector.]

Figure 2.1: Diagram of the self-aligned DHBT device [1] from the Vitesse VIP-2 InP
process.

[Figure 2.2 diagram: layer stack, top to bottom: passivation, Metal 4, Oxide 4, Metal 3, Oxide 3, Metal 2, Oxide 2 (capacitor), Metal 1, Oxide 1 (resistor), InP substrate.]

Figure 2.2: Stack-up of the Vitesse VIP-2 InP DHBT process [1] with 4 aluminum metal
interconnect levels, thin-film resistors, and MIM capacitors.

The higher breakdown voltage in the Vitesse VIP-2 DHBT devices is due to the
fact that the p-InGaAs to n-InP base to collector junction has a larger energy gap than
the p-InGaAs to n-InGaAs base to collector junction in an equivalent SHBT device.
A higher breakdown voltage is advantageous for use in mixed-signal circuitry, since a
higher BVCEO allows for a larger DAC output range that leads to improved DAC linearity [15]. That is one of the reasons that BVCEO is included in the MSFOM outlined in
Equation 1.2 in Chapter 1, because DAC circuits are essential to mixed-signal circuitry
and a higher breakdown voltage will improve the MSFOM if other factors are constant.
The higher breakdown voltage is also one of the main reasons that InP DHBTs are used
instead of SiGe devices, since the VIP-2 process BVCEO is over 4 V, compared to below
2 V for SiGe [9].
The DHBT devices are also advantageous compared to similar SHBT devices,
because the InP collector in a DHBT device is more thermally conductive than the
InGaAs collector in an SHBT device [49]. High thermal conductivity is essential, particularly in densely packed mixed-signal circuits with thousands of transistors. The heat
dissipated from the high performance devices needs to be moved away from the devices
efficiently in order to minimize self-heating and reduce the operating temperature.
This ensures that the DHBT devices perform optimally.
The Vitesse VIP-2 process facilitates high-speed digital and mixed-signal circuits. Reported digital circuits include a 152 GHz static frequency divider [1] and a
41 GHz 4-bit accumulator [50]. For mixed-signal circuits, a 50 GHz variable gain amplifier [48] and DDS circuits operating up to a 32 GHz clock frequency [51] have been
reported. The 41 GHz accumulator and 32 GHz DDS will be discussed in further detail
in later sections of this thesis.

2.2 Circuit Simulation

The Cadence design environment is used for schematic capture and layout. Unless
otherwise noted, Cadence Spectre is used for circuit simulations. The Vitesse VIP-2
transistors are modelled by the Vertical Bipolar Inter-Company (VBIC) model [52]. In
most cases, transient time-domain simulations are used for simulation of the digital and
mixed-signal circuits. When other types of simulations are used, it is noted. When simulations are run to determine the maximum clock frequency of a circuit, the input clock
frequency is usually stepped in 1 GHz increments until the circuit fails. Failures in the
circuits generally are caused by timing margin failures in the digital portions of the circuitry. Thus, the failures are not evidenced by a reduced differential, but instead by an
incorrect output caused by an incorrect internal digital sequence. For example, in
divide-by-two test circuits, a failed output will not be a true divide-by-two signal. Instead, it
will “slip,” missing cycles when driven above its maximum operating frequency.
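The stepping procedure described above can be sketched as a simple search loop. This is only an illustration: `passes_at` is a hypothetical stand-in for running a transient simulation at one clock frequency and checking the output sequence; it is not part of Spectre or the design kit, and the 41 GHz limit in the toy check is chosen purely for the example.

```python
from typing import Callable


def max_clock_ghz(passes_at: Callable[[int], bool],
                  start_ghz: int = 1,
                  step_ghz: int = 1,
                  limit_ghz: int = 200) -> int:
    """Step the clock upward until the first failure.

    Returns the last passing frequency in GHz, or 0 if even the
    starting frequency fails.
    """
    last_pass = 0
    f = start_ghz
    while f <= limit_ghz:
        if not passes_at(f):
            break  # first incorrect output sequence ends the sweep
        last_pass = f
        f += step_ghz
    return last_pass


def toy_divider_passes(f_ghz: int) -> bool:
    # Hypothetical pass/fail check: pretend the divider output is a
    # true divide-by-two signal up to 41 GHz and slips above that.
    return f_ghz <= 41


print(max_clock_ghz(toy_divider_passes))
```

In practice the pass/fail decision would compare the simulated output against the expected digital sequence rather than against a fixed threshold.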
For increased correlation between simulation and measured results, the simulation results almost always include parasitic extracted capacitances from layout. Unless
otherwise noted, the simulation results presented in this thesis will include extracted
parasitic capacitance. Schematic-only simulations without parasitics are also run for all
of the circuits, but this intermediate step is usually excluded from this thesis, because the
extracted parasitic capacitors have a significant impact on the designs. The schematic-only simulation results give guidance on design parameters, but the parasitic capacitors
that are introduced from the layout greatly impact the performance of the circuit and
cannot be ignored. In many of the designs, the layout is iterated and adjusted to reduce
the parasitic capacitance if possible. Another option is to increase the drive of the emitter followers that are loaded by the large parasitics. Compared to experimental results,
the simulations with parasitic capacitances extracted from layout generally provide a
reasonable match.

2.3 Clock Distribution Simulations

The parasitic extraction of this technology is limited since it only includes capacitance.
It does not include the parasitic resistance or inductance that is also inherent
in fabricated circuits. Most interconnects are electrically short, so the parasitic capacitances are the dominant factor affecting circuit performance and they can be treated
as a lumped parameter. However, as mentioned in Section 1.1.2, the clock interconnections can
be electrically long and will not be modelled correctly by the transient time-domain
simulations in Spectre using just lumped parameters.
For these situations, the clock tree is modelled by microstrip transmission lines
that include the inherent distributed R, L, and C, along with the circuits that load the
clock distribution interconnects. Models for transmission lines are included in Spectre,
but they do not have corresponding models in layout. As a result, a clock tree schematic
of the clock distribution is constructed and simulated with a frequency-domain AC simulation. In the clock tree schematics, the transmission line parameters, such as the
characteristic impedance and line length, are extracted from the layout by hand, because
the design kit does not include an automated method for extraction of these parameters.
The clock tree schematic is simplified from the overall circuit schematic and includes
the circuits that drive the clock and the circuits that are loads on the clock. All other circuitry is omitted from the simplified schematic. An example of a clock tree schematic
for a 4-bit accumulator is shown in Figure 2.3.
Unlike most of the other simulations in this thesis that use time-domain analysis, Spectre frequency-domain AC simulations are used for the clock tree simulations.
These simulations aid in pinpointing frequencies where the clock signals feeding the registers are potentially over-damped or under-damped. Based on the initial simulations,
adjustments are made to the clock tree, such as adding buffers and source resistors, to
compensate and adjust the gain to appropriate levels over the targeted clock frequency
range. Once the clock gain is within appropriate levels, it is assumed that the clock
circuitry will perform correctly, and the full circuit simulations are then run using time-domain transient analysis.

[Figure 2.3 schematic: labeled blocks: clock input, clock buffer, four level shifters, and fourteen clock loads.]

Figure 2.3: Example clock tree schematic for a 4-bit accumulator. The transmission
line parameters are determined from the layout. The clock drivers and clock loads are
included for the simulations.

An example of this type of simulation output is shown in Figure 2.4. In these
simulations, the clock signal magnitude is normalized so that clock levels above 0 dB
are strong enough to drive registers. Based on the initial simulations that did not include
series resistors in the clock distribution path, the worst case register inputs were found
to peak 2.1 dB over the nominal normalized clock signal magnitude at a 30 GHz clock
frequency. This peaking could potentially overdrive the clock buffers and registers, so
resistors were added in series between the emitter followers of the clock drivers and the
long microstrip transmission lines feeding the registers. These resistors absorb reflections on the microstrip transmission lines and reduce the clock overshoot to acceptable
levels, as shown in Figure 2.4. With the series resistors added, the best
case register input from the previous simulation now has the least clock bandwidth of all
the register inputs. For this example circuit, the targeted clock performance is 32 GHz,
so the addition of the series resistors does not negatively impact the circuit performance
because this register clock input still has 36 GHz of bandwidth. This method of clock
tree simulation was first reported in [51].
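The effect of a series resistor on clock peaking can be illustrated with a much simpler model than the clock tree schematic. The sketch below assumes an ideal driver, one series resistor, one lossless transmission line, and one capacitive load. The line impedance, delay, load capacitance, and resistor values are placeholders, not values extracted from the layout, and a lossless line exaggerates the peaking compared with a real microstrip.

```python
import math


def clock_gain_db(f_hz, rs_ohm, z0_ohm=50.0, delay_s=6e-12, c_load_f=50e-15):
    """Voltage gain (dB) from an ideal driver to a capacitive load.

    Model: ideal source -> series resistor -> lossless line -> capacitor.
    All default component values are assumptions for illustration.
    """
    w = 2.0 * math.pi * f_hz
    theta = w * delay_s                  # electrical length in radians
    y_load = 1j * w * c_load_f           # admittance of the load capacitor
    # ABCD parameters of a lossless transmission line
    a = math.cos(theta)
    b = 1j * z0_ohm * math.sin(theta)
    c = 1j * math.sin(theta) / z0_ohm
    d = math.cos(theta)
    # V_source / V_load with the series resistor ahead of the line
    ratio = (a + b * y_load) + rs_ohm * (c + d * y_load)
    return 20.0 * math.log10(1.0 / abs(ratio))


def peak_gain_db(rs_ohm, f_start_ghz=1, f_stop_ghz=50):
    """Largest gain found on a 1 GHz grid, mimicking an AC sweep."""
    return max(clock_gain_db(f * 1e9, rs_ohm)
               for f in range(f_start_ghz, f_stop_ghz + 1))


for rs in (5.0, 30.0):
    print(f"Rs = {rs:4.1f} ohm: peak gain = {peak_gain_db(rs):.2f} dB")
```

In this model a larger series resistance lowers the gain at every frequency, which is the same trade-off described above: less peaking in exchange for less clock bandwidth.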

2.4 Emitter Coupled Logic

Historically, several forms of digital bipolar logic have been reported. These
include resistor-transistor logic (RTL), diode-transistor logic (DTL), transistor-transistor
logic (TTL), and emitter coupled logic (ECL). While TTL has been the most widely used,
particularly from the late 1960s to the late 1980s, ECL is the fastest [3]. ECL is based
on the bipolar differential pair with an emitter follower. An example of an ECL inverter
that uses the differential pair is shown in Figure 2.5. In ECL, the switching transistors
are kept in the active mode of operation, so the time needed to alter the base charge is
low, enabling high-speed operation [53] and current steering.
ECL uses a differential pair to steer the current (IEE ) depending on the input
differential signal (Ap/An). As described in Section 2.5, a sufficiently large input differential will result in almost all of the current steered through the transistor with the
higher input voltage. Thus, when the voltage at node Ap is sufficiently higher than An,
almost all of the current is steered through the left transistor in the differential pair, so
node Zp has a potential of approximately -IEE*RC volts. Since almost all of the current is steered through the left side transistor, essentially no current flows through the
transistor on the right side of the differential pair, and node Zn is pulled up to the top
rail, which is 0 V in Figure 2.5. This convention is used for all of the circuits in this
thesis. The emitter followers on the outputs of the differential pair drive loads using the
differential signals Zp1 and Zn1. Without the emitter followers, the differential pair alone
is only able to drive a low fan out of a couple of gates without losing performance and
reducing the output differential across nodes Zp and Zn.

[Figure 2.4 plot: Normalized Clock Signal Magnitude (dB), 0 to 8, versus Frequency (GHz), 6 to 50; curves: Worst Case, Worst Case after series resistors added, Best Case, Best Case after series resistors added.]

Figure 2.4: Example simulation of the clock distribution interconnects as microstrip
transmission lines with line lengths, characteristic impedances, and loads estimated from
the physical layout. The best case and worst case register inputs are shown both before
and after the addition of series resistors to the clock distribution paths. Without the series
resistors, there is some potential for overdrive in the worst case register inputs. The
series resistors reduce the overdrive while maintaining a clock bandwidth well above
the operating frequency of the circuit.

[Figure 2.5 schematic: differential pair with load resistors RC, inputs Ap/An, internal outputs Zp/Zn, tail current IEE, and emitter follower outputs Zp1/Zn1 biased by IEF.]

Figure 2.5: ECL inverter schematic. The input of the inverter is driven by the differential
inputs Ap and An. It has emitter follower outputs so it can drive multiple loads.

ECL can also be extended to multiple input logic gates. A two-input ECL gate
is shown in Figure 2.6. In this two-input gate, two differential pairs are stacked upon
each other. In each differential pair, the current is steered through the transistor with
the higher voltage at the base node, Ap/An or Bp/Bn. The configuration shown in
Figure 2.6 performs the AND operation. If the sense of the outputs is switched, so
that nodes Zp and Zn are swapped, the logic gate in Figure 2.6 performs the NAND
operation. From De Morgan’s theorem [54], when the sense of both inputs and the
output are swapped, the gate in Figure 2.6 functions as an OR gate. Likewise, if the
sense of both inputs is swapped, but the output is left as is, then the gate functions
as a NOR gate. Thus, the two-input ECL gate topology shown in Figure 2.6 is very
flexible. It can be used as an AND, NAND, OR, or NOR gate, just by modifying the
sense of the inputs and outputs.
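The four functions obtained from one topology can be checked with a small logical model. The sketch below is purely a truth-table illustration, not a circuit simulation: each signal is represented as a hypothetical (p, n) pair, and swapping the pair models swapping the sense of a differential signal.

```python
from itertools import product


def to_diff(bit):
    """Encode a logic value as a differential pair (p, n)."""
    return (bool(bit), not bit)


def swap(sig):
    """Swap the sense of a differential signal (an inversion at no cost)."""
    p, n = sig
    return (n, p)


def and_core(a, b):
    """Logical model of the stacked pairs: true output only if A and B are true."""
    z = a[0] and b[0]
    return (z, not z)


def nand_gate(a, b):
    return swap(and_core(a, b))              # outputs swapped


def or_gate(a, b):
    return swap(and_core(swap(a), swap(b)))  # inputs and outputs swapped


def nor_gate(a, b):
    return and_core(swap(a), swap(b))        # inputs swapped only


for x, y in product((False, True), repeat=2):
    a, b = to_diff(x), to_diff(y)
    print(x, y, and_core(a, b)[0], nand_gate(a, b)[0],
          or_gate(a, b)[0], nor_gate(a, b)[0])
```

The OR and NOR cases are De Morgan's theorem applied to the same AND core.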

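[Figure 2.6 schematic: two stacked differential pairs with load resistors RC, upper inputs Ap/An, lower inputs Bp/Bn, internal outputs Zn/Zp, tail current IEE, and emitter follower outputs Zn1/Zp1 biased by IEF.]

Figure 2.6: ECL two-input logic gate schematic. In the configuration shown, the logic
gate operates as an AND gate. The sense of the inputs and outputs can be swapped to
achieve NAND, OR, or NOR gates from the two-input topology.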

Other two-input configurations implemented in ECL include the XOR/XNOR gate
shown in Figure 2.7 and the latch shown in Figure 2.8. The current steering methodology can be extended to more than two inputs by stacking more differential pairs, as in
the three-input ECL AND gate shown in Figure 1.3.

2.4.1 Voltage Levels

In ECL, the logic levels are defined in terms of the number of diode drops below
the top rail. Since the top rail is at ground in this thesis, the top voltage level goes from
0 to -IEE *RC volts. By convention, this range of voltages is defined as “level 0.” One
diode drop (800 mV to 900 mV) below “level 0” is “level 1.” Another diode drop lower
is “level 2.” Additional levels are possible, depending on the supply voltage and logic
gates used.
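The level convention can be written out numerically. In this sketch the diode drop of 0.85 V and the swing of 0.25 V are assumed mid-range values (the text gives 800 mV to 900 mV for the diode drop, and the swing equals IEE*RC); the function name is hypothetical.

```python
def level_range(level, v_diode=0.85, swing=0.25):
    """Return (high, low) voltages in volts for an ECL logic level.

    Assumes a 0 V top rail, a fixed diode drop per level, and a
    voltage swing equal to IEE*RC.
    """
    high = -level * v_diode
    low = high - swing
    return (high, low)


for k in range(3):
    high, low = level_range(k)
    print(f"level {k}: {high:.2f} V to {low:.2f} V")
```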

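[Figure 2.7 schematic: stacked differential pairs with load resistors RC, upper inputs Ap/An, lower inputs Bp/Bn, internal outputs Zn/Zp, tail current IEE, and emitter follower outputs Zp1/Zn1 biased by IEF.]

Figure 2.7: ECL XOR logic gate schematic. The sense of the outputs can be swapped
to achieve an XNOR gate from the topology.

[Figure 2.8 schematic: latch with load resistors RC, data inputs Ap/An, clock inputs clkp/clkn, internal outputs Zp/Zn, tail current IEE, and emitter follower outputs Zp1/Zn1 (fed back to the latching pair) biased by IEF.]

Figure 2.8: ECL latch gate schematic. In this configuration, the latch is transparent
when clkp has a higher voltage than clkn, and it is in latch mode when clkn has a higher
voltage than clkp.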

Generally, the uppermost transistors in the ECL gate transistor stacks are driven
by “level 1” signals. Likewise, the next pair of transistors down in the stack are driven
by “level 2” signals.

2.4.2 Current Mode Logic

It should be noted that in the literature, the term ECL is often used interchangeably
with current mode logic (CML). In this thesis, the term ECL is used for logic gates
with differential pairs followed by emitter followers. CML is used for the same gates,
but without emitter followers. CML gates consume less power than ECL gates, since
they eliminate the emitter follower circuitry. However, they do not have the ability to
drive a large fan out or large node capacitances. Typically, CML performance will degrade after driving only two or three gates. The addition of emitter followers increases
the drive capability of the logic gate and improves performance when there are multiple
loads.
The logic levels that drive the CML gates are different from ECL, since the
emitter followers are eliminated. The top differential pair is driven by “level 0” signals instead of
“level 1” signals. For multiple input gates, emitter followers are still needed to shift the logic
level down the appropriate number of diode drops, so some of the power advantage of
CML is negated.
In this thesis, the majority of the circuits are ECL. It should be noted that it is
possible to drive the uppermost transistors of an ECL circuit with “level 0” signals. This
is advantageous, particularly if a logic gate only drives a single load. In this case, the
emitter followers on the gate can be eliminated to save power. This type of approach
is not strictly ECL, and it is used only in limited circumstances. For the most part,
however, the “standard” levels described in Section 2.4.1 are used.

2.5 Voltage Swing

For the ECL gates to operate properly, a sufficient voltage differential is needed
at the inputs so that a majority of the current is steered through only one transistor of
each differential pair. The voltage swing is the difference between the logic high and
logic low signal, or IEE *RC volts in the case of the circuit in Figure 2.5. If the voltage
swing is too low, the transistor in the differential pair that should be “off” will still have a
significant amount of current. While the voltage swing can be made very large to ensure
that the current is steered properly, the propagation delay of the logic gate will increase
as the output voltage swing increases. Thus, the voltage swing must be balanced so that
there is a sufficient differential to properly switch gates, but not an excessive differential
that degrades performance.
The voltage swing design point is determined by simulating the current through
the differential pair as a function of the input differential voltage. The test circuit for
this simulation is shown in Figure 2.9. The transistor sizes and values of RC and IEE
are typical of circuits used in the designs. This simulation uses a DC sweep of the
differential voltage (Vdiff), and the simulation is for circuit schematic only, with no extracted parasitic capacitors. The simulation output in Figure 2.10 shows the percentage
of current through each transistor in the differential pair as a function of the input differential. From Figure 2.10, a voltage differential of 200 mV results in 99.25% of the
current steered through the “on” transistor and a 300 mV differential results in 99.95%
of the current steered through the “on” transistor. Below 200 mV, the percentage of
current through the “on” transistor is rapidly reduced, and above 300 mV the
percentage of current in the “on” transistor increases very slowly with Vdiff. Thus,
for a sufficient voltage swing that maximizes the gate performance, the internal voltage
swing in the ECL circuits is kept within the 200 mV to 300 mV range.
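The shape of the steering curve can be approximated with the textbook expression for an ideal bipolar differential pair, I_on / IEE = 1 / (1 + exp(-Vdiff / VT)). This is a sketch under idealized assumptions (room-temperature thermal voltage, no emitter resistance, no self-heating), so it predicts sharper steering than the simulated 99.25% and 99.95% figures quoted above; it is not a substitute for the VBIC-based simulation.

```python
import math


def on_fraction(v_diff, v_thermal=0.02585):
    """Fraction of the tail current in the transistor with the higher base voltage.

    Ideal bipolar differential pair; v_diff and v_thermal are in volts.
    The default thermal voltage corresponds to about 300 K (an assumption).
    """
    return 1.0 / (1.0 + math.exp(-v_diff / v_thermal))


for mv in (0, 50, 100, 200, 300):
    print(f"Vdiff = {mv:3d} mV: {100.0 * on_fraction(mv / 1000.0):.3f} % in the on transistor")
```

Even in the ideal case the curve flattens quickly, which is why increasing the swing beyond a few hundred millivolts buys little additional steering while adding delay.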

[Figure 2.9 schematic: differential pair Q1/Q2 with load resistors RC, tail current IEE, bias source Vbias, and differential input source Vdiff.]

Figure 2.9: Test circuit for designing voltage differential. The voltage differential is
determined by Vdiff, and Vbias is used to bias the differential pair.

[Figure 2.10 plot: % of Current, 0 to 100, versus Vdiff (mV), 0 to 500; curves: % of current through Q1 and % of current through Q2.]

Figure 2.10: Simulation of the percentage of current through each leg of the differential
pair as a function of voltage differential.

[Figure 2.11 schematic: two-transistor current mirror with labels Iin, Iout, and Vbias.]

Figure 2.11: Simple current mirror circuit.

2.6 Bias Current

Providing a current source with a known, stable current is essential for circuit
operation. Not only is the operating speed of the transistors dependent on the bias current, but the voltage swing is also dependent on the bias current. One simple way to
generate a bias current is by using the simple current mirror [55] shown in Figure 2.11.
If the two transistors are equal, the input current (Iin ) is mirrored to the output (Iout ),
such that
$$I_{out} = \frac{I_{in}}{1 + \frac{2}{\beta}}. \tag{2.1}$$

When β is large,

$$I_{out} \cong I_{in}. \tag{2.2}$$

In general, it is desirable to have the reference current on the right side of the circuit
supply the bias voltage (Vbias ) to several current sources. Since the transistor β is not
infinite, the sum of all of the base currents will become significant, so that Equation 2.2
will not hold and Iout will no longer be equal to Iin.
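The effect of loading the bias node can be estimated by extending Equation 2.1 to N identical output transistors: the reference branch must then supply N + 1 base currents, so Iout / Iin = 1 / (1 + (N + 1)/β). The sketch below assumes identical transistors and ignores the Early effect; the β value of 50 is a placeholder, not a VIP-2 process parameter.

```python
def simple_mirror_ratio(beta, n_outputs=1):
    """Iout / Iin for a simple mirror driving n_outputs identical transistors.

    With n_outputs = 1 this reduces to Equation 2.1.
    """
    return 1.0 / (1.0 + (n_outputs + 1) / beta)


BETA = 50.0  # assumed current gain, for illustration only
for n in (1, 5, 20):
    print(f"{n:2d} outputs: Iout/Iin = {simple_mirror_ratio(BETA, n):.3f}")
```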
In this thesis, the current mirror is based on the beta helper design with emitter
degeneration added [55]. This current mirror is shown in Figure 2.12. On the left side
of Figure 2.12, the bias generator establishes the bias voltage used by current sources
in the circuit. The right side of the figure shows an example of a current source that
is used in the logic gates. The input current is established by using a resistive divider
(Rbias). The additional transistor (Qb) is the beta helper. In effect, Qb supplies the base
current needed by the load transistors, allowing for multiple loads to be connected while
maintaining an accurately mirrored current. The beta helper reduces the error in the
current mirror by a factor of (β + 1). When the transistors are identical,

$$I_{out} = I_{in}\left(1 - \frac{2}{\beta(\beta + 1)}\right). \tag{2.3}$$

[Figure 2.12 schematic: bias generator with Iin, Rbias, beta helper Qb, and degeneration resistors RE1 and RE2 producing Vbias; example current source with output Iout and degeneration resistor RE3.]

Figure 2.12: Current mirror with beta helper and emitter degeneration. The left side
of the circuit generates the bias voltage. The bias voltage is used by multiple current
sources in the circuit. An example current source is shown on the right side of the
figure.

The current mirror also adds emitter degeneration in the form of resistors on the emitters
of the transistors. Since there are process variations in implementing transistors, it is difficult to achieve perfect matching of the transistors. However, the emitter degeneration
significantly improves current matching [55] by compensating for process mismatches.
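The benefit of the beta helper can be put in numbers by comparing the fractional errors implied by Equation 2.1 and Equation 2.3. This is a minimal sketch for a single output and identical transistors; the β values are placeholders rather than process data.

```python
def simple_mirror_error(beta):
    """Fractional error (1 - Iout/Iin) of the simple mirror, from Equation 2.1."""
    return 1.0 - 1.0 / (1.0 + 2.0 / beta)


def beta_helper_error(beta):
    """Fractional error (1 - Iout/Iin) with a beta helper, from Equation 2.3."""
    return 2.0 / (beta * (beta + 1.0))


for beta in (20.0, 50.0, 100.0):  # assumed current gains
    print(f"beta = {beta:5.1f}: simple {100 * simple_mirror_error(beta):.3f} %, "
          f"beta helper {100 * beta_helper_error(beta):.4f} %")
```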


In this thesis, the current mirror based on Figure 2.12 is used as a bias generator in all of the designs except for the resistor-only current source accumulator that is
described in Section 3.4.2. The resistor values are modified and a different number of
diodes may be used depending on the supply voltage of a particular design. In individual
circuits, when a current source is shown, it is actually implemented as a transistor with
its base connected to the Vbias node of a current source and with an emitter degeneration
resistor, as shown on the right side of Figure 2.12. Individual current sources can be
modified by changing the size of the transistor relative to the mirror transistor in the bias
generator. The degeneration resistor is also scaled to maintain a constant degeneration
voltage.
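The scaling rule for individual current sources amounts to keeping the product of current and degeneration resistance constant. A small sketch, with a hypothetical 1 mA reference source and a 200 Ω degeneration resistor chosen only for illustration:

```python
def scaled_source(i_ref_ma, re_ref_ohm, scale):
    """Return (current in mA, degeneration resistance in ohms) for a scaled source.

    `scale` is the transistor size relative to the mirror transistor.
    The resistor is divided by the same factor so that the degeneration
    voltage (current times resistance) stays constant.
    """
    i_out_ma = i_ref_ma * scale
    re_out_ohm = re_ref_ohm / scale
    return i_out_ma, re_out_ohm


for scale in (0.5, 1.0, 2.0, 4.0):
    i_ma, re_ohm = scaled_source(1.0, 200.0, scale)  # assumed reference values
    print(f"x{scale}: {i_ma:.1f} mA, {re_ohm:.0f} ohm, "
          f"degeneration = {i_ma * re_ohm:.0f} mV")
```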

2.7 Current Source Output Resistance

Ideally, the current source outputs a constant current that is independent of the
voltage across its collector and emitter. However, these current sources are not ideal.
In the extreme case, the collector and emitter voltages may be so close that the output
current is driven to 0 mA. At higher collector voltages, the current becomes linearly
dependent on the collector voltage. This linear dependence is expressed as the output
resistance.
Using the bias generator and a current source that both have a supply voltage
of -3.8 V, the collector voltage of the current source transistor is swept using a DC
simulation in Spectre. The output of this simulation shows the dependence of the output
current on the collector voltage. It is shown in Figure 2.13. The DC sweep goes up to
only -1.8 V, because collector voltages higher than this are avoided to ensure that the
transistors are in the safe operating area.
Below -3.2 V, as shown in Figure 2.13, the bias transistor is not fully on, and
the current is very highly dependent on the collector voltage. In the circuits in this
thesis, the current sources are not operated in this region. In fact, they are generally
kept above -2.8 V. At -2.8 V and above, the relationship between the current and voltage
is approximately linear. The inverse of the slope of the current line in this region gives the output
resistance (Rout), such that

$$R_{out} = \frac{\Delta V_{collector}}{\Delta I_{bias}}. \tag{2.4}$$

[Figure 2.13 plot: Current (mA), 0 to 5, versus Collector Voltage (V), -3.6 to -2.]

Figure 2.13: DC sweep simulation of the current source current versus the collector
voltage. The output resistance is 3.7 kΩ over the linear region.

For this circuit, Rout is 3.7 kΩ. In this thesis, not all of the circuits will have the same
collector voltage at the current source, so it is important to factor in the output resistance
when biasing circuits.
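As a rough worked example of why this matters, the change in bias current between two collector voltages in the linear region follows directly from the output resistance. The sketch below uses the 3.7 kΩ value quoted above and is only valid at collector voltages of -2.8 V and above; the function name is hypothetical.

```python
R_OUT_OHM = 3.7e3  # output resistance of the example current source


def current_shift_ma(v_from, v_to, r_out_ohm=R_OUT_OHM):
    """Change in source current (mA) when the collector moves from v_from to v_to (volts).

    Linear model only: delta_I = delta_V / Rout.
    """
    return (v_to - v_from) / r_out_ohm * 1e3


# Moving the collector from -2.8 V to -1.8 V raises the current by roughly 0.27 mA.
print(f"{current_shift_ma(-2.8, -1.8):.3f} mA")
```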

2.8 Conclusion

This chapter provided an overview of some of the general aspects of circuit design
in this thesis. Many of these aspects are common to multiple circuit designs in this
thesis. Accumulator and DDS circuits will be presented in the following chapters.


CHAPTER 3
Design of Adders and Accumulators in InP
In this chapter, several accumulator designs will be discussed. All of the accumulators are designed so that they can potentially be integrated as phase accumulators in
DDS circuits. While the accumulators discussed in Section 3.2.3 through Section 3.2.6
are not integrated into DDS circuits, they are fabricated inside test circuits containing a
DAC. The DAC is included to simplify testing by combining several high-speed digital
accumulator outputs into a single high-speed analog output. The accumulators discussed
in Section 3.3.1 and Section 3.3.2 are integrated and fabricated as phase accumulators
in DDS test circuits. Since these accumulators drive phase to sine converters, no direct
measured results of these accumulators are available. However, the operating speed of
these accumulators is inferred from the DDS circuit measurements. These DDS circuits
will be discussed in further detail in Chapter 4.
Two additional accumulator designs are discussed in Section 3.4. The first design, which uses a triple-tail circuit approach, is discussed in Section 3.4.1. This approach is not
implemented as a complete accumulator circuit even though others have reported advantages using a triple-tail approach for high-speed lower power designs [56, 57], because
the simulation results do not show an improvement over the other designs. The second
design is described in Section 3.4.2. Instead of using the current mirror as a current
source, it uses resistor-only current sources. This approach saves a significant amount
of power, but may be risky. At the time of the thesis publication, it is still in fabrication.
Before the accumulator designs can be discussed in detail, it is necessary to
review some of the basics of adders and accumulators. Much of the information in the
following section can be found in an introductory digital logic text, such as R. Katz [54],
or a more advanced text, such as D. Hodges et al. [11].

A  B  Cin  Cout  S
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1

Table 3.1: Truth table for the full adder building block. A and B are the two input bits,
Cin is the carry input, S is the sum, and Cout is the carry output.

3.1 Review of Adder and Accumulator Basics

This section discusses some of the basics of adders and accumulators. The full
adder is introduced as the primary building block for multiple bit adders. Basic multiple
bit adder architectures including the carry ripple adder, carry look ahead adder, carry
select adder, and pipelined adders are then discussed. Finally, an overview of the unique
attributes of accumulator circuits in comparison to adder circuits is given.

3.1.1 Full Adder

The full adder is the basic building block in many adder designs. It performs the
operation of adding two bits (A and B), combined with a carry input (Cin). It outputs
both the sum (S) and carry output (Cout ) of the addition operation. The truth table for
the full adder is shown in Table 3.1.
As shown in Table 3.1, the output of the full adder is the 2-bit sum of three 1-bit
inputs, where the Cout is the most significant bit (MSB) and S is the least significant bit
(LSB) of the 2-bit sum. The carry output is high whenever two or three of the inputs (a
majority of the inputs) are high. Logically, this is

$$C_{out} = A \cdot B + A \cdot C_{in} + B \cdot C_{in} = A \cdot B + C_{in} \cdot (A + B). \tag{3.1}$$

[Figure 3.1 block diagram: four full adders with inputs A(0) to A(3) and B(0) to B(3), carries C(0) to C(4), and clocked sum outputs S(0) to S(3).]

Figure 3.1: 4-bit carry ripple adder formed by stringing together full adders.

The sum output is high when either one or three of the inputs are high. Logically,
this is

S = A⊕B⊕Cin.    (3.2)

3.1.2 Carry Ripple Adder

The simplest type of multiple-bit adder is formed by stringing together multiple
full adders. An example of a 4-bit carry ripple adder is shown in Figure 3.1. The clock
signal latches the sum outputs at the end of the addition operation. The carry outputs
occur within a single clock cycle. Although a 4-bit example is shown, the concept can
be extended to larger bit-widths. This circuit is called the carry ripple adder, because
changes in the carry bits “ripple” or propagate from the LSB to the MSB of the adder.
The carry ripple adder is slow, especially for large bit-widths. The worst case
propagation delay of the carry ripple adder is dependent on the propagation delay of
the carry circuit (tcarry ), the propagation delay of the sum (tsum ), and the bit width. In
general, it is given by the propagation delay through all but the MSB carry, plus the
MSB sum propagation delay, or

tcarryripple = (n − 1)·tcarry + tsum.    (3.3)

For a 4-bit carry ripple adder, the propagation delay is 3·tcarry + tsum. While many
circuit implementations are possible, for simplicity it will be assumed that only two-input
logic gates are available and that all two-input gates have a propagation delay of
tgate. In reality, and as will be shown later in this thesis, this assumption does not always
hold. The gate delay is dependent on the technology and design approach used, so it
is not necessarily equal for the sum and carry circuits. Also, it is possible to use three-input
gates, which may have a different propagation delay than two-input gates. Despite
the imperfections in the assumption, it is useful for making a rough comparison to other
adder architectures.
Continuing with the rough assumption of a tgate propagation delay for all two-input
gates, the carry (tcarry) and sum (tsum) propagation delays are both 2·tgate, since
each of these circuits is composed of a pair of two-input gates. As a result, the
4-bit carry ripple adder has a worst case propagation delay of 3·(2·tgate) + 2·tgate = 8·tgate.
This worst case propagation delay increases linearly with bit-width.
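
The full adder equations and the ripple delay estimate can be checked with a short
behavioral sketch. The Python below is only an illustration and is not part of the original
design flow; the function names are hypothetical, and the delay helper simply evaluates
Equation (3.3) in units of tgate under the rough assumption tcarry = tsum = 2·tgate.

```python
def full_adder(a, b, cin):
    """One-bit full adder following Equations (3.1) and (3.2)."""
    cout = (a & b) | (cin & (a | b))   # high when a majority of the inputs are high
    s = a ^ b ^ cin                    # high when one or three inputs are high
    return s, cout


def ripple_add(a, b, n=4, cin=0):
    """n-bit carry ripple adder; returns (sum modulo 2**n, carry out)."""
    carry, total = cin, 0
    for i in range(n):                 # the carry ripples from the LSB to the MSB
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def ripple_delay(n, t_carry=2, t_sum=2):
    """Worst case delay in units of tgate, Equation (3.3)."""
    return (n - 1) * t_carry + t_sum
```

With these assumptions, ripple_delay(4) evaluates to 8 and ripple_delay(16) to 32, which
are the 4-bit and 16-bit carry ripple figures used for comparison in this chapter.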

3.1.3 Carry Lookahead Adder

An alternative to the carry ripple adder that reduces the propagation delay is
the carry lookahead adder. To reduce dependence on the carry propagation delay, the
full adders in the carry lookahead adder produce “generate” and “propagate” signals
independently of the Cin input. The generate signal (G) is high whenever the full adder
block would have a carry out, regardless of the state of Cin . The generate signal is

G = A·B.    (3.4)

The propagate signal (P) is high when the full adder block passes its carry input through
to its carry output, that is, when exactly one of A and B is high so that Cout follows Cin.
The propagate signal is

P = A⊕B.    (3.5)

For the ith stage of a multi-bit adder, the carry input is Ci and the carry output
is Ci+1. The sum and carry output at the ith stage of an adder are

Si = Pi⊕Ci,    (3.6)

Ci+1 = Gi + Pi·Ci.    (3.7)

Assuming that the carry input to a 4-bit adder is a logic low (C0 =0), the propagate and
generate signals can be used to compute the sum such that:

S0 = P0    (3.8)

S1 = P1⊕G0    (3.9)

S2 = P2⊕(G1 + P1·G0)    (3.10)

S3 = P3⊕(G2 + P2·G1 + P2·P1·G0).    (3.11)

The carry lookahead adder is not dependent on the carry rippling through every full
adder circuit, so its propagation delay is dependent on the propagation delay of the most
complex logic function in the circuit. An example of a 4-bit carry lookahead adder is
shown in Figure 3.2.
In the 4-bit adder example, the propagation delay is dependent on the logic used
to implement S3 . Using the rough assumption for comparison from Section 3.1.2 that
each two-input gate has a propagation delay of tgate , the 4-bit carry lookahead adder
has a worst case propagation delay of 5·tgate . For larger bit-widths, the carry lookahead
adder can be broken up into groups of 4-bit carry lookahead adders that each have a
group propagate and group generate signal. For a 16-bit carry lookahead adder, the
worst case propagation delay is 10·tgate . This is much faster than a 16-bit carry ripple
adder, which has a worst case propagation delay of 32·tgate .
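
As a quick consistency check of Equations (3.4) through (3.11), the sum bits can be
computed directly from the generate and propagate signals and compared against ordinary
integer addition. This is a minimal Python sketch with a hypothetical function name; it
models only the logic, not the gate delays.

```python
def cla4(a, b):
    """4-bit carry lookahead sum with C0 = 0, per Equations (3.4)-(3.11)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate, (3.4)
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate, (3.5)
    s0 = p[0]                                                  # (3.8)
    s1 = p[1] ^ g[0]                                           # (3.9)
    s2 = p[2] ^ (g[1] | (p[1] & g[0]))                         # (3.10)
    s3 = p[3] ^ (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]))  # (3.11)
    return s0 | (s1 << 1) | (s2 << 2) | (s3 << 3)
```

Each sum bit depends only on the G and P signals, so no carry has to ripple through the
lower stages before the upper sum bits can be formed.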

Figure 3.2: 4-bit carry lookahead adder.

3.1.4 Pipelined Adders

Since the propagation delay determines the maximum clock frequency at which
the adder can be operated, a reduced propagation delay will lead to a faster adder. The
propagation delay cannot always be reduced in a given technology, but it can be reduced in
effect by using pipelining. In pipelining, latches are inserted in between sections of
logic within the adder. This effectively reduces the total adder propagation delay to the
propagation delay between latches and allows for the clock speed to be increased.
Pipelining can be integrated into any adder architecture by inserting clocked
pipeline registers in between sections of logic. Thus, a carry ripple adder or a carry
lookahead adder can be implemented with pipelining. A potential disadvantage of using
pipelining in an adder is that pipelining introduces latency between the input of data and
when the specific result dependent on the data is output. The latency is determined by
the number of stages, or slices of logic between latches, in the adder. Depending on
the application, latency may or may not be a problem. In pipelined DDS applications,
for example, an increase in latency decreases DDS agility. However, this is usually
not a problem because the typical decrease in agility is acceptable for most
applications.
Despite the latency inherent in the pipeline approach, the maximum clock frequency
is increased in comparison to other adders without pipelining. This is because
a sum is output every clock cycle, while partially computed sums are stored within the
pipeline latches. In the extreme case, latches can be inserted between every logic gate
so that the clock speed is dependent on the propagation delay of a single logic gate plus
the propagation delay of the latch. As the number of pipeline stages increases, so does
the power consumption of the adder. Not only is extra power needed for the latches, but
it is also required by the circuitry that supplies a clock signal to the latches.
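
The trade-off between clock period and latency can be illustrated with a simple timing
model. The sketch below is an assumption-laden illustration rather than a description of
any circuit in this thesis: it assumes the logic is cut into equal slices, and the latch
delay t_latch and the function name are hypothetical.

```python
import math


def pipeline_timing(total_gate_delays, stages, t_gate=1.0, t_latch=1.0):
    """Return (clock period, latency in clock cycles) for an evenly cut pipeline."""
    gates_per_stage = math.ceil(total_gate_delays / stages)
    period = gates_per_stage * t_gate + t_latch   # slowest latch-to-latch path
    return period, stages
```

For a 32·tgate path, such as the 16-bit carry ripple adder, one stage gives a period of
33 time units, while 32 stages give a period of 2 units (one gate plus one latch) at the
cost of 32 cycles of latency. The model does not capture the added latch and clock
distribution power.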

3.1.5 Accumulators

Accumulators are a special case of adders. They are typically formed by feeding
the latched sum output of the adder back into the B input. For convenience, the discussion
of accumulation feedback will deal with the feedback into the B input, but it is
possible for the A input to be used instead. During each clock cycle, the adder sums the
accumulation increment (A) and the previous sum (B) to operate as an accumulator.
In a DDS, the A input typically changes at a much lower frequency than the
clock, because it is used as the frequency control word (FCW). Thus, while the circuitry
associated with the B input must be capable of operating at the clock frequency, the
circuitry associated with the A input can have lower performance. The DDS phase
accumulator can be configured so that some of the performance on the A input circuitry
is traded for improved performance on the B input circuitry. Since the accumulator
performance is limited by the B input circuitry, the accumulator operating frequency is
increased. This allows for a higher accumulator maximum frequency compared to an
equivalent adder design.
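
The feedback arrangement described above can be sketched behaviorally as follows. This is
a minimal Python model with hypothetical names; it assumes a fixed increment on the A
input and an n-bit sum that wraps modulo 2^n, and it ignores pipelining.

```python
def accumulate(fcw, n_bits=4, cycles=8, start=0):
    """Return the accumulator output for each clock cycle."""
    mask = (1 << n_bits) - 1
    b, out = start, []
    for _ in range(cycles):
        b = (b + fcw) & mask    # adder output is latched and fed back into B
        out.append(b)
    return out
```

For example, accumulate(3, cycles=6) steps through 3, 6, 9, 12, 15, 2. Only the B path
changes every clock cycle, while the A input (the FCW) stays constant.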
Power can also be reduced in an adder configured as an accumulator. In a
pipelined adder, the data must be buffered both before and after the accumulation operation
is completed to ensure that the pipelined data is in the proper clock cycle of
buffering. As described in [58], the accumulator can feed the sum output back locally
to the adder blocks, eliminating the need for the pre-buffering registers on the A inputs.
This modification cuts the number of buffering registers in half, greatly reducing the
power.
In the following sections, several accumulator designs will be discussed in depth.

3.2 High-Speed 4-bit Accumulators

The first set of accumulators is designed with high-speed operation as the main
design goal. All of the accumulators have a bit-width of 4 bits. This short bit-width allows
for demonstrating the operation of the accumulators while keeping a relatively low
transistor count. This helps to minimize problems with power, heat, clock distribution,
and reliability, so that the basic circuit designs are the main focus.
In previous work in InP by T. Mathew et al. [27, 59], only accumulator components were implemented, not a complete accumulator circuit. The previous work
used four-level series-gated logic merged with latches to implement carry and sum circuits. This design methodology is described further in Section 3.2.3. The previous work
demonstrated a carry test circuit that was configured as a divide by two circuit operating
at a maximum clock frequency of 19 GHz.
Four accumulators are presented in this section. In Section 3.2.3, ACCV1 is
a 4-bit accumulator based on the design reported by T. Mathew et al. [27, 59]. Like
the previous work, a carry test circuit is implemented as a divide by two circuit for
comparison. This accumulator also extends upon the previous work by implementing a
full 4-bit accumulator, instead of just the accumulator components.
The four-level series-gated sum and carry circuits in ACCV1 require a large
supply voltage, but many of the other circuits in the accumulator do not. Particularly
in the latch circuits, which will become a large portion of the circuit as bit-widths are
extended, the large supply voltage results in wasted power. In Section 3.2.4, ACCV2 is
presented as a step towards reduced power consumption. This accumulator uses a single-level
parallel-gated carry circuit instead of the four-level series-gated carry circuit used
in ACCV1. Since the sum circuit is not changed, the supply voltage and power are not
reduced compared to ACCV1. However, the ACCV2 design is useful as a test bench for
the new carry circuit and for a comparison to the previous design, since only the carry
circuit is changed.
Reduced power accumulator designs ACCV3 and ACCV4 are presented in Section 3.2.5 and Section 3.2.6. These accumulators use a three-level series-gated sum
circuit that allows for the reduction of the supply voltage by a diode drop. This power
supply reduction leads to a reduced power consumption. ACCV3 and ACCV4 have
a similar architecture, but some layout and clock buffering differences. ACCV3 uses
a layout and clock buffering that is similar to the ACCV1 and ACCV2 designs, so it
provides a more direct comparison. ACCV4 uses a more compact layout topology and
stronger clock buffers, so that a more compact layout can be investigated and a more
conservative approach to clock buffering is used.

3.2.1 Test Circuits

In the previous work reported by T. Mathew et al. [27, 59], the carry circuit
was the limiting factor for accumulator performance because it was slower than the
sum circuit. To estimate an upper bound for accumulator performance, the
performance-limiting carry circuit is tested as a divide by two circuit. The divide by two test circuit
also provides a test of the four-level series-gated carry circuit functionality.

Figure 3.3: Block diagram of divide by two test circuit.

Figure 3.4: Block diagram of the 4-bit accumulator test circuit. The DAC combines the
four high-speed digital sum outputs of the accumulator (S(3:0)) into a single high-speed
analog output. This output can be observed on a sampling oscilloscope.

The divide by two carry test circuit is shown in Figure 3.3. It is used for testing
the four-level series-gated and single-level parallel-gated carry circuits in ACCV1 and
ACCV2 in Section 3.2.3 and Section 3.2.4. It uses a carry circuit with a logic high on
the ‘A’ input, a logic low on the ‘B’ input, and with the carry output inverted and fed
back into the ‘C’ input. Since the ‘A’ input is high and the ‘B’ input is low, the carry
circuit tracks the ‘C’ input. The inversion of the carry output into the ‘C’ input results
in a divide by two circuit. A buffer drives the divide by two signal off of the test chip.
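
The divide by two behavior follows directly from the carry equation. The sketch below is
a behavioral illustration only: it assumes a single ideal latch that samples the carry
output once per clock cycle, which abstracts away the actual latch structure of the test
circuit, and the function names are hypothetical.

```python
def carry(a, b, c):
    """Majority (carry) function of Equation (3.1)."""
    return (a & b) | (c & (a | b))


def divide_by_two(cycles=8, a=1, b=0, q=0):
    """Latched carry output per clock cycle, with C driven by the inverted output."""
    out = []
    for _ in range(cycles):
        q = carry(a, b, 1 - q)   # with A = 1 and B = 0, the carry simply tracks C
        out.append(q)
    return out
```

The output toggles on every clock cycle, so it repeats every two cycles and therefore
runs at half the clock frequency.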
In addition to the divide by two test circuits for ACCV1 and ACCV2, all four of
the 4-bit accumulator test circuits contain a 4-bit DAC for testing purposes. An example
of a 4-bit accumulator test circuit is shown in Figure 3.4. The design allows for any
4-bit increment (0 through 15) to be input into the accumulator in A(3:0). Since the
accumulator has four high-frequency outputs (S(3:0)), the on-chip 4-bit DAC is used
to generate a single high-speed analog output. The DAC output preserves all of the
information for determining proper operation in an analog output that can be observed
on a sampling oscilloscope.

Figure 3.5: Test setup for frequencies below 50 GHz. (Block diagram: HP 83650B clock
source, HP 8340A trigger source, GSGSG probes and DC probe cards at the DUT, a bias T,
and an Agilent 86100B oscilloscope with an Agilent 86117A RF input card.)

Figure 3.6: Test setup for frequencies above 50 GHz. (Block diagram: the same equipment
as in Figure 3.5, with a source multiplier added after the HP 83650B clock source.)

3.2.2 Measurements

The divide by two carry test circuits and accumulator test circuits are tested on-wafer
using an Alessi probe station with HP and Agilent test equipment. DC inputs are
brought onto the device under test (DUT) with multi-pin probe cards, and RF inputs and
outputs are interfaced with high-frequency ground-signal-ground-signal-ground (GSGSG)
probes. An additional frequency multiplier is needed for input clock frequencies
above 50 GHz for the carry divide by two test circuits. The test setup for clock frequencies
below 50 GHz is shown in Figure 3.5, and the test setup for clock frequencies above
50 GHz is shown in Figure 3.6.

3.2.3 Accumulator ACCV1

A design based on the work of T. Mathew et al. [27, 59] is implemented as a
baseline 4-bit accumulator, named ACCV1, in the InP DHBT technology described in
Section 2.1. Unlike traditional pipelined designs, the ACCV1 design merges combinational
logic structures with latches, using four-level series-gated logic. The four-level
series-gated carry and sum circuits are shown in Figure 3.7 and Figure 3.8, respectively.
The four-level series-gated logic leads to high power consumption compared to an
approach with separate logic and latches, because it is necessary to add an extra diode drop
to the power supply in order to support the clock transistors added to the stack of
transistors. However, overall circuit performance is increased in comparison to approaches
with separate logic and latches because the front-end buffers of the latches are replaced
with carry and sum logic, reducing the overall propagation delay.

Figure 3.7: Four-level series-gated carry and latch circuit.

Figure 3.8: Four-level series-gated sum and latch circuit.

Unlike the T. Mathew et al. design [27, 59], which only implements separate
sum and carry circuits, the ACCV1 is a full 4-bit accumulator. The 4-bit accumulator is
pipelined, and it is built from 2-bit adder blocks and 2-bit register blocks. The ACCV1
4-bit accumulator is shown in Figure 3.9. The 2-bit adder blocks are formed by combining
carry circuits, sum circuits, and latches, as shown in Figure 3.10. The components
labelled ‘carry & latch’ and ‘sum & latch’ refer to the circuits in Figure 3.7 and Figure 3.8
that merge the logic with latches. The pipelined structure of the 4-bit accumulator in
Figure 3.9 can be easily expanded to an arbitrary 2N-bit width [58]. This is particularly
useful for DDS designs, where larger bit widths are typically required for improved
SFDR and finer frequency resolution. Since the accumulator is pipelined, an expanded
data word size can be achieved with little impact on high frequency performance.
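
The idea of building a pipelined 4-bit accumulator from 2-bit slices can be illustrated
with a cycle-by-cycle model. The sketch below is an assumption about how the blocks in
Figure 3.9 interact, not a transcription of the actual circuit: the lower slice latches
its carry C(2), the upper slice consumes that carry one clock cycle later, and the 2-bit
register delays S(1:0) so that both halves of the output belong to the same sum. All
names are hypothetical, and a constant increment is assumed.

```python
def pipelined_accumulator(increment, cycles=8):
    """4-bit accumulator from two 2-bit adder slices and a 2-bit alignment register."""
    a_lo, a_hi = increment & 3, (increment >> 2) & 3
    low = high = carry = reg = 0
    out = []
    for _ in range(cycles):
        # All latches update together, so each slice sees the previous cycle's values.
        next_high = (high + a_hi + carry) & 3   # upper slice uses the latched C(2)
        total_lo = low + a_lo                   # lower slice adds A(1:0) to S(1:0)
        reg = low                               # register delays S(1:0) by one cycle
        low, carry, high = total_lo & 3, total_lo >> 2, next_high
        out.append((high << 2) | reg)
    return out
```

In this model the output advances by the increment (modulo 16) on every clock cycle,
with a fixed startup offset. Each latch-to-latch path contains only a 2-bit adder, which
is why extending the word size in 2-bit steps has little effect on the clock rate.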

Figure 3.9: Block diagram of the pipelined 4-bit accumulator using 2-bit adders and
2-bit registers.

Figure 3.10: Block diagram of the 2-bit adder comprised of carry, sum, and latch circuits.

3.2.3.1 Simulation Results

The divide by two carry test circuit simulates up to a maximum clock frequency
of 55 GHz. An output of the divide by two carry test circuit simulation is shown in
Figure 3.11. In simulation, the 4-bit accumulator ACCV1 operates up to a maximum
clock rate of 43 GHz. In the ACCV1 simulation results shown in Figure 3.12, only
the outputs of the accumulator prior to the DAC are shown in order to more clearly
illustrate the proper operation of the 4-bit accumulator. It should also be noted that the
test chips have differential internal circuitry, but only single-ended outputs are shown in
Figure 3.12 for clarity.
The accumulator core, which consists of two 2-bit accumulators, a 2-bit register,
and the associated clock circuitry, consumes 3.008 W of power in simulation. Other
circuitry needed for the accumulator test circuit, such as the DAC and input buffers, uses
1.163 W. The ACCV1 accumulator test circuit uses a total of 4.171 W. A breakdown of
the simulated power consumption is shown in Table 3.2.

Circuit              Sub-Circuits         Power
Accumulator Core                          3.008 W
                     2-Bit Accumulator    829 mW (each of two)
                     2-Bit Register       550 mW
                     Clock Circuitry      800 mW
Support Circuitry                         1.163 W
                     Input Buffers        472 mW
                     Output Stage/DAC     600 mW
                     Bias Generators      91 mW
Total                                     4.171 W

Table 3.2: Simulated power consumption breakdown for the ACCV1 accumulator test
circuit.

Figure 3.11: Simulation of the four-level series-gated carry test circuit. The test circuit
uses the carry circuit as a divide by two circuit to estimate an upper bound for the
accumulator operating frequency. This simulation includes extracted parasitic capacitors
and is shown at the maximum operational simulated clock frequency of 55 GHz. (Plot of
output (V) versus time (ps).)

Figure 3.12: Simulation of the ACCV1 4-bit accumulator using four-level series-gated
logic. The plot shows the outputs of the four sum bits. The simulation includes extracted
parasitic capacitors and is shown at the maximum operation frequency of 43 GHz. (Plots
of S3, S2, S1, and S0 (V) versus time (ns).)

3.2.3.2 Measurement Results

The divide by two carry test circuit and the accumulator test circuit were fabricated
in both TC6 and TC7 design runs. They were tested using the test setup described
in Section 3.2.2. The test results are described below.

3.2.3.2.1 TC6 Measurement Results

The carry test circuit was found to operate up to a maximum clock frequency of 52 GHz.
The measured output signal of 26 GHz, generated from a 52 GHz clock, is shown in
Figure 3.13. This output was captured using a high-speed sampling oscilloscope. There is
some attenuation in the signal compared to the simulation result because of extra
parasitic capacitance that is added by the probes and cables that connect the test chip
to the test and measurement equipment. This result is only 3 GHz, or 5.5%, lower than the
maximum simulated clock frequency. This difference could be due to process variations or
inaccuracies in the models. The VIP-2 process was being modified during fabrication run
TC6, so the model accuracy is questionable.
It should be noted that circuit yield in this particular fabrication run was low,
so there were very few working test sites on-wafer for comparison to simulation data.
Shorts in the capacitors were a particular problem. A table of the yield for the divide
by two test circuit is shown in Table 3.3. Low yield in this fabrication run also resulted
in no working ACCV1 4-bit accumulator test circuits. Failure modes included shorted
capacitors, circuits that drew low current, circuits with correct current but no output,
and circuits with incorrect outputs. A table of the yield for the ACCV1 accumulator
test circuit is shown in Table 3.4. Since the working carry test circuits were close to the
simulation frequency, it is expected that ACCV1 would have performed near the 43 GHz
clock rate expected from simulation had the yield been higher.

Result                                          Number of Devices
Fail (Short)                                    5
Fail (Correct Power, No Output at 50 GHz)       1
Pass (Operational for 50 GHz Clock or Above)    6

Table 3.3: Yield for the four-level series-gated carry divide by two test circuit in
fabrication run TC6.

Result                               Number of Devices
Fail (Short)                         10
Fail (Low Current)                   11
Fail (Correct Current, No Output)    2
Fail (Incorrect Output)              6
Pass                                 0

Table 3.4: Yield for the ACCV1 accumulator test circuit in fabrication run TC6.


Figure 3.13: High-speed sampling oscilloscope screen capture of the 26 GHz output
from the four-level series-gated carry test circuit operating as a divide by two circuit
with a 52 GHz clock. This output is attenuated compared to the simulation results
because of additional parasitic capacitance from the probes and cables that connect the
test chip to the test equipment.


Result          Divide by Two    ACCV1
Fail (Short)    1                6
Fail (Other)    3                8
Pass            6                3

Table 3.5: Yield for the four-level series-gated carry divide by two and ACCV1
accumulator test circuits in fabrication run TC7. A device is considered to pass if it operates
correctly above a 24 GHz clock rate.

3.2.3.2.2 TC7 Measurement Results

The four-level series-gated carry divide by two test circuit and the ACCV1 test circuit
were also fabricated as part of fabrication run TC7. They were tested in the same manner
used for the TC6 fabrication run. The divide by two carry circuit operated up to a
maximum clock frequency of 51 GHz, which is comparable to the 52 GHz maximum measured
from TC6. This is still slower than the 55 GHz expected from simulation. Three ACCV1
devices were operational in the TC7 fabrication run. The fastest ACCV1 test circuit
operated up to a maximum clock frequency of 38 GHz and was measured to consume 3.8 W of
power. This is slower than the 43 GHz expected from simulation; however, other test
circuits were also found to operate slower than expected on this fabrication run. Output
waveforms were not captured for either test circuit. The yield for both the divide by two
and ACCV1 test circuits is shown in Table 3.5. Capacitor shorts were again a problem in
this fabrication run, particularly in the ACCV1 test circuit, which had more and larger
capacitors than the divide by two test circuit. A device is considered to pass if it
operates correctly above a 24 GHz clock rate.
The measured power for ACCV1 was 3.8 W, including all of the circuitry such
as the DAC. This is lower than the simulated power consumption of 4.171 W.
Discrepancies between the models and the fabricated devices could explain this difference.
The internal components of ACCV1 do not have separate power supplies, so the power
breakdown must be estimated. Using the ratios established in Table 3.2, these estimates
can be determined. The total consumption of the 4-bit accumulator is 2.76 W. The 4-bit
accumulator has two 2-bit adders consuming 0.76 W of power each, a 2-bit register
consuming 0.51 W of power, and clock tree circuitry that consumes 0.73 W of power. If
the clock tree is partitioned so that the adders and registers include the power from the
clock tree circuitry that they require, the 2-bit adders would use 1.00 W each and the 2-bit
register would use 0.76 W.
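
One simple way to form such estimates is to scale every simulated block power in
Table 3.2 by the ratio of measured to simulated total power. The Python sketch below
assumes exactly this proportional scaling, which may differ slightly from the procedure
actually used; the dictionary keys and function names are hypothetical.

```python
SIMULATED_MW = {                 # simulated block powers from Table 3.2
    "2-bit adder (each of two)": 829,
    "2-bit register": 550,
    "clock circuitry": 800,
}
SIMULATED_TOTAL_MW = 4171        # simulated total for the ACCV1 test circuit
MEASURED_TOTAL_W = 3.8           # measured total for the ACCV1 test circuit


def scaled_breakdown():
    """Scale each simulated block power by the measured/simulated total ratio (W)."""
    ratio = MEASURED_TOTAL_W * 1000 / SIMULATED_TOTAL_MW
    return {name: mw * ratio / 1000 for name, mw in SIMULATED_MW.items()}


def core_total(breakdown):
    """Accumulator core power: two adders, one register, and the clock circuitry."""
    return (2 * breakdown["2-bit adder (each of two)"]
            + breakdown["2-bit register"]
            + breakdown["clock circuitry"])
```

Under this assumption the scaled values come out at roughly 0.755 W per adder, 0.501 W
for the register, and 0.729 W for the clock circuitry, which agree with the estimates
quoted above to within about 0.02 W.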

3.2.4 Accumulator ACCV2

The use of four-level series-gated logic in ACCV1 described in Section 3.2.3
allows for high-speed operation, but it is achieved at the expense of high power
consumption. The power supply voltage must be high enough to support the output voltage
swing, three diode drops from the logic, one diode drop from the clock, a diode drop
from the current source, and the voltage drop across the current source degeneration
resistor. In this technology, this requires a -5.5 V supply voltage. In some circuitry, only
two total diode drops are actually required for operation. As a result, power is wasted
when these circuits are connected to a high supply voltage. If even one diode drop could
be removed from the power supply overhead, then substantial power savings could be
realized. The latch and register circuits are examples of circuits that require only two
diode drops. In high-speed pipelined accumulators, the latch and register circuits
comprise a significant portion of the total circuitry. To move towards an accumulator with
a lower supply voltage, the four-level series-gated carry circuit shown in Figure 3.7 is
replaced with the single-level parallel-gated carry circuit [50] shown in Figure 3.14. A
patent application was filed for the single-level parallel-gated carry circuit [60].

Figure 3.14: The single-level parallel-gated carry circuit with cascaded latch.

Single-level parallel-gated logic is well suited for the carry circuit because it
performs a majority operation, essentially detecting when two or three of the inputs are
high. When all three inputs are either high or low, a full differential is seen across Xp
and Xn, since all of the current is steered through one leg of the circuit. When one or
two of the inputs are high, the differential across Xp and Xn is reduced, since 1/3 of the
current is steered through one leg of the circuit while 2/3 of the current is steered through
the other leg of the circuit. Although this method has a reduced differential across Xp
and Xn for some input states, this signal is sampled by the latch and a full differential
is generated for propagation to subsequent circuitry. Figure 3.15 shows a simulation of
the single-level parallel-gated carry circuit schematic illustrating the output voltage at
Xp and Xn of the carry for possible input combinations, as well as the full differential
output after the first stage of the latch.
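
The current steering argument can be captured in a small idealized model. The sketch
below assumes three equal current sources, ideal switching, and a normalized full swing
of 1; the node polarity (which leg is pulled low by the true inputs) and the function
names are assumptions made for illustration only.

```python
def carry_nodes(a, b, c, swing=1.0):
    """Return idealized (Xp, Xn) voltages relative to the top rail."""
    high = a + b + c
    # Each high input steers one third of the current through the Xn load,
    # and each low input steers one third through the Xp load.
    xn = -swing * high / 3
    xp = -swing * (3 - high) / 3
    return xp, xn


def latched_carry(a, b, c):
    """The latch regenerates a full logic level from the sign of Xp - Xn."""
    xp, xn = carry_nodes(a, b, c)
    return 1 if xp > xn else 0
```

In this model the differential is the full swing when zero or three inputs are high and
one third of the swing when one or two inputs are high, while its sign always matches
the majority function of Equation (3.1).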

Figure 3.15: Simulation of the output of the single-level parallel-gated carry circuit.
The upper plot shows the areas with reduced differential for the Xp and Xn outputs for
the states where 0, 1, 2, and 3 inputs are high. The lower plot illustrates how the full
differential is regained by buffering from the latch. (Upper plot: Xp (solid) and Xn
(dashed) in V; lower plot: Zp in V; both versus time in ns.)

Like ACCV1 described in Section 3.2.3, ACCV2 uses the four-level series-gated
sum circuit shown in Figure 3.8. As a result, the supply voltage cannot be reduced in
ACCV2. In this implementation, the single-level parallel-gated carry circuit uses more
current than the four-level series-gated carry circuit, so overall power consumption in
ACCV2 is actually slightly higher than in the previous design. However, ACCV2 provides
an intermediate design point where only the carry circuit is changed compared to
ACCV1. The single-level parallel-gated carry circuit serves as a proof of concept for
reducing power in later designs. If the sum circuit that is implemented in ACCV2 is
replaced by a circuit using fewer levels, then the accumulator can have a lower supply
voltage and operate at a lower power. The power reduction associated with the removal
of one diode drop from the supply voltage is expected to be around 15%. Accumulators
with alternative sum circuit designs that allow for lower power supply voltages and
reduced power consumption are discussed in later sections.
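
The link between supply voltage and power can be made explicit with a one-line
calculation. The sketch below assumes that the bias currents stay fixed, so that power
scales linearly with the supply voltage magnitude; the function names are hypothetical,
and the implied voltage is a derived figure rather than a value stated in this thesis.

```python
def saving_for_supply_reduction(v_supply, delta_v):
    """Fractional power saving at fixed current when the supply drops by delta_v."""
    return delta_v / v_supply


def supply_reduction_for_saving(v_supply, saving):
    """Supply reduction (V) that gives a fractional power saving at fixed current."""
    return v_supply * saving
```

Under this assumption, a 15% saving on a 5.5 V supply corresponds to lowering the supply
magnitude by roughly 0.8 V.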

3.2.4.1 Simulation Results

As in Section 3.2.3, a carry test circuit and an accumulator test circuit are both
simulated. The single-level parallel-gated carry test circuit simulates as a divide by two
circuit at a maximum clock frequency of 52 GHz. This simulation output is shown in
Figure 3.16. The result is slower than the simulation from the four-level series-gated
design by only 3 GHz, or 5.5%.
The ACCV2 4-bit accumulator test circuit simulates up to a maximum clock
frequency of 46 GHz. A plot of the four sum output bits from the accumulator at the
maximum frequency of 46 GHz is shown in Figure 3.17. Overall, the accumulator
simulated 3 GHz faster than the previous design. This is unexpected, since the carry
test circuit is slower and the carry should be the limiting factor on high-speed operation.
Even though ACCV1 and ACCV2 were very similar designs, subtle layout differences
between the two carry designs may have led to enough variation in parasitic
capacitance to account for this difference.
The accumulator core, which consists of two 2-bit accumulators, a 2-bit register,
and the associated clock circuitry, uses 3.320 W in simulation. Other circuitry needed
for the accumulator test circuit, such as the DAC and input buffers, uses 1.163 W. The
ACCV2 accumulator test circuit uses a total of 4.483 W. This is higher than the 4.171 W
simulated power consumption of ACCV1, since the single-level parallel-gated carry
circuit uses more current sources and the same supply voltage. However, as stated earlier,
ACCV2 provides a proof of concept for the single-level parallel-gated carry circuit, and
it is not intended to be a lower power stand-alone accumulator. A breakdown of the
simulated power consumption for the ACCV2 accumulator is shown in Table 3.6.

Figure 3.16: Simulation of the single-level parallel-gated carry test circuit. The test
circuit uses the carry circuit as a divide by two circuit to provide an estimate of the
upper bound of the accumulator operating frequency. This simulation includes extracted
parasitic capacitors and is shown at the maximum operation frequency of 52 GHz. The
simulation shows the output at the output pad of the extracted test chip layout, so the
extracted parasitic capacitances affect the waveform shape. (Plot of output (V) versus
time (ps).)

Figure 3.17: Simulation of ACCV2 showing the outputs of the four sum bits. The
simulation includes extracted parasitic capacitors and is shown at the maximum operation
frequency of 46 GHz. (Plots of S3, S2, S1, and S0 (V) versus time (ns).)

Circuit              Sub-Circuits         Power
Accumulator Core                          3.320 W
                     2-Bit Accumulator    985 mW (each of two)
                     2-Bit Register       550 mW
                     Clock Circuitry      800 mW
Support Circuitry                         1.163 W
                     Input Buffers        472 mW
                     Output Stage/DAC     600 mW
                     Bias Generators      91 mW
Total                                     4.483 W

Table 3.6: Simulated power consumption breakdown for the ACCV2 accumulator test
circuit.

3.2.4.2 Measurement Results

The circuits were fabricated as part of the TC6 fabrication run. Both the carry and
accumulator test circuits were tested using the same test setup as the ACCV1 test
circuits. These test setups are described in further detail in Section 3.2.2.
A microphotograph of the ACCV2 divide by two carry test chip is shown in
Figure 3.18. The carry test circuit was measured to operate up to a maximum output
frequency of 27.5 GHz with a clock frequency of 55 GHz, as shown in Figure 3.19.
This is 3 GHz faster than the simulated maximum frequency of 52 GHz. Yield for the
single-level parallel-gated carry test circuit is shown in Table 3.7.
A microphotograph of the ACCV2 4-bit accumulator test chip is shown in Figure 3.20. The sum bits from the accumulator could not be directly measured; however,
the output of the DAC demonstrates proper operation of the accumulator. Two examples


Figure 3.18: Microphotograph of the ACCV2 single-level parallel-gated carry test chip.
The chip is 1220 µm by 1025 µm.

Figure 3.19: Oscilloscope screen capture of the carry test circuit output at 27.5 GHz
with a 55 GHz clock frequency.


Result                                          Number of Devices
Fail (Short)                                    4
Fail (Correct Power, No Output at 50 GHz)       3
Pass (Operational for 50 GHz Clock or Above)    5

Table 3.7: Yield for the single-level parallel-gated carry divide by two test circuit in fabrication run TC6.
Result                               Number of Devices
Fail (Short)                         8
Fail (Low Current)                   3
Fail (Correct Current, No Output)    1
Fail (Incorrect Output)              1
Pass                                 1

Table 3.8: Yield for the ACCV2 accumulator test circuit in fabrication run TC6.

of measured accumulator output sequences are shown in Figure 3.21 and Figure 3.22.
In the first example, an input increment of 7 is used to create the digital sequence of 15,
6, 13, 4, . . . In this configuration, the 16 discrete DAC output voltage levels are clearly
illustrated in Figure 3.21, while operating with a clock frequency of 41 GHz. In the second example, the accumulator is configured as a divide by two circuit by using an input
increment of 8. In this case, the most significant bit of the 4-bit digital value changes
every clock cycle, creating a 20.5 GHz output signal from a 41 GHz clock frequency, as
shown in Figure 3.22. The 41 GHz maximum operation frequency was lower than the
expected operating frequency of 46 GHz. However, yield was low for the ACCV2 test
circuit in this fabrication run (TC6), with only one operational test site out of 14 tested
sites. The yield for the ACCV2 test circuit is shown in Table 3.8. Therefore, there is not
enough information to determine if the performance difference was due to circuit design
or if the single operational accumulator was on a “slow” test site.
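The two measured sequences described above follow from modulo-16 accumulation. The sketch below is a behavioral model only, not a model of the circuit. The function name, the starting value of 8 (chosen so that the first output is 15, as in the text), and the cycle counts are illustrative assumptions.

```python
def accumulator_sequence(increment, bits=4, start=0, cycles=8):
    """Return successive register values of an ideal `bits`-wide accumulator."""
    modulus = 1 << bits            # a 4-bit accumulator wraps at 16
    value = start
    values = []
    for _ in range(cycles):
        value = (value + increment) % modulus
        values.append(value)
    return values


# Increment of 7: a starting value of 8 is assumed so the first output is 15.
seq7 = accumulator_sequence(7, start=8, cycles=16)

# Increment of 8: the most significant bit toggles on every clock cycle.
seq8 = accumulator_sequence(8, start=0, cycles=4)
msb8 = [v >> 3 for v in seq8]

clock_ghz = 41.0
output_ghz = clock_ghz / 2         # one output period spans two clock cycles

print(seq7[:4], len(set(seq7)), msb8, output_ghz)
```

Because 7 and 16 share no common factor, the increment of 7 visits all 16 codes before repeating, which is why every DAC level appears in the measured waveform.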
The measured power for ACCV2 was 4.1 W, including all of the circuitry such
as the DAC. This is lower than the simulated power consumption of 4.483 W. Discrepancies between the models and the fabricated devices could explain this difference.


Figure 3.20: Microphotograph of the ACCV2 test circuit. The chip is 1725 µm by
1025 µm.

Figure 3.21: Oscilloscope screen capture of the DAC output of the ACCV2 4-bit accumulator test circuit with 41 GHz clock frequency and input increment of 7. The digital output
sequence is labelled on the waveform.


Figure 3.22: Oscilloscope screen capture of the DAC output of the ACCV2 4-bit accumulator test circuit with 41 GHz clock frequency and input increment of 8 acting as a
divide by two circuit with a 20.5 GHz output.

The internal components of ACCV2 do not have separate power supplies, so the power
breakdown must be estimated. Using the ratios established in Table 3.6, these estimates
can be determined. The 4-bit accumulator is estimated to consume 3.04 W, with two
2-bit adders at 0.90 W each, a 2-bit register at 0.51 W, and clock tree circuitry at 0.73 W.
If the clock tree is divided so that the adders and registers include the power from the
clock tree circuitry that they require, the 2-bit adders would use 1.14 W and the 2-bit
registers would use 0.76 W. The 41 GHz ACCV2 design is reported in [50].
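The estimate can be reproduced by scaling the simulated breakdown of Table 3.6 by the ratio of measured to simulated total power. The sketch below assumes that every block scales by the same factor, which is one reading of "using the ratios"; the variable names are illustrative.

```python
# Simulated ACCV2 power from Table 3.6, in watts.
simulated = {
    "2-bit adder (each of two)": 0.985,
    "2-bit register": 0.550,
    "clock circuitry": 0.800,
}
simulated_core = 3.320      # accumulator core
simulated_total = 4.483     # whole test chip, including DAC and buffers
measured_total = 4.1        # measured power for the whole test chip

# Assumption: every block scales by the same measured/simulated ratio.
scale = measured_total / simulated_total

estimated_core = simulated_core * scale
estimated = {name: power * scale for name, power in simulated.items()}

print(f"scale factor: {scale:.3f}")
print(f"accumulator core: {estimated_core:.2f} W")
for name, power in estimated.items():
    print(f"{name}: {power:.2f} W")
```

Under this assumption the scaled values agree with the estimates quoted in the text to within about 0.01 W.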

3.2.5 Accumulator ACCV3

The single-level parallel-gated carry circuit in Section 3.2.4 takes a step towards
reduced power consumption since it can use a lower power supply voltage than is required for designs using four-level series-gated logic. However, a lower supply voltage
cannot be used for the ACCV2 circuit since the supply voltage is still constrained by
the four-level series-gated sum circuit. Alternative sum circuits with fewer series gates


[Figure 3.23 schematic: sum logic stage followed by a latch stage; signals Ap/An, Bp/Bn, Cp/Cn, Xp/Xn, clkp/clkn, and sump/sumn.]

Figure 3.23: Three-level series-gated sum circuit.

would allow for a power supply voltage reduction when used in conjunction with the
single-level parallel-gated carry circuit. One such alternative is the three-level series-gated sum circuit [61] shown in Figure 3.23 and incorporated in accumulator ACCV3.
While this sum circuit topology still has logic merged with a latch, it uses an
additional stage of logic, so that one of the diode drops can be eliminated. In general,
the addition of an extra stage of logic leads to a longer propagation delay and a lower
maximum operating frequency. However, in the accumulator the relation of the inputs
to the sum circuit allows for operation at roughly the same speed as the four-level series-gated sum circuit. In the first stage of the sum circuit, the ‘A’ (accumulation increment)
inputs change only at low frequency, and the ‘B’ inputs change only when ‘clkn’ is
active. In this configuration, the ‘X’ outputs of the first stage are settled before the
logic cascoded with ‘C’ and ‘clkp’ in the front-end of the second stage becomes active.
As a result, the added stage has no impact on propagation delay. The second stage of
the sum has only three cascaded levels as opposed to four in the previous design, so
it should actually have a slightly shorter overall propagation delay. While the new sum

Figure 3.24: Layout view of the ACCV3 4-bit accumulator test chip. The test chip
includes DAC output circuitry and is 1725 µm by 1025 µm. The 4-bit accumulator
occupies an area of 510 µm by 575 µm.

circuit is faster than in the previous designs, the single-level parallel-gated carry circuit
dominates the critical timing path. The net effect is that both designs have roughly the
same maximum clock frequency, but the power is reduced in ACCV3.
The ACCV3 4-bit accumulator has a similar layout topology to the ACCV1 and
ACCV2 designs in Section 3.2.3 and Section 3.2.4, except that the accumulator circuit
is rotated by 90 degrees. The test circuit also uses a redesigned DAC circuit that has
improved linearity. The test circuit layout spreads out the transistors in order to minimize the thermal density of the design. A layout view of the ACCV3 test circuit is shown
in Figure 3.24. The test chip includes all of the bond pads and the DAC circuitry. The
4-bit accumulator occupies an area of 510 µm by 575 µm. The layout view is shown
because a microphotograph of ACCV3 was not captured.


Result          Number of Devices
Fail (Short)    8
Fail (Other)    25
Pass            4

Table 3.9: Yield for the ACCV3 accumulator test circuit in fabrication run TC7. A
device is considered to pass if it operates correctly above a 24 GHz clock rate.

3.2.5.1 Simulation Results

The ACCV3 accumulator is simulated to operate up to a 40 GHz clock frequency, and the simulation output is shown in Figure 3.25. The three-level series-gated
sum circuit allows for a reduction in the power supply voltage and a significant reduction in power consumption. Some of the design data concerning the ACCV3 design was
lost after tapeout, so a full reporting of a simulated power breakdown for the chip is not
available. However, measured power data is given in the next section.

3.2.5.2 Measurement Results

The ACCV3 accumulator test circuit was tested on-wafer using the test
setup described in Section 3.2.2. In this fabrication run (TC7), the circuit yield was poor,
and there were only four operational ACCV3 accumulators out of 37 tested, as shown in
Table 3.9. Similarly, the yield was low for other baseline designs that were included on
the wafer in this fabrication run. Approximately one-fourth of the non-yielding circuits
were due to electrical shorts in defective capacitors. A device is considered to pass if it
operates correctly above a 24 GHz clock rate.
The maximum operating clock frequency for the ACCV3 accumulators was
34 GHz, or 15% slower than expected from simulation. Other baseline designs on the
test wafer performed slower by a similar factor. Although the available data is limited
by the poor yield, it is likely that this particular wafer and fabrication run were near a
“slow” process corner. Output waveforms for ACCV3 were not captured.


[Figure 3.25 plot: S3, S2, S1, and S0 (V) versus time (ns).]

Figure 3.25: Simulation of the ACCV3 4-bit accumulator using the single-level parallel-gated carry circuit and the three-level series-gated sum. This simulation includes parasitic extracted capacitors and is shown at the maximum operation frequency of 40 GHz.


The power breakdown for the accumulator components is estimated from a combination of simulation and measured data. The ACCV3 4-bit accumulator consumes a
total of 1.97 W of power. The two 2-bit adders each consume 0.59 W of power, the
2-bit register consumes 0.31 W of power, and the clock tree circuitry consumes 0.48 W
of power. If the clock tree is divided so that the adders and registers include the power
from the clock tree circuitry that they require, the 2-bit adders would use 0.75 W and
the 2-bit registers would use 0.47 W.
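The adjusted figures are consistent with dividing the clock tree power equally among the two adders and the register. That equal split is inferred from the numbers and is not stated in the text, so the sketch below should be read as an assumption; the function and parameter names are illustrative.

```python
def fold_in_clock_power(adder_w, register_w, clock_w, n_adders=2, n_registers=1):
    """Add an equal share of the clock tree power to each adder and register."""
    share = clock_w / (n_adders + n_registers)   # assumed equal split
    return adder_w + share, register_w + share


# ACCV3 values quoted in the text, in watts.
adder_w, register_w = fold_in_clock_power(0.59, 0.31, 0.48)
total_w = 2 * 0.59 + 0.31 + 0.48

print(f"adder {adder_w:.2f} W, register {register_w:.2f} W, total {total_w:.2f} W")
```

With the ACCV3 inputs this gives the 0.75 W and 0.47 W figures, and the block powers sum to the 1.97 W total.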

3.2.6 Accumulator ACCV4

The ACCV4 accumulator uses an architecture nearly identical to that of the ACCV3 accumulator, with the exception of the clock tree circuitry. The two designs also use different
layout topologies. While the ACCV3 4-bit accumulator uses a layout with spread out
transistors to achieve low thermal density, the ACCV4 accumulator uses a more compact layout topology with reduced transistor-to-transistor spacing. While this increases
the thermal density, it reduces the length of the signal and clock interconnects and the
corresponding parasitic capacitances on these interconnects.
The ACCV4 accumulator also has additional clock buffer circuitry as a conservative approach to ensure that the latches have strong clock inputs. A layout view of
the ACCV4 test circuit is shown in Figure 3.26. While it is in the same padframe as the
ACCV3 test circuit, the ACCV4 4-bit accumulator circuitry (excluding metal fill and
other circuitry) occupies an area of 370 µm by 350 µm. This is about 44% of the
accumulator area of the ACCV3 design (510 µm by 575 µm), a reduction of roughly 56%. The layout view is shown because a
microphotograph of ACCV4 was not captured.


Figure 3.26: Layout view of the ACCV4 4-bit accumulator test chip. The test chip
includes DAC output circuitry and is 1725 µm by 1025 µm. The 4-bit accumulator
occupies an area of 370 µm by 350 µm.

3.2.6.1 Simulation Results

Simulation results indicate that the ACCV4 accumulator has a maximum clock
frequency of 43 GHz, as shown in Figure 3.27. The 3 GHz clock frequency improvement over ACCV3 is due to two factors. First, the additional clock buffer circuitry
improves the clock signal, which improves latch performance. Second, since the layout
is more compact, parasitic capacitance on the signal paths is reduced, which reduces the
propagation delay of the signals and increases the maximum clock frequency.
The three-level series-gated sum circuit allows for a reduction in the power supply voltage and a significant reduction in power consumption as was shown by ACCV3,
although the higher power clock circuitry in ACCV4 offsets some of the power savings.
As with ACCV3, a loss of design data prevents the reporting of a full power breakdown
for the chip, but measured power results are included in the next section.


[Figure 3.27 plot: S3, S2, S1, and S0 (V) versus time (ns).]

Figure 3.27: Simulation of the ACCV4 4-bit accumulator using the single-level parallel-gated carry circuit, three-level series-gated sum circuit, and additional clock buffer
circuitry. This simulation includes parasitic extracted capacitors and is shown at the
maximum operation frequency of 43 GHz.


Result          Number of Devices
Fail (Short)    6
Fail (Other)    9
Pass            1

Table 3.10: Yield for the ACCV4 accumulator test circuit in fabrication run TC7. A
device is considered to pass if it operates correctly above a 24 GHz clock rate.

3.2.6.2 Measurement Results

The ACCV4 accumulator test circuit was tested using the test setup described
in Section 3.2.2. This circuit was part of the TC7 fabrication run, which showed poor
circuit yield. The ACCV4 accumulator had worse yield than the other designs, with
only one operational site out of 16 tested. Defective capacitors accounted for one-third
of the non-yielding circuits in this design. The yield for ACCV4 in fabrication run TC7
is shown in Table 3.10. A device is considered to pass if it operates correctly above a
24 GHz clock rate.
The single operating ACCV4 accumulator operated at a 35 GHz clock frequency,
which is 18.6% slower than expected from simulation. This was comparable to other
baseline designs on the test wafer. As stated in Section 3.2.5.2, it is likely that this particular wafer and fabrication run were near a “slow” process corner. An output waveform
for ACCV4 was not captured.
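The percentages quoted for ACCV3 and ACCV4 can be checked from the simulated and measured maximum clock frequencies. The short sketch below does only that arithmetic; the helper name is illustrative.

```python
def shortfall_percent(simulated_ghz, measured_ghz):
    """Percentage by which the measured clock falls short of the simulated one."""
    return 100.0 * (simulated_ghz - measured_ghz) / simulated_ghz


# (simulated, measured) maximum clock frequencies in GHz, from the text.
results = {"ACCV3": (40.0, 34.0), "ACCV4": (43.0, 35.0)}

shortfalls = {name: shortfall_percent(sim, meas) for name, (sim, meas) in results.items()}
for name, pct in shortfalls.items():
    print(f"{name}: {pct:.1f}% slower than simulated")
```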
The power breakdown for the accumulator components is estimated from a combination of simulation and measured data. The 4-bit accumulator consumes a total of
2.54 W of power. The two 2-bit adders each consume 0.59 W of power, the 2-bit register
consumes 0.31 W of power, and the clock tree circuitry consumes 1.05 W of power. If
the clock tree is divided so that the adders and registers include the power from the clock
tree circuitry that they require, the 2-bit adders would use 0.94 W and the 2-bit registers
would use 0.66 W.


Design                                 Maximum Clock Frequency
T. Mathew et al. [27, 59]              19 GHz
Four-Level Series-Gated (ACCV1)        52 GHz
Single-Level Parallel-Gated (ACCV2)    55 GHz

Table 3.11: Comparison of divide by two carry test circuits. The four-level series-gated and single-level parallel-gated designs in this work exceed the performance of the
previous work by more than 2.7 times the clock frequency. Both divide by two
carry test circuit results are from fabrication run TC6.

3.2.7 Summary of High-Speed 4-bit Accumulators

The four 4-bit accumulator designs establish a comparison to previous work by
T. Mathew et al. [27, 59], then extend beyond the previous work using modified sum and
carry circuit designs. The new accumulators go beyond the previous work by achieving
fabricated 4-bit accumulator designs as opposed to accumulator components only. The
ACCV3 and ACCV4 designs show that power can be reduced while still achieving high
clock frequencies.
Compared to the previous four-level series-gated divide by two carry test circuit [27, 59] that operated up to a maximum clock frequency of 19 GHz, the four-level
series-gated and single-level parallel-gated divide by two carry test circuits in Section 3.2.3 and Section 3.2.4 are more than 2.7 times faster. A summary of the divide
by two carry test circuit results is shown in Table 3.11.
The comparison of the 4-bit accumulators is not straightforward. ACCV2 was
measured from fabrication run TC6, while the three other designs have measured results
from fabrication run TC7. By comparing other test structures on the TC7 wafer, it
was discovered that TC7 was a “slow” fabrication run. As a result, the comparison
of the 4-bit accumulator results is not strictly a comparison of design differences only,
since there are some process differences embedded in the results. The maximum clock
frequency results for ACCV1, ACCV3, and ACCV4 are likely to be lower than the
ACCV2 design partially because of the process variations, so the comparison is slightly


Design    Fabrication Run    Maximum Clock Frequency    Power Consumption    Speed/Power Ratio
ACCV1     TC7                38 GHz                     2.76 W               13.77 GHz/W
ACCV2     TC6                41 GHz                     3.04 W               13.49 GHz/W
ACCV3     TC7                34 GHz                     1.97 W               17.26 GHz/W
ACCV4     TC7                35 GHz                     2.54 W               13.78 GHz/W

Table 3.12: Comparison of 4-bit accumulator circuits. The differences in circuit performance are partially due to design differences and partially due to process variations.
Results from test structures indicate that the TC7 fabrication run was slower than the
TC6 fabrication run.

flawed. A comparison of the accumulator results is given in Table 3.12. Note that since
the previous work by T. Mathew et al. [27, 59] did not construct a working accumulator,
it cannot be compared to the 4-bit accumulator designs in this thesis.
Even without compensating for the “slow” fabrication run, it is clear that it is
possible to reduce power consumption, yet still maintain high-speed operation by moving from a four-level series-gated architecture to an architecture that uses fewer levels.
Comparing the results for ACCV2 and ACCV3 from Table 3.12, a 35% reduction in
power consumption is achieved with only a 17% reduction in speed. Had the process variation in TC7 been more favorable, the speed difference would have been even
smaller. Even comparing ACCV2 to ACCV4, the ratio of the reduction in power consumption to
the reduction in speed is still favorable, with a 16% reduction in power consumption
and a 15% reduction in speed. It is interesting to note that while ACCV3 had the best
speed/power ratio, the other three designs had nearly identical speed/power ratios. The
ACCV4 design shows that the more compact layout works, and does not suffer from the
increase in thermal density. This is important, because compact layouts are necessary
for extending the accumulator design to larger bit-widths for use in DDS circuits. This
more compact layout topology is used in all the subsequent larger designs in this thesis.
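The speed/power ratios and the relative reductions discussed above can be recomputed from the frequency and power columns of Table 3.12. The sketch below is only a check of that arithmetic; the dictionary layout and helper name are illustrative.

```python
# (maximum clock frequency in GHz, power in W) from Table 3.12
designs = {
    "ACCV1": (38.0, 2.76),
    "ACCV2": (41.0, 3.04),
    "ACCV3": (34.0, 1.97),
    "ACCV4": (35.0, 2.54),
}


def reduction_percent(reference, value):
    """Percentage reduction of `value` relative to `reference`."""
    return 100.0 * (reference - value) / reference


ratios = {name: f / p for name, (f, p) in designs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} GHz/W")

ref_f, ref_p = designs["ACCV2"]
for name in ("ACCV3", "ACCV4"):
    f, p = designs[name]
    print(f"{name} vs ACCV2: power -{reduction_percent(ref_p, p):.1f}%, "
          f"speed -{reduction_percent(ref_f, f):.1f}%")
```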


3.3 Low Power 8-bit Accumulators

All of the accumulators discussed in Section 3.2 (ACCV1 through ACCV4) were
designed with maximum clock frequency as the most important design metric. The high
operating frequencies of those accumulators come at the cost of high power consumption. Even with the modifications to reduce power in ACCV3 and ACCV4, the total
power consumption is still quite high. At best, it is nearly 2 W for a 4-bit accumulator.
Larger bit-width accumulators that use even more power are needed for DDS designs.
This section focuses on low power 8-bit accumulators that are used in DDS circuits.
Low power is achieved by reducing the clock rate of the accumulators. With a
target clock rate of around 30 GHz for X-band and Ku-band DDS applications, the extra
power needed to achieve accumulators with 40 GHz clock rates is unnecessary for the
DDS. The ACCV5 accumulator discussed in Section 3.3.1 has a clock rate of 32 GHz.
It reduces the power consumption by reducing the maximum operating speed, and it
also extends its bit-width to 8 bits for use in a DDS. The 8-bit ACCV6 accumulator
reduces power consumption even further by operating at an even lower 13 GHz clock
rate. While this clock rate is below the target of 30 GHz, it is useful for showing the effect that a
reduced frequency has on power consumption and the other accumulator architectures that
it makes possible. The DDS circuits that these 8-bit accumulators are integrated into will be
discussed in further detail in Chapter 4.

3.3.1 Accumulator ACCV5

In order to reduce the power consumption in the accumulator, it is important to
consider that much of the accumulator circuitry, particularly latches, requires a power
supply that supports two diode drops plus the overhead for voltage swing and the current
source. The three-level series-gated sum circuit, however, requires a supply that supports
three diode drops plus the overhead for voltage swing and the current source. Thus, it
constrains the voltage supply to a voltage higher than necessary for the latches, and it

[Figure 3.28 schematic: sum logic stage followed by a separate latch stage; signals Ap/An, Bp/Bn, Cp/Cn, Xp/Xn, clkp/clkn, and sump/sumn.]

Figure 3.28: Two-level parallel-gated sum circuit and separate latch circuit.

does not allow for further power reduction. By separating the logic gate and the latch, the
three-level series-gated sum circuit becomes a two-level parallel-gated sum circuit, as
shown in Figure 3.28. This sum circuit is simply two XOR gates followed by a latch. As
was the case in ACCV3 and ACCV4 (described in Section 3.2.5 and Section 3.2.6), the
first XOR gate is driven by the accumulation increment and the sum from the previous
state, so its Xp/Xn output settles shortly after the clock transition. Therefore, the overall
propagation delay of the sum circuit is dominated by the propagation delay of the second
XOR gate, which is dependent on the carry input. By using the two-level parallel-gated
sum circuit, one of the diode drops can be removed from the power supply requirement,
reducing it from −4.6 V to −3.8 V. Since the carry circuit is already parallel-gated, it
remains unchanged in the ACCV5 accumulator.
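The logic function of the two cascaded XOR gates can be written down directly. The sketch below is a bit-level model, not a model of the differential circuit; the single-ended variable names a, b, c, and x stand in for the Ap/An, Bp/Bn, Cp/Cn, and Xp/Xn pairs and are an illustrative simplification.

```python
from itertools import product


def sum_bit(a, b, c):
    """Sum output formed by two cascaded XOR gates."""
    x = a ^ b        # first XOR: increment bit and previous sum bit
    return x ^ c     # second XOR: intermediate X and carry input


# Check against the arithmetic definition of a full-adder sum bit.
matches = all(sum_bit(a, b, c) == (a + b + c) % 2
              for a, b, c in product((0, 1), repeat=3))
print(matches)
```

Only the second XOR waits on the carry input, which is the reason the text gives for the second gate dominating the sum delay.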
While low power accumulators are desired for DDS designs, accumulators with
larger bit-widths are also necessary. Unlike the aforementioned 4-bit accumulators, the
ACCV5 accumulator is extended to 8 bits. Using 2-bit accumulator and 2-bit register
blocks, the 8-bit accumulator has the topology shown in Figure 3.29. The dotted boxes
in Figure 3.29 partition the 8-bit accumulator into 4-bit accumulator and 4-bit register


blocks. The 8-bit accumulator is essentially an extension of the pipelined [58] 4-bit
accumulator shown in Figure 3.9.

3.3.1.1 Simulation Results

The overall propagation delay of the sum circuit is dependent on the carry circuit. In simulation, this propagation delay was 6.35 ps. Coincidentally, the carry circuit
also had a simulated propagation delay of 6.35 ps. The latch circuit has a simulated
propagation delay of 8.37 ps. The propagation delays were measured from sum, carry,
and latch circuits integrated in an accumulator circuit, so the loading and extracted parasitics are accounted for. Since the simulated propagation delays for the sum and carry
circuits are equal, the critical path of the accumulator is either through one carry gate
(6.35 ps) plus one sum gate (6.35 ps) and two latches (2 × 8.37 ps), or through two carry
gates (2 × 6.35 ps) and two latches (2 × 8.37 ps). Using the total propagation delay of either critical path, the propagation delay of the accumulator is expected to be 29.44 ps.
The critical path propagation delay sets the shortest clock period, and therefore the maximum
operating frequency. In this case, the maximum estimated operating frequency of the accumulator is approximately 34 GHz. This is determined from the worst case transitions.
It should be noted that skew in the data paths prevents the design from improving the
speed by tuning the clock delay. A simulation of the ACCV5 8-bit accumulator at the
maximum 34 GHz clock frequency is shown in Figure 3.30. The analysis of the accumulator speed is slightly optimistic. While it includes loading and parasitic capacitances
internal to the accumulator, it ignores additional loading and parasitic capacitance from
circuitry that the accumulator would drive in practice.
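The critical-path estimate above is a sum of gate delays inverted to give a clock frequency. The sketch below repeats that arithmetic with the simulated delays quoted in the text; the constant and variable names are illustrative.

```python
# Simulated propagation delays from the text, in picoseconds.
T_SUM_PS = 6.35
T_CARRY_PS = 6.35
T_LATCH_PS = 8.37

# The two candidate critical paths described in the text.
path_carry_sum = T_CARRY_PS + T_SUM_PS + 2 * T_LATCH_PS
path_two_carry = 2 * T_CARRY_PS + 2 * T_LATCH_PS

critical_ps = max(path_carry_sum, path_two_carry)
f_max_ghz = 1000.0 / critical_ps     # a period in ps gives 1000/period in GHz

print(f"critical path: {critical_ps:.2f} ps, maximum clock: {f_max_ghz:.1f} GHz")
```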
Unlike the previously discussed designs, the sum and carry circuits in ACCV5
are not cascaded with latches. While this reduces the maximum clock frequency, it
allows for the elimination of two diode drops from the voltage supply compared to the
ACCV1 and ACCV2 designs in Section 3.2.3 and Section 3.2.4 and for the elimination


[Figure 3.29 block diagram: four 2-bit adders with inputs A(1:0) through A(7:6), carries C(0), C(2), C(4), C(6), and C(8), outputs S(1:0) through S(7:6), and clocked 2-bit registers.]

Figure 3.29: Block diagram of the pipelined 8-bit accumulator using 2-bit adders and
2-bit registers. The dotted boxes partition the accumulator into 4-bit accumulator and
4-bit register blocks.


[Figure 3.30 plot: S7 through S0 (V) versus time (ns).]

Figure 3.30: Simulation of the ACCV5 8-bit accumulator. This simulation includes parasitic extracted capacitors and is shown at the maximum operation frequency of 34 GHz.
This simulation is optimistic, because it ignores the additional loading and parasitic capacitance from interconnects and circuitry that would be connected to the output of the
accumulator in practice.


of one diode drop from the voltage supply compared to the ACCV3 and ACCV4 designs
in Section 3.2.5 and Section 3.2.6.

3.3.1.2 Measurement Results

Unlike the previous accumulators, ACCV5 is not implemented inside an accumulator test circuit with a DAC output. Instead, it is implemented inside of a whole
DDS circuit. Because of this, no direct output plots of the accumulator are captured.
However, the results from the DDS testing described in Section 4.4 showed that the
accumulator portion of the circuit operates up to a 32 GHz clock frequency. This is
2 GHz lower than expected from simulation, but this is reasonable considering that the
simulated circuit only included the accumulator and did not factor in the DDS components connected to the accumulator output. These components add extra loading and
parasitic capacitance to the accumulator feedback interconnects, lowering the operating
frequency. The fabricated ACCV5 accumulator was measured up to a 32 GHz maximum
clock frequency, so it exceeded the 30 GHz clock frequency requirement for an X-band
DDS. Including the clock tree circuitry with the components that it drives, the ACCV5
8-bit accumulator uses 4.89 W, with two 4-bit accumulators at 1.84 W each and a 4-bit
register at 1.21 W. Comparing only the 4-bit accumulator portion, the ACCV5 design
reduces power by more than 10%. When the accumulators are extended to 8 bits, the
difference is even larger. A comparison of the accumulators is presented in Section 3.5.
The ACCV5 accumulator is reported as part of a DDS design in [51].

3.3.2 Accumulator ACCV6

The ACCV5 accumulator design in Section 3.3.1 illustrates that power can be
reduced by relaxing the clock frequency requirements. If the frequency requirements
are relaxed even further to 10 GHz, more architecture changes can be made to facilitate


power reduction. This is the goal of the ACCV6 accumulator. In the previously discussed accumulators, the register and latch circuits consume a large portion of the total
accumulator power because of the high degree of pipelining that is necessary to achieve
high frequency operation. At lower frequencies, less pipelining is needed and many of
the registers and latches can be eliminated. This reduces power not only by eliminating
register and latch circuits, but by eliminating the clock tree circuitry that is required by
the clock inputs of these circuits.
ACCV6 is an 8-bit accumulator and is shown in Figure 3.31. Instead of using a
latch after every logical operation, as in the previous designs, the ACCV6 accumulator
uses only one stage of pipelining for the full 8-bit accumulation operation. This architecture resembles a carry ripple adder, with the critical timing path through the string
of carry circuits. The full adder shown in Figure 3.32 is the basic building block for
this accumulator. It uses a single-level carry circuit with an emitter follower output, but
without an output latch as shown in Figure 3.33. Likewise, the sum circuit shown in
Figure 3.34 is based on the two-level sum circuit in Figure 3.28. It is integrated with
emitter followers so it can drive the pipeline register. Since the ACCV6 architecture
eliminates the 2-bit and 4-bit register blocks along with the latches associated with the
carry circuits inside of the accumulators, large power savings are achieved.
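A behavioral sketch of such a ripple-carry accumulator is given below. It uses the standard full-adder equations (XOR sum, majority carry), which are assumed here rather than taken from the schematics, and a carry input C(0) of zero; the function names are illustrative.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)   # majority function
    return s, c_out


def ripple_accumulate(increment, state, bits=8):
    """Add `increment` to `state`, rippling the carry from bit 0 upward."""
    carry = 0                                   # C(0) assumed to be zero
    next_state = 0
    for i in range(bits):
        a = (increment >> i) & 1                # increment bit A(i)
        b = (state >> i) & 1                    # fed-back sum bit S(i)
        s, carry = full_adder(a, b, carry)
        next_state |= s << i
    return next_state, carry                    # carry is C(8) for bits=8


# One register update per clock cycle, as with a single pipeline stage.
state = 0
for _ in range(5):
    state, carry_out = ripple_accumulate(100, state)
print(state, carry_out)
```

The loop over bit positions mirrors the string of carry circuits that forms the critical timing path.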

3.3.2.1 Simulation Results

In the ACCV6 accumulator, the timing is dominated by the propagation delays
through the carry circuits. The worst case propagation delay in the accumulator occurs
when the carry signal propagates through all of the carry circuits and causes the MSB
of the accumulator to change state. The worst case propagation delay that determines
the maximum clock frequency is the sum of the propagation delay through seven carry
circuits, one sum circuit, and one latch. The propagation delays of the circuits are
simulated from the layout of the accumulator with parasitic capacitances included. The


[Figure 3.31 schematic: eight clocked full adders with inputs A(0) to A(7), carries C(0) to C(8), and outputs S(0) to S(7).]

Figure 3.31: Schematic of the ACCV6 8-bit accumulator. It has an architecture similar
to a carry ripple adder.

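[Figure 3.32 schematic: carry block with inputs A(0), B(0), and C(0) and output C(1); sum block with inputs A(0), B(0), and C(0), followed by a clocked register with output S(0).]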

Figure 3.32: Schematic of the ACCV6 full adder block.


[Figure 3.33 schematic: carry logic with inputs Ap/An, Bp/Bn, and Cp/Cn, followed by emitter followers with outputs carryp/carryn.]

Figure 3.33: Schematic of the ACCV6 carry circuit. It is similar to the other single-level
parallel-gated carry circuits, except that it is not followed by a latch that recovers full
differential. Instead it has an emitter follower so that it can drive subsequent carry and
sum circuits.


[Figure 3.34 schematic: sum logic with inputs Ap/An, Bp/Bn, and Cp/Cn and intermediate signals Xp/Xn, followed by emitter followers with outputs sump/sumn.]

Figure 3.34: Schematic of the ACCV6 sum circuit. It is similar to other two-level
parallel-gated sum circuits, except that it has an emitter follower so that it can drive a
register circuit.


propagation delay of the carry circuit is 8.11 ps, for the sum circuit it is 7.58 ps, and
for the latch it is 10.41 ps. This represents a total worst case propagation delay of
74.76 ps, for a maximum clock frequency of 13.4 GHz. To determine the maximum
clock frequency of the accumulator with parasitic capacitances in simulation, the clock
is adjusted in 1 GHz increments. The ACCV6 8-bit accumulator simulated up to a
maximum clock frequency of 13 GHz. This simulation output is shown in Figure 3.35.
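The worst case delay quoted above follows from the three simulated gate delays. The sketch below repeats that arithmetic; the constant names are illustrative.

```python
# Simulated ACCV6 propagation delays from the text, in picoseconds.
T_CARRY_PS = 8.11
T_SUM_PS = 7.58
T_LATCH_PS = 10.41
N_CARRY = 7           # carry circuits on the worst case path

worst_ps = N_CARRY * T_CARRY_PS + T_SUM_PS + T_LATCH_PS
f_max_ghz = 1000.0 / worst_ps        # a period in ps gives 1000/period in GHz

print(f"worst case delay: {worst_ps:.2f} ps, maximum clock: {f_max_ghz:.1f} GHz")
```

Stepping the simulated clock in 1 GHz increments, as described in the text, would then land on 13 GHz as the highest passing setting below this estimate.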

3.3.2.2 Measurement Results

The ACCV6 accumulator was integrated as part of a DDS circuit described
further in Section 4.5, so no direct measurements from the accumulator are available.
However, the accumulator is expected to dominate the critical timing of the DDS. Since
the full DDS circuit worked up to a 13 GHz clock frequency, the maximum operating
frequency of the accumulator is expected to be 13 GHz. This correlates with the results
from simulation.
Including all of the clock tree circuitry, the ACCV6 8-bit accumulator consumes
2.13 W of power. Since this accumulator has a different architecture than the previous
designs and is not based on 2-bit adder blocks, there is no direct measurement for a
4-bit accumulator. For comparison purposes, half of the power of the 8-bit accumulator,
or 1.07 W, can be used as a power figure for the 4-bit accumulator. Since ACCV6
has a different architecture from the other designs, it is the only design for which this
method of estimation is valid. In the other designs, extra pipeline registers are needed to
extend a 4-bit accumulator design to an 8-bit accumulator design. Also, an extension of
the ACCV6 accumulator to larger bit-widths is based on an 8-bit accumulator building
block instead of 2-bit or 4-bit accumulator blocks. The ACCV6 accumulator is reported
as part of a DDS design in [62].


[Figure 3.35 plot: S7 through S0 (V) versus time (ns).]

Figure 3.35: Simulation of the ACCV6 8-bit accumulator. This simulation includes extracted parasitic capacitances and is shown at the maximum operating frequency of 13 GHz.

3.3.3

Summary of Low Power 8-bit Accumulators
While both ACCV5 and ACCV6, which are described in Section 3.3.1 and Section 3.3.2,
are 8-bit low power accumulators, they have very different characteristics.
ACCV5 uses an architecture that is more similar to the 4-bit accumulators of Section 3.2, based on 2-bit accumulator blocks. Unlike the 4-bit accumulators, however,
ACCV5 separates the logic and latch circuitry in order to reduce the voltage supply and
the corresponding power consumption. Since ACCV5 is extended to 8 bits, there are
more register circuits, so the supply voltage reduction has a more significant impact on
power consumption.
ACCV6 uses an architecture that is vastly different from the 4-bit accumulators
described in Section 3.2 and ACCV5. Like ACCV5, ACCV6 is an 8-bit accumulator
with the logic and latches separated, for a reduced supply voltage compared to the 4-bit
accumulator designs. Instead of using the 2-bit accumulator as the basic building block,
ACCV6 uses the full 8-bit accumulator as a basic building block. This eliminates
many of the pipeline registers that are inherent in the other designs. While this reduces
the maximum clock frequency, it also greatly reduces the power consumption.
A comparison of the ACCV5 and ACCV6 8-bit accumulators is shown in Table 3.13. The
speed/power ratios of the two designs are roughly equal (6.54 GHz/W and 6.10 GHz/W), so
for the 8-bit accumulators, speed and power scale by about the same amount. If these
designs were extended to larger bit-widths, the ACCV6 design would have a better
speed/power ratio than the ACCV5 design, because it would need fewer registers.

Design        Maximum Clock Frequency   Power Consumption   Speed/Power Ratio
ACCV5 [51]    32 GHz                    4.89 W              6.54 GHz/W
ACCV6 [62]    13 GHz                    2.13 W              6.10 GHz/W

Table 3.13: Comparison of 8-bit accumulator circuits. The ACCV6 design uses less
pipelining than the ACCV5 design, so it operates at a lower clock frequency, but it also
has lower power consumption. Overall, speed and power scale about the same when
comparing these two 8-bit designs.

3.4

Reduced Power Accumulator Experiments
As accumulator bit-widths are extended beyond 8 bits, further reductions in power
consumption may be necessary. Maintaining high-speed operation is also necessary, so
other techniques for power reduction must be explored. These other techniques include
triple-tail circuitry and resistor-only current sources. Both approaches allow for
lower supply voltages and a correspondingly lower power consumption.
In Section 3.4.1, the use of the triple-tail approach [56, 57] is examined. The
triple-tail approach allows for the reduction of the power supply voltage by an additional diode drop compared to the ACCV5 design described in Section 3.3.1. While a
promising approach, the triple-tail circuitry does not have the performance necessary to
maintain high-speed operation. Also, despite the reduction in the supply voltage, the
triple-tail approach requires more current sources and ends up consuming slightly more
power than other approaches. Since the results in simulation were poor, this design was
not fabricated.
In Section 3.4.2, the ACCV5 design described in Section 3.3.1 is modified to use
resistor-only current sources instead of the current mirrors described in Section 2.6. This
allows for a reduction in the power supply voltage by the full current source overhead,
which includes a diode drop and the degeneration resistor voltage drop. Using resistor-only current sources significantly reduces the power consumption while maintaining
high-speed operation, but it is potentially risky, since the current source output resistance
is reduced from 3.7 kΩ, as described in Section 2.7, to around 50 Ω. At the time of the
publication of this thesis, this design was in fabrication.

[Figure 3.36 schematic; labelled signals: latchp, latchn, Ap, clkn, clkp.]

Figure 3.36: Schematic of the triple-tail latch circuit.

3.4.1

Accumulator with Triple-Tail Circuitry
In an attempt to achieve speeds similar to ACCV5 while also reducing power consumption,
an accumulator using the triple-tail approach [56, 57] is examined. The
triple-tail approach aims to reduce power consumption by reducing the supply voltage.
By eliminating stacked gates, and moving the lower differential pair transistors up to the
upper differential pairs as a third tail, the supply voltage can be reduced by a diode drop
from -3.8 V to -3.0 V.
A schematic of a triple-tail latch circuit is shown in Figure 3.36. In the latch, the
clock differential pair is used as the third tail for the logic input differential pairs. For
proper switching, the logic and clock inputs must be offset by half of the differential
voltage. The logic levels for the triple-tail latch are illustrated in Figure 3.37. The
alignment of the voltage levels of the inputs is important, since the triple-tail circuits
will operate only if there is a half voltage swing difference between the inputs.

[Figure 3.37 level diagram; voltage levels: -0.90 V, -1.05 V, -1.20 V, -1.35 V; labelled signals: clkp, Ap, clkn, An.]

Figure 3.37: Voltage levels in the triple-tail latch. The voltage levels are compatible
with other triple-tail gates, where the clock signal is substituted for the logic signal on
the third tail.

Using the latch circuit as an example, when ‘clkn’ is high, the third tail ‘clkn’
transistor has a higher input voltage than either of the logic inputs on the left side of the
latch and a majority of the current goes through that transistor. On the right side of the
circuit, the ‘clkp’ transistor is in the logic low state, and although the voltage levels are
offset, it is still lower than the logic high of the A input. In effect, the clock transistor
clamps the current on the left side of the circuit, so only the right side of the circuit
determines the output.
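The current steering described above can be sketched with an idealized model in which the transistor with the highest base voltage among those sharing a tail takes the current. The level assignments below (clock high -0.90 V, logic high -1.05 V, clock low -1.20 V, logic low -1.35 V) are one reading of Figure 3.37 and should be treated as an assumption, as should the function and key names. Real devices split the current gradually rather than switching ideally.

```python
# Assumed voltage levels (V), read from Figure 3.37.
CLK_HIGH, LOGIC_HIGH, CLK_LOW, LOGIC_LOW = -0.90, -1.05, -1.20, -1.35


def conducting_transistor(base_voltages: dict) -> str:
    """Idealized steering: the highest base voltage wins the tail current."""
    return max(base_voltages, key=base_voltages.get)


def half_swing_offset_ok(tol: float = 1e-9) -> bool:
    """Check that clock and logic levels are offset by half the logic swing."""
    swing = LOGIC_HIGH - LOGIC_LOW
    return abs((CLK_HIGH - LOGIC_HIGH) - swing / 2) < tol


# Side whose third tail sees a high clock: the clock transistor clamps it.
clamped = conducting_transistor(
    {"clk": CLK_HIGH, "in_p": LOGIC_HIGH, "in_n": LOGIC_LOW})
# Side whose third tail sees a low clock: the logic-high input conducts.
active = conducting_transistor(
    {"clk": CLK_LOW, "in_p": LOGIC_HIGH, "in_n": LOGIC_LOW})
```

With these levels the clamped side always selects the clock transistor, while the active side follows whichever logic input is high, which is the behavior the latch relies on.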
The triple-tail sum circuit is modified from the ACCV5 sum circuit, and it is
shown in Figure 3.38. The move to a triple-tail approach requires two extra current
sources and two extra emitter follower circuits. The extra emitter followers are needed
to ensure that the triple-tail inputs are spaced properly.
The triple-tail carry circuit is shown in Figure 3.39. Compared to the ACCV5
carry circuit, emitter followers are added so that the voltage going into the latch circuit
is at the proper level. Since the carry has a non-standard voltage swing, there may be
some performance degradation when it is integrated with the latch circuit.

[Figure 3.38 schematic; labelled signals: Ap, An, Bp, Bn, Cp, Cn, Xp, Xn, sump, sumn.]

Figure 3.38: Schematic of the triple-tail sum circuit.

[Figure 3.39 schematic; labelled signals: Ap, An, Bp, Bn, Cp, Cn, carryp, carryn.]

Figure 3.39: Schematic of the triple-tail carry circuit.

The triple-tail circuits are compared to their ACCV5 counterparts by both speed
and power consumption in Table 3.14. The speed figures are determined by simulations
with the circuits configured as clock divide by two circuits. In the divide by two
simulations, the carry and sum circuits also contain latch circuits. The simulations are
run with schematics only, without any parasitic capacitances, for both ACCV5 and the
triple-tail design. Schematic simulations are used so that a layout of the ACCV6
triple-tail components is not needed. The ACCV5 components must also be simulated with
schematics only so that the results can be compared. In the power measurements, the
triple-tail circuits have a -3.0 V supply voltage, and the ACCV5 circuits have a -3.8 V
supply voltage.

Circuit   Triple-tail Speed   Triple-tail Power   ACCV5 Speed   ACCV5 Power
Latch     87 GHz              53.7 mW             91 GHz        53.8 mW
Sum       59 GHz              107 mW              58 GHz        34.0 mW
Carry     59 GHz              56.7 mW             74 GHz        38.0 mW

Table 3.14: Comparison of speed and power between triple-tail and ACCV5 accumulator
components. In the comparison, the components are configured as divide by two
circuits and the simulations are of the schematics without extracted parasitics.

From Table 3.14, there is little difference between the triple-tail and ACCV5 latch
circuits. The power difference is negligible, and the triple-tail circuit is 4.4% slower.
It is possible that the speed difference might be reduced in layout, because the clock
signal might drive the triple-tail circuits better since it only goes through a single emitter
follower, instead of the double emitter follower that is necessary to obtain the correct
voltage level in the ACCV5 circuit. The sum circuits have similar performance, but the
triple-tail circuit uses more than three times as much power. Although the supply voltage
is reduced from the ACCV5 circuits to the triple-tail circuits, the extra current sources
needed to achieve proper logic voltage levels in the sum circuit increase the total power.
The triple-tail carry circuit requires emitter followers that are not needed in the ACCV5
carry, so the power is increased. The triple-tail carry circuit is also much slower than
the ACCV5 carry. The carry circuit uses a non-standard voltage level as described in
Section 3.2.4, but the triple-tail circuits need to have close alignment between the inputs
for proper operation. The non-standard voltage level of the carry circuit does not match
ideally with the latch, so performance is reduced.
The triple-tail circuits end up requiring more power and they operate at a lower
speed than the ACCV5 circuits. Also, the alignment required by the gate inputs makes
the triple-tail circuitry risky to implement. Since the triple-tail circuits did not show any
improvement over the ACCV5 circuits, they were not implemented into a full accumulator design or fabricated.
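The relative figures quoted from Table 3.14 can be reproduced with a few lines of arithmetic. The sketch below uses only the table values. The dictionary layout and function name are illustrative choices, not part of the original work.

```python
# (speed in GHz, power in mW) for each component, from Table 3.14.
TRIPLE_TAIL = {"latch": (87, 53.7), "sum": (59, 107.0), "carry": (59, 56.7)}
ACCV5 = {"latch": (91, 53.8), "sum": (58, 34.0), "carry": (74, 38.0)}


def compare(circuit: str) -> tuple:
    """Return (slowdown in percent, power ratio) of triple-tail versus ACCV5."""
    tt_speed, tt_power = TRIPLE_TAIL[circuit]
    ref_speed, ref_power = ACCV5[circuit]
    slowdown_pct = 100 * (ref_speed - tt_speed) / ref_speed
    power_ratio = tt_power / ref_power
    return slowdown_pct, power_ratio


if __name__ == "__main__":
    for name in TRIPLE_TAIL:
        slowdown, ratio = compare(name)
        print(f"{name}: {slowdown:+.1f}% slower, {ratio:.2f}x power")
```

A negative slowdown means the triple-tail version is slightly faster, which is the case only for the sum circuit, and there it comes at roughly three times the power.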

3.4.2

Accumulator with Resistor-Only Current Sources
Before the discussion of the accumulator with resistor-only current sources can
begin, a baseline for comparison must first be established. In DDS circuits, large
bit-width accumulators allow for fine frequency resolution. However, accumulator outputs
are often truncated so that the full bit-width of the accumulator is not used for phase
conversion. When this approach is taken, registers can be eliminated from the
accumulator to save power. An example of this is a 12-bit accumulator that is truncated to a
9-bit output. This circuit is shown in Figure 3.40. The ACCV5 accumulator described
in Section 3.3.1 is extended to the architecture illustrated in Figure 3.40 for use as a
baseline for comparison. In simulation, this baseline accumulator operates up to a
maximum clock frequency of 34 GHz, and it consumes 8.58 W of power. Note that this
power figure also includes the clock tree circuitry.

[Figure 3.40 block diagram; labelled blocks: three 4-bit accumulators with inputs A(3:0), A(7:4), and A(11:8) and carries C(4) and C(8), two 1-bit registers, and one 4-bit register; outputs S(3), S(7:4), and S(11:8); all blocks clocked by clk.]

Figure 3.40: Schematic of a 12-bit accumulator truncated to 9 bits of output. Since the
accumulator output is truncated, registers are eliminated to save power.

The baseline for comparison is modified to use resistor-only current sources.
The basic circuits, such as the sum, carry, and latch, are modified from the circuits used
in Section 3.3.1, so that the same values of current are used for every circuit. This
provides a good comparison of the resistor-only current source design technique, since
the only change between the new design and the baseline is the current source.
While removing the current source should allow for a voltage supply reduction
of a diode drop, plus the degeneration resistor voltage drop, reducing the voltage supply
by this amount would result in very small resistor values for current sources. To allow
for more overhead, the supply voltage is reduced to 3.0 V. With this supply voltage,
the smallest current source resistors are 50 Ω. This new accumulator simulates up to a
maximum clock frequency of 33 GHz and consumes 6.06 W of power.
The use of the resistor-only current sources results in a 29.4% reduction in
power, with only a 2.9% reduction in maximum operating frequency. While the new design simulations show a large improvement over the baseline design, the use of resistor-only current sources may be risky because of a reduced current source impedance, and
could potentially have problems when fabricated. The original current sources described
in Section 2.6 had a much higher impedance of 3.7 kΩ compared to the worst case of
50 Ω for the resistor-only current sources. Variations in the voltage across the current
source resistor will have a much larger impact on the current than in the baseline that
uses current mirrors. There is a risk that a change in the voltage across the current source
resistor could reduce the current of a gate, so that it no longer has a sufficient differential
voltage to drive subsequent gates. Despite the potential risks, the resistor-only current
source accumulator will be fabricated. The measured results are expected to indicate
the viability of the design. At the time of the publication of this thesis, this accumulator
design was still in fabrication.
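The trade-off described above can be put into numbers with a short sketch. The percentage figures use the simulated values quoted in this section. The sensitivity estimate treats each quoted impedance as a simple linear output resistance and applies a hypothetical 10 mV disturbance, both of which are assumptions made only for illustration.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Reduction of `new` relative to `baseline`, in percent."""
    return 100 * (baseline - new) / baseline


def current_change_ma(delta_v_mv: float, resistance_ohm: float) -> float:
    """Current change (mA) caused by a voltage change (mV) across a resistance."""
    return delta_v_mv / resistance_ohm


if __name__ == "__main__":
    print(f"power reduction: {percent_reduction(8.58, 6.06):.1f}%")
    print(f"speed reduction: {percent_reduction(34, 33):.1f}%")
    # Hypothetical 10 mV disturbance across each type of current source.
    print(f"resistor-only (50 ohm): {current_change_ma(10, 50):.3f} mA")
    print(f"current mirror (3.7 kohm): {current_change_ma(10, 3700):.4f} mA")
```

Under this linear assumption the same voltage disturbance moves the gate current 74 times further with the 50 Ω source than with the 3.7 kΩ source, which is the risk the text describes.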

3.4.3

Summary of Reduced Power Accumulator Experiments
Two design approaches, triple-tail and resistor-only current sources, were attempted
to reduce power and maintain high-speed operation. While previously reported
use of the triple-tail approach [56, 57] showed promise, it did not work out well in
accumulator simulations. Not only was the power consumption higher than that of comparable
circuits from the ACCV5 design, but the maximum clock frequency was also reduced. Since the
triple-tail approach showed no benefits in simulation, it was not fabricated.
The resistor-only current source accumulator had much better simulation results.
Compared to a baseline design, it operated only 2.9% slower while consuming 29.4%
less power in simulation. The reduction in the current source impedance from 3.7 kΩ
to 50 Ω is concerning and considered risky, since the gate currents may vary greatly
as the voltage across the current source resistor changes. At the time of publication,
these circuits were not back from fabrication, so the level of risk and the measured circuit
performance are unknown.

3.5

Summary of Accumulators
As shown in the preceding sections, there are many possible methods for implementing
accumulator circuits. In Section 3.2, 4-bit pipelined accumulators that had
logic merged with latches were presented. The ACCV1 accumulator in Section 3.2.3
was based on the work by T. Mathew et al. [27, 59]. It was used as a baseline design
for comparison purposes. In Section 3.2.4, an alternative carry circuit was used in the
ACCV2 accumulator. This single-level parallel-gated carry circuit took a step towards
reduced power consumption. On its own, the carry circuit did not lead to reduced power
consumption, because the voltage supply was constrained by the four-level series-gated
sum circuit. It did, however, provide a proof of concept for the new carry circuit design. In Section 3.2.5, a three-level series-gated sum circuit was introduced for use in
the ACCV3 accumulator. Combined with the single-level parallel-gated carry circuit, a
reduction in the voltage supply and the corresponding power was achieved in ACCV3. The
ACCV4 accumulator in Section 3.2.6 was modified from ACCV3 to have both a smaller
layout and increased drive in the clock tree circuitry. Using a smaller layout increases
the thermal density, but it decreases the parasitics on interconnects and shortens clock
distribution interconnections. As a conservative approach, the power in the clock tree
was also increased to ensure that the pipeline latches receive strong signals.
Compared to the previous work by T. Mathew et al. [27, 59], ACCV1 and
ACCV2 showed increased performance for the divide by two carry test circuit. Performance was increased from a maximum clock frequency of 19 GHz in the previous
work to 52 GHz in ACCV1 and 55 GHz in ACCV2. The 4-bit accumulators in this
work also improved upon the previous work, by yielding operating accumulators. In the
previous work, only components of accumulators were operational.
In Section 3.3.1, the ACCV5 accumulator uses similar circuitry and layout to the
ACCV4 design. Instead of merging the logic and the latches, these elements are separated. While this leads to a slightly lower operating frequency, it also results in lower
power consumption because the supply voltage is reduced. The ACCV5 accumulator is
also extended to 8-bits and integrated in a DDS test circuit. The DDS circuit (DDSV1)
is described in further detail in Section 4.4 and is reported in [51].
The ACCV6 accumulator in Section 3.3.2 is an 8-bit design that uses a different
approach from the previous designs. It is designed for a lower operating frequency,
so it does not use any intermediate pipeline stages. Eliminating the pipelining greatly
reduces the power consumption. The ACCV6 accumulator is also integrated in a DDS
test circuit. The DDS circuit (DDSV2) that uses ACCV6 as a phase accumulator is
described in further detail in Section 4.5 and is reported in [62].
Work was also undertaken to further reduce power consumption while maintaining high clock frequencies in Section 3.4. This is necessary for further evolutions of
DDS designs. The triple-tail circuit in Section 3.4.1 ended up with worse performance
and more power consumption than baseline designs, so the approach was abandoned.
The resistor-only current source approach in Section 3.4.2 showed promise since it had
significantly reduced power consumption with minimal impact on high speed performance. The design may be risky, however, since the current source impedance is greatly
reduced. The resistor-only current source approach is in fabrication, so measurement
results are not available.
Since the accumulator sizes are different, a comparison of the 4-bit accumulators
from Section 3.2 and the 8-bit accumulators in Section 3.3 is not direct. In comparing
the 8-bit accumulators to the 4-bit accumulators, the ACCV5 8-bit accumulator has a
convenient 4-bit accumulator sub-block, but the ACCV6 8-bit accumulator does not. A
comparison of the accumulators in terms of 4-bit accumulators is shown in Table 3.15.

Design   Power (W)   Speed (GHz)   4-bit Speed/Power Ratio (GHz/W)
ACCV1    2.76        38            13.76
ACCV2    3.04        41            13.49
ACCV3    1.97        34            17.25
ACCV4    2.54        35            13.78
ACCV5    1.84        32            17.39
ACCV6    1.07 (a)    13            12.15 (a)

(a) No 4-bit accumulator block exists for ACCV6, so the power is estimated
from the 8-bit accumulator results.

Table 3.15: Speed and power comparison for 4-bit accumulator designs.

The designs are also compared in terms of 8-bit accumulators in Table 3.16. For
the 4-bit accumulator designs (ACCV1 through ACCV4), the 8-bit accumulator power
consumption is estimated by the circuitry needed to build an 8-bit accumulator. For
these designs, the 8-bit accumulator power is estimated as the power from two 4-bit
accumulators and four 2-bit registers. Note that the combination of four 2-bit registers
is needed to construct a 4-bit register. The power figures used in these estimates include
the clock tree circuitry required by each block, so the extension to 8-bit accumulators
includes this additional clock tree overhead.
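The estimation rule above (two 4-bit accumulators plus four 2-bit registers) can be written out directly. The sketch below takes the 4-bit and 8-bit power figures from Tables 3.15 and 3.16 and back-calculates the 2-bit register power that the rule implies. Those per-register values are derived here for illustration and are not quoted in this section. The function names are hypothetical.

```python
# Power in watts from Table 3.15 (4-bit) and Table 3.16 (8-bit estimates).
P4 = {"ACCV1": 2.76, "ACCV2": 3.04, "ACCV3": 1.97, "ACCV4": 2.54}
P8 = {"ACCV1": 8.56, "ACCV2": 9.12, "ACCV3": 5.82, "ACCV4": 7.72}
SPEED_GHZ = {"ACCV1": 38, "ACCV2": 41, "ACCV3": 34, "ACCV4": 35}


def estimate_8bit_power(p4: float, p_reg2: float) -> float:
    """8-bit power: two 4-bit accumulators plus four 2-bit registers."""
    return 2 * p4 + 4 * p_reg2


def implied_register_power(design: str) -> float:
    """2-bit register power implied by the tabulated 4-bit and 8-bit figures."""
    return (P8[design] - 2 * P4[design]) / 4


if __name__ == "__main__":
    for d in P4:
        reg = implied_register_power(d)
        ratio = SPEED_GHZ[d] / estimate_8bit_power(P4[d], reg)
        print(f"{d}: register = {reg:.2f} W, 8-bit ratio = {ratio:.2f} GHz/W")
```

Seen this way, the registers account for roughly a third of the estimated 8-bit power in each of the four designs, which is why the speed/power ratios fall so sharply relative to the 4-bit comparison.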

Design   Power (W)   Speed (GHz)   8-bit Speed/Power Ratio (GHz/W)
ACCV1    8.56        38            4.44
ACCV2    9.12        41            4.50
ACCV3    5.82        34            5.84
ACCV4    7.72        35            4.53
ACCV5    4.89        32            6.54
ACCV6    2.13        13            6.10

Table 3.16: Speed and power comparison for 8-bit accumulator designs.

The ACCV5 design has the best speed/power ratio in both the 4-bit accumulator
comparison in Table 3.15 and the 8-bit accumulator comparison in Table 3.16. The
reduction in maximum operating frequency incurred by separating the merged logic and
latches is compensated for by the proportionally larger reduction in power consumption
due to the lower supply voltage. This benefit is more apparent in the 8-bit accumulator
speed/power ratios, since the pipeline buffering registers become a dominant portion
of the total power consumption as the bit-width increases. Since the ACCV1 through
ACCV4 designs have a higher supply voltage than the ACCV5 design, their speed/power
ratios suffer in larger bit-width designs because of the extra power that is essentially
wasted in the registers.
The impact of increased register power is further illustrated by the ACCV6
design. It has the worst speed/power ratio in the 4-bit accumulator comparison in
Table 3.15, yet it has the second best speed/power ratio in the 8-bit accumulator comparison in Table 3.16. Unlike the other accumulator designs, the ACCV6 design eliminates
the intermediate pipeline stages, so it requires fewer pipeline registers as the bit-width
increases. The reduction in the number of registers leads to a speed/power ratio that
becomes comparatively better from the 4-bit comparison to the 8-bit comparison. If the
design comparison were extended to larger bit-widths, the ACCV6 design speed/power
ratio would overtake all other designs including ACCV5, because it would use much
less power from registers. Since pipelining performs no function other than to buffer
data to increase the operating frequency, it would follow that the optimal design for a
specific frequency would contain the least amount of pipelining necessary. This is accomplished by reducing the number of registers to the minimum necessary to operate at
a specific frequency. As the bit-width of the design increases, the lower relative power
in designs with the minimum number of registers would become more apparent.
To illustrate this point, Table 3.17 shows a hypothetical extension of the accumulators
to a 16-bit bit-width. Unlike the accumulator in Section 3.4.2, the 16-bit extension
does not have any truncation. For these designs, the impact of differences in the register
power becomes very apparent. The power consumption of the 4-bit designs from
Section 3.2 greatly increases to levels that are not feasible for fabrication. The speed/power
ratios of the 4-bit designs are also worse than those of the 8-bit designs from Section 3.3.
ACCV6 has the best speed/power ratio in this comparison, because it uses far fewer register
circuits than any of the other designs. It also has a power consumption of 5.14 W, which is
reasonable for a fabricated design.

Design   Power (W)   Speed (GHz)   16-bit Speed/Power Ratio (GHz/W)
ACCV1    29.28       38            1.30
ACCV2    30.40       41            1.35
ACCV3    19.16       34            1.77
ACCV4    26.00       35            1.35
ACCV5    14.62       32            2.19
ACCV6    5.14        13            2.53

Table 3.17: Speed and power comparison for accumulators extended to 16-bit bit-widths.

The results of the accumulators lead to some conclusions on choosing the proper
design point for a high-speed accumulator design. In general, pipelining is a double-edged
sword. While it is necessary to achieve high-speed performance, it uses a large amount
of power as bit-widths are increased. This is apparent from the extension of the designs
to 16-bits in Table 3.17. Accumulators should be designed so that the minimum amount
of pipelining necessary to achieve a desired operating frequency is used. This will help
to maximize the speed/power ratio. It should be noted that this is valid when the bit-width of an accumulator is relatively large. For small bit-widths, this may be obscured,
and there may be cases where the speed/power ratio is higher for designs with more
pipelining if the registers are not a significant portion of the circuit. It is also important
to keep total power consumption in mind when extending the bit-widths of designs. A
16-bit accumulator that consumes 30 W of power is not feasible for portable devices,
because power consumption is simply too high. The 14.62 W and 5.14 W versions that
are extended from ACCV5 and ACCV6 are much more reasonable alternatives that still
show high-speed performance.
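The shift in the best design point as the bit-width grows can be seen by tabulating the speed/power ratio for ACCV5 and ACCV6 at each width. The sketch below uses the power figures from Tables 3.15 through 3.17 (the 4-bit ACCV6 value is the estimate noted in Table 3.15). It compares only these two designs, and the names used are illustrative.

```python
# Maximum clock frequency (GHz) and power (W) by bit-width, from the tables.
SPEED_GHZ = {"ACCV5": 32, "ACCV6": 13}
POWER_W = {
    "ACCV5": {4: 1.84, 8: 4.89, 16: 14.62},
    "ACCV6": {4: 1.07, 8: 2.13, 16: 5.14},  # 4-bit value is an estimate
}


def speed_power_ratio(design: str, bits: int) -> float:
    """Speed/power ratio in GHz/W for a design at a given bit-width."""
    return SPEED_GHZ[design] / POWER_W[design][bits]


def best_design(bits: int) -> str:
    """Design with the higher speed/power ratio at the given bit-width."""
    return max(SPEED_GHZ, key=lambda d: speed_power_ratio(d, bits))


if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, best_design(bits))
```

The more heavily pipelined ACCV5 leads at 4 and 8 bits, and ACCV6 overtakes it at 16 bits, consistent with the argument that register power dominates at large bit-widths.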
Ultimately, the bit-widths of accumulators need to be extended for use in DDS
circuits. Not only must the power consumption of the accumulator be considered, but
there will also be additional power consumption from the phase conversion and DAC circuitry needed to make a DDS. In the next chapter, two DDS circuits will be discussed.
The first design will integrate the ACCV5 accumulator described in Section 3.3.1 into
a DDS and the second design will integrate the ACCV6 accumulator described in Section 3.3.2 into a DDS.

CHAPTER 4
Direct Digital Synthesizers
As InP HBT technologies improve and allow for faster digital and mixed-signal
circuits, more applications become viable. One application of growing interest is frequency synthesis in the microwave range. Synthesized microwave frequencies are useful
in areas such as communications, test equipment, and radar. Compared to traditional
methods of microwave frequency generation, the use of high-speed digital logic and
InP technology allows for improvements in terms of bandwidth, power, and size of the
system.
Several methods of frequency synthesis in the microwave frequency range have
been recently reported. Examples include optical methods [63], fractional-N methods [64], resonators [65], and direct digital synthesis. While optical methods have been
shown to generate frequencies over 60 GHz [63], they require multiple lasers for operation. Fractional-N methods have similar performance to direct digital synthesizers,
but they sacrifice agility, or frequency switching time, to improve spurious free dynamic
range [66]. Resonators are able to generate very pure output signals, but only over very
small or fixed frequency ranges. Direct digital synthesizers are able to deliver wide
band, high frequency, agile signals with high SFDR.

4.1

DDS Architecture

[Figure 4.1 block diagram; labelled blocks: phase accumulator (input k), phase to sine converter, DAC, low pass filter.]

Figure 4.1: Block diagram of a general direct digital synthesizer. The components are
shown on the top and their outputs are shown on the bottom.

The basic DDS architecture was first reported by J. Tierney et al. [67] over thirty
years ago, and it has a fairly simple structure. The DDS block diagram shown in
Figure 4.1 consists of a phase accumulator, a phase converter, a digital to analog converter,
and a low pass filter (LPF). Conceptually, the DDS operation is simple. The phase
accumulator tracks the phase of a sine wave on the unit circle by using a clocked binary
adder. The phase converter translates phase information from the accumulator into a
digital representation of the desired output waveform. The DAC then converts the
digital output of the phase converter into an analog output. The LPF eliminates unwanted
high frequency spectral components from the analog output.
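The signal flow just described can be illustrated with a small behavioral model. The sketch below is a simplification under stated assumptions: the phase-to-sine conversion is an ideal `sin(2*pi*n / 2**N)`, the DAC is modeled only as rounding to a signed code, and the low pass filter is omitted. The function and parameter names are hypothetical.

```python
import math


def dds_samples(k: int, n_bits: int, num_samples: int, dac_bits: int = 8) -> list:
    """Behavioral DDS model: accumulator -> ideal sine -> quantized DAC code."""
    modulus = 1 << n_bits                      # accumulator wraps at 2**N
    full_scale = (1 << (dac_bits - 1)) - 1     # largest signed DAC code
    phase = 0
    codes = []
    for _ in range(num_samples):
        theta = 2 * math.pi * phase / modulus  # assumed ideal phase mapping
        codes.append(round(full_scale * math.sin(theta)))
        phase = (phase + k) % modulus          # clocked binary adder
    return codes


if __name__ == "__main__":
    # A 4-bit accumulator with k = 1 completes one sine period every 16 clocks.
    print(dds_samples(k=1, n_bits=4, num_samples=16))
```

Increasing `k` makes the accumulator step around the circle faster, which raises the output frequency without changing the clock.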
The accumulator has a modular output over the range $[0, 2^N - 1]$. This modular
output is convenient for mapping the accumulator output to a unit circle representing the
phase of a sine wave. This mapping can be represented by

\[ \theta = \frac{(2n + 1)\pi}{2^{N+1}}, \tag{4.1} \]

where $n$ is the output of the accumulator. This particular mapping does not include the
phases 0, π/2, π, and 3π/2. Instead the discrete phase steps are offset by half a step size,
so that $\theta = \pi / 2^{N+1}$ for $n = 0$. This approach simplifies partitioning the phase into
quadrants, which is advantageous for the phase converter. An example of this mapping
on the unit circle is shown in Figure 4.2.
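The quadrant partitioning can be sketched as a simple bit-field split. The sketch below assumes, based on the labels in Figure 4.2, that the quadrant is given by the two most significant bits of the accumulator output and that the remaining bits give the position within the quadrant. The function names are illustrative.

```python
def quadrant(n: int, n_bits: int) -> int:
    """Quadrant index taken from the two most significant bits of n."""
    return n >> (n_bits - 2)


def position_in_quadrant(n: int, n_bits: int) -> int:
    """Position within the quadrant, taken from the remaining low bits of n."""
    return n & ((1 << (n_bits - 2)) - 1)


if __name__ == "__main__":
    N = 5  # as in Figure 4.2: 32 phase values, 8 per quadrant
    for n in (0b00101, 0b01011, 0b10001, 0b11010):
        print(f"{n:05b}: quadrant {quadrant(n, N):02b}, "
              f"position {position_in_quadrant(n, N)}")
```

Because no phase value falls exactly on a quadrant boundary, each accumulator output belongs to exactly one quadrant, and a phase converter can work on the low bits alone while using the top two bits to select symmetry.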

[Figure 4.2 diagram: unit circle with angle θ and axis marks 0, π/2, π, and 3π/2. Labelled points: n = 00001b, 00011b, 00101b (Quadrant 00b); n = 01011b, 01110b (Quadrant 01b); n = 10001b, 10011b (Quadrant 10b); n = 11010b, 11101b (Quadrant 11b).]

Figure 4.2: Unit circle with phase information from accumulator. In this example, N =
5, so there are 32 different phase values with a symmetry of 8 in each of the 4 quadrants.
Selected accumulator outputs n are labelled using the mapping from Equation 4.1 to
illustrate the breakdown by quadrants.

The phase converter is typically used to generate a sine or cosine output, although
any periodic waveform can be implemented [68]. In this thesis, only sine waves
are generated by the DDS circuits. There are several methods for implementing the
phase converter including a ROM look up table [67], a CORDIC algorithm [69], and
polynomial approximations [70]. Of these methods, one of the most common is the
ROM look up table (LUT). LUTs can be simplified by exploiting the symmetry of the
unit circle, particularly the quadrant symmetry described above, and by using a combination of coarse and fine value LUTs to form the digital sine value at a particular
phase [67]. While ROM look up tables are a simple method of phase to sine conversion, the implementation of a fast ROM is not always possible or feasible. When this
is the case, other conversion methods are necessary. One such method uses the iterative CORDIC technique [71] for conversion in a DDS [69]. The CORDIC technique
takes advantage of trigonometric identities and approximations to calculate the value of
sin(θ) by using shift and add operations. This method overcomes the potential speed
limitations of the ROM by using pipelining in the CORDIC conversion. An alternative to the CORDIC method uses the polynomial expansion approximation [70] of the
sine function. Circuitry can be simplified by lowering the order of the polynomial used
for approximation. Likewise, the order of the approximation polynomial can be adjusted to achieve a desired SFDR. Compared to the CORDIC method, the polynomial
approximation method uses fewer pipeline stages and less power.
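To make the shift and add idea behind CORDIC concrete, the sketch below implements the textbook rotation-mode iteration. It is not taken from the implementations in [69] or [71]. Floating-point multiplications by `2.0 ** -i` stand in for the hardware right shifts, and the input is assumed to lie in [-π/2, π/2]. Each loop iteration corresponds to one stage that could be pipelined.

```python
import math


def cordic_sin(theta: float, iterations: int = 16) -> float:
    """Approximate sin(theta) for theta in [-pi/2, pi/2] by CORDIC rotations."""
    # Elementary rotation angles atan(2^-i) and the accumulated gain correction.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start on the x axis, pre-scaled so the final vector has unit length.
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0           # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, cordic_sin(t), math.sin(t))
```

The number of iterations sets the accuracy, so the pipeline depth trades directly against the precision of the sine value.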
The digital representation of the sine wave is converted into an analog waveform
by the DAC. The DAC is not necessarily on the same chip as the accumulator and phase
converter, although it generally is in high performance DDS circuits. The DAC is usually
noted as the limiting factor [72] for spectral purity performance.
The LPF is often implemented by discrete components off-chip, although it is
possible to implement it on-chip if on-chip inductors are available. The LPF is needed
to eliminate images above the Nyquist frequency, or half of the clock frequency ($f_{clk}/2$).
Since the filter has roll-off characteristics and must block frequencies above $f_{clk}/2$, frequencies slightly below $f_{clk}/2$ are attenuated. This sets a practical limit for a maximum
DDS output of about 40% of $f_{clk}$ [73]. In many cases, the LPF is not included in the
DDS circuit design, but it is a component of the whole DDS system.

4.2

Performance Metrics and Design Tradeoffs
The performance of the DDS can be measured by various criteria: frequency
resolution, maximum frequency, bandwidth, SFDR, agility, and power consumption.
The frequency resolution ($f_0$) is the smallest incremental difference in DDS output
frequencies. This is determined by the clock frequency and the bit-width of the
accumulator ($N$) as given by

\[ f_0 = \frac{f_{clk}}{2^N}. \tag{4.2} \]

A larger bit-width ($N$) will improve the resolution of the DDS and allow for finer
frequency spacing. Higher frequency resolution leads to improved SFDR, at the expense
of higher power consumption.

The output frequency ($f_{out}$) of the DDS is controlled by the phase increment of
the accumulator ($k$) in multiples of the frequency resolution. It is given by

\[ f_{out} = k f_0 = \frac{k f_{clk}}{2^N}. \tag{4.3} \]

Due to the Nyquist sampling theorem, the maximum value for $k$ is effectively limited to
$2^{N-1}$, leading to the maximum output frequency ($f_{outmax}$) given by

\[ f_{outmax} = \frac{2^{N-1} f_{clk}}{2^N} = \frac{f_{clk}}{2}. \tag{4.4} \]

As noted in the previous section, the maximum DDS output is typically limited to about
40% of $f_{clk}$.
The bandwidth is the difference between the minimum and maximum output
frequencies. The phase increment k can have a value of 0, so the minimum output of
the DDS is 0 Hz. Thus, the bandwidth is given by $f_{outmax}$. Since the maximum output
frequency and bandwidth are determined by $f_{clk}$, they can be improved by using fast
technologies, such as InP HBTs, and design techniques that derive the maximum benefit
from the technology. In a particular technology, increasing bandwidth and maximum
frequency generally comes at the expense of increased power consumption.
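Equations 4.2 through 4.4 are easy to evaluate numerically. The sketch below is a direct transcription of those equations plus the 40% practical limit mentioned in the text. The example values (a 13 GHz clock and an 8-bit accumulator) are chosen only for illustration, and the function names are hypothetical.

```python
def frequency_resolution(f_clk: float, n_bits: int) -> float:
    """Equation 4.2: f0 = f_clk / 2**N."""
    return f_clk / 2 ** n_bits


def output_frequency(k: int, f_clk: float, n_bits: int) -> float:
    """Equation 4.3: f_out = k * f0."""
    return k * frequency_resolution(f_clk, n_bits)


def max_output_frequency(f_clk: float) -> float:
    """Equation 4.4: Nyquist limit, reached at k = 2**(N-1)."""
    return f_clk / 2


def practical_max_output(f_clk: float, fraction: float = 0.4) -> float:
    """Practical limit of about 40% of f_clk, set by the filter roll-off."""
    return fraction * f_clk


if __name__ == "__main__":
    f_clk, n_bits = 13e9, 8  # illustrative values
    print(frequency_resolution(f_clk, n_bits))        # resolution in Hz
    print(output_frequency(2 ** (n_bits - 1), f_clk, n_bits))
    print(practical_max_output(f_clk))
```

For these example values the resolution is about 50.8 MHz, which shows why larger accumulator bit-widths are needed when fine frequency steps are required.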
Spurious-free dynamic range is a measure of the spectral purity of the sine wave
produced by the DDS. Ideally, the spectral output should only have energy at the desired
frequency. Due to factors described below, there is also energy at undesired frequencies.
These are referred to as spurious frequencies or spurs. The spurs are periodic, discrete
spectral lines [66]. For SFDR measurements, only in-band frequencies (≤fclk /2) are
taken into account since the LPF is expected to filter out frequencies above Nyquist.
The SFDR is the ratio of the amplitude of the desired signal Ap to the amplitude of the
largest in-band spur As [69], given by

$$SFDR = 20 \log\left(\frac{A_p}{A_s}\right). \tag{4.5}$$

[Figure 4.3 plot: Output Magnitude (dBm) versus Output Frequency Normalized to Clock Frequency, with Ap, As, and SFDR = 48 dBc annotated.]
Figure 4.3: Frequency spectrum of a DDS output with the desired frequency Ap and
largest spurious frequency As labelled.

An example of the output spectrum of a DDS with Ap and As labelled is illustrated in
Figure 4.3.
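
Equation 4.5 can be evaluated directly, as in the minimal sketch below. The function names are illustrative, and the logarithm is taken as base 10, which is the usual convention for decibels. When both levels are already read from a spectrum analyzer in dBm, the SFDR is just their difference.

```python
import math


def sfdr_dbc(a_p, a_s):
    """Equation 4.5: SFDR in dBc from the carrier and worst-spur amplitudes."""
    if a_p <= 0 or a_s <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(a_p / a_s)


def sfdr_from_dbm(carrier_dbm, spur_dbm):
    """Same ratio when both levels are already expressed in dBm."""
    return carrier_dbm - spur_dbm


if __name__ == "__main__":
    # A spur 100 times smaller in amplitude than the carrier is 40 dB down.
    print(sfdr_dbc(1.0, 0.01))
    # Levels in dBm subtract directly.
    print(sfdr_from_dbm(-14.83, -64.83))
```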
Spurs are generated in each stage of the DDS architecture shown in Figure 4.1.
Typically, the output of the accumulator is truncated, so not all N bits feed into the phase
converter. The quantization of phase information also introduces spurs [66]. The phase
converter introduces spurs from compression and quantization [74]. In most implementations of the phase converter, phase compression methods are used to save ROM space
or to simplify/reduce calculations. Quantization arises in all types of phase converter,
since the output has finite precision. Spurs in the DAC are caused by glitches from bit
switching, the finite slew rate of the DAC, and from quantization [72]. It has been noted
that for a high SFDR, the DAC should have low switching transients and a fast settling
time [73]. The LPF can also be a cause of spurs [74].
Agility is a measure of how quickly the DDS can change frequencies. Agility
is important in applications where the output must hop between frequencies quickly,
such as in secure communications systems. The agility is defined as the time between
changing the frequency control word (k in Figure 4.1) and when the output is at the
desired frequency. Using fewer levels of pipelining will tend to improve agility, but it
will also tend to decrease the maximum clock frequency.
Power consumption is determined by a combination of the technology used, the
clock frequency, the internal bit-width N , and the amount of pipelining internal to the
DDS. Since it is dependent on so many factors, it plays a part in most of the performance
tradeoffs. Improving any of the other metrics usually results in a design with increased
power consumption. In order to decrease the power consumption, the DDS must usually
be run at a lower clock frequency, with less frequency resolution, or with a worse SFDR.

4.3 Recent DDS Results

With a large number of conflicting tradeoffs, comparisons among reported DDS
circuits can be difficult. In CMOS technologies, many of the reported DDS designs are
mainly concerned with the demonstration of various phase conversion techniques [69,
70]. These circuits typically have a high SFDR, but a low clock frequency. Designs
that are more comparable to the work in this thesis are done in InP technologies. These
designs typically try to achieve high maximum clock rates, and SFDR figures in the
20 dBc to 30 dBc range.
In 2001, an InP DDS was reported by A. Gutierrez-Aitken et al. at TRW [8, 75]
that had a maximum clock rate of 9.2 GHz and consumed 15 W of power. The authors
did not report the SFDR for the full range of FCWs, but of the two FCWs reported, the
worst SFDR was 30 dBc. It is unknown if there were any other FCWs that had a worse
SFDR.
In 2005, K. Elliott at HRL reported [42] an InP DDS with a maximum clock rate
of 12 GHz that consumed 8 W of power. Up to 40% of the full range of FCWs, the
worst case SFDR was 30 dBc. This particular work was part of HRL’s contribution to
the TFAST project, so it is the closest reported comparison to the DDS circuits in this
thesis.
In the next two sections, two DDS designs employing different tradeoffs will
be discussed. Compared to the previous work, these DDS circuits achieve higher maximum clock frequencies and lower power, but with decreased SFDR. In Section 4.4,
the DDSV1 design [51] is intended for high-speed operation. To achieve this goal, the
SFDR is sacrificed, and the power consumption is high. Overall performance is up to
a 32 GHz clock frequency, with a worst case SFDR over the whole range of FCWs of
21.56 dBc and 9.45 W of power consumption. In Section 4.5, the DDSV2 design [62]
is intended for low power operation. This tradeoff impacts high-speed performance.
The DDSV2 design has a maximum clock frequency of 13 GHz, a worst case SFDR of
26.67 dBc, and 5.42 W of power consumption.

4.4 Direct Digital Synthesizer DDSV1

Unlike the traditional DDS architecture [67], which consists of a phase accumulator, phase converter, and digital to analog converter, DDSV1 combines the phase
converter and DAC into a sine-weighted DAC. The resulting DDS architecture is shown
in Figure 4.4. The sine-weighted DAC is also an alternative to the traditional approach
of using ROM look-up tables. This approach is similar to the cosine-weighted DAC
implemented by A. Gutierrez-Aitken et al. [8], but it also adds a Gilbert multiplier for
analog inversion. Since the phase converter circuitry is eliminated, this approach allows
for a reduction in circuit complexity and power consumption. The accumulator is 8 bits
wide, so the DDSV1 frequency resolution is 1/256 of the clock frequency with 128 steps
of frequency control. The DDSV1 design uses the ACCV5 8-bit accumulator described
in Section 3.3.1, which was simulated to operate up to a 34 GHz clock frequency. The
output of the accumulator is truncated, so that only the five MSBs (a4, a3, a2, a1, a0) are
used to generate a full-wave sine output.

[Figure 4.4 diagram: Frequency Control Word (k), Phase Accumulator, Sine-Weighted DAC.]
Figure 4.4: Block diagram of DDSV1 circuit with the outputs of each stage illustrated.

4.4.1 Sine-Weighted Digital to Analog Converter

The sine-weighted DAC uses the five MSBs from the accumulator to generate a
full-wave sine output. The DAC consists of a thermometer-coder, a sine-weighted
summing junction, and a Gilbert multiplier [76]. In the thermometer-coder, the a3
output from the accumulator complements the three LSBs (a2 , a1 , a0 ) to expand the
thermometer-coder from a quarter-wave to a half-wave output. As shown in Figure 4.5,
the complemented bits are registered to meet timing requirements, then buffered to drive
the thermometer-coder logic. The thermometer-coder outputs drive the sine-weighted
taps of the summing junction, which has a tap weighting scheme of [3 5 5 4 4 3 2 1].

[Figure 4.5 diagram: inputs a3 (complement), a2, a1, and a0 are registered, passed through fanout buffers, and fed to clocked thermometer-coder logic whose outputs are ordered from MSB to LSB.]
Figure 4.5: Block diagram of the thermometer-coder portion of the sine-weighted DAC.

The DAC is driven by a thermometer coder, thus the weights sum successively to generate a quarter-wave sine output. The first tap weight of 3 is always enabled to ensure
that the summing junction has non-zero outputs for all states. This ensures that distinct
positive and negative outputs exist for each state after the analog sum is inverted by the
Gilbert multiplier. Since the tap weight of 3 is always enabled, the center step level is
equal to 6. The Gilbert multiplier uses the accumulator MSB (a4 ) as a control signal for
inversion, resulting in a full-wave sine output. The sine-weighted summing junction and
the Gilbert multiplier are shown in Figure 4.6.
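
The behavior described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit itself: the polarity of the inversion controlled by a4 is assumed, the function name is hypothetical, and the comparison against a sine sampled at mid-step phases is only one plausible way to judge how well the [3 5 5 4 4 3 2 1] weights follow a sine wave.

```python
import math

TAP_WEIGHTS = [3, 5, 5, 4, 4, 3, 2, 1]  # from the text; the first tap is always on


def sine_weighted_dac(phase):
    """Behavioral model: 5-bit phase word (a4..a0) -> signed DAC level."""
    a4 = (phase >> 4) & 1              # sign bit, drives the Gilbert multiplier
    a3 = (phase >> 3) & 1              # complements the three LSBs
    b = phase & 0b111
    if a3:
        b ^= 0b111                     # mirror the quarter wave into a half wave
    level = sum(TAP_WEIGHTS[: b + 1])  # thermometer code enables taps 0..b
    return -level if a4 else level     # assumed polarity of the inversion


if __name__ == "__main__":
    wave = [sine_weighted_dac(p) for p in range(32)]
    print(wave)
    # Compare with a least-squares scaled sine sampled at mid-step phases
    # (the mid-step sampling is an assumption of this sketch).
    ref = [math.sin(2 * math.pi * (p + 0.5) / 32) for p in range(32)]
    scale = sum(w * r for w, r in zip(wave, ref)) / sum(r * r for r in ref)
    print(max(abs(w - scale * r) for w, r in zip(wave, ref)))
```

In this model the quarter-wave levels are the running sums 3, 8, 13, 17, 21, 24, 26, and 27, and the step across the zero crossing is from -3 to +3, which is the center step level of 6 noted above.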
The Gilbert multiplier is shown in Figure 4.7. In this DDS it is used to flip the
sign of the output of the DDS. It is controlled by the sign bit (a4 ). Essentially, it functions
as a controllable analog inverter, with a4 as the control and DAC output as the analog
input. The Gilbert multiplier is advantageous because it is linear over a relatively wide
range [76]. Linearity is particularly important for the output of the DDS, because nonlinearities negatively impact the SFDR. The resistor connecting the emitters of the input
signal transistors acts as a degeneration resistor and helps to improve linearity further
by introducing negative feedback [55].

[Figure 4.6 diagram: 7-bit thermometer coded input drives summing junction taps weighted 3, 5, 5, 4, 4, 3, 2, 1 (LSB to MSB); the sum feeds a Gilbert multiplier controlled by a4 (sign), followed by an amplifier.]
Figure 4.6: Block diagram of the summing junction and Gilbert multiplier showing the
[3 5 5 4 4 3 2 1] tap weighting scheme.

4.4.2 Phase Truncation Spurs

Since DDSV1 does not use all of the output bits from the accumulator for phase
conversion, spurs due to phase truncation arise. Formulas for determining the location
and magnitude of these spurs have been developed by V. Kroupa et al. [77]. The magnitude and locations of the spurs are found to be a function of the accumulator bit-width
(N ), the number of bits used for phase conversion (W ), the number of bits truncated
from the accumulator (B), the FCW, and the clock frequency. The SFDR (in dBc) that
is due to truncation spurs has an upper bound estimate of

$$SFDR \approx 6 \cdot W \ \text{(dBc)} \tag{4.6}$$

from the analysis in [77]. The actual SFDR is a function of both W and the normalized
output frequency (foutput /fclk ) [77], so the upper bound estimate discards the impact
from the normalized output frequency. In reality, the SFDR is reduced by a few dB for
normalized output frequencies approaching 1/2. In the DDSV1 design, W is equal to
five, so the upper bound on SFDR is about 30 dBc.

[Figure 4.7 schematic: Gilbert multiplier with analog inputs DACp and DACn, a differential control input, and outputs Outp and Outn.]
Figure 4.7: Gilbert multiplier schematic. Used to switch the sign of the DAC output in
DDSV1 to achieve a full-wave sine output.

Combining terms from [77], the location of the worst case spur from phase truncation is given by
$$f_{spur} = \frac{f_{clk}}{2^N}\left(FCW \pm 2^W P\right), \tag{4.7}$$

where P is the value of the truncated B LSBs of the frequency control word. P can
range from 0 to $2^B - 1$. Note that when these spurs are outside of the Nyquist band (0
to fclk /2), they will be aliased back into band. While Equation 4.7 gives the location
of the worst case phase truncation spur, other spurs will be located at harmonics of
the desired output and harmonics of the worst case phase truncation spur, as well as
intermodulations of these signals. Additional spurs occur due to offsets in the Gilbert
multiplier and from non-linearities in the DAC.
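
The spur locations from Equation 4.7, including the aliasing back into the Nyquist band, can be sketched as follows. The function name is hypothetical, and the sketch assumes B = N - W, that is, all accumulator bits not used for phase conversion are truncated.

```python
def truncation_spur_frequencies(fcw, f_clk, n_bits=8, w_bits=5):
    """Equation 4.7 with aliasing folded into the Nyquist band (0 to f_clk/2)."""
    b_bits = n_bits - w_bits             # number of truncated LSBs, B (assumed N - W)
    p = fcw & ((1 << b_bits) - 1)        # P: value of the truncated LSBs of the FCW
    if p == 0:
        return []                        # nothing is truncated for this FCW
    modulus = 1 << n_bits
    spurs = set()
    for sign in (+1, -1):
        k = (fcw + sign * (1 << w_bits) * p) % modulus  # spur position before folding
        k = min(k, modulus - k)                         # alias into 0..2**(N-1)
        spurs.add(k * f_clk / modulus)
    return sorted(spurs)


if __name__ == "__main__":
    for fcw in (1, 127):
        print(fcw, truncation_spur_frequencies(fcw, 32e9))
```

For N = 8, W = 5, and FCW = 1, the two results are fclk/8 - fclk/256 and fclk/8 + fclk/256.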
DDSV1 truncates the output of the 8-bit accumulator to 5 bits, so it has N=8 and
W=5. For the case when FCW=1, it is expected that the worst case phase truncation
spurs (taking aliasing into account) will be located at fclk/8 ± fclk/256. Intuitively, when
FCW=1, the DAC output only changes every eight clock cycles, so spurs located at a
frequency of fclk/8 mixed with the desired output of fclk/256 are expected.

[Figure 4.8 plot: Amplitude (mV) versus Time (ns).]
Figure 4.8: Time-domain simulation output of DDSV1 with a 34 GHz clock and
FCW=1. The output frequency is 132.8125 MHz.
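
The FCW=1 case discussed above can be checked numerically with an idealized model: an accumulator whose phase is truncated to W bits, followed by a perfect sine conversion with no amplitude quantization. This is a sketch for illustration only; it does not model the sine-weighted DAC, and the function names are hypothetical. A direct DFT over one full accumulator period is used so that only the standard library is needed.

```python
import cmath
import math


def truncated_dds_samples(fcw, n_bits=8, w_bits=5):
    """One full accumulator period of an ideal sine driven by a truncated phase."""
    modulus = 1 << n_bits
    shift = n_bits - w_bits
    samples, acc = [], 0
    for _ in range(modulus):
        phase = acc >> shift                     # keep only the W MSBs
        samples.append(math.sin(2 * math.pi * phase / (1 << w_bits)))
        acc = (acc + fcw) % modulus
    return samples


def spectrum_magnitudes(samples):
    """Direct DFT magnitudes for bins 0..len/2 (the Nyquist band)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        mags.append(abs(total))
    return mags


def worst_spur(fcw, n_bits=8, w_bits=5):
    """Return (bin of largest spur, SFDR in dBc); intended for 1 <= fcw < 2**(N-1)."""
    mags = spectrum_magnitudes(truncated_dds_samples(fcw, n_bits, w_bits))
    carrier = min(fcw, (1 << n_bits) - fcw)      # bin of the desired output
    spur_bin = max((k for k in range(len(mags)) if k != carrier),
                   key=lambda k: mags[k])
    if mags[spur_bin] == 0.0:
        return spur_bin, math.inf
    return spur_bin, 20 * math.log10(mags[carrier] / mags[spur_bin])


if __name__ == "__main__":
    print(worst_spur(1))
```

Each bin corresponds to fclk/256, so a worst spur in bin 31 or 33 lies at fclk/8 minus or plus fclk/256, and the resulting SFDR can be compared with the 6·W estimate of Equation 4.6.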

4.4.3 Simulation Results

In simulation, the DDSV1 test circuit operated up to a maximum clock frequency of 34 GHz.
A time-domain simulation output at 34 GHz with an FCW of 1 is shown in Figure 4.8.
At this setting, the output frequency of the DDS is 132.8125 MHz. This output has an
SFDR of 29.91 dBc, as shown in Figure 4.9.
A time-domain simulation output at 34 GHz with an FCW of 127 is shown in
Figure 4.10. At this setting, the output frequency of the DDS is 16.8671875 GHz. This
output has an SFDR of 27.71 dBc, as shown in Figure 4.11.
The DDSV1 design uses separate power supplies for the accumulator and the
thermometer coder/DAC sections of the circuit. The 8-bit accumulator portion of the
circuit uses a -3.8 V power supply, and the thermometer coder/DAC portion uses a -4.5 V power supply. The use of separate supplies allows for lower power consumption,
since the accumulator portion of the circuit is operated with a lower supply voltage. A
complete power breakdown of DDSV1 is shown in Table 4.1.

[Figure 4.9 plot: Amplitude (dBc) versus Frequency (GHz).]
Figure 4.9: Frequency-domain simulation output of DDSV1 with a 34 GHz clock and
FCW=1. This output has a 29.91 dBc SFDR.

[Figure 4.10 plot: Amplitude (mV) versus Time (ns).]
Figure 4.10: Time-domain simulation output of DDSV1 with a 34 GHz clock and
FCW=127. The output frequency is 16.8671875 GHz.

[Figure 4.11 plot: Amplitude (dBc) versus Frequency (GHz).]
Figure 4.11: Frequency-domain simulation output of DDSV1 with a 34 GHz clock and
FCW=127. This output has a 27.71 dBc SFDR.

4.4.4 Measurement Results

The DDSV1 test chip was fabricated as part of the TC8 fabrication run. A microphotograph of the test chip is shown in Figure 4.12. It has 1891 transistors in an area
of 2700 µm by 1450 µm. As was the case with the accumulator circuits in Chapter 3,
DDSV1 was tested on-wafer. The DDSV1 test setup is shown in Figure 4.13. The DDS
clock input is differential, but it is driven single-ended, with the non-driven side connected to ground through a 50 Ω termination. The DDS output is also differential, with
one single-ended output driving a spectrum analyzer and the other single-ended output
driving a high-frequency sampling oscilloscope. Having both outputs available allows
simultaneous testing of both the time and frequency domain outputs of the DDS.

Component           Sub-Component               Current    Voltage   Power
8-bit Accumulator                               1419 mA    -3.8 V    5.3922 W
                    4-bit Accumulator           533 mA     -3.8 V    2.0254 W
                    4-bit Register              353 mA     -3.8 V    1.3414 W
Input Buffers                                   125 mA     -3.8 V    475 mW
DAC                                             1010 mA    -4.5 V    4.545 W
                    Thermocoder                 370 mA     -4.5 V    1.665 W
                    8-bit Register              319 mA     -4.5 V    1.4355 W
                    Gilbert Cell/Output Buff.   77 mA      -4.5 V    346.5 mW
                    DAC Driver/Core             147 mA     -4.5 V    661.5 mW
                    Other                       97 mA      -4.5 V    436.5 mW
Total                                                                10.4122 W

Table 4.1: Simulated power breakdown for DDSV1.

The DDS operates up to a maximum clock frequency of 32 GHz for all frequency
control words. A full sweep of all FCWs at 32 GHz is shown in Figure 4.14. For
all SFDR measurements, the SFDR is measured within the full Nyquist bandwidth.
The worst case SFDR over the range of FCWs is 21.56 dBc at an FCW of 95, which
corresponds to an output frequency of 11.875 GHz. The average SFDR over the whole
range of FCWs is 26.95 dBc. The maximum operation frequency is better than expected
from simulation, since the DDS operates up to a maximum clock frequency of 32 GHz
instead of 28 GHz.
A 125 MHz sine-wave output synthesized from a 32 GHz clock frequency with
FCW=1 is shown in Figure 4.15. This output represents the fundamental output frequency (f0 ) and frequency resolution of the DDS with a 32 GHz clock. With an output
frequency of 125 MHz, the DDS has an SFDR of 31.00 dBc with the largest in-band spur
located at 4.125 GHz. The frequency spectrum of this output is shown in Figure 4.16.
The worst spurs in the output spectrum are due to the truncation of the phase word from
the accumulator to the sine-weighted DAC. They are located at fclk/8 ± fclk/256, which
matches the analysis from Equation 4.7. The largest of the additional spurs occur at
harmonics of the worst case spurs. The SFDR magnitude is 31.00 dBc, which is close
to the 30 dBc result predicted from Equation 4.6 for a DDS using five bits for phase
conversion.

Figure 4.12: Microphotograph of the DDSV1 test chip, with dimensions of 2700 µm by
1450 µm. The DDSV1 test chip contains 1891 transistors.

[Figure 4.13 diagram: HP8340A source (trigger), HP8365B source (clock), bias tees, GSGSG probes, DC probe cards, DUT, 50 Ω termination, LPF, HP562A spectrum analyzer, and Agilent 86100B oscilloscope with Agilent 86117A RF input card.]
Figure 4.13: DDSV1 test setup.

[Figure 4.14 plot: SFDR (dBc) versus Output Frequency (GHz).]
Figure 4.14: SFDR versus DDSV1 output frequency at a 32 GHz clock rate. The SFDR
is measured within the Nyquist bandwidth. The worst SFDR is 21.56 dBc at an FCW of
95. Over the whole range of FCWs, the average SFDR is 26.95 dBc.

To illustrate high output frequency operation of the DDS, the frequency spectrum of the DDS with FCW=127 is shown in Figure 4.17. At this FCW, the output
frequency is 15.875 GHz and the SFDR is 30.44 dBc. The largest in-band spur is located at 12.125 GHz, with a similar magnitude spur at 11.875 GHz. These are truncation
spurs located at aliased frequencies of what would be expected from Equation 4.7.
Of the 32 DDSV1 sites tested, 8 showed either no output or the incorrect output
at a 24 GHz clock frequency. The remaining 24 test sites operated at a clock frequency
of at least 24 GHz, and were considered functional for yield purposes. Thus, the functional
yield for DDSV1 in fabrication run TC8 was 75%. Overall, the DDS test chip was
measured to consume a total power of 9.45 W, which is 9.2% lower than expected
from simulation. Extrapolating from Table 4.1, the 8-bit accumulator uses 4.89 W and
the DAC uses 4.12 W. DDSV1 is reported in [51].

[Figure 4.15 plot: Output Voltage (mV) versus Time (ns).]
Figure 4.15: Sampling oscilloscope output of DDSV1 with fclk = 32 GHz and
fout = 125 MHz.

[Figure 4.16 plot: Magnitude (dBc) versus Frequency (GHz), annotated f0 = 125 MHz at 0 dBc, fspur = 4.125 GHz at -31 dBc, SFDR = 31 dBc.]
Figure 4.16: Frequency spectrum of the DDSV1 output with fclk = 32 GHz and
fout = 125 MHz. The largest spur is located at 4.125 GHz and the SFDR is approximately 31 dBc.

[Figure 4.17 plot: Magnitude (dBc) versus Frequency (GHz), annotated fout = 15.875 GHz at 0 dBc, fspur = 12.125 GHz at -30.44 dBc, SFDR = 30.44 dBc.]
Figure 4.17: Frequency spectrum of the DDSV1 output with fclk = 32 GHz and
fout = 15.875 GHz at FCW=127. The largest spur is located at 12.125 GHz and the
SFDR is 30.44 dBc.
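
The extrapolation from Table 4.1 to the measured total can be reproduced with the short sketch below. It assumes that every block scales by the same ratio of measured to simulated total power, which is one plausible reading of "extrapolating" and is consistent with the quoted 4.89 W and 4.12 W; the function and variable names are illustrative.

```python
SIMULATED_POWER_W = {          # top-level rows of Table 4.1
    "8-bit accumulator": 5.3922,
    "input buffers": 0.475,
    "DAC": 4.545,
}
MEASURED_TOTAL_W = 9.45


def scale_to_measurement(simulated, measured_total):
    """Scale each simulated block power by the measured/simulated total ratio."""
    ratio = measured_total / sum(simulated.values())
    return {name: power * ratio for name, power in simulated.items()}


if __name__ == "__main__":
    simulated_total = sum(SIMULATED_POWER_W.values())
    print(round(100 * (1 - MEASURED_TOTAL_W / simulated_total), 1))  # percent below simulation
    for name, power in scale_to_measurement(SIMULATED_POWER_W, MEASURED_TOTAL_W).items():
        print(name, round(power, 2))
```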

4.5 Direct Digital Synthesizer DDSV2

The DDSV2 design uses a traditional DDS architecture [67] with a phase accumulator, phase converter, and DAC, as shown in Figure 4.18. This design uses the ACCV6
8-bit accumulator described in Section 3.3.2. Since the accumulator is 8-bits wide, the
fundamental output frequency of the DDS is 1/256 of the input clock frequency, and the
DDS has 128 steps of frequency control. The 8-bit accumulator output is truncated to
6-bits for use in the phase conversion logic. From the simulations carried out in Section 3.3.2, it was estimated that the ACCV6 accumulator operates up to a 13 GHz clock
frequency.

[Figure 4.18 diagram: Frequency Control Word (k), Phase Accumulator, Phase to Sine Converter, DAC.]
Figure 4.18: Traditional DDS architecture used in DDSV2. This DDS consists of an
accumulator, phase to sine converter and a DAC. Representations of the outputs at each
stage are shown below the corresponding stage.

While using a ROM is a more common approach to phase conversion, a ROM
design in this technology had not been completed when this DDS was implemented.
ROM-less techniques have been reported in other designs [75, 47] and are an option for
phase conversion. ROM-less approaches are particularly useful in designs with a narrow
bit-width, where the amount of conversion logic necessary is small. In this DDS, logic
gates are used for a ROM-less phase conversion. The combinational logic performs a
look up table operation and drives the DAC. The coding scheme is discussed in further
detail in Section 4.5.1.
The DAC has seven inputs and is partitioned into a 3-bit coarse DAC and a 4-bit
fine DAC. The fine DAC portion is binary, with bit weights of 8, 4, 2, and 1 units of
current. In the 3-bit coarse DAC, each bit is 16 units of current. The 3-bit coarse DAC
section is driven by thermometer coded outputs from the phase converter. This is in
contrast to typical DACs that internally generate thermometer coded signals. Since the
DDS design is monolithic, an approach with thermometer coding outside of the DAC
can be used. All of the currents from the fine and coarse DAC are summed together
through a resistor. This DAC has 64 discrete outputs and is shown in Figure 4.19.

[Figure 4.19 diagram: 7-bit input from the phase converter; coarse DAC taps weighted 16, 16, 16 and fine DAC taps weighted 8, 4, 2, 1 (MSB to LSB) feed a summing junction.]
Figure 4.19: DAC with coarse and fine sections used for DDSV2.
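
The segmented DAC described above can be modeled in a few lines to confirm the count of 64 discrete output levels. This is a behavioral sketch in units of current only; the function names are hypothetical, and the coarse inputs are assumed to be valid thermometer codes.

```python
COARSE_UNIT = 16             # each thermometer-coded coarse bit adds 16 units
FINE_WEIGHTS = (8, 4, 2, 1)  # binary-weighted fine bits, MSB first


def dac_output_units(coarse_bits, fine_bits):
    """Sum the unit currents selected by 3 coarse and 4 fine input bits."""
    if len(coarse_bits) != 3 or len(fine_bits) != 4:
        raise ValueError("expected 3 coarse bits and 4 fine bits")
    coarse = COARSE_UNIT * sum(coarse_bits)
    fine = sum(w * b for w, b in zip(FINE_WEIGHTS, fine_bits))
    return coarse + fine


def all_levels():
    """Enumerate every level reachable with valid thermometer codes."""
    thermometer = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
    levels = set()
    for coarse in thermometer:
        for code in range(16):
            fine = [(code >> s) & 1 for s in (3, 2, 1, 0)]
            levels.add(dac_output_units(coarse, fine))
    return sorted(levels)


if __name__ == "__main__":
    levels = all_levels()
    print(len(levels), levels[0], levels[-1])
```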

4.5.1 Coding Scheme

The coding scheme used in the LUT attempts to approximate the sine wave as
accurately as possible, while reducing the complexity of the conversion logic. Using
MATLAB, various coding approaches were tested, and an approximation of the SFDR
was determined. In these coding schemes, the accumulator output is truncated to six bits
(S7, S6, S5, S4, S3, S2). Quarter-wave symmetry is used to drive the fine DAC and half-wave symmetry is used to drive the coarse DAC. Exploiting the symmetry inherent in the
sine wave reduces the amount of logic necessary for phase conversion. The four LSBs
input to the phase converter (S5 , S4 , S3 , S2 ) are XORed with S6 to achieve half-wave
symmetry for both portions of the DAC. After the conversion logic, the four bits that
drive the fine DAC are XORed with the accumulator MSB (S7 ) to achieve quarter-wave
symmetry. The logic that drives the coarse DAC is configured as a thermometer coder,
and it represents coarse (or large) steps in the output voltage. The coarse voltage steps
are symmetric about the zero-crossing of the sine wave, hence the half-wave symmetry.
While the DAC is capable of 64 discrete outputs, the phase converter only drives 32 distinct
outputs. This is necessary because the DAC steps are all equal size, but the sine step
size varies with the sine phase.

Three different coding schemes are investigated. In the first scheme, an ideal
DAC that allows for fractions of a unit current is used. This is unrealistic, since it would
require a much more complex DAC with more input bits. However, it does provide an
upper bound for the maximum SFDR of the DDS, taking the accumulator truncation into
account. In the second scheme, the outputs of the DAC are rounded to whole units of
current, so that the output levels possible with the DAC that has 3 coarse bits and 4 fine
bits are represented. This coding scheme would be used if a ROM were available. With
a combinational logic LUT, the logic necessary for implementation is complex. In the
third scheme, the LUT from the second scheme is modified to simplify the complicated
logic. This reduces propagation delays that could reduce the maximum speed, and it
decreases power consumption.

4.5.1.1 Coding with Ideal DAC

The first coding scheme results in the SFDR versus FCW plot shown in Figure 4.20. This scheme has a worst case SFDR of 27.57 dBc at an FCW of 122. Over
the whole range of FCWs, the average SFDR is 36.06 dBc. Every fourth FCW, the
output SFDR reaches a local maximum, because there is effectively no truncation at these
FCWs. This eliminates truncation spurs every fourth FCW. This coding scheme uses an
ideal, but unrealistic DAC with a very fine output resolution that is implemented only in
MATLAB. The ideal DAC has outputs that have fractions of the unit current weighting.
It is useful for providing an estimate of the bounds of the SFDR when using an 8-bit
accumulator truncated to 6 bits.
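
The observation that every fourth FCW is free of truncation can be verified with a short sketch. It assumes the accumulator starts from zero, and the function name is hypothetical. An FCW is counted as truncation-free when the bits dropped by the truncation are zero on every clock cycle of a full accumulator period.

```python
def truncation_free_fcws(n_bits=8, w_bits=6):
    """FCWs whose truncated LSBs are always zero, so no phase error accumulates."""
    dropped = n_bits - w_bits
    mask = (1 << dropped) - 1
    modulus = 1 << n_bits
    result = []
    for fcw in range(1, modulus // 2 + 1):
        acc, clean = 0, True            # accumulator assumed to start at zero
        for _ in range(modulus):
            if acc & mask:              # non-zero truncated bits mean phase error
                clean = False
                break
            acc = (acc + fcw) % modulus
        if clean:
            result.append(fcw)
    return result


if __name__ == "__main__":
    print(truncation_free_fcws())
```

For an 8-bit accumulator truncated to 6 bits, the result is exactly the multiples of four.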

4.5.1.2 Coding with Realistic DAC

A more realistic coding scheme uses DAC output levels that are rounded off to
whole values. This scheme provides a simulation of the DAC that is actually implemented in circuitry. This coding scheme would be used if a ROM were available. Since
there is no ROM in this design, combinational logic is used. The combinational logic
needed to implement this phase converter is shown in Table 4.2. The SFDR versus FCW
plot for this scheme is shown in Figure 4.21. This has a worst case SFDR of 27.57 dBc
at an FCW of 122. Over the whole range of FCWs, the average SFDR is 35.69 dBc. As
in the first scheme, there are local SFDR maxima every fourth FCW due to the absence
of truncation spurs. Since the DAC outputs are rounded off from the ideal case, there is
some reduction in the average SFDR. However, the worst case SFDR has the same magnitude and occurs at the same FCW as in the first coding scheme. Although the DAC in
this scheme is realistic, the logic needed to implement the LUT shown in Table 4.2 is
six levels deep and it requires 38 logic gates.

[Figure 4.20 plot: SFDR (dBc) versus Frequency Control Word (FCW).]
Figure 4.20: MATLAB simulation of SFDR versus FCW for DDSV2 with an ideal DAC
and ideal coding scheme.

Inverted Inputs
    A = S5 ⊕ S6
    B = S4 ⊕ S6
    C = S3 ⊕ S6
    D = S2 ⊕ S6
Coarse DAC
    DAC6 = S7
    DAC5 = (A + B · C) · S7 + S7
    DAC4 = (A + B · C) · S7
Binary Fine DAC
    DAC3 = ((A · C + A · D) + (B · C + (C · C) · D)) ⊕ S7
    DAC2 = (((A ⊕ B) ⊕ (C ⊕ D)) + (A · B + B · D)) ⊕ S7
    DAC1 = (((A ⊕ B) · C + (C ⊕ D) · B) + (A · B) · (C ⊕ D)) ⊕ S7
    DAC0 = (((A ⊕ B) · C + (A ⊕ B) · D) + (A · C) · (B · D)) ⊕ S7

Table 4.2: Logic necessary to implement the DDSV2 phase converter with a realistic
DAC.

[Figure 4.21 plot: SFDR (dBc) versus Frequency Control Word (FCW).]
Figure 4.21: MATLAB simulation of SFDR versus FCW for DDSV2 with coding
scheme with the DAC outputs rounded to discrete output levels.

4.5.1.3 Simplified Coding with Realistic DAC

If some simplifications are made to the LUT from the second scheme, the complexity and power consumption can be reduced. Using the coding scheme in Table 4.3,
the complexity is reduced by one level of depth to five levels of depth, which reduces
the LUT propagation delay by one gate delay. Power consumption is also reduced by
eliminating 10 logic gates, leaving a total of 28 logic gates in the LUT. Assuming that
all logic gates in the LUT use the same amount of power, which is a reasonable assumption for the logic gates in this design, this represents a 26.3% reduction in power by
simplifying the coding scheme. The SFDR versus FCW plot for this scheme is shown
in Figure 4.22. This has a worst case SFDR of 27.57 dBc at an FCW of 122. Over the
whole range of FCWs, the average SFDR is 33.33 dBc. Compared to the other schemes,
the SFDR is reduced at many of the FCWs, leading to the reduction in average SFDR
over the FCW range. While this scheme has the same worst case SFDR magnitude at
the same FCW, some of the FCWs that resulted in local maxima in the previous schemes
become local minima in this scheme. In these cases, the simplification added spurs that
are not present in the other schemes. This is particularly noticeable in the FCWs that do
not have truncation spurs in the previous schemes. The difference is particularly large at
FCWs 16, 80, and 96, where the SFDR is 31.97 dBc, 29.53 dBc, and 27.77 dBc respectively. Although there is some loss in performance compared to the second scheme, the
savings in complexity and power make the tradeoff worthwhile.

4.5.2 Simulation Results

DDSV2 is constructed by combining the 8-bit accumulator from Section 3.3.2,
the phase converter described in Table 4.3, and the DAC from Figure 4.19. To illustrate
the maximum clock frequency of the DDS, a simulation output of the DDS with an
FCW of 1 and a 13 GHz clock frequency is shown in Figure 4.23. This simulation shows
the difference of the differential outputs, so it has a 0 V DC component. At 14 GHz, the
DDS has timing glitches and does not operate properly.

Inverted Inputs
    A = S5 ⊕ S6
    B = S4 ⊕ S6
    C = S3 ⊕ S6
    D = S2 ⊕ S6
Coarse DAC
    DAC6 = S7
    DAC5 = (A + B · C) · S7 + S7
    DAC4 = (A + B · C) · S7
Binary Fine DAC
    DAC3 = ((B ⊕ C) + A) ⊕ S7
    DAC2 = (A · C + B) ⊕ S7
    DAC1 = ((A · B + C) + (A · D + B · D)) ⊕ S7
    DAC0 = ((A · B + D) + (A · C + A · B)) ⊕ S7

Table 4.3: Simplified logic used in the DDSV2 phase converter to drive the DAC.

[Figure 4.22 plot: SFDR (dBc) versus Frequency Control Word (FCW).]
Figure 4.22: MATLAB simulation of SFDR versus FCW for DDSV2 with simplified
coding scheme.

[Figure 4.23 plot: output voltage (V) versus time (ns).]
Figure 4.23: Simulated time-domain output of DDSV2 with a 13 GHz clock and
FCW=1. The plotted output is the difference of the differential signals.

The DDS is also simulated over all FCWs to compare the MATLAB simulations
of the SFDR to the simulated results. These simulations included extracted parasitic
capacitors for the full DDS chip. All 128 FCWs were simulated, taking approximately
64 hours of computer simulation time. The SFDR versus FCW simulation output is
shown in Figure 4.24. Compared to the MATLAB simulation of this coding scheme
in Figure 4.22, the Cadence Spectre simulation provides a reasonable match. In this
simulation, the worst case SFDR is 26.25 dBc, located at an FCW of 96. While this
FCW is not the minimum in the MATLAB simulation, it is the second-worst FCW.
The average SFDR for the simulation of DDSV2 with extracted parasitic capacitors is
33.38 dBc, while it is 33.33 dBc in the MATLAB simulation of the coding scheme.

[Figure 4.24 plot: SFDR (dBc) versus FCW.]
Figure 4.24: Simulation of the SFDR versus FCW for the DDSV2 design with a 13 GHz clock.
The simulations to determine SFDR are done with extracted parasitic capacitors included.

4.5.3 Measurement Results

The DDSV2 test chip was fabricated as part of the TC9 fabrication run, and it
is shown in Figure 4.25. The chip is 2700 µm by 1450 µm and contains 1646 transistors. The circuit was tested on-wafer using a test setup similar to the one used for
DDSV1, shown in Figure 4.13. Accordingly, time-domain outputs were captured with
a sampling oscilloscope and frequency-domain outputs were captured with a spectrum
analyzer. The DDS operated up to a maximum clock frequency of 13 GHz, matching
the simulation results. The DDS was able to synthesize outputs up to 6.5 GHz in steps
of 50.78125 MHz.
With a 13 GHz clock rate and an FCW of 1, the DDS produces the fundamental
output frequency of 50.78125 MHz, as shown in the time domain output in Figure 4.26.
At this output frequency, the SFDR was measured to be 34 dBc, as shown in the spectrum analyzer output in Figure 4.27. At the maximum FCW of 128 and a 13 GHz clock
frequency, the DDS output was measured to be 6.5 GHz, as shown in Figure 4.28. At this
output frequency, the SFDR is measured to be 50 dBc. The spectrum analyzer output
for the 6.5 GHz output is shown in Figure 4.29.

Figure 4.25: Microphotograph of the DDSV2 chip. The chip is 2700 µm by 1450 µm
and contains 1646 transistors.

Using automated test software, the SFDR of the DDS was measured over the
whole range of FCWs with a 13 GHz clock input. The sweep of SFDR vs. FCW is
shown in Figure 4.30. The measured SFDR is better than 30 dBc over most of the FCWs.
The worst case SFDR was 26.67 dBc at an output frequency of 6.3984375 GHz, which is
an FCW of 126. Over the whole range of FCWs, the average SFDR is 33.08 dBc.
Compared to the simulation results, the measured results are a reasonable match.
The average SFDR figures are within 0.30 dBc, and the worst case SFDR magnitude
is 0.90 dBc lower compared to the MATLAB simulation of the coding scheme, and
0.42 dBc higher compared to the simulation with extracted parasitic capacitors included.
While the FCWs with the worst case SFDR are not the same, there is correlation in the
shape of the SFDR versus FCW curves, so that many of the local minima and maxima of

132

150

DDS Output (mV)

100
50
0
−50
−100
−150
0

5

10

15
Time (ns)

20

25

30

Figure 4.26: Measured time-domain output of DDSV2 with a 13 GHz clock and
FCW=1. The output frequency is 50.78125 MHz.

Figure 4.27: Measured frequency-domain output of DDSV2 with a 13 GHz clock and
FCW=1. The SFDR is 34 dBc. (Axes: magnitude in dBm versus frequency in GHz; annotated levels: −10.33 dBm and −43.33 dBm.)

Figure 4.28: Measured time-domain output of DDSV2 with a 13 GHz clock and
FCW=128. The output frequency is 6.5 GHz. (Axes: DDS output in mV versus time in ns.)

Figure 4.29: Measured frequency-domain output of DDSV2 with a 13 GHz clock and
FCW=128. The SFDR is 50 dBc. (Axes: magnitude in dBm versus frequency in GHz; annotated levels: −14.83 dBm and −64.83 dBm.)

Figure 4.30: Measured SFDR versus FCW for DDSV2 with a 13 GHz clock frequency.
(Axes: SFDR in dBc versus output frequency in GHz.)

SFDR are at the same FCWs. The DDSV2 design consumes 5.42 W of power. DDSV2
is reported in [62].

4.6 Summary of Direct Digital Synthesizers

Two DDS designs were presented in this chapter. The DDSV1 design described
in Section 4.4 is intended as a high-speed DDS. It uses the ACCV5 8-bit accumulator
described in Section 3.3.1, and it also uses a sine-weighted DAC approach. The sine-weighted DAC allows for reduced complexity compared to a traditional ROM lookup
table. The DDSV1 design consumes 9.45 W of power and operates up to a maximum
clock frequency of 32 GHz. Over the whole range of FCWs, it has a worst case SFDR
of 21.56 dBc.
The DDSV2 design described in Section 4.5 is intended as a low power DDS. It
uses the low power ACCV6 8-bit accumulator described in Section 3.3.2. At the time
of design, a ROM was not available, so DDSV2 instead uses a logic lookup table. Some
simplifications were made to the logic LUT to improve propagation delay and reduce
power. These modifications had a small negative impact on the output signal purity, and
it is expected that using a ROM would have improved the SFDR. The DDSV2 design
consumes 5.42 W of power and operates up to a maximum clock frequency of 13 GHz.
Over the whole range of FCWs, it has a worst case SFDR of 26.67 dBc.

Design                               Power (W)   Speed (GHz)   SFDR (dBc)   Speed/Power Ratio (GHz/W)
A. Gutierrez-Aitken et al. [8, 75]   15          9.2           30.00        0.61
K. Elliott [42]                      8           12            30.00        1.50
DDSV1 [51]                           9.45        32            21.56        3.39
DDSV2 [62]                           5.42        13            26.67        2.40

Table 4.4: Comparison of recent InP DDS designs.

A comparison of DDSV1 and DDSV2 to recently reported DDS circuits is shown
in Table 4.4. The DDS circuits from this thesis perform better than the recently reported
DDS circuits in all areas except SFDR. The speed/power ratio is used as a DDS figure of
merit by K. Elliott [42]. DDSV1 has a speed/power ratio of 3.39 GHz/W, which is 5.56 and 2.26
times better than the previously reported designs and 1.41 times better than DDSV2.
The DDSV2 speed/power ratio is 3.93 and 1.60 times better than the previously reported
results.
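
As a quick check of the speed/power figures, the ratios can be recomputed from the maximum clock frequency and power consumption reported for each design. This is an illustrative sketch only; the dictionary layout and the function name are assumptions made for the example.

```python
# (maximum clock frequency in GHz, power consumption in W) as reported
# for each InP DDS design in the comparison.
DESIGNS = {
    "A. Gutierrez-Aitken et al. [8]": (9.2, 15.0),
    "K. Elliott [42]": (12.0, 8.0),
    "DDSV1 [51]": (32.0, 9.45),
    "DDSV2 [62]": (13.0, 5.42),
}


def speed_power_ratio(speed_ghz: float, power_w: float) -> float:
    """Speed/power figure of merit in GHz/W."""
    return speed_ghz / power_w


ratios = {name: speed_power_ratio(*values) for name, values in DESIGNS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} GHz/W")
```

Rounded to two decimal places, the results are 0.61, 1.50, 3.39, and 2.40 GHz/W.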
While using the speed/power ratio as a figure of merit for DDS designs provides
some insight into the relative value of different designs, it has some major flaws. Notably, it does not factor in frequency resolution or SFDR. For example, if two similar
DDS designs are compared, but one has twice the phase resolution, it would need an
accumulator twice as large. This would drive up the power consumption, so that even
if the two designs had the same maximum clock frequency and SFDR, the design with
more frequency resolution would have a worse speed/power ratio. Similarly, if two designs had the same power consumption and the same maximum speed, they would have
identical speed/power ratios, even if they had vastly different SFDRs. If all other factors
are equal, a design with a higher SFDR should be considered “better” than a design
with a lower SFDR, but this is not reflected in the speed/power ratio. Likewise, the extra
power consumption necessary to increase the frequency resolution of a DDS should be
taken into consideration when comparing designs.
To deal with the shortcomings of the speed/power ratio figure of merit for DDS
designs, a new figure of merit is developed that “rewards” improved SFDR and does not
penalize for increased frequency resolution. A new figure of merit to consider for DDS
circuits is
$$\mathrm{FOM}_{DDS} = \frac{\text{speed} \cdot \text{SFDR} \cdot \text{bitwidth}}{\text{power}}. \tag{4.8}$$


For the high-speed DDS circuits presented in this thesis, the units of FOM_DDS are
GHz·dBc·bits/W. The FOM_DDS scales the speed/power ratio up linearly by both the
SFDR and the accumulator bit-width. Thus, if all other metrics are equal, a design with
twice the SFDR will have twice the FOM_DDS. A design with better frequency resolution will also gain a boost in FOM_DDS because the increase in power consumption is
offset by the increase in bit-width.
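
The figure of merit in Equation 4.8 can be evaluated directly from the four quantities it combines. The sketch below is illustrative rather than part of the original work: the function name is hypothetical, and the inputs are the clock frequency, worst case SFDR, accumulator bit-width, and power reported for DDSV1 and DDSV2 in this chapter.

```python
def fom_dds(speed_ghz: float, sfdr_dbc: float, bitwidth: int, power_w: float) -> float:
    """Equation 4.8: FOM_DDS = speed * SFDR * bitwidth / power (GHz*dBc*bits/W)."""
    return speed_ghz * sfdr_dbc * bitwidth / power_w


# Values reported in this chapter for the two designs.
fom_ddsv1 = fom_dds(speed_ghz=32.0, sfdr_dbc=21.56, bitwidth=8, power_w=9.45)
fom_ddsv2 = fom_dds(speed_ghz=13.0, sfdr_dbc=26.67, bitwidth=8, power_w=5.42)

print(f"DDSV1: {fom_ddsv1:.0f} GHz*dBc*bits/W")
print(f"DDSV2: {fom_ddsv2:.0f} GHz*dBc*bits/W")

# Linear scaling: doubling the SFDR alone doubles the figure of merit.
doubled = fom_dds(32.0, 2 * 21.56, 8, 9.45)
print(f"Scaling factor for doubled SFDR: {doubled / fom_ddsv1:.1f}")
```

Rounded to the nearest integer, these evaluate to 584 for DDSV1 and 512 for DDSV2.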
A calculation of the FOM_DDS for recently reported research and commercial
DDS circuits is given for comparison in Table 4.5. The only designs with a better
FOM_DDS than DDSV1 and DDSV2 are the AD9858 [41] part, which does not report a full-Nyquist SFDR, so it is not an accurate comparison of FOM_DDS, and the low
power DDS designs [46, 47]. DDSV1 and DDSV2 have a better FOM_DDS than the
state-of-the-art InP DDS designs. It should be noted that the DDSV1 design is penalized
in FOM_DDS in comparison to the speed/power ratio, since it has a lower SFDR than
the other designs. Compared to the InP designs only, the FOM_DDS for the DDSV1
design is 3.97 times that of the A. Gutierrez-Aitken et al. [8] design, 1.62 times the K.
Elliott [42] design and 1.14 times the DDSV2 design. Since the DDSV2 design has a
similar SFDR to the previously reported designs, it is not penalized as much, and its
FOM_DDS is 3.48 times the A. Gutierrez-Aitken et al. [8] design and 1.42 times the K.
Elliott [42] design.

Part Name      fclk (GHz)   FCW (a) (bits)   SFDR (dBc)   Power (W)   FOM_DDS (GHz·dBc·bits/W)   Ref.
STEL-2375B     1.0          32               50           15          107                        [43]
None (d)       9.2          8                30           15          147                        [8]
ADS-432-403    1.6          30               45 (b)       11          196                        [44]
ADS-431-403    1.6          30               40 (b)       6           320                        [45]
None (d)       12           10               30           8           450                        [42]
DDSV2 (d)      13           8                26.67        5.42        512                        [62]
DDSV1 (d)      32           8                21.56        9.45        584                        [51]
None           2.0          8                35           0.82        683                        [47]
AD9858         1.0          32               50 (c)       2           800                        [41]
None           0.8          32               25           0.174       3678                       [46]

(a) Frequency Control Word (FCW) or accumulator resolution.
(b) Claims 20 dBc spectral purity for harmonics.
(c) For 360 MHz output.
(d) Uses InP DHBT Technology.

Table 4.5: Recent commercial and reported direct digital synthesizers compared
using FOM_DDS.

The new FOM_DDS figure of merit from Equation 4.8 should prove useful for
comparing DDS designs. It includes more information on the metrics that are important
for DDS performance, namely SFDR, frequency resolution, clock frequency, and power
consumption, and improves upon previous DDS figures of merit.


CHAPTER 5
Conclusion
In this thesis, high-speed digital and mixed-signal circuits, with clock frequencies in the range of 12 GHz to 41 GHz, have been implemented in the Vitesse VIP-2
InP DHBT technology [1]. The main focus of the work was on maximizing the performance of accumulator and DDS circuits that are suitable for radars and communications
systems in the X-band and Ku-band range of 8 GHz to 16 GHz, while minimizing the
power consumption. The conflicting goals of high speed and low power led to
several accumulator and DDS designs over a range of speed/power design points.
This thesis also addressed the simulation of transmission lines for
on-chip clock signals, which becomes a concern for clock frequencies above 30 GHz routed
on interconnects with lengths above 300 µm. At such high frequencies and long interconnect lengths, the clock distribution lines can no longer be modelled as lumped
element lines, and instead must be dealt with as transmission lines. These transmission
line models are not a direct part of the design kit, and must be handled in a special way
in order to properly capture their behavior.
Finally, a new figure of merit for DDS designs was developed and used to compare recently reported designs. This figure of merit incorporates more of the
important metrics of DDS performance, such as SFDR and frequency resolution, than
previous figures of merit, so it provides a more complete method of design comparison.
The major accomplishments of this thesis and recommendations for future work
are summarized below.

5.1 Summary of Accomplishments

The first major accomplishment of this thesis was the development of the ACCV2
accumulator described in Section 3.2.4. This accumulator was fabricated as part of the
TC6 fabrication run. Results of its performance were published in 2005 [50]. At the
time of its publication, it was the fastest reported 4-bit accumulator, with a maximum
clock frequency of 41 GHz and a power consumption of 3.04 W. The previous state of
the art accumulator circuit reported by T. Mathew et al. [27, 59] in 2001 only consisted
of accumulator components, and only operated up to a maximum clock frequency of
19 GHz. Thus, not only was ACCV2 more than twice as fast as the previous state of
the art, but it was also integrated as a complete accumulator, instead of just accumulator components. The ACCV2 accumulator included a novel single-level parallel-gated
carry circuit design. This was the first time that this carry circuit had been demonstrated.
This carry circuit enabled reduced power consumption in later accumulator versions. In
the next generation fabrication run TC7, ACCV3, which is described in Section 3.2.5,
used the single-level parallel-gated carry circuit along with a three-level series-gated
sum circuit to improve the speed/power ratio compared to ACCV2 from 13.49 GHz/W
to 17.26 GHz/W. Table 5.1 summarizes the fabrication runs, dates, and the designs in
each fabrication run.

Fabrication Run   Date            Designs
TC6               January 2004    ACCV1, ACCV2
TC7               June 2004       ACCV1, ACCV3, ACCV4
TC8               November 2004   ACCV5, DDSV1
TC9               March 2005      ACCV6, DDSV2

Table 5.1: Summary of fabrication runs, dates, and designs in each fabrication run.

Further iterations of accumulator designs extended the bit-width to 8 bits for use
in DDS circuits. Two 8-bit accumulator circuits described in Section 3.3 show further
improvement in the area of speed/power ratio. For comparison, if the ACCV3 4-bit
accumulator were extended to 8 bits, it would have a speed/power ratio of 5.84 GHz/W. The
ACCV5 that was part of fabrication run TC8 operated up to a 32 GHz clock frequency
and had a speed/power ratio of 6.54 GHz/W. In fabrication run TC9, the low power
ACCV6 operated up to 13 GHz and had a speed/power ratio of 6.10 GHz/W. Both
designs showed improvement over the ACCV3 design. It should also be noted that
although the ACCV6 design had a lower speed/power ratio than ACCV5, it would have
a better speed/power ratio if the designs were extended to larger bit-widths, because
it uses fewer registers, and the registers will become the dominant portion of power
consumption when the bit-width is extended.
As part of the implementation of high-speed digital and mixed-signal circuits, it
became necessary to develop a method for dealing with transmission line effects for interconnects. This was of particular importance for the clock signal interconnects, since
these interconnects operate at the highest frequency on the chip and tend to have the
longest lengths, since they must route clock signals to many locations on the chip. Since
the design kit did not automatically include the transmission line models, it was necessary to create separate models of the clock tree with the line parameters and loads used
in layout, as described in Section 1.1.2. These models were then simulated with AC
simulation in the Cadence Spectre circuit simulator to determine the frequencies over
the area of interest that had excessive gain or attenuation. The gain or attenuation could
then be pinpointed and dealt with by adding series resistance or clock buffers. Once
the AC simulations showed reasonable results, there was sufficient confidence in the
clock tree for use in transient simulations, where the transmission line effects are not
modelled. Since the clock interconnects begin to behave as transmission lines at these clock rates and lengths, and the simulation environment does not directly handle these
effects, this method of handling the transmission lines was necessary. This approach
had not been previously discussed in literature. In some design kits for other processes,
microwave models are included for the transmission lines; however, that is not the case
in the Vitesse VIP-2 process. It is expected that the transmission line effects will only
become more of a problem and begin to affect signal interconnects as well as clock interconnects as clock frequencies increase in the near future and the interconnects become
electrically long as defined by Equation 1.1. To deal with this problem, either design
kits will need to address the issue, or the method outlined above can be utilized.
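
To give a feel for when an interconnect starts to look electrically long, its physical length can be expressed as a fraction of the guided wavelength. The sketch below is not Equation 1.1, which is not reproduced in this chapter; it uses the generic relation λ = c / (f · sqrt(ε_eff)), and the effective permittivity of 4.0 is a placeholder assumption rather than a Vitesse VIP-2 process value. The function name is likewise hypothetical.

```python
import math

C0_M_PER_S = 299_792_458.0  # speed of light in vacuum


def electrical_length_fraction(length_m: float, freq_hz: float, eps_eff: float) -> float:
    """Physical line length as a fraction of the guided wavelength."""
    wavelength_m = C0_M_PER_S / (freq_hz * math.sqrt(eps_eff))
    return length_m / wavelength_m


EPS_EFF = 4.0          # ASSUMED effective permittivity, for illustration only
LINE_LENGTH_M = 300e-6  # the 300 um interconnect length mentioned in the text

for f_ghz in (13, 30, 41):
    fraction = electrical_length_fraction(LINE_LENGTH_M, f_ghz * 1e9, EPS_EFF)
    print(f"{f_ghz} GHz: line is {fraction:.3f} of a wavelength")
```

The fraction grows linearly with both frequency and length, and the harmonics of a square clock see proportionally larger fractions than the fundamental, which is one reason long clock routes are the first to need transmission line treatment.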
The second major accomplishment of this thesis was the design of high-speed
DDS circuits that resulted in the fastest reported DDS circuits. Previously, the fastest
DDS circuits were reported by A. Gutierrez-Aitken et al. [8] in 2001, operating up to
a 9.2 GHz clock frequency, and by K. Elliott [42] in 2005, operating up to a 12 GHz
clock frequency. DDSV1, the first generation DDS, which was described in Section 4.4,
operated up to a maximum clock frequency of 32 GHz and consumed 9.45 W of power.
This greatly exceeded both of the previous designs. The second generation DDS named
DDSV2 was described in Section 4.5. Even though DDSV2 was designed as a low
power DDS, it was both faster and lower in power than the A. Gutierrez-Aitken et al. [8] and K. Elliott [42] designs. It operated up to a maximum clock frequency of 13 GHz with a 5.42 W power consumption.
There was also improvement over the previous state of the art in terms of the
speed/power figure of merit presented by K. Elliott [42]. The A. Gutierrez-Aitken et
al. [8] DDS had a speed/power ratio of 0.61 GHz/W and the K. Elliott [42] DDS had a
speed/power ratio of 1.50 GHz/W. In comparison, DDSV1 has a speed/power ratio of
3.39 GHz/W and DDSV2 has a speed/power ratio of 2.40 GHz/W.
While the speed/power figure of merit is often used and provides some idea of
relative performance, it only factors in the speed and power of a DDS.
Thus, it omits important DDS metrics such as SFDR and frequency resolution. To provide a better means of comparing DDS circuits, a new figure of merit for DDS circuits
(FOM_DDS) was devised in this thesis and defined in Equation 4.8. This metric modifies
the speed/power figure of merit used by K. Elliott [42] to include SFDR and frequency
resolution (through the accumulator bit-width). This new figure of merit more completely captures the important metrics of a DDS, so it is better suited for comparing such
circuits, and a comparison to recently reported DDS circuits, in InP and other technologies, was shown in Table 4.5. The only designs with a better FOM_DDS than DDSV1 and
DDSV2 are the AD9858 [41] by Analog Devices, which does not report a full-Nyquist
SFDR, so it is not an accurate comparison of FOM_DDS, and the low power DDS designs by B.-D. Yang et al. and X. Yu et al. [46, 47]. Compared to the previous state of
the art InP designs by A. Gutierrez-Aitken et al. [8] and K. Elliott [42], DDSV1 [51]
and DDSV2 [62] have an improved FOM_DDS.

5.2 Recommendations for Future Work

Based on the work in this thesis, three recommendations for future work are
identified: DDS circuit improvements for higher SFDR, implementation of similar circuits in SiGe technology, and the improvement of the interconnect technology along
with the development of microwave models for interconnects.
The first recommendation is to push for DDS circuits with improved SFDR. This
would be accomplished by improving all of the blocks of the circuit. First, the bit-width
of the accumulator could be extended to increase the frequency resolution of the DDS.
This would also provide a larger phase bit-width for the phase conversion logic. The
accumulators in this thesis have an architecture that facilitates extensions in bit-width,
but there would likely be some challenges from the increased clock distribution and
power consumption.
The next area of the DDS to improve would be the phase conversion circuitry.
In this thesis, DDSV1, which is described in Section 4.4, used a phase conversion that
was integrated with the DAC, and DDSV2, which is described in Section 4.5, used a
logic LUT. These phase conversion schemes were fairly simple, but they were not very
flexible. They also only allowed for a worst case SFDR that was below 30 dBc. Using
a ROM would allow for more flexibility, in that the ROM coding can be changed much
more easily than a logic coding. A 16 by 6 bit ROM with a 36 GHz clock frequency has
been reported by S. Manandhar et al. [78] using this InP technology. The 36 GHz clock
frequency leaves margin for expanding the size of the ROM and maintaining high-speed
performance so that it could be integrated into either of the DDS designs. Using the
ACCV5 circuit with a ROM as a phase converter and a different DAC circuit would
be likely to lead to a DDS with an improved SFDR, with a worst case SFDR closer
to 30 dBc, while maintaining the 32 GHz clock frequency. This would also lead to an
improvement in the FOM_DDS. The ROM would be particularly advantageous in the
DDSV2 design. As noted in Section 4.5.1, the logic LUT coding scheme was
simplified from what could have been used had a ROM been available. Although the
simulation results did not show a difference in the worst case SFDR, the average SFDR
was simulated to be 33.33 dBc with the simplified logic LUT and 35.69 dBc with a
ROM. A ROM would not have to be used alone, either. A combination of the ROM with
some logic could allow for a coding scheme that would improve the SFDR
beyond 40 dBc.
Finally, more work could be done on DAC design. In order to improve the
SFDR, not only would the phase converter need to be improved, but the DAC would
need an extended bit-width. There is much work to be done in the area of high-speed
DAC design, but it is essential for new high-speed DDS circuits with worst case SFDR
above 40 dBc. By further investigating all of the components of the DDS, the DDS
operation could be improved and allow for more applications.
The second recommendation for future work is to duplicate and extend this work
in SiGe. InP was used in this thesis, because one of the aims of the TFAST project was
to push the development of InP technologies. There is interest in developing InP technologies because their superior BV_CEO compared to SiGe is expected to result in high
performance mixed-signal circuits. Despite the lower BV_CEO, SiGe technology has
some advantages in other areas that could be exploited. First of all, since the SiGe process is built on top of a CMOS process, PMOS and NMOS transistors are available.
An NMOS transistor could be used as a high impedance current source with a lower
voltage drop than the current sources in this thesis. The high impedance of the NMOS
source would be an improvement over the low impedance resistor-only current sources
that had a low voltage drop. This would allow for a reduction in power consumption.
Secondly, the SiGe process has copper interconnects. The InP process is currently limited to aluminum interconnects with wide pitches compared to the copper interconnects
in SiGe. Smaller interconnects would allow for reduced parasitic capacitances and improve the relative performance. Finally, the SiGe process is more manufacturable than
InP, so yields and consistency between fabrication runs would be improved. Since SiGe
is more manufacturable than InP, it also costs less. SiGe is rapidly developing due to
interest in RF technologies, so at some point in the future, the performance of SiGe
devices may exceed the performance of InP devices. Some of the designs in this thesis could be ported to a SiGe technology, to compare the mixed-signal performance of
SiGe vs. InP. The designs could be modified for reduced power by using NMOS current
sources, and the differences in layout parasitic capacitance could be explored.
The final recommendation for future work is an improvement of the interconnect
technology and the development of microwave models for interconnects. As described
in Section 2.1, the Vitesse VIP-2 technology [1] has four levels of aluminum interconnects. The minimum size of the aluminum interconnect is 1 µm, so parasitic capacitances quickly add up and hamper high-speed performance. The interconnect is also not
planarized, so the extraction of parasitic capacitances is prone to errors. Moving to a
planarized copper interconnect technology with finer feature sizes would allow for a reduction in parasitic capacitances that would lead to improved performance. This would
also neutralize one of the advantages that SiGe technologies currently have over this
InP technology.
With planarized interconnects, microwave models for the interconnects could
also be developed. These would be useful for the clock tree or any other signal lines that
are electrically long by Equation 1.1. The microwave models would greatly simplify
simulation by eliminating the extra steps that were developed in this thesis and described
in Section 2.3. This too would eliminate one of the advantages that some other toolkits
have over this InP technology.


REFERENCES
[1] G. He, J. Howard, M. Le, P. Partyka, B. Li, G. Kim, R. Hess, R. Bryie, R. Lee,
S. Rustomji, J. Pepper, M. Kail, M. Helix, R. B. Elder, D. S. Jansen, N. E. Harff,
J. F. Prairie, E. S. Daniel, and B. K. Gilbert, “Self-aligned InP DHBT with ft and
fmax over 300 GHz in a new manufacturable technology,” IEEE Electron Device
Lett., vol. 25, no. 8, pp. 520–522, Aug. 2004.
[2] R. R. Spencer and M. S. Ghausi, Introduction to Electronic Circuit Design, 1st ed.
Upper Saddle River, NJ: Prentice Hall, 2003.
[3] A. S. Sedra and K. C. Smith, Microelectronic Circuits, 4th ed. New York: Oxford
University Press, 1998.
[4] W. K. Owens, P. Chu, D. Wong, and C. Chu, “8-bit ECL microprocessor slices with
subnanosecond performance,” in Int. Solid State Circuits Conf. Dig. Tech. Papers,
Feb. 1979, pp. 42–43.
[5] R. Rathbone, H. Ernst, H. Glock, and U. Schwabe, “A 1024-bit ECL RAM with
15-ns access time,” in Int. Solid State Circuits Conf. Dig. Tech. Papers, Feb. 1976,
pp. 188–189.
[6] N. M. Desai, R. Agrawal, J. G. Vachhani, V. R. Gujraty, and S. S. Rana, “High
speed data acquisition systems for ISRO’s airborne and spaceborne radars,” in Int.
Conf. Electromagnetic Interference and Compatibility, Dec. 2003, pp. 29–36.
[7] B. S. Goda, J. F. McDonald, S. R. Carlough, J. T. W. Krawcyzk, and R. P. Kraft,
“SiGe HBT BiCMOS FPGAs for fast reconfigurable computing,” IEE Proc. Comput. Digit. Tech, vol. 147, no. 3, pp. 189–194, May 2000.
[8] A. Gutierrez-Aitken, J. Matsui, E. Kaneshiro, B. Oyama, D. Sawdai, A. Oki, and
D. Streit, “Ultra high speed direct digital synthesizer using InP DHBT technology,”
in GaAs IC Symp. Tech. Dig., Oct. 2001, pp. 265–268.
[9] J. C. Zolper. (2004, Dec.) Challenges and opportunities for InP HBT
mixed signal circuit technology. DARPA website. [Online]. Available: http:
//www.darpa.mil/mto/tfast/presentations/challanges oppurtunities.pdf
[10] H. W. Johnson and M. Graham, High-Speed Digital Design. Englewood Cliffs,
NJ: PTR Prentice Hall, 1993.

[11] D. A. Hodges, H. G. Jackson, and R. Saleh, Analysis and Design of Digital Integrated Circuits, 3rd ed. New York: McGraw-Hill, 2004.
[12] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon,
“High-performance microprocessor design,” IEEE J. Solid-State Circuits, vol. 33,
no. 5, pp. 676–686, May 1998.


[13] (2004, Dec.) TFAST overview. DARPA website. [Online]. Available: http:
//www.darpa.mil/mto/tfast/
[14] J. C. Zolper, “Challenges and opportunities for InP HBT mixed signal circuit technology,” in Int. Conf. on Indium Phosphide and Related Materials, May 2003, pp.
8–11.
[15] J. C. Zolper, “Status, challenges, and future opportunities for compound semiconductor electronics,” in GaAs IC Symp. Tech. Dig., Nov. 2003, pp. 3–6.
[16] Z. Griffith, M. Dahlstrom, M. J. W. Rodwell, M. Urtega, R. Pierson, P. Rowell,
B. Brar, S. Lee, N. Nguyen, and C. Nguyen, “Ultra high frequency static dividers > 150 GHz in a narrow mesa InGaAs/InP DHBT technology,” in Proc.
Bipolar/BiCMOS Circuits and Tech. Meeting, Sept. 2004, pp. 176–179.
[17] Q. Lee, D. Mensa, J. Guthrie, S. Jaganathan, T. Mathew, Y. Betser, S. Krishnan,
S. Ceran, and M. J. W. Rodwell, “66 GHz static frequency divider in transferred-substrate HBT technology,” in IEEE RFIC Symp., June 1999, pp. 87–90.
[18] A. Gutierrez-Aitken, E. Kaneshiro, B. Tang, J. Notthoff, P. Chin, D. Streit, and
A. Oki, “69 GHz frequency divider with a cantilevered base InP DHBT,” in Int.
Electron Devices Meeting Tech. Dig., Dec. 1999, pp. 779–782.
[19] E. Sovero and B. Li, “Monolithic InP HBT w-band VCO-static divider,” in IEEE
MTT-S Int. Microwave Symposium Dig., June 2004, pp. 1325–1328.
[20] M. Sokolich, C. Fields, G. Raghavan, D. A. Hitko, M. Lui, D. P. Docter, Y. K.
Brown, M. G. Case, A. R. Kramer, J. A. Henige, and J. F. Jensen, “Optimizing
InP HBT technology for 50 GHz clock-rate MSI circuits,” in Int. Conf. on Indium
Phosphide and Related Materials, May 1999, pp. 195–198.
[21] K. Washio, E. Ohue, K. Oda, R. Hayami, M. Tanabe, and H. Shimamoto, “Optimization of characteristics related to the emitter-base junction in self-aligned SEG
SiGe HBTs and their application in 72-GHz-static/92-GHz-dynamic frequency dividers,” IEEE Trans. on Electron Devices, vol. 49, no. 10, pp. 1755–1760, Oct.
2002.
[22] M. Sokolich, C. Fields, B. Shi, Y. K. Brown, M. Montes, R. Martinez, A. R.
Kramer, S. T. III, and M. Madhav, “A low power 72.8 GHz static frequency divider implemented in AlInAs/InGaAs HBT IC technology,” in GaAs IC Symp.
Tech. Dig., Nov. 2000, pp. 81–84.
[23] Y. Yamauchi, O. Nakajima, K. Nagata, H. Ito, and T. Ishibashi, “A 34.8 GHz 1/4
static frequency divider using AlGaAs/GaAs HBTs,” in GaAs IC Symp. Tech. Dig.,
Oct. 1989, pp. 121–124.
[24] J. F. Jensen, W. E. Stanchina, R. A. Metzger, T. Liu, and T. V. Kargodorian, “36
GHz static digital frequency dividers in AlInAs-GaInAs HBT technology,” in Dig.
Research Conf., June 1991, pp. VIA 5–0 78.

[25] J. F. Jensen, M. Hahizi, W. E. Stanchina, R. A. Metzger, and D. B. Rensch, “39.5GHz static frequency divider implemented in AlInAs/GaInAs HBT technology,”
in GaAs IC Symp. Tech. Dig., Oct. 1992, pp. 101–104.
[26] M. Sokolich, D. P. Docter, Y. K. Brown, A. R. Kramer, J. F. Jensen, W. E.
Stanchina, S. T. III, C. H. Fields, D. A. Ahmari, M. Liu, and J. Duvall, “A low
power 52.9 GHz static divider implemented in a manufacturable 180 GHz AlInAs/InGaAs HBT IC technology,” in GaAs IC Symp. Tech. Dig., Nov. 1998, pp.
117–120.
[27] T. Mathew, H.-J. Kim, D. Scott, S. Jaganathan, S. Krishnan, Y. Wei, M. Urtega,
S. Long, and M. J. W. Rodwell, “75 GHz ECL static frequency divider using InAlAs/InGaAs HBTs,” Electronics Lett., vol. 37, no. 11, pp. 667–668, May 2001.
[28] S. Krishnan, Z. Griffith, M. Urtega, Y. Wei, D. Scott, M. Dahlstrom,
M. Parthasarathy, and M. Rodwell, “87 GHz static frequency divider in an InP-based mesa DHBT technology,” in GaAs IC Symp. Tech. Dig., Oct. 2002, pp.
294–296.
[29] M. Mokhtari, C. Fields, and R. D. Rajavel, “100+ GHz static divide-by-2 circuit in
InP-DHBT technology,” in GaAs IC Symp. Tech. Dig., Oct. 2002, pp. 291–293.
[30] D. A. Hitko, T. Hussain, J. F. Jensen, Y. Royter, S. L. Morton, D. S. Matthews,
R. D. Rajavel, I. Milosavljevic, C. H. Fields, S. Thomas, A. Kurdoghlian, Z. Lao,
K. Elliott, and M. Sokolich, “A low power (45mW/latch) static 150GHz CML
divider,” in IEEE Compound Semiconductor Integrated Circuit Symp., Oct. 2004, pp.
167–170.
[31] A. Tahara, K. Hashimoto, H. Katakura, I. Amano, T. Deguchi, and S. Sudo, “Low-power high-speed ECL circuit with 0.5-µm rule and 30-GHz ft technology,” in
Proc. Bipolar/BiCMOS Circuits and Tech. Meeting, Sept. 1989, pp. 169–172.
[32] M. Kurisu, Y. Sasayama, M. Ohuchi, A. Sawairi, M. Sugiyama, H. Takemura, and
T. Tashiro, “A Si bipolar 21 GHz 320 mW static frequency divider,” in Int. Solid
State Circuits Conf. Dig. Tech. Papers, Feb. 1991, pp. 158–160.
[33] A. Felder, R. Stengl, J. Hauenschild, H.-M. Rein, and T. F. Meister, “Static frequency dividers for high operating speed (25 GHz, 170 mW) and low power (16
GHz, 8 mW) in selective epitaxial Si bipolar technology,” Electronics Lett., vol. 29,
no. 12, pp. 1072–1074, June 1993.
[34] A. Felder, M. Möller, J. Popp, J. Böck, M. Rest, H.-M. Rein, and L. Treitinger, “30
GHz static 2:1 frequency divider and 46 Gb/s multiplexer/demultiplexer ICs in a
0.6µm Si bipolar technology,” in Symp. VLSI Tech. Dig., June 1995, pp. 117–178.
[35] J. Böck, A. Felder, T. F. Meister, M. Franosch, K. Aufinger, M. Wurzer, R. Schreiter, S. Boguth, and L. Treitinger, “A 50 GHz implanted base silicon bipolar technology with 35 GHz static frequency divider,” in Symp. VLSI Tech. Dig., June
1996, pp. 108–109.

[36] M. Wurzer, T. F. Meister, H. Schäfer, H. Knapp, J. Böck, R. Stengl, K. Aufinger,
M. Franosch, M. Rest, M. Möller, H.-M. Rein, and A. Felder, “42 GHz static
frequency divider in a Si/SiGe bipolar technology,” in Int. Solid State Circuits Conf.
Dig. Tech. Papers, Feb. 1997, pp. 122–123.
[37] K. Washio, R. Hayami, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and
M. Kondo, “67-GHz static frequency divider using 0.2µm self-aligned SiGe
HBTs,” in IEEE RFIC Symp., June 2000, pp. 31–34.
[38] E. Ohue, R. Hayami, H. Shimamoto, and K. Washio, “5.3-ps ECL and 71-GHz
static frequency divider in self-aligned SEG SiGe HBT,” in Proc. Bipolar/BiCMOS
Circuits and Tech. Meeting, Oct. 2001, pp. 26–29.
[39] H. Knapp, M. Wurzer, T. F. Meister, K. Aufinger, J. Böck, S. Boguth, and
H. Schäfer, “86 GHz static and 110 GHz dynamic frequency dividers in SiGe bipolar technology,” in IEEE MTT-S Int. Microwave Symposium Dig., vol. 2, June 2003,
pp. 1067–1070.
[40] A. Rylyakov and T. Zwick, “96 GHz static frequency divider in SiGe bipolar technology,” in GaAs IC Symp. Tech. Dig., Nov. 2003, pp. 288–290.
[41] “AD9858 data sheet,” Analog Devices, Norwood, MA, 2003.
[42] K. R. Elliott, “Direct digital synthesis for enabling next generation systems,” in
IEEE Compound Semiconductor Integrated Circuit Symp., Nov. 2005, pp. 125–128.
[43] “STEL-2375B direct digital chirp synthesizer (DDCS) data sheet,” ITT Industries
Microwave Systems, Lowell, MA.
[44] ADS-432-X03 ADS-432-X03A Operating Instructions, Meret Optical Communications, San Diego, CA, 2000.
[45] Operating Instructions ADS-431-403 Frequency Synthesizer ADS-431-403A
Quadrature Frequency Synthesizer, Meret Optical Communications, San Diego,
CA.
[46] B.-D. Yang, J.-H. Choi, S.-H. Han, L.-S. Kim, and H.-K. Yu, “An 800 MHz low-power direct digital frequency synthesizer with an on-chip D/A converter,” IEEE
J. Solid-State Circuits, vol. 39, no. 5, pp. 761–774, May 2004.
[47] X. Yu, F. F. Dai, Y. Shi, and R. Zhu, “2 GHz 8-bit CMOS ROM-less direct digital
frequency synthesizer,” in IEEE Int. Symp. on Circuits and Systems, vol. 5, May
2005, pp. 4397–4400.
[48] M. Le, G. He, R. Hess, P. Partyka, B. Li, R. Bryie, S. Rustomji, G. Kim, R. Lee,
J. Pepper, M. Helix, R. Milano, R. Elder, D. Jansen, F. Stroili, J.-W. Lai, and
M. Feng, “Self-aligned InP DHBTs for 150GHz digital and mixed signal circuits,”
in Proc. IEEE International Conference on Indium Phosphide and Related Materials, Glasgow, Scotland, May 2005, pp. 325–330.

[49] W. Snodgrass, B.-R. Wu, W. Hafez, K.-Y. Cheng, and M. Feng, “Graded base type-II InP/GaAsSb DHBT with ft = 475 GHz,” IEEE Electron Device Lett., vol. 27,
no. 2, pp. 84–86, Feb. 2006.
[50] S. E. Turner, R. B. Elder, Jr., D. S. Jansen, and D. E. Kotecki, “4-bit adder-accumulator at 41-GHz clock frequency in InP DHBT technology,” IEEE Microwave Wireless Compon. Lett., vol. 15, no. 3, pp. 144–146, Mar. 2005.
[51] S. E. Turner and D. E. Kotecki, “Direct digital synthesizer with sine-weighted
DAC at 32 GHz clock frequency in InP DHBT technology,” IEEE J. Solid-State
Circuits, accepted for publication.
[52] C. C. McAndrew, J. A. Seitchik, D. F. Bowers, M. Dunn, M. Foisy, I. Getreu,
M. McSwain, S. Moinian, J. Parker, D. J. Roulston, M. Schröter, P. van Wijnen,
and L. F. Wagner, “VBIC95, the vertical bipolar inter-company model,” IEEE J.
Solid-State Circuits, vol. 31, no. 10, pp. 1476–1483, Oct. 1996.
[53] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, ser. Prentice Hall
Electronics and VLSI Series. Upper Saddle River, New Jersey: Prentice-Hall,
1996.
[54] R. H. Katz, Contemporary Logic Design. Benjamin/Cummings Publishing, 1994.
[55] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, Analysis and Design of
Analog Integrated Circuits, 4th ed. New York: John Wiley and Sons, 2001.
[56] B. Razavi, Y. Ota, and R. G. Swartz, “Design techniques for low-voltage high-speed digital bipolar circuits,” IEEE J. Solid-State Circuits, vol. 29, no. 3, pp.
332–339, Mar. 1994.
[57] G. Schuppener, C. Pala, and M. Mokhtari, “Investigation on low-voltage low-power silicon bipolar design topology for high-speed digital circuits,” IEEE J.
Solid-State Circuits, vol. 35, no. 7, pp. 1051–1054, July 2000.
[58] C. G. Ekroot and S. I. Long, “A GaAs 4-bit adder-accumulator circuit for direct
digital synthesis,” IEEE J. Solid-State Circuits, vol. 23, no. 2, pp. 573–580, Apr.
1988.
[59] T. Mathew, S. Jaganathan, D. Scott, S. Krishnan, Y. Wei, M. Urteaga, M. Rodwell,
and S. Long, “2-bit adder carry and sum logic circuits clocking at 19 GHz clock
frequency in transferred substrate HBT technology,” in Proc. IEEE International
Conference on Indium Phosphide and Related Materials, Nara, Japan, May 2001,
pp. 505–508.
[60] S. Turner, “Single-level parallel-gated carry/majority circuits and systems
therefrom,” World Intellectual Property Organization Patent Application
PCT/US2005/024010, Feb. 2, 2006.

[61] S. E. Turner and D. E. Kotecki, “Benchmark results for high-speed 4-bit accumulators implemented in indium phosphide DHBT technology,” in International Journal
of High Speed Electronics and Devices, vol. 14, no. 3, Aug. 2004, pp. 646–651.
[62] S. E. Turner and D. E. Kotecki, “Direct digital synthesizer with ROM-less architecture at 13-GHz clock frequency in InP DHBT technology,” IEEE Microwave
Wireless Compon. Lett., vol. 16, no. 5, pp. 296–298, May 2006.
[63] S. Fukushima, C. F. C. Silva, Y. Muramoto, and A. J. Seeds, “Optoelectronic
synthesis of milliwatt-level multi-octave millimeter-wave signals using an optical
frequency comb generator and a unitraveling-carrier photodiode,” IEEE Photon.
Technol. Lett., vol. 13, no. 7, pp. 720–722, July 2001.
[64] A. M. ElSayed and M. I. Elmasry, “Phase-domain fractional-N frequency synthesizers,” IEEE Trans. Circuits Syst. I, vol. 51, no. 3, pp. 440–449, Mar. 2004.
[65] A. S. Gupta, D. A. Howe, C. Nelson, A. Hati, F. L. Walls, and J. F. Nava, “High
spectral purity microwave oscillator: Design using conventional air-dielectric cavity,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 51, no. 10, pp. 1225–
1231, Oct. 2004.
[66] B.-G. Goldberg, Digital Frequency Synthesis Demystified, ser. Demystified Series.
LLH Technology Publishing, 1999.
[67] J. Tierney, C. M. Rader, and B. Gold, “A digital frequency synthesizer,” IEEE
Trans. Audio Electroacoust., vol. AU-19, no. 1, pp. 48–57, Mar. 1971.
[68] L. Cordesses, “Direct digital synthesis: A tool for periodic wave generation (part
1),” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 50–54, July 2004.
[69] A. Madisetti, A. Y. Kwentus, and A. N. Willson, Jr., “A 100-MHz, 16-b, direct
digital frequency synthesizer with 100-dBc spurious-free dynamic range,” IEEE J.
Solid-State Circuits, vol. 34, no. 8, pp. 1034–1043, Aug. 1999.
[70] D. De Caro, E. Napoli, and A. G. M. Strollo, “Direct digital frequency synthesizers
with polynomial hyperfolding technique,” IEEE Trans. Circuits Syst. II, vol. 51,
no. 7, pp. 337–344, July 2004.
[71] J. Volder, “The CORDIC trigonometric computing technique,” IEEE Trans. Comput., vol. EC-8, pp. 330–334, Sept. 1959.
[72] K. A. Essenwanger and V. S. Reinhardt, “Sine output DDSs: A survey of the state
of the art,” in Proc. IEEE Freq. Cont. Symp., May 1998, pp. 370–378.
[73] Qualcomm ASIC Products, “Synthesizer products data book,” Qualcomm Inc.,
San Diego, CA, Tech. Rep., 1997.
[74] J. Vankka, “Spur reduction techniques in sine output direct digital synthesis,” in
Proc. IEEE Freq. Cont. Symp., June 1996, pp. 951–959.
[75] A. Gutierrez-Aitken, J. Matsui, E. N. Kaneshiro, B. K. Oyama, D. Sawdai, A. K.
Oki, and D. C. Streit, “Ultrahigh-speed direct digital synthesizer using InP DHBT
technology,” IEEE J. Solid-State Circuits, vol. 37, no. 9, pp. 1115–1119, Sept.
2002.
[76] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond response,”
IEEE J. Solid-State Circuits, vol. SC-3, no. 4, pp. 365–373, Dec. 1968.
[77] V. F. Kroupa, V. Čížek, J. Štursa, and H. Švandová, “Spurious signals in direct
digital frequency synthesizers due to phase truncation,” IEEE Trans. Ultrason.,
Ferroelect., Freq. Contr., vol. 47, no. 5, pp. 1166–1172, Sept. 2000.
[78] S. Manandhar, S. E. Turner, and D. E. Kotecki, “36-GHz, 16x6 bit ROM in InP
DHBT technology,” IEEE J. Solid-State Circuits, submitted for publication.

BIOGRAPHY OF THE AUTHOR
Steven Turner was born in Portland, Maine. He graduated from Westbrook High School in Westbrook, Maine, in 1997.
He entered The University of Maine in 1997 and obtained his Bachelor of Science degree in Electrical Engineering and Computer Engineering in May 2001 and his
Master of Science degree in Computer Engineering in May 2003.
Since September 2001, he has served as a Research Assistant at The University
of Maine. During the summer months, he has held internships at Tundra Semiconductor in South Portland, Maine, and BAE Systems in Nashua, New Hampshire. His current
research interests include high-speed digital and mixed-signal microelectronics design.
He is a member of IEEE, Tau Beta Pi, and Eta Kappa Nu.
He is a candidate for the Doctor of Philosophy degree in Electrical Engineering
from The University of Maine in May 2006.


