Electric Analog Circuit Design with Hypernetworks and a Differential
  Simulator by Rotman, Michael & Wolf, Lior
ELECTRIC ANALOG CIRCUIT DESIGN WITH HYPERNETWORKS
AND A DIFFERENTIAL SIMULATOR
Michael Rotman (1) and Lior Wolf (1, 2)
(1) Tel Aviv University, (2) Facebook AI Research
ABSTRACT
The manual design of analog circuits is a tedious task of pa-
rameter tuning that requires hours of work by human experts.
In this work, we make a significant step towards a fully au-
tomatic design method that is based on deep learning. The
method selects the components and their configuration, as
well as their numerical parameters. By contrast, the current
literature methods are limited to the parameter fitting part
only. A two-stage network is used, which first generates a
chain of circuit components and then predicts their parame-
ters. A hypernetwork scheme is used in which a weight gen-
erating network, which is conditioned on the circuit’s power
spectrum, produces the parameters of a primal RNN network
that places the components. A differential simulator is used
for refining the numerical values of the components. We show
that our model provides an efficient design solution, and is su-
perior to alternative solutions.
Index Terms— Analog Circuits, Sequence Generation,
Hypernetworks.
1. INTRODUCTION
An analog circuit is an electric circuit that supports a contin-
uous range of voltages. The information it contains is usu-
ally encoded as a time-varying signal. Analog circuits are
key elements in the construction of many electronic systems.
The building blocks of these circuits are electric components,
such as resistors, transistors, diodes, etc. The task of design-
ing an analog circuit is considered a difficult combinatorial
task, since each component behaves differently according to
the circuit configuration.
The focus of our work is the complete design of one type
of an analog circuit, namely the two-port analog circuit. The
two-port circuit can be depicted as a one-dimensional chain
of varying linear electric components such as resistors,
capacitors and inductors. Solving for the circuit's power
spectrum, i.e., the output voltage and current as a function
of the frequency, is a rather easy task. The inverse problem,
that is, designing a circuit with prescribed properties, is
challenging. The reason is that as the number of components
increases, the number of different circuit combinations
rises exponentially, making brute-force approaches to circuit
reconstruction infeasible. Furthermore, even the replacement
of one electric component by another in a given circuit config-
uration would typically result in a completely different power
spectrum.
In this work, we present an end-to-end solution to the
complete design problem of two-port linear analog circuits.
Instead of encoding the power spectrum into a latent space,
later fed to a decoder as an input as in a traditional sequence
generator, a hypernetwork f encodes the power spectrum di-
rectly to the space of the recurrent neural network (RNN) de-
coder g weights. Our contributions include (i) unlike previous
work on recurrent hypernetworks [1], we produce the weights
of the RNN g from a convolutional neural network, (ii) our
method incorporates the domain knowledge using a differen-
tial simulator, which enables the interchange between discrete
variables and continuous ones, and (iii) as far as we can ascer-
tain, this is the first method to infer the circuit structure from
its power spectrum.
1.1. Related Work
The analysis and design of analog circuits is an applied field
that has been extensively studied [2, 3]. Circuit analysis aims
to compute different induced properties, such as the output
voltage and current, given a design. The inverse problem,
i.e., the estimation of the components and parameters using
various voltage and current measurements, has also been ad-
dressed [4, 5]. Deep reinforcement learning was previously
applied to a subset of analog circuits in order to estimate
circuit parameters from output measurements [6]. However,
these methods aim to solve for the circuit’s parameters, given
that the circuit configuration is already known.
The hypernetwork [1] approach utilizes one network in
order to learn the weights of another network. This network
design has been used successfully in many tasks in computer
vision [7], language modeling [8] and sequence decoding [9].
While conventional sequence decoders vary the hidden state
and the input sequence between recurrent steps, and the
conditioning on either the initial state or the input changes
from one instance to the next, hypernetworks allow for more
elaborate adaptation, by changing the weights of the recurrent
network itself.
arXiv:1911.03053v1 [cs.LG] 8 Nov 2019
[Figure 1: circuit diagram. Labels: Vin, Iin, 1 Ω, 1 mF, 0.5 µH, Iout, Vout (+/−).]
Fig. 1. An example of a two-port analog circuit with three
components. From left to right: an alternating power supply,
Vin, a capacitor in parallel with a capacitance of 1 [mF], a
resistor in series with a resistance of 1 [Ω], and an inductor
with an inductance of 0.5 [µH].
2. PROBLEM FORMULATION
A two-port circuit is an analog circuit with four terminal
nodes. Two are connected to an alternating power supply, and
the other two are used for the circuit’s output. Linear ana-
log circuits utilize three different electric components: resis-
tors, capacitors and inductors. Each component comes with
a different numerical value (parameter), and is connected in
a different alignment, either in parallel or in series. The con-
figuration, S, of a two-port circuit of length n is an ordered
list of tuples $S = \{(a_i, c_i, v_i) \mid 1 \le i \le n\}$. Each tuple
describes the ith electric component's alignment ($a_i$), type ($c_i$)
and value (vi). An example of a circuit with a length n = 3
composed of a capacitor, a resistor and an inductor can be
seen in Fig. 1.
Each two-port circuit is characterized by two complex
functions, V (k) : R → C and I(k) : R → C. These func-
tions determine the voltage and current measured over the two
output nodes given a frequency k. The problem of designing
a two-port analog circuit can be formulated as a mapping
from the two characteristic complex functions V and I, as
sampled at d different frequencies, to the circuit configuration, S.
Two-Port Circuit Symmetries. The number of different
two-port circuits of length n is $(2 n_c n_v)^n$, where $n_c$ is the
number of different electric components, and $n_v$ is the number
of different values each of these components might take.
However, due to symmetries governed by Kirchhoff's laws,
some circuits are indistinguishable from one another.
Given a configuration S, any run of consecutive
components connected with the same alignment, in parallel
or in series, can be permuted to produce S′, which is
characterized by the same voltage and current functions as S.
Since the required mapping is not one-to-one, we propose
a canonical ordering of a circuit. Under this ordering,
within each run of consecutive components that share an
alignment, resistors are always followed by capacitors, which
are followed by inductors. Furthermore, electric components
of the same type in the same alignment configuration are
ordered by their numerical value, $v_i$. For example, the
non-canonical configuration {(S, R, 1.0), (P, R, 0.5), (P, C, 0.1),
(P, R, 0.05)} contains two sub-configurations, {(S, R, 1.0)}
and {(P, R, 0.5), (P, C, 0.1), (P, R, 0.05)}, and corresponds
to the canonical configuration {(S, R, 1.0), (P, R, 0.05),
(P, R, 0.5), (P, C, 0.1)}, where S and P are the possible
alignments (series or parallel), and R (C) stands for a resistor
(capacitor).
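The canonical ordering can be illustrated with a short sketch. The function name `canonicalize`, the tuple layout and the `TYPE_ORDER` table are choices made for this illustration, not identifiers from the paper; the example input is the non-canonical configuration given above.

```python
from itertools import groupby

# Assumed encoding: each component is (alignment, type, value),
# with alignment in {"S", "P"} and type in {"R", "C", "L"}.
TYPE_ORDER = {"R": 0, "C": 1, "L": 2}  # resistors, then capacitors, then inductors


def canonicalize(config):
    """Sort every run of consecutive same-alignment components by (type, value)."""
    canonical = []
    for _, run in groupby(config, key=lambda comp: comp[0]):
        canonical.extend(
            sorted(run, key=lambda comp: (TYPE_ORDER[comp[1]], comp[2]))
        )
    return canonical


example = [("S", "R", 1.0), ("P", "R", 0.5), ("P", "C", 0.1), ("P", "R", 0.05)]
print(canonicalize(example))
```

Only the order inside each run changes; the boundaries between series and parallel runs are preserved, which is what keeps the characteristic functions unchanged.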
The number of canonical two-port circuits of length n can
be derived from the following generating function:
\[ P(z, n_c, n_v) = \frac{1}{2(1 - z)^{n_v n_c} - 1} = 1 + 2 n_v n_c z + \left( n_v n_c + 3 (n_v n_c)^2 \right) z^2 + \dots \tag{1} \]
The number of canonical circuits of length n corresponds to
the coefficient of $z^n$ in Eq. (1). Since this is a generating
function of a geometric sum, the coefficient of $z^n$ is of the
order of $O(n_v^n n_c^n)$.
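As a sanity check on Eq. (1), the sketch below expands the generating function as a power series and compares the coefficients with a brute-force count of distinct canonical configurations for a small alphabet. The function names are hypothetical, and components are encoded as integer triples purely for convenience.

```python
from itertools import groupby, product
from math import comb


def canonical_counts(n_c, n_v, max_len):
    """Coefficients of z^0..z^max_len of 1 / (2 (1 - z)^(n_v n_c) - 1)."""
    m = n_c * n_v
    # Denominator D(z) = 2 (1 - z)^m - 1: d_0 = 1 and d_k = 2 (-1)^k C(m, k).
    d = [1] + [2 * (-1) ** k * comb(m, k) for k in range(1, max_len + 1)]
    p = [1]
    for n in range(1, max_len + 1):
        # P(z) D(z) = 1 gives p_n = -sum_{k=1..n} d_k p_{n-k}.
        p.append(-sum(d[k] * p[n - k] for k in range(1, n + 1)))
    return p


def brute_force_count(n_c, n_v, length):
    """Enumerate every configuration, canonicalize it, and count distinct results."""
    parts = list(product(range(2), range(n_c), range(n_v)))  # (alignment, type, value)
    seen = set()
    for config in product(parts, repeat=length):
        canonical = []
        for _, run in groupby(config, key=lambda comp: comp[0]):
            canonical.extend(sorted(run, key=lambda comp: comp[1:]))
        seen.add(tuple(canonical))
    return len(seen)


print(canonical_counts(3, 5, 3))
print([brute_force_count(2, 2, n) for n in range(4)])
```

The first two non-trivial coefficients agree with the expansion in Eq. (1): $2 n_v n_c$ and $n_v n_c + 3 (n_v n_c)^2$.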
3. METHOD
While small variations to the different components' numerical
values $\{v_i\}_{i=1}^n$ do not change the characteristic functions
drastically, changing the component alignments and types
does. This suggests that a two-step method ought to be used:
(phase I) given V and I sampled at d frequencies, the
components' alignment and type are inferred. Candidate values
for the components, $\{\tilde{v}_i\}_{i=1}^n$, are also proposed at this
stage. (phase II) to refine these values, we simulate the circuit
using these candidates with a differential simulator and
optimize only the values to obtain the final values $\{v_i\}_{i=1}^n$.
Since the length of a proposed circuit configuration, S,
varies, a recurrent neural network (RNN), g, generates the cir-
cuit’s configuration, S. Naively, the input x, to such a generat-
ing network is some embedding of the characteristic functions
V and I , f (V, I), so that S = g (f (V, I) ,Wg), where Wg
is the set of weights of g. However, we suggest employing a
different scheme, in which one neural network explicitly pre-
dicts the weights of another. This hypernetwork autoencoder
setup essentially produces a different decoder per input, be-
cause unlike the usual autoencoder, the decoder g’s weights
vary based on the characteristic functions: $W_g = f(V, I)$.
As the weight generating network, f , we utilize a variant
of the MultiScale Resnet (MS-Resnet) [10]. The MS-Resnet
consists of three branches, intended to capture different 1D
signal scales. Each branch is constructed using three consec-
utive Resnet Blocks [11]. There are three sizes of receptive
fields, 3, 5, 7 assigned to convolutional layers of f ’s branches.
The output of the branches is then averaged and projected to a
vector space with dimension 256. It is then concatenated
and passed to a fully connected layer to produce an output
of dimension 26,380, which is exactly the number of learnable
parameters in the primary network, g.
Network g is an RNN utilizing a Gated Recurrent
Unit [12] with a hidden layer size of 64. At each time step,
the outputs of g are passed to three fully connected layers,
each corresponding to a different target: alignment, type and
numerical value. The input of g at each time step is composed
of an application of a ReLU activation on the embeddings of
the previous electric component.
The embedding of an electric component is the concatenation
of three sub-embeddings, each representing a different
attribute: alignment, type and numerical value. The dimension
of the alignment sub-space is 2, whereas the dimension of the
type and numerical value sub-spaces is 31. The numerical
values are quantized to five values and all three embeddings
are generated by a learned look up table (LUT).
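To make concrete how a flat output vector of f can be cut into the weights of g, the sketch below lists one set of tensor shapes and slices a flat vector accordingly. The GRU layout follows the convention of PyTorch's nn.GRU (two weight matrices and two bias vectors per layer). The class counts in `N_CLASSES` are not given in the paper; they are guesses (for instance, counting start and end tokens) chosen because they reproduce the stated total of 26,380, and the paper does not describe how the vector is actually laid out.

```python
from math import prod

HIDDEN = 64
EMBED_DIMS = {"alignment": 2, "type": 31, "value": 31}  # from the text
N_CLASSES = {"alignment": 4, "type": 5, "value": 7}     # ASSUMED, not from the paper


def parameter_shapes():
    """Shapes of every tensor of the primary network g under the assumptions above."""
    input_size = sum(EMBED_DIMS.values())  # 2 + 31 + 31 = 64
    shapes = {
        "gru.weight_ih": (3 * HIDDEN, input_size),
        "gru.weight_hh": (3 * HIDDEN, HIDDEN),
        "gru.bias_ih": (3 * HIDDEN,),
        "gru.bias_hh": (3 * HIDDEN,),
    }
    for name, n in N_CLASSES.items():
        shapes[f"head.{name}.weight"] = (n, HIDDEN)      # fully connected output layer
        shapes[f"head.{name}.bias"] = (n,)
        shapes[f"lut.{name}"] = (n, EMBED_DIMS[name])    # embedding look up table
    return shapes


def split_flat_vector(flat, shapes):
    """Cut the hypernetwork output into consecutive per-tensor chunks."""
    tensors, offset = {}, 0
    for name, shape in shapes.items():
        size = prod(shape)
        tensors[name] = flat[offset:offset + size]
        offset += size
    if offset != len(flat):
        raise ValueError("flat vector length does not match the parameter count")
    return tensors


shapes = parameter_shapes()
total = sum(prod(shape) for shape in shapes.values())
weights = split_flat_vector([0.0] * total, shapes)
print(total, len(weights["gru.weight_hh"]))
```

Under these assumptions the GRU alone accounts for 24,960 parameters, and the output layers and look up tables account for the remaining 1,420.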
3.1. Signal Normalization
Since the range of the characteristic functions varies greatly,
we normalize these functions using a tanh function, so the input
to hypernetwork f consists of 4 stacked channels:
tanh(Re(V)), tanh(Im(V)), tanh(Re(I)), tanh(Im(I)).
Both V and I were sampled at d = 512 frequencies spaced
on a logarithmic scale from 1 [Hz] to 1 [MHz].
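A minimal sketch of this normalization, assuming V and I are available as lists of Python complex numbers; the function names are illustrative only.

```python
import math


def log_spaced_frequencies(d=512, f_min=1.0, f_max=1.0e6):
    """d frequencies on a logarithmic grid from f_min to f_max (in Hz)."""
    ratio = (f_max / f_min) ** (1.0 / (d - 1))
    return [f_min * ratio ** i for i in range(d)]


def normalize(V, I):
    """Stack tanh(Re V), tanh(Im V), tanh(Re I), tanh(Im I) as four channels."""
    return [
        [math.tanh(v.real) for v in V],
        [math.tanh(v.imag) for v in V],
        [math.tanh(i.real) for i in I],
        [math.tanh(i.imag) for i in I],
    ]
```

The tanh squashes every sample into [-1, 1], so very large amplitudes saturate instead of dominating the input of f.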
3.2. Circuit Simulator
We constructed a differentiable circuit simulator using Py-
Torch [13]. This simulator calculates the characteristic func-
tions V and I given a circuit configuration S, which allows
the estimation of the values of various components given a
required signal by back-propagating through the simulation.
The characteristic functions of a circuit can be computed
by successive multiplication of ABCD-parameter
matrices T [14],
\[ \begin{pmatrix} V_{out} \\ -I_{out} \end{pmatrix} = T_n \cdots T_1 \begin{pmatrix} V_{in} \\ I_{in} \end{pmatrix} \tag{2} \]
Each of these matrices contains complex numbers, and the
following representation was used:
\[ a + ib = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}, \tag{3} \]
i.e., replacing complex number operations with matrix ones.
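The chain multiplication of Eq. (2) and the representation of Eq. (3) can be sketched in plain Python. The specific per-component matrices, the input values `v_in` and `i_in`, and all function names are assumptions made for this illustration; the paper does not spell them out, and its simulator is written in PyTorch so that gradients are available. The second returned entry is the second component of the left-hand side of Eq. (2) under the convention chosen here.

```python
import math


def impedance(kind, value, freq_hz):
    """Complex impedance of a resistor, capacitor or inductor at one frequency."""
    omega = 2 * math.pi * freq_hz
    if kind == "R":
        return complex(value)
    if kind == "C":
        return 1 / (1j * omega * value)
    if kind == "L":
        return 1j * omega * value
    raise ValueError(f"unknown component type: {kind}")


def transfer_matrix(alignment, kind, value, freq_hz):
    """Assumed 2x2 matrix mapping (V, I) before a component to (V, I) after it."""
    z = impedance(kind, value, freq_hz)
    if alignment == "S":   # series: voltage drops by z * I, current unchanged
        return ((1, -z), (0, 1))
    if alignment == "P":   # parallel: current V / z is diverted, voltage unchanged
        return ((1, 0), (-1 / z, 1))
    raise ValueError(f"unknown alignment: {alignment}")


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def simulate(config, freq_hz, v_in=1 + 0j, i_in=1 + 0j):
    """Apply T_n ... T_1 to (v_in, i_in) for a list of (alignment, type, value)."""
    total = ((1, 0), (0, 1))
    for alignment, kind, value in config:
        total = matmul2(transfer_matrix(alignment, kind, value, freq_hz), total)
    return (
        total[0][0] * v_in + total[0][1] * i_in,
        total[1][0] * v_in + total[1][1] * i_in,
    )


def as_real_matrix(z):
    """Eq. (3): represent a + ib as the real matrix ((a, b), (-b, a))."""
    return ((z.real, z.imag), (-z.imag, z.real))
```

With these matrices, consecutive components that share an alignment commute, so a configuration and its canonical reordering yield the same outputs, and multiplying two matrices of the form in Eq. (3) reproduces complex multiplication.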
3.3. Training
We use three cross entropy losses throughout training, where
each matches a different property of the electric component,
\[ L = L_{Alignment} + L_{Type} + L_{Value}, \tag{4} \]
where each term is $L_i = -\sum_{c_i=1}^{n_i} y_{o,c_i} \log(p_{o,c_i})$, and $y_{o,c_i}$
equals 1 if and only if $c_i$ is the class of input $o$. Our network
was trained for 700 epochs with a learning rate of $10^{-4}$ and
using Teacher Forcing [15] with a probability of 0.5. The
best model was selected on the validation set as the one
achieving the lowest partial loss function,
\[ L = L_{Alignment} + L_{Type}, \tag{5} \]
since fixing the parameter values given the correct configuration
is relatively easy.
[Figure 2: architecture diagram. Panels: Voltage (V(k)) and Current (I(k)), amplitude versus frequency [Hz] from 1 to 1e+06; blocks: Multi Scale ResNet, RNN Weights; RNN inputs: <SOS>, (a1, c1, ṽ1), ..., (a4, c4, ṽ4); RNN outputs: (a1, c1, ṽ1), ..., (a4, c4, ṽ4), <EOS>.]
Fig. 2. Our proposed architecture. The voltage and current
functions are sampled at 512 frequencies and are fed into a
Multi-Scale Resnet f, which outputs the weight matrices of a
GRU g. The GRU outputs the circuit configuration S.
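The training loss of Eq. (4) and the model-selection loss of Eq. (5) can be sketched for a single decoding step as follows. The dictionary layout and function names are assumptions for this illustration; with a one-hot target, each cross entropy term reduces to the negative log-probability of the correct class.

```python
import math

HEADS = ("alignment", "type", "value")


def cross_entropy(probs, target_class):
    """-sum_c y_c log(p_c) with a one-hot y, i.e. -log of the target probability."""
    return -math.log(probs[target_class])


def head_losses(predicted, target):
    """One cross entropy term per head for a single decoding step."""
    return {h: cross_entropy(predicted[h], target[h]) for h in HEADS}


def training_loss(losses):
    """Eq. (4): alignment + type + value."""
    return losses["alignment"] + losses["type"] + losses["value"]


def selection_loss(losses):
    """Eq. (5): alignment + type only, used to pick the best model on validation."""
    return losses["alignment"] + losses["type"]


predicted = {
    "alignment": [0.5, 0.5],
    "type": [0.25, 0.5, 0.25],
    "value": [0.2, 0.2, 0.2, 0.2, 0.2],
}
target = {"alignment": 0, "type": 1, "value": 4}
losses = head_losses(predicted, target)
print(training_loss(losses), selection_loss(losses))
```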
3.4. Inference
While in real-life applications the numerical values of each
electric component are discretized, in general the values
themselves are continuous. An infinitesimal change to a com-
ponent’s value should result only in a slight change in the
characteristic function. In order to benefit from this prop-
erty, we refine the network predictions with the differentiable
circuit simulator.
Given a candidate circuit configuration, $S = \{(a_i, c_i, \tilde{v}_i)\}$,
generated by the hypernetwork decoder g given V, I, we
optimize the following L2 loss:
\[ L_S = \frac{1}{d} \sum_{i=1}^{d} \left| V_i - V(S) \right|^2 + \left| I_i - I(S) \right|^2, \tag{6} \]
with an Adam optimizer (learning rate of 0.01), where V(S) and
I(S) are the simulated characteristic functions computed by the
differential simulator. Optimization stops once $L_S < 10^{-8}$.
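The refinement loop can be sketched without any dependency as follows. This is not the paper's implementation: gradients are approximated by central finite differences instead of back-propagation through a PyTorch simulator, the Adam update is written out by hand, the tiny simulator and its input values are assumptions, and a `max_steps` cap is added so the loop always terminates.

```python
import math


def simulate(config, values, freqs, v_in=1 + 0j, i_in=1 + 0j):
    """Assumed toy simulator: chain of (alignment, type) parts with given values."""
    V, I = [], []
    for f in freqs:
        omega = 2 * math.pi * f
        v, i = v_in, i_in
        for (alignment, kind), value in zip(config, values):
            if kind == "R":
                z = complex(value)
            elif kind == "C":
                z = 1 / (1j * omega * value)
            else:  # "L"
                z = 1j * omega * value
            if alignment == "S":
                v = v - z * i      # series: voltage drop
            else:
                i = i - v / z      # parallel: diverted current
        V.append(v)
        I.append(i)
    return V, I


def spectrum_loss(config, values, freqs, target_V, target_I):
    """Eq. (6): mean squared distance between target and simulated V and I."""
    V, I = simulate(config, values, freqs)
    total = sum(
        abs(tv - v) ** 2 + abs(ti - i) ** 2
        for tv, v, ti, i in zip(target_V, V, target_I, I)
    )
    return total / len(freqs)


def refine(config, candidates, freqs, target_V, target_I,
           lr=0.01, tol=1e-8, max_steps=2000):
    """Adam on the component values only; returns the best values and their loss."""
    def loss(vals):
        return spectrum_loss(config, vals, freqs, target_V, target_I)

    values = list(candidates)
    m = [0.0] * len(values)
    v = [0.0] * len(values)
    beta1, beta2, eps, h = 0.9, 0.999, 1e-8, 1e-6
    best_values, best_loss = list(values), loss(values)
    for step in range(1, max_steps + 1):
        if best_loss < tol:
            break
        grad = []
        for j in range(len(values)):  # central finite differences (stand-in for autograd)
            up, down = values[:], values[:]
            up[j] += h
            down[j] -= h
            grad.append((loss(up) - loss(down)) / (2 * h))
        for j, g in enumerate(grad):
            m[j] = beta1 * m[j] + (1 - beta1) * g
            v[j] = beta2 * v[j] + (1 - beta2) * g * g
            m_hat = m[j] / (1 - beta1 ** step)
            v_hat = v[j] / (1 - beta2 ** step)
            values[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        current = loss(values)
        if current < best_loss:
            best_values, best_loss = list(values), current
    return best_values, best_loss
```

Only the numerical values move during this phase; the alignments and types proposed by the decoder stay fixed.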
3.5. Genetic Algorithm
A genetic algorithm was used as a baseline for comparison.
The population size was set to 100 with 10 elite samples kept
aside from each generation. The samples were mutated with
a probability of 0.01, where a mutation to the configuration
could be one of the following: an addition of a random electric
component, a removal of an electric component, or the
replacement of an electric component by another one. Breeding
between different circuit configurations also took place
at each generation. The probability of selecting a sample $x_i$
for the next generation was proportional to $e^{-L_S}$. The
algorithm was executed for 1000 generations.
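One generation of such a baseline might look like the sketch below. Several details are assumptions, since the text does not specify them: parents are drawn with probability proportional to $e^{-L}$, breeding is a single-point crossover, and a toy mismatch count stands in for the simulator-based loss $L_S$. All names are hypothetical.

```python
import math
import random

ALIGNMENTS, TYPES, N_VALUES = ("S", "P"), ("R", "C", "L"), 5


def random_component(rng):
    return (rng.choice(ALIGNMENTS), rng.choice(TYPES), rng.randrange(N_VALUES))


def mutate(config, rng):
    """Add, remove or replace one random component (works on a copy)."""
    config = list(config)
    op = rng.choice(("add", "remove", "replace"))
    if op == "add" or not config:
        config.insert(rng.randrange(len(config) + 1), random_component(rng))
    elif op == "remove" and len(config) > 1:
        del config[rng.randrange(len(config))]
    else:
        config[rng.randrange(len(config))] = random_component(rng)
    return config


def breed(a, b, rng):
    """ASSUMED single-point crossover: a prefix of a followed by a suffix of b."""
    return list(a[: rng.randrange(len(a) + 1)]) + list(b[rng.randrange(len(b) + 1):])


def next_generation(population, loss_fn, rng, n_elite=10, p_mutation=0.01):
    """Keep the elites untouched, fill the rest by weighted selection and breeding."""
    elites = sorted(population, key=loss_fn)[:n_elite]
    weights = [math.exp(-loss_fn(x)) for x in population]
    children = []
    while len(children) < len(population) - n_elite:
        a, b = rng.choices(population, weights=weights, k=2)
        child = breed(a, b, rng)
        if rng.random() < p_mutation:
            child = mutate(child, rng)
        children.append(child)
    return elites + children


def toy_loss(config, target):
    """Stand-in for L_S: position-wise mismatches plus the length difference."""
    return sum(x != y for x, y in zip(config, target)) + abs(len(config) - len(target))
```

Because the elites are copied over unchanged, the best loss in the population cannot get worse from one generation to the next.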
4. EXPERIMENTS
We applied our method on the canonical circuit dataset, which
contains circuit configurations and their corresponding
characteristic functions. The dataset is split into three sets:
training, validation and testing. The training set consists of
circuit configurations of lengths n = 1, 2, ..., 10, where for
n = 1, 2, 3, all possible canonical circuit configurations were
included. For each of n = 4, ..., 10, 1,120 random circuit
configurations were drawn. In total, the training set contains
23,870 samples. The validation set and test set contain random
circuit configurations with lengths n = 4, ..., 10, with 480 and
400 samples from each length, respectively. In total, the
validation (test) set contains 3,360 (2,800) samples.
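The dataset sizes can be reproduced with a short calculation, assuming three component types and five quantized values per component, as described in Sec. 3. The counting below treats a canonical circuit as a sequence of alternating series and parallel runs, each run being a multiset of (type, value) pairs; the function name is illustrative.

```python
from math import comb


def canonical_count(length, n_c=3, n_v=5):
    """Number of canonical circuits of a given length."""
    if length == 0:
        return 1
    m = n_c * n_v
    # ways[k]: arrangements of total length k once the first alignment is fixed.
    ways = [1] + [0] * length
    for total in range(1, length + 1):
        ways[total] = sum(
            comb(m + run - 1, run) * ways[total - run]  # multisets of size `run`
            for run in range(1, total + 1)
        )
    return 2 * ways[length]  # the first run is either series or parallel


exhaustive = sum(canonical_count(n) for n in (1, 2, 3))
train_size = exhaustive + 7 * 1120   # lengths 4..10, 1,120 samples each
val_size = 7 * 480
test_size = 7 * 400
print(exhaustive, train_size, val_size, test_size)
```

The exhaustive part (lengths 1 to 3) contributes 30 + 690 + 15,310 circuits, which together with the sampled lengths matches the stated training set size.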
We evaluated our method on several scenarios. As a clas-
sical baseline, we have applied a genetic algorithm, as de-
scribed in Sec. 3.5. Another baseline we experimented with
is a vanilla GRU where the hidden representation obtained
by f is fed to the decoder g as the hidden state at t = 0.
Next, we applied the hypernetwork scheme with and without
our differential simulator. As an ablation, we also
experimented with a variant in which f does not infer g's
classification and look up table weights; in this case, f infers
only the GRU weights.
As a success metric, we do not compare the characteris-
tic functions using a distance metric since the functions vary
on a logarithmic scale, and the Euclidean distance between
two completely different low-amplitude functions is smaller
than that between similar high-amplitude functions. In addition, even after
a normalization, low frequency points contribute to the dis-
tance much more than high frequency points, which creates a
highly unbalanced distance metric. Instead, we employ two
different classification metrics. A complete classification is
correct, when all the electric components in the configura-
tion were correctly inferred, in their location, alignment and
quantized numerical values. A value agnostic classification
is correct when all the electric components in the configura-
tion were correctly inferred, in their location and alignment.
The complete classification accuracy obtained with the different
methods on the test set of the canonical circuit dataset is
shown in Tab. 1. The value agnostic classification accuracy
on the same set is presented in Tab. 2.
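The two metrics can be written down in a few lines. The sketch assumes each configuration is a list of (alignment, type, quantized value) tuples and, following the caption of Tab. 2, takes the value agnostic metric to compare alignment and type at every position; the function names are illustrative.

```python
def complete_match(predicted, target):
    """Every component agrees in position, alignment, type and quantized value."""
    return list(predicted) == list(target)


def value_agnostic_match(predicted, target):
    """Same comparison after dropping the numerical value of every component."""
    def strip(config):
        return [(alignment, kind) for alignment, kind, _ in config]
    return strip(predicted) == strip(target)


def accuracy(predictions, targets, match):
    """Fraction of circuits for which `match` holds."""
    hits = sum(match(p, t) for p, t in zip(predictions, targets))
    return hits / len(targets)


targets = [[("S", "R", 0), ("P", "C", 3)], [("P", "L", 1)]]
predictions = [[("S", "R", 0), ("P", "C", 4)], [("P", "L", 1)]]
print(accuracy(predictions, targets, complete_match))
print(accuracy(predictions, targets, value_agnostic_match))
```

A single wrong value bin makes a circuit count as a failure under the complete metric while leaving the value agnostic metric unaffected.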
As can be seen, our method greatly outperforms the baselines,
except for length 4, where the genetic algorithm is able
to cover enough of the search space. In addition, the
differential simulator produces an additional improvement, since it
manages to move some values between quantized bins. There
is also a clear benefit to using a hypernetwork for predicting
both the set of weights of the classifier’s fully-connected
layers, as well as the look up tables. This required adaptivity
demonstrates that each circuit utilizes the electric component
embeddings differently, and could hint that our architecture
is able to learn high level “semantics” rather than just
low-level “syntax”. Note that one may expect the results of
our method to decay with respect to the circuits’ length;
however, as our model learns over a range of lengths, it
predicts best at mid-range.

Table 1. A comparison of the classification accuracy over
the test set for the different methods.
Length  Genetic Algorithm  GRU   GRU per length  Ours, GRU-only hypernet  Ours w/o simulator  Ours
4       0.58               0.01  0.30            0.07                     0.30                0.32
5       0.28               0     0.08            0.05                     0.41                0.42
6       0.09               0     0.02            0.04                     0.50                0.51
7       0.03               0     0               0.01                     0.46                0.46
8       0.01               0     0               0                        0.44                0.44
9       0                  0     0               0                        0.42                0.42
10      0                  0     0               0                        0.42                0.43

Table 2. A comparison of the classification accuracy over
the test set for the different methods while ignoring the
classification of the numerical values of the components.
Length  Genetic Algorithm  GRU   GRU per length  Ours, GRU-only hypernet  Ours
4       0.66               0.03  0.59            0.26                     0.36
5       0.35               0     0.21            0.26                     0.45
6       0.15               0     0.10            0.18                     0.54
7       0.05               0     0.03            0.10                     0.47
8       0.02               0     0               0.04                     0.49
9       0                  0     0               0.01                     0.47
10      0                  0     0               0                        0.46
Despite considerable efforts, the vanilla GRU has not pro-
vided any satisfactory results. Therefore, we trained separate
GRU models for different sequence lengths. As can be seen
in both tables, the baseline GRUs are not competitive.
5. CONCLUSIONS
We proposed a method that, to the best of our knowledge, is
the first method to infer the circuit’s structure given a power-
spectrum. Our method outperforms both evolutionary and
deep learning baselines by a large margin. The differential
simulator we introduce incorporates the domain knowledge
in a direct way and is able to further enhance our results.
The method can be generalized in a straightforward manner
to other sequence design tasks of 1D specifications.
6. REFERENCES
[1] David Ha, Andrew Dai, and Quoc V Le, “Hypernet-
works,” arXiv preprint arXiv:1609.09106, 2016.
[2] Behzad Razavi, Design of Analog CMOS Integrated
Circuits, McGraw-Hill, Inc., New York, NY, USA, 1st
edition, 2001.
[3] Mourad Fakhfakh, Esteban Tlelo-Cuautle, and Fran-
cisco Fernandez, Design of Analog Circuits through
Symbolic Analysis, Bentham Science Publishers, 01
2012.
[4] M. Iordache, L. Dumitriu, L. Mandache, and D. Niculae,
“On analog circuit parameter estimation,” in 2012 Inter-
national Conference on Applied and Theoretical Elec-
tricity (ICATE), Oct 2012, pp. 1–6.
[5] Deng Yong and Zhang He, “Parameter estimation of
analog circuits based on the fractional wavelet method,”
Journal of Semiconductors, vol. 36, no. 3, pp. 035006,
2015.
[6] Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, and Song
Han, “Learning to design circuits,” in NIPS Workshop
on Machine Learning for Systems, 12 2018.
[7] Gidi Littwin and Lior Wolf, “Deep meta functionals for
shape representation,” in International Conference on
Computer Vision, 2019.
[8] Joseph Suarez, “Language modeling with recurrent
highway hypernetworks,” in Advances in neural infor-
mation processing systems, 2017, pp. 3267–3276.
[9] Eliya Nachmani and Lior Wolf, “Hyper-graph-network
decoders for block codes,” 2019.
[10] Fei Wang, Jinsong Han, Shiyuan Zhang, Xu He,
and Dong Huang, “Csi-net: Unified body charac-
terization and action recognition,” arXiv preprint
arXiv:1810.03064, 2018.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian
Sun, “Deep residual learning for image recognition,” in
Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[12] Kyunghyun Cho, Bart van Merriënboer, Caglar
Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger
Schwenk, and Yoshua Bengio, “Learning phrase rep-
resentations using rnn encoder-decoder for statistical
machine translation,” arXiv preprint arXiv:1406.1078,
2014.
[13] Adam Paszke, Sam Gross, Soumith Chintala, Gregory
Chanan, Edward Yang, Zachary DeVito, Zeming Lin,
Alban Desmaison, Luca Antiga, and Adam Lerer, “Au-
tomatic differentiation in PyTorch,” in NIPS Autodiff
Workshop, 2017.
[14] K Suresh Kumar, Electric circuits and networks, Pear-
son Education India, 2008.
[15] Ronald J Williams and David Zipser, “A learning algo-
rithm for continually running fully recurrent neural net-
works,” Neural computation, vol. 1, no. 2, pp. 270–280,
1989.
