We study the problem of formal verification of Binarized Neural Networks (BNN), which have recently been proposed as a power-efficient alternative to more traditional learning networks. More precisely, given a trained BNN and a relation between possible inputs and outputs of this BNN, we develop verification procedures for establishing that the BNN indeed meets this specification for all possible inputs. For solving the verification problem of BNNs we build on well-known methods for hardware verification. The BNN verification problem is first encoded as a combinational miter. In a second step, this miter is transformed into a corresponding propositional satisfiability (SAT) problem. The main contributions of this paper are a number of essential optimizations for making this approach to BNN verification scalable. First, we provide a transformation on fully connected BNNs that reduces the order of the number of bitwise operations in each layer of the BNN from quadratic to linear. Second, we identify redundant computations in a BNN based on optimal factoring techniques, and we provide transformations on BNNs for avoiding these multiple computations. We prove that the problem of optimal factoring is NP-hard, and we design efficient search procedures for generating approximate solutions of the optimal factoring problem. Third, we design a compositional verification procedure for analyzing each layer of a BNN separately, and for iteratively combining and refining local verification results. We experimentally demonstrate the scalability of our verification techniques to moderately-sized BNNs for embedded applications with thousands of neurons and inputs.
Introduction
Artificial neural networks have become essential building blocks in realizing many automated and even autonomous systems. They have successfully been deployed, for example, for perception and scene understanding [15, 19, 23] , for control and decision making [5, 12, 17, 26] , and also for end-to-end solutions of autonomous driving scenarios [3] . Implementations of artificial neural networks, however, need to be made much more power-efficient in order to deploy them on typical embedded devices with their characteristically limited resources and power constraints. Moreover, the use of neural networks in safety-critical systems poses severe verification and certification challenges [1] .
Binarized Neural Networks (BNN) have recently been proposed [7, 14] as a potentially much more power-efficient alternative to more traditional feedforward artificial neural networks. Their main characteristics are that trained weights, inputs, intermediate signals and outputs, and also activation constraints are binary-valued. Consequently, forward propagation only relies on bit-level arithmetic. Since BNNs have also demonstrated good performance on standard datasets in image recognition such as MNIST, CIFAR-10 and SVHN [7] , they are an attractive and potentially power-efficient alternative to current floating-point based implementations of neural networks for embedded applications.
In this paper we study the verification problem for BNNs. Given a trained BNN and a specification of its intended input-output behavior, we develop verification procedures for establishing that the given BNN indeed meets its intended specification for all possible inputs. Notice that naively solving verification problems for BNNs with, say, 1000 inputs requires investigation of all $2^{1000}$ different input configurations.
For solving the verification problem of BNNs we build on well-known methods and tools from the hardware verification domain. We first transform the BNN and its specification into a combinational miter [4] , which is then transformed into a corresponding propositional satisfiability (SAT) problem. In this process we rely heavily on logic synthesis tools such as ABC [4] from the hardware verification domain. Using such a direct neuron-to-circuit encoding, however, we were not able to verify BNNs with thousands of inputs and hidden nodes, as encountered in some of our embedded systems case studies. The main challenge therefore is to make the basic verification procedure scale to BNNs as used on current embedded devices.
The main contributions of this paper are the development of a number of essential optimizations for scaling the verification of BNNs to networks with thousands of inputs and neurons in the hidden layers. Based on the specific structure of BNNs we propose several BNN-specific optimizations for generating simpler constraints, including XNOR extraction and factoring of counting units. We prove that it is NP-hard to find an optimal factoring for counting units, and we develop a polynomial time heuristic algorithm which achieves, on average, around 60% sharing for counting computations in our experiments. Lastly, for BNN specifications with separable input and output conditions, we develop a compositional verification approach, which is based on separately solving local SAT problems for each hidden layer of nodes.
The paper is structured as follows. Section 2 defines basic notions and concepts underlying BNNs. Section 3 presents our verification workflow, including BNN-specific optimizations such as XNOR extraction (Section 3.3) and the factoring of counting units (Section 3.4). Our compositional approach to BNN verification is described in Section 4. We summarize experimental results with our verification procedure in Section 5, compare our results with related work from the literature in Section 6, and close with some final remarks and an outlook in Section 7.

Table 1. An example of computing the output of a BNN neuron, using the bipolar domain (top) and using 0/1 Boolean variables (bottom); columns are indexed by j = 0 (bias node), 1, 2, 3, 4.
Preliminaries
Let $\mathbb{B}$ be the set of bipolar binaries ±1, where +1 is interpreted as true and −1 as false. A Binarized Neural Network (BNN) [7, 14] consists of a sequence of layers labeled l = 0, 1, ..., L, where 0 is the index of the input layer, L is the index of the output layer, and all other layers are so-called hidden layers. Superscripts (l) are used to index layer-specific variables. Elements of both the input and output vectors of a BNN are of the bipolar domain $\mathbb{B}$.
Layers l are comprised of nodes $n^{(l)}_i$ (so-called neurons), for $i = 0, 1, \dots, d^{(l)}$, where $d^{(l)}$ is the dimension of layer l and is even. By convention, nodes of index 0 are bias nodes and have a constant bipolar output +1. Nodes $n^{(l-1)}_j$ of layer l − 1 are connected with nodes $n^{(l)}_i$ in layer l by means of directed edges of weight $w^{(l)}_{ji} \in \mathbb{B}$. We use $\mathbf{w}^{(l)}_i \in \mathbb{B}^{|d^{(l-1)}|+1}$ to represent the array of all weights associated with neuron $n^{(l)}_i$. Supporting additional structures such as max-out layers is usually straightforward. Notice that we consider all weights in a network to have fixed bipolar values.
Given an input to the network, computations are applied successively from the neurons in layer 1 to those in layer L for generating outputs. Figure 1 shows how a neuron of a BNN computes in the bipolar domain. In hidden layers, a weighted sum is first computed to obtain the intermediate value. Recall that, as the number of neurons (including the bias node) in each layer is odd, this weighted sum of ±1 terms can never be equal to 0. The activation function is then applied to the intermediate value; it outputs +1 if the weighted sum is greater than 0 and −1 otherwise, i.e., $x^{(l)}_i = +1$ iff $\sum_{j=0}^{|d^{(l-1)}|} w^{(l)}_{ji}\, x^{(l-1)}_j > 0$. For the output layer (l = L), the activation function is omitted.
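As an illustration, the following Python sketch evaluates a single neuron in the bipolar domain as described above: a weighted sum over all predecessor nodes, with the bias node fixed to +1, followed by the sign activation, which is skipped for the output layer. The function name and the convention that `weights[0]` belongs to the bias node are our own assumptions, and the weights are made up rather than taken from Table 1.

```python
from typing import Sequence


def bipolar_neuron(weights: Sequence[int], inputs: Sequence[int],
                   is_output_layer: bool = False) -> int:
    """Evaluate one BNN neuron in the bipolar domain.

    weights[0] is the weight of the bias node; `inputs` are the outputs of the
    previous layer *without* the bias node, which is prepended here as +1.
    """
    x = [1] + list(inputs)
    if len(weights) != len(x):
        raise ValueError("need one weight per predecessor node, bias included")
    weighted_sum = sum(w * v for w, v in zip(weights, x))
    if is_output_layer:
        return weighted_sum               # no activation in the output layer
    # sign activation; with an odd number of +-1 terms the sum is never 0
    return 1 if weighted_sum > 0 else -1


# Five predecessor nodes (bias + four inputs), with made-up weights.
print(bipolar_neuron([1, -1, 1, 1, -1], [1, 1, -1, -1]))                         # 1
print(bipolar_neuron([1, 1, 1, 1, 1], [-1, -1, -1, -1], is_output_layer=True))   # -3
```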
For l = 1 to L we use $x^{(l)}_i$ to denote the output value of node $n^{(l)}_i$, and $x^{(l)}_i(a_1, \dots, a_d)$ denotes the output value $x^{(l)}_i$ for the input $a_1, \dots, a_d$, sometimes abbreviated by $x^{(l)}_i(a)$. We also use $\mathbf{x}^{(l)} \in \mathbb{B}^{|d^{(l)}|+1}$ to denote the array of all outputs from layer l, including the constant bias node; $\mathbf{x}^{(0)}$ refers to the input.
For a given BNN and a relation $\varphi_{risk}$ specifying the undesired property between the bipolar input and output domains of the given BNN, the BNN verification problem holds if there exists no input a to the BNN such that the risk property $\varphi_{risk}(a, b)$ holds, where b is the output of the BNN for input a.
Verification of Binarized Neural Networks
Our basic approach is to reduce the BNN verification problem to a corresponding hardware verification problem. BNNs can be encoded as combinational circuits, since they are deterministic functions. The BNN verification problem is then encoded by means of a combinational miter [4], which is a hardware circuit with a single Boolean output that should always be 0.
The main step of this encoding is to replace the bipolar-domain operations in the definition of BNNs with corresponding operations in the 0/1 Boolean domain. For example, operations such as the previously mentioned max-out $\max(x_{i_1}, \dots, x_{i_4})$ in the bipolar domain may be implemented by means of logical disjunction in the Boolean domain.
From Bipolar to Boolean Domain
We recall the encoding of the update function of an individual BNN neuron in the bipolar domain by means of operations in the 0/1 Boolean domain [7, 14]: (1) perform a bitwise XNOR (⊕) operation, (2) count the number of 1s, and (3) check whether the sum is greater than or equal to half the number of neurons in the previous layer (for fully connected layers). Table 1 illustrates the concept by providing the detailed computation for a neuron connected to five predecessor nodes. Therefore, the update function of a BNN neuron in the Boolean domain is as follows.

$$x^{(l)}_i = \geq_{\frac{|d^{(l-1)}|+1}{2}}\left(\mathrm{count1}\left(\mathbf{w}^{(l)}_i \oplus \mathbf{x}^{(l-1)}\right)\right) \qquad (1)$$

where count1 simply counts the number of 1s in an array of Boolean variables, and $\geq_b(v)$ is 1 if $v \geq b$, and 0 otherwise. Notice that the value $\frac{|d^{(l-1)}|+1}{2}$ is constant for a given BNN.
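To make the correspondence concrete, here is a minimal Python sketch (our own, with hypothetical function names) of the Boolean update function: map +1 to 1 and −1 to 0, XNOR each weight with its input, count the 1s, and compare against half the number of predecessor nodes. Because that number is odd, the threshold test can be done in integers as 2·count1 ≥ n. The `encodings_agree` helper exhaustively checks that this matches the bipolar sign activation for small neurons.

```python
from itertools import product


def to_bool(v: int) -> int:
    """Map the bipolar value +1 to 1 and -1 to 0."""
    return 1 if v == 1 else 0


def boolean_neuron(w_bits, x_bits) -> int:
    """XNOR, count the 1s, compare with (|d^(l-1)|+1)/2.

    Both arrays include the bias position 0 (x_bits[0] == 1).
    """
    n = len(w_bits)                                             # |d^(l-1)| + 1, odd
    count1 = sum(1 - (w ^ x) for w, x in zip(w_bits, x_bits))   # XNOR, then popcount
    return 1 if 2 * count1 >= n else 0                          # count1 >= n/2, in integers


def bipolar_neuron(weights, x) -> int:
    """Reference: sign of the bipolar weighted sum (x includes the bias +1)."""
    return 1 if sum(w * v for w, v in zip(weights, x)) > 0 else -1


def encodings_agree(n: int) -> bool:
    """Exhaustively compare both encodings for n predecessor nodes (bias included)."""
    for w in product((-1, 1), repeat=n):
        for rest in product((-1, 1), repeat=n - 1):
            x = (1,) + rest                                     # bias node outputs +1
            expected = to_bool(bipolar_neuron(w, x))
            got = boolean_neuron([to_bool(v) for v in w], [to_bool(v) for v in x])
            if expected != got:
                return False
    return True


print(encodings_agree(5))  # True
```

The exhaustive check succeeds because the bipolar weighted sum equals 2·count1 − n, which is positive exactly when 2·count1 > n; for odd n this is the same as 2·count1 ≥ n.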
Likewise, specifications in the bipolar domain can also be easily encoded in the Boolean domain. An illustrative example is provided in Table 1.
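Output constraints need one extra step, because the output layer has no activation and its values are integers rather than bits. The following derivation (our own, using count1 and the XNOR ⊕ introduced above, with n = |d^{(L−1)}| + 1) shows how such constraints translate: every matching weight/input pair contributes +1 and every mismatch −1 to the bipolar weighted sum.

```latex
\begin{align*}
x^{(L)}_i &= \sum_{j=0}^{|d^{(L-1)}|} w^{(L)}_{ji}\, x^{(L-1)}_j
           = c_i - (n - c_i) = 2c_i - n,
\qquad c_i = \mathrm{count1}\big(\mathbf{w}^{(L)}_i \oplus \mathbf{x}^{(L-1)}\big),\quad n = |d^{(L-1)}| + 1,\\
x^{(L)}_i \ge t &\iff c_i \ge \frac{t + n}{2},
\qquad\qquad
x^{(L)}_i \ge x^{(L)}_k \iff c_i \ge c_k .
\end{align*}
```

For example, with n = 5 an output constraint $x^{(L)}_i \ge 3$ becomes $c_i \ge 4$, and a comparison between two outputs of the same layer reduces to a comparison of their counts.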
In the remainder of this paper, we assume that properties are always provided in the Boolean domain.
From BNN to hardware verification
We are now ready to state the basic decision procedure (Algorithm 1) for solving BNN verification problems. This procedure first constructs a combinational miter for a BNN verification problem, followed by an encoding of the combinational miter into a corresponding propositional SAT problem. Here we rely on standard transformation techniques, as implemented in logic synthesis tools such as ABC [4] or Yosys [27], for constructing SAT problems from miters.
Line 1 in Algorithm 1 sets the miter input size to the input size of the BNN; by definition, the miter output size is 1. Lines 2 to 3 build, for each neuron, a hardware module module[$n^{(l)}_i$] to be contained as a submodule of the miter. These modules (cmp. Equation 1) differ only in their weight constants. Line 4 connects an input port of the current layer with the associated output ports of the previous layer, thereby realizing the topological structure of the given network. Line 5 connects the first hidden layer with the inputs of the miter.
As the first neuron (of index j = 0) in each layer is a bias node, Line 6 replaces the input fed from it by the constant 1. At Line 7, the output of the circuit is set to the given specification $\varphi_{risk}$ with proper variable renaming. Finally, Line 8 generates a propositional SAT formula in conjunctive normal form, and Line 9 checks for possible violations of the given BNN specification by solving the constructed SAT problem.
In the remainder of this section we describe some essential transformations and optimizations of the verification procedure in Algorithm 1 for solving larger-scale BNN verification problems.
Algorithm 1: Decision procedure for BNN verification problems.
Data: BNN network description (see Section 2) and an input-output specification $\varphi_{risk}$
Result: Whether there exists an input a such that $\varphi_{risk}(a, b)$ holds, where b is the output of the BNN for input a
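For very small networks, the question answered by Algorithm 1 can also be decided by brute-force enumeration, which is convenient as a reference when testing a miter-and-SAT pipeline; it quickly becomes infeasible, since it visits all $2^n$ inputs. The Python sketch below is not Algorithm 1 itself: it replaces the miter construction and the SAT call by explicit simulation in the Boolean domain. The layer representation and all names are hypothetical, and $\varphi_{risk}$ is passed as a Python predicate over the input bits and the bipolar output values.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: one list of 0/1 weights per neuron, index 0 = bias node.
Layer = List[List[int]]


def forward(layers: Sequence[Layer], a: Sequence[int]) -> List[int]:
    """Boolean-domain forward pass; returns the bipolar output values 2*count1 - n."""
    x = [1] + list(a)                                      # prepend the bias node
    for depth, layer in enumerate(layers):
        n = len(x)
        counts = [sum(1 - (w ^ v) for w, v in zip(row, x)) for row in layer]
        if depth == len(layers) - 1:
            return [2 * c - n for c in counts]             # output layer: no activation
        x = [1] + [1 if 2 * c >= n else 0 for c in counts]
    raise ValueError("the network needs at least one layer")


def find_violation(layers: Sequence[Layer], n_inputs: int,
                   phi_risk: Callable[[Tuple[int, ...], List[int]], bool]
                   ) -> Optional[Tuple[int, ...]]:
    """Return an input a with phi_risk(a, b) true, or None if no input violates the spec."""
    for a in product((0, 1), repeat=n_inputs):             # 2**n_inputs candidates
        if phi_risk(a, forward(layers, a)):
            return a
    return None


# Two inputs, a hidden layer computing OR and AND, two output neurons.
hidden = [[1, 1, 1], [0, 1, 1]]
output = [[0, 1, 0], [1, 1, 1]]
print(find_violation([hidden, output], 2, lambda a, b: b[0] > 0))               # (0, 1)
print(find_violation([hidden, output], 2, lambda a, b: b[0] > 0 and b[1] > 1))  # None
```

In this toy network the first output is positive exactly when the two inputs differ, so the first query returns the counterexample (0, 1), while the second, stronger risk property has no violating input.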
XNOR optimization
The XNOR optimization reduces the number of bitwise XNOR operations in each layer. Based on Equation 1 and the computation $w^{(l)}_{ji} \oplus x^{(l-1)}_j$, the result is $x^{(l-1)}_j$ if $w^{(l)}_{ji}$ equals 1 and its negation if $w^{(l)}_{ji}$ equals 0; the operation is the same for every neuron in the same layer. Overall, there are $d^{(l)}$ XNOR computations involving $x^{(l-1)}_j$ (as each neuron of layer l needs to compute it once), but only two different results are possible.
Therefore, we describe a transformation for caching the results of these XNOR computations and for sharing these results among the nodes of each layer. For each layer l, two modules $\mathit{module}^{(l)}_{\oplus 1}$ and $\mathit{module}^{(l)}_{\oplus 0}$ are created: $\mathit{module}^{(l)}_{\oplus 1}$ performs a bitwise XNOR operation between $\mathbf{x}^{(l)}$ and a weight of all 1s, and $\mathit{module}^{(l)}_{\oplus 0}$ performs a bitwise XNOR operation between $\mathbf{x}^{(l)}$ and a weight of all 0s.
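Then, each neuron that consumes $x^{(l)}_j$ takes the shared output of $\mathit{module}^{(l)}_{\oplus 1}$ if its weight for $x^{(l)}_j$ is 1, and that of $\mathit{module}^{(l)}_{\oplus 0}$ otherwise. Altogether, for a fully connected BNN, the XNOR optimization step reduces the number of required XNOR operations in every layer l from $d^{(l)} \times d^{(l-1)}$ (cmp. Algorithm 1) to $2 \times d^{(l-1)}$.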
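The following Python sketch (hypothetical names; one XNOR call counted as one operation) contrasts the direct encoding, which performs one XNOR per neuron and input, with the shared variant that XNORs the previous layer's outputs once against all-1 and once against all-0 weights and then merely selects per weight bit. An exhaustive assertion confirms that both produce the same counts.

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    return 1 - (a ^ b)


def counts_naive(W, x):
    """One XNOR per (neuron, input) pair, as in the direct encoding."""
    ops, counts = 0, []
    for row in W:
        c = 0
        for w, v in zip(row, x):
            c += xnor(w, v)
            ops += 1
        counts.append(c)
    return counts, ops


def counts_shared(W, x):
    """XNOR x once with all-1 and once with all-0 weights, then select per weight bit."""
    with_ones = [xnor(1, v) for v in x]    # shared result for weights equal to 1 (= x)
    with_zeros = [xnor(0, v) for v in x]   # shared result for weights equal to 0 (= NOT x)
    counts = [sum(with_ones[j] if w else with_zeros[j] for j, w in enumerate(row))
              for row in W]
    return counts, 2 * len(x)


W = [[1, 0, 1, 1, 0], [0, 0, 1, 0, 1], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]
assert all(counts_naive(W, x)[0] == counts_shared(W, x)[0]
           for x in product((0, 1), repeat=5))
print(counts_naive(W, [1, 0, 0, 1, 1])[1], counts_shared(W, [1, 0, 0, 1, 1])[1])  # 20 10
```

In hardware the all-1 branch is just the identity and the all-0 branch a negation, so the shared modules amount to at most one inverter per input bit; the sketch keeps the XNOR form only to mirror the description.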
Counting optimization
Another essential optimization involves the identification and factoring of redundant counting units. We illustrate these concepts on the basis of the network in Figure 2. Notice that the network in Figure 3 […]

Fig. 3. Maximum counting factoring for the neural network in Fig. 2 (a), and the result of factoring in the generated miter (b).

Definition 2 (non-overlapping factorings). Two factorings $f_1 = (I_1, J_1)$ and $f_2 = (I_2, J_2)$ are non-overlapping when the following condition holds: if $(i_1, j_1) \in f_1$ and $(i_2, j_2) \in f_2$, then $i_1 \neq i_2$ or $j_1 \neq j_2$. In other words, the weights associated with $f_1$ and $f_2$ do not overlap.
Definition 3 (k-factoring optimization problem). The k-factoring optimization problem searches for a set F of k factorings $\{f_1, \dots, f_k\}$ such that any two factorings are non-overlapping and the total saving $sav(f_1) + \dots + sav(f_k)$ is maximal.
For the example in Fig. 3 , there are two non-overlapping factorings f 1 = ({1, 2}, {0, 2}) and f 2 = ({2, 3}, {1, 3, 4, 5}). {f 1 , f 2 } is also an optimal solution for the 2-factoring optimization problem, with the total saving being (2 − 1) · 2 + (2 − 1) · 4 = 6.
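A short Python sketch can check the numbers in this example. We take sav(f) = (|I| − 1) · |J|, the formula that also appears in the proof of Theorem 1, and read a factoring (I, J) as covering the (neuron, weight) pairs I × J; both readings are our assumptions about the notation, and the helper names are hypothetical.

```python
def saving(f):
    """sav(f) = (|I| - 1) * |J|: the count over J is done once instead of |I| times."""
    I, J = f
    return (len(I) - 1) * len(J)


def weight_pairs(f):
    """All (neuron, weight index) pairs covered by the factoring."""
    I, J = f
    return {(i, j) for i in I for j in J}


def non_overlapping(f1, f2) -> bool:
    """Definition 2: no (neuron, weight) pair is shared by the two factorings."""
    return weight_pairs(f1).isdisjoint(weight_pairs(f2))


f1 = ({1, 2}, {0, 2})
f2 = ({2, 3}, {1, 3, 4, 5})
print(non_overlapping(f1, f2), saving(f1) + saving(f2))   # True 6
```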
Even finding a single factoring $f_1$ with the overall maximum saving $sav(f_1)$ is computationally hard. This NP-hardness result is established by a reduction from the NP-complete problem of finding a maximum edge biclique in bipartite graphs [21].

Theorem 1 (Hardness of factoring optimization). The k-factoring optimization problem, even when k = 1, is NP-hard.
Proof. The proof proceeds by a polynomial reduction from the problem of finding a maximum edge biclique in bipartite graphs [21]. Given a bipartite graph G with vertex partitions $V_1$ and $V_2$, the reduction is defined as follows.

1. For $v_{1\alpha}$, the α-th element of $V_1$, create a neuron $n^{(l)}_\alpha$.
2. Create an additional neuron $n^{(l)}_\delta$.
3. For $v_{2\beta}$, the β-th element of $V_2$, create a neuron n […]
This construction can clearly be performed in polynomial time. Figure 4 illustrates the construction process. It is not difficult to observe that G has a maximum edge biclique {A; B} with κ edges iff the neural network at layer l has a factoring (I, J) whose saving equals (|I| − 1) · |J| = κ. The gray area in Figure 4 shows the structure of the maximum edge biclique {{1, 2}; {6, 8}}. For Figure 4-c, the saving is (|I| − 1) · 2 = (3 − 1) · 2 = 4, which is the same as the edge size of the biclique.

As factoring optimization is computationally hard, we present a polynomial-time heuristic algorithm (Algorithm 2) that finds factoring possibilities among neurons in layer l. The main function (lines 1 to 10) uses the set "used" (line 2) to track whether a pair (i, j) has already been used in some factoring. It then tries to find the best factoring for each neuron, where the factoring is stored as $f^{opt}_i$ (lines 3-4). […] $w_{ji}$ to be included. We again use the example in Fig. 3, where […] $w_{ji}$ should never be included. The entry (3, {1, 2, 3}) in Fig. 5-b is thus trimmed to (3, {1, 2}). Line 19 prepares the final return value $f_{ij}$, which is updated in line 26. Line 20 creates a list by sorting omap in ascending order of set size (as in Fig. 5-d). Lastly, lines 21-26 traverse the list; for each element $i_1$ (lines 21-22), the algorithm checks whether the set of $i_1$ can be reused as a factoring (olap.get($i_1$) ⊆ olap.get($i_2$)) by traversing the list after $i_1$ (line 24), and updates $f_{ij}$ if the generated saving is larger (lines 25-26). For the example in Fig. 5, the final returned factoring is thus $f_{10}$ = {{1, 2}, {0, 2, 3}} with $sav(f_{10}) = 4$.
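The following Python sketch is a simplified greedy search loosely modeled on the description of Algorithm 2: overlap sets per pair of neurons, sorted by size, subset checks to grow I, and a "used" set that keeps the factorings non-overlapping. It is not the authors' exact procedure (for instance, the trimming step and the partitioning into regions are omitted), and all names are hypothetical. Here `W[i][j]` stores the Boolean weight $w_{ji}$, and a factoring (I, J) is only formed where all neurons in I carry identical weights on J, so their counts over J coincide.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Factoring = Tuple[FrozenSet[int], FrozenSet[int]]   # (I: neuron indices, J: weight indices)


def saving(f: Factoring) -> int:
    I, J = f
    return (len(I) - 1) * len(J)


def best_factoring_for(i: int, W: Sequence[Sequence[int]],
                       used: Set[Tuple[int, int]]) -> Factoring:
    """Greedy search for a factoring that contains neuron i (W[i][j] stores w_ji)."""
    m = len(W[i])
    # olap[k]: weight positions where neurons i and k agree and neither pair is used yet
    olap = {k: frozenset(j for j in range(m)
                         if W[i][j] == W[k][j] and (i, j) not in used and (k, j) not in used)
            for k in range(len(W)) if k != i}
    best: Factoring = (frozenset(), frozenset())
    for i1 in sorted(olap, key=lambda k: len(olap[k])):      # ascending overlap size
        J = olap[i1]
        # every neuron whose overlap with i contains J can share the count over J
        I = frozenset([i, i1] + [k for k in olap if k != i1 and J <= olap[k]])
        if saving((I, J)) > saving(best):
            best = (I, J)
    return best


def greedy_factorings(W: Sequence[Sequence[int]]) -> List[Factoring]:
    """Pick at most one factoring per neuron; marking used pairs avoids overlaps."""
    used: Set[Tuple[int, int]] = set()
    result: List[Factoring] = []
    for i in range(len(W)):
        f = best_factoring_for(i, W, used)
        if saving(f) > 0:
            result.append(f)
            used.update((a, b) for a in f[0] for b in f[1])
    return result


W = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1]]
for I, J in greedy_factorings(W):
    print(sorted(I), sorted(J), saving((I, J)))
# [0, 1, 2] [0, 1, 2] 6
# [0, 2] [3] 1
```

Each call to `best_factoring_for` takes O(n² · m) time for n neurons and m weights, so the whole search stays polynomial; consistent with Theorem 1, it offers no optimality guarantee.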
When one encounters a huge number of neurons and long weight vectors, we further partition neurons and weights into smaller regions as input to Algorithm 2. By doing so, we find factoring possibilities for each weight segment of a neuron. In our implementation, the tool starts with a partition spanning 50 neurons and 100 weights, followed by another round of analysis spanning 100 neurons and 200 weights. We also fix the number of weights in a factored segment to 31, so that the output of a factored counting unit fits into 5 bits.
Fig. 6. Viewing a BNN as connecting two BNNs, where the output of $\mathrm{BNN}_1$ is the input of $\mathrm{BNN}_2$ (a), and the two decomposed SAT problems (b) and (c); S denotes the set of variables shared by the two SAT problems.
Structural decomposition for BNN verification
We now describe a localized constraint-solving technique which is based on the layered structure of BNNs. This method is restricted to specifications of the form $\varphi_{risk} = \varphi_{in} \wedge \varphi_{out}$, where $\varphi_{in}$ is a constraint over the inputs of a BNN and $\varphi_{out}$ is a constraint over the outputs of the BNN. In a first step, a BNN is split into subsequent BNNs for each hidden layer. For ease of explanation, however, we restrict ourselves in the following to splitting a BNN into two subsequent BNNs by performing a cut between two consecutive hidden layers (see Figure 6). The first BNN, denoted by $\mathrm{BNN}_1$ in Figure 6-a, takes as input the original inputs and outputs a set S of intermediate variables. We refer to S as the set of shared variables, because these are the inputs of the second network ($\mathrm{BNN}_2$ in Fig. 6-a), which has the same outputs as the original BNN. We denote a formula F in clausal normal form (CNF) as a set of clauses, and the two BNNs are encoded by CNFs $F_1$ and $F_2$, respectively, as described previously. $F_1$ (resp. $F_2$) contains the restrictions on the inputs (resp. outputs) of the original BNN that are given by the property to be verified. As $\mathrm{BNN}_1$ has an output size greater than one, a combinational miter cannot be constructed directly. We follow the standard approach of equi-satisfiability and introduce additional input variables and equality gates, based on the construction in Figure 6.
Based on this decomposition, Algorithm 3 determines whether the property $\varphi_{risk} = \varphi_{in} \wedge \varphi_{out}$ can ever be true. It first creates a SAT-solver instance for each CNF formula (lines 2-3). Initially, it checks whether $F_1$ is satisfiable (line 4). As $F_1$ is built from the neural network and the condition $\varphi_{in}$, unsatisfiability of $F_1$ implies that no input assignment satisfies $\varphi_{in}$; the risk property is then trivially unsatisfiable.
The main loop (lines 5-13) terminates when a definite answer is found or when it is aborted due to a timeout. First, the satisfiability of $F_2$ is checked. If $F_2$ is unsatisfiable (line 6), no input of $\mathrm{BNN}_2$ can violate the property, and the algorithm returns false. Otherwise, it proceeds with a satisfying assignment α of $F_2$ (line 8), where the assignments over the shared variables S are inputs to $\mathrm{BNN}_2$ that make $\varphi_{out}$ hold. We need to check whether $\mathrm{BNN}_1$ can produce such an assignment over S. This is done by fixing the shared variables S in $F_1$ to their values in α, adding a unit clause to $F_1$ for each s ∈ S (line 9).
If, after fixing the assignment over the shared variables, $\mathit{solver}_1$ returns SAT (line 10), then there exists an input for $\mathrm{BNN}_1$ that makes the risk property hold (by chaining the two partial results over the shared variables S). Thus the algorithm returns true.
Otherwise ($\mathit{solver}_1$.solve() = UNSAT), instead of merely preventing $\mathit{solver}_2$ from producing the same assignment and continuing the loop, the algorithm checks whether it can generalize the assignment to be excluded. This is achieved by learning a clause to be added to $F_2$ from a minimal unsatisfiable core (MUC) of $F_1$ (line 12). Importantly, the MUC contains at least one unit clause over the variables S, since $F_1$ without any fixed variables is known to be SAT (cf. line 4); thus the UNSAT result is due to the fixing of at least one shared variable. To this end, line 13 forbids a partial assignment by extracting the set of unit clauses over the shared variables (extract unit clause MUC($U_\alpha$, S)). Line 14 unfixes the variables in $F_1$, and the loop continues. The process is guaranteed to terminate and is complete since, in the worst case, line 13 prohibits the same assignment created by line 8.
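To illustrate the control flow of Algorithm 3, the following Python sketch implements the same loop over DIMACS-style clause lists. It makes two deliberate simplifications, both assumptions of this sketch: satisfiability is decided by brute-force enumeration instead of an incremental SAT solver, and the MUC is approximated by deletion-based minimization restricted to the fixed unit clauses over S, which is the part that line 13 extracts. Shared variables must carry the same number in $F_1$ and $F_2$; all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence

Clause = List[int]  # DIMACS-style literals: v or -v for a variable v >= 1


def _holds(model: Dict[int, bool], lit: int) -> bool:
    return model[abs(lit)] == (lit > 0)


def solve(clauses: Sequence[Clause], n_vars: int,
          assumptions: Sequence[int] = ()) -> Optional[Dict[int, bool]]:
    """Brute-force stand-in for a SAT call: returns a model, or None if UNSAT."""
    for bits in product((False, True), repeat=n_vars):
        model = {v + 1: bits[v] for v in range(n_vars)}
        if (all(_holds(model, lit) for lit in assumptions)
                and all(any(_holds(model, lit) for lit in c) for c in clauses)):
            return model
    return None


def shrink_core(f1: Sequence[Clause], n1: int, fixed: List[int]) -> List[int]:
    """Drop fixed shared literals one by one as long as F1 stays UNSAT."""
    core = list(fixed)
    for lit in fixed:
        trial = [x for x in core if x != lit]
        if solve(f1, n1, trial) is None:
            core = trial
    return core


def layerwise_check(f1: Sequence[Clause], n1: int, f2: Sequence[Clause], n2: int,
                    shared: Sequence[int]) -> bool:
    """True iff phi_risk = phi_in AND phi_out can hold (F1, F2 share only `shared`)."""
    if solve(f1, n1) is None:
        return False                                  # phi_in can never be met
    learned = [list(c) for c in f2]                   # F2 plus learned clauses
    while True:
        alpha = solve(learned, n2)
        if alpha is None:
            return False                              # no input of BNN2 satisfies phi_out
        fixed = [s if alpha[s] else -s for s in shared]   # unit clauses over S
        if solve(f1, n1, fixed) is not None:
            return True                               # BNN1 can produce alpha's values on S
        core = shrink_core(f1, n1, fixed)             # non-empty, because F1 alone is SAT
        learned.append([-lit for lit in core])        # forbid this partial assignment


# Toy instance. F1: variable 1 = input x, 2 = shared s, with s <-> x and phi_in = x.
# F2: variable 2 = s, 3 = output y, with y <-> NOT s (variable 1 is unused in F2).
F1 = [[-1, 2], [1, -2], [1]]
print(layerwise_check(F1, 2, [[-2, -3], [2, 3], [3]], 3, [2]))    # False (phi_out = y)
print(layerwise_check(F1, 2, [[-2, -3], [2, 3], [-3]], 3, [2]))   # True  (phi_out = NOT y)
```

Every learned clause is falsified by the current α, so each iteration excludes at least one further assignment over S; this is why the loop terminates.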
Implementation and Evaluation
We now describe experimental results for our BNN verification techniques and optimizations. For this purpose we have created a verification tool, which first reads a BNN description based on the Intel Nervana Neon framework. The tool then generates a combinational miter in Verilog and calls Yosys [27] and ABC [4] to generate a CNF formula. No further optimization commands (e.g., refactor) are executed inside ABC to create smaller CNFs. Finally, Cryptominisat5 [24] is used for solving SAT queries. To validate that the optimization algorithms are correctly implemented, we created randomized examples with 10 inputs and fed all $2^{10}$ input combinations into the Verilog testbench to compare the waveforms of the unoptimized and optimized versions. The experiments are conducted on an Ubuntu 16.04 Google Cloud VM equipped with 18 cores and 300 GB RAM, with Cryptominisat5 running with 16 threads.
For our experiments we use three different datasets, namely the MNIST dataset for digit recognition [16], the German traffic sign dataset [25], and a randomized dataset for evaluating the performance of factoring. We further binarize the grayscale data to ±1 before the actual training. For the traffic sign dataset, every pixel is quantized to 3 Boolean variables. Table 2 summarizes the experimental results for networks with 400 inputs and 600 neurons in hidden layers, where the binarized weights are randomly created. As can be observed from the data, for harder-to-verify problems our optimization techniques can sometimes create substantial savings in verification time.

Table 3 summarizes the results of verification in terms of SAT solving time, with a timeout of 2 hours. The properties that we use here are characteristics of a BNN given by numerical constraints over outputs, such as "simultaneously classify an image as a priority road sign and as a stop sign with high confidence". We gradually increase the numeric value of the confidence of the classification to create satisfiable as well as unsatisfiable properties for benchmarks. Clearly, the proposed optimization techniques are essential, first, to formally establish these kinds of safety properties (UNSAT), and second, to create counterexamples (SAT) for BNNs with thousands of inputs and neurons. However, we also observe that solvers like Cryptominisat5 can get trapped on some very hard-to-prove properties. Regarding the instances in Table 3 where the result is unknown, we suspect that the confidence value at which the property flips from satisfiable to unsatisfiable is around 60%, i.e., very close to the value chosen in these instances. This makes SAT solving extremely difficult in such cases, as the instances lie close to the "border" between SAT and UNSAT instances.
Related Work
There has been a flurry of recent results on formal verification of neural networks (e.g. [6, 8, 13, 18, 22]). These approaches usually target the formal verification of floating-point arithmetic neural networks (FPA-NNs). Huang et al. propose an (incomplete) search-based technique based on satisfiability modulo theories (SMT) solvers [11]. For FPA-NNs with ReLU activation functions, Katz et al. propose a modification of the Simplex algorithm which prefers fixing of binary variables [13]. This verification approach has been demonstrated on the verification of a collision avoidance system for UAVs. In our own previous work on neural network verification we establish maximum resilience bounds for FPA-NNs based on reductions to mixed-integer linear programming (MILP) problems [6]. The feasibility of this approach has been demonstrated, for example, by verifying a motion predictor in a highway overtaking scenario. The work of Ehlers [8] is based on sound abstractions, and approximates non-linear behavior in the activation functions. Scalability is the overarching challenge for these formal approaches to the verification of FPA-NNs. Case studies and experiments reported in the literature are usually restricted to the verification of FPA-NNs with a couple of hundred neurons.
To the best of our knowledge the work presented in this paper is the first to specifically target the formal verification of BNNs, and is also the first to use methods and tools from the hardware verification domain, and to develop specialized transformations and optimizations for verifying BNNs (with thousands of inputs and neurons). The compositional approach for layer-wise verification of (deep) neural networks as presented in this paper is also novel.
Researchers from the machine learning domain (e.g. [9, 10, 20]) target the generation of adversarial examples for debugging and retraining purposes. Adversarial examples are slightly perturbed inputs (such as images) which may fool a neural network into generating undesirable results (such as "wrong" classifications). Using satisfying assignments from the SAT solving stage of our verification procedure, we are also able to generate counterexamples to the BNN verification problem. Our work, however, goes well beyond current approaches to generating adversarial examples in that it does not only support debugging and retraining purposes. Instead, our verification algorithm establishes formal correctness results for neural network-like structures.
Conclusions
We solve the problem of verifying BNNs by reduction to the problem of verifying combinational circuits, which itself is reduced to solving SAT problems. Altogether, our experiments indicate that this hardware verification-centric approach, in connection with our BNN-specific transformations and optimizations, scales well to BNNs with thousands of inputs and nodes. This kind of scalability makes our verification approach attractive for automatically establishing correctness results, at least for moderately-sized BNNs as used on current embedded devices.
Our developments for efficiently encoding BNN verification problems might also prove useful for optimizing the forward evaluation of BNNs. In addition, our verification framework may also be used for debugging and retraining BNNs, for example by automatically generating adversarial inputs from failed verification attempts.
Our layer-wise verification algorithm exhibits promising initial experimental results. However, this compositional approach still needs to be thoroughly evaluated on deeper network structures in the future. It may also be interesting to generalize the layer-wise BNN verification algorithm to other feed-forward, and also recurrent, neural network structures.
In the future we also plan to directly synthesize propositional clauses without the support of third-party tools such as Yosys, in order to avoid extraneous transformations and repetitive work in the synthesis workflow. Similar optimizations of the current verification tool chain should result in substantial performance improvements. It might also be interesting to investigate incremental verification techniques for BNNs, since the weights and structure of these learning networks might adapt and change continuously.
Finally, our proposed verification workflow might be extended to synthesis problems, such as synthesizing bias terms in BNNs without sacrificing performance, or synthesizing weight assignments in a property-driven manner. These kinds of synthesis problems for BNNs can be reduced to 2QBF problems, which are satisfiability problems with a top-level exists-forall quantification. The main challenge for solving these kinds of synthesis problems for the typical networks encountered in practice is, again, scalability.
