Abstract-Recent research into artificial neural networks (ANN) has highlighted the potential of using compact analogue ANN hardware cores in embedded mobile devices, where the power consumption of ANN hardware is a very significant implementation issue. This paper proposes a learning mechanism suitable for low-power class AB type analogue ANNs. The algorithm is based on a variation of the weight perturbation [5] algorithms; it adds a penalty term for power consumption in the objective function. We have applied our algorithm on a sample class AB ANN described in [4] for various classification and function approximation tasks. The results of these experiments are discussed in this paper.
The compact size and low power dissipation of analogue ANN has made it an attractive choice for hardware and it has attracted considerable research efforts in recent years [1;2]. As specialized ANN hardware finds many potential applications in mobile embedded devices, the power consumption becomes a major issue [3]. Since shrinking biasing voltages makes it difficult to process high resolution data in voltage-mode, there has been increasing emphasis on the low power current mode (CM) implementation of the ANN [4], which gives better results at lower bias. The Class AB CM implementations are particularly attractive options as they remove the necessity to maintain large bias current levels (leading to very low power consumption) and allow the input signal magnitude to exceed the bias current (improving calculation precision) [3].

Techniques for this are known as complexity regularization, which aims to prevent the learning algorithm from over-fitting the training data by restricting the complexity of the ANN function. A popular approach is to include an additional penalty term in the cost function of learning algorithms, which penalizes overly high model complexity (also known as penalty term pruning) [6;7].

$$O(w) = E(w) + \lambda_c C(w) \qquad (1)$$

O(w) is the objective function that is to be minimized with respect to w, the vector of synaptic weights. E(w) is the error function, usually the Mean Squared Error (MSE) over the training samples. C(w) is the complexity penalty term. The regularization parameter $\lambda_c$ determines the influence of the complexity penalty.

$$O(w) = E(w) + \lambda_p P(w) \qquad (2)$$

where P(w) is the penalty term for power and $\lambda_p$ is the power regularization parameter.

[...] current i is the incoming current from the previous layer). The total power consumption of the ANN is assumed to be the sum of the total power consumption of all the multiplier and [...]

2) Proben1 benchmark dataset

We also carried out experiments on a number of problems available in the Proben1 benchmark datasets [9] (various real-life classification and approximation tasks). The experiments were performed on the 'pivot architecture' [9] for each problem with shortcut connections. The precision of all the calculations and weights was restricted to 0.001 to reflect the limited precision available in the analogue hardware, and the weights and calculation results were scaled and restricted to the interval [-10,10] to reflect the limited operating range of the transistor devices.

Table 1. Benchmark (further details of each benchmark problem can be found in [9]).

It can be seen from the table that our proposed approach achieves significant power reduction in a variety of complex problems without increasing error.

C. Observations

1) The power regularization parameter $\lambda_p$

The algorithm is quite sensitive to the value of $\lambda_p$ and it is difficult to tune. We have tried a number of different strategies to set $\lambda_p$. The most successful strategy amongst our experiments was to first train the network with $\lambda_p = 0$ to obtain the minimum validation error and then slowly increase $\lambda_p$. This strategy is similar to the complexity regularization strategies described in [7]. When the training was started with non-zero $\lambda_p$, the ANN was generally unable to reduce the error. This is not surprising, because initially, with $\lambda_p = 0$, learning has only a single objective, while after the power-aware learning is switched on, the network is attempting more complex multi-objective learning. For the problems attempted from Proben1 benchmarks, the ratio $T_{power}/T_{error}$ is typically [5] [6] [7] [8] [9] [10]

In this paper, we have proposed a novel power-aware learning mechanism for class AB analogue neural network VLSI which is suitable for on-chip implementation. Experiments on the standard Proben1 benchmark problems indicate that it is capable of significant power reduction over a wide range of problems. Key observations on training time, the regularization parameter and the issue of generalization were discussed. The proposed algorithm shows significant advantages over the other possible low-power training method, i.e. weight decay regularization. The implications of this work are that an on-chip implementation could lead to significant benefits for practical ANN applications in mobile embedded devices.

In class AB CM ANN design, lower weight values are likely to consume less power due to the direct relationship between signal levels and power consumption. Hence back propagation learning with a weight decay [6] complexity regularization mentioned in section 2.1 can also drive such circuits towards lower power consumption. With this in mind we need to consider the potential advantages of using the suggested power-aware weight perturbation in comparison with the weight decay regularization. Power-aware weight perturbation has several appealing aspects:

1) Weight decay using (1) is basically aimed at complexity reduction for improved generalisation and not at power reduction. The relation between P(w) and C(w) can
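As a minimal sketch of how an objective of the form O(w) = E(w) + λ_p P(w) might be minimized by weight perturbation, consider the Python fragment below. It is illustrative only: the single linear neuron, the power proxy `power_penalty` (mean of the summed |w_i x_i| terms), the function names, and the values of `delta` and `lr` are all assumptions made for this example. They are not the class AB circuit model of [4], and they are not the exact algorithm evaluated in this paper.

```python
def mse(weights, samples):
    """E(w): mean squared error of one linear neuron over (inputs, target) pairs."""
    total = 0.0
    for inputs, target in samples:
        output = sum(w * x for w, x in zip(weights, inputs))
        total += (output - target) ** 2
    return total / len(samples)


def power_penalty(weights, samples):
    """P(w): hypothetical power proxy, the mean over samples of sum_i |w_i * x_i|."""
    total = 0.0
    for inputs, _ in samples:
        total += sum(abs(w * x) for w, x in zip(weights, inputs))
    return total / len(samples)


def objective(weights, samples, lam_p):
    """O(w) = E(w) + lam_p * P(w)."""
    return mse(weights, samples) + lam_p * power_penalty(weights, samples)


def weight_perturbation_step(weights, samples, lam_p, delta=1e-3, lr=0.05):
    """One sweep: estimate dO/dw_i by a forward difference, then descend."""
    base = objective(weights, samples, lam_p)
    grads = []
    for i in range(len(weights)):
        perturbed = list(weights)
        perturbed[i] += delta  # perturb a single weight
        grads.append((objective(perturbed, samples, lam_p) - base) / delta)
    return [w - lr * g for w, g in zip(weights, grads)]


def train(samples, lam_p, sweeps=300):
    """Run repeated weight-perturbation sweeps starting from zero weights."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(sweeps):
        weights = weight_perturbation_step(weights, samples, lam_p)
    return weights


if __name__ == "__main__":
    data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
    for lam in (0.0, 0.5):
        w = train(data, lam)
        print(lam, w, mse(w, data), power_penalty(w, data))
```

In this toy setting, a larger `lam_p` trades some error for a smaller value of the power proxy, which is the qualitative behaviour a power penalty term is meant to produce. Weight perturbation needs only evaluations of the objective, not analytic gradients, which is one reason it is often considered for on-chip learning.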
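The two practical measures described here, restricting values to a precision of 0.001 within [-10, 10] and raising λ_p only after an initial error-only training phase, can be sketched as follows. This is a hedged illustration: the helper names `restrict` and `lambda_schedule`, the fixed warm-up length, and the linear ramp are assumptions for this example. The experiments switch on the power term once the minimum validation error has been reached, which a fixed epoch count only approximates.

```python
def restrict(value, step=0.001, low=-10.0, high=10.0):
    """Round a value to the nearest multiple of `step`, then clamp it to [low, high]."""
    quantized = round(value / step) * step
    return max(low, min(high, quantized))


def lambda_schedule(epoch, warmup_epochs, increment, lam_max):
    """Return lam_p for a given epoch.

    lam_p stays at 0 for `warmup_epochs` (error-only learning), then grows
    by `increment` per epoch until it reaches `lam_max`.
    """
    if epoch < warmup_epochs:
        return 0.0
    return min(lam_max, (epoch - warmup_epochs + 1) * increment)


if __name__ == "__main__":
    print([restrict(v) for v in (0.12345, 12.7, -25.0)])
    print([lambda_schedule(e, 5, 0.01, 0.05) for e in range(12)])
```

In a training loop, `restrict` would be applied to every weight and intermediate result after each update, and `lambda_schedule` would supply the λ_p used in the objective for that epoch.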
