Abstract-Since human beings have limited perceptual abilities, in many digital signal processing (DSP) applications, e.g., image and video processing, the outputs do not need to be computed accurately. Instead, they can be approximated so that the area, delay, and/or power dissipation of the design can be reduced. This paper presents an approximation algorithm, called AURA, for the multiplierless design of the constant matrix vector multiplication (CMVM) which is a ubiquitous operation in DSP systems. AURA aims to tune the constants such that the resulting matrix leads to a CMVM design which requires the fewest adders/subtractors, satisfying the given error constraints. This paper also introduces its modified version, called AURA-DC, which can reduce the delay of the CMVM operation with a small increase in the number of adders/subtractors. Experimental results show that the proposed algorithms yield significant reductions in the number of adders/subtractors with respect to the original realizations without violating the error constraints, and consequently, lead to CMVM designs with less area, delay, and power dissipation. Moreover, they can generate alternative CMVM designs under different error constraints, enabling a designer to choose the one that fits best in an application.
I. INTRODUCTION
The constant matrix vector multiplication (CMVM) operation realizes the multiplication of an $m \times n$ constant matrix $C$ by an $n \times 1$ variable vector $X$, i.e., $y_j = \sum_k c_{jk} x_k$, with $0 \le j \le m-1$ and $0 \le k \le n-1$. It appears in hybrid form finite impulse response filters [1] and linear DSP transforms, such as discrete cosine transforms (DCTs) and Hadamard and Reed-Muller transforms [2]. Depending on the size of the constant matrix and the values of constants, the hardware complexity of the CMVM design may be dominated by a large number of multipliers. Since the realization of a multiplier in hardware is expensive in terms of area and power dissipation and the constants of the matrix $C$ are determined beforehand, the CMVM operation is generally implemented using only shifts and adders/subtractors [2]. Note that shifts by a constant value can be realized using only wires, which represent no hardware cost. Thus, the CMVM problem is defined as finding a minimum number of adders/subtractors which realize the CMVM operation. Over the years, many efficient methods, whose main task is to maximize the sharing of common subexpressions, have been introduced for the CMVM problem [2]-[10].
Approximate computing refers to a class of methods that relax the requirement of exact equivalence between the specification and implementation of a computing system [11] . This relaxation allows trading the accuracy of numerical outputs for reductions in area, delay, or power dissipation of the design [12] . Research activities on approximate computing range from transistor level to algorithmic level [13] - [22] .
This paper introduces an efficient method, AURA, for the approximation of the CMVM operation under the shift-adds architecture. Given the original constant matrix $C$ and the error constraints to be satisfied, AURA aims to find an optimized matrix $C'$, where the total number of nonzero digits of constants is minimum. This choice is motivated by the observation that constants with a smaller number of nonzero digits lead to a CMVM design with a smaller number of adders/subtractors [13]. To this end, finding such a constant matrix is formulated as a 0-1 integer linear programming (ILP) problem. Since there may exist many such matrices, each leading to a CMVM design with a different number of adders/subtractors, in each iteration, AURA finds one of them and computes the number of occurrences of all possible 2-term subexpressions in the corresponding CMVM operation; the matrix with the highest value of this parameter is assigned to the optimized matrix $C'$. Finally, the shift-adds realization of the optimized CMVM operation based on $C'$ is found using the state-of-the-art method of [10], which considers the sharing of most common 2-term subexpressions. This paper also presents a modified version of AURA, AURA-DC, proposed to find a solution with the fewest adders/subtractors under a delay constraint given in terms of the number of adder-steps, which denotes the maximum number of operations in series. Experimental results indicate that significant reductions in the number of operations and adder-steps can be obtained without violating the error constraints. It is shown on the 8 × 8 DCT designs that the solutions of the proposed methods lead to significant reductions in area, delay, and power dissipation with respect to the original 8 × 8 DCT design, with a slight decrease of performance in image compression.
II. BACKGROUND
This section gives the background concepts related to the proposed algorithms and presents an overview of the algorithms used for the multiplierless design of the CMVM operation and of the approximation algorithms.
A. Matrix Norms
The errors in the constant matrix $C$ are measured using matrix norms. In the proposed algorithms, both norm-one (maximum column sum) and norm-infinity (maximum row sum), which are defined as $\|C\|_1 = \max_k \sum_j |c_{jk}|$ and $\|C\|_\infty = \max_j \sum_k |c_{jk}|$, respectively, are used. Thus, the norm-one and norm-infinity error constraints are respectively formed as $\|C - C'\|_1 \le \varepsilon_1$ and $\|C - C'\|_\infty \le \varepsilon_\infty$, where $\varepsilon_1, \varepsilon_\infty > 0$ are tolerable error values. Note that any other matrix norm can be easily adapted to the proposed algorithms.
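The two error constraints can be checked directly from these definitions. The following is a minimal sketch; the function names `norm_one`, `norm_inf`, and `within_error` are illustrative and do not come from the paper.

```python
def norm_one(M):
    """Norm-one: maximum absolute column sum of matrix M (list of rows)."""
    return max(sum(abs(row[k]) for row in M) for k in range(len(M[0])))


def norm_inf(M):
    """Norm-infinity: maximum absolute row sum of matrix M."""
    return max(sum(abs(v) for v in row) for row in M)


def within_error(C, C_opt, eps1, eps_inf):
    """True if C - C_opt satisfies both the norm-one and norm-infinity bounds."""
    diff = [[a - b for a, b in zip(r, s)] for r, s in zip(C, C_opt)]
    return norm_one(diff) <= eps1 and norm_inf(diff) <= eps_inf
```

For instance, `within_error([[23, 37], [11, 25]], [[24, 36], [12, 24]], 2, 2)` checks the optimized matrix used as the running example later in the paper against error bounds of 2.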
B. 0-1 ILP Problem
The 0-1 ILP problem is the optimization of a linear objective function subject to a set of linear constraints and is generally defined as follows¹:

$$\text{minimize} \quad w^T \cdot x \qquad (1)$$

$$\text{subject to} \quad A \cdot x \ge b, \quad x \in \{0,1\}^k \qquad (2)$$

In the objective function of the 0-1 ILP problem given in Eqn. 1, $w_i$ in $w$ is a weight value associated with each variable $x_i$, where $1 \le i \le k$ and $w \in \mathbb{Z}^k$. In Eqn. 2, $A \cdot x \ge b$ denotes a set of $j$ linear constraints, where $b \in \mathbb{Z}^j$ and $A \in \mathbb{Z}^j \times \mathbb{Z}^k$.
C. Multiplierless Design of the CMVM Operation
The linear transforms, which represent the CMVM operation, are obtained by multiplying each row of the constant matrix by the variable vector. A straightforward approach for the shift-adds design of the CMVM operation, called the digit-based recoding (DBR) technique [23], has two steps: i) define the constants under a number representation, e.g., binary or canonical signed digit (CSD)²; ii) for the nonzero digits in the representations of constants, shift the variables according to the digit positions and add/subtract the shifted variables with respect to the digit values.

¹ The minimization objective can be converted to a maximization objective by negating the objective function. Less-than-or-equal and equality constraints are respectively accommodated by the equivalences $A \cdot x \le b \Leftrightarrow -A \cdot x \ge -b$ and $A \cdot x = b \Leftrightarrow (A \cdot x \ge b) \wedge (A \cdot x \le b)$.

² An integer can be written in CSD using $j$ digits as $\sum_{i=0}^{j-1} d_i 2^i$, where $d_i \in \{1, 0, -1\}$ and no two nonzero digits are adjacent.
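As a simple example, consider the constant matrix $C = [23\ 37;\ 11\ 25]$ and the corresponding linear transforms $y_0 = 23x_0 + 37x_1$ and $y_1 = 11x_0 + 25x_1$. Their decompositions under CSD are given as follows:

$$y_0 = (x_0 \ll 5) - (x_0 \ll 3) - x_0 + (x_1 \ll 5) + (x_1 \ll 2) + x_1$$

$$y_1 = (x_0 \ll 4) - (x_0 \ll 2) - x_0 + (x_1 \ll 5) - (x_1 \ll 3) + x_1$$

requiring 10 operations, as shown in Fig. 1a.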
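The DBR operation count for this example can be reproduced with a short sketch. It uses the standard non-adjacent-form recoding to obtain CSD digits; the helper names `csd`, `dbr_terms`, and `dbr_operations` are illustrative and not taken from [23].

```python
def csd(n):
    """Return the CSD digits of integer n, least significant digit first."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)  # +1 when n = 1 (mod 4), -1 when n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def dbr_terms(row):
    """List the shifted terms (sign, variable index, shift) of one linear transform."""
    terms = []
    for k, c in enumerate(row):
        for shift, d in enumerate(csd(c)):
            if d:
                terms.append((d, k, shift))
    return terms


def dbr_operations(C):
    """Adders/subtractors needed by DBR: one fewer than the terms of each row."""
    return sum(max(len(dbr_terms(row)) - 1, 0) for row in C)
```

Each row of the example has six shifted terms, so DBR needs five adders/subtractors per row and ten in total, with no sharing between rows.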
The number of operations can be further reduced by sharing the common subexpressions. The common subexpression elimination (CSE) method of [3] finds all possible implementations of linear transforms by extracting only the 2-term subexpressions and formalizes the problem of maximizing the sharing of subexpressions as a 0-1 ILP problem. The exact CSE algorithm of [9] follows a similar approach, but considers all possible realizations of linear transforms. However, these methods can only be applied to CMVM instances with a small size of constant matrices due to the exponential growth in the size of 0-1 ILP problems. The CSE heuristics of [4] , [6] iteratively find the most common 2-term subexpression and replace it within the linear transforms. They differ in the selection of subexpressions that have the same number of occurrences. The CSE algorithm [2] iteratively searches a subexpression with the maximal number of terms and with at least 2 occurrences. The CSE heuristic of [8] chooses its subexpressions based on a cost value which is computed as the product of the number of terms in the subexpression and the number of its occurrences in the linear transforms. In turn, the algorithm of [5] initially computes the differences between each two linear transforms and determines their implementation cost values. Then, it uses a minimum spanning tree algorithm to find the realizations of linear transforms with differences, that have the minimum cost, and replaces the linear transforms with the required differences. The hybrid algorithm of [10] iteratively finds the most promising differences of linear transforms and applies an improved CSE heuristic to further reduce the complexity of the CMVM operation.
Returning to our example, the hybrid algorithm of [10], HCMVM, finds a solution with 6 operations when the CSD representation is used in its CSE algorithm, sharing the common subexpressions $x_0 + x_1$ and $3x_0 + 3x_1$ (Fig. 1b).
In many DSP systems, performance is a crucial parameter and circuit area is generally expandable in order to achieve a given performance target. Although the delay parameter is dependent on several implementation issues, such as placement and routing, the delay of a CMVM operation is generally considered in terms of the number of adder-steps. The minimum adder-steps of a linear transform $y_j$ is found in three steps: i) decompose its constants $c_{jk}$ under a given number representation; ii) find the total number of terms in its decomposed form $T(y_j)$, determined as $\sum_k S(c_{jk})$, where $S(c_{jk})$ denotes the number of nonzero digits of the constant $c_{jk}$ under a given number representation; iii) compute $\lceil \log_2 T(y_j) \rceil$ as if all its terms in the decomposed form were realized in a binary tree. Thus, the minimum adder-steps of the CMVM operation, $MAS_{CMVM}$, is the maximum of the minimum adder-steps of its linear transforms, i.e., $\max_j \{\lceil \log_2 T(y_j) \rceil\}$ [24]. The methods of [7], [10] aim to find the fewest adders/subtractors realizing the CMVM block and satisfying a delay constraint greater than or equal to $MAS_{CMVM}$.
For our example, the minimum adder-steps for $y_0$ and $y_1$ under the CSD representation is 3, and thus, $MAS_{CMVM}$ is computed as 3. The hybrid algorithm of [10], HCMVM-DC, finds a solution with 7 operations when the delay constraint is set to 3 and the CSD representation is used in its CSE algorithm, sharing the common subexpressions $3x_0$ and $x_0 - 33x_1$ (Fig. 1c). Observe that its solution has 1 more operation, but 2 fewer adder-steps compared to the solution of HCMVM (Fig. 1b).
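The three-step computation of the minimum adder-steps can be sketched as follows, assuming the CSD representation. The names `csd_nonzero` and `min_adder_steps` are illustrative; the ceiling of the base-2 logarithm is computed with integer arithmetic to avoid floating-point rounding.

```python
def csd_nonzero(n):
    """S(c): number of nonzero digits of integer n under CSD."""
    count = 0
    while n != 0:
        if n % 2:
            n -= 2 - (n % 4)  # remove a +1 or -1 digit
            count += 1
        n //= 2
    return count


def min_adder_steps(C):
    """MAS of the CMVM operation: max over rows of ceil(log2(T(y_j)))."""
    steps = []
    for row in C:
        t = max(sum(csd_nonzero(c) for c in row), 1)  # T(y_j), at least one term
        steps.append((t - 1).bit_length())            # equals ceil(log2(t)) for t >= 1
    return max(steps)
```

For the example matrix, each row has $T(y_j) = 6$ terms, giving $\lceil \log_2 6 \rceil = 3$ adder-steps.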
D. Approximate Computing Methods for DSP Systems
In the last two decades, many design techniques, circuits, and algorithms have been introduced for approximate computing. The reader is referred to [18]-[20] for detailed surveys on approximate, stochastic, and probabilistic computing. The approaches for the design of approximate DSP systems can generally be grouped into three categories: i) transistor-level, ii) gate-level, and iii) algorithmic-level. At the transistor level, the stochastic behavior of a binary switch under the influence of thermal noise is exploited and a probabilistic CMOS transistor model is used to realize first the arithmetic circuits, and then, the DSP systems [20]. At the gate level, approximate adders and multipliers are used to implement the DSP systems [21], [22]. At the algorithmic level, design methodologies have been introduced for approximate DSP systems [13]-[17].
To the best of our knowledge, the proposed algorithms are the only methods that realize the approximation of the CMVM operation under the shift-adds architecture, considering the sharing of common subexpressions and satisfying the error constraints. Similar to the proposed algorithms, multiplier-free approximation of linear transforms was considered in [13] , [14] , but the constants are approximated to the nearest integer power of two without sharing of common subexpressions.
III. PROPOSED APPROXIMATION ALGORITHMS
Under a tolerable error at the outputs of the CMVM operation, the constants of $C$ can be changed such that the resulting (optimized) matrix $C'$ leads to a CMVM design with the fewest adders/subtractors or adder-steps. Note that a matrix with constants including a small number of nonzero digits yields a CMVM design with a small number of operations [13]. Hence, the proposed algorithms find a constant matrix with a minimum total number of nonzero digits of constants under the given number representation, denoted as $N(C)$ and computed as $\sum_j \sum_k S(c_{jk})$, satisfying the error constraints. In both AURA and AURA-DC, this problem is formulated as a 0-1 ILP problem. Observe that there may exist many matrices with the minimum $N(C)$ value. Also, the hybrid algorithms of [10], which consider the sharing of the most common 2-term subexpressions, are used to find the shift-adds design of the optimized CMVM operation in the proposed algorithms. Hence, both AURA and AURA-DC find more than one possible matrix with the minimum $N(C)$ value, compute for each matrix the total number of occurrences of all possible 2-term subexpressions, denoted as $O(C)$, and return the one with the maximum $O(C)$ value as the optimized matrix $C'$. These steps are described in detail in the following two subsections using the constant matrix $C = [23\ 37;\ 11\ 25]$ as an example when $\varepsilon_1$ and $\varepsilon_\infty$ are 2.

A. Details of AURA

For each entry $c_{jk}$ of $C$, $R_{jk}$ denotes the array of candidate constants in the range $[c_{jk} - r, c_{jk} + r]$, and a 0-1 variable $v_{R_{jk}^{i}@jk}$ indicates whether the $i$th constant of $R_{jk}$ is chosen for that entry. The constraints of the 0-1 ILP problem are generated in three steps.

First, for each entry in $C$, a constraint is generated to ensure that only one constant is chosen from $R_{jk}$.

Second, the norm-one error constraint is formulated by generating a constraint for each column of $C$.

Third, the norm-infinity error constraint is formulated by generating a constraint for each row of $C$.

After the 0-1 ILP problem is generated, a generic 0-1 ILP solver is applied to find a solution. Each entry of the new constant matrix is determined by finding the variables $v_{R_{jk}^{i}@jk}$ set to 1 by the 0-1 ILP solver. For our example given in Section II-C, the constant matrix [24 36; 10 24] is found when $r = 4$ and the CSD representation is used. Note that while the $N(C)$ value of this matrix is 8, the $N(C)$ value of the original matrix is 12.
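The ILP model described above can be written out explicitly. The following is a sketch consistent with the text rather than a verbatim copy of the authors' formulation: it assumes that the objective weights each variable by the number of nonzero digits $S(R_{jk}^{i})$ of its candidate constant and that the error contributed by a candidate is $|c_{jk} - R_{jk}^{i}|$.

```latex
\begin{align}
\text{minimize}\quad & \sum_{j}\sum_{k}\sum_{i} S(R_{jk}^{i})\, v_{R_{jk}^{i}@jk} \\
\text{subject to}\quad
 & \sum_{i} v_{R_{jk}^{i}@jk} = 1 && \text{for each entry } (j,k) \\
 & \sum_{j}\sum_{i} |c_{jk} - R_{jk}^{i}|\, v_{R_{jk}^{i}@jk} \le \varepsilon_1 && \text{for each column } k \\
 & \sum_{k}\sum_{i} |c_{jk} - R_{jk}^{i}|\, v_{R_{jk}^{i}@jk} \le \varepsilon_\infty && \text{for each row } j
\end{align}
```

Bounding every column sum by $\varepsilon_1$ is equivalent to bounding the maximum column sum, i.e., the norm-one of $C - C'$, and likewise for rows and the norm-infinity. This form yields $mn + n + m$ constraints, which agrees with the count given in Section III-C.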
Then, the corresponding linear transforms are obtained and the $O(C)$ value of this matrix is computed.
In order to find another matrix with the minimum $N(C)$ value, the solution of the 0-1 ILP solver is turned into a constraint, indicating that it should not be found again. Suppose that a constant was chosen from each array of $R_{11}, R_{12}, \ldots, R_{mn}$ for each entry of the matrix and these constants were denoted as $R_{11}^{s}, R_{12}^{s}, \ldots, R_{mn}^{s}$, where $s$ stands for the index of the constant selected from each array. Thus, a constraint excluding this combination of constants is generated and added to the 0-1 ILP problem. This process iterates until a total of $ni$ constant matrices are considered, where $ni$ denotes the number of iterations, or a constant matrix with an $N(C)$ value greater than the minimum is obtained. The constant matrix with the maximum $O(C)$ value is determined to be the optimized matrix $C'$. For our example, $C'$ is found as [24 36; 12 24].
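One standard way to exclude a previously found 0-1 solution, which may or may not match the authors' exact constraint, is to require that at least one of the $mn$ selected variables changes its value:

```latex
\sum_{j=1}^{m}\sum_{k=1}^{n} v_{R_{jk}^{s}@jk} \le mn - 1
```

Since exactly one variable per entry is set to 1, this inequality is violated only by the excluded matrix and leaves every other selection feasible. One such constraint is added per iteration, which is consistent with the $ni - 1$ additional constraints counted in Section III-C.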
Finally, the shift-adds design of the CMVM operation based on C ′ is found using HCMVM. For our example, the solution of AURA has 4 operations and 3 adder-steps, as shown in Fig. 1d . We note that the solution of HCMVM on [24 36; 10 24] , i.e., the solution of AURA in the first iteration, has 5 operations.
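The selection of $C'$ for the 2 × 2 example can be reproduced with a brute-force sketch. Two assumptions are made loudly here: exhaustive enumeration stands in for the 0-1 ILP solver and its iterated exclusion constraints, which is only practical for tiny matrices, and `pair_occurrences` is one plausible reading of $O(C)$ (2-term patterns compared up to a common shift and sign, counted when they occur at least twice), not the authors' definition. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product


def csd(n):
    """CSD digits of integer n, least significant digit first."""
    digits = []
    while n != 0:
        d = 2 - (n % 4) if n % 2 else 0
        n -= d
        digits.append(d)
        n //= 2
    return digits


def nonzero_digits(C):
    """N(C): total number of nonzero CSD digits over all constants."""
    return sum(1 for row in C for c in row for d in csd(c) if d)


def feasible(C, Cp, eps1, eps_inf):
    """Check the norm-one (columns) and norm-infinity (rows) error bounds."""
    err = [[abs(a - b) for a, b in zip(r, s)] for r, s in zip(C, Cp)]
    return (all(sum(col) <= eps1 for col in zip(*err))
            and all(sum(row) <= eps_inf for row in err))


def pair_occurrences(C):
    """Assumed O(C): total occurrences of 2-term patterns seen at least twice."""
    counts = Counter()
    for row in C:
        terms = sorted((k, s, d) for k, c in enumerate(row)
                       for s, d in enumerate(csd(c)) if d)
        for (k1, s1, d1), (k2, s2, d2) in combinations(terms, 2):
            base = min(s1, s2)  # normalize away the common shift
            counts[(k1, s1 - base, k2, s2 - base, d1 * d2)] += 1
    return sum(n for n in counts.values() if n >= 2)


def aura_sketch(C, eps1, eps_inf, r=4):
    """Return (min N, chosen matrix, all min-N matrices) by exhaustive search."""
    n = len(C[0])
    ranges = [range(c - r, c + r + 1) for row in C for c in row]
    best_n, best = None, []
    for flat in product(*ranges):
        Cp = [list(flat[i:i + n]) for i in range(0, len(flat), n)]
        if not feasible(C, Cp, eps1, eps_inf):
            continue
        nz = nonzero_digits(Cp)
        if best_n is None or nz < best_n:
            best_n, best = nz, [Cp]
        elif nz == best_n:
            best.append(Cp)
    return best_n, max(best, key=pair_occurrences), best
```

Calling `aura_sketch([[23, 37], [11, 25]], 2, 2)` enumerates the $9^4$ candidate matrices for $r = 4$. Under the assumed $O(C)$ measure, the two minimum-$N(C)$ matrices are the ones named in the text, and the one with more repeated 2-term patterns is kept.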
B. Details of AURA-DC
AURA-DC is introduced to approximate the CMVM operation to have the fewest number of operations under the minimum number of adder-steps, satisfying the error constraints. To do so, whenever a constant matrix is found, its $MAS_{CMVM}$ value is computed and the one with the minimum $MAS_{CMVM}$ value is favored. In case of equality of the minimum $MAS_{CMVM}$ values, the one with the maximum $O(C)$ value is preferred. Also, the shift-adds design of the CMVM operation based on $C'$ is obtained using HCMVM-DC [10] when the delay constraint is set to its $MAS_{CMVM}$ value. For our example, AURA-DC finds the same optimized matrix as AURA, i.e., [24 36; 12 24]. However, its multiplierless design has 5 operations and 2 adder-steps, as shown in Fig. 1e. Compared to the solution of AURA (Fig. 1d), it has 1 more operation, but 1 fewer adder-step. Observe that AURA and AURA-DC may find the same optimized matrix, but exploit different realizations.

TABLE I
NUMBER OF VARIABLES AND CONSTRAINTS OF THE 0-1 ILP PROBLEM FOR n × n MATRICES.

n     Variables                      Constraints
      r=4      r=8      r=16         ni=2n    ni=4n    ni=8n
2     36       68       132          11       15       23
4     144      272      528          31       39       55
8     576      1088     2112         95       111      143
16    2304     4352     8448         319      351      415
32    9216     17408    33792        1151     1215     1343
64    36864    69632    135168       4351     4479     4735
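The preference rule of AURA-DC amounts to a lexicographic comparison. The sketch below assumes that each candidate matrix has already been annotated with its $MAS_{CMVM}$ and $O(C)$ values; the function name and the tuple layout are illustrative.

```python
def select_aura_dc(candidates):
    """Pick the matrix with the smallest MAS; break ties by the largest O(C).

    candidates: iterable of (matrix, mas, o_value) tuples.
    """
    # Negating O(C) lets a single min() handle both criteria.
    return min(candidates, key=lambda c: (c[1], -c[2]))[0]
```

With candidates `("A", 3, 10)`, `("B", 2, 4)`, and `("C", 2, 6)`, the rule discards `A` because of its larger adder-step count and then prefers `C` over `B` because of its larger $O(C)$ value.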
C. Complexity of the Proposed Algorithms
Given the parameter $r$, the range of possible constants to be considered for each entry of the matrix is determined as $[c_{jk} - r, c_{jk} + r]$. Thus, the 0-1 ILP problem has $(2r+1)mn$ variables. In the proposed algorithms, $r$ is set to 4. In turn, the number of constraints in the 0-1 ILP problem generated in the first iteration is $mn + m + n$. Given the number of iterations $ni$, the number of constraints in the 0-1 ILP problem generated in the last iteration is $mn + m + n + ni - 1$, if the proposed algorithms are not terminated in an earlier iteration. In the proposed algorithms, $ni$ is set to $m + n$. Observe that $r$ and $ni$ have an effect only on the number of variables and constraints, respectively. However, the length of a constraint, i.e., the number of terms, is directly proportional to $r$. Table I presents the number of variables and constraints of the 0-1 ILP problem generated in the proposed algorithms for square matrices with different $r$ and $ni$ values. Note that the sizes of the 0-1 ILP problems in Table I are within the reach of generic ILP solvers and real-world DSP systems generally have $m$ and $n$ values less than or equal to 16.
Also, suppose that each linear transform $y_j$ has $t_j$ terms in its decomposed form. Thus, the number of 2-term subexpressions, to be searched for their occurrences, is $\sum_j t_j (t_j - 1)/2$ with $0 \le j \le m-1$. Note that for a 2-term subexpression extracted from the $j$th transform $y_j$, its occurrences are searched in $y_i$ with $j \le i \le m-1$, and the subexpressions whose occurrences have already been identified do not need to be considered.
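The problem sizes quoted above follow directly from the two formulas. A minimal sketch, with the illustrative function name `ilp_size`:

```python
def ilp_size(m, n, r, ni):
    """Return (variables, constraints in the last iteration) of the 0-1 ILP problem."""
    variables = (2 * r + 1) * m * n        # one variable per candidate constant
    constraints = m * n + m + n + ni - 1   # entry, column, row, and exclusion constraints
    return variables, constraints
```

For 8 × 8 matrices with the default parameters $r = 4$ and $ni = m + n = 16$, this gives 576 variables and 95 constraints.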
IV. EXPERIMENTAL RESULTS
This section presents the optimized results of the proposed algorithms on randomly generated constant matrices under different error constraints and compares them against the original results of these instances. It also introduces the results of original and optimized 8 × 8 DCTs, their results on image compression, and their synthesis results on application specific integrated circuit (ASIC) and field programmable gate arrays (FPGA) design platforms. We note that AURA and AURA-DC were written in MATLAB and run on a PC with Intel Xeon at 2.4GHz. Their solutions were found when the CSD representation was considered and SCIP2.0 [25] was used as a 0-1 ILP solver.
As the first experiment set, we used randomly generated $n \times n$ matrices with 8-bit constants, where $n$ ranges between 2 and 16, in steps of 2. For each group, there were 50 matrices, a total of 400. Fig. 2 presents the results of HCMVM (denoted as original) and AURA (denoted as optimized) on these constant matrices when $\varepsilon_1$ and $\varepsilon_\infty$ are $0.5n$, $n$, and $2n$. Note that the CPU time of AURA includes the CPU time of HCMVM to find the shift-adds design of the optimized CMVM operation.
Observe from Fig. 2 that as the tolerance on the output error increases, the number of adders/subtractors required to realize the CMVM operation decreases, since the number of alternative constants to be considered in an entry of the matrix increases. Note that the highest reductions in the number of operations between the original and optimized results are 10.7%, 18.7%, and 28.2% when $\varepsilon_1$ and $\varepsilon_\infty$ are $0.5n$, $n$, and $2n$, respectively. As the tolerance on the output error increases, the adder-step values of the CMVM operations also decrease. This is simply because the optimized matrix has the minimum $N(C)$ value under the given error constraints. For example, on 16 × 16 instances, while the average $N(C)$ value of the original matrices is 877.9, this value for the optimized matrices is 758.8, 691.3, and 671.4 when $\varepsilon_1$ and $\varepsilon_\infty$ are $0.5n$, $n$, and $2n$, respectively. Since the number of 2-term subexpressions to be considered in HCMVM decreases as the $N(C)$ value decreases, the run time of AURA decreases as the error tolerance increases and becomes smaller than the run time of HCMVM on the original matrices as $n$ increases.

Fig. 3 presents the results of HCMVM-DC (denoted as original) and AURA-DC (denoted as optimized) on the same constant matrices with the same error tolerances used for HCMVM and AURA in Fig. 2. In HCMVM-DC, the delay constraint was set to $MAS_{CMVM}$. Note that the CPU time of AURA-DC includes the CPU time of HCMVM-DC to find the shift-adds design of the optimized CMVM operation.
The observations given for the results in Fig. 2 are also valid for the results in Fig. 3. However, the solutions of HCMVM-DC and AURA-DC have adder-step values significantly smaller than those of the solutions of HCMVM and AURA, at the cost of a larger number of operations. Observe that the adder-step values of the original and optimized solutions are very close to each other, which is simply because both HCMVM-DC and AURA-DC target a solution with the minimum adder-steps of the CMVM operation. It is also observed on 10 × 10 instances that some optimized matrices lead to designs with a smaller number of adder-steps, but with a larger number of operations with respect to the designs found based on the original matrix. This is the reason why the reduction in the number of operations on these instances is smaller than on the others.

Table II presents the results of AURA obtained using different $r$ and $ni$ values on 8 × 8 matrices when $\varepsilon_1$ and $\varepsilon_\infty$ are 16. In this table, nzd denotes the average $N(C')$ value of the optimized matrices, oper and step stand respectively for the average number of operations and adder-steps of the CMVM designs, and cpu is the average run time of AURA in seconds.
Observe from Table II that as r increases, increasing the range of possible constants to be considered for each entry of the matrix, the number of required operations decreases, except when r and ni are 16. This is simply because the number of alternative constants increases, as r increases, reducing the N(C) value of the matrix. Note that the case when r and ni are 16 may occur, because there may exist many possible matrices with the minimum N(C) value and all of them cannot be considered in a given number of iterations, ni. Also, as ni increases, the number of operations decreases, because more matrices are considered. Observe that 1.2 operations on average are saved when r is 16 and ni is 64 with respect to the case when r is 4 and ni is 16, which are the actual parameters of AURA for these 8 × 8 matrices. However, this reduction comes with an almost 20× increase in the run time of AURA, which is due to the increase in ni and in the complexity of 0-1 ILP problems generated by AURA as shown in Table I . Since AURA targets the optimization of the number of operations, increasing the values of r and ni does not have a significant impact on the adder-step value of the CMVM design. Note that similar results were obtained when AURA-DC was used on the same instances with the same parameters.
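As the second experiment set, we used the 8 × 8 DCT matrix. Note that the $n \times n$ DCT matrix $D$ is an orthonormal matrix, i.e., $DD^T = I$, where $I$ is the identity matrix, and its entries $d_{jk}$ with $0 \le k \le n-1$ are determined as follows:

$$d_{jk} = \begin{cases} \sqrt{1/n}, & j = 0 \\ \sqrt{2/n}\, \cos\!\left(\dfrac{(2k+1)\, j\, \pi}{2n}\right), & 1 \le j \le n-1 \end{cases}$$

which, for $n = 8$, leads to the following form [22]:

$$D = \begin{bmatrix}
g & g & g & g & g & g & g & g \\
a & b & c & d & -d & -c & -b & -a \\
e & f & -f & -e & -e & -f & f & e \\
b & -d & -a & -c & c & a & d & -b \\
g & -g & -g & g & g & -g & -g & g \\
c & -a & d & b & -b & -d & a & -c \\
f & -e & e & -f & -f & e & -e & f \\
d & -c & b & -a & a & -b & c & -d
\end{bmatrix}$$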
The original DCT matrix was obtained by converting the floating-point constants to integers using a quantization value of 8. Its coefficients are given in Table III. The proposed algorithms are applied to the original DCT matrix when $\varepsilon_1$ and $\varepsilon_\infty$ are 4, 8, and 16. We note that the proposed methods were modified to respect the same change in the same constants of the DCT matrix to guarantee that the optimized DCT has the same matrix form as the original DCT. [...] Table V.

TABLE III
VALUES OF DCT COEFFICIENTS.

Constant   Original   Optimized    Optimized    Optimized
                      ε1=ε∞=4      ε1=ε∞=8      ε1=ε∞=16
a          126        126          128          128
b          106        106          106          104
c          71         72           72           68
d          25         24           24           24
e          118        118          118          120
f          49         48           48           48
g          91         91           92           92

TABLE IV (PSNR values of the compressed images)

These DCTs are applied to the compression of three well-known gray-scale images, namely, cheese, cameraman, and lena, whose sizes are 128×128, 256×256, and 512×512, respectively. The peak signal to noise ratio (PSNR) values of the compressed images are given in Table IV. Observe from Table IV that as the tolerable error is increased, the PSNR value of the compressed image tends to decrease. However, the PSNR values of the images compressed by the optimized DCT matrix are very close to or the same as the PSNR values of the images compressed by the original DCT matrix. Fig. 4 shows the cameraman image and its compressed versions using the original and optimized DCT matrices.

Table V presents the synthesis results of these DCT matrices on ASIC and FPGA design platforms. The DCTs were described in VHDL when the bitwidth of the input variables was 8. For the ASIC design, the Synopsys Design Compiler was used with the UMCLogic 0.18-μm Generic II library. For the FPGA design, the Xilinx ISE Design Suite 13.1 was used with the Virtex 6 xc6vlx75T-2ff484 target device.
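The original coefficients in Table III can be reproduced from the orthonormal DCT definition. The sketch below assumes that "a quantization value 8" means scaling each entry by $2^8$ and rounding to the nearest integer; that reading is an inference from the table values, and the function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT matrix with entries d_jk (j: row, k: column)."""
    D = []
    for j in range(n):
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        D.append([scale * math.cos((2 * k + 1) * j * math.pi / (2 * n))
                  for k in range(n)])
    return D


def quantize(D, q):
    """Scale by 2**q and round to integers (assumed meaning of quantization value q)."""
    return [[round(v * 2 ** q) for v in row] for row in D]


Q = quantize(dct_matrix(8), 8)
for row in Q:
    print(row)
```

Under this assumption, the first row holds the constant g, the second row holds a, b, c, d and their negations, and the third row starts with e and f, matching the "Original" column of Table III.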
The functionality of linear transforms was verified on 10,000 randomly generated input signals in simulation, from which we obtained the switching activity information that was used by the synthesis tool to compute the power dissipation. In Table V, A, D, and P stand respectively for the area in mm², the delay of the critical path in ns, and the total dynamic power dissipation in mW. Also, LUTs and slices denote the number of look-up tables and slices, respectively. Note that the original shift-adds designs were found by applying HCMVM and HCMVM-DC to the original DCT matrix.
Observe from Table V that as the tolerable error increases, the number of required operations decreases, and thus, the CMVM designs require a smaller area on the ASIC design and a smaller number of LUTs and slices on the FPGA design than the DCT implementations based on the original coefficients. We note that the reductions in area between the original and optimized synthesis results obtained while targeting the optimization of the number of operations are 5.0%, 18.1%, and 22.0% when $\varepsilon_1$ and $\varepsilon_\infty$ are 4, 8, and 16, respectively. These values are 7.4%, 20.0%, and 25.7% when the optimization of the number of operations under a delay constraint is considered. Also, the reductions in the number of slices between the original and optimized synthesis results obtained while targeting the optimization of the number of operations are 5.0%, 15.6%, and 19.6% when $\varepsilon_1$ and $\varepsilon_\infty$ are 4, 8, and 16, respectively. These values are 11.9%, 22.4%, and 27.9% when the optimization of the number of operations under a delay constraint is considered. Moreover, the delay and power dissipation of the design decrease as the tolerable error increases due to the decrease in the hardware complexity of the DCT design. For the ASIC design, the reductions in the delay and power dissipation of the optimized DCT design with respect to those of the original DCT design reach up to 17.3% (17.4%) and 37.4% (39.4%), respectively, while targeting the optimization of the number of operations (the optimization of the number of operations under a delay constraint). For the FPGA design, the reductions in the delay and power dissipation of the optimized DCT design with respect to those of the original DCT design reach up to 15.2% (9.0%) and 9.7% (10.2%), respectively, while targeting the optimization of the number of operations (the optimization of the number of operations under a delay constraint).
However, the optimized DCT design may consume more power than the original DCT design, as observed on FPGA when $\varepsilon_1$ and $\varepsilon_\infty$ are 8 and the optimization of the number of operations under a delay constraint is targeted, since the primary objective of the proposed algorithms is not to optimize the power consumption. Observe from the results of DCT designs on FPGA that although finding a CMVM design with a smaller number of adder-steps may increase the number of required operations, and consequently, the area and the number of LUTs and slices, it yields a CMVM design that has less delay and consumes less power than that with a larger number of adder-steps. This fact is also valid for the DCT designs on ASIC for the delay parameter.
V. CONCLUSION
An efficient approximation algorithm was introduced for the multiplierless design of the CMVM operation, targeting the optimization of the number of operations without violating the error constraints. Its modified version, which can find an approximate CMVM design with a smaller number of adder-steps, was also presented. Experimental results showed that the proposed methods can significantly reduce the number of adders/subtractors and adder-steps in the CMVM operation, satisfying the error constraints. It was indicated that the solutions of the proposed algorithms lead to significant reductions in area, delay, and power dissipation of the CMVM designs. It was shown that another advantage of the proposed techniques is to offer a designer alternative CMVM circuits with different complexity and performance values, which can be found by changing the error constraints, so that the designer can choose the one that best fits an application.
VI. ACKNOWLEDGMENT
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013.
