ABSTRACT Physical systems may be implemented in both hardware and software. The hardware part is based on implementing appropriate logic functions in FPGA structures connected with the physical layer of cyber-physical systems (CPSs). Effective technology mapping of these functions into FPGA structures is directly connected with logic synthesis. Thus, in the case of CPSs, we can talk about cyber-physical synthesis. The effectiveness of this synthesis is key from the point of view of designing CPSs. This paper focuses on methods that shorten the synthesis time of circuits implemented in FPGAs. The proposed synthesis methods stem from an original function representation that enables a quick search for appropriate decompositions. The paper presents a series of original methods: quick relocation of variables between bound and free sets, time-effective multilevel technology mapping of multi-output functions, and techniques for quickly assessing the efficiency of technology mapping that take non-disjoint decomposition into account. The methods are the basis of the algorithms implemented in the authors' synthesis tools dekBDD and MultiDec. The effectiveness of the proposed methods was confirmed experimentally. The main contribution of the paper in the area of cyber-physical synthesis is a set of effective methods for synthesizing the hardware layer of a CPS that uses FPGA circuits.
I. INTRODUCTION
Cyber-Physical Systems (CPS) are characterized by the lack of a clear border between the physical layer of a system and the implemented computing techniques. The physical layer usually includes a device or a set of devices able to carry out given functions. In CPS, the functionality of the physical layer includes control, measurement, communication and identification. Implementing computational algorithms in CPS is a very complex issue. As a rule, the devices in a CPS do not work alone but together with other devices. Thus, the notion of CPS infrastructure may be introduced. This approach makes it possible to use more effective and reliable computing techniques such as computing in 'a cloud'. A flexible approach to these computations raises the question of how the appropriate algorithms should be implemented in the devices that are part of the CPS infrastructure. In general, the algorithms may be implemented either in hardware (FPGA) or in software (embedded systems) [1]. Both approaches have a series of advantages, but they also have some drawbacks. In this situation, a good solution may be a mixed implementation: one part of the functions is carried out in a processor and the hardware part is carried out in the logic resources of an FPGA. A hardware processor together with an FPGA matrix is the core of circuits defined as SoC (System on Chip). SoC circuits cope well with the requirements of the equipment included in CPS, because the processor cores may be used to carry out communication tasks or to run an RTOS (Real-Time Operating System), while the logic resources of the FPGA are used to carry out time-critical tasks.
The associate editor coordinating the review of this manuscript and approving it for publication was Remigiusz Wisniewski.
Designing a system while taking into account both its programmable and hardware resources is referred to in the literature as hardware-software co-design. This idea was applied to CPS in [2], in which the author directs it at MPSoC (Multi-Processor System on Chip) and introduces the notion of cyber-physical synthesis. In general, it assumes that synthesis methods dedicated to SoC should give fast and reliable solutions. They should also limit the usage of the logic resources included in the SoC. In addition, CPSs often require power consumption to be minimized. Synthesis strategies may be directed at various goals, and synthesis itself is a complex, multilevel process. In this paper, the authors focus on the FPGA structure included in an SoC.
One of the key elements of logic synthesis directed at FPGAs is the decomposition of multi-output functions. It reflects, in a mathematical way, the partition of an implemented circuit between the configurable logic blocks (CLBs) of the FPGA structure. Decomposition is vital: its result should be a partition with a low number of logic levels (operating speed), low usage of FPGA logic resources and low power consumption, obtained in a short synthesis time. The original works on decomposition [3], [4] are the basis of synthesis tools from the 1990s such as Demain [5], Trade [6], BDDsyn [7] or LGsyn [8]. Later solutions such as Decomp [9] or IRMA2FPGA [10] also stem from the basic decomposition theorems.
It turns out that the way a logic function is represented is especially vital from the point of view of synthesis tools. Recently, AIG (And-Inverter Graph) networks and BDDs have gained popularity. Decomposition carried out as a partition of an AIG was implemented in the popular system ABC [11]. This tool is universal and especially flexible. It also enables resynthesis [12]-[14]. Using BDDs [15] in a synthesis process is well known from the literature [16]-[19]. BDS-PGA 2.0 is a good example of using BDDs for decomposition [20].
In general, synthesis tools follow directed synthesis strategies. The aim of these strategies is to obtain minimal structures with respect to the number of used logic blocks or the number of logic levels. In a strategy directed at minimizing the number of blocks, the goal is to use the resources of the FPGA effectively. One example is the ALMmap system [21]. In a strategy directed at limiting the number of logic levels, the aim is to obtain the fastest structures with respect to the propagation time of signals from inputs to outputs. Examples include systems such as DDBDD [22] and DAOmap [23].
According to the authors, an especially vital aspect of decomposition algorithms is their computational complexity, which is directly associated with synthesis time [24]. Despite its importance, this aspect is often omitted. The authors have decided to focus on this issue; therefore, the paper proposes a series of solutions that shorten decomposition time when a function is represented in the form of a BDD.
The main aim of the paper is to present various solutions improving time efficiency of decomposition algorithms of the functions described using BDD. The presented ideas are the result of long-term experiments conducted for various FPGA families.
Section II presents the theoretical basis of function decomposition using BDDs. Section III proposes new, time-efficient methods of moving variables between the appropriate sets in a BDD. Section IV presents a method known from the literature, based on quickly assessing the effectiveness of technology mapping using triangle tables, and proposes a modification of this method that takes non-disjoint decomposition into account. Section V focuses on implementing multi-output functions using PMTBDD graphs. Section VI presents two decomposition algorithms (dekBDD and MultiDec) that use the previously proposed methods. Section VII presents experimental results confirming the efficiency of the proposed solutions. The paper ends with a summary.
II. THEORETICAL BACKGROUND
The essence of logic function decomposition is an effective partition of a circuit into separate logic blocks of the FPGA. Decomposition divides the variables into a bound set Xb and a free set Xf [3], [4]. In the physical structure, a function decomposition corresponds to a partition of the circuit into a bound block and a free block. The key problem is to determine the number of connections between the bound block and the free block. From the point of view of decomposition theory, these connections correspond to so-called bound functions g. It is especially important to search for decompositions that minimize the number of bound functions p, since this usually limits the logic resources needed to implement the decomposed function. Bound functions are implemented in the bound block on the basis of the variables in Xb. The free block implements a free function that takes the variables from the set Xf together with the bound functions. In the case of a classic serial decomposition, both sets are disjoint: Xb ∩ Xf = Ø.
The theorem, referring to simple disjunctive decomposition, was formulated by Ashenhurst [3] and developed by Curtis [4] in the 60s. Relating this theorem to multi-output function in the form of BDD, it may be said that a function decomposition is associated with a horizontal cutting of a diagram [16] , [17] . A single serial decomposition (a single bound and a free block) may be carried out in two ways: using a method of a single cutting or a multiple cutting of a BDD diagram [25] . In the next part of the paper, the authors use the notion of BDD that should be understood as a reduced and ordered form of a BDD diagram (ROBDD) [17] .
Carrying out decomposition using the single cutting method partitions the diagram into two parts. Variables associated with the nodes placed above the cutting line belong to the bound set Xb. Variables associated with the nodes placed below the cutting line belong to the free set Xf. The number of bound functions p depends on the number of so-called cut nodes: ν(Xf | Xb) ≤ 2^p. Cut nodes are the nodes placed below the cutting line that are pointed to by edges from the top part of the diagram. The essence of decomposition using the single cutting method is shown in Fig. 1a.
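The relation between cut nodes and bound functions can be illustrated without a BDD package. The following is a minimal sketch, assuming that the number of cut nodes equals the number of distinct subfunctions obtained by fixing the bound variables (terminal nodes included); the helper names `cut_nodes` and `bound_functions` and the example function are hypothetical and are not taken from dekBDD or MultiDec.

```python
from itertools import product

def cut_nodes(f, n, bound):
    """Count the distinct subfunctions of f selected by the bound variables.

    f     : callable taking a list of n bits and returning 0 or 1
    bound : indices of the variables above the cutting line (Xb)
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        x = [0] * n
        for i, v in zip(bound, b_vals):
            x[i] = v
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            for i, v in zip(free, f_vals):
                x[i] = v
            column.append(f(x))
        columns.add(tuple(column))      # one column per bound assignment
    return len(columns)

def bound_functions(nu):
    """Smallest p such that nu <= 2**p."""
    return (nu - 1).bit_length()

# Hypothetical example function: f = x0*x1 + x2*x3
f = lambda x: (x[0] & x[1]) | (x[2] & x[3])
nu = cut_nodes(f, 4, [0, 1])   # bound set Xb = {x0, x1}
p = bound_functions(nu)
```

For this example the bound set {x0, x1} gives two cut nodes and therefore a single bound function, whereas the bound set {x0, x2} gives four cut nodes and requires two.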
In the case of carrying out decomposition using a multiple cutting method, a part of a diagram, placed between cutting lines, is associated with a bound set Xb (Fig. 1b) . The variables associated with the nodes, placed above the top cutting line, may be attached either to a free set or to a bound set (creating the next bound set). In order to determine the number of needed bound functions, it is necessary to determine so called complexity of a root table as is presented in the papers [26] , [27] .
A simple serial decomposition, carried out using the described methods, is the basis of more complex decomposition models such as iterative, multiple, and mixed decomposition [25] .
In both decomposition methods (single and multiple cutting), it is essential to select the levels on which the BDD is cut. The problem is associated with technology mapping into FPGA circuits whose logic blocks have given configuration capabilities. As a rule, the cutting levels depend on the number of inputs (k) of the configurable logic blocks. The essence of this technology mapping was shown in [28], [26].
The number of bound functions depends on the variable ordering in the BDD. The variable ordering influences the number of cut nodes and the column complexity of a root table. It is especially important that the variables are assigned to the appropriate sets. Figure 2 illustrates two variable orderings of the same function that lead to different numbers of cut nodes and, consequently, to different decompositions.
It is hard to give clear rules for ordering the variables in a BDD so as to obtain the minimum number of cut nodes (this is an NP-hard problem). Thus, it is necessary to search for the best solution by repeatedly changing the variable ordering. As a rule, the number of variables in a diagram is high. That is why fast methods of relocating variables in a BDD are needed to limit synthesis time.
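The influence of the assignment of variables to sets can be shown by a brute-force search over all bound sets of a given size. This is only an illustrative sketch on explicit truth tables, feasible for a handful of variables; it is not the swap-based search used by the authors, and the names `cut_nodes` and `best_bound_set` are hypothetical.

```python
from itertools import combinations, product

def cut_nodes(f, n, bound):
    """Number of distinct subfunctions of f over all bound assignments."""
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        x = [0] * n
        for i, v in zip(bound, b_vals):
            x[i] = v
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            for i, v in zip(free, f_vals):
                x[i] = v
            column.append(f(x))
        columns.add(tuple(column))
    return len(columns)

def best_bound_set(f, n, k):
    """Try every bound set of size k; return (bound_set, cut_nodes)."""
    candidates = ((bs, cut_nodes(f, n, bs)) for bs in combinations(range(n), k))
    return min(candidates, key=lambda t: t[1])

# Hypothetical example function: f = x0*x1 + x2*x3
f = lambda x: (x[0] & x[1]) | (x[2] & x[3])
best = best_bound_set(f, 4, 2)
```

The number of candidate bound sets grows combinatorially with the number of variables, which is why inexpensive relocation of single variables matters in practice.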
III. THE METHODS OF RELOCATING VARIABLES BETWEEN FREE AND BOUND SETS
A combinational circuit is usually described by a multi-output function rather than a single function. Implementing chosen functions jointly leads naturally to sharing of logic resources, which is advantageous from the point of view of the number of used logic blocks. In the initial stages of synthesis, it is hard to determine which functions should be implemented together and which separately. It may be noticed that not all functions depend on the same input variables. Initially, the functions may be partitioned into clusters in such a way that one cluster contains those functions whose set of common variables is the largest. In some cases it is better to implement a single function separately (in a separate cluster), because its set of variables differs substantially from those of the other functions. The clusters then undergo synthesis together with the functions they include. Synthesis conducted within a single cluster may fail; in this situation, the cluster has to be partitioned. The partition may be repeated until a cluster with a single, hard-to-decompose function is obtained. Such a function may be implemented using Shannon expansion [17].
Within a single cluster, the functions described in the form of a BDD require an initial variable ordering. It turns out that a good solution is to order the n variables in such a way that the nodes placed closest to the root of the BDD correspond to the variables on which a given multi-output function (or single function) depends most strongly, i.e. the variables that occur most often in its SOP form. In some cases, the ordering obtained after searching for an effective decomposition is very close to the initial one. The initial variable ordering is obtained by analyzing how frequently each variable appears in the sum-of-products expression.
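The frequency-based initial ordering can be sketched as follows. The sketch assumes that the SOP is given as PLA-style cubes, one character per variable with '-' marking a variable absent from the product term; the function name `initial_order` and the sample cubes are hypothetical.

```python
def initial_order(cubes):
    """Order variable indices by how often they occur in the SOP cubes.

    cubes : list of equal-length strings over '0', '1', '-'
    Returns indices sorted from most to least frequent; ties keep the
    original index order because Python's sort is stable.
    """
    n = len(cubes[0])
    freq = [sum(cube[i] != '-' for cube in cubes) for i in range(n)]
    return sorted(range(n), key=lambda i: -freq[i])

# Hypothetical four-variable SOP with four product terms.
cubes = ["--11", "-1-1", "1--1", "-0-0"]
order = initial_order(cubes)   # most frequent variable goes nearest the root
```

Here x3 occurs in all four cubes and x1 in two, so the sketch places x3 at the root, followed by x1, x0 and x2.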
Depending on the method of searching for a decomposition (single or multiple cutting), the idea of an effective search for a variable ordering in a BDD differs slightly. In both cases, it is essential to determine the levels at which the diagram is cut. This determines the sizes of the separate bound sets, card(Xb), and of the free set, card(Xf).
In well-known implementations of BDDs [29], a unique ID is assigned to each node in a diagram. This identifier determines the level on which a given node appears. The information common to all diagrams, such as the operation cache and index tables, is kept in a data structure called bdd_manager [30], [31]. All diagrams assigned to one bdd_manager have the same variable ordering so as to maintain canonicity. During decomposition, the diagrams of many functions with different variable orderings have to be stored simultaneously. A special approach was therefore used to enable effective transfer of nodes above or below a cutting line and to make it possible to operate on diagrams with various orderings.
In the case of the single cutting method, it was proposed to use more variable IDs in the BDD than the function has variables. It was assumed that the number of IDs is twice the number of variables of the function. As a rule, each variable has one ID assigned in a BDD. It turns out that, for effective transfer of variables above and below the cutting line (between the bound and the free set), it is worth following the rule that variables of the bound set are described by odd IDs and variables of the free set by even IDs. A single variable of the function is thus associated with two IDs (one above and one below the cutting line). Depending on the set to which the variable is currently assigned, only one of its IDs is 'active' at a given moment. Relocation between sets is carried out using the function bdd_swap [32]. From the point of view of decomposition, the order of variables within the top or the bottom part of the diagram does not matter. What matters is the set to which a variable belongs, and this may be easily changed.
Carrying out decomposition in a BDD makes it necessary to introduce additional variables representing the bound functions g. As with the primary function variables, two ID numbers (one in the top and one in the bottom part of the diagram) are associated with each bound function. These variables are relocated between sets in the same way as described above. The idea of transferring variables between sets in a BDD is illustrated in Fig. 3.
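The bookkeeping behind the two-ID scheme can be sketched in a few lines. The concrete numbering below (variable i owns the odd ID 2i-1 for the bound set and the even ID 2i for the free set) is an assumption made for illustration; the text only states that bound-set IDs are odd and free-set IDs are even. The class `SingleCutIds` is hypothetical and does not call any BDD package; the pair returned by `relocate` stands for the arguments that a swap routine such as bdd_swap would receive.

```python
class SingleCutIds:
    """Tracks which of the two IDs of each variable is currently active.

    Assumed numbering (not given in the text): variable i = 1..n owns the
    odd ID 2*i - 1 (bound set) and the even ID 2*i (free set).
    """

    def __init__(self, n):
        self.n = n
        self.bound = set()          # all variables start in the free set

    def active_id(self, i):
        """ID currently representing variable i."""
        return 2 * i - 1 if i in self.bound else 2 * i

    def relocate(self, i):
        """Move variable i to the other set; return (old_id, new_id)."""
        old = self.active_id(i)
        self.bound ^= {i}           # toggle membership in Xb
        return old, self.active_id(i)

ids = SingleCutIds(3)
moved = ids.relocate(2)             # x2 goes from the free set to the bound set
```

Because membership is encoded in the parity of the active ID, testing whether a variable lies above or below the cutting line needs no separate lookup table in this sketch.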
The idea shown above has to be modified for the multiple cutting method. In this case, the number of variable IDs taken into account in the BDD is n². This results from the need to consider a substantially higher number of orderings than in the single cutting method. Because the number of sets may be higher in this method, it is necessary to determine precisely which set includes a given variable (previously, it was enough to determine whether a variable is placed above or below the cutting line). In order to make the search for the best ordering easier, the BDD is partitioned into symbolic levels. Each level contains a number of nodes that are all associated with exactly one variable. Each level owns a set of n ID numbers, one for each of the n function variables, and exactly one of these numbers is active at a given moment. Depending on which variable has to be placed on a given level of the BDD, the ID associated with this variable is activated. Each variable therefore has an ID number on every level of the BDD. Relocating a variable changes its ID number into the ID that makes the variable appear on the required level. The essence of this method is presented in Fig. 4.
The original variable ordering, in which the variable x1 occurs above the variable x2, is shown in Fig. 4a. The variable x1 occurs on the first level. Thus, among the set of ID numbers associated with this level, ID = 1 is active and corresponds to x1 (marked with a continuous line on the left side of the BDD). The variable x2 occurs on the second level. Thus, among the set of ID numbers associated with this level, ID = n+2 is active and corresponds to x2. In order to obtain the variable ordering shown in Fig. 4b, the ID numbers corresponding to the variables x1 and x2 have to be changed. In this case, ID = 2, which corresponds to the variable x2, should be active on the first level. On the second level, ID = n+1, which corresponds to the variable x1, should be active.
Thus, the following expression (1) may be developed, which makes it possible to modify an ID number to obtain a desired ordering:
$$\mathrm{ID} = (\mathit{level} - 1) \cdot \mathit{numb\_of\_var} + \mathit{idx} \qquad (1)$$
where: level is the level on which the analyzed variable has to be placed in the BDD (numbered from 1, as in Fig. 4), numb_of_var is the number of variables placed in the BDD, and idx is the index of the variable to relocate (numbered from 1).
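The ID values quoted for Fig. 4 (ID = 1 and ID = n+2 before the swap, ID = 2 and ID = n+1 after it) are consistent with the mapping ID = (level - 1) · numb_of_var + idx, with levels and indices numbered from 1. The sketch below encodes this inferred mapping and its inverse; the function names `node_id` and `decode` are hypothetical.

```python
def node_id(level, idx, numb_of_var):
    """ID that places variable `idx` on `level` (both numbered from 1)."""
    return (level - 1) * numb_of_var + idx

def decode(id_value, numb_of_var):
    """Inverse mapping: recover (level, idx) from an ID."""
    level, idx = divmod(id_value - 1, numb_of_var)
    return level + 1, idx + 1

n = 4                                                  # hypothetical number of variables
before = (node_id(1, 1, n), node_id(2, 2, n))          # x1 on level 1, x2 on level 2
after = (node_id(1, 2, n), node_id(2, 1, n))           # x2 on level 1, x1 on level 2
```

With n = 4 the ordering of Fig. 4a corresponds to the active IDs 1 and n+2 = 6, and the ordering of Fig. 4b to the active IDs 2 and n+1 = 5.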
As a result of decomposition, further variables associated with bound functions may appear in the BDD. Each such variable is associated with a set of ID numbers whose size depends on the number of levels of the BDD obtained after decomposition. In accordance with the rule above, each ID number from this set is associated with a different level of the BDD. The idea of relocating the variables g (bound functions) is exactly the same as for the primary variables.
Both methods of changing variable orderings were implemented in academic synthesis tools. In the case of a single cutting method, it is dekBDD [28] . In the case of a multiple cutting method, it is MultiDec [26] .
IV. THE METHOD OF ASSESSING TECHNOLOGY MAPPING TAKING INTO ACCOUNT A NON-DISJOINT DECOMPOSITION
Logic blocks included in modern FPGA structures are flexible: the number of inputs of a logic block (k) can be configured. The number of inputs of a logic block is associated with the level at which the BDD is cut. Regardless of the decomposition method, the number of elements of each bound set, card(Xb), cannot be higher than the number of inputs of a logic block (k). Searching for the best decomposition path, which ensures the best multilevel mapping into blocks with given configuration capabilities (a selected number of inputs), is a very complex process. It turns out that searching for the best variable ordering in the BDD is not enough. It is also necessary to consider various values of card(Xb) (various cutting levels) to find the one for which the obtained structure is the smallest, i.e. uses the fewest resources of the programmable structure. The same ordering therefore has to be analyzed for the various cutting levels that result from the configuration capabilities of the available blocks. Thus, at each stage a technology mapping with a given variable ordering must be assessed quickly for all analyzed cutting levels.
Methods of assessing the efficiency of technology mapping δ using triangle tables are well known from the literature [27], [28]. In this method, the number of bound functions (numb_of_g) is the parameter that determines the 'quality' of a variable ordering. The number of elements of the bound set (it was assumed that card(Xb) = k) is associated with the possible configurations of the number of inputs of a logic block. These two parameters are associated with the rows and columns of a triangle table, respectively. The values of the mapping efficiency cofactor δ are placed in the cells of the table and take into account a third parameter, the number of blocks (numb_of_bl) needed to implement a given decomposition. Assessing the efficiency of mapping at a given decomposition stage consists in reading the value of δ for the pair of parameters (card(Xb), numb_of_g). Apart from advantages such as fast assessment of a decomposition, such triangle tables also have a drawback: they do not take so-called non-disjoint decomposition into account in the assessment [33], [34].
FIGURE 5. The set of tables describing a three-dimensional triangle table taking into account the size of the set Xs.
A non-disjoint decomposition is an extension of a simple serial decomposition in which some of the variables are attached to both the bound set Xb and the free set Xf (Xb ∩ Xf ≠ Ø). These variables may be regarded as a third set: the shared set Xs (Xb ∩ Xf = Xs). The variables in this set may replace some bound functions, which may reduce the logic structure associated with the bound block. Thus, a non-disjoint decomposition optimizes the obtained solutions with respect to the resources needed to implement them. Methods of searching for a non-disjoint decomposition were presented in [26], [35].
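How a shared variable can replace a bound function may be illustrated with a commonly used criterion: if a bound variable xs is also passed to the free block, the bound functions only need to distinguish the subfunctions that occur for xs = 0 and, separately, those that occur for xs = 1. The sketch below applies this criterion to explicit truth tables; it is an illustration under that assumption, not the search procedure of [26], [35], and the helper names and the example function are hypothetical.

```python
from itertools import product

def columns(f, n, bound, fixed=None):
    """Distinct subfunctions of f over the bound assignments that agree
    with `fixed` (a dict {variable index: value})."""
    fixed = fixed or {}
    free = [i for i in range(n) if i not in bound]
    seen = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        assign = dict(zip(bound, b_vals))
        if any(assign[i] != v for i, v in fixed.items()):
            continue
        x = [0] * n
        for i, v in assign.items():
            x[i] = v
        col = []
        for f_vals in product((0, 1), repeat=len(free)):
            for i, v in zip(free, f_vals):
                x[i] = v
            col.append(f(x))
        seen.add(tuple(col))
    return len(seen)

def needed(nu):
    """Smallest p such that nu <= 2**p."""
    return (nu - 1).bit_length()

def g_disjoint(f, n, bound):
    return needed(columns(f, n, bound))

def g_shared(f, n, bound, s):
    """Bound functions needed when bound variable s also feeds the free block."""
    return needed(max(columns(f, n, bound, {s: 0}),
                      columns(f, n, bound, {s: 1})))

# Hypothetical example: f = x1*x3 if x0 = 1, otherwise x2 + x3
f = lambda x: (x[1] & x[3]) if x[0] else (x[2] | x[3])
p_disjoint = g_disjoint(f, 4, [0, 1, 2])
p_shared = g_shared(f, 4, [0, 1, 2], 0)
```

In this example the disjoint decomposition has three cut nodes and needs two bound functions, while sharing x0 leaves one bound function, at the cost of one extra input of the free block.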
The authors propose a modification of the triangle table that takes non-disjoint decomposition into account [36]. For this purpose, an additional dimension is added to the triangle table that corresponds to the size of the shared set. In order to make this three-dimensional table more readable, it is shown in Fig. 5 as a set of tables associated with the respective values of the parameter card(Xs).
The cells of the tables presented in Fig. 5 contain the value of the cofactor δ for a given case. In general, the lower the value of the cofactor δ, the better the technology mapping. The value of the cofactor δ may be presented using the following expression (2):
It may be seen that the parameter numb_of_g does not occur in the expression above. This parameter is included in the parameter numb_of_bl. If the analyzed logic block has a fixed number of inputs (e.g. k = 5), numb_of_g equals numb_of_bl. In the case of Spartan 3 blocks [37], configurations are possible with either a single 5-input block (LUT5) or two 4-input blocks (2 × LUT4); in the latter case numb_of_g = 2 · numb_of_bl. The essence of using three-dimensional triangle tables to assess a technology mapping is illustrated in Fig. 6. Let us consider the three mappings presented in Fig. 6 and assume first that we use triangle tables which do not take non-disjoint decomposition into account (card(Xs) = 0). For the solution from Fig. 6a, two bound functions are necessary and δ = −1. Case 6b is better, because it contains a non-disjoint decomposition; however, this was not taken into account while assessing the efficiency of mapping, which is why originally δ = −3. The best solution is the mapping shown in Fig. 6c, where there is a single bound function without a non-disjoint decomposition. Its efficiency cofactor is also δ = −3. Thus, in the classic approach, cases b) and c) are assessed as equal. From the point of view of the bound block, this is true. For the free block, however, case c) is the best, because k−1 inputs remain available for the free function instead of k−2 as in case b). When a three-dimensional triangle table is used, cases b) and c) are distinguished. In case b), the table for card(Xs) = 1 is used to assess the efficiency of the mapping, which gives δ = −2.75. Thus, on the basis of the value of the cofactor δ, case b) is substantially better than case a) but slightly worse than case c), which is correct.
V. THE METHODS OF COMBINING FUNCTIONS INTO MULTI-OUTPUT FUNCTIONS
From the point of view of synthesis time, quick methods are needed for combining functions described using BDDs into multi-output functions. A new form of diagram, PMTBDD [28], was proposed. Combining diagrams makes it possible to use common bound blocks and to minimize the resources needed to implement a multi-output function. Figure 7 presents function diagrams and their implementation using common bound blocks. Figure 7a illustrates the diagrams of three functions f0, f1, f2. On the diagrams, a cutting line is marked below the third level of variables so as to match the implementation of a function to three-input LUT3 blocks. Each of the single functions has three cut nodes. In order to encode them, two functions g (two LUT3 blocks) are required for each of the functions f0, f1, f2. Overall, 9 LUT3 blocks are required to implement the functions separately. Figure 7b shows the MTBDD of the multi-output function f0, f1, f2. As can be seen, after the BDDs have been combined into one MTBDD, the number of cut nodes did not increase (it is still 3). In order to encode the cut nodes, two functions g (two LUT3 blocks) are enough. This time, however, the same two functions g may be used for each of the functions f0, f1, f2. Thanks to the bound block that is common to the three functions, the total number of LUT3 blocks decreased to 5.
The above observation is the basis of the algorithm that searches for common bound blocks. In the algorithm, single functions are combined into MTBDDs as long as the number of cut nodes does not increase. That is why a method is needed that makes it possible to combine and split diagrams quickly. For that purpose, a new type of diagram called Pseudo-MTBDD (PMTBDD) was introduced [25].
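The block counts quoted for Fig. 7 follow from simple arithmetic, sketched below under the assumption (consistent with the numbers in the text) that each of the m functions needs exactly one LUT for its free block and each bound function occupies one LUT. The function name `lut_count` is hypothetical.

```python
def lut_count(m, p, shared):
    """Total LUT blocks for m functions that each need p bound functions.

    shared=False : every function has its own bound block (m * p LUTs)
    shared=True  : one common bound block serves all functions (p LUTs)
    One LUT per function is assumed for the free block in both cases.
    """
    free_blocks = m
    bound_blocks = p if shared else m * p
    return bound_blocks + free_blocks

separate = lut_count(m=3, p=2, shared=False)   # Fig. 7a
common = lut_count(m=3, p=2, shared=True)      # Fig. 7b
```

For three functions with two bound functions each, the separate implementation needs 9 blocks and the common bound block reduces this to 5, as stated for Fig. 7.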
Additional variables were introduced in PMTBDD. Thanks to these variables, the operation of combining several function diagrams or multi-output functions is based on logic summing (bdd_or). Each leaf of the MTBDD of a multi-output function with m outputs, f_i : B^n → B, where i = 0, ..., m−1, is replaced in the PMTBDD with a subdiagram described using the following expression (3):
where f i is the value of a function for a determined path from the root to a leaf. And f i ' are additionally introduced variables. 
In order to merge the diagrams into one PMTBDD of a multi-output function, logic summing was done. Fig. 9b fully corresponds to the diagram from Fig. 7. The group of nodes described with the additional variables is interpreted as a leaf. For instance, following the path x0x1x2x3 = 0010 from the root of the MTBDD leads to the leaf f0f1f2 = 111. In the PMTBDD, following the same path leads to the subdiagram that represents the expression f0 + f1 + f2 and corresponds to the leaf f0f1f2 = 111. The difference is that PMTBDDs may be quickly merged and split.
Introducing additional variables in PMTBDD brings a series of advantages that make it possible to merge and split diagrams effectively:
1) no separate table of unique leaf identifiers is needed (in MTBDDs, a separate table with unique identifiers is necessary to preserve the canonicity of the form),
2) diagrams are merged quickly using the standard logic sum operation (bdd_or()),
3) diagrams are split using the standard functions bdd_restrict() or bdd_compose().
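The merge and split mechanism can be imitated on explicit sets of satisfying assignments. The sketch assumes that output f_i is encoded as the product of f_i and its additional variable f_i', so that a multi-output function is the logic sum of such products; this form is an assumption, since expression (3) is not reproduced here. Set union stands in for bdd_or and selecting a one-hot assignment of the additional variables stands in for bdd_restrict; the helper names are hypothetical and no BDD package is used.

```python
from itertools import product

def tag(onset, i, m):
    """Encode f_i AND f_i' as a set of satisfying (x, selector) pairs.

    onset : set of input tuples x for which f_i(x) = 1
    i, m  : index of the function and total number of functions
    """
    selectors = [s for s in product((0, 1), repeat=m) if s[i]]
    return {(x, s) for x in onset for s in selectors}

def merge(*tagged):
    """Logic sum of tagged functions (stand-in for bdd_or)."""
    return set().union(*tagged)

def split(merged, i, m):
    """Recover f_i by fixing f_i' = 1 and all other selectors to 0."""
    one_hot = tuple(int(j == i) for j in range(m))
    return {x for (x, s) in merged if s == one_hot}

# Hypothetical two-input functions given by their on-sets.
f0 = {(0, 0), (1, 1)}
f1 = {(1, 0)}
pm = merge(tag(f0, 0, 2), tag(f1, 1, 2))
```

Splitting `pm` with index 0 or 1 returns the original on-sets, which mirrors the property that a PMTBDD can be merged and split without a separate table of leaf identifiers.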
VI. SYNTHESIS ALGORITHMS
The previously presented methods made it possible to create two synthesis strategies. The first one is called dekBDD [25] and uses a single cutting line for decomposition. The second one is called MultiDec [27] and uses multiple cutting of a BDD for decomposition. The dekBDD synthesis strategy is described by algorithm 1 and the MultiDec strategy by algorithm 3.
A block scheme of algorithm 1 is shown in Fig. 10a. One of the parameters of algorithm 1 is the size of the bound set, card(Xb). The decomposition strategy for circuits based on LUT blocks with k inputs consists in calling the algorithm for card(Xb) = 2, ..., k and choosing the solution that has the lowest number of blocks according to the proposed triangle tables.
Algorithm 1 consists of three stages repeated in a double loop with changing indexes i, j:
1) a change of the variable order by swapping x_i and x_j,
2) matching functions into groups using quick merging and splitting (algorithm 2),
3) searching for a non-disjoint decomposition [36].
At the end, it is checked whether the mapping cofactor δ_tmp has a lower value than that of the previously found solutions, δ. The better solution is stored. Figure 10b presents the algorithm of matching functions into groups so as to use common bound blocks (algorithm 2). The essence of this algorithm is to merge functions into PMTBDDs, choosing the functions so that the number of cut nodes does not increase for a given variable ordering. Because merging too many functions into one multi-output function leads to a substantial increase in the number of required functions g_i (inefficient solutions), a parameter α was additionally introduced. It determines the maximum number of functions in one group. The algorithm operates on a list of functions F: f_0, ..., f_{m−1}. First, the list is sorted in ascending order of the number of cut nodes. Next, in a loop, after checking that the number of functions in the group would not exceed the parameter α, an attempt is made to attach the next function F[j] to the current function (or group) F[i]. For the newly created diagram T, the number of cut nodes is checked. The number of cut nodes itself (CutNodes(T)) is less important than the number of functions g_i needed to implement the bound block. Therefore, the number of cut nodes is compared with the number of cut nodes rounded up to the nearest power of 2 (RoundPow2CutNodes(F[i])). If the merge is accepted, F[i] is deleted from the list and F[j] is replaced with the newly created group T. In the next stages, attempts are made to attach further functions to the newly created group. Figure 11 illustrates algorithm 3, which describes the MultiDec strategy.
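The grouping step (algorithm 2) can be sketched on explicit truth tables. The sketch reads the acceptance test as CutNodes(T) ≤ RoundPow2CutNodes(F[i]), which is an interpretation of the description above; cut nodes of a group are counted as distinct vectors of subfunctions, and plain Python lists stand in for PMTBDDs. All helper names and the example functions are hypothetical.

```python
from itertools import product

def cut_nodes(group, n, bound):
    """Distinct subfunction vectors of a group of functions for bound set Xb."""
    free = [i for i in range(n) if i not in bound]
    cols = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        x = [0] * n
        for i, v in zip(bound, b_vals):
            x[i] = v
        col = []
        for f_vals in product((0, 1), repeat=len(free)):
            for i, v in zip(free, f_vals):
                x[i] = v
            col.append(tuple(f(x) for f in group))
        cols.add(tuple(col))
    return len(cols)

def round_pow2(v):
    """Round v up to the nearest power of two."""
    return 1 << (v - 1).bit_length()

def group_functions(funcs, n, bound, alpha):
    """Greedy grouping: merge while the number of bound functions stays put."""
    groups = sorted(([f] for f in funcs), key=lambda g: cut_nodes(g, n, bound))
    i = 0
    while i < len(groups):
        merged = False
        for j in range(i + 1, len(groups)):
            if len(groups[i]) + len(groups[j]) > alpha:
                continue                      # group would exceed alpha
            t = groups[i] + groups[j]         # candidate merged group T
            if cut_nodes(t, n, bound) <= round_pow2(cut_nodes(groups[i], n, bound)):
                groups[j] = t                 # F[j] is replaced with T
                del groups[i]                 # F[i] is removed from the list
                merged = True
                break
        if not merged:
            i += 1
    return groups

# Hypothetical three-variable functions; bound set Xb = {x0, x1}.
f0 = lambda x: x[0] ^ x[1] ^ x[2]
f1 = lambda x: (x[0] ^ x[1]) & x[2]
f2 = lambda x: (x[0] & x[1]) | x[2]
groups = group_functions([f0, f1, f2], 3, [0, 1], alpha=3)
```

In this example f0 and f1 share the bound function x0 ⊕ x1 and are merged, while attaching f2 would raise the number of cut nodes from 2 to 3 and is rejected.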
Algorithm 3 (MultiDec) is very similar to Algorithm 1 (dekBDD). The main difference is that Algorithm 3 determines the set of acceptable positions for the variable xi in the BDD. A given variable may take only a precisely determined position, depending on whether it should belong to a bound set or to the free set. All options for xi are analyzed; they are indexed by the variable p. In addition, MultiDec analyzes xi over the range up to n (in dekBDD the range extends only to k), which makes it possible to change the ordering for all bound sets. In Algorithm 3, δtmp is determined taking all bound blocks into account.
Both algorithms (1 and 3) operate on a single cluster. Obviously, the functions must first be partitioned into separate clusters and an initial variable ordering must be chosen. It may turn out that dekBDD or MultiDec finds no solution for the analyzed cluster (Result = −1). In that case, it is necessary to divide the cluster or to carry out a decomposition based on Shannon's expansion. The whole methodology is given as Algorithm 4 and presented in Fig. 12.
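The fallback step relies on Shannon's expansion, f = x'·f(x=0) + x·f(x=1). The sketch below only illustrates this identity on small Boolean functions given as Python callables over 0/1 tuples; the helper names `cofactor`, `shannon_expand` and `equivalent` are assumptions made for the example and do not come from dekBDD or MultiDec.

```python
from itertools import product


def cofactor(f, var, value):
    """Restrict f by fixing input number `var` to `value`."""
    return lambda xs: f(xs[:var] + (value,) + xs[var + 1:])


def shannon_expand(f, var):
    """Rebuild f from its two cofactors with respect to input `var`."""
    f0 = cofactor(f, var, 0)
    f1 = cofactor(f, var, 1)
    # f = x' * f0 + x * f1, evaluated on 0/1 integers.
    return lambda xs: ((1 - xs[var]) & f0(xs)) | (xs[var] & f1(xs))


def equivalent(f, g, n):
    """Exhaustively compare two n-input functions."""
    return all(f(xs) == g(xs) for xs in product((0, 1), repeat=n))


# Example function of three inputs: x0*x1 + x2.
example = lambda xs: (xs[0] & xs[1]) | xs[2]
rebuilt = shannon_expand(example, 1)
```

Each cofactor depends on one variable fewer than the original function, which is why the expansion can always reduce a cluster for which no decomposition was found.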
VII. EXPERIMENTAL RESULTS
To demonstrate the efficiency of the proposed algorithms, a series of experiments was conducted on popular benchmarks [38] that describe combinational circuits in the pla format.
In the first series of experiments, the results obtained for dekBDD were compared with those obtained for MultiDec. They are shown in Table 1. The first column of results ("Blocks") gives the number of Spartan 3 [37] blocks needed to implement a circuit. The authors decided to map the circuits into blocks with a low number of inputs (k = 4 or k = 5) and good configurability. The aim was to obtain a substantial number of bound functions, which makes it possible to assess the effectiveness of the proposed methods, especially in the area of non-disjoint decomposition. The next columns give, respectively, the number of logic levels ("Levels"), the number of disjoint bound functions ("g_disjoint"), the number of non-disjoint bound functions ("g_nondisjoint"), the ratio of these values ("g_non / g_dis"), the number of calls to the swap function ("bdd swap") and the time in seconds.
Comparing the results obtained for dekBDD and MultiDec, it may be noticed that decomposition with MultiDec often fails. This may result from its substantially higher memory requirements. The last row of the table gives the sums only for the benchmarks for which the comparison was possible. dekBDD gives better results in terms of the number of blocks, but worse results in terms of the number of logic levels. The ratio of the number of non-disjoint bound functions to the number of disjoint bound functions shows that dekBDD finds many more non-disjoint decompositions, which results in a lower number of blocks. It turns out that non-disjoint decomposition is especially vital for effective technology mapping. For instance, in the results for dekBDD, there were only three benchmarks for which no non-disjoint decomposition could be found. For circuits with more than 300 blocks, the number of non-disjoint variables is at most 36 % of the number of functions g. In three cases, it is two or three times higher than the number of functions g (and LUT blocks). For 45 benchmarks, it is at least half of the number of functions g. This shows how much potential lies in non-disjoint decomposition. Comparing the two systems with respect to the number of calls to the swap function, it can be seen that the number is lower for MultiDec. Nevertheless, the synthesis time for MultiDec is over 120 times longer. It may be stated that the weak time optimization of the MultiDec algorithm is connected not with the number of swap operations but with other ancillary operations, such as creating and copying additional multi-dimensional tables and other structures. dekBDD is substantially better optimized with respect to time. It is worth determining how synthesis time depends on the number of calls to the swap function. This dependency is illustrated in the graph in Fig. 13.
Figure 13 presents the dependency between synthesis time and the number of operations that change the variable ordering (bdd_swap()). Apart from certain irregularities, the processing time grows linearly with the number of bdd_swap operations. This means that these operations have the biggest influence on total synthesis time. For short processing times, the points on the graph are grouped into vertical lines because of the low resolution of the time measurement, which is limited by the operating system. The irregular arrangement of the points results from the fact that the graph does not take into account the number of functions in a given benchmark; only the number of bdd_swap calls is considered.
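A roughly linear relation of this kind can be checked with an ordinary least-squares fit. The sketch below assumes that pairs (number of bdd_swap calls, synthesis time in seconds) are available; the function name `fit_line` is hypothetical and the sample points are synthetic placeholders, not measurements from Table 1 or Fig. 13.

```python
def fit_line(swaps, times):
    """Least-squares fit of times ~ slope * swaps + intercept."""
    n = len(swaps)
    mean_x = sum(swaps) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in swaps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(swaps, times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic placeholder points (NOT data from the experiments).
slope, intercept = fit_line([0, 1000, 2000], [0.0, 2.0, 4.0])
```

The slope estimates the average cost of a single swap operation, and a large intercept or large residuals would point to overheads that do not depend on the number of swaps.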
It is especially important to compare both algorithms with other academic tools. In the case of bound functions, such a comparison would be difficult, and in the case of the number of bdd_swap calls it would be almost impossible. It is possible, however, to compare the systems with respect to the number of blocks needed to implement a function, the number of logic levels, and synthesis time.
The dekBDD and MultiDec systems were compared with ABC [11]. For ABC, the experiments were conducted using two scripts: ABC_1 (strash; dch; if -K 5; mfs;) and ABC_2 (resyn; resyn2; if -K 5;), the latter being especially directed at resynthesis. The experiments were run on a computer with an Intel Core i5-3210M 2.5 GHz processor and 8 GB of RAM. The results are presented in Table 2, whose last columns additionally show results taken from the literature for IRMA2FPGA [10]. As far as synthesis time is concerned, it is hard to compare the obtained results reliably with results published several years earlier. For this reason, IRMA2FPGA is not taken into account in this comparison. Depending on the system, the number of blocks is given in Table 2 either as 'Blocks' (the number of Spartan 3 blocks) or as 'LUT5' (the number of blocks with k = 5). This juxtaposition is not fully reliable; it gives only a general outline of how efficiently the individual solutions use logic resources. The authors emphasize that the key element of the paper is the comparison of synthesis times, and that the block counts are presented only as ancillary information.
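The two ABC scripts can be assembled into command lines as sketched below. This is an assumption about how such runs might be scripted, not the authors' setup: it presumes an `abc` binary on the PATH, a loaded abc.rc file that defines the `resyn` and `resyn2` aliases, and the added `read_pla` and `print_stats` commands for loading the benchmark and reporting results. The helper name `abc_command` and the file name `example.pla` are hypothetical.

```python
# Script bodies as quoted in the text.
ABC_SCRIPTS = {
    "ABC_1": "strash; dch; if -K 5; mfs",
    "ABC_2": "resyn; resyn2; if -K 5",
}


def abc_command(pla_path, script_name, binary="abc"):
    """Build the argument list for one ABC run on a pla benchmark."""
    script = f"read_pla {pla_path}; {ABC_SCRIPTS[script_name]}; print_stats"
    # `-c` passes a command string to ABC for batch execution.
    return [binary, "-c", script]


cmd = abc_command("example.pla", "ABC_1")
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` and timed with `time.perf_counter()` to obtain a synthesis time for each benchmark.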
The last row of Table 2 gives the individual sums. They are presented as graphs in Fig. 14. Fig. 14a shows that, in terms of the number of logic blocks, dekBDD and MultiDec give better results than ABC. The number of levels, presented in Fig. 14b, indicates that MultiDec gives better results than dekBDD. In the case of ABC, it is hard to notice a substantial improvement. In terms of synthesis time, presented in Fig. 14c, MultiDec is the worst. dekBDD is twice as fast as ABC_2 and four times as fast as ABC_1. It may be stated that the proposed idea of changing the variable ordering works best in a decomposition with a single cutting line. In the case of the multiple cutting method, it is very weak in terms of time efficiency.
VIII. CONCLUSIONS
The methods presented in the paper are the core of two decomposition systems: dekBDD and MultiDec. As the experiments have shown, effective technology mapping that takes non-disjoint decomposition into account substantially influences the quality of the obtained solutions. In the case of dekBDD, the time efficiency of the developed synthesis methods rests on the new methods of changing the variable ordering. A new form of BDD, called PMTBDD, is also a vital element of the paper.
The ideas presented in the paper increase the efficiency of logic synthesis dedicated to FPGA circuits. The algorithms are universal and may be used for various families of FPGA circuits. The key feature of the developed methods is technology mapping based on an appropriate choice of a decomposition path. It is also important to make full use of the logic resources of configurable logic blocks. This is achieved by exploiting the configurability of the logic blocks and by choosing a decomposition path matched to the available number of inputs and outputs of the LUT blocks. In this way, the work contributes to the development of cyber-physical systems in the area of cyber-physical synthesis.
In the design process, it is essential to take the hardware part of a CPS into account, as this may reduce costs, improve operation or increase the reliability of CPS systems. The authors are aware that the presented solutions can be improved, which will be the subject of future work.
