The Branch Target Buffer (BTB) plays an important role in branch prediction for pipelined processors during the execution of loops, if-then-else, call-return, and multiway branch statements. It has been observed that 20% of the instructions in a program are branch instructions, and that BTB accesses account for 10% of the total energy consumed by a program in execution. The present work introduces the use of a k-d tree and a pattern matcher to generate efficient code, i.e., code with shorter execution time, for multiway branches. Instead of enhancing performance, however, Voltage Frequency Scaling (VFS) can be applied to achieve energy efficiency without degradation in performance. The present work is evaluated on a wide range of benchmark programs. The BTB energy saving lies in the range of 20% to 80%, with a small improvement in performance as well. The total energy reduction is in the range of 3-12%.
INTRODUCTION
The present work introduces techniques to reduce Branch Target Buffer (BTB) energy consumption through efficient translation of multiway branches. Low energy code generation is an important aspect of modern compilers [1]. It has been observed that 20% of the instructions in a program are branch instructions [2]. The BTB consumes 10% of the total energy consumed by a program in execution [10, 11]. In most high-level languages, the 'Multiway Branch' (MB) construct is widely used to select one out of several possible blocks of code to be executed. For example, it is the case statement in Pascal, the switch statement in C, and the SELECT statement in Fortran 90. Figures 1(a) and (b) show the multiway branch as a switch and as an if-then-else ladder, respectively, containing n branch destinations, where BC_j is the block of code at the jth branch destination (BD_j), 1 ≤ j ≤ n. One or more index variables form an index expression. The index expression should match the jth matching value (value_j) to jump to BD_j and execute BC_j.
Modern processors perform Dynamic Branch Prediction, and a Branch Target Buffer (BTB) is commonly used to improve the performance of branch instructions. Dynamic Branch Prediction uses information about taken and not-taken branches gathered at run-time to predict the outcome of a branch. The BTB is a small cache memory that holds the branch history and the target addresses corresponding to different branch instructions.
There are three possible alternatives for the implementation of a multiway branch. They differ in the way the index expression with value_j is searched to find BD_j: linear search, binary search, or hashing [3, 4]. For a given MB the compiler implements either B_linear, B_binary, or B_hash on the basis of the value(s) of the index expression(s). B_linear, B_binary, and B_hash require O(n), O(log_2 n), and O(1) BTB accesses, respectively, to find the target address of BD_j. The first choice of the compiler is to implement a B_hash. The generation of B_hash depends on whether a hash function can be found by analyzing the values matched by the index expression(s). This may not be possible for every MB, but it is always possible to generate a B_binary. The simplest implementation, however, is B_linear. For if-then-else ladders, most modern compilers generate B_linear when the multiway branch decision depends on more than one index expression. The present work shows that it is possible to implement B_hash or B_binary for such if-then-else ladders. It introduces the use of the k-d tree [6] to generate B_binary. Many modern programming languages like C# and Ruby support MBs where the index expression values are strings. The present work helps to generate efficient code, called B_pattern, for such MBs using pattern matching. It considers a source code containing m MBs and translates them to m TMBs (Translated Multiway Branches) as shown in Figure 2. MB_l is based on B_linear, whereas TMB_l uses either a B_binary or a B_hash.
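The difference between the linear and binary strategies can be illustrated with a minimal C sketch. It is not the compiler's generated code: the function names and the `probes` counter are hypothetical, and each probe stands in for one compare followed by its conditional branch(es), which is where the BTB is consulted.

```c
#include <assert.h>
#include <stddef.h>

/* Linear dispatch (in the spirit of B_linear): test the matching values
 * one by one. Returns the 1-based branch destination index, or 0 for NEXT. */
int dispatch_linear(const int *values, size_t n, int key, unsigned *probes) {
    assert(values != NULL && probes != NULL);
    *probes = 0;
    for (size_t j = 0; j < n; j++) {
        (*probes)++;
        if (values[j] == key) return (int)j + 1;
    }
    return 0;
}

/* Binary dispatch (in the spirit of B_binary): halve the sorted value set
 * at every probe, so at most about log2(n) + 1 probes are needed. */
int dispatch_binary(const int *values, size_t n, int key, unsigned *probes) {
    assert(values != NULL && probes != NULL);
    size_t lo = 0, hi = n;
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        (*probes)++;
        if (values[mid] == key) return (int)mid + 1;
        if (values[mid] < key) lo = mid + 1; else hi = mid;
    }
    return 0;
}
```

For seven sorted matching values, reaching the last destination takes seven probes with the linear version and three with the binary version.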
As TMB_l uses either a B_binary or a B_hash, its execution time is smaller than that of B_linear. Instead of enhancing performance, however, it is possible to reduce energy consumption by scaling down the voltage along with the frequency, commonly known as Voltage and Frequency Scaling (VFS). This requires a processor that can operate at different voltages and frequencies, such as the StrongARM 1100. Here, we have used Intel's XScale processor, which works on nine different voltage-frequency (v, f) pairs and supports VFS. Table I shows the (v, f) pairs supported by XScale. (v_1, f_1) is the peak (v, f) pair and (v_9, f_9) is the lowest. The BTB Energy Reduction Algorithm with the VFS Algorithm takes MB_l as input and generates TMB_l as output, for 1 ≤ l ≤ m. The VFS Algorithm scales down (v, f) to minimize the energy consumed by the MB while ensuring that the execution of the TMB finishes within the deadline, i.e., T_translated ≤ deadline. Here deadline = T_linear is the execution time of the MB, which is a B_linear, and T_translated is the execution time of the TMB, which is based on either B_binary or B_hash. It may be noted that the voltage-frequency ... The proposed scheme is simulated on XEEMU [7, 8], which simulates Intel's XScale processor. Related works are discussed in Section 2. Section 3 presents the proposed scheme with illustrative examples and explains the application of VFS. Section 4 describes the experimental setup and evaluates the proposed scheme with benchmark programs. Section 5 concludes the present work and outlines its future scope.
RELATED WORKS
Past works on BTB energy reduction were implemented either in hardware or in software. Both kinds of technique concentrated on reducing BTB accesses.
Hardware Techniques
In Ref. [9] Deris et al. introduced Speculative BTB Access (SABA), which identifies, at least one cycle in advance, cycles in which there is no control flow instruction among those fetched. By identifying such cycles and eliminating unnecessary BTB accesses, BTB energy is reduced by 6-15% with an average performance loss of 1.5%. In Ref. [10] unnecessary accesses to the BTB are reduced by exploiting the fact that there is a distance between consecutive branch instructions. This method decides whether to access the BTB using a constant value and a counter. After an instruction enters, the BTB is accessed if the counter is zero, and if the instruction is a branch instruction and exists in the BTB, the counter is reset. The approach achieves a BTB energy saving of 25%.
In Ref. [11] the authors introduced a static BTB that achieves performance similar to the traditional branch target buffer but eliminates most of the state updates, thus reducing the energy consumption of the BTB significantly. They also introduce a correlation-based static prediction scheme into a dynamic branch predictor, so that branches that can be predicted statically or can be correlated to previous ones do not go through the normal prediction algorithm. This reduces the activity and conflicts in the branch history table. It saves 43.9% of the energy of the branch prediction unit without degradation of performance.
Hu et al. in Ref. [12] proposed two approaches to reduce BTB accesses. The first approach expects the distance between every two dynamic branch instructions to be a constant N, where N can be statically profiled, and forces the BTB to respond for N instructions after a BTB hit. The second approach dynamically predicts the address of the next branch instruction and accesses the BTB only on the predicted address. This eliminates 22.033% of useless BTB accesses.
In Ref. [13] the authors studied two mechanisms that reduce dynamic energy dissipation. The first is a serial-BTB configuration. The second is the filter-BTB, a combination of a low energy counting Bloom filter placed in front of a conventional BTB. They also studied the effect of placing a small 32-entry direct-mapped BTB, functioning as a bypass, in parallel with the first two mechanisms. The filter-BTB reduces the number of lookups relative to a conventional BTB and the dynamic energy dissipated. The serial-BTB variant only accesses the data array of the BTB upon a hit, so for most accesses the energy dissipated is only that of accessing the tag array. The bypass is used in parallel with either the filter-BTB or the serial-BTB and reduces the performance cost by providing a low latency response in case of a hit. By integrating these mechanisms into a BTB design, the scheme achieved an average reduction of 51% in the dynamic energy dissipation of the BTB. These benefits come at a small performance cost that is on average slightly less than 1.2%.
In Ref. [14] Kahn et al. investigated three architectural methods to reduce the leakage energy dissipated by the BTB data array. The first method (called here window) periodically places the entire BTB data array into drowsy mode. A drowsy entry is woken up by the first access in the time interval and remains active for the remainder of the interval (window). There is an associated performance loss which is related to the size of the window, since there is a delay when a specific line must be woken up. The second method, awake line buffer (ALB), limits the number of active BTB entries to a predetermined maximum. While this reduces energy dissipation it comes with a performance penalty that is relative to the size of the buffer. ALB, however, reduces the energy dissipation of the data array more than the window method. The third method, 2-level ALB (2L-ALB), uses a two level buffer with the identical number of combined entries as the previous method. This method exploits the fact that many branches operate numerous times in a fixed sequence.
By predicting the next BTB access, 2L-ALB achieves further reduction in leakage energy without incurring any further performance loss, compared to the ALB method.
Levison et al. in Ref. [15] proposed two BTB designs that fit the tight energy budgets of embedded processors. In the first design, the energy consumption of a single BTB access is reduced by reading only the lower part of the predicted target address bits. This design has energy savings of up to 25% dynamic energy, with effectively no performance degradation. In the second design, they avoid redundant BTB accesses to the same set by using a small buffer that holds the most recently accessed set. This design results in 75% dynamic energy savings at the cost of up to 0.64% system slowdown in a 2-way BTB, and 80% dynamic energy savings at the cost of up to 0.58% system slowdown in a 4-way BTB.
In Ref. [16] Baniasadi et al. introduced branch predictor prediction (BPP), which reduces branch prediction energy dissipation by selectively turning on and off two of the three tables used in the combined branch predictor. BPP relies on a small buffer that stores the addresses and the sub-predictors used by the most recently executed branches. They refer to this buffer to decide whether any of the sub-predictors and the selector can be gated without harming performance. They show that, on average and for an 8-way processor, BPP can reduce branch prediction energy dissipation by 28% and 14% compared to non-banked and banked 32 k predictors, respectively. This comes with a negligible impact on performance (1% max).
In Ref. [17] the authors proposed using the loop cache to reduce static as well as dynamic energy consumption. They combined it with CMOS circuits having a sleep mode, so that the instruction cache can go to sleep mode when the loop cache is active. They also apply the technique to the branch target buffer, whose static and dynamic energy consumption is reduced by up to 40.4% and 40.7%, respectively. In Ref. [18] Tomas et al. analyze to what extent tag and target address lengths can be reduced to benefit dynamic and static energy consumption, silicon area, and access time while sustaining performance. The tag length and the target address can be reduced by about a half and by one byte, respectively, with no performance loss. BTB energy savings can reach about 35%.
Levison et al. in Ref. [19] propose a novel microarchitectural method referred to as Shifted-Index BTB with a Set-Buffer, which reduces both dynamic and static energy. It achieves up to 80% reduction in dynamic energy at the cost of up to 0.64% system slowdown. A 58% reduction in static energy is also achieved by applying low-leakage energy techniques that mesh well with the Set-Buffer design.
In Ref. [20] Deris et al. introduce Branchless Cycle Prediction (BLCP) which predicts cycles where there is no branch instruction among those fetched, at least one cycle in advance. They avoid accessing BTB during such cycles.
By using BLCP, it is possible to reduce BTB energy dissipation by 32% while paying a negligible performance cost (average: 0.2%).
The paper Ref. [21] proposes an energy-aware branch predictor by accessing the BTB selectively. To enable the selective access to the BTB, the PHT (Pattern History Table) in the proposed branch predictor is accessed one cycle earlier than the traditional PHT if the program is executed sequentially without branch instructions. As a side effect, two predictions from the PHT are obtained through one access to the PHT, resulting in more energy savings. In the proposed branch predictor, if the previous instruction was not a branch and the prediction from the PHT is untaken, the BTB is not accessed to reduce energy consumption. If the previous instruction was a branch, the BTB is always accessed, regardless of the prediction from the PHT, to prevent the additional delay/accuracy decrease. The proposed branch predictor reduces the energy consumption by 29-47% with little hardware overhead, not incurring additional delay and never harming prediction accuracy.
Briejer et al. in Ref. [22] proposed energy-efficient dynamic branch predictors for the Cell SPE, which normally depends on compiler-inserted hint instructions to predict branches. The prediction scheme predecodes instructions when they are fetched from the local store and accesses the BTB only for branch instructions, thereby saving energy compared to conventional dynamic predictors that access the BTB for every instruction. The authors also introduce branch warning instructions which initiate branch prediction before the actual branch instruction is fetched. This allows fetching the instructions starting at the branch target and thus completely removes the branch penalty for correctly predicted branches. For a 256-entry BTB, a speedup of up to 18.8% is achieved. The energy consumption of the branch prediction schemes is estimated at 1% or less of the total energy dissipation of the SPE and the average energy-delay product is reduced by up to 6.2%.
Software Techniques
Software techniques like loop unrolling and loop fusion reduce BTB accesses as well as BTB energy consumption. One such work observes that if IPC is held constant by reducing frequency and voltage, particularly on a processor with multiple clock domains, then energy improvements may significantly exceed run-time improvements. Its authors demonstrate energy savings ranging from 7-40%, with run times ranging from a 1% slowdown to a 17% speedup.
PRESENT WORK
The present work proposes the BTB Energy Reduction Algorithm, which takes MB_l as input and produces TMB_l as output. Figure 3(a) shows the format of an MB_l. Here MB_l is a B_linear enclosed in a loop that executes p times, where p ≥ 1. B_linear contains a multiway branch construct having n branch destinations. In other words, B_linear can be considered an if-then-else ladder having n branch destinations. The proposed scheme applies VFS. The VFS_Algorithm finds the opportunity to scale down the (v, f) of TMB_l. Table II shows the two cases of the VFS algorithm. These cases are based on the input dependency of p, the number of times MB_l will execute. If p is input dependent, its value is obtained at runtime as an input. If p is input independent, its value is always a constant. The proposed scheme accordingly considers two different forms of the VFS algorithm.

(i) 'Total branches' is the total number of branch instructions executed in the program, (ii) 'Miss prediction taken' is the total number of wrong predictions made by the Branch Prediction Unit (BPU) when a branch takes place, (iii) 'Miss prediction not taken' is the total number of wrong predictions made by the BPU when no branch takes place, (iv) 'Non prediction taken' is the total number of branches taken when no prediction is made by the BPU because the BTB has no entry for the branch history and target addresses of the corresponding branch instructions.
Illustrative Example 1 (EX1)
EX1 considers a simple MB that can be implemented as an if-then-else ladder or as a switch-case, as shown in Figures 4(a) and (b), respectively. Here, 'marks' is the index variable that forms the index expression. The matching value set for the index variable marks is value(marks) = {4, 5, 6, 7, 8, 9, 10}. The GCC compiler xscale-elf-gcc translates the source code in Figure 4(a) to B_linear code. For the source code in Figure 4(b), xscale-elf-gcc generates B_hash code. This depends on the ability of the compiler to find a suitable hash function, which is sometimes not possible. However, it is always possible to generate B_binary code for an MB. The MB in EX1 can be translated to B_binary code as shown in Figure 19 in Appendix A. Figure 5 shows the binary search tree formed with all possible values to be matched with the index variable. B_binary is generated by preorder traversal of the binary search tree.
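The order in which a preorder traversal would emit the comparisons for EX1 can be sketched in C as follows. This is an illustration only: it assumes the binary search tree of Figure 5 is balanced with the median value at each root, which the text does not state, and the function name `bst_preorder` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Writes into `out` the order in which a preorder traversal visits a
 * balanced BST built over sorted[lo..hi), taking the middle element as
 * the root of each subtree. Returns the number of values written. */
size_t bst_preorder(const int *sorted, size_t lo, size_t hi, int *out) {
    assert(sorted != NULL && out != NULL);
    if (lo >= hi) return 0;
    size_t mid = lo + (hi - lo) / 2;
    size_t written = 0;
    out[written++] = sorted[mid];                                  /* root  */
    written += bst_preorder(sorted, lo, mid, out + written);       /* left  */
    written += bst_preorder(sorted, mid + 1, hi, out + written);   /* right */
    return written;
}
```

Under that assumption, value(marks) = {4, 5, 6, 7, 8, 9, 10} is visited in the order 7, 5, 4, 6, 9, 8, 10, so any value is reached after at most three comparisons.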
Illustrative Example 2 (EX2)
The MB in Figure 6 is an if-then-else ladder that performs two-dimensional range testing. The if-then-else ladder contains three branch destinations, BD_1, BD_2 and BD_3, for the blocks of code 'z = 1', 'z = 2' and 'z = 3', respectively. The k-d tree decomposition for the point set value(x, y) (shown in Figure 7) is done with the help of Bentley's approach in Ref. [6]. The resulting k-d tree for the point set value(x, y) is shown in Figure 8. In Figure 7, lines l_3, l_7, l_10 and l_14 enclose the region related to BD_1. The lines l_9, l_6, l_13 and l_1 enclose the region related to BD_2. The lines l_8, l_12, l_2 and l_4 enclose the region related to BD_3. The rest of the regions are related to NEXT. Each non-leaf node of the k-d tree has left and right edges which connect it to its left and right subtrees, respectively.
Sumanta Pyne and Ajit Pal
Branch Target Buffer Energy Reduction Through Efficient Multiway Branch Translation Techniques
The left edge is labeled either with the symbol '<' or with '≤'. The right edge is labeled either with the symbol '>' or with '≥'. The left and right edge symbols of a node depend on the source code. For example, the left edge symbol of the node l_1 is '<' and its right edge symbol is '≥'. This is because the source code in Figure 6 contains the expression 'x ≥ 6' and l_1 is the line representing 'x = 6'. So any node with 'x ≥ 6' will be in the right subtree of l_1, while nodes with 'x < 6' will be in the left subtree of l_1. The leaf nodes of the k-d tree contain the branch destinations. The leaf nodes BD_1, BD_2, and BD_3 contain the branch destinations for the blocks of code 'z = 1', 'z = 2' and 'z = 3', respectively. The rest of the leaves contain NEXT as branch destination. Two kinds of non-leaf k-d tree nodes are considered in this work. The circular non-leaf nodes are the mandatory nodes required to form the k-d tree. A square non-leaf node ensures that a branch destination is enclosed within the desired region. For example, in Figure 7 the lines l_12, l_13 and l_14 provide enclosure for the regions related to BD_3, BD_2, and BD_1, respectively. To jump to BD_3, the following sequence of conditions must be satisfied: 'x < 6', 'y < 3', 'x ≥ 3', 'y ≥ 1' and 'x ≤ 5'. However, the conditions 'x < 6' and 'x ≤ 5' are redundant because 'x' is an integer variable. The node l_12 can therefore be deleted to obtain the modified k-d tree in Figure 9. Similarly, l_13 can also be deleted.
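How a lookup descends such a tree can be sketched in C. The fragment below is hypothetical and covers only the BD_3 region, using the conditions 'x < 6', 'y < 3', 'x ≥ 3' and 'y ≥ 1' quoted above; the regions of BD_1 and BD_2 are not given in the text, so every other leaf is assumed to be NEXT. The type `KdNode`, its fields, and the pairing of lines with split values (other than l_1 with x = 6) are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { NEXT = 0, BD3 = 3 };

typedef struct KdNode {
    char var;              /* 'x' or 'y' for a non-leaf node               */
    int value;             /* split value of the line                      */
    int right_inclusive;   /* 1: edges are '<' and '>=', 0: '<=' and '>'   */
    int dest;              /* branch destination, meaningful for leaves    */
    const struct KdNode *left, *right;  /* both NULL for a leaf            */
} KdNode;

/* Hypothetical fragment of the EX2 tree; all leaves except BD3 are NEXT. */
static const KdNode leaf_next = {0, 0, 0, NEXT, NULL, NULL};
static const KdNode leaf_bd3  = {0, 0, 0, BD3, NULL, NULL};
static const KdNode node_y1 = {'y', 1, 1, 0, &leaf_next, &leaf_bd3};
static const KdNode node_x3 = {'x', 3, 1, 0, &leaf_next, &node_y1};
static const KdNode node_y3 = {'y', 3, 1, 0, &node_x3, &leaf_next};
static const KdNode node_x6 = {'x', 6, 1, 0, &node_y3, &leaf_next};

/* Walks from the root to a leaf, one comparison per level. */
int kd_lookup(const KdNode *node, int x, int y) {
    assert(node != NULL);
    while (node->left != NULL) {
        int key = (node->var == 'x') ? x : y;
        int go_right = node->right_inclusive ? (key >= node->value)
                                             : (key > node->value);
        node = go_right ? node->right : node->left;
    }
    return node->dest;
}
```

With this fragment, `kd_lookup(&node_x6, 4, 2)` reaches BD_3 after four comparisons, without a fifth test for 'x ≤ 5', which mirrors the deletion of the redundant node l_12.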
In Figure 8 all the leaves of the right subtrees of the nodes l_2 and l_6 contain NEXT. Each of these subtrees is pruned and replaced with a leaf node containing NEXT, as shown in Figure 9. Figure 10 shows two possible assembly language implementations of the if-then-else ladder in EX2. These assembly language code fragments are written using the ARM instruction set. B_linear in Figure 10(a) is a brute-force implementation. B_binary in Figure 10(b) is obtained by a preorder traversal of the modified k-d tree in Figure 9. The preorder traversal algorithm for the k-d tree in Figure 9 is shown in Appendix C.
The detailed B_linear and B_binary implementations of EX2 are shown in Figures 20 and 21.
Illustrative Example 3 (EX3)
Programming languages like Ruby provide multiway branches with strings, as shown in Figure 11. B_linear implementations of these multiway branches are inefficient in terms of time and energy. The pattern matcher in the form of a finite state machine in Figure 11 can help to generate B_pattern, which is both energy and time efficient. The matching value set for the index variable 'month' is value(month) = {"JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"}. B_pattern is generated by breadth-first traversal of the pattern matcher graph. B_pattern makes use of a data structure called a trie (or prefix tree) to restrict the state transition time during pattern matching to O(1).
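The trie idea can be illustrated with a small C sketch. It is not the generated B_pattern code and does not perform the breadth-first traversal; it only shows why each state transition costs O(1), namely one array index per input character. All names (`TrieNode`, `trie_match`, `build_month_trie`) are hypothetical, and the sketch assumes upper-case ASCII input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ALPHABET 26  /* 'A'..'Z', assumed contiguous as in ASCII */

typedef struct TrieNode {
    struct TrieNode *next[ALPHABET]; /* one slot per letter: O(1) transition */
    int dest;                        /* branch destination index, 0 = none   */
} TrieNode;

static TrieNode *trie_new(void) {
    /* calloc zero-fills; NULL pointers are assumed to be all-bits-zero. */
    return calloc(1, sizeof(TrieNode));
}

void trie_free(TrieNode *node) {
    if (node == NULL) return;
    for (int c = 0; c < ALPHABET; c++) trie_free(node->next[c]);
    free(node);
}

/* Adds `word` with destination `dest`; returns 0 on success, -1 on error. */
int trie_insert(TrieNode *root, const char *word, int dest) {
    assert(root != NULL && word != NULL && dest > 0);
    for (; *word != '\0'; word++) {
        if (*word < 'A' || *word > 'Z') return -1;
        int c = *word - 'A';
        if (root->next[c] == NULL) {
            root->next[c] = trie_new();
            if (root->next[c] == NULL) return -1;
        }
        root = root->next[c];
    }
    root->dest = dest;
    return 0;
}

/* Returns the destination index of `word`, or 0 if it is not a match. */
int trie_match(const TrieNode *root, const char *word) {
    assert(root != NULL && word != NULL);
    for (; *word != '\0'; word++) {
        if (*word < 'A' || *word > 'Z') return 0;
        root = root->next[*word - 'A'];
        if (root == NULL) return 0;
    }
    return root->dest;
}

/* Builds the trie for value(month); destinations are numbered 1..12. */
TrieNode *build_month_trie(void) {
    static const char *months[12] = {
        "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
        "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"};
    TrieNode *root = trie_new();
    if (root == NULL) return NULL;
    for (int j = 0; j < 12; j++) {
        if (trie_insert(root, months[j], j + 1) != 0) {
            trie_free(root);
            return NULL;
        }
    }
    return root;
}
```

Matching "JULY" takes four transitions regardless of how many month names share the prefix "JU", whereas a linear ladder of string comparisons would first compare against the six preceding names.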
VFS_Algorithm
This subsection explains the VFS algorithms in detail. The VFS algorithms find the value of min_vf_pair, the (v, f) pair that minimizes the energy consumed by TMB_l. The VFS algorithms calculate the energy overhead (E_overhead) and the time overhead (t_overhead) due to VFS. These overheads, incurred when switching from (v_i, f_i) to (v_w, f_w) with 1 ≤ w ≤ 9, are calculated using the formulae of Ref. [25]. The parameters of these formulae are the energy efficiency of the voltage regulator, taken as 90%; the voltage regulator's capacitance C, taken as 10 μF; and the maximum allowed current I_MAX, assumed to be 1 A. The VFS algorithms make use of the C library function sprintf, which prints formatted output to the string S. The subroutine generate_code generates the assembly equivalent of the high-level code in S and inserts it into the target program file.
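The overhead formulae themselves do not appear in this text. As a hedged illustration, the sketch below uses a regulator model commonly found in the voltage scaling literature that takes exactly the three parameters named above: a time overhead of (2C / I_MAX)·|v_i − v_w| and an energy overhead of (1 − η)·C·|v_i² − v_w²|, where η denotes the regulator efficiency. That this is the model of Ref. [25] is an assumption, as are the function and macro names.

```c
#include <assert.h>

#define REG_EFFICIENCY  0.90    /* regulator efficiency (90% in the text)  */
#define REG_CAPACITANCE 10e-6   /* regulator capacitance in farads (10 uF) */
#define REG_I_MAX       1.0     /* maximum allowed current in amperes      */

static double abs_diff(double a, double b) {
    return a > b ? a - b : b - a;
}

/* Assumed model: time in seconds to switch the supply voltage. */
double vfs_time_overhead(double v_from, double v_to) {
    assert(v_from > 0.0 && v_to > 0.0);
    return (2.0 * REG_CAPACITANCE / REG_I_MAX) * abs_diff(v_from, v_to);
}

/* Assumed model: energy in joules lost in the regulator per switch. */
double vfs_energy_overhead(double v_from, double v_to) {
    assert(v_from > 0.0 && v_to > 0.0);
    return (1.0 - REG_EFFICIENCY) * REG_CAPACITANCE *
           abs_diff(v_from * v_from, v_to * v_to);
}
```

Under this model, a switch between 1.5 V and 1.0 V (illustrative voltages, not taken from Table I) costs about 10 μs and 1.25 μJ in either direction, since both expressions depend only on the magnitude of the voltage change.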
VFS_Algorithm_A
VFS_Algorithm_A determines whether VFS can save energy for TMB_l when p is input independent.
VFS_Algorithm_A(B_linear, B_translated)
1. {
2.   char S[50];
3.   p := constant value fixed at compile time;
...
9.   for (i = 1; i ≤ 9; i++)
10.  {
11.    $T_i = 2 \times t_{overhead}(1, i) + \frac{1}{n} \sum_{j=1}^{n} \left(t_{translated\_ij} + t_{branch\_exe\_ij}\right) \times p$
12.    $E_i = 2 \times E_{overhead}(1, i) + \frac{1}{n} \sum_{j=1}^{n} \left(e_{translated\_ij} + e_{branch\_exe\_ij}\right) \times p$
13.    if (T_i ≤ deadline) then
14.    {
15.      if (E_i < E ...

T_i and E_i are calculated in steps 11 and 12. The algorithm finds the value of min_vf_pair, the (v, f) pair that minimizes the energy consumed by TMB_l while allowing the execution of TMB_l to finish within the deadline.
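The selection performed by the loop can be sketched in C. Because the listing breaks off at step 15, the sketch assumes the usual rule of keeping the feasible pair with the smallest energy seen so far. The function name and the array-based interface are hypothetical, with `T[i-1]` and `E[i-1]` holding the values of T_i and E_i computed as in steps 11 and 12.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the 1-based index of the (v, f) pair with the least energy among
 * those whose execution time meets the deadline, or 0 if no pair does. */
size_t select_min_vf_pair(const double *T, const double *E,
                          size_t pairs, double deadline) {
    assert(T != NULL && E != NULL);
    size_t best = 0;
    double best_energy = 0.0;
    for (size_t i = 0; i < pairs; i++) {
        if (T[i] <= deadline && (best == 0 || E[i] < best_energy)) {
            best = i + 1;
            best_energy = E[i];
        }
    }
    return best;
}
```

A return value of 0 would mean that even the peak pair misses the deadline, which should not occur when the deadline is T_linear and the translated code is faster than B_linear.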
VFS_Algorithm_B
VFS_Algorithm_B determines whether VFS can save energy for TMB_l when p is input dependent. T_linear, E_linear, T_i and E_i are calculated in a similar way as in VFS_Algorithm_A.
VFS_Algorithm_B(B_linear, B_translated)
1. {
2.   char S[50];
3.   p = 10^6;
4.   Linked_List linkedlist := null;
...
Since p is input dependent, its value is not known at compile time. The value of p is assigned 10^6 in step 3. Apart from finding min_vf_pair, the algorithm calculates P_i for every (v_i, f_i). P_i is the minimum value of p required to execute TMB_l at (v_i, f_i). The formula for P_i is derived as follows. Let t_lin_avg be the average execution time of B_linear, executed once at (v_1, f_1). Let t_i be the average execution time of B_translated, executed once at (v_i, f_i). Steps 8 and 15 of VFS_Algorithm_B calculate t_lin_avg and t_i, respectively, where t_lin_avg > t_i. If B_translated is executed P_i times at (v_i, f_i), then the time taken should be at most that of P_i executions of B_linear at (v_1, f_1). Considering the overhead of scaling (v, f) down and back up, this condition yields an expression for P_i, the minimum value of p required to execute TMB_l at (v_i, f_i). In other words, if the value of p is obtained at runtime and p ≥ P_i, then TMB_l can be executed at (v_i, f_i). The algorithm also generates a linked list as shown in Figure 13(b). Each node of the linked list is an instance of the node type structure shown in Figure 13(a). The vf_pair field of the header node of the linked list contains min_vf_pair. The nodes of the linked list are ordered by the value of the vf_pair field as min_vf_pair, min_vf_pair-1, min_vf_pair-2, and so on down to 2, where 1 < min_vf_pair ≤ 9. The linked list is arranged in this manner because P_i increases as (v_i, f_i) decreases. After the formation of the linked list, the algorithm generates the TMB_l shown in Figure 3(b). The utility of VFS_Algorithm_B is explained with the help of an MB and its equivalent TMB, shown in Figure 14, where p is input dependent. The MB in Figure 14 is the deadline for the TMB in Figure 14(b). In Table VI, T_TMB ...
This ensures the utility of VFS_Algorithm_B. The VFS algorithms can save more energy when the delays of the blocks of code at all the branch destinations are equal and the blocks of code contain few branch instructions.
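The expression for P_i is not reproduced in this text. Reading the condition literally, P_i executions of B_translated plus two voltage switches must take no longer than P_i executions of B_linear, i.e. P_i · t_i + 2 · t_overhead ≤ P_i · t_lin_avg, which gives P_i ≥ 2 · t_overhead / (t_lin_avg − t_i). The C sketch below uses this reconstructed bound, which is an assumption, together with a runtime walk over the linked list. Only the vf_pair field is named in the text; `min_p`, `next`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct VfNode {
    int vf_pair;            /* index i of (v_i, f_i); 1 is the peak pair */
    unsigned long min_p;    /* P_i: least p for which pair i pays off    */
    struct VfNode *next;
} VfNode;

/* Reconstructed bound: smallest integer P with
 * P * t_i + 2 * t_overhead <= P * t_lin_avg. */
unsigned long min_iterations(double t_lin_avg, double t_i, double t_overhead) {
    assert(t_lin_avg > t_i && t_overhead >= 0.0);
    double bound = 2.0 * t_overhead / (t_lin_avg - t_i);
    unsigned long p = (unsigned long)bound;
    if ((double)p < bound) p++;   /* round up to the next integer */
    return p;
}

/* The list starts at min_vf_pair, which has the largest P_i, so the first
 * node whose threshold p meets is the lowest usable (v, f) pair. */
int choose_vf_pair(const VfNode *head, unsigned long p) {
    for (; head != NULL; head = head->next) {
        if (p >= head->min_p) return head->vf_pair;
    }
    return 1;   /* p is too small for any scaled pair: stay at the peak */
}
```

Because the thresholds decrease along the list, the walk can stop at the first node that satisfies p ≥ P_i, which is consistent with the ordering described above.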
EXPERIMENT AND RESULT
The proposed scheme is evaluated on eight benchmark programs using the XEEMU simulator [7, 8]. XEEMU simulates Intel's XScale processor. Since no standard benchmark programs involving MBs exist, several representative examples in which MBs are possible are used as synthetic benchmarks. These synthetic benchmark programs load the branch prediction unit and cause BTB accesses, which makes them suitable for testing the proposed work. This section explains the experimental procedure along with the analysis of the experimental results.
Experiment
The benchmark programs in Table VII include examples as discussed in Section 3.2. All the energy and performance values in this work are measured in XEEMU. The translated codes are written using the ARM instruction set. All the programs are run on XEEMU, which simulates Intel's XScale processor. Each BTB entry contains the address of a branch instruction, the target address associated with the branch instruction, and a previous history of the branch being taken or not-taken.
The history is recorded as one of four states: strongly taken, weakly taken, weakly not-taken, or strongly not-taken. If the address of the branch instruction hits in the BTB and its history is strongly or weakly taken, the instruction at the branch target address is fetched; if its history is strongly or weakly not-taken, the next sequential instruction is fetched. In either case the history is updated. Each BTB access and each update of its state consumes energy. The experimental setup in Figure 16 shows the proposed scheme in sequence, from syntactical analysis (parsing) to translated multiway branch generation.
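The four history states can be modelled with a short C sketch. The prediction rule follows the description above; the update rule is an assumption, namely the conventional two-bit saturating counter that moves one state toward the actual outcome, since the text only says that the history is updated. The type and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    STRONGLY_NOT_TAKEN = 0,
    WEAKLY_NOT_TAKEN   = 1,
    WEAKLY_TAKEN       = 2,
    STRONGLY_TAKEN     = 3
} BranchHistory;

/* On a BTB hit, fetch from the target if the history is weakly or
 * strongly taken; otherwise fetch the next sequential instruction. */
int predict_taken(BranchHistory h) {
    return h >= WEAKLY_TAKEN;
}

/* Assumed update rule: step one state toward the actual outcome and
 * saturate at the two strong states. */
BranchHistory update_history(BranchHistory h, int taken) {
    assert((int)h >= 0 && (int)h <= 3);
    if (taken) {
        return h == STRONGLY_TAKEN ? h : (BranchHistory)(h + 1);
    }
    return h == STRONGLY_NOT_TAKEN ? h : (BranchHistory)(h - 1);
}
```

Under this rule a single deviation from a strong state does not flip the prediction, which suits loop branches that are taken many times and fall through once.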
Result
The Table VIII shows 
CONCLUSION AND FUTURE WORK
The present work reduces the energy consumed by BTB accesses by translating multiway branches and applying VFS. The translated multiway branch also improves the performance of the program. The scheme first transforms the multiway branch and then applies VFS to scale down (v, f) to minimize the energy consumed by the MB under the execution time constraint. It introduces the use of the k-d tree and a pattern matcher to generate efficient code for multiway branches when hashing is not applicable. A wide range of illustrative examples and benchmark programs are used to highlight the efficacy of the approach. The energy savings range from 21 to 80%, with performance improvement ranging from 1 to 22%. The total energy is reduced by 3 to 12%. Since the present work reduces accesses to the BTB, future work will concentrate on reducing the runtime leakage energy of the BTB when it is not in use. We have restricted the index variables and matching values to integers and strings. The work may be extended to consider real numbers. There are if-then-else ladders where the index expressions are formed with several index variables and the conditions are separated by several logical or conditional operators. Future work will also investigate the efficient translation of such MBs.

A. B_linear, B_hash and B_binary Implementations of EX1
The B_linear and B_hash codes of EX1 are generated by the xscale-elf-gcc compiler. Step 30 of the B_hash code of EX1 in Figure 18 shows the application of hashing. The B_binary code of EX1 is generated by preorder traversal of the binary search tree in Figure 5.
B. B_linear and B_binary Implementations of EX2
The B_linear code of EX2 is generated by the xscale-elf-gcc compiler. The B_binary code of EX2 is generated by preorder traversal of the k-d tree in Figure 8. Appendix C illustrates the algorithm for preorder traversal of the k-d tree.
C. B_binary Code Generation from k-d Tree
C.1. Structure of the k-d Tree Node
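struct Kd_Tree_Node {
    char variable_name[20];                          /* index variable tested at this node */
    int value;                                       /* value the variable is compared with */
    char left_edge_symbol, right_edge_symbol;        /* '<' or '≤' on the left, '>' or '≥' on the right */
    bool left_tree_visited, right_tree_visited;      /* bool requires <stdbool.h> */
    struct Kd_Tree_Node *left_child, *right_child;
};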
C.2. Code Generation
B_binary_code_generation_from_Kd_Tree(Kd_Tree_Node *root)
...
4.  if (root is a non-leaf node) then
5.  {
C.3. Preorder Traversal of k-d Tree
6.    sprintf(S, "cmp %s, %d", root->variable_name, root->value); write(S);
7.    if (root->right_edge_symbol == '>') then
8.      sprintf(S1, "bgt");
9.    else
10.     sprintf(S1, "bge");
11.   if (all leaf nodes of root node's right subtree contain NEXT) then
12.   {
13.     sprintf(S, "%s NEXT", S1);
14.     root->right_tree_visited = true;
15.   }
16.   else
17.     sprintf(S, "%s L%d", S1, 2 * label + 1);
18.   write(S);
19.   if (root->left_edge_symbol == '<') then
20.     sprintf(S1, "blt");
21.   else
22.     sprintf(S1, "ble");
