Visible light communication (VLC) is a promising technology for both wireless communications and illumination via light-emitting diodes (LEDs). Although conventional run-length-limited (RLL) codes are employed to mitigate modulation-induced flickers, they can suffer from data rate reduction, worst-case bit sequences, and hardware overheads, resulting in a performance bottleneck. In this paper, we introduce a novel VLC data-encoding algorithm using bit shuffling to resolve these problems while alleviating light flickers. In contrast to existing RLL coding approaches, bit shuffling with an Omega network can generate codewords dynamically, which guarantees short runs of consecutive 0's or 1's, avoidance of worst-case bit sequences, and a relatively short code length. To illustrate the performance of hardware implementations, we discuss the hardware designs of the proposed bit-shuffle coding scheme. Our simulation results demonstrate the effectiveness of the bit-shuffle coding approach in terms of mitigation of flickering, transmission efficiency, and hardware overheads.
I. INTRODUCTION
In recent years, the characteristics of light-emitting diodes (LEDs), including fast switching rates and efficient illumination, have attracted increasing interest with respect to visible light communication (VLC), which can simultaneously provide illumination and information transmission services [1]-[9]. Because the visible light spectrum is unregulated and offers thousands of times the bandwidth of the radio frequency spectrum, VLC technology is very attractive as a potential complement to 5G communications, including for machine type communication (MTC) in Internet-of-Things (IoT) applications [10], [11].
VLC transmits data by modulating LED intensity, and successive identical bits can be generated according to the distribution of the transferred data bits. If the change in dimming level is not sufficiently fast, fluctuations in brightness may be perceived as flicker, which fatigues the eyes rapidly and can damage eyesight. According to the IEEE 802.15.7 standard [12]-[15], flicker is imperceptible to human eyes at a maximum flicker time period (MFTP) smaller than 5 ms (optical rate = 200 Hz) and can be reduced by modulating the LED intensity using run-length-limited (RLL) coding, which maintains a constant brightness of 50 % and avoids long runs of 0's and 1's. In the VLC standard, the PHY I and PHY II categories utilize Manchester, 4B6B, and 8B10B codes to mitigate light flicker. These codewords maintain a constant illumination intensity at a 50 % dimming level by ensuring balanced repetition of 0's and 1's. The VLC standard also supports dimming functionality during data transmission by inserting compensation symbols before each data frame.
The associate editor coordinating the review of this manuscript and approving it for publication was Usama Mir.
Although RLL coding methods under this standard can mitigate light flicker in high-speed VLC systems with a fast optical rate, their performance is severely limited in terms of transmission efficiency, worst-case bit sequences, and hardware overhead. In Manchester coding, each logical bit is encoded into two physical bits, which restricts its use to low-data-rate applications. In 4B6B and 8B10B coding, a lookup table is required to match the input data to the output data. To achieve this, the use of associative non-volatile memory is indispensable in hardware implementations. However, this becomes a significant hurdle when producing small-scale and complexity-constrained IoT devices [10], [11]. Furthermore, IoT devices with simplified transceivers limit the optical rate of LED lamps, and thus may fail to meet the flicker requirement of VLC.
VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
In recent studies on flicker mitigation of VLC systems, Mejia et al. [16] proposed a code design using finite state machines to reduce the bit error rate (BER) while mitigating flicker. Also, pulse slope modulation (PSM) has been proposed to modify the slope of the rising edge of the pulse while maintaining a constant frequency carrier [17]. A flicker-free forward error correction (FEC) coding scheme using polar codes has also been proposed [18]. Cailean et al. [19] used Miller code for PHY I applications, and compared its performance to that of Manchester code in terms of bandwidth and channel coexistence. In the next-generation IoT era, VLC systems can be used in IoT applications with strong requirements for minimal hardware overheads while improving transmission efficiency. To the best of our knowledge, no flicker mitigation schemes that take into account both enhanced transmission efficiency and reduced hardware overheads have been proposed in the literature.
To implement flicker mitigation technology for robust VLC services and lightweight IoT applications, we propose a novel bit-shuffle coding scheme that limits code length, worst-case bit sequences, and hardware overheads. By employing an Omega network [20], the bit shuffling approach generates codewords dynamically without the burden of a static lookup table, and guarantees limited codeword length. The proposed scheme also performs validity and Hamming distance checks to achieve both non-flickering lights and low DC bias variance for stable LED illumination. This ensures high transmission efficiency and reduced hardware overheads, which is critical for implementing lightweight IoT devices. Our simulation results confirmed that the proposed scheme outperforms conventional RLL coding schemes in terms of mitigation of flickering, transmission efficiency, and hardware overhead.
In summary, the main contributions of this paper are as follows.
• We propose a technique to reduce the perceived flicker of VLC with a bit-shuffle coding scheme that generates dynamic codewords and limits the successive identical bit level.
• We significantly improve the transmission efficiency, compared to standard technologies, while also reducing hardware overheads.
• To analyze hardware overheads and facilitate practical implementation, we present possible hardware designs of an encoder and decoder for the proposed scheme.
• We present and discuss a performance comparison of the proposed scheme with conventional RLL coding methods.
The remainder of this paper is organized as follows. In Section II, we introduce the general concept of the bit-shuffle coding scheme. In Section III, we present a hardware implementation of the proposed scheme. In Section IV, we present the results of performance comparisons of the proposed scheme with conventional RLL schemes in terms of flicker mitigation, transmission efficiency, bit error rate (BER) and hardware overheads. Finally, we conclude the work in Section V.
II. BIT-SHUFFLE CODING TECHNIQUE
In this section, we describe the overall architecture of the proposed bit-shuffle coding and its encoding/decoding algorithms in detail. For a better understanding of the structure of the proposed method, we provide examples of the shuffling encoding process.
A. OVERALL ARCHITECTURE
Figure 1 illustrates the encoder-decoder (CODEC) design of the proposed bit-shuffle coding scheme. At the transmitter of a VLC system, a bit shuffling-based encoder generates pairs of encoded data and metadata to be transmitted by LED illumination. The encoding process begins by deriving four input data formats, $D_{in}^{0}$, $D_{in}^{1}$, $D_{in}^{2}$, and $D_{in}^{3}$, from the input data $D_{in}$ by either applying or not applying inverse and exclusive-OR (XOR) operations. When the input data are N-bit, the XOR operation is applied using an N-bit arbitrary bit sequence, e.g., 1010...10. In the encoding algorithm, the data candidates diverge further by generating N candidates for each input data format; thus, an encoded output that meets the transmission requirements of VLC to the largest possible extent can be produced. After the validity and Hamming distance checks, the best codeword, i.e., one that is flicker-free and has a low DC bias variance, is selected. Finally, the encoder concatenates the encoded data and metadata, $\{D|M\}_{en}$, and transmits the result via LED light of varying intensity. Here, the metadata M include a hash H for bit shuffling and two additional flag bits, I and X, indicating whether $D_{in}$ is inverted or XORed, respectively, i.e., M = {H|I|X}.
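The derivation of the four input data formats can be illustrated with a minimal Python sketch. The function names (`xor_pattern`, `data_formats`, `restore`) and the keying of each format by its (I, X) flag pair are assumptions made for illustration; the text does not specify which index of the four formats corresponds to which flag combination.

```python
from itertools import product


def xor_pattern(n_bits):
    """Return the N-bit alternating sequence 1010...10 (N assumed even)."""
    return int("10" * (n_bits // 2), 2)


def data_formats(d_in, n_bits):
    """Map each (I, X) flag pair to the correspondingly transformed input."""
    all_ones = (1 << n_bits) - 1
    formats = {}
    for inv, xor in product((0, 1), repeat=2):
        d = d_in
        if inv:
            d ^= all_ones             # inverse operation (flag I)
        if xor:
            d ^= xor_pattern(n_bits)  # XOR with 1010...10 (flag X)
        formats[(inv, xor)] = d
    return formats


def restore(d, inv, xor, n_bits):
    """Undo the flagged operations; both are self-inverse XOR masks."""
    return data_formats(d, n_bits)[(inv, xor)]


if __name__ == "__main__":
    for flags, value in data_formats(0b1001, 4).items():
        print(flags, format(value, "04b"))
```

Because inversion and the pattern XOR are both XORs with a constant mask, they commute and each undoes itself, which is why a receiver needs only the two flag bits to recover the original format.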
At the receiver, a photo detection device, such as a photodiode (PD) or high-speed camera, detects the visible light data and the decoder performs the XOR and inverse operations according to the I and X bits of the received metadata. The decoding algorithm performs the bit shuffling to obtain the decoded data $D_{de}$ from the encoded data and hash $\{D|H\}_{en}$.
B. ENCODING AND DECODING ALGORITHMS
Algorithm 1 describes the encoding algorithm for N-bit input data. The inputs to the algorithm are $\{D|M\}_{old}$, $D_{in}$, I, X, and Z, which denote the previously encoded data and metadata, new input data, an inverse flag, an XOR flag, and a "running selector", respectively. The output, i.e., $\{D|M\}_{en}$, is a concatenation of the encoded data and metadata. $\{D|M\}_{can}^{i}$ is the i-th candidate of the data and metadata, which can be selected as the encoded data and metadata, where $M_{can}^{i}$ is the concatenation of the i-th hash H, I, and X.
[Algorithm 1: encoding procedure. Recoverable steps: check validity; calculate Hamming distances.]
The encoding algorithm involves the following four steps. First, the algorithm generates N data candidates by shuffling the input data with N different "hash candidates". To perform the bit shuffling, an Omega network [20], which is a widely known multistage interconnection network, is employed. The shuffled bit positions of a data candidate are calculated using N XORs with a specific hash, h, for each bit position of the input data. This can be presented as follows:
$$p'_i = i \oplus h, \qquad i = 0, 1, \ldots, N-1, \tag{1}$$
where $p'_i$ denotes the shuffled position of the input bit at position i and $\oplus$ denotes the bitwise XOR.
Figure 2 shows an example of shuffling 8-bit input data with a 3-bit hash using the Omega network. If h is set to 3 (011), data bits $b_0$ and $b_1$ are shuffled and propagated to $b_3$ and $b_2$, respectively.
Second, the encoding algorithm performs a validity check that excludes candidates in which the number of consecutive 0's or 1's exceeds a given threshold, depending on whether the running selector bit Z is 0 or 1, respectively. Here, we consider the validity of the codeword sequences by validating a concatenation of the previously encoded data and each candidate codeword, i.e., $\{D|M\}_{old}$ and $\{D|M\}_{can}^{i}$. Assuming a VLC system with a dimming level greater than 50 %, the average DC bias is greater than half the amplitude of the signal of 1; thus, the number of consecutive 0's mainly determines whether the MFTP condition, which in turn determines noticeable flicker, is satisfied. Conversely, if the dimming level is less than 50 %, consecutive 1's will have a significant effect on the flicker. Therefore, whether the consecutive 0's or the consecutive 1's must be limited depends on the predetermined dimming level of the VLC system. The bit shuffling scheme assigns the running selector bit during the validity check process to select the limit of consecutive 0's or 1's; in this paper, it is assumed that the dimming level is greater than 50 %.
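The first two steps, XOR-based position shuffling and the run-length validity check, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names (`shuffle_bits`, `max_run`, `is_valid`) are hypothetical, N is assumed to be a power of two so that every XORed index stays in range, and codewords are represented as bit strings for readability.

```python
def shuffle_bits(data, h, n_bits):
    """Output bit i takes input bit (i XOR h); n_bits must be a power of two."""
    out = 0
    for i in range(n_bits):
        bit = (data >> (i ^ h)) & 1
        out |= bit << i
    return out


def max_run(bits, symbol):
    """Length of the longest run of `symbol` ('0' or '1') in a bit string."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == symbol else 0
        longest = max(longest, current)
    return longest


def is_valid(prev_word, candidate, threshold, z=0):
    """Check the concatenation of the previous codeword and the candidate.

    z = 0 limits runs of 0's, z = 1 limits runs of 1's (running selector).
    """
    symbol = "1" if z else "0"
    return max_run(prev_word + candidate, symbol) <= threshold


if __name__ == "__main__":
    # With h = 3, input bit b0 moves to position 3 and b1 to position 2.
    print(format(shuffle_bits(0b00000001, 3, 8), "08b"))
    print(format(shuffle_bits(0b00000010, 3, 8), "08b"))
    print(is_valid("10101010", "10010001", threshold=2))
```

Since XORing an index with the same hash twice returns the original index, applying `shuffle_bits` again with the same hash restores the input, which is what allows the decoder to reuse the same operation.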
Third, we estimate the Hamming distances of N different candidates based on the previously encoded data and metadata. The codeword with the largest Hamming distance with respect to the previous codeword is selected because, if the previous bit-shuffled code and the selected code are consecutively connected, the overall dimming of the encoded bit stream remains constant. Since it is desirable that the DC bias variance of the transmit signal be minimized, the maximum Hamming distance is considered in the bit shuffling encoder. Finally, the algorithm concatenates and returns a valid set of encoded data and metadata with the maximum Hamming distance. If all of the candidates are invalid, the candidate with the smallest number of consecutive 0's or 1's is selected as the encoded output.
Note that the proposed method yields diverse codeword candidates, from which the codeword with the maximum Hamming distance from the previously encoded output is selected. We can obtain an almost constant dimming level, since maximizing the Hamming distance tends to place 0 and 1 alternately at a given bit position of successive codewords. In addition, by repeatedly applying 0 and 1 in sequence to the running selector bit, thus limiting the number of consecutive 0's or 1's, the dimming level converges at approximately 50 %.
Figure 3 shows an example encoding, where the previously encoded data and metadata are both 1010. Here, the new input data is 1001, and I and X are 0 and 1, respectively. Also, Z is 0. The encoding algorithm first creates four data candidates by shuffling the input data using all of the hash candidates, i.e., 00, 01, 10, and 11. For instance, in the case of candidate 1, the input data bits {$b_3$, $b_2$, $b_1$, $b_0$} are shuffled to data candidate bits {$b_2$, $b_3$, $b_0$, $b_1$} by hash candidate 01. Then, the algorithm checks the validity of the bit sequence formed by each data candidate, its hash candidate, I, and X by checking the number of consecutive 0's. The encoding algorithm then calculates the Hamming distance of each candidate from the previously encoded data and metadata, and selects the candidate with the maximum Hamming distance among those satisfying the flicker constraint. Finally, a pair of encoded data and metadata with 0110 and 0101, respectively, is obtained, and the encoded output is generated by concatenating the encoded data and metadata to give 01100101.
Algorithm 2: $D_{de}$ = ShuffleBit($D_{en}$, $H_{en}$)
Algorithm 2 shows the decoding algorithm of the proposed bit-shuffle coding scheme. $D_{de}$ and $\{D|H\}_{en}$ are the output decoded data and a concatenation of the input encoded data and hash, respectively. The decoding algorithm can decode the output by simply shuffling the encoded data using its hash.
Note that the decoder employs only inversion, XOR, and bit shuffling in sequence. Unlike the existing RLL coding schemes, the proposed approach does not rely on a lookup table; thus it can be implemented easily using dynamic code generation, while also avoiding the worst-case bit sequences. 
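To make the four encoding steps and the single-step decoding concrete, the following Python sketch replays the Figure 3 example end to end. It is an illustration under stated assumptions rather than the authors' listing: the function names are hypothetical, the threshold of 2 is borrowed from the evaluation settings, ties in Hamming distance are broken in favor of the lowest hash, and the input is taken to be already in the data format indicated by the I and X flags, as in the Figure 3 example.

```python
def shuffle_bits(data, h, n_bits):
    """Output bit i takes input bit (i XOR h); n_bits must be a power of two."""
    return sum(((data >> (i ^ h)) & 1) << i for i in range(n_bits))


def max_run(bits, symbol):
    """Length of the longest run of `symbol` in a bit string."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == symbol else 0
        longest = max(longest, current)
    return longest


def encode(prev, d_in, inv, xor, z, n_bits, threshold=2):
    """Return the selected codeword {D|H|I|X} as a bit string."""
    hash_bits = n_bits.bit_length() - 1       # log2(N) for a power of two
    symbol = "1" if z else "0"                # running selector
    candidates = []
    for h in range(n_bits):                   # step 1: N shuffled candidates
        data = format(shuffle_bits(d_in, h, n_bits), f"0{n_bits}b")
        word = data + format(h, f"0{hash_bits}b") + str(inv) + str(xor)
        run = max_run(prev + word, symbol)    # step 2: validity check input
        dist = sum(a != b for a, b in zip(prev, word))  # step 3: Hamming distance
        candidates.append((word, run, dist))
    valid = [c for c in candidates if c[1] <= threshold]
    if valid:                                 # step 4: maximum Hamming distance
        return max(valid, key=lambda c: c[2])[0]
    return min(candidates, key=lambda c: c[1])[0]  # fallback: shortest run


def decode(word, n_bits):
    """Recover the shuffled-in data by shuffling again with the received hash."""
    hash_bits = n_bits.bit_length() - 1
    data = int(word[:n_bits], 2)
    h = int(word[n_bits:n_bits + hash_bits], 2)
    return shuffle_bits(data, h, n_bits)


if __name__ == "__main__":
    codeword = encode(prev="10101010", d_in=0b1001, inv=0, xor=1, z=0, n_bits=4)
    print(codeword)                                   # 01100101, as in Figure 3
    print(format(decode(codeword, n_bits=4), "04b"))  # 1001
```

In this sketch the candidate with hash 00 is rejected because it contains three consecutive 0's, and the candidate with hash 01 wins with a Hamming distance of 6 from the previous codeword 10101010.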
III. HARDWARE IMPLEMENTATION
To investigate the performance of a hardware implementation of this algorithm, we first describe the hardware designs for encoding and decoding of the proposed bit-shuffle coding. To facilitate practical implementation, we describe the design of the hardware components in detail.
A. ENCODER AND DECODER
Figure 4 shows the overall hardware design of a bit-shuffle encoder with an input bit width of N bits. The encoder applies the encoding algorithm, which is shown in Algorithm 1, to the N-bit input data, inverse bit, and XOR bit during one clock period. It generates N shuffled candidates in parallel and then chooses the valid candidate with the maximum Hamming distance from the previous encoder output. As shown in Figure 5, the N × 1 encoder is primarily composed of the N ShuffleBit, N HDist, N CheckValid, and one FindMaxHDist modules.
Furthermore, we can implement a bit-shuffle decoder by employing only one ShuffleBit module, as described in Algorithm 2. After the XOR and inverse operations, the decoder shuffles the N-bit encoded data using a $\log_2 N$-bit hash and finally outputs N-bit decoded data. Due to the asymmetric designs of the encoder and decoder for bit-shuffle coding, we implement a decoder with relatively small hardware resources, and which consumes a low amount of power compared to existing coding schemes. Unidirectional VLC systems (e.g., for hall lighting) generally employ multiple receivers embedding decoders for a single transmitter; thus, the hardware overhead of the decoders is significant from the point of view of the entire VLC system.
B. HARDWARE MODULES
Figure 5 shows the detailed design of four hardware modules that implement the encoding and decoding algorithms of the bit-shuffle coding method. First, Figure 5(a) shows the ShuffleBit module that outputs the N-bit shuffled data by shuffling N-bit input data using a given hash candidate occupying $\log_2 N$ bits. The module is designed to employ N instances of the shuffling bit position (SBP), where each SBP consists of a $\log_2 N$-bit XOR gate and an N-to-1 multiplexer. The i-th SBP returns the i-th bit of shuffled data by selecting one bit from among all of the bits of input data based on the result of XORing the hash candidate and bit index i.
Second, the HDist module is used to calculate the Hamming distance of the shuffled candidate from the previous encoder output. As shown in Figure 5(b) , the module is simply designed to count the number of 1 bits in the XORed output for the encoder output and shuffled candidate, where the shuffled candidate is the concatenation of the shuffled data, hash candidate, and two bits of inverse and XOR.
Third, the CheckValid module checks whether a given shuffled candidate is valid or not according to a predefined threshold. Figure 5(c) shows an example module design with a threshold value of 2. The module consists of N − 2 instances of partial check valid (PCV), where each PCV outputs 0 if all three successive input bits are equal to 0. Furthermore, if any PCV outputs 0, the CheckValid module will return 0 to indicate that the shuffled candidate is invalid. In other words, if the maximum number of consecutive 0 bits of a shuffled candidate is greater than a threshold value, i.e., 2, the module will output 0. Otherwise, it will output 1. By assigning 1 to the running selector bit for inverting the shuffled candidate, we can also check the validity based on the number of consecutive 1 bits in the candidate.
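A behavioral model of the CheckValid module for a threshold of 2 can be sketched in Python as below. This is a software analogy of the structure described above, not the synthesized design: the function names `pcv` and `check_valid` are hypothetical, each PCV is modeled as a 3-input OR, and the number of PCV instances is simply taken as the input width minus 2.

```python
def pcv(b0, b1, b2):
    """Partial check valid: outputs 0 only when all three bits are 0."""
    return b0 | b1 | b2


def check_valid(bits, z=0):
    """Return 1 if no window of three successive bits is all 0, else 0.

    bits is a list of 0/1 integers; z = 1 inverts the candidate first so that
    runs of 1's are checked instead of runs of 0's (running selector).
    """
    if z:
        bits = [b ^ 1 for b in bits]
    outputs = [pcv(*bits[i:i + 3]) for i in range(len(bits) - 2)]
    return int(all(outputs))  # AND of all PCV outputs


if __name__ == "__main__":
    print(check_valid([1, 0, 0, 1, 0, 0, 0, 1]))       # contains 000
    print(check_valid([0, 1, 1, 0, 0, 1, 0, 1]))       # at most two 0's in a row
    print(check_valid([0, 1, 1, 1, 0, 0, 1, 0], z=1))  # contains 111
```

Each window check depends only on three neighboring bits, so in hardware all PCV instances can evaluate in parallel and be combined by a single AND stage.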
Finally, the FindMaxHDist module is designed to select the shuffled candidate with the maximum Hamming distance as encoded output, by deploying multiple 2-to-1 Max modules similar to a binary tree. As shown in Figure 5(d), each of the 2-to-1 Max modules consists of five 2-to-1 multiplexers and one greater than comparator. This presents only one possible design, which is maximally parallelized to perform the bit-shuffle encoding within one clock cycle. We can design more efficient modules using different approaches, such as those based on pipelined architectures.
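The tree-style reduction performed by FindMaxHDist can be modeled in Python as follows. The sketch makes several assumptions that the text does not settle: the names `max2` and `find_max_hdist` are hypothetical, ties are resolved in favor of the first input because the comparator is a strict greater-than, the masking of invalid candidates is omitted, and the distances in the usage example are illustrative values.

```python
def max2(a, b):
    """2-to-1 Max module: forward the (candidate, distance) pair with the
    larger distance; b wins only if its distance is strictly greater."""
    return b if b[1] > a[1] else a


def find_max_hdist(entries):
    """Reduce (candidate, distance) pairs level by level, like a binary tree."""
    level = list(entries)
    while len(level) > 1:
        nxt = [max2(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired entry passes to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    # Illustrative (codeword, Hamming distance) pairs for N = 4 candidates.
    entries = [("10010001", 5), ("01100101", 6), ("01101001", 4), ("10011101", 5)]
    print(find_max_hdist(entries))
```

For N candidates with N a power of two, the tree uses N − 1 pairwise modules arranged in log2 N levels, which is what allows the selection to complete combinationally within one clock cycle.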
IV. PERFORMANCE EVALUATION
In this section, we compare the performance of our proposed bit-shuffle coding algorithm to that of conventional RLL schemes in terms of flicker mitigation and transmission efficiency. We also analyze the overheads of their hardware implementations.
A. FLICKER MITIGATION
For the performance evaluation, we compared the proposed bit-shuffle coding with conventional Manchester, 4B6B, and 8B10B RLL coding schemes, which are included in the IEEE 802.15.7 standard. As performance metrics, we evaluated the codeword length and run length histogram of each scheme by producing codeword sequences via a Monte-Carlo simulation with 2,000 runs. To evaluate the inherent characteristics of the proposed method, channel coding was not considered.
Table 1 presents the codeword lengths of the RLL codes for bit widths of 8 and 16 bits. We can see that the proposed scheme produces considerably shorter codewords than Manchester coding. When N-bit input data are encoded, Manchester coding causes a severe reduction in the data transmission rate, as it produces a 2N-bit encoded output, whereas the proposed scheme returns only an $(N + \log_2 N + 2)$-bit encoded output, which is composed of N-bit encoded data, a $\log_2 N$-bit hash, an inverse bit, and an XOR bit. The length gain of the bit shuffling code increases with the bit width. Although the bit shuffling code has a comparable codeword length to the 4B6B and 8B10B schemes, it can prevent the worst-case bit sequences, which cause flicker, by generating codewords dynamically.
Figure 6 shows the run length histograms of continuous codewords according to the number of consecutive 0's in the conventional RLL coding schemes and the proposed bit-shuffle coding scheme. In the validity check, the performance of the proposed scheme was measured by setting the threshold value for the number of consecutive 0's to 2. As we can see, whereas the 4B6B and 8B10B schemes generate runs of three or more consecutive 0's, the proposed technique produces at most two consecutive 0's. If the optical rate of the LEDs is not sufficiently fast due to the use of simple transceivers, three or more consecutive 0's may lead to flickering of the light, thus failing to meet the transmission requirements of VLC.
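The codeword-length expressions quoted in this subsection can be evaluated directly. The short Python sketch below computes the lengths from the code definitions (2N for Manchester, 6 output bits per 4 input bits for 4B6B, 10 per 8 for 8B10B, and N + log2 N + 2 for bit-shuffle coding); the values are derived from these formulas rather than copied from Table 1, and the function name `codeword_lengths` is hypothetical.

```python
from math import log2


def codeword_lengths(n_bits):
    """Encoded length in bits for an N-bit input under each scheme."""
    return {
        "Manchester": 2 * n_bits,
        "4B6B": n_bits * 6 // 4,
        "8B10B": n_bits * 10 // 8,
        "bit-shuffle": n_bits + int(log2(n_bits)) + 2,  # data + hash + I + X
    }


if __name__ == "__main__":
    for n in (8, 16):
        lengths = codeword_lengths(n)
        saving = lengths["Manchester"] - lengths["bit-shuffle"]
        print(n, lengths, "bits saved vs. Manchester:", saving)
```

Because the metadata grows only logarithmically with N while Manchester coding doubles the length, the saving relative to Manchester coding widens as the bit width increases.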
Note that the codewords of the Manchester code and the bit-shuffle code differ in length; therefore, their counts at a run length of two consecutive 0's differ, and the total numbers of occurrences are not the same. Also, the proposed coding scheme outperformed Manchester coding in terms of codeword length for bit widths of both 8 and 16 bits, while its flicker mitigation performance was comparable.
Note that the proposed coding scheme outperforms the 4B6B and 8B10B coding schemes in terms of the worst-case performance, as it is designed to perform dynamic code generation; the 4B6B and 8B10B coding schemes employ lookup tables to encode input data into encoded data, and vice versa. When input data yielding large numbers of successive 0's are transmitted repeatedly, each encoded output is identical due to the use of predetermined lookup tables; this will give rise to severe flickering. The proposed scheme dynamically selects the candidate with the maximum Hamming distance based on the previously encoded output; thus, the next encoded output differs from the previous output even when the input data are the same.
Among the RLL codes, the Manchester code has at most two consecutive identical bits and thus can be implemented simply. However, from the viewpoint of transmission efficiency, the Manchester code has the drawback of a low data rate, because it needs to transmit a signal twice the length of the original signal. If a VLC system is equipped with high-performance hardware capable of processing high-speed visible light data at the transmitting and receiving ends, a high optical rate can be considered for data transmission. However, if a high optical rate cannot be guaranteed due to the limited hardware performance of a small IoT device, flicker may occur due to long runs of identical bit levels. Therefore, use of the bit shuffling technique, which can limit the number of consecutive identical bits and has a shorter code length than the Manchester code, is beneficial and can be applied to various fields where hardware overhead is a constraint.
To observe the effect of bit shuffling coding on the probability of bit error, the BER graph is shown in Figure 7. As can be seen, the BER performance of the proposed scheme is similar to that of other RLL codes. Despite the excellent flicker mitigation performance and simplified hardware implementation of the bit shuffling scheme, degradation of BER performance does not occur in either the low or high signal-to-noise ratio (SNR) region.
B. OVERHEAD EVALUATION
Table 2 shows the hardware overheads of the encoder and decoder for the proposed bit-shuffle coding scheme and the existing RLL coding schemes in terms of area and power consumption. For all coding schemes, we implemented both the encoders and decoders for 8-bit and 16-bit input data using Verilog HDL. We synthesized all of the encoders and decoders to operate at a clock frequency of 1 MHz using Synopsys Design Compiler [21] and a Samsung 65 nm standard cell library. The dynamic power consumption was measured using Synopsys PrimeTime [21] by configuring the toggle rate to 10 %.
Our results demonstrated that the proposed bit-shuffle coding is widely applicable to flicker-free VLC services, even for devices with small-scale and limited hardware resources, such as IoT devices. This is because we significantly reduced the hardware overheads of the bit-shuffle decoder by applying an asymmetric design to the encoder and decoder. The decoder is markedly more important than the encoder in terms of the performance of the overall system for unidirectional VLC applications. Typically, a much larger number of receivers with limited resources employ only the decoder, whereas a transmitter without constraints uses the encoder. For this reason, this paper disregards the fact that the bit-shuffle encoder occupies a considerably larger area and consumes more dynamic power than the encoders of the other RLL codes. Also, we can drastically reduce the hardware overheads of the encoder by employing a pipelined architecture.
For bit widths of 8 and 16 bits, the bit-shuffle decoder has a significantly smaller area and lower power consumption compared to the 4B6B and 8B10B decoders. The bit-shuffle decoder occupies 32.8 % and 22.5 % less area, and consumes 16.5 % and 13.9 % less dynamic power, for 8- and 16-bit inputs, respectively, relative to the 8B10B decoder. Furthermore, our proposed decoder has a 10.8 % and 2.8 % smaller area than the 4B6B decoder for 8- and 16-bit inputs, respectively. The area of the bit-shuffle decoder is slightly larger than that of the Manchester decoder, but we showed that our decoder is more suitable for VLC by confirming that its codeword lengths are much shorter. In addition, our decoder consumes 6.5 % and 9.8 % less dynamic power than the 8- and 16-bit Manchester decoders, respectively.
V. CONCLUSION
In this paper, we proposed a novel flicker-free encoding and decoding algorithm based on bit shuffling in VLC systems. Although conventional RLL codes can be used for flicker mitigation, these do not take into account transmission efficiency, worst-case bit sequences, and hardware overheads. To solve this problem, the bit-shuffle process, which can alleviate light flickering by limiting the number of consecutive identical bits, and generate dynamic codewords of considerably shorter length to improve transmission efficiency, is introduced. Furthermore, the proposed coding scheme significantly reduces the hardware overheads of the decoder because it does not require predefined lookup tables. Our simulation results demonstrate that bit-shuffle coding is robust and more suitable for a wide range of VLC applications than existing RLL coding schemes.
