This paper proposes a novel High Efficiency Video Coding (HEVC) Tile partitioning method for parallel processing on asymmetric multicores. The proposed method (i) analyzes the computing ability of the asymmetric multicores, (ii) builds a regression model of computational complexity as a function of video resolution, and (iii) uses the model to determine the optimal HEVC Tile resolution for each core and to partition and allocate the Tiles to suitable cores.
INTRODUCTION
In recent years, parallel ultra-high definition (UHD) video processing has emerged as a preferred technology, and the use of computing systems with asymmetric multicore processors such as ARM big.LITTLE is steadily increasing [1].
A new international video standard, High Efficiency Video Coding (HEVC), provides two new parallel processing tools that employ different picture partitioning strategies: Tiles and Wavefront Parallel Processing (WPP) [3, 9]. Tiles partition a picture with horizontal and vertical boundaries and provide better coding gains than multiple slices. However, conventional Tile partitioning does not take into account the computational abilities of asymmetric CPU cores such as ARM's big/LITTLE cores, and divides a picture into a grid of equal-sized rectangular regions. This degrades the performance of multicore parallel processing. Thus, this paper proposes a new HEVC Tile partitioning method for parallel processing that analyzes the computing ability of asymmetric multicores as well as the computational complexity of each Tile. In addition, this paper demonstrates the results by implementing the proposed method on the Samsung Galaxy S7 Edge, a recently released smartphone.
VIDEO PARALLEL PROCESSING USING THE PROPOSED NON-UNIFORM TILE PARTITIONING METHOD
On asymmetric multicore systems, the conventional uniform Tile partitioning method causes performance bottlenecks, because the faster decoding threads (on big cores) are forced to wait for the slower decoding threads (on little cores) to finish decoding each picture. This paper proposes a method that reduces these bottlenecks by minimizing the relative workload gap between the cores. Various studies in the parallel video processing field have aimed to equalize the relative workload of each core. One such study proposes an HEVC Tile partitioning algorithm based on estimated decoding complexity. The method counts the encoded bits of each coding tree unit (CTU) and forms multiple Tiles so that the workload is distributed across the cores as uniformly as possible. It is quite practical, but it does not consider asymmetric multicore environments. Hence, this paper focuses on relative workload equalization for asymmetric multicore systems. The proposed method divides video pictures into multiple non-uniform Tiles and allocates them to big and little cores, which have asymmetric performance. Figure 1 depicts the concept of mapping HEVC Tiles onto multiple cores.
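The bottleneck described above can be illustrated with a minimal sketch, assuming that each Tile is decoded by one thread on one core and that a picture is finished only when the slowest thread finishes. The relative core speeds and the Tile workloads below are hypothetical values chosen for illustration, not measurements from this paper.

```python
def picture_decode_time(tile_work, core_speed):
    """Time to decode one picture when tile i runs on core i.

    The picture is complete only when the slowest thread finishes,
    so the picture time is the maximum of the per-core times.
    """
    return max(w / s for w, s in zip(tile_work, core_speed))


def idle_time(tile_work, core_speed):
    """Time each core spends waiting for the slowest core."""
    total = picture_decode_time(tile_work, core_speed)
    return [total - w / s for w, s in zip(tile_work, core_speed)]


# Hypothetical relative speeds: two big cores twice as fast as two little cores.
core_speed = [2.0, 2.0, 1.0, 1.0]

# Workload per Tile in arbitrary units (for example, CTU columns).
uniform = [15, 15, 15, 15]      # equal-sized Tiles
non_uniform = [20, 20, 10, 10]  # larger Tiles on the big cores

uniform_time = picture_decode_time(uniform, core_speed)
non_uniform_time = picture_decode_time(non_uniform, core_speed)
uniform_idle = idle_time(uniform, core_speed)
non_uniform_idle = idle_time(non_uniform, core_speed)

print(uniform_time, uniform_idle)
print(non_uniform_time, non_uniform_idle)
```

In this toy model the big cores sit idle for half of each picture period under uniform partitioning, while the non-uniform split removes the wait entirely. Real decoding workloads also depend on content, so the numbers only illustrate the mechanism.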
Among the many factors affecting video decoding complexity, this paper uses the resolution of each Tile to estimate the decoding complexity [2]. The proposed Tile partitioning method involves the following processing steps: (i) analyze the computational ability of the asymmetric multicores; (ii) apply the pre-defined regression model [5, 6, 7, 8] of computational complexity as a function of video resolution; (iii) determine the optimal HEVC Tile resolution for each core; and (iv) partition the picture and allocate the Tiles to the most suitable cores, as shown in Figure 2.
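Steps (i) to (iv) can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: the regression model is a straight line (decoding time = slope × Tile width + intercept) fitted per core type, Tiles are vertical strips spanning the full picture height, and the timing samples are hypothetical. The function names `fit_line`, `balanced_shares`, and `to_ctu_columns` are illustrative only.

```python
from math import floor


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def balanced_shares(models, total):
    """Split `total` work so that every core finishes at the same time t.

    With time_i = a_i * p_i + b_i and sum(p_i) = total, the common finish
    time is t = (total + sum(b_i / a_i)) / sum(1 / a_i).
    """
    t = (total + sum(b / a for a, b in models)) / sum(1.0 / a for a, _ in models)
    return [(t - b) / a for a, b in models]


def to_ctu_columns(shares, width_ctus):
    """Round fractional shares to integer CTU column widths (largest remainder)."""
    scale = width_ctus / sum(shares)
    exact = [s * scale for s in shares]
    widths = [floor(e) for e in exact]
    if min(widths) < 1:
        raise ValueError("a core would receive an empty Tile")
    leftover = width_ctus - sum(widths)
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - widths[i], reverse=True)
    for i in by_remainder[:leftover]:
        widths[i] += 1
    return widths


# (i)-(ii) Hypothetical samples: Tile width in CTU columns vs. decode time (ms).
sample_widths = [10, 20, 30]
big_model = fit_line(sample_widths, [10.0, 20.0, 30.0])
little_model = fit_line(sample_widths, [20.0, 40.0, 60.0])

# (iii)-(iv) Two big and two little cores; a 3840-pixel-wide picture with
# 64x64 CTUs is 60 CTU columns wide.
models = [big_model, big_model, little_model, little_model]
shares = balanced_shares(models, total=60)
column_widths = to_ctu_columns(shares, width_ctus=60)
print(column_widths)
```

The resulting integer widths are the kind of values a non-uniform Tile column configuration would carry. A two-dimensional Tile grid would need the same balancing applied to row heights as well.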
The proposed method does not apply to pre-encoded videos and broadcast systems, in which the encoder cannot take each decoder into account. However, it works for real-time video communication systems such as video conferencing applications and first person view (FPV) video streaming systems on unmanned aerial vehicles (UAV). This is a key differentiator over competing methods, because it allows the encoders of such systems to select non-uniform Tile partitioning options by taking the decoder-side environment into account in real time. For non-uniform Tile partitioning, the internal option TileUniformSpacing is set to '0' (see Table 1). The TileColumnWidthArray and TileRowHeightArray options are used to adjust the resolution of each Tile. Figures 3 and 4 show the Tiles partitioned by the conventional and proposed methods. The proposed method is implemented as additional functional modules in a typical HEVC decoder, and Figure 5 shows the block diagram of the proposed HEVC decoder structure. For the real-time demonstration, the open-source OpenHEVC decoder is used [4]. The function hls_decode_entry_tiles in the OpenHEVC decoder is modified to implement the proposed method, and the function sched_setaffinity is used to allocate video decoding threads to big and little cores.
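The thread-to-core allocation can be sketched with Python's `os.sched_setaffinity`, which wraps the same Linux call as the C function named above. This is only an illustration of the idea, not the modified OpenHEVC code. The CPU numbering (little cores as CPUs 0 to 3, big cores as CPUs 4 to 7) and the helper names `pin_current_thread` and `decode_tile` are assumptions. The sketch falls back to whatever CPUs are available, and does nothing on platforms without the call.

```python
import os
import threading


def pin_current_thread(cpu_ids):
    """Pin the calling thread to the requested CPUs and return the applied set.

    CPUs that are not available to this process are ignored. If none of the
    requested CPUs is available, the current mask is kept.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()  # platform without the Linux affinity call
    allowed = os.sched_getaffinity(0)
    target = set(cpu_ids) & allowed
    if not target:
        target = allowed
    # On Linux the call accepts a thread id, so only this thread is affected.
    os.sched_setaffinity(threading.get_native_id(), target)
    return target


def decode_tile(tile_index, cpu_ids, applied):
    """Worker: pin itself, then decode its Tile (decoding omitted here)."""
    applied[tile_index] = pin_current_thread(cpu_ids)
    # ... Tile decoding would run here ...


# Hypothetical mapping: Tile 0 (large) to big cores, Tile 1 (small) to little cores.
tile_to_cpus = {0: [4, 5], 1: [0, 1]}

applied = {}
threads = [threading.Thread(target=decode_tile, args=(tile, cpus, applied))
           for tile, cpus in tile_to_cpus.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(applied)
```

Each worker pins itself rather than being pinned by the main thread, so the affinity of the main thread is left unchanged.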
EXPERIMENTAL RESULTS AND DEMONSTRATION
The demonstration uses two Android smartphones (Samsung Galaxy S7 Edge) with asymmetric multicores, as shown in Figure 6. The two phones decode test sequences partitioned by the conventional uniform method and by the proposed non-uniform method, using the modified OpenHEVC decoder, and the difference in decoding speed between the two phones is measured. The Samsung Galaxy S7 Edge has four big and four little cores, but only two of the big cores are always in the online state. Table 2 shows the measured gains in decoding time for the PeopleOnStreet and Traffic test sequences. The results show that the proposed method achieves an average decoding time gain of 25%. The gain is achieved by increasing the decoding complexity assigned to the big cores and reducing the decoding complexity assigned to the little cores. In addition, Figures 9 and 10 show the utilization rate of each core during decoding with the conventional and the proposed Tile partitioning methods. In Figure 9, the utilization rates of the two big cores fluctuate widely. The cause is that the big cores finish decoding the Tiles allocated to them and then wait for the little cores to complete decoding of the picture. In contrast, Figure 10 shows relatively stable utilization rates for the big cores, because the proposed Tile partitioning method minimizes their wait time. The minimized wait time improves overall decoding performance.
CONCLUSION
This paper proposes a novel HEVC Tile partitioning method for UHD parallel video processing on asymmetric multicores. The method minimizes the decoding time gap between big (faster) and little (power-efficient) cores by allocating non-uniform HEVC Tiles to the cores. Experimental results with standard 4K UHD test sequences show an average 25% performance improvement on a recently introduced Android smartphone.
