We developed and fabricated the world's highest resolution (18 megapixel, 1443 ppi) OLED-on-glass display panel. The design uses a white OLED with a color filter structure for high density pixelization and an n-type LTPS backplane for faster response time than mobile phone displays. A custom high bandwidth driver IC was fabricated. We developed a foveated pixel pipeline appropriate for virtual reality and augmented reality applications, especially mobile systems.
Figure 1: Cross-section of high ppi OLED display for VR.
Journal of the SID, 2018
Introduction

Virtual reality displays and the human visual system
Virtual and augmented reality offer the promise of amazing immersive experiences. Virtual reality (VR) can take you to new places, and augmented reality can bring these places to you. 1 Enabling these experiences requires displays that come as close as possible to matching the capabilities of the human visual system (HVS). 2 These displays require many pixels, high pixel density, fast response time, high refresh rate, short illumination duty cycle, and, of course, reasonable brightness, contrast, and color gamut. 1 The primary objective of this work is to develop mobile head-mounted display (HMD) prototypes that provide a visual experience matching the HVS as closely as possible. Developing such displays requires overcoming a number of significant challenges. Mobile organic light-emitting diode (OLED) displays offer excellent front-of-screen image quality, but they need several major improvements to approach the capabilities of the HVS when used in a VR headset. For example, the display diagonal per eye needs to be between 2″ and 6″, depending on the design specifications of the headset. A headset providing high immersion and HVS-like acuity requires a wide field of view (FoV) of over 100°, a display with 1000 to 2200 pixels per inch (ppi), and 15 to 25 million pixels per eye. Making an OLED display on glass (as opposed to a microdisplay on silicon) with these attributes poses significant challenges in materials and processes. Driving this class of display poses another major challenge for the circuitry and interfaces, especially given the space and power constraints of a mobile (untethered) system. This work addresses several of these challenges.
Key challenges of virtual reality displays
Pixel pitch, pixel count, and optics
For head-mounted VR systems to approach the visual acuity and FoV of the HVS, they can make use of significantly more pixels than handheld displays and even most large-format displays. Typical human FoV for each eye is approximately 160° (horizontal) by 150° (vertical). 3 At an acuity of 60 pixels per degree (ppd), equivalent to 20/20 Snellen acuity, covering this full FoV requires 9600 × 9000 pixels per eye. This assumes uniform acuity over the full FoV, which overestimates the HVS 3 even when eye roll is considered and may exceed the resolving limit of some optical systems; it nonetheless provides a useful upper bound. Alternative assumptions and analyses may give different maximum pixel counts.
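The upper-bound pixel count is simply FoV multiplied by angular acuity. A minimal Python sketch, assuming uniform acuity across the whole field (the same simplification made above); the function name `pixels_for_fov` is illustrative only:

```python
import math


def pixels_for_fov(fov_h_deg: float, fov_v_deg: float, ppd: float) -> tuple[int, int]:
    """Pixels per eye needed to cover a FoV at a uniform acuity in pixels per degree."""
    # Uniform acuity over the full field is an upper-bound assumption: the HVS and
    # real optics both resolve less away from the center.
    return math.ceil(fov_h_deg * ppd), math.ceil(fov_v_deg * ppd)


h, v = pixels_for_fov(160, 150, 60)  # full human FoV per eye at 60 ppd (20/20)
print(h, v, f"{h * v / 1e6:.1f} MP per eye")  # 9600 9000 86.4 MP per eye
```

The same call with the prototype's 120° × 96° FoV at 40 ppd returns 4800 × 3840.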
Techniques for dealing with spatially varying acuity, both in the HVS and the HMD's optics, are addressed in later sections of this paper. It is possible to design a panel with a spatially varying pixel pitch to match these acuity variations, but possibly at the cost of undesirable nonuniformity, driving or manufacturing complexity, and other drawbacks. For this work, we assume the pixel pitch over the full pixel array is constant, although the HVS and/or the optical system may be unable to clearly resolve all the pixels over the full array.
Pixel pitch may be calculated by considering the optical system that creates a magnified virtual image for the viewer. At the center of the optics, the spacing between two pixel centers (the pixel pitch p) subtends the angle θ, so for optics with focal length f it may be calculated from the following equation:
$$ p = 2f\tan\left(\frac{\theta}{2}\right) $$
Typical VR optics may have a focal length of roughly 40 mm. 4 For 60 ppd, θ is 1 arc minute, and pixel pitch at this focal length should therefore be 11.6 μm, or 2183 ppi. For comparison, a modern smartphone display may only be 400 to 800 ppi.
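The 11.6 μm and 2183 ppi figures can be reproduced numerically. This sketch assumes the pitch relation p = 2f·tan(θ/2) for a pixel pair centered on the optical axis, with θ = 1/ppd degrees; the helper names are hypothetical:

```python
import math

MM_PER_INCH = 25.4


def pixel_pitch_um(focal_length_mm: float, ppd: float) -> float:
    """Pixel pitch (um) that subtends 1/ppd degrees at the center of a lens of focal length f."""
    theta = math.radians(1.0 / ppd)  # angle per pixel, in radians
    pitch_mm = 2.0 * focal_length_mm * math.tan(theta / 2.0)
    return pitch_mm * 1000.0


def ppi_from_pitch_um(pitch_um: float) -> float:
    """Pixels per inch for a given pitch in micrometers."""
    return MM_PER_INCH * 1000.0 / pitch_um


p = pixel_pitch_um(40.0, 60.0)
print(f"{p:.1f} um, {ppi_from_pitch_um(p):.0f} ppi")  # 11.6 um, 2183 ppi
```

At 40 ppd the same 40 mm optics give a pitch of about 17.5 μm, close to the 17.6 μm pitch of the panel described later.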
Constructing an optical system capable of resolving 11.6 μm features over a 160° FoV is extremely challenging. Lenses may become large and heavy, have substantial distortion or aberrations across the FoV, and have a small eyebox. System-level tradeoffs between pixel size, FoV, optics, HMD size, and other factors should be considered carefully.
For our system, we tried to balance these tradeoffs, particularly the optical system acuity and FoV. A comparison between the calculated parameters of the "upper bound" display described previously and our prototype panel is shown in Table 1. We chose a FoV of 120° × 96° per eye and a central acuity of 40 ppd, for a pixel count of 4800 × 3840. This pixel count is half of WHUXGA, so a full system with two displays matches the WHUXGA pixel count. Additional details of panel driving tradeoffs are provided in Section 3.
Display addressability and interconnect bandwidth
Driving so many pixels presents engineering hurdles in both addressing the pixel array and the interconnect bandwidth required between the display and rendering system.
One metric for driving a line-at-a-time pixel array (representative of most modern LCD and OLED panels) is the time available to update a single line. During this line time, the row enable circuitry must transition from off to on; the analog pixel values must be driven along the column lines, through the pixel logic, and to a storage or display node; and the row enable circuitry must store the pixel values by transitioning from on to off. Other pixel array architectures are of course possible; many of them will have similar although distinct constraints.
The line time is a function of the panel refresh rate and the total number of lines in a frame, including any blanking lines. VR displays often refresh above 60 Hz to avoid flicker and reduce motion-to-photon latency. 5 VR displays may also use short persistence illumination 6 to reduce motion blur. Short persistence (low duty cycle) illumination is effectively an alternative to very high (500 Hz to 1 kHz) refresh rates, which may be infeasible for the rendering system. Reducing motion blur using low persistence is generally preferred over high refresh rates, but it has two drawbacks. First, keeping emission duty cycle short can lower display brightness. Second, data loading may need to be paused via a vertical blanking porch during the emission period. This increases the total required bandwidth of the interface. For this example, the display is driven at 120 Hz (8.3 ms/frame) and 20% of the frame time (1.7 ms) is used for illumination. No additional time is allocated for pixel transition time although it may be required for some display technologies, such as LCDs.
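System mechanical constraints may require VR displays to have a portrait orientation. This further limits the time available per line, because the 9600-pixel horizontal dimension of the image then becomes 9600 scan lines. For the 9600 × 9000 pixel theoretical display mentioned previously, with 80% of the 8.33 ms frame available for addressing, the portrait mode line time may be calculated as:
$$ t_{\text{line}} = \frac{0.8 \times 8.33\ \text{ms}}{9600\ \text{lines}} \approx 0.7\ \mu\text{s} $$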
This is extremely short. For comparison, the line time for a 4K/60 (landscape) display is approximately 7.5 μs, more than 10× longer. To support shorter line times, the display must use very fast transistors and wires. Capacitive loading of pixel transistors, RC time constants of row and column wires, and voltage swings must all be optimized. Using a refresh rate of 75 or 90 Hz rather than 120 Hz and reducing the number of active lines below the theoretical maximum needed by the HVS are simple ways to relax this constraint, at the cost of slightly increased latency and slightly reduced acuity. Shortening the illumination duration is also an option, but it may unacceptably lower display brightness or require high current densities for OLED components.
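The line-time arithmetic can be captured in a small helper. This is a sketch under the assumption that vertical blanking is exactly the emission window (20% of the frame) and that no other overhead applies; `line_time_us` is a hypothetical name:

```python
def line_time_us(refresh_hz: float, active_lines: int, addressing_fraction: float) -> float:
    """Time (us) available to write one line when only part of each frame is used for addressing."""
    frame_time_us = 1e6 / refresh_hz
    return frame_time_us * addressing_fraction / active_lines


# Theoretical 9600-line portrait panel at 120 Hz, 20% of the frame reserved for emission.
print(f"{line_time_us(120, 9600, 0.8):.2f} us")  # 0.69 us
# A 4800-line panel at 75 Hz with the same 20% emission window.
print(f"{line_time_us(75, 4800, 0.8):.2f} us")  # 2.22 us
```

The second call shows how fewer lines and a lower refresh rate together relax the constraint by roughly 3×.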
Interconnect bandwidth between the rendering system and display is also very large. Assuming 15% overhead for the horizontal porch and the 20% vertical porch mentioned above, the total number of pixels (active plus porches, keeping in mind the portrait orientation) is 11,520 × 10,350. At 120 Hz, the pixel clock is:
$$ f_{\text{pixel}} = 11{,}520 \times 10{,}350 \times 120\ \text{Hz} \approx 14.3\ \text{GHz} $$
For comparison, a 4K/60 pixel clock is under 600 MHz, more than 20 times slower. The total data rate to the display is the product of the pixel clock and the number of bits per pixel. At 24 bits per pixel, the total data rate is 343 Gb/s. For comparison, DisplayPort 1.4 supports an uncompressed payload data rate of 25.92 Gb/s. 7 Again, the theoretical upper-bound VR display requires more than 10 times this bandwidth. A more practical system, such as the 4800 × 3840 panel discussed earlier, requires significantly less bandwidth. Other techniques may also be applied to this bandwidth challenge. Limiting the refresh rate and the total number of vertical lines (including blanking) lowers bandwidth substantially, and reducing the total number of horizontal pixels (including blanking) also helps. Compression, such as Display Stream Compression (DSC), 8 can reduce bandwidth by a factor of three or more. Subpixel rendering (e.g., using two 10-bit subpixels per pixel rather than three 8-bit subpixels per pixel) reduces bandwidth by about 17% (20 rather than 24 bits per pixel). Finally, foveation techniques, discussed in detail later, can provide large bandwidth benefits in VR systems.
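The pixel clock and data rate follow directly from the porch-inflated raster size. A minimal sketch, assuming porches are specified as fractions of the active size (15% horizontal, 20% vertical, as in the text); `link_rates` is a hypothetical helper:

```python
def link_rates(pixels_per_line: int, lines: int, refresh_hz: float,
               h_porch: float, v_porch: float, bits_per_pixel: int) -> tuple[float, float]:
    """Return (pixel clock in Hz, uncompressed data rate in bit/s), porches included."""
    h_total = round(pixels_per_line * (1 + h_porch))  # active pixels + horizontal porch
    v_total = round(lines * (1 + v_porch))  # active lines + vertical porch
    pixel_clock = h_total * v_total * refresh_hz
    return pixel_clock, pixel_clock * bits_per_pixel


# Portrait scan of the 9600 x 9000 upper-bound panel: 9600 lines of 9000 pixels.
clock, rate = link_rates(9000, 9600, 120, h_porch=0.15, v_porch=0.20, bits_per_pixel=24)
DP14_PAYLOAD_BPS = 25.92e9  # DisplayPort 1.4 uncompressed payload rate quoted in the text
print(f"{clock / 1e9:.1f} GHz, {rate / 1e9:.0f} Gb/s, {rate / DP14_PAYLOAD_BPS:.1f}x DP 1.4")
# 14.3 GHz, 343 Gb/s, 13.2x DP 1.4
```

Passing `bits_per_pixel=20` models subpixel rendering, and dividing `rate` by a DSC ratio of about 3 models compression.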
Other challenges
High performance VR displays face a few other challenges relative to direct-view mobile or large-format displays. Uniformity requirements are strict in head-tracked systems to avoid "dirty window" artifacts. 9, 10 In the pixel pitch range under consideration, very little space is available for uniformity compensation logic within the pixel array. Displays should therefore either be designed to require little compensation, or the compensation logic should be moved upstream of the pixel array.
The useful viewing cone of the display depends on the optical system. The lenses may only collect light from a narrow angle, for example ±30°. Therefore, VR displays need not support the wide viewing angle expected of mobile device displays. Maintaining uniform spectral and luminance output within the viewing cone is more important in order to minimize color variation over the FoV in the HMD. Concentrating the display emission energy into this desired cone is useful both for power efficiency and to reduce stray light in the HMD, but shaping the emission cone is difficult in high resolution OLED displays. HMDs may also have high brightness requirements due to losses in the optical system.
Other front-of-screen metrics such as color gamut and contrast ratio may be similar to conventional displays.
Designing a high performance virtual reality display
Panel design and driving
Panel structure
We built a 4.3″, 1443 ppi OLED-on-glass display with a pixel format of 3840 × 4800, a pixel pitch of 17.6 μm, and a FoV appropriate for an immersive HMD computing system. When integrated with a high performance optical system with, for example, a 40 mm focal length, the resulting image spans approximately 120° (H) by 100° (V) per eye, with an acuity of 40 ppd, corresponding to 20/30 on a standard Snellen eye chart.
The display uses two subpixels per pixel: one green subpixel and either a red or blue subpixel. This subpixel arrangement is widely used in mobile phone displays. Each subpixel is 17.6 μm × 8.8 μm. Fabrication of small pixels for displays over 1000 ppi is an extreme challenge with conventional Fine Metal Mask systems. 11 Advanced Fine Metal Mask methods can make micrometer-sized holes, but usually have a wide dead zone between subpixels, making fabrication below 10 μm pixel pitch extremely difficult. To avoid these issues and have a lower risk path to mass production, we use a structure with white OLED and color filters. This approach is used in commercial OLED TV panels 12, 13 and in OLED on silicon microdisplays. 14 Current photolithography technology in an LTPS line can also achieve color filter patterning at pixel densities over 1000 ppi.
It is desirable for VR HMD panels to emit uniform color light over a narrow viewing cone. The conventional approach is to bond color filter glass to a white OLED substrate, but this creates a bigger cell gap that exacerbates color mixing. 15 For this display, we addressed this issue with a new color filter deposition process.
In the conventional glass-glass bonding between color filter and white OLED, the OLED cell gap, black matrix, bank open size, and alignment between the anode and color filter are important factors that determine the viewing cone characteristics of the display. To improve color uniformity over a narrow viewing cone, we patterned the color filter directly on the encapsulation layer. This both improves the alignment between the color filter and the anode and makes the OLED cell gap thinner. Figure 1 shows the color filter on encapsulation structure used for this display. Because the color filter process is carried out after the OLED process, low temperature materials and processes are essential. OLED material can be damaged above 100°C, so the color filter and black matrix materials were cured below 90°C. We did not see any performance degradation from the low temperature-cured color filter material, but the material is sensitive, so the process window is narrow.
Panel configuration
As shown in Fig. 3, the panels should be provided as a pair. To maximize viewable pixels towards the nose, at least one side of the panel should be designed with a narrow bezel. We designed one side of the panel without any circuits or power lines, and the bezel width along that edge is 1.7 mm. The scan drivers support bidirectional driving for various image compositions of left and right panels.
Viewing angle
As previously described, viewing cone performance is related to OLED cell gap, black matrix area, bank open area, and misalignment between the color filters and anodes. In addition, because the subpixels are rectangular, the viewing cone differs fundamentally between the horizontal and vertical directions. For this panel, we define viewing angle in terms of color shift: the angular range over which Δu'v' ≤ 0.02.
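The Δu'v' criterion uses CIE 1976 UCS chromaticity coordinates. The sketch below computes u'v' from tristimulus values and finds the symmetric viewing angle at which the shift from normal incidence first exceeds 0.02. It assumes angle-resolved XYZ measurements are available; the sample values are made up for illustration and are not measurements of this panel:

```python
import math


def uv_prime(X: float, Y: float, Z: float) -> tuple[float, float]:
    """CIE 1976 UCS chromaticity coordinates (u', v') from tristimulus values."""
    d = X + 15 * Y + 3 * Z
    return 4 * X / d, 9 * Y / d


def delta_uv_prime(xyz_a, xyz_b) -> float:
    """Euclidean color difference in the u'v' plane."""
    ua, va = uv_prime(*xyz_a)
    ub, vb = uv_prime(*xyz_b)
    return math.hypot(ua - ub, va - vb)


def viewing_angle(xyz_by_angle: dict, threshold: float = 0.02) -> float:
    """Largest |angle| (deg) reached from 0 deg before the shift exceeds threshold.

    Angles are visited in order of |angle|, so a failure on either side stops the
    search and the result is a symmetric +/- limit.
    """
    reference = xyz_by_angle[0]
    limit = 0.0
    for angle in sorted(xyz_by_angle, key=abs):
        if delta_uv_prime(reference, xyz_by_angle[angle]) > threshold:
            break
        limit = max(limit, float(abs(angle)))
    return limit


# Illustrative (made-up) data: a color that drifts strongly only at 60 deg.
samples = {0: (1.0, 1.0, 1.0), -30: (1.0, 1.0, 1.0), 30: (1.01, 1.0, 1.0), 60: (2.0, 1.0, 1.0)}
print(viewing_angle(samples))
```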
The viewing angle along the long axis of a subpixel is wider than along the short axis, as shown in Fig. 4. Our display has a native 4:5 portrait aspect ratio, but it is used in a landscape orientation in an HMD to achieve appropriate horizontal and vertical FoV. Our pixel orientation is represented in Fig. 4(b). Figure 5 shows Δu'v' measurement results of horizontal viewing angle dependence. Green exhibits the best viewing angle because it has no contrasting color subpixels along the horizontal direction, as shown in Fig. 4(b). Blue exhibits the smallest viewing angle, but its Δu'v' remains below 0.02 at ±30°. The viewing angle of white is ±55° for Δu'v' equal to 0.02.
Panel driving for VR
Higher refresh rates reduce motion-to-photon latency of VR displays. This display was designed to refresh at up to 120 Hz. To reduce motion blur, this display also supports short persistence illumination. For example, Fig. 6 shows a global shutter drive scheme with approximately 80% of the frame time used for writing pixel data and 20% for light emission. Pixels do not emit light until the pixel array writing is complete. After addressing, the full pixel array emits light simultaneously. At 120 Hz refresh rate, our display supports an illumination duration of up to 1.65 ms. The panel and driving circuitry were designed to support the peak current draw during the illumination time.
High density and fast driving are challenges for a TFT backplane. VR displays require fast optical response time to reduce motion artifacts. The response time of an OLED display is related to TFT design and pixel circuit characteristics. For mobile OLED displays, p-type LTPS technology is considered mainstream, but is susceptible to a "ghost image" artifact that appears when the display is unable to reach the target brightness level in the first frame after changing the image. In order to achieve high resolution and fast driving speed, n-type LTPS TFTs that have higher mobility and lower hysteresis characteristics than p-type were chosen for the TFT backplane. 9,16
Foveated rendering and transport
Head-mounted display systems differ from direct-view displays in a number of ways that affect how content can be rendered and displayed on them. HMDs include optics (lenses, etc.) that have spatially varying resolving performance; for example, the center of a lens usually has sharper image quality than the periphery. Additionally, if the system has a very wide FoV, the periphery of the image may be outside the area to which the user can comfortably roll their eyes to view with their fovea. HMDs are also usually head-tracked, so the user is able to turn their head to keep content of interest near the center of their FoV. These factors all support image "foveation" for HMDs, in which only a subset of pixels is rendered and displayed at high resolution while the others use lower resolution. The total number of pixels rendered is much smaller than the native pixel count of the display; therefore, lower bandwidth is required, and low power mobile application processors can drive high acuity, high pixel count HMDs. Foveated rendering and transport are critical elements for implementation of standalone VR HMDs using this 4.3″ OLED display. With the use of eye tracking, the foveated (high acuity) region can be made very small (typically less than ±15°) relative to the overall FoV. 17 However, even without eye tracking, the image may be separated into regions with different acuity so the image matches the natural roll-off of the system optics and the HVS's low peripheral acuity.
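To give a sense of scale, the sketch below compares the pixels rendered for a two-region foveated frame with the native pixel count. The region sizes (640 × 640 HA, 1280 × 1600 LA) are borrowed from the foveated transport example in this paper and are only one possible configuration; `foveated_pixel_fraction` is a hypothetical helper:

```python
def foveated_pixel_fraction(native: tuple[int, int], regions: list[tuple[int, int]]) -> float:
    """Fraction of native pixels actually rendered/transported for a set of region sizes."""
    rendered = sum(w * h for w, h in regions)
    return rendered / (native[0] * native[1])


frac = foveated_pixel_fraction((3840, 4800), [(640, 640), (1280, 1600)])
print(f"{frac:.1%} of native pixels")  # 13.3% of native pixels
```

The LA count here still includes the area hidden behind the HA region, so skipping those pixels would lower the fraction slightly further.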
The term "foveation" as used here has two parts: foveated rendering and foveated transport. Foveated rendering is a technique to reduce rendering computation in the GPU. Foveated transport is a technique for arranging the rendered pixel data for transmission from the GPU to the display. The display logic then processes the image data of the different regions to create an image at the native pixel count of the display. A conventional (unfoveated) image is typically sent as a serialized raster, with horizontal and vertical blanking regions. In a foveated system, multiple regions with different resolutions must be rendered and transmitted. In the system developed here, the regions are concatenated at the GPU into a single image frame with a nonstandard pixel count, along with a few bytes of image metadata to direct image reconstruction, and blanking regions.
Foveated rendering
Foveated rendering reduces the computation load on the GPU by separating the image to be rendered into higher and lower resolution regions. Multiple rendering passes are made for each frame generated by the application. For the same head pose and scene, two (or more) renders are generated: a low acuity render that uses a relatively low pixel count to represent a wide FoV, and a high acuity render that uses a relatively high pixel count to represent a narrow FoV. 18 In Fig. 7, the lower acuity (LA) region is shown in green and the higher acuity (HA) region is shown in yellow. Because parts of the LA region are overlapped by the HA region, we blank the occluded pixels, shown in black, to reduce rendering overhead. After rasterization, both the LA and HA content are warped to correct for the lens distortion in the HMD, as shown in the second stage of the figure. This warp applies barrel distortion to the rendered images to counteract the pincushion distortion of a magnifying VR lens. Since the distorted output images are no longer rectangular, we render the original imagery at a larger pixel count so that a rectangular region matching the desired display pixel count can be cropped from the distorted image, as shown in the third stage.
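Blanking the occluded LA pixels amounts to finding which LA pixels lie entirely inside the HA region. The sketch below ignores the lens-distortion warp and assumes the HA rectangle is given in native display pixels with an integer LA scale factor (3× in the example used later in this paper); only fully covered LA pixels are blanked so no gap can open at the seam. The function name is hypothetical:

```python
def occluded_la_rect(ha_x: int, ha_y: int, ha_w: int, ha_h: int, scale: int) -> tuple[int, int, int, int]:
    """LA-pixel rectangle (x0, y0, x1, y1), end-exclusive, fully hidden behind the HA region.

    HA coordinates are in native display pixels; each LA pixel covers scale x scale
    native pixels. Partially covered LA pixels are kept and rendered normally.
    """
    x0 = -(-ha_x // scale)  # ceil division: first LA column fully inside HA
    y0 = -(-ha_y // scale)
    x1 = (ha_x + ha_w) // scale  # floor division: one past the last fully covered column
    y1 = (ha_y + ha_h) // scale
    return x0, y0, x1, y1


rect = occluded_la_rect(1800, 2100, 640, 640, 3)
x0, y0, x1, y1 = rect
print(rect, (x1 - x0) * (y1 - y0), "LA pixels can be skipped")
# (600, 700, 813, 913) 45369 LA pixels can be skipped
```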
Each region is processed independently, allowing for easy scalability to more than two regions if necessary. Lastly, the GPU composites these output renderings into a single display frame formatted for foveated transport.
Foveated transport
Once the GPU has rendered the different regions, the pixel data is reshaped for transport. Consider, for example, an image with two rendered regions: one HA and one LA. The HA region may be relatively small, for example, 640 × 640 pixels. The LA region may be larger, for example, 1280 × 1600 pixels. These two regions are combined into a single image frame by reshaping the HA pixel data to the same width as the LA pixel data. In this case, the 640 × 640 pixels are arranged in a block that is 1280 × 320 pixels. This is not a scaling operation: the pixels are not modified, only their arrangement is changed. This HA block is prepended to the LA block, making an overall image that is 1280 × 1920. A line of metadata is added at the top, and another blank line is inserted between the HA and LA blocks to keep the total number of lines even (which simplifies parts of the system). The total image sent to the display electronics is therefore 1280 × 1922. For other resolutions, if the HA region does not fit evenly in the LA width, zero-padding pixels may be added. The concatenated image arrangement is shown in Fig. 8.
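The packing step can be sketched with numpy. This is an illustration, not the authors' implementation: it assumes regions are stored as H × W × C `uint8` arrays, writes the metadata bytes into the first line, and generalizes the separator to zero or one blank line so that the total line count is always even. The function name is hypothetical:

```python
import numpy as np


def pack_foveated_frame(ha: np.ndarray, la: np.ndarray, metadata: bytes) -> np.ndarray:
    """Concatenate HA and LA regions (H x W x C arrays) into one raster frame.

    Layout: [metadata line][HA pixels rewrapped to the LA width][blank line][LA block].
    HA pixels keep their raster order and values; only the line length changes.
    """
    width, channels = la.shape[1], la.shape[2]
    flat = ha.reshape(-1, channels)  # HA pixels in raster order
    pad = (-flat.shape[0]) % width  # zero-pad to whole LA-width lines
    flat = np.concatenate([flat, np.zeros((pad, channels), dtype=ha.dtype)])
    ha_block = flat.reshape(-1, width, channels)

    meta_line = np.zeros((1, width, channels), dtype=la.dtype)
    meta_line.reshape(-1)[: len(metadata)] = np.frombuffer(metadata, dtype=np.uint8)

    # Insert 0 or 1 blank lines so the total line count stays even.
    n_blank = (1 + ha_block.shape[0] + la.shape[0]) % 2
    blank = np.zeros((n_blank, width, channels), dtype=la.dtype)
    return np.concatenate([meta_line, ha_block, blank, la])


rng = np.random.default_rng(0)
ha = rng.integers(0, 256, size=(640, 640, 3), dtype=np.uint8)
la = rng.integers(0, 256, size=(1600, 1280, 3), dtype=np.uint8)
frame = pack_foveated_frame(ha, la, b"\x01\x02")
print(frame.shape)  # (1922, 1280, 3)
```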
The concatenated image may be sent over a physical layer, such as MIPI DSI or DisplayPort, in the conventional way. It may also be compressed to reduce physical layer bandwidth using DSC 8 or other compression algorithms. Note that compression algorithms that depend on spatial correlations may have difficulty with the reshaped regions, but one-dimensional compression should still perform well.
The metadata may contain information about the size of the HA and LA regions and the position of the HA region in the final processed image. Since the metadata is sent with the image data, no additional synchronization or timestamps are required. If the system includes eye tracking, the rendering system may change the position of the HA region every frame as the eyeball position changes.
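The paper does not specify the metadata layout. As one hypothetical encoding, the region sizes and HA position could be packed as big-endian 16-bit fields at the start of the metadata line; the field names and order below are illustrative assumptions:

```python
import struct
from typing import NamedTuple


class FoveationMetadata(NamedTuple):
    """Hypothetical field set: region sizes and HA position, as described in the text."""
    ha_width: int
    ha_height: int
    la_width: int
    la_height: int
    ha_x: int  # HA position in the reconstructed, native-resolution image
    ha_y: int


_LAYOUT = ">6H"  # six big-endian unsigned 16-bit fields (12 bytes)


def encode_metadata(m: FoveationMetadata) -> bytes:
    return struct.pack(_LAYOUT, *m)


def decode_metadata(first_line: bytes) -> FoveationMetadata:
    # Only the leading bytes of the metadata line carry fields; the rest is ignored.
    return FoveationMetadata(*struct.unpack_from(_LAYOUT, first_line))


meta = FoveationMetadata(640, 640, 1280, 1600, ha_x=1600, ha_y=2080)
line = encode_metadata(meta) + bytes(1280 * 3 - 12)  # pad to one 1280-pixel RGB line
assert decode_metadata(line) == meta
```

Because the fields travel in-band with each frame, an eye-tracked renderer can simply write new `ha_x`/`ha_y` values every frame.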
The foveated rendering, rearrangement for foveated transport, metadata calculation and insertion, optional compression, and physical layer transmission may all be performed on conventional GPU hardware. No hardware modifications are required.
At the panel, custom logic is required to reconstruct the image for presentation at the panel's native pixel count. The panel's foveation logic receives the foveated frame image data. The metadata is parsed to extract frame attributes. All of the HA image data is buffered. The LA image data is passed through upscaling logic and a few lines are buffered.
In this example, the data is upscaled by 3× in both the x and y directions. Suitable upscaling algorithms include bilinear or nearest neighbor. The input 1280 × 1600 image is therefore upscaled to 3840 × 4800, the native pixel count of our display. The HA region is composited at the appropriate location (defined by values sent in the metadata). Logic may be added to blend the HA and LA regions, or the blending may be performed around the perimeter of the HA region during the software rendering process. The resulting image is sent to the driver ICs with a conventional raster scan.
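The panel-side reconstruction can be sketched in numpy as the inverse of the packing: unwrap the HA block, upscale the LA block, and paste the HA pixels. This sketch assumes nearest-neighbor upscaling (one of the two options named above), no blending, and that the LA block occupies the last `la_h` lines of the frame; in hardware the LA data would be streamed through line buffers rather than held whole. The function name is hypothetical:

```python
import numpy as np


def reconstruct(frame: np.ndarray, ha_w: int, ha_h: int, la_h: int,
                ha_x: int, ha_y: int, scale: int) -> np.ndarray:
    """Rebuild the native-resolution image from a packed foveated frame (H x W x C).

    Assumes the layout [metadata line][HA block][separator line(s)][LA block] with the
    LA block in the last la_h lines. LA is upscaled by nearest-neighbor replication and
    HA is pasted at (ha_x, ha_y) without blending.
    """
    width, channels = frame.shape[1], frame.shape[2]
    ha_lines = -(-(ha_w * ha_h) // width)  # lines used by the (padded) HA block
    ha = frame[1:1 + ha_lines].reshape(-1, channels)[: ha_w * ha_h].reshape(ha_h, ha_w, channels)
    la = frame[frame.shape[0] - la_h:]
    out = np.repeat(np.repeat(la, scale, axis=0), scale, axis=1)
    out[ha_y:ha_y + ha_h, ha_x:ha_x + ha_w] = ha
    return out


# Tiny example: 6-pixel-wide LA (4 lines), 3 x 2 HA region, 3x upscale.
la = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
ha = np.full((2, 3, 3), 255, dtype=np.uint8)
zero_line = np.zeros((1, 6, 3), np.uint8)
frame = np.concatenate([zero_line, ha.reshape(1, 6, 3), zero_line, la])
img = reconstruct(frame, ha_w=3, ha_h=2, la_h=4, ha_x=3, ha_y=6, scale=3)
print(img.shape)  # (12, 18, 3)
```

With the 1280 × 1922 frame and `scale=3`, the same call would produce the 3840 × 4800 native image.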
The system should be configured to use an appropriate size and location for the HA region. If the system is eye-tracked, the HA region should move with the viewer's gaze, and in this case may also be quite small, subtending less than 15° of the total FoV. Note in this example LA data is transmitted for the entire pixel array, including in the image region that will be overwritten with the HA image data. Optimizations are possible both to avoid rendering this part of the LA image and to reduce transmission bandwidth by not transmitting this overlapping LA data. It is also possible to extend this scheme to more than two regions. Intermediate regions should be upscaled by intermediate values. For example, the HA region may still be passed to the display unscaled, but a "middle acuity" region might be upscaled by 2× in x and y, and the LA region upscaled by 4× or more in x and y. As additional regions are added, the overhead of overlapping regions increases and should be avoided.
Display foveation logic implementation
The foveation electronics for this display were implemented in an FPGA, suitable for porting to an ASIC. The input is the foveated transport video stream (either DisplayPort or MIPI DSI). Logic in the FPGA converts the video stream to the appropriate format for our display, as shown in Fig. 9. The incoming image data is partially buffered but is not stored in a full frame buffer. VR systems typically have strict latency requirements, so the foveation logic must minimize latency. Since frame rate conversion is not possible without a frame buffer, the frame rate of the input and output streams must be locked. Logic was added to synchronize the output stream to a frequency locked, phase offset copy of the input vertical sync signal.
Results
Panel performance
A photograph of our display is shown in Fig. 10. The display specifications are in Table 2. Figure 11 shows the display response time measurement when the image changes from black to white. One frame time is 8.33 ms as it is driven at 120 Hz. Addressing data takes 6.68 ms, and OLED light emission uses 1.65 ms. The entire pixel array is turned on simultaneously after data addressing is complete. The response time after the global illumination is turned on is around 10 μs. The brightness of the first frame reaches the target brightness because of the n-type LTPS backplane. A p-type LTPS-based OLED display may take two or three frames to reach the target brightness. Even though an n-type LTPS backplane needs more process steps and higher temperature conditions, it can provide outstanding temporal characteristics for high performance VR systems.
An OLED LTPS backplane needs mura compensation. The internal compensation methods used in mobile phone OLED displays are not suitable for high ppi panels. We employed an external compensation approach, and Fig. 12 shows photographs of our panel's image quality before and after mura compensation. Figure 13 shows an enlargement of the image shown in Fig. 10. The image shown was photographed through VR optics, although no distortion correction was applied to the image. The image quality is very good with no visible screen door effect even when viewed through high quality optics in a wide FoV HMD.
Panel driving using foveated rendering and transport
We implemented the foveated rendering software on a standard mobile SoC and the foveation logic in an FPGA. Foveated rendering implementation details and performance optimizations are beyond the scope of this paper. 18 The MIPI DSI interface between the mobile SoC and FPGA was limited to 6 Gb/s (uncompressed), which implies a 250 MHz pixel clock at 24 bits/pixel. We settled on foveated transport pixel counts near 1280 × 1920/75 Hz to fit within this bandwidth limitation. Both our SoC and FPGA can support DSC for an up to 3× increase in pixel count, but this image size is well matched to the GPU's rendering capability. A higher performance SoC could be used with DSC in future systems to increase image size without requiring higher interface bandwidth.
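A quick budget check shows why a 1280 × 1922 foveated frame at 75 Hz fits the 6 Gb/s link. This sketch ignores blanking, so it is a lower bound on the required rate; the `dsc_ratio` parameter is a hypothetical stand-in for compression, not a DSC API:

```python
def link_budget(h: int, v: int, refresh_hz: float, bits_per_pixel: int,
                link_bps: float, dsc_ratio: float = 1.0) -> tuple[float, bool]:
    """Return (required bit/s, fits?) for an h x v raster, optionally compressed.

    Blanking overhead is ignored, so this is a lower bound on the required rate.
    """
    required = h * v * refresh_hz * bits_per_pixel / dsc_ratio
    return required, required <= link_bps


rate, ok = link_budget(1280, 1922, 75, 24, link_bps=6e9)
print(f"{rate / 1e9:.2f} Gb/s of 6 Gb/s, fits: {ok}")  # 4.43 Gb/s of 6 Gb/s, fits: True
```

The active stream needs about 184.5 Mpixel/s, which leaves headroom for blanking within the 250 MHz pixel clock. With `dsc_ratio=3.0`, a frame three times as wide would need the same link rate.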
The FPGA foveation processing logic is similar to a conventional image upscaler. For our display, the output data rates are quite high, but the input rate matches existing mobile display panels. Recall that the theoretical 9600 × 9000/120 Hz display required a 14.3 GHz pixel clock and 343 Gb/s to the display. Our implementation optimizes a number of parameters to reduce data rates.
First, our panel pixel count is 3840 × 4800, providing a substantial bandwidth reduction while still matching our optical system's capabilities. Second, while our display is capable of 120 Hz refresh, we operate at 75 Hz to allow more complex rendering on our mobile SoC. Third, logic in the FPGA performs subpixel rendering, so the bandwidth to the driver ICs is 20 bits/pixel (10 bits/subpixel × 2 subpixels/pixel) rather than 24 bits/pixel (8 bits/subpixel × 3 subpixels/pixel).
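The effect of these three choices on the FPGA-to-driver-IC link can be estimated from active pixels alone. A sketch that excludes blanking (so real link rates are somewhat higher); the helper name is hypothetical:

```python
def active_data_rate_gbps(h: int, v: int, refresh_hz: float, bits_per_pixel: int) -> float:
    """Active-pixel data rate in Gb/s (blanking excluded)."""
    return h * v * refresh_hz * bits_per_pixel / 1e9


ours = active_data_rate_gbps(3840, 4800, 75, 20)  # subpixel-rendered output to driver ICs
ours_rgb = active_data_rate_gbps(3840, 4800, 75, 24)  # same panel without subpixel rendering
link_in = active_data_rate_gbps(1280, 1922, 75, 24)  # foveated stream into the FPGA
print(f"{ours:.1f} Gb/s out ({ours_rgb:.1f} Gb/s at 24 bpp), {ours / link_in:.1f}x the input")
# 27.6 Gb/s out (33.2 Gb/s at 24 bpp), 6.2x the input
```

Roughly 28 Gb/s of output is an order of magnitude below the 343 Gb/s upper bound, while the FPGA expands the foveated input by about six times.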
To achieve our brightness target, the illumination duration is 20% of the frame time, a persistence of 2.7 ms at our 75 Hz refresh rate. The line time may be calculated as before:
$$ t_{\text{line}} = \frac{0.8 \times 13.3\ \text{ms}}{4800\ \text{lines}} \approx 2.2\ \mu\text{s} $$
Within the FPGA, we found that the vertical part of the LA upscaler had the most challenging timing constraints, as in conventional upscalers. We implemented a bilinear upscaler and composited the HA region without blending, since our foveated rendering software creates a transition zone around the HA perimeter. MIPI DSI and DisplayPort receivers are commonly implemented in FPGAs, and timing closure was relatively straightforward. The FPGA outputs to the driver ICs have high total bandwidth, but individual lanes are well within the capabilities of FPGA gigabit transceivers.
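The text does not detail the FPGA's interpolation. One common separable formulation of integer-factor bilinear upscaling is sketched below in numpy, assuming pixel-center alignment and edge clamping. The vertical pass is done first because, in a streaming hardware design, it is the pass that needs line buffers:

```python
import numpy as np


def upscale_axis_linear(img: np.ndarray, scale: int, axis: int) -> np.ndarray:
    """Upscale one axis by an integer factor, interpolating linearly between pixel centers."""
    n = img.shape[axis]
    # Map each output pixel center back to input coordinates; clamp at the edges.
    pos = np.clip((np.arange(n * scale) + 0.5) / scale - 0.5, 0, n - 1)
    i0 = np.floor(pos).astype(int)
    i1 = np.minimum(i0 + 1, n - 1)
    w_shape = [1] * img.ndim
    w_shape[axis] = -1
    w = (pos - i0).reshape(w_shape)  # interpolation weight, broadcast along the other axes
    a = np.take(img, i0, axis=axis).astype(np.float64)
    b = np.take(img, i1, axis=axis).astype(np.float64)
    return a * (1 - w) + b * w


def bilinear_upscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Separable bilinear upscale: vertical pass first, then horizontal."""
    out = upscale_axis_linear(img.astype(np.float64), scale, axis=0)
    out = upscale_axis_linear(out, scale, axis=1)
    return np.rint(out).astype(img.dtype)


row = np.array([[0, 90]], dtype=np.uint8)
print(bilinear_upscale(row, 3)[0].tolist())  # [0, 0, 30, 60, 90, 90]
```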
The FPGA logic memory requirements were also reasonable. Our FPGA includes an embedded microcontroller, so some memory is allocated for it. A few of the logic blocks require line buffers at the display's native pixel count. Each line buffer requires 3840 × 2 × 10 = 76.8 kb. The HA region buffer requires 640 × 640 × 2 × 10 = 8.2 Mb. To support multiple, larger regions, each region may need a buffer of 10-12 Mb. Figure 14 shows a photograph of the display taken through a magnifying loupe showing the boundary between the high acuity region and the upscaled low acuity region. No blending has been applied to the perimeter of the high acuity region so that the boundary may be seen.
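The buffer sizes quoted above follow from storing pixels after subpixel rendering (two 10-bit subpixels per pixel). A short sketch reproducing them; the `max_square_region` helper, which finds the largest square HA region that fits a given budget, is a hypothetical addition for sizing:

```python
import math


def buffer_bits(width: int, height: int, subpixels_per_pixel: int = 2,
                bits_per_subpixel: int = 10) -> int:
    """Storage for a width x height region after subpixel rendering."""
    return width * height * subpixels_per_pixel * bits_per_subpixel


def max_square_region(budget_bits: int) -> int:
    """Side length of the largest square region that fits in budget_bits."""
    return math.isqrt(budget_bits // buffer_bits(1, 1))


line_buffer = buffer_bits(3840, 1)  # one native-resolution line
ha_buffer = buffer_bits(640, 640)  # the full HA region
print(f"line: {line_buffer / 1e3:.1f} kb, HA: {ha_buffer / 1e6:.1f} Mb")  # line: 76.8 kb, HA: 8.2 Mb
print(max_square_region(12_000_000))  # 774
```

A 12 Mb region buffer would therefore hold a square HA region of about 774 × 774 pixels.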
Conclusions
We have designed and fabricated a very high pixel count (>18MP), ultra-high ppi (1443 ppi) OLED display for VR applications. This is currently the world's highest resolution OLED on glass display. White OLED material and color filters were used to meet the high ppi requirements, and an n-type LTPS backplane was used to meet the panel driving and image ghosting requirements. Foveation logic was implemented in an FPGA to convert the low bandwidth foveated image rendered on a mobile processor to the high bandwidth stream required by the display. The result is a stunning visual experience in a mobile VR system.
