Theoretical thesis.
Bibliography: pages 173-193.
1. Introduction -- 2. Background and related work -- 3. VLSI architecture of full search variable-block-size motion estimation for HEVC video encoding -- 4. Fast sign-detection algorithm for residue number system moduli set -- 5. ASIC design in residue number system for calculating minimum sum of absolute differences -- 6. ASIC design of TZ search motion-estimation for HEVC with RSN -- 7. Residue number system hardware design of fast-search variable-motion-estimation accelerator for HEVC / H.265 -- 8. A novel angle-restricted test zone search algorithm for performance improvement of HEVC -- 9. An efficient ASIC design of variable length discrete cosine transform for HEVC -- Conclusions and future work.
The recent demand for high density video, such as ultra high definition (UHD), as well as its distribution over wired and wireless networks, led to the proposal of the latest video encoding standard, high efficiency video coding (HEVC/H.265), by the joint collaborative team on video coding (JCT-VC). HEVC/H.265 achieves a significantly better compression than its predecessor, advanced video coding (AVC/H.264), by roughly 50% for an equivalent visual reproduction quality. However, the improved compression efficiency comes with a drawback, the computational complexity. Since HEVC/H.265 encoding involves enormous computations, a hardware implementation of the encoder is necessary for real-time encoding, in particular for UHD video. The most computationally intensive task in video encoding is motion estimation, which comprises up to 80% of the total time for video encoding. There have been several suggestions for motion-estimation algorithms for reducing the complexity, but many proposed for AVC/H.264 are no longer suitable for HEVC/H.265 due to the underlying coding changes and other complications.
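To make the cost of motion estimation concrete, the following is a minimal sketch of exhaustive block matching with the sum of absolute differences (SAD) as the matching cost. It is an illustration only, not the architecture described in the thesis: the function names, the toy frames, the 4x4 block size, and the search range of 3 are all assumptions chosen for brevity.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_sad, (dx, dy)). Candidates falling outside `ref` are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Hypothetical toy frames: the current frame is the reference frame
# shifted right by 2 pixels and down by 1 pixel.
ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(16)] for y in range(16)]
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]

best_sad, motion_vector = full_search(cur, ref, bx=4, by=4, n=4, search_range=3)
```

For these toy frames the search returns a SAD of 0 at displacement (-2, -1), which points back to where the block came from in the reference frame. Each block costs (2 * search_range + 1)^2 SAD evaluations of n^2 pixels each, which gives a sense of why the workload grows so quickly at UHD resolutions.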
Hence, this research offers different algorithms and architectures for motion estimation, providing a trade-off between implementation cost and performance. A hardware design is proposed for a full-search motion-estimation algorithm, which always yields the best result. Both the memory requirement and the data-bandwidth demand are reduced substantially. Another important aspect of real-time video compression, including motion estimation, is the delay of the arithmetic computations. Residue number systems (RNS) have been used for decades to improve the performance of arithmetic operations. However, the non-positional nature of an RNS makes some mathematical operations difficult, such as sign detection, which is a vital component for designing motion estimation and other elements of a video processor. The dissertation presents a fast algorithm and its architecture for sign detection, which decreases the area-delay product by 24% compared to designs in the literature. Since the full-search algorithm searches every possible location in a search area, it involves much computation; fast-search methods are therefore preferred for low-cost solutions. The test zone (TZ) search is a fast-search algorithm that is widely used for HEVC/H.265, as it provides near-optimal performance. In this dissertation, a TZ-search hardware architecture is presented, which shows a 51% lower gate count than existing proposals in the literature and considerably lower memory requirements than most. Further improvement is achieved by developing a fast-search algorithm appropriate for hardware designs, providing an area-efficient design capable of real-time UHD video encoding without degradation in quality from the TZ search in the HEVC reference software.
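The sign-detection difficulty noted above can be illustrated with a small sketch. It assumes an example moduli set of the form {2^n - 1, 2^n, 2^n + 1} with n = 4; the abstract does not state which moduli set the thesis uses, and the function names are hypothetical. The sign test shown is the slow baseline that reconstructs the full value with the Chinese remainder theorem, not the fast algorithm proposed in the dissertation.

```python
from math import prod

MODULI = (15, 16, 17)   # assumed example set {2^n - 1, 2^n, 2^n + 1}, n = 4
M = prod(MODULI)        # dynamic range: 4080


def to_rns(value):
    """Encode a signed integer in [-M/2, M/2) as a tuple of residues."""
    return tuple(value % m for m in MODULI)


def rns_sub(a, b):
    """Subtract channel by channel; no carries pass between channels."""
    return tuple((x - y) % m for x, y, m in zip(a, b, MODULI))


def crt_reconstruct(residues):
    """Recover the integer in [0, M) using the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % M


def is_negative(residues):
    """Baseline sign detection: reconstruct, then compare against M/2.
    Values in the upper half of [0, M) represent negative numbers."""
    return crt_reconstruct(residues) >= M // 2


# Comparing two hypothetical SAD values requires the sign of their difference.
sad_a, sad_b = to_rns(937), to_rns(1250)
a_is_smaller = is_negative(rns_sub(sad_a, sad_b))
```

The subtraction is cheap because each residue channel works independently, but no single residue reveals the sign: the comparison needs information from all channels at once. Selecting the minimum SAD in motion estimation relies on exactly this kind of comparison, which is why a fast sign detector matters for an RNS-based design.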
An angle-restricted test zone (ARTZ) search motion-estimation algorithm is also proposed for software applications. It exploits the directional probabilities of the search and reduces motion-estimation time by about 17% to 55% compared to the TZ search. The discrete cosine transform (DCT) is a standard method in several previous codecs, and it is also a key factor for compression techniques in HEVC/H.265. A variable-length two-dimensional DCT design is proposed for HEVC/H.265, where the architecture is optimised for the most likely block sizes in UHD video, thus eliminating unnecessary complexities found in many designs and accomplishing more than 60% savings in hardware.
1 online resource (xxxii, 193 pages) illustration