User-generated content (UGC) constitutes a significant portion of global internet traffic, with billions of videos and images shared daily on social media and streaming platforms. Despite its ubiquity, UGC often suffers from diverse and complex perceptual quality issues caused by distortions introduced during capture, processing, and sharing. Addressing these challenges is critical for improving user experience, enabling better content optimization, and providing tools for inclusive content creation. This dissertation focuses on three critical problems in no-reference (NR) perceptual quality assessment for UGC: video quality prediction, image quality enhancement for visually impaired users, and the quality assessment of text embedded in multimedia content.

First, we tackle the challenging and unsolved problem of NR video quality assessment (VQA) for UGC. Traditional VQA models struggle to generalize to the diverse, "in-the-wild" nature of UGC. To address this gap, we developed the largest subjective video quality dataset to date, containing 38,811 real-world distorted videos, 116,433 space-time localized video patches, and 5.5 million human perceptual quality annotations. Using this dataset, we proposed two novel NR-VQA models: (a) Patch-VQ (PVQ), a region-based architecture that captures local-to-global quality relationships and achieves state-of-the-art performance on three benchmark UGC datasets, and (b) the PVQ Mapper, the first space-time video quality mapping tool, which visualizes and localizes perceptual distortions. These models advance the state of the art in VQA, offering robust predictions and actionable insights into the quality of real-world UGC videos.

Second, we address the unique challenges faced by visually impaired users in capturing high-quality images. This demographic often produces content exhibiting severe distortions, including blur, noise, and poor exposure, which pose significant barriers to quality assessment and actionable feedback. To address these issues, we created the LIVE-Meta VI-UGC Database, the largest dataset of its kind, comprising 40,000 distorted images, 40,000 patches, and 2.7 million human perceptual quality and distortion labels. Leveraging this dataset, we developed a blind image quality predictor that models local-to-global spatial relationships, achieving state-of-the-art prediction accuracy on VI-UGC data. Furthermore, we designed a prototype feedback system built on a multi-task learning framework, giving visually impaired users actionable guidance to improve their photography and confidently share higher-quality content on social media.

Third, we investigate the underexplored problem of assessing the quality and legibility of text embedded in UGC, particularly in short-form videos. The quality of embedded text significantly affects user comprehension and the overall perception of multimedia content, as well as applications such as visual search and recognition. To advance this domain, we created two novel datasets: the LIVE-COCO Text Legibility Database, featuring 74,440 text patches with subjective legibility annotations, and the LIVE-YouTube Text-in-Video Quality Database, containing approximately 19,000 subjective quality ratings on 405 videos and 641 text patches. Using these datasets, we developed models capable of predicting both text quality and legibility.
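A recurring design across these contributions is predicting quality at the level of local patches and aggregating those predictions into a global score. The following is a minimal sketch of such a local-to-global formulation, assuming a PyTorch setup; the module, layer sizes, and variable names are hypothetical illustrations, not the dissertation's actual architecture.

```python
# Minimal sketch (hypothetical, not the dissertation's code): a patch-based
# no-reference quality predictor that regresses per-patch quality scores and
# pools patch features into a single global quality score.
import torch
import torch.nn as nn

class LocalToGlobalQualityNet(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Lightweight convolutional feature extractor applied to each patch.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Head for local (per-patch) quality scores.
        self.local_head = nn.Linear(feat_dim, 1)
        # Head for the global score, computed from pooled patch features.
        self.global_head = nn.Linear(feat_dim, 1)

    def forward(self, patches: torch.Tensor):
        # patches: (batch, num_patches, 3, H, W) crops from one image/frame.
        b, n, c, h, w = patches.shape
        feats = self.backbone(patches.reshape(b * n, c, h, w)).reshape(b, n, -1)
        local_scores = self.local_head(feats).squeeze(-1)               # (b, n)
        global_score = self.global_head(feats.mean(dim=1)).squeeze(-1)  # (b,)
        return local_scores, global_score

# Usage on dummy data: 2 images, 8 patches of 64x64 pixels each.
model = LocalToGlobalQualityNet()
local_q, global_q = model(torch.randn(2, 8, 3, 64, 64))
print(local_q.shape, global_q.shape)  # torch.Size([2, 8]) torch.Size([2])
```

The key design choice illustrated here is that local and global predictions share one feature extractor, so patch-level supervision can inform the global score and vice versa.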
Building on these datasets, we further introduced a multi-task model that simultaneously predicts overall video quality and local text quality, addressing the interplay between text legibility and multimedia quality in UGC.

Overall, this dissertation presents a comprehensive approach to improving the perceptual quality of UGC through advanced datasets, innovative quality prediction models, and user-centric tools. By addressing the diverse challenges of video, image, and text quality in UGC, this work provides solutions that enhance user experience, optimize content, and support accessibility. The outcomes of this dissertation are expected to benefit applications such as quality monitoring, content creation tools, accessibility enhancements, and user guidance, ultimately improving the global experience of social media and streaming platforms.
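Purely as an illustration of the multi-task idea mentioned above, a joint objective over overall video quality and local text quality can be written as a weighted sum of two regression losses. The sketch below is an assumption about how such an objective might look in PyTorch, with hypothetical names and weighting; it is not the dissertation's implementation.

```python
# Hypothetical sketch of a joint (multi-task) training objective combining
# overall video quality and per-text-patch quality regression.
import torch
import torch.nn.functional as F

def multitask_quality_loss(pred_video_q, true_video_q,
                           pred_text_q, true_text_q,
                           text_weight: float = 0.5):
    # Weighted sum of two regression losses; text_weight balances the
    # local text-quality task against the overall video-quality task.
    video_loss = F.mse_loss(pred_video_q, true_video_q)
    text_loss = F.mse_loss(pred_text_q, true_text_q)
    return video_loss + text_weight * text_loss

# Dummy batch: 4 videos, each with 3 annotated text patches.
pred_video = torch.randn(4, requires_grad=True)
pred_text = torch.randn(4, 3, requires_grad=True)
loss = multitask_quality_loss(pred_video, torch.rand(4),
                              pred_text, torch.rand(4, 3))
loss.backward()
print(float(loss))
```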