
    Strategies for improving efficiency and efficacy of image quality assessment algorithms

    Image quality assessment (IQA) research aims to predict the quality of images in a manner that agrees with subjective quality ratings. Over the last several decades, IQA research has focused mainly on improving global prediction efficacy (across images) for specific or general distortion types; very few studies have explored local image quality (within images) or IQA algorithms for improved JPEG2000 coding. Even fewer studies have focused on analyzing and improving the runtime performance of IQA algorithms. Moreover, reduced-reference (RR) IQA, in which limited transmission bandwidth means that only side information about the original image is received along with the distorted image, remains a relatively unexplored field. This report explores these four topics. For local image quality, we provide a local sharpness database and analyze it along with current sharpness metrics. We found that human subjects agreed strongly when rating the sharpness of small blocks. Overall, this sharpness database is a faithful representation of human subjective ratings, and current sharpness algorithms can reach an SROCC of 0.87 on it. For JPEG2000 coding using IQA, we provide a new JPEG2000 image database that includes only images with the same total distortion. Analysis of existing IQA algorithms on this database revealed that, even though current algorithms perform reasonably well on JPEG2000-compressed images in popular image-quality databases, they often fail to predict the correct rankings on our database's images. Based on the framework of Most Apparent Distortion (MAD), a new algorithm, MADDWT, is then proposed that uses local DWT coefficient statistics to predict the perceived distortion due to subband quantization. MADDWT outperforms all other algorithms on this database and shows promise for use in JPEG2000 coding. 
For the efficiency of IQA algorithms, this paper is the first to examine IQA algorithms from the perspective of their interaction with the underlying hardware and microarchitectural resources, and to perform a systematic performance analysis using state-of-the-art tools and techniques from other computing disciplines. We implemented four popular full-reference (FR) IQA algorithms and two no-reference algorithms in C++ based on the code provided by their respective authors. Hotspot analysis and microarchitectural analysis of each algorithm were performed and compared. Although all six algorithms share common algorithmic operations (e.g., filterbanks and statistical computations), our results revealed that different IQA algorithms overwhelm different microarchitectural resources and give rise to different types of bottlenecks. For RR IQA, we also provide a new framework that employs multiscale sharpness maps as the reduced information. As we will demonstrate, our framework with 2% reduced information can outperform other frameworks that employ from 2% to 3% reduced information. Our framework is also competitive with current state-of-the-art FR algorithms.
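
The SROCC figure quoted above is the Spearman rank-order correlation coefficient between an algorithm's predictions and the subjective ratings. As a minimal sketch, assuming two equal-length lists of scores (the function names `average_ranks`, `pearson` and `srocc` are illustrative and do not come from the thesis), it could be computed as follows:

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def srocc(predicted, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(subjective))
```

Because only ranks enter the computation, any monotonic rescaling of the predicted scores leaves the SROCC unchanged, which is why it is a common way to compare metrics whose outputs live on different scales.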

    Evaluation of the color image and video processing chain and visual quality management for consumer systems

    With the advent of novel digital display technologies, color processing is increasingly becoming a key aspect in consumer video applications. Today’s state-of-the-art displays require sophisticated color and image reproduction techniques in order to achieve larger screen size, higher luminance and higher resolution than ever before. However, from a color science perspective, there are clearly opportunities for improvement in the color reproduction capabilities of various emerging and conventional display technologies. This research seeks to identify potential areas for improvement in color processing in a video processing chain. As part of this research, the various processes involved in a typical video processing chain in consumer video applications were reviewed. Several published color and contrast enhancement algorithms were evaluated, and a novel algorithm was developed to enhance color and contrast in images and videos in an effective and coordinated manner. Further, a psychophysical technique was developed and implemented for performing visual evaluation of color image and consumer video quality. Based on the performance analysis and visual experiments involving various algorithms, guidelines were proposed for the development of an effective color and contrast enhancement method for image and video applications. It is hoped that the knowledge gained from this research will help build a better understanding of color processing and color quality management methods in consumer video.

    Camera based Display Image Quality Assessment

    This thesis presents the outcomes of research carried out by the PhD candidate Ping Zhao from 2012 to 2015 at Gjøvik University College. The underlying research was part of the HyPerCept project, in the program of Strategic Projects for University Colleges, which was funded by The Research Council of Norway. The research was conducted under the supervision of Professor Jon Yngve Hardeberg and the co-supervision of Associate Professor Marius Pedersen, from The Norwegian Colour and Visual Computing Laboratory, in the Faculty of Computer Science and Media Technology of Gjøvik University College, as well as the co-supervision of Associate Professor Jean-Baptiste Thomas, from The Laboratoire Electronique, Informatique et Image, in the Faculty of Computer Science of Université de Bourgogne. The main goal of this research was to develop a fast and inexpensive camera-based display image quality assessment framework. Due to the limited time frame, we decided to focus only on projection displays with static images displayed on them. However, the proposed methods are not limited to projection displays, and they are expected to work with other types of displays, such as desktop monitors, laptop screens and smartphone screens, with limited modifications. The primary contributions of this research can be summarized as follows: 1. We proposed a camera-based display image quality assessment framework, which was originally designed for projection displays but can be used for other types of displays with limited modifications. 2. We proposed a method to calibrate the camera in order to eliminate the unwanted vignetting artifact, which is mainly introduced by the camera lens. 3. We proposed a method to optimize the camera’s exposure with respect to the measured luminance of incident light, so that after the calibration all camera sensors share a common linear response region. 4. 
We proposed a marker-less and view-independent method to register a captured image with its original at a sub-pixel level, so that we can incorporate existing full-reference image quality metrics without modifying them. 5. We identified spatial uniformity, contrast and sharpness as the most important image quality attributes for projection displays, and we used the proposed framework to evaluate the prediction performance of state-of-the-art image quality metrics regarding these attributes. The proposed image quality assessment framework is the core contribution of this research. Compared with conventional image quality assessment approaches, which were largely based on colorimeter or spectroradiometer measurements, using a camera as the acquisition device has the advantages of recording all displayed pixels in one shot and of being relatively inexpensive to purchase. Therefore, the time and resources consumed by image quality assessment can be greatly reduced. We proposed a method to calibrate the camera in order to eliminate the unwanted vignetting artifact primarily introduced by the camera lens. We used a hazy sky as a nearly uniform light source, and the vignetting mask was generated from the median sensor responses over only a few rotated shots of the same spot in the sky. We also proposed a method to quickly determine whether all camera sensors share a common linear response region. In order to incorporate existing full-reference image quality metrics without modifying them, an accurate registration of pairs of pixels between a captured image and its original is required. We proposed a marker-less and view-independent image registration method to solve this problem. The experimental results showed that the proposed method worked well in viewing conditions with low ambient light. 
We further identified spatial uniformity, contrast and sharpness as the most important image quality attributes for projection displays. Subsequently, we used the developed framework to objectively and robustly evaluate the prediction performance of state-of-the-art image quality metrics regarding these attributes. In this process, the metrics were benchmarked with respect to the correlations between their predictions and the perceptual ratings collected from subjective experiments. The analysis of the experimental results indicated that our proposed methods were effective and efficient. Subjective experiments are an essential component of image quality assessment; however, they can be time- and resource-consuming, especially when additional image distortion levels are required to extend existing subjective experimental results. For this reason, we investigated the possibility of extending subjective experiments with a baseline adjustment method, and we found that the method could work well if appropriate strategies were applied. These strategies concern which distortion levels are best included in the baseline, as well as how many of them.
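
The vignetting calibration described above (a mask built from the median sensor responses over a few rotated shots of a nearly uniform light source) can be sketched roughly as follows. This is only an illustration under stated assumptions: images are plain nested lists of single-channel responses, the shots are already aligned, and the mask is normalized to its brightest pixel. The function names `vignetting_mask` and `correct` are hypothetical, and the thesis's actual procedure may differ.

```python
from statistics import median


def vignetting_mask(shots):
    """Per-pixel median over several shots, scaled so the brightest pixel is 1.0.

    Taking the median (rather than the mean) suppresses outliers that appear
    in only a minority of the shots, such as a passing cloud edge.
    """
    rows, cols = len(shots[0]), len(shots[0][0])
    med = [[median(shot[r][c] for shot in shots) for c in range(cols)]
           for r in range(rows)]
    peak = max(max(row) for row in med)
    return [[value / peak for value in row] for row in med]


def correct(image, mask, eps=1e-6):
    """Divide each pixel by its mask value to undo the lens falloff."""
    return [[pixel / max(m, eps) for pixel, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]
```

Applying `correct` to a capture of the same uniform source should give an approximately flat image, which is a simple sanity check on the mask.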

    Disentangling Image Distortions in Deep Feature Space

    Previous literature suggests that perceptual similarity is an emergent property shared across deep visual representations. Experiments conducted on a dataset of human-judged image distortions have shown that deep features outperform classic perceptual metrics. In this work we take a further step towards a broader understanding of this property by analyzing the capability of deep visual representations to intrinsically characterize different types of image distortions. To this end, we first generate a number of synthetically distorted images and then analyze the features extracted by different layers of different Deep Neural Networks. We observe that a dimension-reduced representation of the features extracted from a given layer makes it possible to efficiently separate types of distortions in the feature space. Moreover, each network layer exhibits a different ability to separate different types of distortions, and this ability varies according to the network architecture. Finally, we evaluate the use of features taken from the layer that best separates image distortions for: i) reduced-reference image quality assessment, and ii) characterization of distortion types and severity levels on both single and multiple distortion databases. Results achieved on both tasks suggest that deep visual representations can be employed in an unsupervised manner to efficiently characterize various image distortions.

    Media aesthetics based multimedia storytelling.

    Since the earliest of times, humans have been interested in recording their life experiences, for future reference and for storytelling purposes. This task of recording experiences (i.e., both image and video capture) has never before in history been as easy as it is today. This is creating a digital information overload that is becoming a great concern for people who are trying to preserve their life experiences. As high-resolution digital still and video cameras become increasingly pervasive, unprecedented amounts of multimedia are being downloaded to personal hard drives and uploaded to online social networks on a daily basis. The work presented in this dissertation is a contribution to the area of multimedia organization, as well as automatic selection of media for storytelling purposes, which eases the human task of summarizing a collection of images or videos in order to share it with other people. As opposed to some prior art in this area, we have taken an approach in which neither user-generated tags nor comments that describe the photographs (in either their local or online repositories) are taken into account, and no user interaction with the algorithms is expected. We take an image analysis approach in which both the context images (e.g., images from the online social networks to which the image stories are going to be uploaded) and the collection images (i.e., the collection of images or videos that needs to be summarized into a story) are analyzed using image processing algorithms. This allows us to extract relevant metadata that can be used in the summarization process. Multimedia storytellers usually follow three main steps when preparing their stories: first they choose the main story characters, then the main events to describe, and finally, from these media sub-groups, they choose the media based on their relevance to the story as well as on their aesthetic value. 
Therefore, one of the main contributions of our work has been the design of computational models, both regression-based and classification-based, that correlate well with human perception of the aesthetic value of images and videos. These computational aesthetics models have been integrated into automatic selection algorithms for multimedia storytelling, which are another important contribution of our work. A human-centric approach has been used in all experiments where it was feasible, and also to assess the final summarization results; that is, humans are always the final judges of our algorithms, either by inspecting the aesthetic quality of the media or by inspecting the final story generated by our algorithms. We are aware that a perfect automatically generated story summary is very hard to obtain, given the many subjective factors that play a role in such a creative process; rather, the presented approach should be seen as a first step in the creative storytelling process, one that removes some of the groundwork that would be tedious and time-consuming for the user. Overall, the main contributions of this work can be summarized in three points: (1) new media aesthetics models for both images and videos that correlate with human perception, (2) new scalable multimedia collection structures that ease the process of media summarization, and finally, (3) new media selection algorithms that are optimized for multimedia storytelling purposes.

    Modeling and applications of the focus cue in conventional digital cameras

    The focus of digital cameras plays a fundamental role in both the quality of the acquired images and the perception of the imaged scene. This thesis studies the focus cue in conventional cameras with focus control, such as cellphone cameras, photography cameras, webcams and the like. A thorough review of the theoretical concepts behind focus in conventional cameras reveals that, despite its usefulness, the widely known thin lens model has several limitations for solving different focus-related problems in computer vision. In order to overcome these limitations, the focus profile model is introduced as an alternative to classic concepts, such as the near and far limits of the depth-of-field. The new concepts introduced in this dissertation are exploited for solving diverse focus-related problems, such as efficient image capture, depth estimation, visual cue integration and image fusion. The results obtained through an exhaustive experimental validation demonstrate the applicability of the proposed models.

    Image quality assessment : utility, beauty, appearance


    Video enhancement : content classification and model selection

    The purpose of video enhancement is to improve the subjective picture quality. The field of video enhancement includes a broad category of research topics, such as removing noise from the video, highlighting specified features and improving the appearance or visibility of the video content. The common difficulty in this field is how to make images or videos more beautiful, or subjectively better. Traditional approaches involve many iterations between subjective assessment experiments and algorithm redesigns, which are very time consuming. Researchers have attempted to design a video quality metric to replace the subjective assessment, but so far without success. As a way to avoid heuristics in enhancement algorithm design, least mean square methods have received considerable attention. They can optimize filter coefficients automatically by minimizing the difference between processed videos and desired versions through training. However, these methods are optimal only on average, not locally. To solve this problem, one can apply the least mean square optimization to individual categories that are classified by local image content. The most interesting example is Kondo’s concept of local content adaptivity for image interpolation, which we found could be generalized into an ideal framework for content-adaptive video processing. We identify two parts in the concept: content classification and adaptive processing. By exploring new classifiers for the content classification and new models for the adaptive processing, we have generalized the framework to more enhancement applications. For content classification, new classifiers have been proposed to classify different image degradations such as coding artifacts and focal blur. For coding artifacts, a novel classifier has been proposed based on the combination of local structure and contrast, which does not require coding block grid detection. 
For focal blur, we have proposed a novel local blur estimation method based on edges, which does not require edge orientation detection and gives more robust blur estimates. With these classifiers, the proposed framework has been extended to coding-artifact-robust enhancement and blur-dependent enhancement. As content adaptivity is extended to more image features, the number of content classes can increase significantly. We show that it is possible to reduce the number of classes without sacrificing much performance. For model selection, we have introduced several nonlinear filters into the proposed framework. We have also proposed a new type of nonlinear filter, the trained bilateral filter, which combines the advantages of both the original bilateral filter and the least mean square optimization. With these nonlinear filters, the proposed framework shows better performance than with linear filters. Furthermore, we have shown a proof of concept for a trained approach that obtains contrast enhancement by supervised learning. The transfer curves are optimized based on the classification of global or local image content. The results show that, through the trained approach, it is possible to obtain the desired effect by learning from other computationally expensive enhancement algorithms or from expert-tuned examples. Looking back, the thesis reveals a single versatile framework for video enhancement applications. It widens the application scope by including new content classifiers and new processing models, and it offers scalability through solutions that reduce the number of classes, which can greatly accelerate algorithm design.
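
The content-adaptive scheme described above (classify each local window, then fit one least-squares filter per class from degraded/desired training pairs) can be illustrated with a small one-dimensional sketch. Everything here is an assumption made for illustration: the 3-tap window, the 1-bit-per-tap classifier that thresholds at the window mean, and the function names `classify`, `solve`, `train` and `apply_filters` are not taken from the thesis.

```python
from collections import defaultdict


def classify(window):
    """Hypothetical classifier: one bit per tap, thresholded at the window mean."""
    mean = sum(window) / len(window)
    return tuple(int(v >= mean) for v in window)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def train(degraded, desired, taps=3):
    """Accumulate normal equations per class and solve for each class's filter."""
    half = taps // 2
    ata = defaultdict(lambda: [[0.0] * taps for _ in range(taps)])
    atb = defaultdict(lambda: [0.0] * taps)
    for i in range(half, len(degraded) - half):
        w = degraded[i - half:i + half + 1]
        k = classify(w)
        for r in range(taps):
            atb[k][r] += w[r] * desired[i]
            for c in range(taps):
                ata[k][r][c] += w[r] * w[c]
    filters = {}
    for k in ata:
        try:
            filters[k] = solve(ata[k], atb[k])
        except ValueError:
            pass  # too little (or too uniform) training data for this class
    return filters


def apply_filters(signal, filters, taps=3):
    """Filter each sample with its class's coefficients; unknown classes pass through."""
    half = taps // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        w = signal[i - half:i + half + 1]
        coeffs = filters.get(classify(w))
        if coeffs is not None:
            out[i] = sum(c * v for c, v in zip(coeffs, w))
    return out
```

Since every class gets its own coefficients, the number of filters grows with the number of classes, which is the scalability concern the abstract raises when it discusses reducing the class count.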

    Resilient Perception for Outdoor Unmanned Ground Vehicles

    This thesis promotes the development of resilience for perception systems with a focus on Unmanned Ground Vehicles (UGVs) in adverse environmental conditions. Perception is the interpretation of sensor data to produce a representation of the environment that is necessary for subsequent decision making. Long-term autonomy requires perception systems that function correctly in unusual but realistic conditions that will eventually occur during extended missions. State-of-the-art UGV systems can fail when the sensor data are beyond the operational capacity of the perception models. The key to a resilient perception system lies in the use of multiple sensor modalities and the pre-selection of appropriate sensor data to minimise the chance of failure. This thesis proposes a framework based on diagnostic principles to evaluate and pre-select sensor data prior to interpretation by the perception system. Image-based quality metrics are explored and evaluated experimentally using infrared (IR) and visual cameras onboard a UGV in the presence of smoke and airborne dust. A novel quality metric, Spatial Entropy (SE), is introduced and evaluated. The proposed framework is applied to a state-of-the-art Visual-SLAM algorithm combining visual and IR imaging as a real-world example. An extensive experimental evaluation demonstrates that the framework allows for camera-based localisation that is resilient to a range of low-visibility conditions when compared to other methods that use a single sensor or combine sensor data without selection. The proposed framework allows for resilient localisation in adverse conditions using image data, but it also has significant potential to benefit many other perception applications. Employing multiple sensing modalities along with pre-selection of appropriate data is a powerful method to create resilient perception systems by anticipating and mitigating errors. 
The development of such resilient perception systems is a requirement for next-generation outdoor UGVs.
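
The pre-selection idea described above (score each sensor's image with a quality metric and pass on only data that is good enough) can be sketched as follows. Note the assumptions: the abstract does not define the Spatial Entropy metric, so this sketch substitutes the plain Shannon entropy of the gray-level histogram as a stand-in, and the names `shannon_entropy`, `preselect` and the `min_score` threshold are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(image):
    """Shannon entropy in bits of the gray-level histogram of a 2-D image.

    Stand-in quality score only; this is NOT the thesis's Spatial Entropy.
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def preselect(frames, min_score):
    """Return the name of the best-scoring frame, or None if all score too low.

    `frames` maps a sensor name (e.g. "visual", "ir") to its current image.
    """
    scores = {name: shannon_entropy(image) for name, image in frames.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None
```

For example, a frame washed out to a single gray level by smoke scores zero and would be rejected, while a frame from another modality that still shows scene structure would be selected.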