
    Image analysis using visual saliency with applications in hazmat sign detection and recognition

    Visual saliency is the perceptual process that makes attractive objects stand out from their surroundings in the low-level human visual system. Visual saliency has been modeled as a preprocessing step of the human visual system that selects the important visual information from a scene. We investigate bottom-up visual saliency using spectral analysis approaches. We present separate and composite model families that generalize existing frequency-domain visual saliency models. We propose several frequency-domain visual saliency models that generate saliency maps using new spectrum processing methods and an entropy-based saliency map selection approach: a group of saliency map candidates is obtained by inverse transform, and the candidate with the minimum entropy is selected as the final saliency map. The proposed models based on the separate and composite model families are also extended to various color spaces. We develop an evaluation tool for benchmarking visual saliency models. Experimental results show that the proposed models are more accurate and efficient than most state-of-the-art visual saliency models in predicting eye fixations.
We use these visual saliency models to detect the location of hazardous material (hazmat) signs in complex scenes. We develop a hazmat sign location detection and content recognition system using visual saliency. Saliency maps are used to extract salient regions that are likely to contain hazmat sign candidates, and a Fourier-descriptor-based contour matching method then locates the border of hazmat signs within these regions. This saliency-based approach improves the accuracy of sign location detection, reduces the number of false positive objects, and speeds up the overall image analysis process. We also propose a color recognition method to interpret the color inside the detected hazmat sign.
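The candidate-then-select step described at the start of this abstract can be sketched as follows. This is a minimal illustration, not the thesis's models: it assumes two well-known frequency-domain generators as stand-ins (the phase-only spectrum and the spectral residual), a hypothetical Gaussian smoothing width `sigma`, and Shannon entropy over a 256-bin histogram of each smoothed candidate. All function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter each row, then each column; "valid" mode restores the original size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def mean3x3(a):
    """3x3 local average with edge padding."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def phase_only_saliency(img):
    """Keep only the phase of the spectrum (unit magnitude), then invert."""
    f = np.fft.fft2(img)
    return np.abs(np.fft.ifft2(np.exp(1j * np.angle(f)))) ** 2


def spectral_residual_saliency(img):
    """Spectral residual: log amplitude minus its local average, original phase."""
    f = np.fft.fft2(img)
    log_amp = np.log(np.abs(f) + 1e-8)
    residual = log_amp - mean3x3(log_amp)
    return np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2


def map_entropy(sal, bins=256):
    """Shannon entropy (bits) of the histogram of saliency values."""
    hist, _ = np.histogram(sal, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


def select_min_entropy(img, sigma=3.0):
    """Generate candidate maps, smooth them, and keep the lowest-entropy one."""
    candidates = {
        "phase_only": phase_only_saliency(img),
        "spectral_residual": spectral_residual_saliency(img),
    }
    smoothed = {name: gaussian_blur(s, sigma) for name, s in candidates.items()}
    best = min(smoothed, key=lambda name: map_entropy(smoothed[name]))
    sal = smoothed[best]
    # Rescale the chosen map to [0, 1] for display or thresholding.
    return best, (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```

Usage: `name, sal = select_min_entropy(gray_image)` on a 2D grayscale array. The entropy criterion favors candidates whose values are concentrated in few histogram bins, i.e. maps with a few compact salient regions on a quiet background.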
Experimental results show that the proposed hazmat sign location detection method can detect and recognize hazmat signs that are projectively distorted, blurred, or shaded, at various distances.
In other work, we investigate error concealment for scalable video coding (SVC). When video compressed with SVC is transmitted over loss-prone networks, the decoded video can suffer severe visual degradation across multiple frames. To improve visual quality, we propose an inter-layer error concealment method that uses motion vector averaging and slice interleaving to handle burst packet losses and error propagation. Experimental results show that the proposed error concealment methods outperform two existing methods.
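A minimal sketch of motion-vector-averaging concealment for a lost enhancement-layer block. It assumes the candidate vectors are the available neighbouring motion vectors plus the co-located base-layer vector scaled by a hypothetical spatial ratio of 2 (dyadic spatial scalability); slice interleaving and the thesis's exact candidate set are not shown, and the helper names are hypothetical.

```python
import numpy as np


def conceal_mv(neighbor_mvs, base_layer_mv=None, scale=2.0):
    """Average the available candidate motion vectors (dx, dy) for a lost block.

    neighbor_mvs: motion vectors of spatially adjacent blocks, None if also lost.
    base_layer_mv: co-located base-layer vector, upscaled by `scale` (assumed 2).
    """
    candidates = [np.asarray(mv, dtype=float) for mv in neighbor_mvs if mv is not None]
    if base_layer_mv is not None:
        candidates.append(np.asarray(base_layer_mv, dtype=float) * scale)
    if not candidates:
        return np.zeros(2)  # fall back to zero motion (plain frame copy)
    return np.mean(candidates, axis=0)


def conceal_block(ref_frame, top, left, size, mv):
    """Copy the motion-compensated block from the reference frame, clamped to its bounds."""
    dx, dy = np.rint(mv).astype(int)
    h, w = ref_frame.shape
    y = int(np.clip(top + dy, 0, h - size))
    x = int(np.clip(left + dx, 0, w - size))
    return ref_frame[y:y + size, x:x + size].copy()
```

Averaging pools evidence from both layers, so a single corrupted or missing candidate has less influence on the recovered motion than copying one neighbour's vector would.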

    A Comparison Study of Saliency Models for Fixation Prediction on Infants and Adults

    Various saliency models have been developed over the years. Their performance is typically evaluated on databases of experimentally recorded adult eye fixations. Although infant gaze patterns have recently attracted much attention, saliency-based models have not been widely applied to predicting them. In this study, we conduct a comprehensive comparison of eight state-of-the-art saliency models on predicting experimentally captured fixations from infants and adults, using seven evaluation metrics. The results show that the saliency models consistently predict adult fixations better than infant fixations in terms of overlap, center fitting, intersection, information loss of approximation, and spatial distance between the distributions of the saliency map and the fixation map. In the performance ranking of saliency and baseline models, the GBVS and Itti models are among the top three contenders; both infants and adults show a bias toward the centers of images; and all models, as well as the center baseline model, outperform the chance baseline model.
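The abstract does not name its seven metrics. As an illustration only, the sketch below implements four commonly used fixation-prediction metrics that correspond to some of the listed properties (NSS, the correlation coefficient CC, KL divergence as an information-loss measure, and SIM as a histogram-intersection measure), plus a Gaussian center baseline. The choice of metrics and the baseline width `sigma_frac` are assumptions.

```python
import numpy as np


def nss(sal, fix):
    """Normalized scanpath saliency: mean z-scored saliency at fixated pixels."""
    s = (sal - sal.mean()) / (sal.std() + 1e-12)
    return float(s[np.asarray(fix).astype(bool)].mean())


def cc(sal, fix_density):
    """Pearson correlation between the saliency map and a fixation density map."""
    a = (sal - sal.mean()) / (sal.std() + 1e-12)
    b = (fix_density - fix_density.mean()) / (fix_density.std() + 1e-12)
    return float((a * b).mean())


def kl_div(sal, fix_density, eps=1e-12):
    """KL divergence of the saliency distribution from the fixation distribution."""
    p = fix_density / (fix_density.sum() + eps)
    q = sal / (sal.sum() + eps)
    return float(np.sum(p * np.log(eps + p / (q + eps))))


def sim(sal, fix_density):
    """Similarity (histogram intersection) of the two normalized distributions."""
    p = sal / sal.sum()
    q = fix_density / fix_density.sum()
    return float(np.minimum(p, q).sum())


def center_baseline(h, w, sigma_frac=0.25):
    """Isotropic-in-relative-units Gaussian centered on the image."""
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return np.exp(-(((y - cy) / (sigma_frac * h)) ** 2 + ((x - cx) / (sigma_frac * w)) ** 2) / 2)
```

A chance baseline can be obtained by scoring a random map with the same functions; the center bias reported in the abstract shows up as the center baseline scoring well above it.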

    Multimodal Computational Attention for Scene Understanding

    Robotic systems have limited computational capacity. Computational attention models are therefore important for focusing on specific stimuli and enabling complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models.

    VISUAL SALIENCY ANALYSIS, PREDICTION, AND VISUALIZATION: A DEEP LEARNING PERSPECTIVE

    In recent years, considerable success has been achieved in predicting human eye fixations, and several studies have employed deep learning to reach high prediction accuracy. These studies rely on deep networks pre-trained for object classification, exploiting them either through transfer learning or by using the pre-trained weights as the initialization for learning a saliency model. Such pre-trained networks are used because the available datasets of human fixations are relatively small for training a deep learning model. Another, less prioritized problem is that the amount of computation these deep learning models require calls for expensive hardware. In this dissertation, two approaches are proposed to tackle these problems. The first approach, codenamed DeepFeat, incorporates the deep features of convolutional neural networks pre-trained for object and scene classification. It is the first approach that uses deep features without further learning. The performance of the DeepFeat model is extensively evaluated over a variety of datasets using a variety of implementations. The second approach is a deep learning saliency model, codenamed ClassNet. Two main differences separate ClassNet from other deep learning saliency models. First, ClassNet is the only deep learning saliency model that learns its weights from scratch. Second, ClassNet treats the prediction of human fixations as a classification problem, whereas other deep learning saliency models treat it as a regression problem or as a classification of a regression problem.
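The idea of using deep features "without further learning" can be illustrated with a parameter-free fusion of CNN activation maps. This minimal sketch assumes the activations of some pre-trained layer are already available as a `(C, h, w)` array (extracting them is not shown) and fuses them by an unweighted average of per-channel min-max normalized maps followed by nearest-neighbour upsampling; the function name and fusion rule are hypothetical, not DeepFeat's actual scheme.

```python
import numpy as np


def combine_deep_features(features, out_shape):
    """Fuse (C, h, w) activation maps into one saliency map without any learned weights."""
    feats = np.asarray(features, dtype=float)
    mins = feats.min(axis=(1, 2), keepdims=True)
    maxs = feats.max(axis=(1, 2), keepdims=True)
    norm = (feats - mins) / (maxs - mins + 1e-12)  # per-channel min-max normalization
    sal = norm.mean(axis=0)                         # unweighted average over channels
    H, W = out_shape
    h, w = sal.shape
    rows = np.arange(H) * h // H                    # nearest-neighbour row indices
    cols = np.arange(W) * w // W                    # nearest-neighbour column indices
    return sal[np.ix_(rows, cols)]
```

Because nothing in this pipeline is trained, its cost is dominated by the forward pass of the pre-trained network, which is one way such an approach sidesteps the small size of fixation datasets.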

    Biologically motivated keypoint detection for RGB-D data

    With the emerging interest in active vision, computer vision researchers have become increasingly concerned with the mechanisms of attention. Several computational models of visual attention inspired by the human visual system have therefore been developed, aiming at the detection of regions of interest in images. This thesis focuses on selective visual attention, which provides a mechanism for the brain to focus computational resources on one object at a time, guided by low-level image properties (bottom-up attention). Objects in different locations are recognized by focusing on those locations one at a time. Given the computational requirements of the proposed models, research in this area has mainly been of theoretical interest. More recently, psychologists, neurobiologists, and engineers have begun to collaborate, with considerable benefits. The first objective of this doctoral work is to bring together concepts and ideas from these research areas, providing a study of biological research on the human visual system, a discussion of the interdisciplinary knowledge in this area, and a review of the state of the art in (bottom-up) computational models of visual attention. Engineers usually refer to visual attention as saliency: when people fixate a particular region of an image, it is because that region is salient. In this work, saliency methods are presented according to their classification (biologically plausible, computational, or hybrid) and in chronological order. A few salient structures can be used instead of the whole object in applications such as object registration, retrieval, or data simplification, and these few salient structures can be treated as keypoints when performing object recognition.
Object recognition algorithms generally use a large number of descriptors extracted on a dense set of points, which incurs a very high computational cost and prevents real-time processing. To avoid this computational complexity, features have to be extracted from a small set of points, usually called keypoints. Keypoint detectors reduce both the processing time and the redundancy in the data. Local descriptors extracted from images have been extensively reported in the computer vision literature, and the large number of available keypoint detectors calls for a comparative evaluation. We therefore describe 2D and 3D keypoint detectors and 3D descriptors, and evaluate the existing 3D keypoint detectors of a publicly available point cloud library on real 3D objects. The invariance of the 3D keypoint detectors is evaluated under rotations, scale changes, and translations. This evaluation reports the robustness of each detector to changes of viewpoint, using the absolute and relative repeatability rates as criteria. In our experiments, the ISS3D method achieved the best repeatability rate. The analysis of the human visual system and of biologically inspired saliency map detectors led to the idea of extending a keypoint detector with the color information processed in the retina. This produced a 2D keypoint detector inspired by the behavior of the early visual system. Our method is a color extension of the BIMP keypoint detector that includes both the color and intensity channels of an image: color information is included in a biologically plausible way, and multi-scale image features are combined into a single keypoint map. This detector is compared against state-of-the-art detectors and found to be particularly well suited for tasks such as category and object recognition.
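The absolute and relative repeatability rates used in the evaluation above can be sketched for a known rigid motion as follows. This assumes a common formulation (a model keypoint counts as repeated if a keypoint detected on the transformed cloud lies within a hypothetical distance `eps` of its transformed position, and the relative rate divides by the number of model keypoints); the thesis may restrict the count to the overlapping region or use other details.

```python
import numpy as np


def repeatability(kp_model, kp_scene, R, t, eps):
    """Absolute and relative repeatability of 3D keypoints under a known rigid motion.

    kp_model: (N, 3) keypoints detected on the original cloud.
    kp_scene: (M, 3) keypoints detected on the cloud transformed by x -> R @ x + t.
    """
    kp_model = np.asarray(kp_model, dtype=float)
    kp_scene = np.asarray(kp_scene, dtype=float)
    if len(kp_model) == 0 or len(kp_scene) == 0:
        return 0, 0.0
    mapped = kp_model @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    # Pairwise distances (N, M); adequate for the small keypoint sets of a benchmark.
    d = np.linalg.norm(mapped[:, None, :] - kp_scene[None, :, :], axis=2)
    absolute = int(np.sum(d.min(axis=1) < eps))
    relative = absolute / len(kp_model)
    return absolute, relative
```

Rotation, scale, and translation invariance would be probed by sweeping `R`, a scale factor applied to the cloud, and `t` while recording both rates.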
The recognition process compares the 3D descriptors extracted at the locations indicated by the keypoints, after mapping the 2D keypoint locations into 3D space; this is possible because the dataset provides the location of each point in both 2D and 3D space. The evaluation allowed us to identify the best keypoint detector/descriptor pair on an RGB-D object dataset: using our keypoint detector, the SHOTCOLOR descriptor yields good category and object recognition rates, while the best object recognition results are obtained with the PFHRGB descriptor. A 3D recognition system involves the choice of a keypoint detector and a descriptor. We therefore present a new method for detecting 3D keypoints on point clouds and benchmark every pair of 3D keypoint detector and 3D descriptor for object and category recognition. These evaluations are performed on a public database of real 3D objects. Our keypoint detector (BIK-BUS) is inspired by the behavior and neural architecture of the primate visual system: the 3D keypoints are extracted from a bottom-up 3D saliency map, which encodes the saliency of objects in the visual environment. The saliency map is determined by computing conspicuity maps (a combination across different modalities) of orientation, intensity, and color information, in a bottom-up and purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map, and the focus of attention (or "keypoint location") is then sequentially directed to the most salient points in this map. Inhibiting the attended location automatically allows the system to attend to the next most salient location. The main conclusion is that, with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated, achieving the best result in 32 of the evaluated metrics in the category and object recognition experiments, whereas the second-best detector achieved the best result in only 8 of these metrics.
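The fusion of three conspicuity maps and the attend-then-inhibit loop described above can be sketched on a point cloud as follows. This assumes per-point intensity, color, and orientation conspicuity values are already computed, fuses them by an equal-weight average of min-max normalized values, and implements inhibition of return by suppressing every point within a hypothetical `radius` of the attended point; BIK-BUS's actual normalization and fusion may differ.

```python
import numpy as np


def fuse_conspicuity(intensity, color, orientation):
    """Equal-weight fusion of min-max normalized per-point conspicuity values."""
    def norm(m):
        m = np.asarray(m, dtype=float)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
    return (norm(intensity) + norm(color) + norm(orientation)) / 3.0


def select_keypoints(points, saliency, n_keypoints, radius):
    """Winner-take-all selection with inhibition of return; returns point indices."""
    points = np.asarray(points, dtype=float)
    sal = np.asarray(saliency, dtype=float).copy()
    chosen = []
    for _ in range(n_keypoints):
        i = int(np.argmax(sal))
        if not np.isfinite(sal[i]):
            break  # every point has already been inhibited
        chosen.append(i)
        d = np.linalg.norm(points - points[i], axis=1)
        sal[d < radius] = -np.inf  # inhibit the neighbourhood of the attended point
    return chosen
```

The inhibition radius controls keypoint spacing: a larger radius yields fewer, more spread-out keypoints from the same saliency map.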
The only drawback is computational time, since BIK-BUS is slower than the other detectors. Because the detectors differ widely in recognition performance, size, and time requirements, the keypoint detector and descriptor must be matched to the desired task, and we give some guidance to facilitate this choice. After proposing the 3D keypoint detector, the research focused on a robust detection and tracking method for 3D objects that uses keypoint information in a particle filter. This method consists of three distinct steps: segmentation, tracking initialization, and tracking. Segmentation removes all background information, reducing the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector, and the extracted keypoints describe the object to be followed. The particle filter then tracks the keypoints, predicting where they will be in the next frame. Since the computational cost of keypoint detectors is one of the problems of a recognition system, this approach is intended to reduce it. The experiments with the PFBIK-Tracking method are performed indoors in an office/home environment, where personal robots are expected to operate. We evaluate the method quantitatively using a "Tracking Error", which measures the stability of the general tracking method and is computed from the centroids of the keypoints and of the particles. Compared with the tracking method available in the Point Cloud Library, our system achieves better results with far fewer points and less computational time. Our method is also faster and more robust to occlusion than the OpenniTracker.
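A minimal bootstrap particle filter in the spirit of the tracking step above, with the tracking error taken as the distance between the keypoint centroid and the weighted particle centroid. It assumes the state is only the object's 3D centroid, a random-walk motion model, a Gaussian likelihood, and multinomial resampling; all parameters are hypothetical, and this is not the PFBIK-Tracking or Point Cloud Library implementation.

```python
import numpy as np


def track_centroid(keypoint_frames, n_particles=500, motion_std=0.02, obs_std=0.05, seed=0):
    """Track an object's keypoint centroid; return the per-frame tracking error.

    keypoint_frames: list of (K_t, 3) arrays, the keypoints detected on the
    segmented object in each frame.
    """
    rng = np.random.default_rng(seed)
    start = np.asarray(keypoint_frames[0], dtype=float).mean(axis=0)
    particles = start + rng.normal(0.0, motion_std, size=(n_particles, 3))
    errors = []
    for kps in keypoint_frames:
        observed = np.asarray(kps, dtype=float).mean(axis=0)  # keypoint centroid
        # Predict: random-walk motion model.
        particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
        # Update: Gaussian likelihood (shifted by the minimum distance for stability).
        d2 = np.sum((particles - observed) ** 2, axis=1)
        w = np.exp(-0.5 * (d2 - d2.min()) / obs_std ** 2)
        w /= w.sum()
        estimate = w @ particles                              # particle centroid
        errors.append(float(np.linalg.norm(estimate - observed)))
        # Resample (multinomial) so particles concentrate near the object.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return errors
```

Tracking only the centroid keeps the per-frame cost independent of the number of cloud points, which mirrors the motivation of working with a small keypoint set instead of the full segmented cloud.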

    Perceptual quality assessment and processing for visual signals.

    Visual signals, including images, videos, etc., are affected by a wide variety of distortions during acquisition, compression, storage, processing, transmission, and reproduction processes, which result in perceptual quality degradation. As a result, perceptual quality assessment plays a very important role in today's visual signal processing and communication systems.
In this thesis, quality assessment algorithms for evaluating the perceptual quality of visual signals, as well as their applications to visual signal processing and communications, are investigated. The work consists of five parts, briefly summarized below.
The first part focuses on full-reference (FR) image quality assessment. The properties of the human visual system (HVS) are first investigated. Specifically, the visual horizontal effect (HE) and saliency properties over structural distortions are modelled and incorporated into the structural similarity index (SSIM). Experimental results show significantly improved performance in matching the subjective ratings. Inspired by the developed FR image metric, a perceptual image compression scheme is developed, in which adaptive block-based super-resolution directed down-sampling is proposed. Experimental results demonstrate that the proposed image compression scheme produces higher-quality images, in terms of both objective and subjective quality, than the existing methods.
The second part concerns FR video quality assessment. The adaptive block-size transform (ABT) based just-noticeable difference (JND) for visual signals is investigated by considering HVS characteristics, e.g., the spatio-temporal contrast sensitivity function (CSF), eye movement, texture masking, spatial coherence, temporal consistency, and the properties of different block-size transforms. The developed ABT-based JND is verified to depict the HVS properties more accurately than state-of-the-art JND models. It is then used to develop a simple perceptual quality metric for visual signals, whose effectiveness is validated on image and video subjective quality databases.
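The first part of this abstract incorporates saliency into SSIM. One simple way to do that is saliency-weighted pooling of a local SSIM map, sketched below. This assumes a square box window (the original SSIM uses an 11x11 Gaussian window), the standard constants K1 = 0.01 and K2 = 0.03 for an 8-bit range, and a saliency map supplied by the caller; the horizontal-effect modelling is not shown, and the function names are hypothetical.

```python
import numpy as np


def box_mean(img, win=7):
    """Local mean over a win x win window with reflect padding."""
    r = win // 2
    p = np.pad(img, r, mode="reflect")
    h, w = img.shape
    acc = np.zeros((h, w))
    for dy in range(win):
        for dx in range(win):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / (win * win)


def ssim_map(x, y, L=255.0, win=7):
    """Per-pixel SSIM using box-window local statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = box_mean(x, win), box_mean(y, win)
    vx = box_mean(x * x, win) - mx ** 2
    vy = box_mean(y * y, win) - my ** 2
    cov = box_mean(x * y, win) - mx * my
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def saliency_weighted_ssim(x, y, saliency):
    """Pool the SSIM map with saliency weights instead of a plain mean."""
    s = ssim_map(x, y)
    w = np.asarray(saliency, dtype=float)
    return float((s * w).sum() / w.sum())
```

With uniform weights this reduces to the usual mean SSIM; concentrating the weights on salient regions makes distortions there dominate the score, while distortions in non-salient regions count for less.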
The developed perceptual quality metric is further employed for perceptual video coding, delivering video sequences of higher perceptual quality at the same bit-rates.
The third part discusses reduced-reference (RR) image quality assessment, developed by statistically modelling the coefficient distribution in the reorganized discrete cosine transform (RDCT) domain. The proposed RR metric exploits the identical statistical nature of adjacent DCT coefficients, the mutual information (MI) between adjacent RDCT coefficients, and the distribution of image energy among different frequency components. Experimental results demonstrate that the proposed metric outperforms representative RR image quality metrics and even the FR metric peak signal-to-noise ratio (PSNR). Furthermore, the extracted RR features can easily be encoded and embedded into the distorted images for quality monitoring during image communications.
The fourth part investigates RR video quality assessment. The RR features are extracted to capture the spatial information loss and the temporal statistical characteristics of the inter-frame histogram. Evaluations on video subjective quality databases demonstrate that the proposed method outperforms representative RR video quality metrics, and even FR metrics such as PSNR and SSIM, in matching the subjective ratings. Moreover, only a small number of RR features is required to represent the original video sequence: each frame needs only 1 parameter for its spatial characteristics and 3 for its temporal characteristics.
Considering the computational complexity and the bit-rate required to extract and represent the RR features, the proposed RR quality metric can be used for quality monitoring during video transmission, where the RR features for perceptual quality analysis can easily be embedded into the videos or sent through an ancillary data channel.
The perceptual quality metrics above focus on traditional distortions, such as JPEG image compression noise and H.264 video compression noise. In the last part, we investigate the distortions introduced by image and video retargeting. With the development of consumer electronics, visual signals increasingly need to be displayed on devices with different resolutions. A retargeting algorithm adapts a source image of one resolution for display on a device of a different resolution, which may introduce distortions. We investigate subjective responses to the perceptual quality of retargeted images and discuss the results from three perspectives: retargeting scale, retargeting method, and source image content attributes. An image retargeting subjective quality database is built through a large-scale subjective study on a collection of retargeted images.
Based on the built database, several representative quality metrics for retargeted images are evaluated and discussed.
Ma, Lin. "December 2012." Thesis (Ph.D.), Chinese University of Hong Kong, 2013. Includes bibliographical references (leaves 185-197). Abstract also in Chinese.
Contents:
Front matter: Dedication, Acknowledgments, Abstract, Publications, Nomenclature, Contents, List of Figures, List of Tables
Chapter 1 Introduction
  1.1 Motivation and Objectives
  1.2 Subjective Perceptual Quality Assessment
  1.3 Objective Perceptual Quality Assessment
    1.3.1 Visual Modelling Approach
    1.3.2 Engineering Modelling Approach
    1.3.3 Perceptual Subjective Quality Databases
    1.3.4 Performance Evaluation
  1.4 Thesis Contributions
  1.5 Organization of the Thesis
Part I Full Reference Quality Assessment
Chapter 2 Full Reference Image Quality Assessment
  2.1 Visual Horizontal Effect for Image Quality Assessment
    2.1.1 Introduction
    2.1.2 Proposed Image Quality Assessment Framework
    2.1.3 Experimental Results
    2.1.4 Conclusion
  2.2 Image Compression via Adaptive Block-Based Super-Resolution Directed Down-Sampling
    2.2.1 Introduction
    2.2.2 The Proposed Image Compression Framework
    2.2.3 Experimental Results
    2.2.4 Conclusion
Chapter 3 Full Reference Video Quality Assessment
  3.1 Adaptive Block-size Transform based Just-Noticeable Difference Model for Visual Signals
    3.1.1 Introduction
    3.1.2 JND Model based on Transforms of Different Block Sizes
    3.1.3 Selection Strategy Between Transforms of Different Block Sizes
    3.1.4 JND Model Evaluation
    3.1.5 Conclusion
  3.2 Perceptual Quality Assessment
    3.2.1 Experimental Results
    3.2.2 Conclusion
  3.3 Motion Trajectory Based Visual Saliency for Video Quality Assessment
    3.3.1 Motion Trajectory based Visual Saliency for VQA
    3.3.2 New Quaternion Representation (QR) for Each Frame
    3.3.3 Saliency Map Construction by QR
    3.3.4 Incorporating Visual Saliency with VQAs
    3.3.5 Experimental Results
    3.3.6 Conclusion
  3.4 Perceptual Video Coding
    3.4.1 Experimental Results
    3.4.2 Conclusion
Part II Reduced Reference Quality Assessment
Chapter 4 Reduced Reference Image Quality Assessment
  4.1 Introduction
  4.2 Reorganization Strategy of DCT Coefficients
  4.3 Relationship Analysis of Intra and Inter RDCT Subbands
  4.4 Reduced Reference Feature Extraction in Sender Side
    4.4.1 Intra RDCT Subband Modeling
    4.4.2 Inter RDCT Subband Modeling
    4.4.3 Image Frequency Feature
  4.5 Perceptual Quality Analysis in the Receiver Side
    4.5.1 Intra RDCT Feature Difference Analysis
    4.5.2 Inter RDCT Feature Difference Analysis
    4.5.3 Image Frequency Feature Difference Analysis
  4.6 Experimental Results
    4.6.1 Efficiency of the DCT Reorganization Strategy
    4.6.2 Performance of the Proposed RR IQA
    4.6.3 Performance of the Proposed RR IQA over Each Individual Distortion Type
    4.6.4 Statistical Significance
    4.6.5 Performance Analysis of Each Component
  4.7 Conclusion
Chapter 5 Reduced Reference Video Quality Assessment
  5.1 Introduction
  5.2 Proposed Reduced Reference Video Quality Metric
    5.2.1 Reduced Reference Feature Extraction from Spatial Perspective
    5.2.2 Reduced Reference Feature Extraction from Temporal Perspective
    5.2.3 Visual Quality Analysis in Receiver Side
  5.3 Experimental Results
    5.3.1 Consistency Test of the Proposed RR VQA over Compressed Video Sequences
    5.3.2 Consistency Test of the Proposed RR VQA over Video Sequences with Simulated Distortions
    5.3.3 Performance Evaluation of the Proposed RR VQA on Compressed Video Sequences
    5.3.4 Performance Evaluation of the Proposed RR VQA on Video Sequences Containing Transmission Distortions
    5.3.5 Performance Analysis of Each Component
  5.4 Conclusion
Part III Retargeted Visual Signal Quality Assessment
Chapter 6 Image Retargeting Perceptual Quality Assessment
  6.1 Introduction
  6.2 Preparation of Database Building
    6.2.1 Source Image
    6.2.2 Retargeting Methods
    6.2.3 Subjective Testing
  6.3 Data Processing and Analysis for the Database
    6.3.1 Processing of Subjective Ratings
    6.3.2 Analysis and Discussion of the Subjective Ratings
  6.4 Objective Quality Metric for Retargeted Images
    6.4.1 Quality Metric Performances on the Constructed Image Retargeting Database
    6.4.2 Subjective Analysis of the Shape Distortion and Content Information Loss
    6.4.3 Discussion
  6.5 Conclusion
Chapter 7 Conclusions
  7.1 Conclusion
  7.2 Future Work
Appendix A Attributes of the Source Image
Appendix B Retargeted Image Name and the Corresponding Number
Appendix C Source Image Name and the Corresponding Number
Bibliography

    Doctor of Philosophy

    Primate primary visual cortex (V1) consists of six anatomical layers, across which both heterogeneous and homogeneous functional properties are found. Surround modulation (SM) occurs when a neuron's response to stimulation of its receptive field (RF) is modulated by simultaneous stimulation outside the RF. There are three candidate circuits for SM: feedforward (FF) and intra-V1 horizontal (HZ) connections underlie modulation from the region near the RF (near surround), while modulatory signals arising from more distant regions (far surround) are conveyed by feedback (FB) connections from higher visual areas. Moreover, V1 layers show distinct patterns of FF, HZ, and FB terminations. The goal of my dissertation research was to study (1) the properties of SM across V1 layers, (2) how simple visual stimuli in the RF and surround of a V1 column activate V1 layers, and (3) which afferent circuits to and within the V1 column these stimuli recruit. Using single-electrode recordings sampling all layers of V1, I found that near SM is more sharply orientation-tuned in the superficial layers (L3B, 4B, and 4Cα), where horizontal connections are prominent. In contrast, far SM is more sharply orientation-tuned in L4B, possibly reflecting the orientation organization of feedback connections to this layer. Using laminar recordings, I investigated the temporal dynamics of inputs (local field potentials, LFPs) to each layer during stimulation of surround elements. With near-surround stimulation, the earliest inputs appeared simultaneously in superficial and deep layers, with a significant delay in L4C, suggesting that both HZ and FB connections contribute to near SM. With far-surround stimulation, the feedback-recipient layers (L1/2A and L5/6) received the earliest inputs. Measuring the latency of spiking activity while co-stimulating the RF and surround, I found that untuned near SM first emerged in L4C, whereas tuned near SM and far SM emerged outside the thalamic-recipient layers, suggesting a cortical origin.
Finally, I found that brain oscillations evoked by stimuli in the surround mirror the structure of the underlying horizontal and feedback connections. In both the near and far surround, grating patches positioned along the axis collinear with a cell's preferred orientation evoke greater LFP power in several frequency bands, including alpha, beta, and gamma, than patches at orthogonal positions. We propose that horizontal and feedback connections, the substrates of the near and far surround, are aligned collinearly in the visual field and help generate brain oscillations.
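The abstract does not say how LFP band power was estimated or how the collinear and orthogonal conditions were compared. The Python sketch below shows one conventional way to make that comparison, under stated assumptions: a Hann-windowed periodogram computed with NumPy, band edges of 8-12 Hz (alpha), 13-30 Hz (beta), and 30-80 Hz (gamma), and a simple ratio of trial-averaged power. These choices, and the function names `band_power` and `collinear_vs_orthogonal`, are hypothetical and are not taken from the dissertation.

```python
import numpy as np

# Assumed band edges in Hz; illustrative only, not taken from the dissertation.
BANDS = {"alpha": (8.0, 12.0), "beta": (13.0, 30.0), "gamma": (30.0, 80.0)}


def band_power(lfp, fs, bands=BANDS):
    """Mean periodogram power of a 1-D LFP trace within each frequency band."""
    x = np.asarray(lfp, dtype=float)
    x = x - x.mean()                          # remove the DC offset
    window = np.hanning(x.size)               # taper to reduce spectral leakage
    spectrum = np.fft.rfft(x * window)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Scale by window energy and sampling rate; absolute units cancel in the ratio below.
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    powers = {}
    for name, (low, high) in bands.items():
        mask = (freqs >= low) & (freqs < high)
        if not mask.any():
            raise ValueError(f"trace too short to resolve the {name} band")
        powers[name] = float(psd[mask].mean())
    return powers


def collinear_vs_orthogonal(collinear_trials, orthogonal_trials, fs, bands=BANDS):
    """Per-band ratio of trial-averaged power: collinear patch / orthogonal patch."""
    col = [band_power(trial, fs, bands) for trial in collinear_trials]
    orth = [band_power(trial, fs, bands) for trial in orthogonal_trials]
    return {
        name: float(np.mean([p[name] for p in col]) / np.mean([p[name] for p in orth]))
        for name in bands
    }


if __name__ == "__main__":
    # Synthetic example: the collinear condition carries a stronger 40 Hz (gamma) component.
    fs = 1000.0
    t = np.arange(0.0, 1.0, 1.0 / fs)
    rng = np.random.default_rng(0)
    collinear = [2.0 * np.sin(2 * np.pi * 40 * t) + rng.standard_normal(t.size) for _ in range(20)]
    orthogonal = [0.5 * np.sin(2 * np.pi * 40 * t) + rng.standard_normal(t.size) for _ in range(20)]
    print(collinear_vs_orthogonal(collinear, orthogonal, fs))
```

A ratio above 1 in a band means more power for the collinear patch than for the orthogonal one. For real recordings, a Welch or multitaper estimate (for example `scipy.signal.welch`) and a statistical test across trials would be more robust than a single periodogram per trial.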