Recent Advances in Image Restoration with Applications to Real World Problems
In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling the widespread use of images in many diverse applications, from terrestrial to planetary missions. However, practical issues associated with image acquisition still affect image quality. Some of these issues, such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution, can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, including image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included.
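The acquisition issues listed above are commonly posed through a linear degradation model, y = Hx + n (blur followed by additive noise), which restoration methods then try to invert. A minimal sketch, with an illustrative 1-D blur kernel and noise level that are not taken from the book:

```python
import random

def blur_1d(signal, kernel=(0.25, 0.5, 0.25)):
    """Convolve a 1-D signal with a small blur kernel (zero-padded): the H in y = Hx + n."""
    k = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - k
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def degrade(signal, noise_sigma=0.05, seed=0):
    """Observation model y = Hx + n: blur, then add Gaussian measurement noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sigma) for v in blur_1d(signal)]

x = [0, 0, 1, 1, 1, 0, 0]   # ideal edge-like signal
y = degrade(x)              # what the sensor actually records
```

Restoration (deblurring, denoising, super-resolution) amounts to estimating x from y when H is only partially known.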
Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical to improving visual quality. In this paper, we investigate the influence of four spatial PEAs (blurring, blocking, bleeding, and ringing) and two temporal PEAs (flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and high consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on these six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed metric outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
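The core idea of weighting local artifact strength by visual saliency can be sketched as a weighted pooling step. The maps and thresholds below are toy values; the actual SSTAM pipeline uses a learned saliency model and a TimeSFormer-based temporal detector:

```python
def saliency_weighted_score(artifact_map, saliency_map):
    """Pool a per-pixel artifact-strength map, weighting each location by
    its visual saliency, so artifacts in salient regions count more."""
    num = sum(a * s for a, s in zip(artifact_map, saliency_map))
    den = sum(saliency_map)
    return num / den if den else 0.0

# Toy flattened "maps": same artifact energy, different saliency placement.
artifacts = [0.9, 0.1, 0.1, 0.1]
salient_on_artifact = [1.0, 0.1, 0.1, 0.1]
salient_elsewhere = [0.1, 1.0, 0.1, 0.1]

print(saliency_weighted_score(artifacts, salient_on_artifact))  # high: artifact is where people look
print(saliency_weighted_score(artifacts, salient_elsewhere))    # low: artifact sits in an ignored region
```

The same artifact map yields very different scores depending on where attention falls, which is the behaviour a saliency-aware metric is after.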
Comparison between Structural Similarity Index Metric and Human Perception
This thesis examines image quality assessment using the Structural Similarity Index Metric (SSIM). The performance of SSIM was evaluated by comparing Mean Structural Similarity Index (MSSIM) values with Probability of Identification (PID) values. Perception experiments were designed for letter images with blur, and for letter images with blur and noise, to obtain PID values from an ensemble of observers. The other set of images used in this study were tank images for which PID data already existed. All images were generated with Gaussian and Exponential filter shapes at various blur levels. All images at a specific blur level and filter shape were compared and MSSIM was obtained. MSSIM and PID were each compared against blur level for both filter shapes to observe the correlation between SSIM and human perception. The results show no correlation between MSSIM and PID. The thesis thus characterizes where SSIM and human perception diverge: SSIM cannot detect the filter-shape difference, whereas humans perceived the difference for blurred letter images in our experiments. The Probability of Identification for the Gaussian filter shape is lower than for the Exponential, which is explained by an edge-energy analysis. For tank images and for letter images with blur and noise, the results were similar: neither humans nor MSSIM could distinguish between filter shapes.
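For reference, the SSIM index compared in such studies combines luminance, contrast, and structure terms. A global-statistics sketch of the standard formula (MSSIM averages this over local windows; the usual constants K1=0.01, K2=0.03, L=255 are used):

```python
def ssim_global(x, y, L=255, K1=0.01, K2=0.03):
    """Global SSIM computed from whole-image statistics of two equal-length
    pixel sequences; MSSIM instead averages local-window SSIM values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2  # stabilizers for near-zero denominators
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

img = [10, 200, 40, 90, 150, 30]
print(ssim_global(img, img))                      # identical images score 1.0
print(ssim_global(img, [v + 50 for v in img]))    # luminance shift lowers the score
```

Because the structure term depends only on second-order statistics, two blurs with equal variance but different filter shapes can score alike, consistent with the thesis's finding that SSIM misses filter-shape differences that observers can see.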
Deep learning based objective quality assessment of multidimensional visual content
Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2022. Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
In the last decade, there has been a tremendous increase in the popularity of multimedia applications, hence increasing multimedia content. When these contents are generated,
transmitted, reconstructed and shared, their original pixel values are transformed. In this
scenario, it becomes more crucial and demanding to assess visual quality of the affected
visual content so that the requirements of end-users are satisfied. In this work, we investigate effective spatial, temporal, and angular features by developing no-reference algorithms
that assess the visual quality of distorted multi-dimensional visual content. We use machine
learning and deep learning algorithms to obtain prediction accuracy.
For two-dimensional (2D) image quality assessment, we use multiscale local binary patterns and saliency information, and train/test these features using a Random Forest Regressor. For 2D video quality assessment, we introduce a novel concept of spatial and temporal
saliency and custom objective quality scores. We use a Convolutional Neural Network (CNN)
based light-weight model for training and testing on selected patches of video frames.
For objective quality assessment of four-dimensional (4D) light field images (LFI), we
propose seven LFI quality assessment (LF-IQA) methods in total. Considering that an LFI is
composed of dense multi-views, and inspired by the Human Visual System (HVS), we propose our
first LF-IQA method, which is based on a two-stream CNN architecture. The second and third
LF-IQA methods are also based on a two-stream architecture, which incorporates CNN, Long
Short-Term Memory (LSTM), and diverse bottleneck features. The fourth LF-IQA is based
on CNN and Atrous Convolution layers (ACL), while the fifth method uses CNN, ACL, and
LSTM layers. The sixth LF-IQA method is also based on a two-stream architecture, in which,
horizontal and vertical EPIs are processed in the frequency domain. Last, but not least, the
seventh LF-IQA method is based on a Graph Convolutional Neural Network. For all of the
methods mentioned above, we performed intensive experiments, and the results show that
these methods outperformed state-of-the-art methods on popular quality datasets.
Deep learning in crowd counting: A survey
Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many of them are not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT), which divides datasets into small-scale, large-scale and hyper-scale according to different application scenarios. This taxonomy can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of a dataset: average pixels occupied by each object (APO). This index is better suited than raw image resolution for evaluating dataset clarity in the object-counting task. Moreover, the authors classified crowd counting methods from a data-driven perspective into multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly-supervised networks, and introduced the classic crowd counting methods of each class. The authors classified the 36 existing datasets according to the three-tier standardised dataset taxonomy and discussed and evaluated these datasets. The authors evaluated the performance of more than 100 methods from the past five years on different levels of popular datasets. Recently, progress in research on small-scale datasets has slowed: there are few new datasets and algorithms targeting them, and studies focused on large- or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has begun to be a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspective of data, algorithms and computing resources.
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research.
Funding: BHF, AA/18/3/34220; Hope Foundation for Cancer Research, RM60G0680; GCRF, P202PF11; Sino-UK Industrial Fund, RP202G0289; LIAS, P202ED10, P202RE969; Data Science Enhancement Fund, P202RE237; Sino-UK Education Fund, OP202006; Fight for Sight, 24NN201; Royal Society International Exchanges Cost Share Award, RP202G0230; MRC, MC_PC_17171; BBSRC, RM32G0178B.
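Read literally, the proposed APO index ("average pixels occupied by each object") is the frame area divided by the object count; this is an illustrative reading, and the survey's exact definition may differ:

```python
def apo(image_width, image_height, object_count):
    """Average Pixels Occupied per object: a clarity proxy that, unlike raw
    resolution, reflects how densely objects fill the frame."""
    if object_count == 0:
        raise ValueError("APO is undefined for an empty scene")
    return image_width * image_height / object_count

# Same 1080p resolution, very different clarity per head:
print(apo(1920, 1080, 50))    # sparse crowd: 41472.0 pixels per person
print(apo(1920, 1080, 5000))  # dense crowd: ~415 pixels per person
```

Two datasets with identical resolution can thus differ by orders of magnitude in APO, which is why resolution alone is a poor clarity measure for counting tasks.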
Automated Complexity-Sensitive Image Fusion
To construct a complete representation of a scene despite environmental obstacles such as fog, smoke, darkness, or textural homogeneity, multisensor video streams captured in different modalities are considered. A computational method for automatically fusing multimodal image streams into a highly informative and unified stream is proposed. The method consists of the following steps: 1. Image registration is performed to align video frames in the visible band over time, adapting to the nonplanarity of the scene by automatically subdividing the image domain into regions approximating planar patches
2. Wavelet coefficients are computed for each of the input frames in each modality
3. Corresponding regions and points are compared using spatial and temporal information across various scales
4. Decision rules based on the results of multimodal image analysis are used to combine the wavelet coefficients from different modalities
5. The combined wavelet coefficients are inverted to produce an output frame containing useful information gathered from the available modalities
Experiments show that the proposed system is capable of producing fused output containing the characteristics of color visible-spectrum imagery while adding information exclusive to infrared imagery, with attractive visual and informational properties
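Steps 2 through 5 above can be sketched with a one-level Haar transform and a max-absolute-coefficient decision rule, a common fusion heuristic; the paper's actual rules compare regions across scales and over time:

```python
def haar_fwd(signal):
    """One-level 1-D Haar wavelet transform: (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inv(approx, detail):
    """Invert the one-level Haar transform back to the signal domain."""
    out = []
    for a, d in zip(approx, detail):
        out += [a + d, a - d]
    return out

def fuse(sig_a, sig_b):
    """Fuse two registered signals: average the approximations, and keep the
    larger-magnitude detail coefficient, which carries the stronger edge."""
    aa, da = haar_fwd(sig_a)
    ab, db = haar_fwd(sig_b)
    approx = [(x + y) / 2 for x, y in zip(aa, ab)]
    detail = [x if abs(x) >= abs(y) else y for x, y in zip(da, db)]
    return haar_inv(approx, detail)

visible = [4, 4, 4, 4]    # flat region in the visible band (fog, darkness)
infrared = [4, 4, 8, 0]   # strong edge only the IR sensor sees
print(fuse(visible, infrared))  # [4.0, 4.0, 8.0, 0.0]: the IR edge survives fusion
```

The same choose-the-stronger-coefficient rule, applied per subband to 2-D wavelet coefficients of each modality, yields a fused frame that keeps visible-band appearance while importing IR-exclusive structure.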
Sense and Sensitivity: Spatial Structure of conspecific signals during social interaction
Organisms rely on sensory systems to gather information about their environment. Localizing the source of a signal is key to guiding the animal's behavior successfully. Localization mechanisms must cope with the challenges of representing the spatial information of weak, noisy signals. In this dissertation, I investigate the spatial dynamics of natural stimuli and explore how the electrosensory system of weakly electric fish encodes these realistic spatial signals. To do so, in Chapter 2, I develop a model that examines the strength of the signal as it reaches the sensory array and simulates the responses of the receptors. The results demonstrate that beyond distances of 20 cm, the signal strength is only a fraction of the self-generated signal, often measuring less than a few percent. Chapter 2 also focuses on modeling a heterogeneous population of receptors to gain insights into the encoding of the spatial signal perceived by the fish. The findings reveal a significant decrease in signal detection beyond 40 cm, with a corresponding decrease in localization accuracy at 30 cm. Additionally, I investigate the impact of receptor density differences between the front and back on both signal detection and resolution accuracy. In Chapter 3, I analyze distinct movement patterns observed during agonistic encounters and their correlation with the estimated range of receptor sensitivity. Furthermore, I uncover that these agonistic interactions follow a classical pattern of cumulative assessment of competitors' abilities. The outcome of this research is a comprehensive understanding of the spatial dynamics of social interactions and how this information is captured by the sensory system. Moreover, the research contributed to the development of a range of tools and models that will play crucial roles in future investigations of sensory processing within this system.
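The rapid fall-off described in Chapter 2 is consistent with the dipole-like field of an electric organ discharge, whose strength decays roughly with the cube of distance. A toy sketch; the inverse-cube exponent is textbook dipole behaviour, not the dissertation's fitted model:

```python
def relative_signal(distance_cm, ref_distance_cm=1.0, exponent=3):
    """Field strength of a dipole-like source relative to its strength at a
    reference distance: falls off as (r0 / r) ** exponent."""
    return (ref_distance_cm / distance_cm) ** exponent

# Signal strength collapses over tens of centimetres:
for d in (5, 10, 20, 40):
    print(f"{d:>3} cm: {relative_signal(d):.6f} of reference strength")
```

Under this scaling, the field at 20 cm is four thousand times weaker than at 5 cm, which illustrates why detection and localization degrade so sharply over the distances reported in the dissertation.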
Perceptual models for high-refresh-rate rendering
Rendering realistic images requires substantial computational power. With new high-refresh-rate displays as well as the renaissance of virtual reality (VR) and augmented reality (AR), one cannot expect that GPU performance will scale fast enough to meet the requirements of immersive photo-realistic rendering with current rendering techniques.
In this dissertation, I follow the dual of the well-known computer-vision maxim that vision is inverse graphics: to improve graphics algorithms, I consider the operation of the human visual system. I propose to model and exploit the limitations of the visual system in the context of novel high-refresh-rate displays; specifically, I focus on spatio-temporal perception, a topic that has so far received remarkably less attention than spatial-only perception.
I present three main contributions. First, I demonstrate the validity of the perceptual approach by presenting a conceptually simple rendering technique, motivated by our eyes' limited sensitivity to high spatio-temporal change, which reduces the rendering load and transmission requirements of current-generation VR headsets without introducing perceivable visual artefacts. Second, I present two visual models related to motion perception: (a) a metric for detecting flicker; and (b) a comprehensive visual model to predict perceived motion quality on monitors with arbitrary refresh rates and resolutions. Third, I propose an adaptive rendering algorithm that utilises the proposed models. All algorithms operate on physical colorimetric units (instead of display-referenced pixel values), for which I provide the appropriate display measurements and models. All proposed algorithms and visual models are calibrated and validated with psychophysical experiments.
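The flicker-detection idea in model (a) can be caricatured as thresholding temporal luminance modulation against a frequency-dependent visibility limit. The 60 Hz fusion frequency and 0.05 contrast threshold below are illustrative stand-ins, not the dissertation's calibrated model:

```python
def michelson_contrast(luminances):
    """Temporal Michelson contrast of a per-frame luminance sequence (cd/m^2)."""
    lo, hi = min(luminances), max(luminances)
    return (hi - lo) / (hi + lo) if hi + lo else 0.0

def flicker_visible(luminances, refresh_hz, cff_hz=60.0, threshold=0.05):
    """Crude visibility test: modulation above a critical flicker-fusion
    frequency is perceptually fused; below it, flicker is seen once the
    temporal contrast exceeds a detection threshold."""
    if refresh_hz >= cff_hz:
        return False
    return michelson_contrast(luminances) > threshold

frame_luminance = [100.0, 80.0] * 4   # alternating bright/dim frames
print(flicker_visible(frame_luminance, refresh_hz=50))   # visible flicker
print(flicker_visible(frame_luminance, refresh_hz=120))  # fused, invisible
```

A calibrated model would replace the hard cutoff with a measured spatio-temporal contrast sensitivity function, but the structure (modulation amplitude tested against a frequency-dependent threshold) is the same.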