
    Face recognition in low resolution video sequences using super resolution

    Analysis of human activity is a major concern in a wide variety of applications, such as video surveillance, human-computer interfaces, and face image database management. Detecting and recognizing faces is a crucial step in these applications. Furthermore, major advancements and initiatives in security applications in recent years have propelled face recognition technology into the spotlight. The performance of existing face recognition systems declines significantly if the resolution of the face image falls below a certain level. This is especially critical in surveillance imagery, where, for many reasons, often only low-resolution video of faces is available. If these low-resolution images are passed to a face recognition system, the performance is usually unacceptable. Hence, resolution plays a key role in face recognition systems. In this thesis, we address this issue by using super-resolution techniques as an intermediate step, where multiple low-resolution face image frames are used to obtain a high-resolution face image for improved recognition rates. Two different techniques, operating in the frequency and spatial domains respectively, were used for super-resolution image enhancement. In this thesis, we apply super-resolution to both images and video using these techniques, and we employ principal component analysis for face matching, which is both computationally efficient and accurate. The result is a system that can accurately recognize faces using multiple low-resolution images/frames.
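
    To make the matching step concrete, here is a minimal eigenfaces-style PCA sketch in Python/NumPy, assuming the training faces have already been super-resolved and vectorised; the function names and the choice of k components are illustrative, not taken from the thesis.

```python
import numpy as np

def train_eigenfaces(faces, k=50):
    """Hypothetical eigenfaces sketch. `faces` is an (N, D) array of
    vectorised training face images (assumed already super-resolved)."""
    mean = faces.mean(axis=0)
    # SVD of the centred data yields the principal components directly
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    basis = vt[:k]                       # (k, D) projection basis
    coeffs = (faces - mean) @ basis.T    # (N, k) training projections
    return mean, basis, coeffs

def match_face(probe, mean, basis, coeffs, labels):
    """Project a vectorised probe face into eigenface space and return
    the label of the nearest training face (nearest-neighbour matching)."""
    p = (probe - mean) @ basis.T
    return labels[np.argmin(np.linalg.norm(coeffs - p, axis=1))]
```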

    A New Dataset and Transformer for Stereoscopic Video Super-Resolution

    Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of low-resolution stereo video by reconstructing its high-resolution counterpart. The key challenges in SVSR are preserving stereo consistency and temporal consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but there is little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The parallax attention mechanism (PAM), which uses cross-view information to account for significant disparities, is used to fuse the stereo views. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured using a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that Trans-SVSR can achieve competitive performance compared to the state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/. Comment: Conference on Computer Vision and Pattern Recognition (CVPR 2022)
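
    As a rough illustration of the cross-view fusion idea, the PyTorch sketch below implements a generic parallax-attention layer in which each left-view pixel attends over all horizontal positions of the same row in the right view; the module and its 1x1 projections are assumptions for exposition, not the authors' Trans-SVSR code.

```python
import torch
import torch.nn as nn

class ParallaxAttention(nn.Module):
    """Hypothetical parallax-attention sketch: after rectification, stereo
    disparity is purely horizontal, so each left-view pixel attends over
    all horizontal positions of the same row in the right view."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_left, feat_right):
        b, c, h, w = feat_left.shape
        q = self.query(feat_left).permute(0, 2, 3, 1)    # (B, H, W, C)
        k = self.key(feat_right).permute(0, 2, 1, 3)     # (B, H, C, W)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)   # (B, H, W, W)
        v = self.value(feat_right).permute(0, 2, 3, 1)   # (B, H, W, C)
        fused = (attn @ v).permute(0, 3, 1, 2)           # (B, C, H, W)
        return feat_left + fused  # residual fusion of cross-view features
```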

    Video enhancement for sequences of varying quality and resolution

    Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2012. This thesis proposes techniques for example-based enhancement of decoded video, providing both subjective and objective increases in quality. The techniques rely on the use of information from images or frames available at greater quality or resolution (key-frames) to enhance the target images of lower quality or resolution (non-key-frames) within a video sequence. A codebook is composed of examples taken from the key-frames. From these examples, high-frequency information is extracted in order to enhance or super-resolve non-key-frames within the video. This mixed-quality or mixed-resolution architecture may be adopted for applications such as encoding complexity reduction, transmission bit-rate reduction, video enhancement based on other frames or images, error concealment, etc. We first propose two techniques for use in example-based super-resolution: a multi-scale overlapped block motion compensation scheme and a codebook combination of multiple dictionaries. Next, a novel transform-domain super-resolution method using the DCT is presented. Finally, a generalization of the example-based enhancement method is proposed, which can account for videos whose quality varies among frames due to different quantization parameters (which define the compression quality), focus, or noise. The application scenarios considered in this thesis are: (i) mixed-resolution video, (ii) multiview video plus depth with mixed resolution, (iii) videos with redundant snapshots taken during recording, and (iv) mixed-quality video.
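
    The transform-domain idea can be sketched roughly as follows: a bicubically upsampled non-key-frame patch keeps its low-frequency DCT coefficients, while the high-frequency coefficients are grafted in from a matched key-frame example. This NumPy/SciPy snippet, including the function name and the `cutoff` knob, is a hedged illustration rather than the thesis' actual method.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_highfreq_transfer(upsampled_patch, example_patch, cutoff=2):
    """Hypothetical transform-domain sketch: keep the low-frequency DCT
    coefficients of the upsampled non-key-frame patch and graft in the
    high-frequency coefficients of a matched key-frame example patch
    (both patches are 2-D arrays of the same shape, e.g. 8x8).
    `cutoff` is an assumed knob separating low from high frequencies."""
    c_up = dctn(upsampled_patch, norm='ortho')
    c_ex = dctn(example_patch, norm='ortho')
    out = c_ex.copy()
    out[:cutoff, :cutoff] = c_up[:cutoff, :cutoff]  # preserve coarse structure
    return idctn(out, norm='ortho')
```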

    Side-information generation for temporally and spatially scalable Wyner-Ziv codecs

    The distributed video coding paradigm enables video codecs to operate with reversed complexity, in which the complexity is shifted from the encoder toward the decoder. Its performance is heavily dependent on the quality of the side information generated by motion estimation at the decoder. We compare the rate-distortion performance of different side-information estimators, for both temporally and spatially scalable Wyner-Ziv codecs. For the temporally scalable codec, we compare an established method with a new algorithm that uses a linear-motion model to produce side information. As a continuation of previous works, in this paper we propose to use a super-resolution method to upsample the non-key frame, for the spatially scalable codec, using the key frames as reference. We verify the performance of the spatially scalable WZ coding using the state-of-the-art video coding standard H.264/AVC.
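
    A minimal sketch of side-information generation under a linear (constant-velocity) motion model, assuming grayscale frames: for each block of the missing Wyner-Ziv frame, search for a symmetric displacement whose two endpoints in the key frames agree best, then average along that trajectory. The function name, block size, search range, and SAD criterion are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def side_information(key_prev, key_next, block=8, search=4):
    """Hypothetical sketch: synthesise the missing Wyner-Ziv frame by
    linear-motion interpolation between two grayscale key frames."""
    h, w = key_prev.shape
    si = np.zeros_like(key_prev, dtype=np.float64)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            best_err, best_pair = np.inf, None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = y - dy, x - dx  # block position in previous key frame
                    y1, x1 = y + dy, x + dx  # block position in next key frame
                    if not (0 <= y0 <= h - block and 0 <= x0 <= w - block
                            and 0 <= y1 <= h - block and 0 <= x1 <= w - block):
                        continue
                    p = key_prev[y0:y0 + block, x0:x0 + block].astype(np.float64)
                    n = key_next[y1:y1 + block, x1:x1 + block].astype(np.float64)
                    err = np.abs(p - n).sum()  # SAD along the candidate trajectory
                    if err < best_err:
                        best_err, best_pair = err, (p, n)
            # constant velocity: the interpolated block is the average of
            # the two matched trajectory endpoints
            si[y:y + block, x:x + block] = 0.5 * (best_pair[0] + best_pair[1])
    return si
```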

    Two-Stream Action Recognition-Oriented Video Super-Resolution

    We study the video super-resolution (SR) problem with the aim of facilitating video analytics tasks, e.g., action recognition, rather than improving visual quality. Popular action recognition methods based on convolutional networks, exemplified by two-stream networks, are not directly applicable to video of low spatial resolution. This can be remedied by performing video SR prior to recognition, which motivates us to improve the SR procedure for recognition accuracy. Tailored to two-stream action recognition networks, we propose two video SR methods, for the spatial and temporal streams respectively. On the one hand, we observe that regions with action are more important to recognition, and we propose an optical-flow guided weighted mean-squared-error loss for our spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving objects. On the other hand, we observe that existing video SR methods incur temporal discontinuity between frames, which also worsens recognition accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR) training that emphasizes the temporal continuity between consecutive frames. We perform experiments using two state-of-the-art action recognition networks and two well-known datasets, UCF101 and HMDB51. Results demonstrate the effectiveness of our proposed SoSR and ToSR in improving recognition accuracy. Comment: Accepted to ICCV 2019. Code: https://github.com/AlanZhang1995/TwoStreamS
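
    The spatial-stream loss can be illustrated with a short PyTorch sketch: motion magnitude from the optical flow becomes a per-pixel weight on the MSE, so moving regions dominate the reconstruction error. The function name, the normalisation, and the `eps` floor are assumptions, not the paper's exact formulation.

```python
import torch

def flow_weighted_mse(sr, hr, flow, eps=1.0):
    """Hypothetical optical-flow guided weighted MSE sketch.
    sr, hr: (B, C, H, W) super-resolved and ground-truth frames;
    flow: (B, 2, H, W) optical flow for the same frames.
    `eps` is an assumed floor so static background still contributes."""
    mag = torch.sqrt(flow[:, 0] ** 2 + flow[:, 1] ** 2)            # (B, H, W)
    # normalise so the weights average to 1 per sample
    weight = (mag + eps) / (mag + eps).mean(dim=(1, 2), keepdim=True)
    return (weight.unsqueeze(1) * (sr - hr) ** 2).mean()
```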