9 research outputs found

    Video Face Super-Resolution with Motion-Adaptive Feedback Cell

    Full text link
    Video super-resolution (VSR) methods have recently achieved remarkable success thanks to the development of deep convolutional neural networks (CNNs). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, in which a batch of low-resolution (LR) frames is used to generate a single high-resolution (HR) frame, and sliding a window over the entire video to select LR frames yields a series of HR frames. However, due to the complex temporal dependency between frames, the quality of the reconstructed HR frames degrades as the number of LR input frames increases. The reason is that these methods lack the ability to model complex temporal dependencies and struggle to provide accurate motion estimation and compensation for the VSR process, which makes performance degrade drastically when the motion between frames is complex. In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that can efficiently capture the motion compensation and feed it back to the network in an adaptive way. Our approach efficiently exploits inter-frame motion information, so the network's dependence on explicit motion estimation and compensation methods can be avoided. In addition, benefiting from this property of MAFC, the network achieves better performance in extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrate that the proposed framework outperforms the state-of-the-art methods. Comment: To appear in AAAI 202
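The sliding-window selection of LR frames described in the abstract can be sketched as follows; the window radius and the boundary-clamping convention are illustrative assumptions, not details taken from the paper:

```python
def frame_windows(num_frames, radius):
    """For each target frame index t, select the indices of the
    2*radius + 1 neighbouring low-resolution frames, clamping at
    the sequence boundaries (a common convention in VSR pipelines)."""
    windows = []
    for t in range(num_frames):
        idx = [min(max(t + d, 0), num_frames - 1)
               for d in range(-radius, radius + 1)]
        windows.append(idx)
    return windows

# A 5-frame video with radius 1: each HR frame is reconstructed
# from a batch of 3 LR frames.
print(frame_windows(5, 1))
# [[0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 4]]
```

Each window becomes one independent multi-frame super-resolution task, which is exactly the formulation the paper argues breaks down as the window grows.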

    Rediscovery of the Effectiveness of Standard Convolution for Lightweight Face Detection

    Full text link
    This paper analyses the design choices of face detection architectures that improve the trade-off between computation cost and accuracy. Specifically, we re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection. Unlike the current tendency in lightweight architecture design, which heavily utilizes depthwise separable convolution layers, we show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed at a similar parameter size. This observation is supported by analyses of the characteristics of the target data domain, faces. Based on our observation, we propose to employ ResNet with highly reduced channels, which surprisingly allows high efficiency compared to other mobile-friendly networks (e.g., MobileNet-V1, -V2, -V3). Through extensive experiments, we show that the proposed backbone can replace that of the state-of-the-art face detector with a faster inference speed. We also propose a new feature aggregation method that maximizes detection performance. Our proposed detector EResFD obtains 80.4% mAP on the WIDER FACE Hard subset and takes only 37.7 ms for VGA image inference on CPU. Code will be available at https://github.com/clovaai/EResFD
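The parameter trade-off behind this observation can be made concrete with the standard counting formulas; the example channel widths below are illustrative assumptions, not figures from the paper:

```python
def standard_conv_params(c_in, c_out, k=3):
    # k x k standard convolution, ignoring bias terms
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    # k x k depthwise convolution + 1x1 pointwise convolution, ignoring bias
    return k * k * c_in + c_in * c_out

# A heavily channel-pruned standard conv (16 -> 16 channels) can fit
# under the parameter budget of a wider depthwise separable layer
# (48 -> 48 channels):
print(standard_conv_params(16, 16))        # 2304
print(depthwise_separable_params(48, 48))  # 2736
```

This is the sense in which channel pruning lets standard convolutions compete with depthwise separable designs at a similar parameter size.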

    KPNet: Towards Minimal Face Detector

    Full text link
    The small receptive field and capacity of minimal neural networks limit their performance when they are used as the backbone of detectors. In this work, we find that the appearance features of a generic face are discriminative enough for a tiny and shallow neural network to distinguish it from the background. The essential barriers are 1) the vague definition of the face bounding box and 2) the tricky design of anchor boxes or receptive fields. Unlike most top-down methods for joint face detection and alignment, the proposed KPNet detects small facial keypoints instead of the whole face in a bottom-up manner. It first predicts the facial landmarks from a low-resolution image via a well-designed fine-grained scale approximation and a scale-adaptive soft-argmax operator. Finally, precise face bounding boxes, no matter how they are defined, can be inferred from the keypoints. Without any complex head architecture or meticulous network design, KPNet achieves state-of-the-art accuracy on generic face detection and alignment benchmarks with only ~1M parameters, runs at 1000 fps on GPU, and is easy to run in real time on most modern front-end chips. Comment: AAAI 202
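For intuition, a plain (non-scale-adaptive) soft-argmax turns a keypoint heatmap into a differentiable sub-pixel coordinate by taking the softmax-weighted expectation over positions. The 1-D sketch below illustrates only this general idea, not the paper's specific operator:

```python
import math

def soft_argmax(heatmap, beta=10.0):
    """Differentiable argmax over a 1-D heatmap: softmax weights
    (sharpened by beta) times coordinate indices give the expected,
    sub-pixel keypoint position."""
    exps = [math.exp(beta * v) for v in heatmap]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))

# A heatmap peaked at index 2 yields a coordinate close to 2.0.
print(soft_argmax([0.0, 0.0, 1.0, 0.0]))
```

Unlike a hard argmax, this expectation is smooth in the heatmap values, so landmark coordinates can be trained end-to-end by gradient descent.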

    Design of a prototype facial recognition system for identifying people at the Facultad de Ingeniería en Ciencias Aplicadas (FICA) of the Universidad Técnica del Norte using Artificial Intelligence techniques

    Get PDF
    To design a prototype facial recognition system for identifying people at the FICA of the Universidad Técnica del Norte using Artificial Intelligence techniques. Today, closed-circuit television (CCTV) monitoring systems, access control, and many other security-related applications incorporate facial recognition techniques. This disruptive tool differs from other biometric techniques in that faces can be recognized at a distance. Such applications can therefore be deployed in different institutions to restrict access by unauthorized or unknown persons, preventing damage and loss to public and private property. The aim of this work was to identify people in controlled and uncontrolled environments inside the FICA university building, which has suffered security problems on several occasions. As this remains an open topic of intensive research in the field of Artificial Intelligence (AI), this document presents the complete design of a facial recognition system combining a Convolutional Neural Network (CNN) architecture with the classification power of the Support Vector Machine (SVM) algorithm, implemented on parallel-processing technology (CUDA) through a graphics processing unit (GPU). The entire development and implementation procedure is described in detail, starting with training the CNN on the VGGFace2 dataset to learn and generalize deeply discriminative facial embeddings of 512 bytes per face through the joint supervision of the softmax loss and the center loss.
SVM is then employed as a classifier in several experiments with different numbers of classes, finally demonstrating the efficiency of the approach in the aforementioned environments in real time, using a sample of individuals to train the system and achieving quite acceptable results. Finally, the proposed system establishes a starting point for the development of a more robust system for production environments.
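The pipeline above classifies identities from fixed-size face embeddings. As a minimal sketch of that last stage, the snippet below matches an embedding against a gallery by cosine similarity; this is a simpler stand-in for the SVM classifier used in the work, and the 2-D embeddings and names are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(embedding, gallery):
    """gallery maps person name -> reference embedding.
    Returns the identity whose reference is most similar."""
    return max(gallery, key=lambda name: cosine(embedding, gallery[name]))

gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(classify([0.9, 0.1], gallery))  # alice
```

In the actual system the embeddings are 512-dimensional CNN outputs and the decision boundary is learned by an SVM rather than fixed similarity thresholds.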