9 research outputs found
Video Face Super-Resolution with Motion-Adaptive Feedback Cell
Video super-resolution (VSR) methods have recently achieved remarkable
success due to the development of deep convolutional neural networks (CNNs).
Current state-of-the-art CNN methods usually treat the VSR problem as a large
number of separate multi-frame super-resolution tasks, in which a batch of
low-resolution (LR) frames is used to generate a single high-resolution (HR)
frame, and sliding a window over the entire video to select LR frames yields
a series of HR frames. However, due to the complex temporal dependency
between frames, as the number of LR input frames increases, the quality of
the reconstructed HR frames degrades. The reason is that these methods lack
the ability to model complex temporal dependencies and struggle to provide
accurate motion estimation and compensation for the VSR process, which makes
performance degrade drastically when the motion between frames is complex.
In this paper, we propose the Motion-Adaptive Feedback Cell (MAFC), a simple but
effective block that can efficiently capture motion compensation and feed
it back to the network in an adaptive way. By efficiently exploiting
inter-frame motion information, our approach avoids the network's dependence
on explicit motion estimation and compensation methods. In addition,
benefiting from this property of the MAFC, the network achieves better
performance even in extremely complex motion scenarios. Extensive evaluations
and comparisons validate the strengths of our approach, and the experimental
results demonstrate that the proposed framework outperforms the
state-of-the-art methods.
Comment: To appear in AAAI 202
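The abstract does not give the MAFC's exact formulation. As a hypothetical illustration only (the difference-based motion cue, the channel-wise gate, and all shapes below are assumptions, not the paper's design), a motion-adaptive feedback step over feature maps might look like:

```python
import numpy as np

def motion_adaptive_feedback(feat_t, feat_prev):
    """Hypothetical sketch, NOT the paper's formulation: derive a motion cue
    from the difference of consecutive feature maps (shape C x H x W) and use
    a sigmoid gate over its per-channel magnitude to modulate the features."""
    motion = feat_t - feat_prev                                  # crude inter-frame motion cue
    strength = np.abs(motion).mean(axis=(1, 2), keepdims=True)   # per-channel magnitude, C x 1 x 1
    gate = 1.0 / (1.0 + np.exp(-strength))                       # adaptive gate in (0, 1)
    return feat_t + gate * motion                                # feed the gated cue back

# Toy features for two consecutive frames (4 channels, 8x8 spatial grid)
feat_prev = np.zeros((4, 8, 8))
feat_t = np.ones((4, 8, 8))
out = motion_adaptive_feedback(feat_t, feat_prev)
print(out.shape)  # (4, 8, 8)
```

The point of the sketch is only the feedback pattern: motion information is computed from the features themselves, so no external motion estimation or compensation module is required.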
Rediscovery of the Effectiveness of Standard Convolution for Lightweight Face Detection
This paper analyses the design choices of face detection architectures that
improve the trade-off between computation cost and accuracy. Specifically, we
re-examine the effectiveness of the standard convolutional block as a
lightweight backbone architecture for face detection. Unlike the current
tendency in lightweight architecture design, which heavily relies on depthwise
separable convolution layers, we show that heavily channel-pruned standard
convolution layers can achieve better accuracy and inference speed at a
similar parameter size. This observation is supported by analyses of the
characteristics of the target data domain: faces. Based on our observation,
we propose to employ ResNet with highly reduced channel widths, which
surprisingly allows high efficiency compared to other mobile-friendly networks
(e.g., MobileNet-V1, -V2, -V3). Through extensive experiments, we show that the
proposed backbone can replace that of the state-of-the-art face detector while
yielding faster inference. We also propose a new feature aggregation
method that maximizes detection performance. Our proposed detector, EResFD,
obtains 80.4% mAP on the WIDER FACE Hard subset and takes only 37.7 ms for VGA
image inference on CPU. Code will be available at
https://github.com/clovaai/EResFD
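The parameter-budget argument behind channel pruning can be made concrete with the standard formulas for convolution parameter counts. The specific channel widths below (128 vs. 44) are illustrative choices, not figures from the paper:

```python
def std_conv_params(c_in, c_out, k=3):
    # Standard convolution: every output channel sees all input channels
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k=3):
    # Depthwise (k*k weights per input channel) + pointwise (1x1) convolution
    return c_in * k * k + c_in * c_out

# A heavily channel-pruned standard conv can match the parameter budget
# of a much wider depthwise-separable block:
wide = dw_separable_params(128, 128)   # 17,536 params
pruned = std_conv_params(44, 44)       # 17,424 params, a similar budget
print(wide, pruned)
```

At a matched budget, the pruned standard convolution trades channel width for dense cross-channel mixing, which the paper argues suits the face domain better than depthwise factorization.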
KPNet: Towards Minimal Face Detector
The small receptive field and capacity of minimal neural networks limit their
performance when they are used as the backbone of detectors. In this work, we
find that the appearance features of a generic face are discriminative enough
for a tiny and shallow neural network to distinguish it from the background,
and that the essential barriers are 1) the vague definition of the face
bounding box and 2) the tricky design of anchor boxes or receptive fields.
Unlike most top-down methods for joint face detection and alignment, the
proposed KPNet detects small facial keypoints instead of the whole face in a
bottom-up manner. It first predicts the facial landmarks from a low-resolution
image via a well-designed fine-grained scale approximation and a
scale-adaptive soft-argmax operator. Finally, precise face bounding boxes, no
matter how they are defined, can be inferred from the keypoints. Without any
complex head architecture or meticulous network design, KPNet achieves
state-of-the-art accuracy on generic face detection and alignment benchmarks
with a minimal number of parameters, runs at 1000 fps on GPU, and can easily
run in real time on most modern front-end chips.
Comment: AAAI 202
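The abstract does not specify the scale-adaptive variant of the operator; as an illustration of the underlying idea, a plain soft-argmax over a 1-D heatmap (a softmax-weighted sum of coordinates, differentiable unlike a hard argmax) can be sketched as:

```python
import numpy as np

def soft_argmax_1d(heatmap, beta=10.0):
    """Differentiable (soft) argmax over a 1-D heatmap: the expected
    coordinate under a softmax distribution. 'beta' sharpens the softmax;
    how KPNet adapts it to scale is not described here."""
    z = beta * (heatmap - heatmap.max())   # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()        # softmax over spatial positions
    coords = np.arange(heatmap.size)
    return float((p * coords).sum())       # expected (sub-pixel) coordinate

h = np.array([0.1, 0.2, 5.0, 0.3])
print(soft_argmax_1d(h))  # close to the hard argmax, index 2
```

Because the output is a weighted average of coordinates, it supports sub-pixel landmark localization and end-to-end gradient flow, which a hard argmax does not.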
Design of a prototype facial recognition system for identifying people at the Faculty of Engineering in Applied Sciences (FICA) of the Universidad Técnica del Norte using Artificial Intelligence techniques
To design a prototype facial recognition system for identifying people at the FICA of the Universidad Técnica del Norte using Artificial Intelligence techniques. Today, closed-circuit television (CCTV) monitoring systems, access control, and many other security-related applications incorporate facial recognition techniques. This disruptive tool differs from other biometric techniques in that faces can be recognized at a distance. Such applications can therefore be deployed in different institutions to restrict access by unauthorized or unknown persons, preventing damage and loss to public and private property. The objective of this work was to identify people in controlled and uncontrolled environments within the FICA university building, which has suffered security problems on several occasions. As an open topic of intensive research in Artificial Intelligence (AI), this document presents the complete design of a facial recognition system that combines a Convolutional Neural Network (CNN) architecture with the classification power of the Support Vector Machine (SVM) algorithm, implemented with parallel-processing technology (CUDA) on a graphics processing unit (GPU). The entire development and implementation procedure is described in detail, starting with training the CNN on the VGGFace2 dataset to learn and generalize deeply discriminative facial embeddings of 512 bytes per face through joint supervision of the softmax loss and the center loss.
The SVM is then used as a classifier in several experiments with different numbers of classes, finally demonstrating the efficiency of the approach in the aforementioned environments in real time, using a sample of individuals to train the system and achieving quite acceptable results. Finally, the proposed system establishes a starting point for developing a more robust system in production environments.
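The embedding-plus-SVM stage of such a pipeline can be sketched with synthetic data. Everything below is illustrative: the 512-D vectors stand in for CNN face embeddings (the abstract's actual embeddings come from a VGGFace2-trained network), and a minimal linear SVM trained by hinge-loss sub-gradient descent stands in for the SVM classifier:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Minimal linear SVM (hinge loss, sub-gradient descent), a stand-in
    for the SVM classification stage; labels y must be +1 or -1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # margin violated: hinge-loss step
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                              # only the regularizer acts
                w -= lr * lam * w
    return w, b

rng = np.random.default_rng(0)
# Synthetic stand-ins for 512-D face embeddings of two identities
emb_a = rng.normal(loc=+1.0, size=(20, 512))
emb_b = rng.normal(loc=-1.0, size=(20, 512))
X = np.vstack([emb_a, emb_b])
y = np.array([1] * 20 + [-1] * 20)

w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
print((preds == y).mean())  # 1.0 on this well-separated toy set
```

In practice each identity would be one class in a multi-class SVM over real embeddings, but the two-class toy shows the shape of the classification stage.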