Cross-Domain Local Characteristic Enhanced Deepfake Video Detection
As ultra-realistic face forgery techniques emerge, deepfake detection has
attracted increasing attention due to security concerns. Many detectors cannot
achieve accurate results when detecting unseen manipulations despite excellent
performance on known forgeries. In this paper, we are motivated by the
observation that the discrepancies between real and fake videos are extremely
subtle and localized, and inconsistencies or irregularities can exist in some
critical facial regions across various information domains. To this end, we
propose a novel pipeline, Cross-Domain Local Forensics (XDLF), for more general
deepfake video detection. In the proposed pipeline, a specialized framework is
presented to simultaneously exploit local forgery patterns from space,
frequency, and time domains, thus learning cross-domain features to detect
forgeries. Moreover, the framework leverages four high-level forgery-sensitive
local regions of a human face to guide the model to enhance subtle artifacts
and localize potential anomalies. Extensive experiments on several benchmark
datasets demonstrate the impressive performance of our method, and we achieve
superiority over several state-of-the-art methods on cross-dataset
generalization. We also examine the factors that contribute to its performance
through ablation studies, which suggest that exploiting cross-domain local
characteristics is a noteworthy direction for developing more general deepfake
detectors.
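The core idea of inspecting forgery-sensitive local facial regions for subtle frequency-domain irregularities can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the authors' actual pipeline: the region boxes stand in for landmark-derived crops, and a mean absolute horizontal gradient is used as a crude proxy for the high-frequency statistics a learned model would extract.

```python
def crop(image, box):
    """Cut a rectangular region (top, left, bottom, right) out of a 2D image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def high_freq_energy(region):
    # Mean absolute horizontal gradient: a crude stand-in for the
    # frequency-domain features the paper's framework would learn.
    total, count = 0.0, 0
    for row in region:
        for a, b in zip(row, row[1:]):
            total += abs(b - a)
            count += 1
    return total / max(count, 1)

# Hypothetical boxes (top, left, bottom, right) for four
# forgery-sensitive regions of an 8x8 toy "face".
REGIONS = {"left_eye": (1, 1, 3, 4), "right_eye": (1, 4, 3, 7),
           "nose": (3, 2, 5, 6), "mouth": (5, 2, 7, 6)}

img = [[(i * j) % 7 for j in range(8)] for i in range(8)]
features = {name: high_freq_energy(crop(img, box))
            for name, box in REGIONS.items()}
```

In a real detector these per-region statistics would be replaced by learned spatial, frequency, and temporal features and fed to a classifier; the sketch only shows the region-guided structure of the computation.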
Towards robust and reliable multimedia analysis through semantic integration of services
Thanks to ubiquitous Web connectivity and portable multimedia devices, it has never been so easy to produce and distribute new multimedia resources such as videos, photos, and audio. This ever-increasing production leads to an information overload for consumers, which calls for efficient multimedia retrieval techniques. Multimedia resources can be efficiently retrieved using their metadata, but the multimedia analysis methods that can automatically generate this metadata are currently not reliable enough for highly diverse multimedia content. A reliable and automatic method for analyzing general multimedia content is needed. We introduce a domain-agnostic framework that annotates multimedia resources using currently available multimedia analysis methods. By using a three-step reasoning cycle, this framework can assess and improve the quality of multimedia analysis results, by consecutively (1) combining analysis results effectively, (2) predicting which results might need improvement, and (3) invoking compatible analysis methods to retrieve new results. By using semantic descriptions for the Web services that wrap the multimedia analysis methods, compatible services can be automatically selected. By using additional semantic reasoning on these semantic descriptions, the different services can be repurposed across different use cases. We evaluated this problem-agnostic framework in the context of video face detection, and showed that it is capable of providing the best analysis results regardless of the input video. The proposed methodology can serve as a basis to build a generic multimedia annotation platform, which returns reliable results for diverse multimedia analysis problems. This allows for better metadata generation and improves the efficiency of multimedia retrieval.
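The three-step reasoning cycle described above can be sketched as a small orchestration loop. The function names, the quality test, and the way services are modelled (callables returning a confidence score) are all illustrative assumptions; the actual framework selects services via semantic descriptions rather than a simple list scan.

```python
def annotate(resource, services, combine, needs_improvement, max_rounds=3):
    """Sketch of the three-step reasoning cycle: combine, predict, invoke."""
    results = []
    for _ in range(max_rounds):
        # (1) combine all analysis results gathered so far
        combined = combine(results)
        # (2) predict whether the combined result needs improvement
        if not needs_improvement(combined):
            return combined
        # (3) invoke a compatible, not-yet-used service for new results
        unused = [s for s in services if s not in [r[0] for r in results]]
        if not unused:
            return combined
        svc = unused[0]
        results.append((svc, svc(resource)))
    return combine(results)

# Toy services returning a detection confidence for a resource.
fast = lambda resource: 0.5
slow = lambda resource: 0.9
best = annotate("clip.mp4", [fast, slow],
                combine=lambda rs: max((v for _, v in rs), default=0.0),
                needs_improvement=lambda c: c < 0.8)
# best == 0.9: the cycle judged the first result too weak and
# fell back to the second service
```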
Face Detection in Camera Image on a Mobile Phone
This thesis deals with face detection on mobile phones, focusing on the Windows Mobile platform. The introduction is therefore devoted to this operating system and the options for working with the camera. The next part of the text covers the general problem of face detection in an image, taking into account the weak performance of the target devices. The thesis also describes the acquisition of images from the camera using the DirectShow multimedia framework and the creation of a custom transformation filter for face detection. The conclusion summarizes the achieved results in the form of tests on several mobile devices, and mentions the difficulties that arise when developing applications for Windows Mobile.
Assessment Framework for Deepfake Detection in Real-world Situations
Detecting digital face manipulation in images and video has attracted
extensive attention due to the potential risk to public trust. To counteract
the malicious usage of such techniques, deep learning-based deepfake detection
methods have been employed and have exhibited remarkable performance. However,
the performance of such detectors is often assessed on related benchmarks that
hardly reflect real-world situations. For example, the impact of various image
and video processing operations and typical workflow distortions on detection
accuracy has not been systematically measured. In this paper, a more reliable
assessment framework is proposed to evaluate the performance of learning-based
deepfake detectors in more realistic settings. To the best of our
knowledge, it is the first systematic assessment approach for deepfake
detectors that not only reports the general performance under real-world
conditions but also quantitatively measures their robustness toward different
processing operations. To demonstrate the effectiveness and usage of the
framework, extensive experiments and detailed analysis of three popular
deepfake detection methods are further presented in this paper. In addition, a
stochastic degradation-based data augmentation method driven by realistic
processing operations is designed, which significantly improves the robustness
of deepfake detectors.
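The stochastic degradation-based augmentation can be sketched as a randomly sampled chain of realistic processing operations applied to each training image. The specific operation set (noise, quantization as a compression stand-in, resolution loss) and all parameters are illustrative assumptions, not the paper's exact design.

```python
import random

def add_noise(image, rng):
    # Additive Gaussian noise, clipped to the valid [0, 255] range.
    return [[min(255, max(0, px + rng.gauss(0, 8))) for px in row]
            for row in image]

def quantize(image, rng):
    # Coarse intensity quantization as a stand-in for compression artifacts.
    step = rng.choice([8, 16, 32])
    return [[(int(px) // step) * step for px in row] for row in image]

def down_up_sample(image, rng):
    # Halve then restore resolution via nearest neighbour (resolution loss).
    half = [row[::2] for row in image[::2]]
    return [[half[i // 2][j // 2] for j in range(len(image[0]))]
            for i in range(len(image))]

def degrade(image, rng=None):
    """Apply a randomly sampled chain of degradations to one image."""
    rng = rng or random.Random()
    ops = [add_noise, quantize, down_up_sample]
    for op in rng.sample(ops, rng.randint(1, len(ops))):
        image = op(image, rng)
    return image

img = [[128] * 8 for _ in range(8)]
aug = degrade(img, random.Random(0))
```

During training, `degrade` would be applied on the fly so the detector sees a different distortion chain each epoch, which is what makes it robust to unseen real-world processing.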
Real-time Convolutional Neural Networks for Emotion and Gender Classification
In this paper we propose and implement a general convolutional neural network
(CNN) building framework for designing real-time CNNs. We validate our models
by creating a real-time vision system which accomplishes the tasks of face
detection, gender classification and emotion classification simultaneously in
one blended step using our proposed CNN architecture. After presenting the
details of the training procedure setup we proceed to evaluate on standard
benchmark sets. We report accuracies of 96% in the IMDB gender dataset and 66%
in the FER-2013 emotion dataset. Along with this, we also apply the recently
introduced real-time guided back-propagation visualization technique.
Guided back-propagation uncovers the dynamics of the weight changes and
evaluates the learned features. We argue that the careful implementation of
modern CNN architectures, the use of the current regularization methods and the
visualization of previously hidden features are necessary in order to reduce
the gap between slow performances and real-time architectures. Our system has
been validated by its deployment on a Care-O-bot 3 robot used during
RoboCup@Home competitions. All our code, demos and pre-trained architectures
have been released under an open-source license in our public repository.
Comment: Submitted to ICRA 201
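The guided back-propagation visualization mentioned above rests on one simple modification of the ReLU backward pass: the gradient is propagated only where the forward activation was positive and the incoming gradient is positive. A minimal single-layer sketch (a simplification; real use applies this rule at every ReLU in the network):

```python
def guided_relu_backward(activations, grad_out):
    # Guided back-propagation through a ReLU (Springenberg et al.):
    # pass the gradient only where the forward activation was positive
    # AND the incoming gradient is positive; zero it elsewhere.
    return [g if a > 0 and g > 0 else 0.0
            for a, g in zip(activations, grad_out)]

# Unit with negative activation and unit with negative gradient are both
# suppressed; only the first unit lets its gradient through.
grads = guided_relu_backward([1.0, -1.0, 2.0], [0.5, 0.5, -0.5])
# grads == [0.5, 0.0, 0.0]
```

Applying this rule during a backward pass from a chosen output neuron yields the sharp input-space saliency maps used to inspect what the CNN has learned.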
Gender Classification from Facial Images
Gender classification based on facial images has received increased attention in the computer vision community. In this work, a comprehensive evaluation of state-of-the-art gender classification methods is carried out on publicly available databases and extended to real-life face images, where face detection and face normalization are essential for the success of the system. Next, the possibility of predicting gender from face images acquired in the near-infrared spectrum (NIR) is explored. In this regard, the following two questions are addressed: (a) Can gender be predicted from NIR face images; and (b) Can a gender predictor learned using visible (VIS) images operate successfully on NIR images and vice versa? The experimental results suggest that NIR face images do have some discriminatory information pertaining to gender, although the degree of discrimination is noticeably lower than that of VIS images. Further, the use of an illumination normalization routine may be essential for facilitating cross-spectral gender prediction. By formulating the problem of gender classification in the framework of both visible and near-infrared images, guidelines for performing gender classification in a real-world scenario are provided, along with the strengths and weaknesses of each methodology. Finally, the general problem of attribute classification is addressed, where features such as expression, age, and ethnicity are derived from a face image.
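One standard illumination normalization routine of the kind the abstract refers to is histogram equalization, which remaps intensities so their cumulative distribution becomes roughly uniform. This is a generic sketch, not necessarily the routine the authors used:

```python
def equalize(image, levels=256):
    """Histogram equalization of a 2D grayscale image (values 0..levels-1)."""
    flat = [px for row in image for px in row]
    hist = [0] * levels
    for px in flat:
        hist[px] += 1
    # Cumulative distribution of intensities.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:          # flat image: nothing to equalize
        return [row[:] for row in image]
    # Remap each intensity so the output histogram is roughly uniform.
    lut = [round((c - cdf_min) * (levels - 1) / (n - cdf_min)) for c in cdf]
    return [[lut[px] for px in row] for row in image]

# A low-contrast image is stretched to the full intensity range.
out = equalize([[10, 10], [20, 20]])
# out == [[0, 0], [255, 255]]
```

Applying the same normalization to both VIS and NIR images before classification reduces the spectral-band intensity differences that otherwise hurt cross-spectral prediction.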