Detecting Deepfake Videos in Data Scarcity Conditions by Means of Video Coding Features
The most powerful deepfake detection methods developed so far are based on deep learning, requiring that large amounts of training data representative of the specific task are available to the trainer. In this paper, we propose a feature-based method for video deepfake detection that can work in data scarcity conditions, that is, when only very few examples are available to the forensic analyst. The proposed method is based on video coding analysis and relies on a simple footprint obtained from the motion prediction modes in the video sequence. The footprint is extracted from video sequences and used to train a simple linear Support Vector Machine classifier. The effectiveness of the proposed method is validated experimentally on three different datasets, namely, a synthetic street video dataset and two datasets of deepfake face videos.
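The footprint-plus-linear-classifier pipeline described above can be sketched with toy data. Everything below is an illustrative assumption: the mode labels, the fabricated few-shot training sequences, and the nearest-centroid decision (a simple linear separator standing in for the paper's SVM) are not the actual codec features or training procedure.

```python
from collections import Counter

# Hypothetical mode labels; real codecs expose a richer mode set.
MODES = ["intra", "inter", "skip", "merge"]

def footprint(mode_sequence):
    """Normalized histogram of motion prediction modes -- a toy
    stand-in for the paper's video coding footprint."""
    counts = Counter(mode_sequence)
    total = len(mode_sequence)
    return [counts[m] / total for m in MODES]

def centroid(footprints):
    """Mean footprint of a (tiny) class of training videos."""
    n = len(footprints)
    return [sum(f[i] for f in footprints) / n for i in range(len(MODES))]

def classify(fp, real_centroid, fake_centroid):
    """Nearest-centroid decision: a linear separator, like a linear
    SVM but without margin maximization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return "fake" if dist2(fp, fake_centroid) < dist2(fp, real_centroid) else "real"

# Fabricated few-shot training sequences, two per class.
real = [footprint(["inter"] * 8 + ["skip"] * 2),
        footprint(["inter"] * 7 + ["skip"] * 3)]
fake = [footprint(["intra"] * 6 + ["inter"] * 4),
        footprint(["intra"] * 7 + ["inter"] * 3)]

# Classify a held-out sequence that leans toward intra-coded blocks.
label = classify(footprint(["intra"] * 5 + ["inter"] * 5),
                 centroid(real), centroid(fake))
```

With only two examples per class, the centroid is already usable, which mirrors the data-scarcity setting the abstract targets.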
Lend Me a Hand: Auxiliary Image Data Helps Interaction Detection
In social settings, people interact in close proximity. When analyzing such encounters from video, we are typically interested in distinguishing between a large number of different interactions. Here, we address training deformable part models (DPMs) for the detection of such interactions from video, in both space and time. When we consider a large number of interaction classes, we face two challenges. First, we need to distinguish between interactions that are visually more similar. Second, it becomes more difficult to obtain sufficient specific training examples for each interaction class. In this paper, we address both challenges and focus on the latter. Specifically, we introduce a method to train body part detectors from nonspecific images with pose information. Such resources are widely available. We introduce a training scheme and an adapted DPM formulation to allow for the inclusion of this auxiliary data. We perform cross-dataset experiments to evaluate the generalization performance of our method. We demonstrate that our method can still achieve decent performance from as few as five training examples.
Deep Poselets for Human Detection
We address the problem of detecting people in natural scenes using a part
approach based on poselets. We propose a bootstrapping method that allows us to
collect millions of weakly labeled examples for each poselet type. We use these
examples to train a Convolutional Neural Net to discriminate different poselet
types and separate them from the background class. We then use the trained CNN
as a way to represent poselet patches with a Pose Discriminative Feature (PDF)
vector -- a compact 256-dimensional feature vector that is effective at
discriminating pose from appearance. We train the poselet model on top of PDF
features and combine them with object-level CNNs for detection and bounding box
prediction. The resulting model leads to state-of-the-art performance for human
detection on the PASCAL datasets.
One-shot learning of object categories
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
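The prior-to-posterior update at the heart of this approach can be illustrated with a one-dimensional conjugate Gaussian model: a broad prior (pooled from previously learned categories) is sharpened by a single observation. The parameter values below are invented for the sketch and are not from the paper, whose category models are far richer.

```python
def posterior_update(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update for a scalar model parameter.
    The prior encodes knowledge from earlier categories; one
    observation of the new category yields the posterior."""
    precision = 1.0 / prior_var + 1.0 / obs_var   # precisions add
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Broad prior over a hypothetical shape parameter, one observation.
m, v = posterior_update(prior_mean=0.0, prior_var=4.0, obs=2.0, obs_var=1.0)
```

Even a single observation pulls the posterior mean toward the data and shrinks the variance, which is why one image can already yield an informative category model when the prior is good.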
Incremental Training of a Detector Using Online Sparse Eigen-decomposition
The ability to efficiently and accurately detect objects plays a very crucial
role for many computer vision tasks. Recently, offline object detectors have
shown a tremendous success. However, one major drawback of offline techniques
is that a complete set of training data has to be collected beforehand. In
addition, once learned, an offline detector can not make use of newly arriving
data. To alleviate these drawbacks, online learning has been adopted with the
following objectives: (1) the technique should be computationally and storage
efficient; (2) the updated classifier must maintain its high classification
accuracy. In this paper, we propose an effective and efficient framework for
learning an adaptive online greedy sparse linear discriminant analysis (GSLDA)
model. Unlike many existing online boosting detectors, which usually apply
exponential or logistic loss, our online algorithm makes use of LDA's learning
criterion that not only aims to maximize the class-separation criterion but
also incorporates the asymmetrical property of training data distributions. We
provide a better alternative for online boosting algorithms in the context of
training a visual object detector. We demonstrate the robustness and efficiency
of our methods on handwriting digit and face data sets. Our results confirm
that object detection tasks benefit significantly when trained in an online
manner.
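The flavor of such incremental updating can be shown with Welford's online mean/variance algorithm, the kind of per-class sufficient statistic an online discriminant model maintains so that newly arriving data refines the model without retraining from scratch. This is a generic sketch of online class statistics, not the paper's GSLDA algorithm.

```python
class RunningStats:
    """Welford's numerically stable online mean and variance."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Maintain per-class statistics as (invented) 1-D scores arrive online.
pos, neg = RunningStats(), RunningStats()
for x in [2.0, 2.5, 3.0]:
    pos.update(x)
for x in [-1.0, 0.0, 1.0]:
    neg.update(x)

# Fisher-style class-separation score, the quantity an LDA-based
# criterion seeks to maximize between object and background classes.
separation = (pos.mean - neg.mean) ** 2 / (pos.variance() + neg.variance())
```

Each update is O(1) in time and storage, which matches the abstract's requirement that an online detector be computationally and storage efficient.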
Deep Feature-based Face Detection on Mobile Devices
We propose a deep feature-based face detector for mobile devices to detect
user's face acquired by the front facing camera. The proposed method is able to
detect faces in images containing extreme pose and illumination variations as
well as partial faces. The main challenge in developing deep feature-based
algorithms for mobile devices is the constrained nature of the mobile platform
and the non-availability of CUDA enabled GPUs on such devices. Our
implementation takes into account the special nature of the images captured by
the front-facing camera of mobile devices and exploits the GPUs present in
mobile devices without CUDA-based frameworks to meet these challenges.