7,312 research outputs found
Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification
Re-identification is generally carried out by encoding the appearance of a
subject in terms of outfit, suggesting scenarios where people do not change
their attire. In this paper we overcome this restriction, by proposing a
framework based on a deep convolutional neural network, SOMAnet, that
additionally models other discriminative aspects, namely, structural attributes
of the human figure (e.g. height, obesity, gender). Our method is unique in
many respects. First, SOMAnet is based on the Inception architecture, departing
from the usual siamese framework. This spares expensive data preparation
(pairing images across cameras) and allows the understanding of what the
network learned. Second, and most notably, the training data consists of a
synthetic 100K instance dataset, SOMAset, created by photorealistic human body
generation software. Synthetic data represents a good compromise between
realistic imagery, usually not required in re-identification since surveillance
cameras capture low-resolution silhouettes, and complete control of the
samples, which is useful in order to customize the data w.r.t. the surveillance
scenario at-hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on
recent re-identification benchmarks, outperforms all competitors, matching
subjects even with different apparel. The combination of synthetic data with
Inception architectures opens up new research avenues in re-identification.Comment: 14 page
Virtual Network Embedding Approximations: Leveraging Randomized Rounding
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.The Virtual Network Embedding Problem (VNEP) captures the essence of many resource allocation problems. In the VNEP, customers request resources in the form of Virtual Networks. An embedding of a virtual network on a shared physical infrastructure is the joint mapping of (virtual) nodes to physical servers together with the mapping of (virtual) edges onto paths in the physical network connecting the respective servers. This work initiates the study of approximation algorithms for the VNEP for general request graphs. Concretely, we study the offline setting with admission control: given multiple requests, the task is to embed the most profitable subset while not exceeding resource capacities. Our approximation is based on the randomized rounding of Linear Programming (LP) solutions. Interestingly, we uncover that the standard LP formulation for the VNEP exhibits an inherent structural deficit when considering general virtual network topologies: its solutions cannot be decomposed into valid embeddings. In turn, focusing on the class of cactus request graphs, we devise a novel LP formulation, whose solutions can be decomposed. Proving performance guarantees of our rounding scheme, we obtain the first approximation algorithm for the VNEP in the resource augmentation model. We propose different types of rounding heuristics and evaluate their performance in an extensive computational study. Our results indicate that good solutions can be achieved even without resource augmentations. Specifically, heuristical rounding achieves 77.2% of the baseline’s profit on average while respecting capacities.BMBF, 01IS12056, Software Campus GrantEC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNe
You said that?
We present a method for generating a video of a talking face. The method
takes as inputs: (i) still images of the target face, and (ii) an audio speech
segment; and outputs a video of the target face lip synched with the audio. The
method runs in real time and is applicable to faces and audio not seen at
training time.
To achieve this we propose an encoder-decoder CNN model that uses a joint
embedding of the face and audio to generate synthesised talking face video
frames. The model is trained on tens of hours of unlabelled videos.
We also show results of re-dubbing videos using speech from a different
person.Comment: https://youtu.be/LeufDSb15Kc British Machine Vision Conference
(BMVC), 201
- …