
    Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification

    Re-identification is generally carried out by encoding the appearance of a subject in terms of outfit, suggesting scenarios where people do not change their attire. In this paper we overcome this restriction by proposing a framework based on a deep convolutional neural network, SOMAnet, that additionally models other discriminative aspects, namely structural attributes of the human figure (e.g. height, obesity, gender). Our method is unique in many respects. First, SOMAnet is based on the Inception architecture, departing from the usual siamese framework. This spares expensive data preparation (pairing images across cameras) and allows an understanding of what the network has learned. Second, and most notably, the training data consists of a synthetic 100K-instance dataset, SOMAset, created by photorealistic human body generation software. Synthetic data represents a good compromise between realistic imagery, usually not required in re-identification since surveillance cameras capture low-resolution silhouettes, and complete control of the samples, which is useful for customizing the data w.r.t. the surveillance scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on recent re-identification benchmarks, outperforms all competitors, matching subjects even when they wear different apparel. The combination of synthetic data with Inception architectures opens up new research avenues in re-identification.
    Comment: 14 pages
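
    The training recipe described above is essentially identity classification with an Inception backbone rather than a siamese pipeline: pretrain on the synthetic identities, then fine-tune on a real benchmark. Below is a minimal sketch of that two-stage setup, assuming PyTorch/torchvision; the layer swaps, learning rate, and identity counts are illustrative placeholders, not the authors' released SOMAnet configuration.

```python
# Sketch only: Inception-style classifier for re-identification,
# pretrained on synthetic identities, then fine-tuned on a real benchmark.
import torch
import torch.nn as nn
from torchvision import models

def build_reid_classifier(num_identities: int) -> nn.Module:
    """Inception backbone with a plain classification head,
    i.e. no siamese pairing of images across cameras."""
    net = models.inception_v3(weights=None, aux_logits=True)
    net.fc = nn.Linear(net.fc.in_features, num_identities)
    net.AuxLogits.fc = nn.Linear(net.AuxLogits.fc.in_features, num_identities)
    return net

# Stage 1: train as a classifier over the synthetic identities (e.g. SOMAset).
num_synthetic_ids = 50            # illustrative count, not the actual SOMAset figure
model = build_reid_classifier(num_synthetic_ids)
# ... standard cross-entropy training loop over the synthetic images ...

# Stage 2: fine-tune on a real benchmark by swapping the head to the
# benchmark's identity count and training with a lower learning rate.
num_benchmark_ids = 751           # e.g. a Market-1501-sized training split
model.fc = nn.Linear(model.fc.in_features, num_benchmark_ids)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_benchmark_ids)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```

    Treating re-identification as classification is what removes the need to pair images across cameras; at test time the penultimate features can be compared directly between gallery and probe images.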

    Virtual Network Embedding Approximations: Leveraging Randomized Rounding

    The Virtual Network Embedding Problem (VNEP) captures the essence of many resource allocation problems. In the VNEP, customers request resources in the form of Virtual Networks. An embedding of a virtual network on a shared physical infrastructure is the joint mapping of (virtual) nodes to physical servers together with the mapping of (virtual) edges onto paths in the physical network connecting the respective servers. This work initiates the study of approximation algorithms for the VNEP for general request graphs. Concretely, we study the offline setting with admission control: given multiple requests, the task is to embed the most profitable subset while not exceeding resource capacities. Our approximation is based on the randomized rounding of Linear Programming (LP) solutions. Interestingly, we uncover that the standard LP formulation for the VNEP exhibits an inherent structural deficit when considering general virtual network topologies: its solutions cannot be decomposed into valid embeddings. In turn, focusing on the class of cactus request graphs, we devise a novel LP formulation whose solutions can be decomposed. Proving performance guarantees of our rounding scheme, we obtain the first approximation algorithm for the VNEP in the resource augmentation model. We propose different types of rounding heuristics and evaluate their performance in an extensive computational study. Our results indicate that good solutions can be achieved even without resource augmentations; specifically, heuristic rounding achieves 77.2% of the baseline's profit on average while respecting capacities.
    BMBF, 01IS12056, Software Campus Grant
    EC/H2020/679158/EU/Resolving the Tussle in the Internet: Mapping, Architecture, and Policy Making/ResolutioNet
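
    The core of the approach is rounding a decomposable LP solution: for each request, the LP weights of its valid embeddings form a (sub-)probability distribution, and the rounding samples one embedding, or a rejection, per request. Here is a minimal sketch of that idea, assuming the decomposition step has already produced per-request embedding lists; the data structures and the capacity-respecting filter in `heuristic_rounding` are illustrative, not the authors' implementation.

```python
# Sketch only: randomized rounding over an LP decomposition for the VNEP.
import random
from dataclasses import dataclass

@dataclass
class Embedding:
    profit: float
    load: dict      # resource -> capacity consumed by this embedding
    weight: float   # LP weight; per request, the weights sum to at most 1

def round_once(decompositions, capacities):
    """Sample one embedding (or a rejection) per request according to the
    LP weights; report the chosen embeddings, total profit, and loads."""
    chosen, profit = [], 0.0
    used = {res: 0.0 for res in capacities}
    for embeddings in decompositions:       # one decomposition per request
        r = random.random()
        for emb in embeddings:
            r -= emb.weight
            if r <= 0:                      # request is embedded via emb
                chosen.append(emb)
                profit += emb.profit
                for res, amount in emb.load.items():
                    used[res] += amount
                break                       # otherwise: request rejected
    return chosen, profit, used

def heuristic_rounding(decompositions, capacities, tries=100):
    """Repeat the rounding and keep the most profitable outcome that respects
    the capacities; the paper's guarantees hold in the resource-augmentation
    model, so this stricter filter mirrors the heuristics of the evaluation."""
    best = ([], 0.0)
    for _ in range(tries):
        chosen, profit, used = round_once(decompositions, capacities)
        if all(used[res] <= capacities[res] for res in capacities) and profit > best[1]:
            best = (chosen, profit)
    return best
```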

    You said that?

    We present a method for generating a video of a talking face. The method takes as inputs: (i) still images of the target face, and (ii) an audio speech segment; and outputs a video of the target face lip-synched with the audio. The method runs in real time and is applicable to faces and audio not seen at training time. To achieve this we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to generate synthesised talking-face video frames. The model is trained on tens of hours of unlabelled videos. We also show results of re-dubbing videos using speech from a different person.
    Comment: https://youtu.be/LeufDSb15Kc British Machine Vision Conference (BMVC), 2017
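
    The key design element is the joint embedding: a still image of the target face and a short audio window are encoded separately, concatenated, and decoded into a frame. Below is a minimal sketch of that encoder-decoder structure, assuming PyTorch; the layer sizes, the 112x112 output resolution, and the MFCC-shaped audio input are illustrative placeholders rather than the paper's exact architecture.

```python
# Sketch only: joint face/audio embedding decoded into one video frame.
import torch
import torch.nn as nn

class TalkingFaceGenerator(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Identity encoder: still image of the target face -> embedding.
        self.face_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        # Audio encoder: short speech segment (e.g. an MFCC window) -> embedding.
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )
        # Decoder: joint embedding -> one synthesised 112x112 frame.
        self.decoder = nn.Sequential(
            nn.Linear(2 * embed_dim, 128 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (128, 7, 7)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=4), nn.Sigmoid(),
        )

    def forward(self, face_img, audio_feat):
        # Concatenate the two embeddings so the decoded frame is conditioned
        # on both the target identity and the speech content.
        z = torch.cat([self.face_enc(face_img), self.audio_enc(audio_feat)], dim=1)
        return self.decoder(z)

# One frame from one still image and one audio window (illustrative shapes).
frame = TalkingFaceGenerator()(torch.randn(1, 3, 112, 112), torch.randn(1, 1, 13, 35))
```

    Generating a whole video then amounts to sliding the audio window along the speech segment and decoding one frame per window against the same identity embedding.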