13 research outputs found

    Celebrities-ReID: A Benchmark for Clothes Variation in Long-Term Person Re-Identification

    Full text link
    © 2019 IEEE. This paper considers person re-identification (re-ID) with a long time gap between observations (i.e., long-term re-ID), which concentrates on the challenge of clothes variation for each person. We introduce a new dataset, named Celebrities-reID, to address this challenge. Compared with existing datasets, the proposed Celebrities-reID dataset is distinctive in two aspects. First, it contains 590 persons with 10,842 images, and no person wears the same clothing twice, making it the largest clothes-variation person re-ID dataset to date. Second, a comprehensive evaluation using state-of-the-art methods is carried out to verify the feasibility of, and the new challenge exposed by, this dataset. In addition, we propose a benchmark approach in which a two-step fine-tuning strategy on human body parts is introduced to tackle clothes variation. In experiments, we evaluate the feasibility and quality of the proposed Celebrities-reID dataset. The results demonstrate that the proposed benchmark approach not only best tackles the clothes variation present in our dataset but also achieves competitive performance on the widely used person re-ID dataset Market-1501, which further confirms the reliability of the proposed benchmark approach.
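    The abstract does not spell out the two fine-tuning steps, so the following is only a minimal sketch of what a two-step fine-tuning routine on body parts could look like in PyTorch; the loader names (whole_body_loader, body_part_loader), the ResNet-50 backbone, and all hyper-parameters are assumptions, not the paper's method.

```python
# Hypothetical two-step fine-tuning sketch (PyTorch); not the paper's exact recipe.
import torch
import torch.nn as nn
from torchvision import models

def build_model(num_ids: int) -> nn.Module:
    # ImageNet-pretrained backbone with a new identity classifier head.
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_ids)
    return backbone

def fine_tune(model, loader, epochs, lr, device="cuda"):
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# Step 1 (assumed): fine-tune on whole-body images.
# model = fine_tune(build_model(num_ids=590), whole_body_loader, epochs=30, lr=1e-2)
# Step 2 (assumed): continue fine-tuning on body-part crops with a smaller learning
# rate, so cues that are less clothes-dependent dominate the final representation.
# model = fine_tune(model, body_part_loader, epochs=20, lr=1e-3)
```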

    Multimodal Image-to-Image Translation via a Single Generative Adversarial Network

    Full text link
    Despite significant advances in image-to-image (I2I) translation with generative adversarial networks (GANs), it remains challenging to effectively translate an image into a set of diverse images in multiple target domains using a single pair of generator and discriminator. Existing multimodal I2I translation methods adopt multiple domain-specific content encoders, where each domain-specific content encoder is trained with images from its own domain only. Nevertheless, we argue that the content (domain-invariant) features should be learned from images across all domains. Consequently, each domain-specific content encoder of existing schemes fails to extract domain-invariant features efficiently. To address this issue, we present a flexible and general SoloGAN model for efficient multimodal I2I translation among multiple domains with unpaired data. In contrast to existing methods, SoloGAN uses a single projection discriminator with an additional auxiliary classifier and shares the encoder and generator across all domains. As such, the SoloGAN model can be trained effectively with images from all domains, so that the domain-invariant content representation can be extracted efficiently. Qualitative and quantitative results over a wide range of datasets, against several counterparts and variants of SoloGAN, demonstrate the merits of the method, especially for challenging I2I translation tasks, i.e., tasks that involve extreme shape variations or require keeping complex backgrounds unchanged after translation. Furthermore, we demonstrate the contribution of each component using ablation studies. Comment: 13 pages, 15 figures
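    For illustration, below is a minimal PyTorch sketch of the kind of component the abstract names: a projection discriminator with an auxiliary domain classifier shared across all domains. The layer sizes and the `ProjectionDiscriminator` class itself are assumptions for exposition, not SoloGAN's actual architecture.

```python
# Sketch of a projection discriminator with an auxiliary classifier (PyTorch).
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    def __init__(self, num_domains: int, feat_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(            # shared convolutional trunk
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, feat_dim, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(feat_dim, 1)            # unconditional real/fake score
        self.embed = nn.Embedding(num_domains, feat_dim)   # projection (conditioning) term
        self.aux_cls = nn.Linear(feat_dim, num_domains)    # auxiliary domain classifier

    def forward(self, x: torch.Tensor, domain: torch.Tensor):
        h = self.features(x)
        # Projection conditioning: inner product between the domain embedding and features.
        adv = self.real_fake(h) + (self.embed(domain) * h).sum(dim=1, keepdim=True)
        return adv, self.aux_cls(h)

# A single encoder/generator pair (not shown) would likewise be shared across all
# domains, taking the target-domain label and a style code as extra inputs.
```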

    Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification

    Full text link
    Object re-identification (re-id) aims to identify a specific object across time or camera views, with person re-id and vehicle re-id being the most widely studied applications. Re-id is challenging because of variations in viewpoint, (human) pose, and occlusion. Multiple shots of the same object can cover diverse viewpoints and poses and thus provide more comprehensive information. In this paper, we propose exploiting the multi-shots of the same identity to guide the feature learning of each individual image. Specifically, we design an Uncertainty-aware Multi-shot Teacher-Student (UMTS) Network. It consists of a teacher network (T-net) that learns comprehensive features from multiple images of the same object, and a student network (S-net) that takes a single image as input. In particular, we take into account the data-dependent heteroscedastic uncertainty for effectively transferring knowledge from the T-net to the S-net. To the best of our knowledge, we are the first to make use of multi-shots of an object in a teacher-student learning manner to effectively boost single-image-based re-id. We validate the effectiveness of our approach on popular vehicle re-id and person re-id datasets. At inference time, the S-net alone significantly outperforms the baselines and achieves state-of-the-art performance. Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
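    To make the uncertainty-aware transfer concrete, here is a hedged sketch of a heteroscedastic feature-distillation loss in the spirit of the abstract: the squared teacher-student residual is down-weighted where a predicted (data-dependent) log variance is high, with a log-variance penalty to prevent the trivial solution. The exact formulation in the paper may differ, and the `log_var` head is an assumed auxiliary output.

```python
# Sketch of an uncertainty-weighted distillation loss (PyTorch); formulation assumed.
import torch

def uncertainty_distill_loss(student_feat: torch.Tensor,
                             teacher_feat: torch.Tensor,
                             log_var: torch.Tensor) -> torch.Tensor:
    """Heteroscedastic regression loss: exp(-log_var) * ||f_s - f_t||^2 + log_var."""
    sq_err = (student_feat - teacher_feat).pow(2)
    return (torch.exp(-log_var) * sq_err + log_var).mean()

# student_feat: S-net features of a single image
# teacher_feat: T-net features aggregated over multi-shots of the same identity
# log_var:      per-dimension log variance from an assumed auxiliary head
```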

    Adaptive Boosting for Domain Adaptation: Towards Robust Predictions in Scene Segmentation

    Full text link
    Domain adaptation aims to transfer shared knowledge learned from the source domain to a new environment, i.e., the target domain. One common practice is to train the model on both labeled source-domain data and unlabeled target-domain data. Yet the learned models are usually biased due to the strong supervision of the source domain. Most researchers adopt an early-stopping strategy to prevent over-fitting, but deciding when to stop training remains a challenging problem given the lack of a target-domain validation set. In this paper, we propose an efficient bootstrapping method, called AdaBoost Student, which explicitly learns complementary models during training and frees users from empirical early stopping. AdaBoost Student combines deep model learning with the conventional training strategy of adaptive boosting, and enables interactions between the learned models and the data sampler. We adopt an adaptive data sampler to progressively facilitate learning on hard samples and aggregate "weak" models to prevent over-fitting. Extensive experiments show that (1) without the need to worry about the stopping time, AdaBoost Student provides a robust solution through efficient complementary model learning during training; and (2) AdaBoost Student is orthogonal to most domain adaptation methods and can be combined with existing approaches to further improve state-of-the-art performance. We achieve competitive results on three widely used scene-segmentation domain adaptation benchmarks. Comment: 10 pages, 7 tables, 5 figures
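    As an illustration of the two ingredients the abstract names, the snippet below sketches an adaptive sampler that up-weights hard samples and an ensemble that aggregates "weak" model snapshots by averaging their predictions. The helper names, the softmax weighting, and the temperature are assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch of adaptive hard-sample weighting and snapshot aggregation (PyTorch).
import torch
from torch.utils.data import WeightedRandomSampler

def update_sample_weights(per_sample_loss: torch.Tensor, temperature: float = 1.0):
    # Harder samples (higher loss) receive proportionally higher sampling probability.
    weights = torch.softmax(per_sample_loss / temperature, dim=0)
    return WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

@torch.no_grad()
def ensemble_predict(models, images):
    # Aggregate the "weak" snapshots by averaging their softmax outputs.
    probs = [m(images).softmax(dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)
```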

    Top-Push Constrained Modality-Adaptive Dictionary Learning for Cross-Modality Person Re-Identification

    Full text link

    Person Re-Identification (Re-identificación de personas)

    Full text link
    Nowadays, person re-identification is a capability in high demand, above all in the field of video surveillance. It does not mean knowing a person's identity, but rather being able to track that person across different cameras whose views do not overlap. There is a large number of traditional approaches that allow us to perform "manual" (handcrafted) feature extraction, using mathematical algorithms to extract information from the images. However, many aspects must be taken into account if we want our models to perform as well as possible, such as the person's orientation, the environment, or their position. Feature extraction based on deep learning models the data directly, using large volumes of data to learn from and to perform classification and predictive analysis. Deep learning relies on neural networks, a mathematical model that tries to imitate the biological behaviour of neurons by connecting different processing layers and assigning different weights to each one in order to obtain an optimized model. The goal of this bachelor's thesis (TFG) is to compare the results of handcrafted feature extraction methods with automatic ones (based on deep learning). This is done by running a script that computes the percentage of correct re-identifications between cameras, first using the different handcrafted extraction methods and then using those based on neural networks, on the datasets chosen for evaluation.
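    The evaluation script itself is not shown in the abstract, so the following is only a minimal sketch of the kind of cross-camera rank-1 accuracy computation the thesis describes; the variable names and the Euclidean-distance choice are assumptions.

```python
# Sketch of a rank-1 cross-camera matching score (NumPy); distance metric assumed.
import numpy as np

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of query images (camera A) whose nearest gallery image (camera B)
    has the same identity."""
    dists = np.linalg.norm(query_feats[:, None, :] - gallery_feats[None, :, :], axis=2)
    best_match = gallery_ids[dists.argmin(axis=1)]
    return float((best_match == query_ids).mean())

# The same function can score handcrafted descriptors (e.g., colour histograms)
# and CNN embeddings, making the two extraction pipelines directly comparable.
```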

    Multi-pseudo regularized label for generated data in person re-identification

    Full text link
    © 1992-2012 IEEE. Sufficient training data is normally required to train deep learning models. However, due to the expensive manual process of labeling a large number of images (i.e., annotation), the amount of available training data (i.e., real data) is always limited. To produce more data for training a deep network, a generative adversarial network can be used to generate artificial sample data (i.e., generated data). However, the generated data usually do not have annotation labels. To solve this problem, in this paper we propose a virtual label, called the Multi-pseudo Regularized Label (MpRL), and assign it to the generated data. With MpRL, the generated data are used as a supplement to the real training data to train a deep neural network in a semi-supervised learning fashion. To build the correspondence between real and generated data, MpRL assigns each generated sample a proper virtual label that reflects the likelihood of its affiliation to the pre-defined training classes in the real data domain. Unlike a traditional label, which is usually a single integer, the proposed virtual label is a set of weight-based values, each of which is a number in (0, 1] called a multi-pseudo label, reflecting the degree of relation between each generated sample and every pre-defined class of real data. A comprehensive evaluation is carried out by adopting two state-of-the-art convolutional neural networks (CNNs) in our experiments to verify the effectiveness of MpRL. Experiments demonstrate that by assigning MpRL to generated data, we can further improve person re-ID performance on five re-ID datasets, i.e., Market-1501, DukeMTMC-reID, CUHK03, VIPeR, and CUHK01. The proposed method obtains +6.29%, +6.30%, +5.58%, +5.84%, and +3.48% improvements in rank-1 accuracy over a strong CNN baseline on the five datasets, respectively, and outperforms state-of-the-art methods.
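    To make the idea of a weight-based virtual label concrete, here is a hedged PyTorch sketch of training a classifier with a soft, per-class label for generated images. Deriving the label from the network's own class posteriors is an assumption for illustration, not the paper's exact definition of MpRL.

```python
# Sketch of a weight-based virtual label and its soft-label loss (PyTorch); scheme assumed.
import torch
import torch.nn.functional as F

def multi_pseudo_label(logits_on_generated: torch.Tensor) -> torch.Tensor:
    # One weight in (0, 1] per pre-defined real-data class, here taken from the
    # network's class posteriors on the generated image (an assumption).
    probs = logits_on_generated.softmax(dim=1)
    return probs.clamp(min=1e-6)  # keep every class weight strictly positive

def soft_label_loss(logits: torch.Tensor, virtual_label: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against the weight-based label instead of a single integer id.
    return -(virtual_label * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```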