Improving Visual Recognition With Unlabeled Data

Author: Roy Chowdhury Aruni
Publication venue: ScholarWorks@UMass Amherst
Publication date: 16/07/2020

The success of deep neural networks has resulted in computer vision systems that obtain high accuracy on a wide variety of tasks such as image classification, object detection, and semantic segmentation. However, most state-of-the-art vision systems depend on large amounts of labeled training data, which is not a scalable solution in the long run. This work focuses on improving existing models for visual object recognition and detection without depending on such large-scale human-annotated data. We first show how large numbers of hard examples (cases where an existing model makes a mistake) can be obtained automatically from unlabeled video sequences by exploiting temporal consistency cues in the output of a pre-trained object detector. These examples can strongly influence a model's parameters when the network is re-trained to correct them, resulting in improved performance on several object detection tasks. Further, such hard examples from unlabeled videos can be used to address the problem of unsupervised domain adaptation. We focus on the automatic adaptation of an existing object detector to a new domain with no labeled data, assuming that a large number of unlabeled videos are readily available. Our approach is evaluated on challenging face and pedestrian detection tasks involving large domain shifts, showing improved performance with minimal dependence on hyper-parameters. Finally, we address the problem of face recognition, which has achieved high accuracy by employing deep neural networks trained on massive labeled datasets. Further improvements through supervised learning require significantly larger datasets and hence massive annotation efforts. We improve upon the performance of face recognition models trained on large-scale labeled datasets by using unlabeled faces as additional training data.
We present insights and recipes for training deep face recognition models with labeled and unlabeled data at scale, addressing real-world challenges such as overlapping identities between the labeled and unlabeled datasets, as well as label noise introduced by clustering errors.
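
To make the idea of mining hard examples from temporal consistency cues concrete, the following is a minimal sketch and not the thesis's actual procedure. It assumes per-frame detector output is already available as lists of boxes, and it uses a simple three-frame rule: a detection with no overlapping neighbour in the previous and next frames is flagged as a candidate hard negative, and a box present in both neighbouring frames but missing in the current one is flagged as a candidate hard positive. The function names, the three-frame window, and the IoU threshold of 0.5 are all illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def has_match(box: Box, boxes: List[Box], thr: float) -> bool:
    """True if any box in `boxes` overlaps `box` with IoU >= thr."""
    return any(iou(box, other) >= thr for other in boxes)


def mine_flickers(frames: List[List[Box]], thr: float = 0.5):
    """Hypothetical helper: flag temporally inconsistent detections.

    Returns (hard_negatives, hard_positives), each a list of
    (frame_index, box) pairs. This is an illustrative rule only.
    """
    hard_negatives, hard_positives = [], []
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        # Isolated detection: appears only in frame t.
        for box in cur:
            if not has_match(box, prev, thr) and not has_match(box, nxt, thr):
                hard_negatives.append((t, box))
        # Dropped detection: present before and after, missing in frame t.
        for box in prev:
            if has_match(box, nxt, thr) and not has_match(box, cur, thr):
                hard_positives.append((t, box))
    return hard_negatives, hard_positives


# Toy input: an object at (0, 0, 10, 10) vanishes in the middle frame,
# where a spurious box appears elsewhere.
frames = [[(0, 0, 10, 10)], [(50, 50, 60, 60)], [(0, 0, 10, 10)]]
negatives, positives = mine_flickers(frames)
```

In a real pipeline the flagged boxes would serve as pseudo-labeled training examples when re-training the detector; a practical system would likely also use detector confidence scores and tracking rather than raw frame-to-frame overlap.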