4 research outputs found

    On the Data Efficiency and Model Complexity of Visual Learning

    Get PDF
    Computer vision is a research field that aims to automate the process of extracting abstract understanding from digital images or videos. Recent rapid developments in deep neural networks have demonstrated performance at or beyond human level on many vision tasks that require high-level understanding, such as image recognition and object detection. However, training deep neural networks usually requires large-scale datasets annotated by humans, and the models typically have millions of parameters and consume substantial computational resources. These issues of data efficiency and model complexity are common to many frameworks based on deep neural networks and limit their deployment in real-world applications. In this dissertation, I present our research addressing the data efficiency and model complexity of deep neural networks. For data efficiency, (i) we study the problem of few-shot image recognition, where the training datasets are limited to only a few examples per category. (ii) We also investigate semi-supervised visual learning, which provides unlabeled samples in addition to the annotated dataset and aims to exploit them to learn better models. For model complexity, (iii) we seek alternatives to cascading layers or blocks for improving the representational capacity of convolutional neural networks without introducing additional computation. (iv) We improve the computational resource utilization of deep neural networks by finding, reallocating, and rejuvenating underutilized neurons. (v) We present two techniques for object detection that reuse computation to reduce architectural complexity and improve detection performance. (vi) Finally, we show our work on reusing visual features for multi-task learning to improve computational efficiency and share training information between different tasks.

    ScaleNet - Improve CNNs through Recursively Rescaling Objects

    No full text
    Deep networks are often not scale-invariant, so their performance can vary wildly if recognizable objects appear at scales seen only at testing time. In this paper, we propose ScaleNet, which recursively predicts object scale in a deep learning framework. With an explicit objective of predicting the scale of objects in images, ScaleNet enables pretrained deep learning models to identify objects at scales that are not present in their training sets. By calling ScaleNet recursively, one can generalize to very large scale changes unseen in the training set. To demonstrate the robustness of our proposed framework, we conduct experiments with pretrained as well as fine-tuned classification and detection frameworks on the MNIST, CIFAR-10, and MS COCO datasets; the results show that our proposed framework significantly boosts the performance of deep networks.
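    The recursive rescaling idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `predict_scale` is a hypothetical stand-in for the learned scale predictor, and images are reduced to `(height, width)` tuples so the loop structure is visible.

    ```python
    def rescale(image_size, scale_factor):
        """Resize step: here we just scale an (h, w) tuple for illustration."""
        h, w = image_size
        return (round(h * scale_factor), round(w * scale_factor))

    def recursive_rescale(image_size, predict_scale, max_iters=5, tol=0.05):
        """Repeatedly predict the object's relative scale and rescale the
        input until the predicted scale is close to 1, i.e. the object
        roughly matches the scale distribution the downstream model saw
        during training. Recursion handles very large scale changes that a
        single prediction step might not cover."""
        for _ in range(max_iters):
            s = predict_scale(image_size)   # predicted relative object scale
            if abs(s - 1.0) < tol:          # close enough to training scale
                break
            image_size = rescale(image_size, 1.0 / s)
        return image_size

    # Toy predictor (an assumption for this sketch): the "correct" scale
    # corresponds to a height of 224 px, so predicted scale = height / 224.
    pred = lambda size: size[0] / 224
    print(recursive_rescale((896, 896), pred))  # → (224, 224)
    ```

    Each iteration moves the object closer to the scale the pretrained classifier or detector expects, which is why the recursion generalizes beyond the scale changes seen at training time.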
