Leveraging Structures of the Data in Deep Learning
The performance of deep learning frameworks can be significantly improved by considering the particular underlying structure of each dataset. In this thesis, I summarize three of our works on boosting the performance of deep learning models by leveraging structures of the data. In the first work, we theoretically justify that, for convolutional neural networks (CNNs), the neighborhood of a pixel should be redefined as its most correlated spatial locations in order to achieve a lower generalization error. Based on the correlation pattern, we propose a data-driven approach that designs multiple layers of customized filter shapes by repeatedly solving lasso problems. In the second work, we address the problem of scale invariance in deep learning. We propose ScaleNet to predict object scales. By recursively applying ScaleNet and rescaling, pretrained deep networks can identify objects at scales significantly different from those in the training set. In the last work, we perform an extensive study of PointConv-based frameworks to tackle the problems of scale and rotation invariance in point cloud convolution. PointConv is a novel convolution operation that can be applied directly to point clouds and achieves parity with 2D CNNs in both formulation and performance. It takes the coordinates of points as inputs to generate the corresponding convolution weights. We identify two effective strategies: first, for point clouds converted from regular 2D raster images, we replace the multi-layer perceptron (MLP) based weight function with much simpler cubic polynomials, achieving greater robustness and better performance than traditional 2D CNNs on the MNIST dataset. Next, for 3D point clouds, we introduce a novel viewpoint-invariant (VI) descriptor, which captures geometric relations between a center point and its local neighbors, as an additional input to the weight function.
Integrated with the VI descriptor, we not only significantly improve the robustness of PointConv but also achieve comparable or better performance than state-of-the-art point-based approaches on both SemanticKITTI and ScanNet.
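The cubic-polynomial weight function described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the fixed coefficient vector, and the 2D-only setting are assumptions for illustration. The idea is that a PointConv-style layer maps each neighbor's relative coordinates to a convolution weight; here that map is a degree-3 polynomial in (dx, dy) rather than an MLP.

```python
import numpy as np

def cubic_poly_weights(offsets, coeffs):
    """Illustrative sketch: map each neighbor offset (dx, dy) to a
    convolution weight via a cubic polynomial instead of an MLP.
    offsets: (K, 2) relative coordinates of K neighbors.
    coeffs:  (10,) polynomial coefficients (learned in practice;
             fixed here for illustration)."""
    dx, dy = offsets[:, 0], offsets[:, 1]
    # All monomials of (dx, dy) up to degree 3: 10 terms.
    basis = np.stack([
        np.ones_like(dx), dx, dy,
        dx**2, dx * dy, dy**2,
        dx**3, dx**2 * dy, dx * dy**2, dy**3,
    ], axis=1)                       # (K, 10)
    return basis @ coeffs            # (K,) one weight per neighbor

def pointconv_step(features, offsets, coeffs):
    """Weighted sum of neighbor features -- the core PointConv operation.
    features: (K, C) features of the K neighbors."""
    w = cubic_poly_weights(offsets, coeffs)  # (K,)
    return features.T @ w                    # (C,) aggregated feature
```

With only the constant coefficient set, every neighbor gets weight 1 and the layer reduces to summing neighbor features, which makes the sketch easy to sanity-check.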
On the Data Efficiency and Model Complexity of Visual Learning
Computer vision is a research field that aims to automate the procedure of gaining abstract understanding from digital images or videos. Recent rapid developments in deep neural networks have demonstrated performance at or beyond human level on many vision tasks that require high-level understanding, such as image recognition and object detection. However, training deep neural networks usually requires large-scale datasets annotated by humans, and the models typically have millions of parameters and consume substantial computational resources. The issues of data efficiency and model complexity are commonly observed in many frameworks based on deep neural networks, limiting their deployment in real-world applications.
In this dissertation, I will present our research addressing the issues of data efficiency and model complexity in deep neural networks. On data efficiency, (i) we study the problem of few-shot image recognition, where the training datasets are limited to only a few examples per category. (ii) We also investigate semi-supervised visual learning, which provides unlabeled samples in addition to the annotated dataset and aims to utilize them to learn better models. On model complexity, (iii) we seek alternatives to cascading layers or blocks for improving the representation capacity of convolutional neural networks without introducing additional computation. (iv) We improve the computational resource utilization of deep neural networks by finding, reallocating, and rejuvenating underutilized neurons. (v) We present two techniques for object detection that reuse computations to reduce architecture complexity and improve detection performance. (vi) Finally, we show our work on reusing visual features for multi-task learning to improve computational efficiency and share training information between different tasks.
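To make the few-shot setting in (i) concrete, here is a generic illustration, not the dissertation's specific method: with only K labeled "support" examples per class, a simple baseline classifies each query by its nearest class centroid in feature space. All names and the nearest-centroid rule are assumptions chosen for illustration.

```python
import numpy as np

def nearest_centroid_fewshot(support_x, support_y, query_x):
    """Generic few-shot baseline (illustrative only): classify queries
    by the nearest class centroid computed from a few support examples.
    support_x: (N*K, D) support features
    support_y: (N*K,)  integer class labels
    query_x:   (Q, D)  query features"""
    classes = np.unique(support_y)
    # One centroid per class, averaged over its K support examples.
    centroids = np.stack([support_x[support_y == c].mean(axis=0)
                          for c in classes])           # (N, D)
    # Squared Euclidean distance from every query to every centroid.
    d = ((query_x[:, None, :] - centroids[None]) ** 2).sum(-1)  # (Q, N)
    return classes[d.argmin(axis=1)]                   # (Q,) predictions
```

This baseline highlights why the setting is hard: with so few examples per class, the quality of the learned feature space dominates performance.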
ScaleNet - Improve CNNs through Recursively Rescaling Objects
Deep networks are often not scale-invariant; hence, their performance can vary wildly if recognizable objects appear at an unseen scale occurring only at test time. In this paper, we propose ScaleNet, which recursively predicts object scale in a deep learning framework. With an explicit objective to predict the scale of objects in images, ScaleNet enables pretrained deep learning models to identify objects at scales not present in their training sets. By recursively calling ScaleNet, one can generalize to very large scale changes unseen in the training set. To demonstrate the robustness of the proposed framework, we conduct experiments with pretrained as well as fine-tuned classification and detection frameworks on the MNIST, CIFAR-10, and MS COCO datasets, and the results reveal that our framework significantly boosts the performance of deep networks.
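The recursive procedure above can be sketched as a simple loop: predict the dominant object scale, rescale the image by its inverse, and repeat until the predicted scale is close to 1, i.e., objects appear at a size the pretrained model was trained on. This is a minimal sketch under stated assumptions; `predict_scale` and `rescale` stand in for ScaleNet and an image-resizing routine, and the stopping tolerance is illustrative.

```python
def recursive_rescale(image, predict_scale, rescale, max_iters=5, tol=0.1):
    """Illustrative sketch of the recursive rescaling loop.
    predict_scale(image) -> estimated object scale relative to training size
    rescale(image, factor) -> image resized by `factor`"""
    for _ in range(max_iters):
        s = predict_scale(image)
        if abs(s - 1.0) < tol:          # objects already near training scale
            break
        image = rescale(image, 1.0 / s)  # undo the estimated scale change
    return image
```

Recursion matters because a single prediction may be unreliable for extreme scale changes; each rescaling step brings the object closer to the scale range where the predictor is accurate.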