685 research outputs found

    Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing

    Advancements in deep learning are often associated with increasing model sizes. Model size dramatically affects the deployment cost and latency of deep models. For instance, models like BERT cannot be deployed on edge devices and mobile phones due to their sheer size. As a result, most advances in deep learning are yet to reach the edge. Model compression has received much-deserved attention in the literature across natural language processing, vision, and recommendation domains. This paper proposes a model-agnostic, cache-friendly model compression approach: Random Operation Access Specific Tile (ROAST) hashing. ROAST collapses the parameters by grouping them through a lightweight mapping. Notably, while grouping these parameters, ROAST exploits cache hierarchies by aligning the memory access pattern with the parameter access pattern. ROAST is up to ∼25× faster to train and ∼50× faster at inference than the popular parameter-sharing method HashedNet. Additionally, ROAST introduces global weight sharing, which is empirically and theoretically superior to the local weight sharing in HashedNet and can be of independent interest. With ROAST, we present the first compressed BERT, which is 100× to 1000× smaller but does not suffer quality degradation. These compression levels on a universal architecture like transformers are promising for the future of SOTA model deployment on resource-constrained devices such as mobile and edge devices.
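As a rough illustration of the idea in the abstract above (many virtual parameters mapped onto a small shared array, in contiguous tiles so that reads stay cache-friendly, with one pool shared across layers), here is a minimal sketch. It is not the authors' implementation: the names `tile_start` and `virtual_weight`, the SHA-256-based mapping, the tile size of 4, and the pool size of 64 are all assumptions made for this example.

```python
import hashlib

def tile_start(layer: str, tile_id: int, pool_size: int, tile: int) -> int:
    """Hash (layer, tile_id) to a start offset inside the shared pool.

    SHA-256 is a stand-in chosen for determinism; it is an assumption,
    not the lightweight mapping used in the paper.
    """
    digest = hashlib.sha256(f"{layer}:{tile_id}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % (pool_size - tile + 1)

def virtual_weight(pool, layer, rows, cols, tile=4):
    """Materialize a rows x cols matrix whose tiles are contiguous pool slices."""
    if cols % tile != 0:
        raise ValueError("cols must be a multiple of the tile size")
    tiles_per_row = cols // tile
    matrix = []
    for r in range(rows):
        row = []
        for t in range(tiles_per_row):
            start = tile_start(layer, r * tiles_per_row + t, len(pool), tile)
            # One contiguous read per tile instead of one scattered read per weight.
            row.extend(pool[start:start + tile])
        matrix.append(row)
    return matrix

# A single pool shared by two layers: "global" sharing in the abstract's sense.
pool = [float(i) for i in range(64)]
w_a = virtual_weight(pool, "layer_a", rows=8, cols=16)
w_b = virtual_weight(pool, "layer_b", rows=8, cols=16)

# 2 layers * 128 virtual weights backed by 64 stored values.
compression = (2 * 8 * 16) / len(pool)
```

In this toy setup the two layers together expose 256 virtual weights while storing only 64 values. A per-element hash, as opposed to the per-tile hash used here, would give the same compression but scatter every read across the pool.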

    Token-Weighted RNN-T for Learning from Flawed Data

    ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens that reflect transcription errors. In this work, we propose a novel token-weighted RNN-T criterion that augments the RNN-T objective with token-specific weights. The new objective is used to mitigate the accuracy loss caused by transcription errors in the training data, which naturally appear in two settings: pseudo-labeling and human annotation errors. Experimental results show that using our method for semi-supervised learning with pseudo-labels leads to a consistent accuracy improvement of up to 38% relative. We also analyze the accuracy degradation resulting from different levels of WER in the reference transcription and show that token-weighted RNN-T is suitable for overcoming this degradation, recovering 64%-99% of the accuracy loss.
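The effect of token-specific weights can be sketched on a simplified sequence loss. The example below assumes per-token log-probabilities are already available and applies one weight per token; a real RNN-T objective marginalizes over alignments, which is omitted here, and the exact weighting form used in the paper is not given in the abstract. The function name `token_weighted_nll` and the sample numbers are hypothetical.

```python
import math

def token_weighted_nll(token_log_probs, weights):
    """Negative log-likelihood with one weight per target token.

    With all weights equal to 1 this reduces to the ordinary sequence NLL.
    A weight below 1 de-emphasizes a token suspected to be a transcription error.
    """
    if len(token_log_probs) != len(weights):
        raise ValueError("need exactly one weight per token")
    return -sum(w * lp for w, lp in zip(weights, token_log_probs))

# Hypothetical per-token probabilities for a three-token transcript.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.8)]

uniform_loss = token_weighted_nll(log_probs, [1.0, 1.0, 1.0])
# Down-weight the second token, e.g. because a confidence model flags it.
weighted_loss = token_weighted_nll(log_probs, [1.0, 0.2, 1.0])
```

Down-weighting the second token reduces its contribution to the loss, and therefore to the gradient, by a factor of five while leaving the other tokens untouched.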

    Light Field Salient Object Detection: A Review and Benchmark

    Salient object detection (SOD) is a long-standing research topic in computer vision and has drawn an increasing amount of research interest in the past decade. This paper provides the first comprehensive review and benchmark for light field SOD, which has long been lacking in the saliency community. Firstly, we introduce preliminary knowledge on light fields, including theory and data forms, and then review existing studies on light field SOD, covering ten traditional models, seven deep learning-based models, one comparative study, and one brief review. Existing datasets for light field SOD are also summarized with detailed information and statistical analyses. Secondly, we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets, from which insightful discussions and analyses, including a comparison between light field SOD and RGB-D SOD models, are drawn. In addition, because the datasets are inconsistent in their current forms, we generate complete data and supplement focal stacks, depth maps, and multi-view images for the inconsistent datasets, making them consistent and unified. Our supplemental data makes a universal benchmark possible. Lastly, because light field SOD is a rather special problem, with diverse data representations and a high dependency on acquisition hardware that make it differ greatly from other saliency detection tasks, we provide nine hints on the challenges and future directions and outline several open issues. We hope our review and benchmarking can help advance research in this field. All the materials, including collected models, datasets, benchmarking results, and supplemented light field datasets, will be publicly available on our project site https://github.com/kerenfu/LFSOD-Survey