Density Matters: Improved Core-set for Active Domain Adaptive Segmentation
Active domain adaptation has emerged as a solution to balance the expensive
annotation cost and the performance of trained models in semantic segmentation.
However, existing works usually ignore the correlation between selected samples
and their local context in feature space, which leads to suboptimal use of
annotation budgets. In this work, we revisit the theoretical bound of the
classical Core-set method and identify that the performance is closely related
to the local sample distribution around selected samples. To estimate the
density of local samples efficiently, we introduce a local proxy estimator with
Dynamic Masked Convolution and develop a Density-aware Greedy algorithm to
optimize the bound. Extensive experiments demonstrate the superiority of our
approach. Moreover, with very few labels, our scheme achieves comparable
performance to the fully supervised counterpart.
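As a rough illustration of the idea (not the paper's exact method), the sketch below weights the classical k-center greedy criterion by a local density estimate; the k-NN density proxy and the `density_aware_greedy` helper are illustrative assumptions, not the Dynamic Masked Convolution estimator described above.

```python
import numpy as np

def density_aware_greedy(features, densities, budget):
    """Greedy core-set selection biased toward dense regions (illustrative sketch).

    features:  (N, D) array of per-sample feature vectors.
    densities: (N,) array of local density estimates (higher = denser).
    budget:    number of samples to select.
    """
    selected = [int(np.argmax(densities))]  # start from the densest sample
    # distance of every point to its nearest selected centre
    dist = np.linalg.norm(features - features[selected[0]], axis=1)

    while len(selected) < budget:
        # weight the classical k-center coverage criterion by local density,
        # so uncovered points in dense neighbourhoods are preferred
        score = dist * densities
        score[selected] = -np.inf
        nxt = int(np.argmax(score))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# toy usage: 200 random feature vectors, density from a simple k-NN estimate
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
knn_dist = np.sort(np.linalg.norm(X[:, None] - X[None], axis=-1), axis=1)[:, 1:6]
density = 1.0 / (knn_dist.mean(axis=1) + 1e-8)
print(density_aware_greedy(X, density, budget=10))
```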
Model Compression and Efficient Inference for Large Language Models: A Survey
Transformer-based large language models have achieved tremendous success.
However, the significant memory and computational costs incurred during the
inference process make it challenging to deploy large models on
resource-constrained devices. In this paper, we investigate compression and
efficient inference methods for large language models from an algorithmic
perspective. Regarding taxonomy, similar to smaller models, compression and
acceleration algorithms for large language models can still be categorized into
quantization, pruning, distillation, compact architecture design, and dynamic
networks. However, large language models have two prominent characteristics
compared to smaller models: (1) Most compression algorithms require
finetuning or even retraining the model after compression. The most notable
aspect of large models is the very high cost associated with model finetuning
or training. Therefore, many algorithms for large models, such as quantization
and pruning, start to explore tuning-free algorithms. (2) Large models
emphasize versatility and generalization rather than performance on a single
task. Hence, many algorithms, such as knowledge distillation, focus on how to
preserve their versatility and generalization after compression. Since these
two characteristics were not very pronounced in early large models, we further
divide large language models into medium models and "real" large models.
Additionally, we provide an introduction to some mature frameworks for
efficient inference of large models, which can support basic compression or
acceleration algorithms, greatly facilitating model deployment for users.
Comment: 47 pages, reviews 380 papers. The work is ongoing.
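As a concrete, deliberately generic illustration of the tuning-free direction mentioned in point (1), the sketch below performs symmetric per-channel INT8 post-training quantization of a single weight matrix with no finetuning; it is not drawn from any specific method covered by the survey.

```python
import torch

def quantize_per_channel_int8(weight: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of a weight matrix.

    weight: (out_features, in_features) float tensor.
    Returns the int8 weights and per-channel scales needed to dequantize.
    """
    # one scale per output channel, chosen so the largest magnitude maps to 127
    max_abs = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    scale = max_abs / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # recover an approximate float weight for inspection or fake-quant inference
    return q.float() * scale

# toy usage: quantize one linear layer's weights without any finetuning
w = torch.randn(256, 512)
q, s = quantize_per_channel_int8(w)
err = (dequantize(q, s) - w).abs().mean()
print(f"mean absolute quantization error: {err.item():.5f}")
```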
You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction
Challenging illumination conditions (low-light, under-exposure and
over-exposure) in the real world not only cast an unpleasant visual appearance
but also degrade computer vision tasks. After the camera captures the raw-RGB
data, it renders standard sRGB images with the image signal processor (ISP). By
decomposing ISP pipeline into local and global image components, we propose a
lightweight, fast Illumination Adaptive Transformer (IAT) to restore the
normally lit sRGB image from either low-light or under/over-exposure conditions.
Specifically, IAT uses attention queries to represent and adjust the
ISP-related parameters such as colour correction and gamma correction. With only
~90k parameters and ~0.004 s processing time, our IAT consistently achieves
superior performance over SOTA on the current benchmark low-light enhancement
and exposure correction datasets. Competitive experimental performance also
demonstrates that our IAT significantly enhances object detection and semantic
segmentation tasks under various light conditions. Training code and pretrained
model is available at
https://github.com/cuiziteng/Illumination-Adaptive-Transformer.Comment: 23 page
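To picture the global, ISP-related parameters the abstract refers to, here is a minimal sketch that applies a 3x3 colour correction matrix and a scalar gamma to an image; the parameter values and the `apply_global_isp` helper are illustrative assumptions, not IAT's learned attention queries.

```python
import numpy as np

def apply_global_isp(img, ccm, gamma):
    """Apply a global colour correction matrix and gamma curve to an image.

    img:   (H, W, 3) float array in [0, 1].
    ccm:   (3, 3) colour correction matrix (e.g. predicted by a small network).
    gamma: scalar gamma value (e.g. predicted alongside the matrix).
    """
    out = img.reshape(-1, 3) @ ccm.T       # per-pixel linear colour transform
    out = np.clip(out, 0.0, 1.0)
    out = out ** gamma                      # global tone / exposure adjustment
    return out.reshape(img.shape)

# toy usage: brighten a stand-in for an under-exposed image with illustrative values
rng = np.random.default_rng(0)
dark = rng.uniform(0.0, 0.2, size=(64, 64, 3))
identity_ccm = np.eye(3)
brightened = apply_global_isp(dark, identity_ccm, gamma=0.45)
print(dark.mean(), brightened.mean())
```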
Improving the performance of models for one-step retrosynthesis through re-ranking
Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance, as measured by top-N accuracy in matching published reaction precedents, still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model that, for each product, learns to rank the published reactants as the top suggestion and the remaining reactant predictions lower. We show that re-ranking can significantly improve one-step models on the standard USPTO-50k benchmark dataset: RetroSim, a similarity-based method, improves from 35.7% to 51.8% top-1 accuracy, and NeuralSym, a deep learning method, from 45.7% to 51.3%. We also show that re-ranking the union of two models' suggestions can lead to better performance than either model alone. However, this method does not improve the state-of-the-art top-1 accuracy.
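As a sketch of the re-ranking step, assuming a trained energy model exposed as a simple callable (the `energy_model` interface and the string-overlap dummy below are hypothetical stand-ins, not the paper's implementation), candidate reactant sets can be scored and sorted by ascending energy.

```python
from typing import Callable, List, Tuple

def rerank_candidates(product: str,
                      candidates: List[str],
                      energy_model: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Re-rank reactant predictions for one product by ascending energy.

    product:      product SMILES string.
    candidates:   reactant-set SMILES strings proposed by a one-step model.
    energy_model: callable returning a scalar energy for a (product, reactants)
                  pair; lower energy means a more plausible precedent.
    """
    scored = [(reactants, energy_model(product, reactants)) for reactants in candidates]
    return sorted(scored, key=lambda pair: pair[1])

# toy usage with a dummy energy function (character-overlap heuristic, purely illustrative)
def dummy_energy(product: str, reactants: str) -> float:
    return -float(len(set(product) & set(reactants)))

print(rerank_candidates("CCO", ["CC=O", "CCBr", "C#N"], dummy_energy))
```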