Crowd Localization from Gaussian Mixture Scoped Knowledge and Scoped Teacher
Crowd localization aims to predict the head position of each instance in crowd
scenarios. Since the distances of instances to the camera vary, there are
tremendous gaps among the scales of instances within an image, a phenomenon
called intrinsic scale shift. The core reason that intrinsic scale shift is one
of the most essential issues in crowd localization is that it is ubiquitous in
crowd scenes and makes the scale distribution chaotic.
To this end, this paper concentrates on tackling the chaos of the scale
distribution incurred by intrinsic scale shift. We propose Gaussian Mixture
Scope (GMS) to regularize the chaotic scale distribution. Concretely, GMS
adapts a Gaussian mixture distribution to the scale distribution and decouples
the mixture model into sub-normal distributions to regularize the chaos within
each sub-distribution. An alignment is then introduced to regularize the chaos
among sub-distributions. However, although GMS is effective in regularizing the
data distribution, it amounts to dislodging the hard samples from the training
set, which incurs overfitting. We attribute this to the blocked transfer of the
latent knowledge exploited by GMS from the data to the model. Therefore, a
Scoped Teacher that plays the role of a bridge in knowledge transfer is
proposed. Moreover, consistency regularization is introduced to implement the
knowledge transfer; to that end, further constraints are deployed on the Scoped
Teacher to derive feature consistency between the teacher and student ends.
With the proposed GMS and Scoped Teacher implemented on five mainstream crowd
localization datasets, extensive experiments demonstrate the superiority of our
work. Moreover, compared with existing crowd locators, our work comprehensively
achieves state-of-the-art F1-measure on all five datasets.
Comment: Accepted by IEEE TI
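To make the GMS idea concrete, here is a toy 1-D sketch, not the authors' implementation: fit a Gaussian mixture to instance scales with plain EM, then "scope" the distribution by assigning each instance to a sub-normal component and z-normalizing within it, so all sub-distributions align to a shared scale. The function names and the quantile-based initialization are illustrative assumptions.

```python
import numpy as np

def fit_gmm_1d(scales, k=2, iters=50):
    """Fit a 1-D Gaussian mixture to instance scales with plain EM,
    initializing the means at evenly spaced quantiles (an assumption)."""
    scales = np.asarray(scales, dtype=float)
    mu = np.quantile(scales, np.linspace(0, 1, k))
    var = np.full(k, scales.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        logp = (-0.5 * ((scales[:, None] - mu) ** 2 / var
                        + np.log(2 * np.pi * var)) + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        n = r.sum(axis=0) + 1e-12
        mu = (r * scales[:, None]).sum(axis=0) / n
        var = (r * (scales[:, None] - mu) ** 2).sum(axis=0) / n + 1e-6
        pi = n / n.sum()
    return mu, var, pi

def scope_normalize(scales, mu, var, pi):
    """'Scope' step: assign each instance to its most likely sub-distribution
    and z-normalize within it, aligning all components to a shared scale."""
    scales = np.asarray(scales, dtype=float)
    logp = (-0.5 * ((scales[:, None] - mu) ** 2 / var
                    + np.log(2 * np.pi * var)) + np.log(pi))
    comp = logp.argmax(axis=1)
    out = np.empty_like(scales)
    for c in range(len(mu)):
        m = comp == c
        out[m] = (scales[m] - mu[c]) / np.sqrt(var[c])
    return out, comp
```

On a bimodal set of head scales (near and far instances), the two components capture the two scale regimes and the normalized output has comparable spread in both, which is the regularizing effect the abstract describes.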
EWT: Efficient Wavelet-Transformer for Single Image Denoising
Transformer-based image denoising methods have achieved encouraging results in
the past year. However, Transformers must use linear operations to model
long-range dependencies, which greatly increases model inference time and
consumes GPU memory. Compared with convolutional neural network-based methods,
current Transformer-based image denoising methods cannot achieve a balance
between performance improvement and resource consumption. In this paper, we
propose an Efficient Wavelet Transformer (EWT) for image denoising.
Specifically, we use the Discrete Wavelet Transform (DWT) and Inverse Wavelet
Transform (IWT) for downsampling and upsampling, respectively. This method
fully preserves image features while reducing the image resolution, thereby
greatly reducing the device resource consumption of the Transformer model.
Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to
extract image features at different levels, which further reduces model
inference time and GPU memory usage. Experiments show that our method speeds up
the original Transformer by more than 80%, reduces GPU memory usage by more
than 60%, and achieves excellent denoising results. All code will be made
public.
Comment: 12 pages, 11 figur
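The key property exploited here is that a DWT halves spatial resolution without discarding information, since the inverse transform reconstructs the input exactly. A minimal numpy sketch with the Haar wavelet (the abstract does not specify which wavelet EWT uses; Haar is an illustrative choice):

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar DWT on an (H, W) array with even H and W.
    Returns four half-resolution sub-bands (LL, LH, HL, HH)."""
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)   # row low-pass
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)   # row high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse one-level Haar DWT: reconstructs the original array exactly."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2] = (ll + lh) / np.sqrt(2)
    a[:, 1::2] = (ll - lh) / np.sqrt(2)
    d = np.empty_like(a)
    d[:, 0::2] = (hl + hh) / np.sqrt(2)
    d[:, 1::2] = (hl - hh) / np.sqrt(2)
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :] = (a + d) / np.sqrt(2)
    x[1::2, :] = (a - d) / np.sqrt(2)
    return x
```

A Transformer operating on the quarter-size LL band (plus the detail bands) therefore sees far fewer tokens per layer, which is where the claimed memory and speed savings come from, while upsampling via the IWT is lossless.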
Robust Reinforcement Learning through Efficient Adversarial Herding
Although reinforcement learning (RL) is considered the gold standard for
policy design, it may not always provide a robust solution in various
scenarios. This can result in severe performance degradation when the
environment is exposed to potential disturbances. Adversarial training using a
two-player max-min game has been proven effective in enhancing the robustness
of RL agents. In this work, we extend the two-player game by introducing an
adversarial herd, which involves a group of adversaries, in order to address
(i) the difficulty of the inner optimization problem, and
(ii) the potential over-pessimism caused by the selection of a
candidate adversary set that may include unlikely scenarios. We first prove
that adversarial herds can efficiently approximate the inner optimization
problem. We then address the second issue by replacing the worst-case
performance in the inner optimization with the average performance over the
worst-k adversaries. We evaluate the proposed method on multiple MuJoCo
environments. Experimental results demonstrate that our approach consistently
generates more robust policies.
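The relaxed inner objective is simple to state: instead of scoring a policy by its return against the single worst adversary, average over the k worst members of the herd. A minimal sketch (function name and interface are illustrative, not the paper's API):

```python
import numpy as np

def herd_objective(returns, k):
    """Protagonist objective under an adversarial herd: the average return
    over the k worst adversaries. k=1 recovers the classic max-min
    (pure worst-case) objective; larger k is less pessimistic, discounting
    unlikely catastrophic adversaries."""
    worst_k = np.sort(np.asarray(returns, dtype=float))[:k]
    return worst_k.mean()
```

For example, with per-adversary returns [3.0, -1.0, 2.0, 0.5], k=1 gives the pure worst case (-1.0), while k=2 averages the two worst (-0.25), softening the influence of a single implausible adversary.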
The Influence of the Curing Process on the Residual Stress in a Solar Cell Module
Solar power panels require high reliability, and the residual stress in the panel has an important effect on its reliability and lifetime. The finite element method was adopted to simulate the impact of rectangular solar panel encapsulation process parameters, such as the elastic modulus and thickness of the adhesive and the curing temperature, on the residual stress in the solar cell module. The results show that the residual stress in the solar cell module increases linearly with each of these three factors. The residual strain behaves consistently with the stress. The generation mechanism and distribution evolution of the stress are discussed in detail. Both the thickness and the elastic modulus of the silicone rubber have a significant impact on the residual stress, whereas the influence of the curing temperature is less pronounced.
CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution
Recently, deep convolutional neural network (CNN)-based face super-resolution
methods have achieved great progress in restoring degraded facial details by
jointly training with facial priors. However, these methods have some obvious
limitations. On the one hand, multi-task joint learning requires additional
annotations on the dataset, and the introduced prior network significantly
increases the computational cost of the model. On the other hand, the limited
receptive field of CNNs reduces the fidelity and naturalness of the
reconstructed facial images, resulting in suboptimal results. In this work, we
propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face
super-resolution, which uses a multi-scale connected encoder-decoder
architecture as its backbone. Specifically, we first devise a novel
Local-Global Feature Cooperation Module (LGCM), composed of a Facial Structure
Attention Unit (FSAU) and a Transformer block, to promote the consistency of
local facial detail and global facial structure restoration simultaneously.
Then, we design an efficient Local Feature Refinement Module (LFRM) to enhance
the local facial structure information. Finally, to further improve the
restoration of fine facial details, we present a Multi-scale Feature Fusion
Unit (MFFU) to adaptively fuse the features from different stages of the
encoder. Comprehensive evaluations on various datasets demonstrate that the
proposed CTCNet significantly outperforms other state-of-the-art methods.
Comment: 12 pages, 10 figures, 8 table
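The local-global cooperation pattern can be illustrated with a toy numpy sketch: a small-kernel convolution supplies the local branch (limited receptive field) and a self-attention layer over flattened spatial positions supplies the global branch, with the two fused by summation. This is only a schematic of the design idea; the real LGCM uses an FSAU, learned projections, and normalization layers not shown here, and all names below are illustrative.

```python
import numpy as np

def conv3x3(x, w):
    """3x3 convolution with zero padding on a single (H, W) channel."""
    h, wd = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[i, j] * xp[i:i + h, j:j + wd]
    return out

def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention over (N, C) tokens: every position
    attends to every other, giving a global receptive field."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    s = q @ k.T / np.sqrt(q.shape[1])
    s -= s.max(axis=1, keepdims=True)      # numerical stability
    a = np.exp(s)
    a /= a.sum(axis=1, keepdims=True)      # softmax over keys
    return a @ v

def local_global_fusion(feat, w_conv, wq, wk, wv):
    """Toy local-global cooperation on an (H, W, C) feature map:
    sum a convolutional (local) branch and an attention (global) branch."""
    h, w, c = feat.shape
    local = np.stack([conv3x3(feat[:, :, ch], w_conv) for ch in range(c)],
                     axis=-1)
    tokens = feat.reshape(h * w, c)
    global_ = self_attention(tokens, wq, wk, wv).reshape(h, w, c)
    return local + global_
```

The fused output keeps the input's spatial shape, so such a module can be dropped into an encoder-decoder stage without changing the surrounding architecture.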
Trends in smoking and quitting in China from 1993 to 2003: National Health Service Survey data
OBJECTIVE: China has about 350 million smokers, more commonly men. Using data from National Health Service Surveys conducted in 1993, 1998 and 2003, we (i) estimated trends in smoking prevalence and cessation according to sociodemographic variables and (ii) analysed cessation rates, quitting intentions, reasons for quitting and reasons for relapsing. METHODS: Data were collected from approximately 57 000 households and 200 000 individuals in each survey year. Household members > 15 years of age were interviewed about their smoking habits, quitting intentions and attitudes towards smoking. We present descriptive data stratified by age, sex, income level and rural versus urban residence. FINDINGS: In China, current smoking in those > 15 years old declined from 60% to 49% in men and from 5% to 3.2% in women between 1993 and 2003. The decline was more marked in urban areas. However, heavy smoking (≥ 20 cigarettes daily) increased substantially overall and doubled in men. The average age of uptake also dropped by about 3 years. In 2003, 7.9% of smokers reported intending to quit, and 6% of people who had ever smoked reported having quit. Of former smokers, 40.6% quit because of illness, 26.9% to prevent disease and 10.9% for financial reasons. CONCLUSION: Smoking prevalence declined in China over the study period, perhaps due to the combined effect of smoking cessation, reduced uptake in women and selective mortality among men over 40 years of age. However, heavy smoking increased. People in China rarely quit or intend to quit smoking, except at older ages. Further tobacco control efforts are urgently needed, especially in rural areas.
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
Visual programming, a modular and generalizable paradigm, integrates
different modules and Python operators to solve various vision-language tasks.
Unlike end-to-end models that need task-specific data, it performs visual
processing and reasoning in an unsupervised manner. Current visual programming
methods generate programs in a single pass for each task and, unfortunately,
lack the ability to evaluate and optimize based on feedback, which consequently
limits their effectiveness on complex, multi-step problems. Drawing inspiration
from Benders decomposition, we introduce De-fine, a general framework that
automatically decomposes complex tasks into simpler subtasks and refines
programs through auto-feedback. This model-agnostic approach can improve
logical reasoning performance by integrating the strengths of multiple models.
Our experiments across various visual tasks show that De-fine creates more
accurate and robust programs, setting new benchmarks in the field.
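The generate-execute-critique-refine loop that distinguishes this from single-pass program generation can be sketched generically. All three callables below are hypothetical stand-ins (in De-fine they would be backed by language and vision models); the sketch only shows the control flow of refining a program from auto-feedback.

```python
def refine_program(generate, execute, critique, task, max_rounds=3):
    """Generic decompose-and-refine loop (hypothetical interfaces):
    `generate(task, feedback)` proposes a program, `execute(program)` runs
    it, and `critique(task, result)` returns (score, feedback), with
    feedback=None meaning the critic is satisfied."""
    feedback = None
    best, best_score = None, float("-inf")
    for _ in range(max_rounds):
        program = generate(task, feedback)      # propose or revise a program
        result = execute(program)               # run it on the task
        score, feedback = critique(task, result)
        if score > best_score:                  # keep the best program so far
            best, best_score = program, score
        if feedback is None:                    # critic satisfied: stop early
            break
    return best, best_score
```

A toy instantiation where the "program" is an integer improved by feedback shows the loop converging in three rounds; the same skeleton applies when `generate` is an LLM emitting Python and `critique` inspects execution traces.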
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
Prompt tuning, a recently emerging paradigm, enables powerful vision-language
pre-training models to adapt to downstream tasks in a parameter- and
data-efficient way by learning ``soft prompts'' to condition frozen
pre-training models. Though effective, it is particularly problematic in the
few-shot scenario, where prompt tuning performance is sensitive to the
initialization and requires a time-consuming process to find a good
initialization, thus restricting the fast adaptation ability of the
pre-training models. In addition, prompt tuning can undermine the
generalizability of the pre-training models, because the learnable prompt
tokens easily overfit to the limited training samples. To address these
issues, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM)
framework that jointly meta-learns an efficient soft prompt initialization for
better adaptation and a lightweight gradient regulating function for strong
cross-domain generalizability, in a meta-learning paradigm using only the
unlabeled image-text pre-training data. Rather than designing a specific
prompt tuning method, GRAM can be easily incorporated into various prompt
tuning methods in a model-agnostic way, and comprehensive experiments show
that GRAM brings consistent improvements in several settings (i.e., few-shot
learning, cross-domain generalization, cross-dataset generalization, etc.)
over 11 datasets. Further, experiments show that GRAM enables the orthogonal
methods of textual and visual prompt tuning to work in a mutually enhanced
way, offering better generalizability than uni-modal prompt tuning methods.
Comment: Accepted by ICCV 202
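The core idea of meta-learning a prompt initialization can be sketched on a toy problem. The sketch below uses a first-order Reptile-style update on quadratic per-task losses, an illustrative stand-in rather than GRAM's actual algorithm (which also meta-learns a gradient regulating function, omitted here); the "prompt" is just a vector and the loss is ||p - target||^2.

```python
import numpy as np

def inner_adapt(p, task_target, lr=0.1, steps=1):
    """Task-specific adaptation: gradient steps on L(p) = ||p - target||^2,
    whose gradient is 2 * (p - target)."""
    for _ in range(steps):
        p = p - lr * 2 * (p - task_target)
    return p

def meta_learn_init(task_targets, meta_lr=0.05, epochs=200):
    """First-order meta-learning of a prompt initialization: repeatedly
    adapt to each task and move the init toward the adapted point
    (Reptile-style), so the init lands where every task adapts quickly."""
    p0 = np.zeros_like(task_targets[0])
    for _ in range(epochs):
        for t in task_targets:
            adapted = inner_adapt(p0, t)
            p0 = p0 + meta_lr * (adapted - p0)   # Reptile meta-update
    return p0
```

With two tasks whose optima sit at 1 and 3, the learned initialization settles near 2, equidistant from both, so a single adaptation step from it makes progress on either task. This mirrors the abstract's claim that a meta-learned init removes the costly search for a good starting point.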