
    Crowd Localization from Gaussian Mixture Scoped Knowledge and Scoped Teacher

    Crowd localization predicts the head position of each instance in a crowd scene. Because instances lie at varying distances from the camera, there are tremendous gaps among instance scales within an image, a phenomenon called intrinsic scale shift. Intrinsic scale shift is one of the most essential issues in crowd localization because it is ubiquitous in crowd scenes and makes the scale distribution chaotic. This paper therefore focuses on tackling the chaotic scale distribution incurred by intrinsic scale shift. We propose Gaussian Mixture Scope (GMS) to regularize the chaotic scale distribution. Concretely, GMS fits a Gaussian mixture to the scale distribution and decouples the mixture into sub-normal distributions to regularize the chaos within each sub-distribution; an alignment is then introduced to regularize the chaos among sub-distributions. However, although GMS is effective in regularizing the data distribution, it effectively dislodges the hard samples from the training set, which incurs overfitting. We attribute this to the blocked transfer of the latent knowledge exploited by GMS from the data to the model. Therefore, a Scoped Teacher is proposed to act as a bridge in knowledge transfer. Moreover, consistency regularization is introduced to implement the knowledge transfer, with further constraints deployed on the Scoped Teacher to enforce feature consistency between the teacher and student ends. With the proposed GMS and Scoped Teacher implemented on five mainstream crowd localization datasets, extensive experiments demonstrate the superiority of our work. Moreover, compared with existing crowd locators, our work comprehensively achieves state-of-the-art F1-measure on all five datasets. Comment: Accepted by IEEE TI
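
    As a loose illustration of the scoping idea (not the authors' released GMS code), the sketch below fits a Gaussian mixture to per-instance head scales and then normalizes scales within each mixture component; the component count and the within-component normalization are illustrative assumptions.

```python
# Minimal sketch: fit a Gaussian mixture to per-instance head scales and
# normalize scales within each component. This only illustrates decoupling a
# chaotic scale distribution into sub-normal distributions; it is not the
# paper's GMS implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

def scope_scales(scales, n_components=3):
    """scales: (N,) array of estimated head sizes for one image."""
    scales = np.asarray(scales, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(scales)
    labels = gmm.predict(scales)              # assign each head to a sub-distribution
    scoped = np.empty_like(scales[:, 0])
    for k in range(n_components):
        mask = labels == k
        mu = gmm.means_[k, 0]
        sigma = np.sqrt(gmm.covariances_[k, 0, 0])
        scoped[mask] = (scales[mask, 0] - mu) / (sigma + 1e-6)  # regularize within component
    return scoped, labels

# Example: far-away heads are small, nearby heads are large.
scoped, labels = scope_scales([4, 5, 6, 30, 32, 35, 90, 100])
```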

    EWT: Efficient Wavelet-Transformer for Single Image Denoising

    Transformer-based image denoising methods have achieved encouraging results in the past year. However, they must use linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU memory. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot strike a balance between performance improvement and resource consumption. In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use the Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method fully preserves image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which further reduces model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be made public. Comment: 12 pages, 11 figur
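
    A minimal sketch of wavelet down/up-sampling with a single-level 2D Haar transform in PyTorch is shown below; EWT's actual DWT/IWT layers and wavelet filters may differ, and the helper names are ours. It only demonstrates how a feature map can be reduced to quarter resolution losslessly and then restored.

```python
# Single-level 2D Haar DWT/IWT sketch: (B, C, H, W) <-> (B, 4*C, H/2, W/2).
import torch

def haar_dwt(x):
    """x: (B, C, H, W) with even H and W -> (B, 4*C, H/2, W/2)."""
    lo_r = (x[..., 0::2, :] + x[..., 1::2, :]) / 2   # low/high pass along rows
    hi_r = (x[..., 0::2, :] - x[..., 1::2, :]) / 2
    ll = (lo_r[..., 0::2] + lo_r[..., 1::2]) / 2     # then along columns
    lh = (lo_r[..., 0::2] - lo_r[..., 1::2]) / 2
    hl = (hi_r[..., 0::2] + hi_r[..., 1::2]) / 2
    hh = (hi_r[..., 0::2] - hi_r[..., 1::2]) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

def haar_iwt(y):
    """Inverse of haar_dwt: (B, 4*C, H/2, W/2) -> (B, C, H, W)."""
    ll, lh, hl, hh = torch.chunk(y, 4, dim=1)
    def merge_cols(lo, hi):
        out = lo.new_empty(*lo.shape[:-1], 2 * lo.shape[-1])
        out[..., 0::2], out[..., 1::2] = lo + hi, lo - hi
        return out
    lo_r, hi_r = merge_cols(ll, lh), merge_cols(hl, hh)
    x = lo_r.new_empty(*lo_r.shape[:-2], 2 * lo_r.shape[-2], lo_r.shape[-1])
    x[..., 0::2, :], x[..., 1::2, :] = lo_r + hi_r, lo_r - hi_r
    return x

x = torch.randn(2, 16, 32, 32)
assert torch.allclose(haar_iwt(haar_dwt(x)), x, atol=1e-6)   # lossless round trip
```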

    Robust Reinforcement Learning through Efficient Adversarial Herding

    Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution, which can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has proven effective in enhancing the robustness of RL agents. In this work, we extend the two-player game by introducing an adversarial herd, a group of adversaries, in order to address (i) the difficulty of the inner optimization problem and (ii) the potential over-pessimism caused by a candidate adversary set that may include unlikely scenarios. We first prove that adversarial herds can efficiently approximate the inner optimization problem. We then address the second issue by replacing the worst-case performance in the inner optimization with the average performance over the worst-k adversaries. We evaluate the proposed method on multiple MuJoCo environments. Experimental results demonstrate that our approach consistently generates more robust policies.
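
    The worst-k idea can be illustrated with a short sketch: evaluate the protagonist against every adversary in the herd and average the k lowest returns rather than taking the single minimum. The rollout procedure is omitted and the returns below are placeholder numbers, not results from the paper.

```python
# Worst-k objective over an adversarial herd (illustrative only).
import numpy as np

def worst_k_objective(returns, k):
    """returns: per-adversary episodic returns of the protagonist policy."""
    returns = np.asarray(returns, dtype=np.float64)
    worst = np.sort(returns)[:k]          # k lowest returns = k strongest adversaries
    return worst.mean()                   # average instead of the single worst case

# Example with a herd of 5 adversaries and k = 2:
rets = [120.0, 85.0, 240.0, 60.0, 190.0]
print(worst_k_objective(rets, k=2))       # averages 60 and 85
```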

    The Effect of the Curing Process on Residual Stress in a Solar Cell Module

    Solar power panels require high reliability, and the residual stress in the solar panel has an important effect on its reliability and lifetime. The finite element method was adopted to simulate the impact of rectangular solar panel encapsulation process parameters, such as the elastic modulus and thickness of the adhesive and the curing temperature, on the residual stress in the solar cell module. The results show that the residual stress in the solar cell module increases linearly with each of these three factors, and the residual strain follows the same trend as the stress. The generation mechanism and distribution evolution of the stress are discussed in detail. Both the thickness and the elastic modulus of the silicone rubber have a significant impact on the residual stress, whereas the influence of the curing temperature is less pronounced.
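
    As a rough, first-order illustration of why residual stress should grow with the adhesive modulus and the curing temperature, one can use the textbook biaxial thermal-stress approximation sigma = E / (1 - nu) * delta_alpha * delta_T. This is not the paper's FEM model (it does not capture the adhesive-thickness effect), and all material values in the sketch are made-up placeholders.

```python
# First-order biaxial thermal-stress estimate for a compliant encapsulant layer
# cooling down from the curing temperature; illustrative only, not the FEM model.
def thermal_stress(E, nu, alpha_layer, alpha_substrate, T_cure, T_ambient):
    d_alpha = alpha_layer - alpha_substrate          # CTE mismatch (1/K)
    dT = T_cure - T_ambient                          # cool-down from curing (K)
    return E / (1.0 - nu) * d_alpha * dT             # residual stress (Pa)

# Placeholder numbers: a soft silicone (E = 2 MPa) cured 125 K above ambient.
print(thermal_stress(E=2e6, nu=0.48, alpha_layer=250e-6,
                     alpha_substrate=3e-6, T_cure=150.0, T_ambient=25.0))
```

    The estimate is linear in both the elastic modulus and the temperature difference, which is consistent with the linear trends reported in the abstract.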

    CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution

    Recently, deep convolutional neural network (CNN)-based face super-resolution methods have achieved great progress in restoring degraded facial details by jointly training with facial priors. However, these methods have some obvious limitations. On the one hand, multi-task joint learning requires additional annotation of the dataset, and the introduced prior network significantly increases the computational cost of the model. On the other hand, the limited receptive field of CNNs reduces the fidelity and naturalness of the reconstructed facial images, resulting in suboptimal reconstructions. In this work, we propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face super-resolution, which uses a multi-scale connected encoder-decoder architecture as the backbone. Specifically, we first devise a novel Local-Global Feature Cooperation Module (LGCM), composed of a Facial Structure Attention Unit (FSAU) and a Transformer block, to promote the consistency of local facial detail and global facial structure restoration simultaneously. Then, we design an efficient Local Feature Refinement Module (LFRM) to enhance the local facial structure information. Finally, to further improve the restoration of fine facial details, we present a Multi-scale Feature Fusion Unit (MFFU) to adaptively fuse the features from different stages of the encoder. Comprehensive evaluations on various datasets show that the proposed CTCNet significantly outperforms other state-of-the-art methods. Comment: 12 pages, 10 figures, 8 table
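
    A structural sketch of a local/global dual-branch block in the spirit of LGCM is given below: a convolutional branch stands in for the FSAU, a multi-head self-attention branch provides global context, and the two are fused by addition. The layer sizes and the additive fusion are illustrative assumptions, not CTCNet's design details.

```python
# Dual-branch local/global block sketch (not the paper's LGCM implementation).
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Sequential(                  # local branch (stand-in for FSAU)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)        # global self-attention
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + self.local(x) + glob              # fuse local and global paths

y = LocalGlobalBlock(32)(torch.randn(2, 32, 24, 24))
```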

    Trends in smoking and quitting in China from 1993 to 2003: National Health Service Survey data

    OBJECTIVE: China has about 350 million smokers, most of them men. Using data from National Health Service Surveys conducted in 1993, 1998 and 2003, we (i) estimated trends in smoking prevalence and cessation according to sociodemographic variables and (ii) analysed cessation rates, quitting intentions, reasons for quitting and reasons for relapsing. METHODS: Data were collected from approximately 57 000 households and 200 000 individuals in each survey year. Household members > 15 years of age were interviewed about their smoking habits, quitting intentions and attitudes towards smoking. We present descriptive data stratified by age, sex, income level and rural versus urban residence. FINDINGS: In China, current smoking among those > 15 years old declined from 60% to 49% in men and from 5% to 3.2% in women between 1993 and 2003. The decline was more marked in urban areas. However, heavy smoking (≥ 20 cigarettes daily) increased substantially overall and doubled in men. The average age of uptake also dropped by about 3 years. In 2003, 7.9% of smokers reported intending to quit, and 6% of people who had ever smoked reported having quit. Of former smokers, 40.6% quit because of illness, 26.9% to prevent disease and 10.9% for financial reasons. CONCLUSION: Smoking prevalence declined in China over the study period, perhaps due to the combined effects of smoking cessation, reduced uptake among women and selective mortality among men over 40 years of age. However, heavy smoking increased. People in China rarely quit or intend to quit smoking, except at older ages. Further tobacco control efforts are urgently needed, especially in rural areas.
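
    To put the reported prevalence figures in perspective, they correspond to roughly an 18% relative decline among men and a 36% relative decline among women over the decade:

```python
# Relative decline implied by the prevalence figures reported in the abstract.
men_1993, men_2003 = 60.0, 49.0        # % of men > 15 who currently smoke
women_1993, women_2003 = 5.0, 3.2      # % of women > 15 who currently smoke

def relative_decline(before, after):
    return (before - after) / before * 100

print(relative_decline(men_1993, men_2003))      # ~18.3%
print(relative_decline(women_1993, women_2003))  # ~36.0%
```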

    De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

    Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks. Unlike end-to-end models that need task-specific data, it performs visual processing and reasoning without supervision. However, current visual programming methods generate programs in a single pass for each task and lack the ability to evaluate and optimize them based on feedback, which limits their effectiveness on complex, multi-step problems. Drawing inspiration from Benders decomposition, we introduce De-fine, a general framework that automatically decomposes complex tasks into simpler subtasks and refines programs through auto-feedback. This model-agnostic approach can improve logical reasoning performance by integrating the strengths of multiple models. Our experiments across various visual tasks show that De-fine creates more accurate and robust programs, setting new benchmarks in the field.
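
    A minimal structural sketch of a decompose-then-refine loop is shown below; generate_subprograms, execute, collect_feedback and refine are hypothetical placeholders for an LLM program generator, a program executor and a feedback scorer, and are not part of the paper's released code.

```python
# Decompose a task into subtask programs, then refine them with auto-feedback.
from typing import Callable, List

def decompose_and_refine(task: str,
                         generate_subprograms: Callable[[str], List[str]],
                         execute: Callable[[str], object],
                         collect_feedback: Callable[[str, object], str],
                         refine: Callable[[str, str], str],
                         max_rounds: int = 3) -> List[str]:
    programs = generate_subprograms(task)            # decompose into subtask programs
    for _ in range(max_rounds):
        results = [execute(p) for p in programs]
        feedback = [collect_feedback(p, r) for p, r in zip(programs, results)]
        if all(fb == "ok" for fb in feedback):       # stop once every subprogram passes
            break
        programs = [refine(p, fb) if fb != "ok" else p
                    for p, fb in zip(programs, feedback)]
    return programs
```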

    Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

    Prompt tuning, a recently emerging paradigm, enables powerful vision-language pre-trained models to adapt to downstream tasks in a parameter- and data-efficient way by learning "soft prompts" that condition the frozen pre-trained models. Though effective, it is particularly problematic in the few-shot scenario, where prompt tuning performance is sensitive to the initialization and requires a time-consuming search for a good initialization, restricting the fast-adaptation ability of the pre-trained models. In addition, prompt tuning can undermine the generalizability of the pre-trained models, because the learnable prompt tokens easily overfit the limited training samples. To address these issues, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework that jointly meta-learns an efficient soft-prompt initialization for better adaptation and a lightweight gradient-regulating function for strong cross-domain generalizability, using only the unlabeled image-text pre-training data. Rather than designing a specific prompt tuning method, GRAM can be easily incorporated into various prompt tuning methods in a model-agnostic way, and comprehensive experiments show that GRAM brings consistent improvements in several settings (i.e., few-shot learning, cross-domain generalization, cross-dataset generalization, etc.) over 11 datasets. Further, experiments show that GRAM enables the orthogonal methods of textual and visual prompt tuning to work in a mutually enhancing way, offering better generalizability than uni-modal prompt tuning methods. Comment: Accepted by ICCV 202
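
    The following is a structural sketch of meta-learning a soft-prompt initialization together with a learnable gradient-regulating function. The frozen model, the task data and the simple element-wise regulator are placeholder assumptions for illustration, not the authors' algorithm.

```python
# MAML-style sketch: meta-learn a prompt initialization and a gradient regulator.
import torch

prompt_init = torch.zeros(16, 512, requires_grad=True)   # 16 soft prompt tokens (placeholder size)
grad_scale = torch.ones(512, requires_grad=True)         # simplest possible "regulating" function
meta_opt = torch.optim.Adam([prompt_init, grad_scale], lr=1e-3)

def task_loss(prompt, batch):
    """Placeholder for running a frozen vision-language model with `prompt` on `batch`."""
    return (prompt.sum() - batch) ** 2

for step in range(100):
    support, query = torch.randn(()), torch.randn(())     # stand-in "task" data
    # Inner step: adapt the prompt on the support set with a regulated gradient.
    inner_grad, = torch.autograd.grad(task_loss(prompt_init, support),
                                      prompt_init, create_graph=True)
    adapted = prompt_init - 0.1 * grad_scale * inner_grad
    # Outer step: update the initialization and the regulator on the query set.
    meta_opt.zero_grad()
    task_loss(adapted, query).backward()
    meta_opt.step()
```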