82 research outputs found

    Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty

    Knowledge distillation is an effective paradigm for boosting the performance of pocket-size models, and when multiple teacher models are available the student can break its upper limit again. However, it is not economical to train diverse teacher models for a one-off distillation. In this paper, we introduce a new concept dubbed Avatars for distillation, which are inference ensemble models derived from the teacher. Concretely, (1) for each iteration of distillation training, various Avatars are generated by a perturbation transformation. We validate that Avatars have a higher upper limit of working capacity and teaching ability, helping the student model learn diverse and receptive knowledge perspectives from the teacher model. (2) During distillation, we propose an uncertainty-aware factor, derived from the variance of statistical differences between the vanilla teacher and the Avatars, to adaptively adjust the Avatars' contribution to knowledge transfer. Avatar Knowledge Distillation (AKD) is fundamentally different from existing methods and refines them with the innovative view of unequal training. Comprehensive experiments demonstrate the effectiveness of our Avatars mechanism, which improves state-of-the-art distillation methods for dense prediction without extra computational cost. AKD brings gains of up to 0.7 AP on COCO 2017 for object detection and 1.83 mIoU on Cityscapes for semantic segmentation. Comment: Accepted by ACM MM 202

    DAMO-YOLO : A Report on Real-Time Object Detection Design

    In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new technologies, including Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. Then we investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment, and a distillation schema is introduced to improve performance further. Based on these new techniques, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L, which achieve 43.6/47.7/50.2/51.9 mAP on COCO with latencies of 2.78/3.83/5.62/7.95 ms on T4 GPUs, respectively. Additionally, for edge devices with limited computing power, we propose the lightweight DAMO-YOLO-Ns/Nm/Nl models, which achieve 32.3/38.2/40.5 mAP on COCO with latencies of 4.08/5.05/6.69 ms on X86-CPU. Our proposed general and lightweight models outperform other YOLO series models in their respective application scenarios. Comment: Project Website: https://github.com/tinyvision/damo-yol

    Characterization of severe fever with thrombocytopenia syndrome in rural regions of Zhejiang, China.

    Severe fever with thrombocytopenia syndrome virus (SFTSV) infections have recently been found in rural regions of Zhejiang. A severe fever with thrombocytopenia syndrome (SFTS) surveillance and sero-epidemiological investigation was conducted in the districts with outbreaks. During the study period of 2011-2014, a total of 51 SFTSV infection cases were identified and the case fatality rate was 12% (6/51). Ninety-two percent of the patients (47/51) were over 50 years of age, and 63% (32/51) of laboratory-confirmed cases occurred from May to July. Nine percent (11/120) of the serum samples from local healthy people without symptoms were found to be positive for antibodies to the SFTS virus. SFTSV strains were isolated by culture using Vero cells, and the whole genomes of two SFTSV strains (01 and Zhao) were sequenced and submitted to GenBank. Homology analysis showed that the similarity of the target nucleocapsid gene among SFTSV strains from different geographic areas was 94.2-100%. The constructed phylogenetic tree showed that all the SFTSV strains diverged into two main clusters. Only the SFTSV strains from the Zhejiang (Daishan) region of China and the Yamaguchi and Miyazaki regions of Japan were clustered into lineage II, consistent with both of these regions being isolated areas with similar geographic features. Two out of eight predicted linear B cell epitopes from the nucleocapsid protein showed mutations between the SFTSV strains of different clusters, but these mutations did not contribute to the binding ability of the specific SFTSV antibodies. This study confirmed that SFTSV has been circulating naturally and can cause a seasonal prevalence in Daishan, China. The results also suggest that the molecular characteristics of SFTSV are associated with the geographic region and that all SFTSV strains can be divided into two genotypes.
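
The percentages quoted in this abstract can be recomputed from the counts given in parentheses. The short Python check below is only an arithmetic sanity check on those published counts; the dictionary labels and function names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract: (numerator, denominator, reported percentage).
reported = {
    "case fatality rate": (6, 51, 12),
    "patients over 50 years of age": (47, 51, 92),
    "confirmed cases from May to July": (32, 51, 63),
    "seropositive healthy residents": (11, 120, 9),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


def check_reported(table: dict) -> dict:
    """Map each label to (exact percentage, whether it rounds to the reported value)."""
    results = {}
    for label, (num, den, stated) in table.items():
        exact = percent(num, den)
        results[label] = (exact, round(exact) == stated)
    return results


if __name__ == "__main__":
    for label, (exact, ok) in check_reported(reported).items():
        print(f"{label}: {exact:.1f}% (matches reported: {ok})")
```

All four reported figures agree with the counts once rounded to the nearest whole percent.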

    Deconfounding Causal Inference for Zero-shot Action Recognition

    Zero-shot action recognition (ZSAR) aims to recognize unseen action categories in the test set without corresponding training examples. Most existing zero-shot methods follow the feature generation framework to transfer knowledge from seen action categories to model the feature distribution of unseen categories. However, due to the complexity and diversity of actions, it remains challenging to generate unseen feature distributions, especially in the cross-dataset scenario, where there is potentially a larger domain shift. This paper proposes a Deconfounding Causal GAN (DeCalGAN) for generating unseen action video features with the following technical contributions: 1) Our model unifies compositional ZSAR with traditional visual-semantic models to incorporate local object information with global semantic information for feature generation. 2) A GAN-based architecture is proposed for causal inference and unseen distribution discovery. 3) A deconfounding module is proposed to refine representations of local object and global semantic information confounder in the training data. Action descriptions and random object features after causal inference are then used to discover unseen distributions of novel actions in different datasets. Our extensive experiments on Cross-Dataset Zero-Shot Action Recognition (CD-ZSAR) demonstrate substantial improvement over the UCF101 and HMDB51 standard benchmarks for this problem.

    Learning Accurate Entropy Model with Global Reference for Image Compression

    In recent deep image compression neural networks, the entropy model plays a critical role in estimating the prior distribution of deep image encodings. Existing methods combine a hyperprior with local context in the entropy estimation function. This greatly limits their performance due to the absence of a global view. In this work, we propose a novel Global Reference Model for image compression that effectively leverages both local and global context information, leading to an enhanced compression rate. The proposed method scans decoded latents and then finds the most relevant latent to assist the distribution estimation of the current latent. A by-product of this work is a mean-shifting GDN module that further improves performance. Experimental results demonstrate that the proposed model outperforms most of the state-of-the-art methods in the industry in rate-distortion performance.

    Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space

    Face clustering has attracted rising research interest recently to take advantage of massive amounts of face images on the web. State-of-the-art performance has been achieved by Graph Convolutional Networks (GCN) due to their powerful representation capacity. However, existing GCN-based methods build face graphs mainly according to kNN relations in the feature space, which may lead to a lot of noise edges connecting two faces of different classes. The face features will be polluted when messages pass along these noise edges, thus degrading the performance of GCNs. In this paper, a novel algorithm named Ada-NETS is proposed to cluster faces by constructing clean graphs for GCNs. In Ada-NETS, each face is transformed to a new structure space, obtaining robust features by considering face features of the neighbour images. Then, an adaptive neighbour discovery strategy is proposed to determine a proper number of edges connecting to each face image. It significantly reduces the noise edges while maintaining the good ones to build a graph with clean yet rich edges for GCNs to cluster faces. Experiments on multiple public clustering datasets show that Ada-NETS significantly outperforms current state-of-the-art methods, proving its superiority and generalization. Code is available at https://github.com/damo-cv/Ada-NETS
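
The noise-edge problem described in this abstract can be illustrated with a toy kNN graph. The sketch below is not the Ada-NETS algorithm; it only shows, on made-up 2-D points standing in for face features, how a single fixed k starts linking different identities once k exceeds the size of the smaller identity. The function names and data are assumptions made for illustration.

```python
import math


def knn_edges(features, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours."""
    edges = []
    for i, fi in enumerate(features):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(fi, features[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


def noise_edge_fraction(edges, labels):
    """Fraction of edges whose endpoints carry different identity labels."""
    if not edges:
        return 0.0
    noisy = sum(1 for i, j in edges if labels[i] != labels[j])
    return noisy / len(edges)


# Toy 2-D "face features": identity 0 has three images, identity 1 has two.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0)]
labels = [0, 0, 0, 1, 1]

if __name__ == "__main__":
    for k in (1, 2, 3):
        frac = noise_edge_fraction(knn_edges(features, k), labels)
        print(f"k={k}: noise-edge fraction = {frac:.2f}")
```

In this toy setting no single k is clean for both identities beyond k=1, which is the motivation the abstract gives for choosing the number of edges per face adaptively.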

    Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

    Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task. However, the absence of a systematic benchmark inhibits the design of effective, efficient and economic LLM-based Text-to-SQL solutions. To address this challenge, in this paper we first conduct a systematic and extensive comparison of existing prompt engineering methods, including question representation, example selection and example organization, and use the experimental results to elaborate their pros and cons. Based on these findings, we propose a new integrated solution, named DAIL-SQL, which refreshes the Spider leaderboard with 86.6% execution accuracy and sets a new bar. To explore the potential of open-source LLMs, we investigate them in various scenarios, and further enhance their performance with supervised fine-tuning. Our explorations highlight open-source LLMs' potential in Text-to-SQL, as well as the advantages and disadvantages of supervised fine-tuning. Additionally, towards an efficient and economic LLM-based Text-to-SQL solution, we emphasize token efficiency in prompt engineering and compare prior studies under this metric. We hope that our work provides a deeper understanding of Text-to-SQL with LLMs, and inspires further investigations and broad applications. Comment: We have released code on https://github.com/BeachWang/DAIL-SQ
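
The three prompt-engineering axes named in this abstract (question representation, example selection, example organization) can be pictured with a small few-shot prompt builder. This is a minimal sketch under stated assumptions, not the DAIL-SQL procedure: the word-overlap similarity, the comment-style prompt layout, the example pool, and the whitespace word count used as a crude stand-in for token cost are all hypothetical choices made for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two questions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def select_examples(question, pool, n):
    """Example selection: keep the n pool entries most similar to the question."""
    ranked = sorted(pool, key=lambda ex: jaccard(question, ex["question"]), reverse=True)
    return ranked[:n]


def build_prompt(schema, question, examples):
    """Question representation plus example organization, as plain text."""
    parts = [f"-- Schema: {schema}"]
    for ex in examples:
        parts.append(f"-- Q: {ex['question']}\n{ex['sql']}")
    parts.append(f"-- Q: {question}\nSELECT")
    return "\n".join(parts)


def rough_token_count(prompt: str) -> int:
    """Whitespace word count, a crude proxy for real tokenizer counts."""
    return len(prompt.split())


# Hypothetical example pool and schema, invented for this sketch.
pool = [
    {"question": "How many singers are there", "sql": "SELECT count(*) FROM singer;"},
    {"question": "List the names of all stadiums", "sql": "SELECT name FROM stadium;"},
    {"question": "How many concerts are there", "sql": "SELECT count(*) FROM concert;"},
]
schema = "singer(id, name); stadium(id, name); concert(id, stadium_id)"
question = "How many stadiums are there"

if __name__ == "__main__":
    for n in (1, 2):
        prompt = build_prompt(schema, question, select_examples(question, pool, n))
        print(f"{n} example(s): about {rough_token_count(prompt)} words")
    print(prompt)
```

Adding examples lengthens the prompt, which is the trade-off behind the abstract's emphasis on token efficiency.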

    DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

    Rapid advances in Vision Transformers (ViT) have refreshed the state-of-the-art performance in various vision tasks, overshadowing conventional CNN-based models. This has sparked recent strike-back research in the CNN world showing that pure CNN models can achieve performance as good as ViT models when carefully tuned. While encouraging, designing such high-performance CNN models is challenging, requiring non-trivial prior knowledge of network design. To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way. In DeepMAD, a CNN network is modeled as an information processing system whose expressiveness and effectiveness can be analytically formulated by its structural parameters. Then a constrained mathematical programming (MP) problem is proposed to optimize these structural parameters. The MP problem can be easily solved by off-the-shelf MP solvers on CPUs with a small memory footprint. In addition, DeepMAD is a pure mathematical framework: no GPU or training data is required during network design. The superiority of DeepMAD is validated on multiple large-scale computer vision benchmark datasets. Notably on ImageNet-1k, using only conventional convolutional layers, DeepMAD achieves 0.7% and 1.5% higher top-1 accuracy than ConvNeXt and Swin at the Tiny level, and 0.8% and 0.9% higher at the Small level. Comment: Accepted by CVPR 202

    PTGES2 and RNASET2 identified as novel potential biomarkers and therapeutic targets for basal cell carcinoma: insights from proteome-wide mendelian randomization, colocalization, and MR-PheWAS analyses

    Introduction: Basal cell carcinoma (BCC) is the most common skin cancer, lacking reliable biomarkers or therapeutic targets for effective treatment. Genome-wide association studies (GWAS) can aid in identifying drug targets, repurposing existing drugs, predicting clinical trial side effects, and reclassifying patients in clinical utility. Hence, the present study investigates the association between plasma proteins and skin cancer to identify effective biomarkers and therapeutic targets for BCC. Methods: Proteome-wide Mendelian randomization was performed using inverse-variance-weighted and Wald ratio methods, leveraging 1 Mb cis protein quantitative trait loci (cis-pQTLs) in the UK Biobank Pharma Proteomics Project (UKB-PPP) and the deCODE Health Study, to determine the causal relationship between plasma proteins and skin cancer and its subtypes in the FinnGen R10 study and the SAIGE database of Lee lab. Significant association with skin cancer and its subtypes was defined as a false discovery rate (FDR) < 0.05. pQTL to GWAS colocalization analysis was executed using a Bayesian model to evaluate five exclusive hypotheses. Strong colocalization evidence was defined as a posterior probability for shared causal variants (PP.H4) of ≥0.85. Mendelian randomization-Phenome-wide association studies (MR-PheWAS) were used to evaluate potential biomarkers and therapeutic targets for skin cancer and its subtypes within a phenome-wide human disease category. Results: PTGES2, RNASET2, SF3B4, STX8, ENO2, and HS3ST3B1 (besides RNASET2, the five other plasma proteins were previously unknown in expression quantitative trait loci (eQTL) and methylation quantitative trait loci (mQTL)) were significantly associated with BCC after FDR correction in the UKB-PPP and deCODE studies. Reverse MR showed no association between BCC and these proteins. PTGES2 and RNASET2 exhibited strong evidence of colocalization with BCC based on a posterior probability PP.H4 >0.92. Furthermore, MR-PheWAS analysis showed that BCC was the most significant phenotype associated with PTGES2 and RNASET2 among 2,408 phenotypes in the FinnGen R10 study. Therefore, PTGES2 and RNASET2 are highlighted as effective biomarkers and therapeutic targets for BCC within the phenome-wide human disease category. Conclusion: The study identifies PTGES2 and RNASET2 plasma proteins as novel, reliable biomarkers and therapeutic targets for BCC, suggesting more effective clinical application strategies for patients.
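
The two estimators named in the Methods, the Wald ratio and the inverse-variance-weighted (IVW) estimate, have standard textbook forms that can be sketched briefly. The code below uses the usual first-order standard error for the Wald ratio and the fixed-effect IVW formula; whether the study used exactly these variants is an assumption, and the effect sizes in the example are invented numbers, not data from the paper.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect inverse-variance-weighted estimate across instruments."""
    # Each instrument's ratio estimate is weighted by beta_exposure^2 / se_outcome^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(betas_exposure, ses_outcome)]
    ratios = [by / bx for bx, by in zip(betas_exposure, betas_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


# Invented per-variant effects: pQTL effect on protein level, and effect on outcome.
bx = [0.2, 0.4]
by = [0.1, 0.2]
se_y = [0.05, 0.05]

if __name__ == "__main__":
    print("Wald ratio (first variant):", wald_ratio(bx[0], by[0], se_y[0]))
    print("IVW (both variants):", ivw(bx, by, se_y))
```

With a single instrument the IVW formula reduces to the Wald ratio, which is why the two methods are typically paired: Wald for proteins with one cis-pQTL and IVW for proteins with several.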