Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty
Knowledge distillation is an effective paradigm for boosting the performance
of pocket-size models; when multiple teacher models are available, the student
can push past its upper limit once again. However, it is not economical to
train diverse teacher models for a one-off distillation. In this paper, we
introduce a new concept for distillation dubbed Avatars: inference ensemble
models derived from the teacher. Concretely, (1) in each iteration of
distillation training, various Avatars are generated by a perturbation
transformation. We validate that Avatars have a higher upper limit of working
capacity and teaching ability, helping the student model learn diverse and
receptive knowledge perspectives from the teacher model. (2) During
distillation, we propose an uncertainty-aware factor, derived from the variance
of statistical differences between the vanilla teacher and the Avatars, to
adaptively adjust the Avatars' contribution to knowledge transfer. Avatar
Knowledge Distillation (AKD) is fundamentally different from existing methods
and is built on the new view of unequal training. Comprehensive experiments
demonstrate the effectiveness of our Avatar mechanism, which improves
state-of-the-art distillation methods for dense prediction without extra
computational cost. AKD brings gains of up to 0.7 AP on COCO 2017 for object
detection and 1.83 mIoU on Cityscapes for semantic segmentation.
Comment: Accepted by ACM MM 202
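As a rough illustration of points (1) and (2), the sketch below builds Avatars from a teacher feature vector and gives them less weight when they disagree with the teacher in an inconsistent way. It is a minimal sketch under stated assumptions, not the authors' implementation: the dropout-style perturbation, the `1 / (1 + variance)` weighting, the MSE losses, and all function names (`make_avatars`, `uncertainty_weight`, `akd_loss`) are hypothetical choices made for illustration.

```python
import random
import statistics


def make_avatars(teacher_feat, num_avatars=4, drop_prob=0.1, seed=0):
    """Build perturbed copies ("Avatars") of a teacher feature vector.

    Assumption: the perturbation is dropout-style masking with rescaling;
    the paper's actual perturbation transformation may differ.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    return [
        [x / keep if rng.random() < keep else 0.0 for x in teacher_feat]
        for _ in range(num_avatars)
    ]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def uncertainty_weight(teacher_feat, avatars):
    """Hypothetical uncertainty factor: the variance of the teacher-Avatar
    discrepancies, mapped into (0, 1] so that more spread among the
    Avatars gives them less weight."""
    diffs = [mse(teacher_feat, av) for av in avatars]
    return 1.0 / (1.0 + statistics.pvariance(diffs))


def akd_loss(student_feat, teacher_feat, avatars):
    """Vanilla feature-distillation loss plus an uncertainty-weighted Avatar term."""
    base = mse(student_feat, teacher_feat)
    avatar_term = sum(mse(student_feat, av) for av in avatars) / len(avatars)
    return base + uncertainty_weight(teacher_feat, avatars) * avatar_term


teacher = [0.5, -1.2, 0.8, 2.0]
student = [0.4, -1.0, 0.9, 1.7]
avatars = make_avatars(teacher)
loss = akd_loss(student, teacher, avatars)
```

In a real detector the vectors would be feature maps and the Avatars would be regenerated at every training iteration; plain Python lists are used here only to keep the example self-contained.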
DAMO-YOLO : A Report on Real-Time Object Detection Design
In this report, we present a fast and accurate object detection method dubbed
DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO
series. DAMO-YOLO extends YOLO with several new techniques, including
Neural Architecture Search (NAS), efficient Reparameterized Generalized-FPN
(RepGFPN), a lightweight head with AlignedOTA label assignment, and
distillation enhancement. In particular, we use MAE-NAS, a method guided by the
principle of maximum entropy, to search our detection backbone under the
constraints of low latency and high performance, producing ResNet/CSP-like
structures with spatial pyramid pooling and focus modules. In the design of
necks and heads, we follow the rule of "large neck, small head". We import
Generalized-FPN with accelerated queen-fusion to build the detector neck and
upgrade its CSPNet with efficient layer aggregation networks (ELAN) and
reparameterization. Then we investigate how detector head size affects
detection performance and find that a heavy neck with only one task projection
layer yields better results. In addition, AlignedOTA is proposed to solve
the misalignment problem in label assignment, and a distillation scheme is
introduced to further improve performance. Based on these new techniques,
we build a suite of models at various scales to meet the needs of different
scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L,
which achieve 43.6/47.7/50.2/51.9 mAP on COCO with latencies of
2.78/3.83/5.62/7.95 ms on T4 GPUs, respectively. Additionally, for edge devices
with limited computing power, we propose the DAMO-YOLO-Ns/Nm/Nl
lightweight models, which achieve 32.3/38.2/40.5 mAP on COCO with latencies of
4.08/5.05/6.69 ms on an x86 CPU. Our general and lightweight
models outperform other YOLO series models in their respective
application scenarios.
Comment: Project Website: https://github.com/tinyvision/damo-yol
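The reparameterization mentioned above relies on the linearity of convolution: parallel branches used during training can be folded into a single kernel for inference. The sketch below shows this general (RepVGG-style) idea in one dimension. It is not DAMO-YOLO's code, it ignores batch normalization, and the helpers `conv1d_same` and `fuse_branches` are hypothetical.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.
    Assumes an odd kernel length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def fuse_branches(k3, k1, identity=True):
    """Fold a parallel 1-tap branch (and optionally an identity shortcut) into
    a single 3-tap kernel by adding them at the center tap."""
    assert len(k3) == 3 and len(k1) == 1
    center = k3[1] + k1[0] + (1.0 if identity else 0.0)
    return [k3[0], center, k3[2]]


x = [1.0, 2.0, -1.0, 0.5, 3.0]
k3, k1 = [0.2, -0.5, 0.1], [0.7]

# Training-time form: three parallel branches whose outputs are summed.
multi_branch = [
    a + b + c for a, b, c in zip(conv1d_same(x, k3), conv1d_same(x, k1), x)
]
# Inference-time form: one convolution with the fused kernel.
fused = conv1d_same(x, fuse_branches(k3, k1))
```

Both forms produce the same output up to floating-point rounding, which is why the multi-branch structure costs nothing extra at inference time.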
Characterization of severe fever with thrombocytopenia syndrome in rural regions of Zhejiang, China.
Severe fever with thrombocytopenia syndrome virus (SFTSV) infections have recently been found in rural regions of Zhejiang. Severe fever with thrombocytopenia syndrome (SFTS) surveillance and a sero-epidemiological investigation were conducted in the districts with outbreaks. During the study period of 2011-2014, a total of 51 SFTSV infection cases were identified, with a case fatality rate of 12% (6/51). Ninety-two percent of the patients (47/51) were over 50 years of age, and 63% (32/51) of laboratory-confirmed cases occurred from May to July. Nine percent (11/120) of serum samples from healthy local people without symptoms were positive for antibodies to SFTSV. SFTSV strains were isolated by culture in Vero cells, and the whole genomes of two SFTSV strains (01 and Zhao) were sequenced and submitted to GenBank. Homology analysis showed that the similarity of the target nucleocapsid gene among SFTSV strains from different geographic areas was 94.2-100%. The constructed phylogenetic tree showed that all SFTSV strains diverged into two main clusters. Only the SFTSV strains from the Daishan region of Zhejiang, China, and the Yamaguchi and Miyazaki regions of Japan clustered into lineage II, consistent with both of these regions being isolated areas with similar geographic features. Two of eight predicted linear B-cell epitopes in the nucleocapsid protein showed mutations between SFTSV strains of different clusters, but these mutations did not affect the binding ability of specific SFTSV antibodies. This study confirmed that SFTSV has been circulating naturally and can cause seasonal prevalence in Daishan, China. The results also suggest that the molecular characteristics of SFTSV are associated with geographic region and that all SFTSV strains can be divided into two genotypes.
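Homology figures such as the 94.2-100% range above are usually pairwise identity scores computed on aligned sequences. The abstract does not state how identity was calculated, so the function below is only a sketch of one common convention (gap columns skipped, case ignored); `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns, skipping any column where either
    sequence has a gap ('-'). Assumes the two sequences are already aligned."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned fragments (not real SFTSV sequences).
score = percent_identity("ATGC-A", "ATGTTA")
```

Other conventions (for example, counting gaps as mismatches) give lower values on the same alignment, so identity percentages are only comparable when computed the same way.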
Deconfounding Causal Inference for Zero-shot Action Recognition
Zero-shot action recognition (ZSAR) aims to recognize unseen action categories in the test set without corresponding training examples. Most existing zero-shot methods follow the feature generation framework, transferring knowledge from seen action categories to model the feature distribution of unseen categories. However, due to the complexity and diversity of actions, it remains challenging to generate unseen feature distributions, especially in the cross-dataset scenario, where the domain shift is potentially larger. This paper proposes a Deconfounding Causal GAN (DeCalGAN) for generating unseen action video features, with the following technical contributions: 1) Our model unifies compositional ZSAR with traditional visual-semantic models to incorporate local object information with global semantic information for feature generation. 2) A GAN-based architecture is proposed for causal inference and unseen distribution discovery. 3) A deconfounding module is proposed to refine the representations of local object and global semantic confounders in the training data. Action descriptions and random object features after causal inference are then used to discover unseen distributions of novel actions in different datasets. Our extensive experiments on Cross-Dataset Zero-Shot Action Recognition (CD-ZSAR) demonstrate substantial improvements on the standard UCF101 and HMDB51 benchmarks for this problem.
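For readers unfamiliar with deconfounding, the textbook backdoor adjustment computes P(Y | do(X)) by averaging over a confounder Z instead of conditioning on X alone. The sketch below shows that general formula on a toy discrete distribution. It is background only, not DeCalGAN's deconfounding module, which per the abstract refines learned representations within a GAN-based architecture; the toy probabilities and function names are hypothetical.

```python
from collections import defaultdict


def backdoor_adjust(joint, x, y):
    """P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) * P(Z=z).

    `joint` maps (z, x, y) tuples to probabilities that sum to 1.
    """
    p_z = defaultdict(float)
    p_zx = defaultdict(float)
    for (z, xx, _), p in joint.items():
        p_z[z] += p
        p_zx[(z, xx)] += p
    total = 0.0
    for z, pz in p_z.items():
        if p_zx[(z, x)] == 0.0:
            raise ValueError(f"no data for X={x} when Z={z} (positivity violated)")
        total += joint.get((z, x, y), 0.0) / p_zx[(z, x)] * pz
    return total


def naive_conditional(joint, x, y):
    """Plain P(Y=y | X=x), which still mixes in the confounder's influence."""
    p_x = sum(p for (_, xx, _), p in joint.items() if xx == x)
    p_xy = sum(p for (_, xx, yy), p in joint.items() if xx == x and yy == y)
    return p_xy / p_x


# Toy joint distribution over (z, x, y): Z raises both P(X=1) and P(Y=1).
joint = {
    (0, 1, 1): 0.03, (0, 1, 0): 0.07, (0, 0, 1): 0.04, (0, 0, 0): 0.36,
    (1, 1, 1): 0.32, (1, 1, 0): 0.08, (1, 0, 1): 0.06, (1, 0, 0): 0.04,
}
causal = backdoor_adjust(joint, x=1, y=1)
observed = naive_conditional(joint, x=1, y=1)
```

Here the naive conditional overstates the effect of X (about 0.70) compared with the adjusted value (about 0.55), because Z pushes X and Y up together.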
Learning Accurate Entropy Model with Global Reference for Image Compression
In recent deep image compression neural networks, the entropy model plays a
critical role in estimating the prior distribution of deep image encodings.
Existing methods combine a hyperprior with local context in the entropy
estimation function, which greatly limits their performance because they lack
a global view. In this work, we propose a novel Global Reference Model for
image compression that effectively leverages both local and global context
information, leading to improved compression. The proposed method scans
decoded latents and then finds the most relevant latent to assist the
distribution estimation of the current latent. A by-product of this work is
a new mean-shifting GDN module that further improves performance.
Experimental results demonstrate that the proposed model outperforms most
state-of-the-art methods in the industry in rate-distortion performance.
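A learned entropy model assigns each quantized latent a probability, and the coding cost is roughly -log2 of that probability, so a better-predicted distribution means fewer bits. The sketch below illustrates that idea together with a naive global reference search. The cosine-similarity search over context vectors, the use of the reference latent's value as the Gaussian mean, and all names are hypothetical simplifications, not the paper's learned Global Reference Model.

```python
import math


def gaussian_cdf(v, mu, sigma):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def bits_for_symbol(y_hat, mu, sigma):
    """Estimated bits to code an integer latent under a discretized Gaussian
    N(mu, sigma^2): -log2 of the probability mass on [y_hat - 0.5, y_hat + 0.5]."""
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def find_global_reference(decoded_contexts, current_context):
    """Hypothetical global search: index of the already-decoded latent whose
    context vector is most similar to the current latent's context."""
    scores = [cosine(c, current_context) for c in decoded_contexts]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: integer latents already decoded, each with a small context vector.
decoded_values = [3, -1, 0, 3]
decoded_contexts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
current_context = [0.9, 0.1]

ref = find_global_reference(decoded_contexts, current_context)
# Suppose the current latent's value is 3: compare a local-only prior
# (mean 0) with a prior centered on the selected reference latent.
bits_local = bits_for_symbol(3, mu=0.0, sigma=1.0)
bits_ref = bits_for_symbol(3, mu=float(decoded_values[ref]), sigma=1.0)
```

When the reference is informative, centering the distribution on it concentrates probability mass on the true value and lowers the estimated rate.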
Ada-NETS: Face Clustering via Adaptive Neighbour Discovery in the Structure Space
Face clustering has recently attracted growing research interest as a way to
take advantage of the massive amounts of face images on the web. State-of-the-art
performance has been achieved by Graph Convolutional Networks (GCN) due to
their powerful representation capacity. However, existing GCN-based methods
build face graphs mainly from kNN relations in the feature space, which
may produce many noisy edges connecting faces of different classes. The
face features are polluted when messages pass along these noisy edges, thus
degrading the performance of GCNs. In this paper, a novel algorithm named
Ada-NETS is proposed to cluster faces by constructing clean graphs for GCNs. In
Ada-NETS, each face is transformed to a new structure space, obtaining robust
features by considering face features of the neighbour images. Then, an
adaptive neighbour discovery strategy is proposed to determine a proper number
of edges connecting to each face image. This significantly reduces noisy
edges while keeping the good ones, yielding a graph with clean yet rich
edges for GCNs to cluster faces. Experiments on multiple public clustering
datasets show that Ada-NETS significantly outperforms current state-of-the-art
methods, demonstrating its superiority and ability to generalize. Code is available at
https://github.com/damo-cv/Ada-NETS
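The sketch below imitates the two ideas in a much simplified form: a kNN graph is built, then each edge is kept only if its endpoints have overlapping neighbourhoods, so the number of edges per face varies instead of being fixed at k. Using the Jaccard overlap of kNN sets as the "structure space" similarity and a fixed threshold as the cut-off are assumptions for illustration; Ada-NETS instead uses its own adaptive neighbour discovery strategy to choose how many edges to keep, and the helper names are hypothetical.

```python
import math


def knn_sets(points, k):
    """For each point, the set of its k nearest neighbours plus itself."""
    sets = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        sets.append(set(order[: k + 1]))
    return sets


def structure_similarity(sets, i, j):
    """Jaccard overlap of two points' neighbourhoods (a simple structure-space proxy)."""
    return len(sets[i] & sets[j]) / len(sets[i] | sets[j])


def adaptive_edges(points, k=3, threshold=0.7):
    """Keep a kNN edge only if both endpoints share enough neighbours, so the
    number of retained edges per node adapts to the local structure."""
    sets = knn_sets(points, k)
    edges = set()
    for i, nbrs in enumerate(sets):
        for j in nbrs:
            if j != i and structure_similarity(sets, i, j) >= threshold:
                edges.add((min(i, j), max(i, j)))
    return edges


# Two tight clusters (indices 0-3 and 4-7) plus an ambiguous point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
edges = adaptive_edges(points)
```

Point 8 sits between the clusters: plain kNN links it to three faces of the first cluster, but its neighbourhood overlap with each of them is 3/5 = 0.6, below the threshold, so those edges are dropped while all within-cluster edges survive.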
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL
task. However, the absence of a systematic benchmark inhibits the development
of effective, efficient and economical LLM-based Text-to-SQL solutions.
To address this challenge, in this paper, we first conduct a systematic and
extensive comparison of existing prompt engineering methods, including
question representation, example selection and example organization, and with
these experimental results, we elaborate on their pros and cons. Based on these
findings, we propose a new integrated solution, named DAIL-SQL, which refreshes
the Spider leaderboard with 86.6% execution accuracy and sets a new bar. To
explore the potential of open-source LLMs, we investigate them in various
scenarios, and further enhance their performance with supervised fine-tuning.
Our explorations highlight open-source LLMs' potential in Text-to-SQL, as well
as the advantages and disadvantages of the supervised fine-tuning.
Additionally, toward an efficient and economical LLM-based Text-to-SQL solution,
we emphasize the token efficiency in prompt engineering and compare the prior
studies under this metric. We hope that our work provides a deeper
understanding of Text-to-SQL with LLMs, and inspires further investigations and
broad applications.
Comment: We have released code at https://github.com/BeachWang/DAIL-SQ
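To make the prompt-engineering dimensions above concrete (question representation, example selection, example organization), the sketch below ranks candidate examples by word overlap with the target question and lays them out as question/SQL pairs under an SQL-style schema. The token-overlap ranking, the comment-style prompt layout, and the toy `singer` schema are hypothetical stand-ins; DAIL-SQL's actual selection and organization strategies are described in the paper and differ in detail.

```python
import re


def tokens(text):
    """Lower-cased word tokens used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_examples(question, pool, k=2):
    """Return the k (question, sql) pairs whose questions overlap most with `question`."""
    return sorted(pool, key=lambda ex: jaccard(question, ex[0]), reverse=True)[:k]


def build_prompt(schema, question, examples):
    """Schema as CREATE TABLE lines, then example pairs, then the target question."""
    lines = ["/* Given the following database schema: */"]
    for table, cols in schema.items():
        lines.append(f"CREATE TABLE {table} ({', '.join(cols)});")
    for q, sql in examples:
        lines.append(f"/* Answer the following: {q} */")
        lines.append(sql)
    lines.append(f"/* Answer the following: {question} */")
    lines.append("SELECT")  # the model continues the query from here
    return "\n".join(lines)


schema = {"singer": ["singer_id", "name", "age", "country"]}
pool = [
    ("How many singers are there?", "SELECT count(*) FROM singer;"),
    ("What is the average age of all singers?", "SELECT avg(age) FROM singer;"),
    ("List the names of singers from France.",
     "SELECT name FROM singer WHERE country = 'France';"),
]
question = "How many singers are from France?"
chosen = select_examples(question, pool, k=1)
prompt = build_prompt(schema, question, chosen)
```

Since the work stresses token efficiency, a practical version would also count prompt tokens with the target model's own tokenizer and trim examples to a budget.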
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
The rapid advances in Vision Transformers (ViTs) have set new state-of-the-art
results in various vision tasks, overshadowing conventional CNN-based
models. This has sparked a few recent "striking-back" studies in the CNN world
showing that pure CNN models can perform as well as ViT models when
carefully tuned. While encouraging, designing such high-performance CNN models
is challenging, requiring non-trivial prior knowledge of network design. To
this end, a novel framework termed Mathematical Architecture Design for Deep
CNN (DeepMAD) is proposed to design high-performance CNN models in a principled
way. In DeepMAD, a CNN is modeled as an information processing system
whose expressiveness and effectiveness can be analytically formulated by its
structural parameters. Then a constrained mathematical programming (MP) problem
is proposed to optimize these structural parameters. The MP problem can be
easily solved by off-the-shelf MP solvers on CPUs with a small memory
footprint. In addition, DeepMAD is a pure mathematical framework: no GPU or
training data is required during network design. The superiority of DeepMAD is
validated on multiple large-scale computer vision benchmark datasets. Notably,
on ImageNet-1k, using only conventional convolutional layers, DeepMAD achieves
0.7% and 1.5% higher top-1 accuracy than ConvNeXt and Swin, respectively, at the
Tiny level, and 0.8% and 0.9% higher at the Small level.
Comment: Accepted by CVPR 202
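The sketch below mimics the DeepMAD workflow at toy scale: structural parameters (per-stage depth and width) are chosen to maximize an analytic score under a compute budget, with no training data or GPU involved. The score (a sum of log widths), the FLOPs proxy, the width-ordering constraint, and the brute-force search standing in for an off-the-shelf MP solver are all hypothetical; DeepMAD's actual expressiveness and effectiveness formulas are defined in the paper.

```python
import itertools
import math

DEPTHS = [1, 2, 3, 4]
WIDTHS = [32, 64, 128, 256]


def flops_proxy(config, resolution=32, kernel=3, in_channels=3):
    """Rough multiply-accumulate count for a plain stack of k x k convolutions.
    `config` is a sequence of (depth, width) stages; each stage halves the resolution."""
    total, ch, res = 0, in_channels, resolution
    for depth, width in config:
        for _ in range(depth):
            total += res * res * kernel * kernel * ch * width
            ch = width
        res //= 2
    return total


def expressiveness_proxy(config):
    """Hypothetical score (not DeepMAD's formula): sum of log(width) over all layers."""
    return sum(depth * math.log(width) for depth, width in config)


def design(num_stages, budget):
    """Exhaustively maximize the proxy score under a FLOPs budget, requiring
    widths to be non-decreasing across stages."""
    best, best_score = None, -math.inf
    stage_choices = list(itertools.product(DEPTHS, WIDTHS))
    for config in itertools.product(stage_choices, repeat=num_stages):
        widths = [w for _, w in config]
        if any(b < a for a, b in zip(widths, widths[1:])):
            continue
        if flops_proxy(config) > budget:
            continue
        score = expressiveness_proxy(config)
        if score > best_score:
            best, best_score = list(config), score
    return best, best_score


small, small_score = design(3, budget=20_000_000)
large, large_score = design(3, budget=80_000_000)
```

Exhaustive search is only feasible because the toy space has 4,096 candidates; at realistic scale the same kind of objective and constraints would be handed to a mathematical programming solver, as the abstract describes.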
PTGES2 and RNASET2 identified as novel potential biomarkers and therapeutic targets for basal cell carcinoma: insights from proteome-wide mendelian randomization, colocalization, and MR-PheWAS analyses
Introduction: Basal cell carcinoma (BCC) is the most common skin cancer, yet it lacks reliable biomarkers and therapeutic targets for effective treatment. Genome-wide association studies (GWAS) can aid in identifying drug targets, repurposing existing drugs, predicting clinical trial side effects, and reclassifying patients for clinical use. Hence, the present study investigates the association between plasma proteins and skin cancer to identify effective biomarkers and therapeutic targets for BCC.
Methods: Proteome-wide Mendelian randomization (MR) was performed using the inverse-variance weighted (IVW) and Wald ratio methods, leveraging cis protein quantitative trait loci (cis-pQTLs) within 1 Mb in the UK Biobank Pharma Proteomics Project (UKB-PPP) and the deCODE Health Study, to determine the causal relationship between plasma proteins and skin cancer and its subtypes in the FinnGen R10 study and the Lee lab's SAIGE database. Significant association with skin cancer and its subtypes was defined as a false discovery rate (FDR) < 0.05. pQTL-to-GWAS colocalization analysis was performed using a Bayesian model to evaluate five mutually exclusive hypotheses; strong colocalization evidence was defined as a posterior probability for a shared causal variant (PP.H4) ≥ 0.85. Mendelian randomization phenome-wide association studies (MR-PheWAS) were used to evaluate potential biomarkers and therapeutic targets for skin cancer and its subtypes within the phenome-wide human disease category.
Results: PTGES2, RNASET2, SF3B4, STX8, ENO2, and HS3ST3B1 were significantly associated with BCC after FDR correction in the UKB-PPP and deCODE studies (apart from RNASET2, these plasma proteins had not previously been reported in expression quantitative trait loci (eQTL) or methylation quantitative trait loci (mQTL) analyses). Reverse MR showed no association between BCC and these proteins. PTGES2 and RNASET2 showed strong evidence of colocalization with BCC (PP.H4 > 0.92). Furthermore, MR-PheWAS analysis showed that BCC was the phenotype most significantly associated with PTGES2 and RNASET2 among 2,408 phenotypes in the FinnGen R10 study. PTGES2 and RNASET2 are therefore highlighted as effective biomarkers and therapeutic targets for BCC within the phenome-wide human disease category.
Conclusion: The study identifies the plasma proteins PTGES2 and RNASET2 as novel, reliable biomarkers and therapeutic targets for BCC, suggesting more effective clinical application strategies for patients.
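The MR quantities named in the Methods have standard first-order forms: the Wald ratio divides a variant's effect on the outcome by its effect on the exposure, and the fixed-effect IVW estimate combines several such ratios weighted by their precision. The sketch below implements those textbook formulas plus Benjamini-Hochberg FDR adjustment. The function names and toy numbers are hypothetical, and the abstract does not describe the software or settings the study itself used.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant MR estimate with its first-order standard error."""
    est = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return est, se


def ivw(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect inverse-variance weighted MR estimate over several variants:
    sum(bx * by / se^2) / sum(bx^2 / se^2), with SE = 1 / sqrt(sum(bx^2 / se^2))."""
    weights = [bx * bx / (se * se) for bx, se in zip(betas_exposure, ses_outcome)]
    ratios = [by / bx for bx, by in zip(betas_exposure, betas_outcome)]
    total_w = sum(weights)
    est = sum(w * r for w, r in zip(weights, ratios)) / total_w
    return est, 1.0 / math.sqrt(total_w)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (q-values), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


single = wald_ratio(0.5, 0.1, 0.02)
pooled = ivw([0.5], [0.1], [0.02])  # one variant: IVW reduces to the Wald ratio
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
```

Proteins would then be called significant when their adjusted value falls below 0.05, matching the FDR threshold stated in the Methods.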
- …