A Zero-/Few-Shot Anomaly Classification and Segmentation Method for CVPR 2023 VAND Workshop Challenge Tracks 1&2: 1st Place on Zero-shot AD and 4th Place on Few-shot AD
In this technical report, we briefly introduce our solution for the
Zero/Few-shot Track of the Visual Anomaly and Novelty Detection (VAND) 2023
Challenge. For industrial visual inspection, building a single model that can
be rapidly adapted to numerous categories with no, or only a few, normal
reference images is a promising research direction, primarily because of the
vast variety of product types. For the zero-shot track, we propose a solution
based on the CLIP model that adds extra linear layers. These layers map the
image features into the joint embedding space so that they can be compared
with the text features to generate the anomaly maps. In addition, when
reference images are available, we use multiple memory banks to store their
features and compare them with the features of the test images at test time.
In this challenge, our method achieved first place in the zero-shot track,
excelling in particular at segmentation with an F1-score improvement of 0.0489
over the second-ranked participant. In the few-shot track, we secured fourth
place overall, and our classification F1 score of 0.8687 ranked first among
all participating teams.
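The zero-shot idea above — projecting image features through extra linear layers into the joint embedding space and comparing them with text features — can be sketched as follows. This is an illustrative sketch, not the authors' code: the feature dimensions, the patch grid, the two-prompt ("normal"/"anomalous") setup, and the random stand-in tensors are all assumptions.

```python
# Hedged sketch: zero-shot anomaly map from CLIP-style patch features.
# All shapes, weights, and the 2-class prompt setup are illustrative
# assumptions, not the challenge submission's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

D_VIS, D_JOINT, H, W = 768, 512, 14, 14     # assumed feature dims / patch grid

# Stand-ins for frozen CLIP outputs.
patch_feats = rng.normal(size=(H * W, D_VIS))   # intermediate image (patch) features
text_feats = rng.normal(size=(2, D_JOINT))      # ["normal", "anomalous"] prompt features

# The extra trainable linear layer mapping image features into the joint
# image-text embedding space (here: random, untrained weights).
W_proj = rng.normal(size=(D_VIS, D_JOINT)) / np.sqrt(D_VIS)
b_proj = np.zeros(D_JOINT)

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def anomaly_map(patch_feats):
    """Project patches to the joint space, then softmax over prompt similarities."""
    z = l2norm(patch_feats @ W_proj + b_proj)    # (HW, D_JOINT)
    t = l2norm(text_feats)                       # (2, D_JOINT)
    logits = 100.0 * z @ t.T                     # CLIP-style temperature scaling
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs[:, 1].reshape(H, W)             # P(anomalous) per patch

amap = anomaly_map(patch_feats)
print(amap.shape)   # (14, 14)
```

In the few-shot setting, the memory banks mentioned in the abstract would additionally store reference-image features, and a test patch's distance to its nearest stored neighbor would contribute to the score; that part is omitted here for brevity.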
Transmission of H7N9 influenza virus in mice by different infective routes.
Background: On 19 February 2013, the first patient infected with a novel influenza A H7N9 virus of avian origin showed symptoms of illness. More than 349 laboratory-confirmed cases and 109 deaths have been reported in mainland China since then. Laboratory-confirmed human-to-human H7N9 virus transmission has not been documented between individuals in close contact; however, this transmission route could not be excluded for three families. To control the spread of the avian influenza H7N9 virus, we must better understand its pathogenesis, transmissibility, and transmission routes in mammals. Studies have shown that this particular virus is transmitted by aerosols among ferrets.
Methods: To study potential transmission routes in animals with direct or close contact to other animals, we investigated these factors in a murine model.
Results: Viable H7N9 avian influenza virus was detected in the upper and lower respiratory tracts, intestine, and brain of model mice. The virus was transmissible between mice in close contact, with higher concentrations of virus found in pharyngeal and ocular secretions and in feces. All these biological materials were contagious for naïve mice.
Conclusions: Our results suggest that possible transmission routes for the H7N9 influenza virus are through mucosal secretions and feces.
Learning Global-aware Kernel for Image Harmonization
Image harmonization aims to solve the visual inconsistency problem in
composited images by adaptively adjusting the foreground pixels with the
background as references. Existing methods employ local color transformation or
region matching between the foreground and background, which neglects the
powerful proximity prior and treats the foreground and background
independently as whole parts during harmonization. As a result, they still
show limited performance across varied foreground objects and scenes. To
address this issue, we propose
a novel Global-aware Kernel Network (GKNet) to harmonize local regions with
comprehensive consideration of long-distance background references.
Specifically, GKNet includes two parts, i.e., harmony kernel prediction and
harmony kernel modulation branches. The former includes a Long-distance
Reference Extractor (LRE) to obtain long-distance context and Kernel Prediction
Blocks (KPB) to predict multi-level harmony kernels by fusing global
information with local features. To achieve this goal, a novel Selective
Correlation Fusion (SCF) module is proposed to better select relevant
long-distance background references for local harmonization. The latter employs
the predicted kernels to harmonize foreground regions with both local and
global awareness. Extensive experiments demonstrate the superiority of our
method for image harmonization over state-of-the-art methods, e.g., achieving
39.53 dB PSNR, surpassing the best counterpart by +0.78 dB, and decreasing
fMSE/MSE by 11.5%/6.7% compared with the SoTA method. Code will be available
at https://github.com/XintianShen/GKNet.
Comment: 10 pages, 10 figures
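The "harmony kernel modulation" branch described above can be illustrated with a minimal sketch of spatially-varying filtering: instead of one shared convolution kernel, each output location is filtered by its own predicted kernel. The shapes, the single-channel simplification, and the random stand-in kernels are illustrative assumptions, not GKNet's actual architecture.

```python
# Hedged sketch: per-pixel dynamic filtering ("kernel modulation").
# A real network would predict `kernels` from global + local features;
# here they are random stand-ins, and features are single-channel.
import numpy as np

rng = np.random.default_rng(1)
H, W, K = 8, 8, 3                        # feature map size, kernel size
feat = rng.normal(size=(H, W))           # single-channel feature map
kernels = rng.normal(size=(H, W, K, K))  # one K x K kernel predicted per pixel

def modulate(feat, kernels):
    """Each output pixel is the weighted sum of its K x K neighborhood,
    using that pixel's own predicted kernel (edge-padded borders)."""
    pad = K // 2
    fp = np.pad(feat, pad, mode="edge")
    out = np.empty_like(feat)
    for i in range(feat.shape[0]):
        for j in range(feat.shape[1]):
            patch = fp[i:i + K, j:j + K]
            out[i, j] = (patch * kernels[i, j]).sum()
    return out

out = modulate(feat, kernels)
print(out.shape)   # (8, 8)
```

The design point this illustrates: because every location gets its own kernel, the modulation can adapt the foreground locally while the kernel *prediction* branch injects long-distance background context into those kernels.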
CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection
This paper considers zero-shot Anomaly Detection (AD), performing AD without
reference images of the test objects. We propose a framework called CLIP-AD to
leverage the zero-shot capabilities of the large vision-language model CLIP.
Firstly, we reinterpret the text prompts design from a distributional
perspective and propose a Representative Vector Selection (RVS) paradigm to
obtain improved text features. Secondly, we note opposite predictions and
irrelevant highlights in the direct computation of the anomaly maps. To address
these issues, we introduce a Staged Dual-Path model (SDP) that leverages
features from various levels and applies architecture and feature surgery.
Lastly, delving deeply into the two phenomena, we point out that the image and
text features are not aligned in the joint embedding space. Thus, we introduce
a fine-tuning strategy by adding linear layers and construct an extended model
SDP+, further enhancing the performance. Extensive experiments demonstrate the
effectiveness of our approach, e.g., on MVTec-AD, SDP outperforms the SOTA
WinCLIP by +4.2/+10.7 in the segmentation metrics F1-max/PRO, while SDP+
achieves +8.3/+20.5 improvements.
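The distributional view of text prompts behind Representative Vector Selection can be sketched as follows. The selection rule used here — taking the prompt embedding closest to the normalized mean of all variants — is an illustrative assumption standing in for whatever RVS actually does; the prompt count and embedding dimension are likewise made up.

```python
# Hedged sketch: picking one "representative" text feature from many
# prompt-variant embeddings, treated as samples from a distribution.
# The closest-to-mean rule is an assumption, not the paper's exact RVS.
import numpy as np

rng = np.random.default_rng(2)
N_PROMPTS, D = 16, 512
prompt_feats = rng.normal(size=(N_PROMPTS, D))  # e.g. embeddings of many prompt templates

def representative_vector(feats):
    """Return the (normalized) member embedding nearest the distribution's center."""
    f = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    mean = f.mean(axis=0)
    mean /= np.linalg.norm(mean)
    sims = f @ mean                  # cosine similarity of each member to the center
    return f[np.argmax(sims)]

rep = representative_vector(prompt_feats)
print(rep.shape)   # (512,)
```

The resulting single vector would then play the role of the class text feature when computing anomaly maps, rather than a naive average over all prompt variants.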
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
Large language models (LLMs) have achieved remarkable performance on various
NLP tasks and are augmented by tools for broader applications. Yet, how to
evaluate and analyze the tool-utilization capability of LLMs is still
under-explored. In contrast to previous works that evaluate models
holistically, we comprehensively decompose the tool utilization into multiple
sub-processes, including instruction following, planning, reasoning, retrieval,
understanding, and review. Based on that, we further introduce T-Eval to
evaluate the tool utilization capability step by step. T-Eval disentangles the
tool utilization evaluation into several sub-domains along model capabilities,
facilitating the inner understanding of both holistic and isolated competency
of LLMs. We conduct extensive experiments on T-Eval and in-depth analysis of
various LLMs. T-Eval not only exhibits consistency with outcome-oriented
evaluation but also provides a more fine-grained analysis of the capabilities
of LLMs, offering a new perspective on evaluating the tool-utilization ability
of LLMs. The benchmark will be available at
https://github.com/open-compass/T-Eval.
Comment: Project: https://open-compass.github.io/T-Eva
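The step-by-step decomposition described above amounts to scoring each sub-process separately and then aggregating. The sub-process names come from the abstract; the per-example scores and the macro-averaging scheme below are illustrative assumptions, not T-Eval's actual protocol or data.

```python
# Hedged sketch: decomposed tool-use evaluation. Sub-process names are
# from the abstract; the 0/1 scores and macro-average are made up for
# illustration only.
scores = {
    "instruction_following": [1, 1, 0, 1],
    "planning":              [1, 0, 0, 1],
    "reasoning":             [1, 1, 1, 0],
    "retrieval":             [0, 1, 1, 1],
    "understanding":         [1, 1, 1, 1],
    "review":                [0, 0, 1, 1],
}

def per_subprocess(scores):
    """Mean accuracy per sub-process; shows *where* a model fails."""
    return {k: sum(v) / len(v) for k, v in scores.items()}

def overall(scores):
    """Macro-average across sub-processes: one holistic number."""
    sub = per_subprocess(scores)
    return sum(sub.values()) / len(sub)

print(per_subprocess(scores)["planning"])  # 0.5
print(round(overall(scores), 3))
```

The value of this shape of evaluation is that two models with the same overall score can still be distinguished — one may fail mostly at planning, the other at review — which is exactly the fine-grained analysis the abstract claims over outcome-only scoring.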