Vision-Language Models for Vision Tasks: A Survey
Most visual recognition studies rely heavily on crowd-labelled data for deep
neural network (DNN) training, and they usually train a DNN for each single
visual recognition task, leading to a laborious and time-consuming visual
recognition paradigm. To address these two challenges, Vision-Language Models
(VLMs) have been intensively investigated recently. They learn rich
vision-language correlations from web-scale image-text pairs that are almost
infinitely available on the Internet and enable zero-shot predictions on
various visual recognition tasks with a single VLM. This paper provides a
systematic review of vision-language models for various visual recognition
tasks, including: (1) the background that introduces the development of visual
recognition paradigms; (2) the foundations of VLM that summarize the
widely-adopted network architectures, pre-training objectives, and downstream
tasks; (3) the widely-adopted datasets in VLM pre-training and evaluations; (4)
the review and categorization of existing VLM pre-training methods, VLM
transfer learning methods, and VLM knowledge distillation methods; (5) the
benchmarking, analysis and discussion of the reviewed methods; (6) several
research challenges and potential research directions that could be pursued in
future VLM studies for visual recognition. A project associated with this
survey has been created at https://github.com/jingyi0000/VLM_survey
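
As an illustration of the zero-shot prediction mentioned in the abstract above, the following is a minimal sketch of the commonly used similarity-based approach: an image embedding is compared with the text embeddings of candidate class prompts, and the most similar prompt is chosen. The function names, the prompts, and the three-dimensional toy embeddings are assumptions made for illustration; they are not taken from the survey, and a real VLM would produce the embeddings with its own image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_embedding, text_embeddings):
    """Return the prompt most similar to the image, plus all scores."""
    scores = {
        prompt: cosine(image_embedding, emb)
        for prompt, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical, hand-made embeddings standing in for encoder outputs.
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_predict(image_embedding, text_embeddings)
print(label)
```

No task-specific training is involved in this sketch: changing the recognition task only means changing the set of text prompts.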
Domain Generalization via Balancing Training Difficulty and Model Capability
Domain generalization (DG) aims to learn domain-generalizable models from one
or multiple source domains that can perform well in unseen target domains.
Despite its recent progress, most existing work suffers from the misalignment
between the difficulty level of training samples and the capability of
contemporarily trained models, leading to over-fitting or under-fitting in the
trained generalization model. We design MoDify, a Momentum Difficulty framework
that tackles the misalignment by balancing the seesaw between the model's
capability and the samples' difficulties along the training process. MoDify
consists of two novel designs that collaborate to fight against the
misalignment while learning domain-generalizable models. The first is
MoDify-based Data Augmentation which exploits an RGB Shuffle technique to
generate difficulty-aware training samples on the fly. The second is
MoDify-based Network Optimization which dynamically schedules the training
samples for balanced and smooth learning with appropriate difficulty. Without
bells and whistles, a simple implementation of MoDify achieves superior
performance across multiple benchmarks. In addition, MoDify can complement
existing methods as a plug-in, and it is generic and can work for different
visual recognition tasks. Comment: 11 pages, 6 figures, Accepted by ICCV 202
SME creation facilitation process at Universities
Much research on SMEs studies them only after they have become SMEs. However, all SMEs, as well as larger companies, start as an idea in the head or heads of one or many persons: the prospective entrepreneurs. The purpose of this paper is to investigate how SMEs can be created by transforming ideas into real companies. More specifically, we will investigate if and how universities can facilitate this process by running international cross-functional courses. Our hypothesis is that in order to create an SME, three topics are of pivotal importance:
• Specialist competence in the business area
• General management competence
• Financial capital
During the fall of 2012 we will test the hypothesis by running a university course called international Market Driven Engineering (iMDE) in cooperation between Lund University and Zhejiang University. Technology faculties from both universities are involved, students as well as teachers. Their participation is crucial to cover specialist competence in the business area, which is technology-based enterprises. Management faculties from both universities are involved, students as well as teachers. Their participation is crucial to cover general management competence in setting up, funding and running an enterprise. When it comes to financial capital, our hypothesis is that for clever business ideas, financial capital can be raised in order to industrialize such a business idea. In the first trial run, 8 business ideas will be generated and tested in the Hangzhou area during the period from 10 September to 19 October 2012. Each of the 8 teams will consist of 8 persons, blended to cross-fertilize engineering-business, Chinese-Swedish and male-female participants. With the support of university teachers with the same blend, the aim is to create embryos of SMEs.
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
Inspired by the outstanding zero-shot capability of vision language models
(VLMs) in image classification tasks, open-vocabulary object detection has
attracted increasing interest by distilling the broad VLM knowledge into
detector training. However, most existing open-vocabulary detectors learn by
aligning region embeddings with categorical labels (e.g., bicycle) only,
disregarding the capability of VLMs in aligning visual embeddings with
fine-grained text descriptions of object parts (e.g., pedals and bells). This
paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that
introduces conditional context prompts and hierarchical textual descriptors
that enable precise region-text alignment as well as open-vocabulary detection
training in general. Specifically, the conditional context prompt transforms
regional embeddings into image-like representations that can be directly
integrated into general open vocabulary detection training. In addition, we
introduce large language models as an interactive and implicit knowledge
repository which enables iterative mining and refining visually oriented
textual descriptors for precise region-text alignment. Extensive experiments
over multiple large-scale benchmarks show that DVDet outperforms the
state-of-the-art consistently by large margins.
Film bulk acoustic resonators integrated on arbitrary substrates using a polymer support layer
The film bulk acoustic resonator (FBAR) is a widely used MEMS device which can serve as a filter, or as a gravimetric sensor for biochemical or physical sensing. Current device architectures require an acoustic mirror or a freestanding membrane and are fabricated as discrete components. A new architecture is demonstrated which permits fabrication and integration of FBARs on arbitrary substrates. Wave confinement is achieved by fabricating the resonator on a polyimide support layer. Results show that when the polymer thickness is greater than a critical value, d, the FBARs have similar performance to devices using alternative architectures. For ZnO FBARs operating at 1.3–2.2 GHz, d is ~9 μm, and the devices have a Q-factor of 470, comparable to 493 for the membrane architecture devices. The polymer support makes the resonators insensitive to the underlying substrate. Yields over 95% have been achieved on roughened silicon, copper and glass.
Feasibility study of carbon cloth for 3D integrated flexible cathode of lithium-ion battery
Carbon cloths made of carbon fiber were studied as 3D integrated cathodes for lithium-ion batteries. The graphitization degrees of three types of carbon cloth after heat treatment were qualitatively analyzed and quantitatively calculated. Using lithium metal as the counter electrode, the graphitized carbon cloth electrodes show first discharge specific capacities of 83.6, 94.5 and 115.2 mAh∙g⁻¹ under 0.1-0.5 V, respectively. After 50 cycles, the specific capacities of the carbon cloth electrodes remain at 55.0, 80.0 and 88.0 mAh∙g⁻¹. With LiFePO₄-loaded graphitized carbon cloths as cathodes, the initial discharge specific capacities of the electrodes are 73.2, 109.5 and 130.2 mAh∙g⁻¹, respectively. The carbon cloth whose graphitization degree is 76.02% shows a stable specific capacity of about 90.0 mAh∙g⁻¹ after 50 cycles and better comprehensive performance. This carbon cloth is more suitable for the integrated flexible cathode of lithium-ion batteries. By establishing a mechanical model of the interaction between LiFePO₄ particles and carbon fiber, the relationship between the mechanical, electrical and electrochemical properties of the integrated cathode was discussed. Using carbon cloth as an integrated cathode for lithium-ion batteries can simplify the conventional production process and open the way to new production methods.
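
The capacities reported in the abstract above can be turned into capacity-retention figures with simple arithmetic. The sketch below assumes that the two sets of three values (first discharge and after 50 cycles, lithium-metal counter electrode) list the three carbon cloths in the same order, which the abstract implies but does not state explicitly. The function name and the labels "cloth 1" to "cloth 3" are hypothetical.

```python
def capacity_retention(initial, after_cycles):
    """Percentage of the initial specific capacity that remains."""
    return 100.0 * after_cycles / initial


# Values in mAh/g as quoted in the abstract; the pairing by position is an assumption.
first_discharge = [83.6, 94.5, 115.2]
after_50_cycles = [55.0, 80.0, 88.0]

retention = [
    round(capacity_retention(initial, remaining), 1)
    for initial, remaining in zip(first_discharge, after_50_cycles)
]

for index, value in enumerate(retention, start=1):
    print(f"cloth {index}: {value}% retained after 50 cycles")
```

Under that pairing assumption, the retention works out to roughly 65.8%, 84.7% and 76.4% for the three cloths.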