50 research outputs found

    Vision-Language Models for Vision Tasks: A Survey

    Most visual recognition studies rely heavily on crowd-labelled data for deep neural network (DNN) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm. To address these two challenges, Vision-Language Models (VLMs) have been intensively investigated recently. They learn rich vision-language correlation from web-scale image-text pairs that are almost infinitely available on the Internet and enable zero-shot predictions on various visual recognition tasks with a single VLM. This paper provides a systematic review of vision-language models for various visual recognition tasks, including: (1) the background that introduces the development of visual recognition paradigms; (2) the foundations of VLMs, summarizing the widely adopted network architectures, pre-training objectives, and downstream tasks; (3) the widely adopted datasets in VLM pre-training and evaluation; (4) the review and categorization of existing VLM pre-training methods, VLM transfer learning methods, and VLM knowledge distillation methods; (5) the benchmarking, analysis and discussion of the reviewed methods; (6) several research challenges and potential research directions that could be pursued in future VLM studies for visual recognition. A project associated with this survey has been created at https://github.com/jingyi0000/VLM_survey

    Domain Generalization via Balancing Training Difficulty and Model Capability

    Domain generalization (DG) aims to learn domain-generalizable models from one or multiple source domains that can perform well in unseen target domains. Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model. We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model's capability and the samples' difficulties along the training process. MoDify consists of two novel designs that collaborate to fight against the misalignment while learning domain-generalizable models. The first is MoDify-based Data Augmentation which exploits an RGB Shuffle technique to generate difficulty-aware training samples on the fly. The second is MoDify-based Network Optimization which dynamically schedules the training samples for balanced and smooth learning with appropriate difficulty. Without bells and whistles, a simple implementation of MoDify achieves superior performance across multiple benchmarks. In addition, MoDify can complement existing methods as a plug-in, and it is generic and can work for different visual recognition tasks. Comment: 11 pages, 6 figures, Accepted by ICCV 202

    SME creation facilitation process at Universities

    Much research on SMEs studies them only after they have become SMEs. However, all SMEs, as well as larger companies, start as an idea in the head or heads of one or many persons: the prospective entrepreneurs. The purpose of this paper is to investigate how SMEs can be created by transforming ideas into real companies. More specifically, we will investigate if and how Universities can facilitate this process by running international cross-functional courses. Our hypothesis is that in order to create an SME three topics are of pivotal importance:
    • Specialist competence in the business area
    • General management competence
    • Financial capital
    During the fall of 2012 we will test the hypothesis by running a university course called international Marked Driven Engineering (iMDE) in cooperation between Lund University and Zhejiang University. Technology faculties from both Universities are involved, students as well as teachers. Their participation is crucial to cover specialist competence in the business area of technology-based enterprises. Management faculties from both Universities are involved, students as well as teachers. Their participation is crucial to cover general management competence in setting up, funding and running an enterprise. When it comes to financial capital, our hypothesis is that for clever business ideas, financial capital can be raised in order to industrialize such a business idea. In the first trial run 8 business ideas will be generated and tested in the Hangzhou area during the period 120910-121019. Each of the 8 teams will consist of 8 persons, blended to cross-fertilize engineering-business, Chinese-Swedish and male-female participants. With the support of university teachers with the same blend, the aim is to create embryos of SMEs.

    LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

    Inspired by the outstanding zero-shot capability of vision language models (VLMs) in image classification tasks, open-vocabulary object detection has attracted increasing interest by distilling the broad VLM knowledge into detector training. However, most existing open-vocabulary detectors learn by aligning region embeddings with categorical labels (e.g., bicycle) only, disregarding the capability of VLMs in aligning visual embeddings with fine-grained text descriptions of object parts (e.g., pedals and bells). This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general. Specifically, the conditional context prompt transforms regional embeddings into image-like representations that can be directly integrated into general open vocabulary detection training. In addition, we introduce large language models as an interactive and implicit knowledge repository which enables iterative mining and refining of visually oriented textual descriptors for precise region-text alignment. Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.

    Film bulk acoustic resonators integrated on arbitrary substrates using a polymer support layer

    The film bulk acoustic resonator (FBAR) is a widely used MEMS device which can be used as a filter, or as a gravimetric sensor for biochemical or physical sensing. Current device architectures require the use of an acoustic mirror or a freestanding membrane and are fabricated as discrete components. A new architecture is demonstrated which permits fabrication and integration of FBARs on arbitrary substrates. Wave confinement is achieved by fabricating the resonator on a polyimide support layer. Results show that when the polymer thickness is greater than a critical value, d, the FBARs have similar performance to devices using alternative architectures. For ZnO FBARs operating at 1.3–2.2 GHz, d is ~9 μm, and the devices have a Q-factor of 470, comparable to 493 for the membrane architecture devices. The polymer support makes the resonators insensitive to the underlying substrate. Yields over 95% have been achieved on roughened silicon, copper and glass.

    Feasibility study of carbon cloth for 3D integrated flexible cathode of lithium-ion battery

    Carbon cloths made of carbon fiber were studied as 3D integrated cathodes for lithium-ion batteries. The graphitization degrees of three types of carbon cloth after heat treatment were qualitatively analyzed and quantitatively calculated. Using lithium metal as the counter electrode, the graphitized carbon cloth electrodes show first discharge specific capacities of 83.6, 94.5 and 115.2 mAh∙g⁻¹ under 0.1-0.5 V, respectively. After 50 cycles, the specific capacities of the carbon cloth electrodes remain 55.0, 80.0 and 88.0 mAh∙g⁻¹. With LiFePO4-loaded graphitized carbon cloths as cathodes, the initial discharge specific capacities of the electrodes are 73.2, 109.5 and 130.2 mAh∙g⁻¹, respectively. The carbon cloth whose graphitization degree is 76.02% shows a stable specific capacity of about 90.0 mAh∙g⁻¹ after 50 cycles, and shows better comprehensive performance. This carbon cloth is more suitable for the integrated flexible cathode of lithium-ion batteries. By establishing a mechanical model of the interaction between LiFePO4 particles and carbon fiber, the relationship between the mechanical, electrical and electrochemical properties of the integrated cathode was discussed. Using carbon cloth as an integrated cathode for lithium-ion batteries can simplify and innovate the conventional production process.
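    The capacity figures quoted in this abstract can be turned into capacity retention after 50 cycles. The sketch below is only an illustration of that arithmetic, not a calculation from the paper: it assumes that the three first-discharge values and the three 50-cycle values are listed in the same cloth order, and the names `first_discharge`, `after_50_cycles` and `capacity_retention` are hypothetical.

```python
# Specific capacities (mAh/g) quoted in the abstract for the three graphitized
# carbon cloth electrodes cycled against a lithium metal counter electrode.
first_discharge = [83.6, 94.5, 115.2]
after_50_cycles = [55.0, 80.0, 88.0]


def capacity_retention(initial, final):
    """Return the final capacity as a percentage of the initial capacity."""
    return 100.0 * final / initial


# Assumption: the i-th entry of each list refers to the same carbon cloth.
retention = [
    capacity_retention(initial, final)
    for initial, final in zip(first_discharge, after_50_cycles)
]

for index, percent in enumerate(retention, start=1):
    print(f"cloth {index}: {percent:.1f}% of first-discharge capacity retained")
```

    Under that ordering assumption, the three cloths retain roughly 66%, 85% and 76% of their first-discharge capacity.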