Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Large vision-language models have achieved outstanding performance, but their
size and computational requirements make their deployment on
resource-constrained devices and in time-sensitive tasks impractical. Model
distillation, the process of creating smaller, faster models that maintain the
performance of larger models, is a promising direction toward a solution.
This paper investigates the distillation of visual representations in large
teacher vision-language models into lightweight student models using a small-
or mid-scale dataset. Notably, this study focuses on open-vocabulary
out-of-distribution (OOD) generalization, a challenging problem that has been
overlooked in previous model distillation literature. We propose two principles,
from the vision and language modality perspectives, to enhance the student's OOD
generalization: (1) better imitating the teacher's visual representation space
and carefully promoting coherent vision-language alignment with the teacher;
(2) enriching the teacher's language representations with informative,
fine-grained semantic attributes that effectively distinguish between
different labels. We propose several metrics and conduct extensive
experiments to investigate these techniques. The results demonstrate
significant improvements in zero-shot and few-shot student performance on
open-vocabulary out-of-distribution classification, highlighting the
effectiveness of our proposed approaches. Our code will be released at
https://github.com/xuanlinli17/large_vlm_distillation_oo
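The two principles above can be illustrated with a minimal sketch of a distillation objective. Everything here is an assumption for illustration, not the authors' implementation: the function names (`imitation_loss`, `alignment_loss`, `enriched_prompts`), the cosine-distance imitation term, the KL-based alignment term with a temperature of 0.01, the prompt template, and the toy 3-dimensional embeddings are all hypothetical. Only the standard library is used, so vectors are plain lists.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(student_img: Vector, teacher_img: Vector) -> float:
    """Principle (1), part a: pull the student's image embedding toward the teacher's."""
    return 1.0 - cosine(student_img, teacher_img)


def alignment_loss(student_img: Vector, teacher_img: Vector,
                   teacher_texts: List[Vector], temperature: float = 0.01) -> float:
    """Principle (1), part b (hypothetical form): KL(teacher || student) between
    label distributions, both computed against the teacher's text embeddings."""
    p = softmax([cosine(teacher_img, t) / temperature for t in teacher_texts])
    q = softmax([cosine(student_img, t) / temperature for t in teacher_texts])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def enriched_prompts(label: str, attributes: List[str]) -> List[str]:
    """Principle (2): describe a label with fine-grained attributes. A teacher text
    encoder (not shown) would embed these prompts and average them per label."""
    return [f"a photo of a {label}, which has {attr}" for attr in attributes]


# Toy 3-d embeddings; a real student would project its features to the teacher's width.
teacher_img = [0.9, 0.1, 0.0]
student_img = [0.7, 0.3, 0.1]
teacher_texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. "cat", "dog"

loss = imitation_loss(student_img, teacher_img) + alignment_loss(
    student_img, teacher_img, teacher_texts)
print(f"distillation loss = {loss:.4f}")
print(enriched_prompts("cardinal", ["a red crest", "a short conical beak"]))
```

Computing the student's label distribution against the teacher's text embeddings, rather than the student's own, is one simple way to express "coherence in vision-language alignment with the teacher" under these assumptions.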
Deductive Verification of Chain-of-Thought Reasoning
Large Language Models (LLMs) significantly benefit from Chain-of-Thought
(CoT) prompting in performing various reasoning tasks. While CoT allows models
to produce more comprehensive reasoning processes, its emphasis on intermediate
reasoning steps can inadvertently introduce hallucinations and accumulated
errors, thereby limiting models' ability to solve complex reasoning tasks.
Inspired by how humans engage in careful and meticulous deductive logical
reasoning processes to solve tasks, we seek to enable language models to
perform explicit and rigorous deductive reasoning, and also ensure the
trustworthiness of their reasoning process through self-verification. However,
directly verifying the validity of an entire deductive reasoning process is
challenging, even with advanced models like ChatGPT. In light of this, we
propose to decompose a reasoning verification process into a series of
step-by-step subprocesses, each receiving only its necessary context and
premises. To facilitate this procedure, we propose Natural Program, a natural
language-based deductive reasoning format. Our approach enables models to
generate precise reasoning steps where subsequent steps are more rigorously
grounded on prior steps. It also empowers language models to carry out
reasoning self-verification in a step-by-step manner. By integrating this
verification process into each deductive reasoning stage, we significantly
enhance the rigor and trustworthiness of generated reasoning steps. In the
process, we also improve answer correctness on complex reasoning tasks.
Code will be released at https://github.com/lz1oceani/verify_cot
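The step-by-step verification idea can be sketched as follows. This is a toy stand-in, not the Natural Program format or the authors' code: each hypothetical `Step` cites the numbered premises it depends on, and `verify_step` checks it using only those cited values. A Python `rule` function replaces the language model that would judge each step, and the apple-counting problem is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    idx: int                    # step number, e.g. 3 for "#3"
    text: str                   # natural-language statement of the step
    premises: List[int]         # numbers of the earlier premises/steps it cites
    claim: float                # value the step asserts
    rule: Callable[..., float]  # how the claim should follow from the cited values


def verify_step(step: Step, known: Dict[int, float]) -> bool:
    """Check one step using only the values it explicitly cites (minimal context)."""
    if any(i not in known or i >= step.idx for i in step.premises):
        return False  # cites an undefined or later step
    cited = [known[i] for i in step.premises]
    return abs(step.rule(*cited) - step.claim) < 1e-9


def verify_chain(premises: Dict[int, float], steps: List[Step]) -> List[bool]:
    """Verify steps in order and return one verdict per step."""
    known = dict(premises)
    verdicts = []
    for step in steps:
        verdicts.append(verify_step(step, known))
        known[step.idx] = step.claim  # later steps may cite this claim
    return verdicts


# #1: each bag holds 3 apples. #2: there are 4 bags.
premises = {1: 3.0, 2: 4.0}
steps = [
    Step(3, "There are 3 * 4 = 12 apples in total.", [1, 2], 12.0, lambda a, b: a * b),
    Step(4, "After eating 2 apples, 9 remain.", [3], 9.0, lambda total: total - 2),
]

verdicts = verify_chain(premises, steps)
print(verdicts)       # [True, False]
print(all(verdicts))  # False: the chain is rejected because step #4 fails
```

Because each check sees only the cited premises, the error in step #4 is caught locally even though step #3 is correct, and the chain as a whole is rejected.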
Anxiety mediates association between sex and jaw function limitation in temporomandibular disorder patients from China
Aim: The objective of this study is to explore the relationship between sex and jaw function and to test whether anxiety mediates the causal relationship between sex and jaw function in patients with temporomandibular disorders (TMDs).
Methods: A total of 488 participants with TMD were included in the analysis. Demographic data were collected. Generalized anxiety symptoms and anxiety severity were initially assessed using the GAD-7 questionnaire, and jaw function limitation was measured using the JFLS-8 scale. A directed acyclic graph (DAG) was used to evaluate the hypotheses. Mediation analysis was conducted to explore causality and to calculate the total effect, natural direct effect (NDE), and natural indirect effect (NIE).
Results: In TMD patients, there were significant associations between female sex and jaw function (r = 0.17, p < 0.001), between female sex and anxiety (r = 0.15, p = 0.002), and between anxiety and jaw function (r = 0.35, p < 0.001). In addition, sex can directly lead to differences in jaw function impairment (NDE: 3.719, 95% CI: 1.619–5.828, p < 0.001) and can also be causally related to jaw function through anxiety (NIE: 1.146, 95% CI: 0.267–2.024, p = 0.011). The total effect was 4.865 (95% CI: 2.709–7.029, p < 0.001).
Conclusion: A causal mechanism was found in which anxiety mediates the effect of sex on jaw function. Therefore, psychological factors need to be taken into account in the treatment of female TMD patients. Further clinical trials are needed to explore whether psychotherapy is more beneficial for improving jaw function in female TMD patients.
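The reported point estimates decompose additively: on the difference scale, the total effect equals the natural direct effect plus the natural indirect effect. The short calculation below checks this and derives the proportion mediated, a quantity not reported in the abstract and computed here only for illustration. Confidence intervals cannot be combined this way.

```python
# Point estimates reported in the Results.
nde = 3.719   # natural direct effect of sex on jaw function limitation (JFLS-8)
nie = 1.146   # natural indirect effect through anxiety (GAD-7)

total = nde + nie                  # TE = NDE + NIE (additive decomposition)
proportion_mediated = nie / total  # share of the total effect carried by anxiety

print(f"total effect = {total:.3f}")                       # total effect = 4.865
print(f"proportion mediated = {proportion_mediated:.1%}")  # proportion mediated = 23.6%
```

The recomputed total matches the reported 4.865, and roughly a quarter of the estimated effect of sex on jaw function would run through anxiety under this decomposition.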
Towards Enhanced Language Model Reasoning and Efficient Knowledge Transfer
Large language models (LLMs) and vision-language models (VLMs) are changing the world and gradually demonstrating human-level intelligence in various real-world scenarios, including knowledge-based question answering, mathematics, and programming. During my master's studies, my research has focused on understanding and improving the reasoning capacity of current large language models for general problem solving, and on efficient methods that enable knowledge transfer for vision-language models, namely distilling knowledge from large vision-language models.
Spatial planning for urban ventilation corridors by urban climatology
Ventilation corridors in cities can decrease air pollution and alleviate heat island problems, but their effectiveness has yet to be fully assessed. Few urban managers have been able to take city-scale approaches to the construction of urban ventilation corridors. This study introduced the Ventilation Corridor Planning (VCP) model, a multi-criteria evaluation method combined with a geographical information system (GIS) to determine where a ventilated environment is most appropriate. Specifically, the VCP model took Bozhou, China, as the study area and operated at two scales: the mesoscale and the local scale. At the mesoscale, we derived three outputs for building urban ventilation corridors: 1) the background wind environment, 2) ventilation potential, and 3) heat island intensity. At the local scale, we used a traditional computational fluid dynamics (CFD) model to verify the impact of the VCP criteria. The results revealed that, compared with the traditional CFD model, the proposed VCP model has advantages in establishing a comprehensive evaluation standard. In addition, applying the VCP model at both macro and micro scales improves the efficiency of ventilation corridor construction. Overall, this study introduced an effective modeling method for urban ventilation corridor planning and provided a way to study the urban climate.
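A multi-criteria evaluation of this kind is often implemented as a weighted linear combination of normalized raster layers. The sketch below assumes that form; the abstract does not state how the VCP model weights or combines its criteria, so the min-max normalization, the weights, the criterion orientation (higher means more suitable for a corridor), and the four-cell grid are all hypothetical.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Rescale raw criterion scores to [0, 1] so criteria become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def suitability(criteria: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted linear combination of normalized criteria, one score per grid cell."""
    norm = {name: min_max(vals) for name, vals in criteria.items()}
    total_w = sum(weights[name] for name in criteria)
    n_cells = len(next(iter(criteria.values())))
    return [
        sum(weights[name] * norm[name][i] for name in criteria) / total_w
        for i in range(n_cells)
    ]


# Four hypothetical grid cells; every criterion is oriented so higher = more suitable.
criteria = {
    "background_wind": [2.0, 4.0, 6.0, 8.0],
    "ventilation_potential": [0.1, 0.5, 0.3, 0.9],
    "heat_island_intensity": [1.0, 2.0, 3.0, 1.0],
}
weights = {"background_wind": 0.4, "ventilation_potential": 0.4, "heat_island_intensity": 0.2}

scores = suitability(criteria, weights)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [3, 2, 1, 0]
```

In a GIS workflow each list would be a raster layer, and the highest-scoring cells would be candidate corridor locations to check with a local-scale CFD model.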
Electroacupuncture at Fengchi(GB20) and Yanglingquan(GB34) Ameliorates Paralgesia through Microglia-Mediated Neuroinflammation in a Rat Model of Migraine
Background: Multiple studies have suggested that paralgesia (hyperalgesia and cutaneous allodynia) in migraine reflects the activation and sensitisation of the trigeminovascular system (TGVS). In particular, it reflects sensitisation of second-order and higher-order nerve centres, which is caused and maintained by neuroinflammation. Microglial activation leads to the release of proinflammatory cytokines involved in inflammatory responses. Accumulating evidence indicates that electroacupuncture (EA) is effective in ameliorating paralgesia, but the mechanisms by which EA acts on migraine attacks driven by microglia and microglia-mediated inflammatory responses remain unclear. The purpose of this study was to explore whether EA could ameliorate the dysregulation of pain sensation by suppressing microglial activation and the resulting neuroinflammatory response, and to evaluate whether this response was regulated by Toll-like receptor 4 (TLR4)/nuclear factor-kappa B (NF-κB) in the trigeminal nucleus caudalis (TNC) in a rat model of migraine. Methods: Repeated Inflammatory Soup (IS) was infused into the dura for seven sessions to establish a recurrent migraine-like rat model, and EA treatment was administered at Fengchi (GB20) and Yanglingquan (GB34) after each daily IS infusion. Facial mechanical withdrawal thresholds were measured to evaluate changes in pain perception, and plasma samples and TNC tissues were collected to examine changes in calcitonin gene-related peptide (CGRP), Iba-1-labelled microglial activation, and the resulting inflammatory response, including interleukin-1β (IL-1β), tumour necrosis factor-α (TNF-α), interleukin-6 (IL-6), and their regulatory molecules TLR4/NF-κB, via enzyme-linked immunosorbent assay (ELISA), real-time polymerase chain reaction (RT-PCR), immunohistochemistry (IHC) and Western blot analysis.
Results: Repeated IS injections into the dura induced facial mechanical paralgesia, a manifestation of migraine attacks, and increased the expression of CGRP, Iba-1, microglia-mediated inflammatory cytokines (IL-1β, TNF-α, IL-6), and the regulatory molecules TLR4/NF-κB. EA at GB20/34 significantly attenuated repetitive IS-induced pain hypersensitivity. This effect was consistent with decreased levels of CGRP and inflammatory cytokines in the plasma and the TNC via the inhibition of microglial activation, and this response may be regulated by TLR4/NF-κB. Conclusions: EA ameliorated paralgesia in repetitive IS-induced migraine-like rats, an effect mainly mediated by a reduction in microglial activation and microglia-mediated inflammatory responses that could be regulated by TLR4/NF-κB.
Five-Direction Occlusion Filling with Five Layer Parallel Two-Stage Pipeline for Stereo Matching with Sub-Pixel Disparity Map Estimation
Binocular stereo matching is an essential method in computer vision that imitates human binocular vision to obtain distance information. Among the many stereo matching algorithms, Semi-Global Matching (SGM) is recognized as one of the most popular owing to its relatively low power consumption and high accuracy, which has resulted in many excellent SGM-based hardware accelerators. However, vision algorithms, including SGM, remain somewhat inaccurate in real long-range applications. Therefore, this paper proposes a disparity improvement strategy based on sub-pixel interpolation and disparity-optimization post-processing, using an area optimization strategy, a hardware-friendly divider, a split look-up table, clock-aligned multi-directional disparity occlusion filling, and depth acquisition based on floating-point operations. The hardware architecture incorporating these optimizations is implemented on the Stratix-IV platform. It consumes about 5.6 K LUTs, 12.8 K registers, and 2.5 Mbit of on-chip memory. Meanwhile, its non-occlusion error rate of only 4.61% on the KITTI2015 dataset is about 1% better than state-of-the-art works. The maximum operating frequency reaches 98.28 MHz for 640 × 480 video with a disparity range of 128, with a power dissipation of 1.459 W and a processing speed of 320 frames per second.
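Two parts of this design can be sketched in software. The abstract does not say which sub-pixel interpolation is used; parabolic fitting over the matching costs at d-1, d and d+1 is a common choice and is assumed here, together with the standard rectified-stereo triangulation Z = f·B/d for depth. The throughput line assumes the pipeline accepts one pixel per clock cycle, under which 98.28 MHz corresponds to roughly 320 frames per second at 640 × 480, consistent with the reported figures. Function names and camera parameters are hypothetical.

```python
def subpixel_disparity(d: int, c_prev: float, c_min: float, c_next: float) -> float:
    """Refine integer disparity d (the cost minimum) by fitting a parabola through
    the matching costs at d-1, d, d+1 and returning the parabola's vertex."""
    denom = c_prev - 2.0 * c_min + c_next
    if denom <= 0:  # flat or degenerate cost curve: keep the integer disparity
        return float(d)
    return d + (c_prev - c_next) / (2.0 * denom)


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Rectified-stereo triangulation in floating point: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


# Throughput check, assuming the pipeline accepts one pixel per clock cycle.
pixels_per_frame = 640 * 480                  # 307,200 pixels
fps_at_fmax = 98.28e6 / pixels_per_frame      # about 319.9 frames per second

d_sub = subpixel_disparity(5, c_prev=10.0, c_min=4.0, c_next=6.0)
z = depth_from_disparity(d_sub, focal_px=700.0, baseline_m=0.54)  # hypothetical camera
print(d_sub)                    # 5.25
print(round(fps_at_fmax, 1))    # 319.9
```

The refined disparity shifts toward d+1 because its cost (6.0) is lower than the cost at d-1 (10.0); the division in both functions is the kind of operation a hardware-friendly divider would implement on the FPGA.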