90 research outputs found

    CoupleNet: Coupling Global Structure with Local Parts for Object Detection

    The region-based Convolutional Neural Network (CNN) detectors such as Faster R-CNN or R-FCN have already shown promising results for object detection by combining a region proposal subnetwork with a classification subnetwork. Although R-FCN achieves higher detection speed while maintaining detection performance, its position-sensitive score maps ignore the global structure information of the object. To fully explore both local and global properties, in this paper we propose a novel fully convolutional network, named CoupleNet, that couples the global structure with local parts for object detection. Specifically, the object proposals obtained by the Region Proposal Network (RPN) are fed into the coupling module, which consists of two branches. One branch adopts position-sensitive RoI (PSRoI) pooling to capture the local part information of the object, while the other employs RoI pooling to encode global and context information. We then design different coupling strategies and normalization methods to make full use of the complementary advantages of the global and local branches. Extensive experiments demonstrate the effectiveness of our approach. We achieve state-of-the-art results on all three challenging datasets, i.e., a mAP of 82.7% on VOC07, 80.4% on VOC12, and 34.4% on COCO. Code will be made publicly available. Comment: Accepted by ICCV 201
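    The coupling step described above can be sketched for a single proposal as follows. This is a minimal illustration, not CoupleNet's implementation: it assumes each branch has already produced one raw score per class, uses L2 normalization as an assumed way of putting the two branches on a comparable scale, and offers element-wise sum or product as the coupling strategy before a softmax. The function names are hypothetical.

```python
import math
from typing import List


def l2_normalize(v: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a score vector to unit L2 norm so neither branch dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def softmax(v: List[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def couple_scores(local_scores: List[float], global_scores: List[float],
                  strategy: str = "sum") -> List[float]:
    """Couple per-class scores of the local (PSRoI) and global (RoI) branches.

    Hypothetical sketch: normalize each branch, combine element-wise,
    then turn the coupled scores into class probabilities.
    """
    if len(local_scores) != len(global_scores):
        raise ValueError("both branches must score the same set of classes")
    a = l2_normalize(local_scores)
    b = l2_normalize(global_scores)
    if strategy == "sum":
        coupled = [x + y for x, y in zip(a, b)]
    elif strategy == "product":
        coupled = [x * y for x, y in zip(a, b)]
    else:
        raise ValueError(f"unknown coupling strategy: {strategy}")
    return softmax(coupled)
```

    Normalizing before coupling matters because the two pooling paths can produce scores on very different scales; without it, an element-wise sum would mostly reflect whichever branch has larger activations.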

    Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

    Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Vision-Language models. Current Large Vision Language Models (LVLMs) are predominantly constrained to grounding a single, pre-existing object, relying solely on data from Referring Expression Comprehension tasks. This limitation forces compromises in model design, necessitating the introduction of visual expert models or the integration of customized head structures. Beyond these constraints, our research delves into the untapped potential of LVLMs and uncovers their inherent capability for basic object perception, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel language-prompted localization dataset designed to fully unleash the capabilities of LVLMs in integrating fine-grained object perception with precise location awareness. More importantly, we present Griffon, a purely LVLM-based baseline that does not require any special tokens, expert models, or additional detection modules. It simply maintains a consistent structure with popular LVLMs by unifying data formats across various localization-related scenarios, and it is trained end-to-end through a well-designed pipeline. Comprehensive experiments demonstrate that Griffon not only achieves state-of-the-art performance on the fine-grained RefCOCO series but also approaches the capabilities of the expert model Faster R-CNN on the MSCOCO detection benchmark. Comment: Technical report. The code and dataset will be released soon at https://github.com/jefferyZhan/Griffo
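    Because Griffon uses no special tokens, object locations have to be expressed as ordinary text that a language model can emit and a program can parse back. The sketch below shows one such round trip under an assumed format: image-normalized `[x1,y1,x2,y2]` coordinates printed with three decimals after each object name. The format and all helper names are hypothetical and are not taken from Griffon's released dataset.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_to_text(box: Box, width: int, height: int, precision: int = 3) -> str:
    """Serialize a pixel box as normalized plain-text coordinates (hypothetical format)."""
    x1, y1, x2, y2 = box
    coords = [x1 / width, y1 / height, x2 / width, y2 / height]
    return "[" + ",".join(f"{c:.{precision}f}" for c in coords) + "]"


def format_detection_answer(objects: List[Tuple[str, Box]], width: int, height: int) -> str:
    """Hypothetical answer template listing every object of interest with its box."""
    return "; ".join(f"{name}{box_to_text(box, width, height)}" for name, box in objects)


# Four non-negative decimal numbers inside square brackets, separated by commas.
_NUM = r"(\d+(?:\.\d+)?)"
_BOX_RE = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def text_to_boxes(text: str, width: int, height: int) -> List[Box]:
    """Recover every pixel box mentioned in a model response."""
    boxes: List[Box] = []
    for m in _BOX_RE.finditer(text):
        x1, y1, x2, y2 = (float(g) for g in m.groups())
        boxes.append((x1 * width, y1 * height, x2 * width, y2 * height))
    return boxes
```

    Normalizing coordinates by image size keeps the same textual format valid across images of different resolutions, which is one way a single output format can serve both referring-expression and multi-object detection prompts.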

    Mitigating Hallucination in Visual Language Models with Visual Supervision

    Large vision-language models (LVLMs) suffer heavily from hallucination, occasionally generating responses that clearly contradict the image content. The key problem lies in their weak ability to comprehend detailed content in a multi-modal context, which can be attributed mainly to two factors: the training data and the loss function. The vision instruction dataset focuses primarily on global descriptions, and the auto-regressive loss function favors text modeling over image understanding. In this paper, we bring more detailed vision annotations and more discriminative vision models into the training of LVLMs, so that they can generate more precise responses without hallucinating. On one hand, we generate image-text pairs with detailed relationship annotations from the panoptic scene graph (PSG) dataset. These conversations pay more attention to detailed facts in the image, encouraging the model to answer questions based on multi-modal contexts. On the other hand, we integrate SAM and a mask prediction loss as auxiliary supervision, forcing the LVLMs to identify context-related objects so that they generate more accurate responses and hallucinate less. Moreover, to provide a deeper evaluation of hallucination in LVLMs, we propose a new benchmark, RAH-Bench. It divides vision hallucination into three types, namely responses that contradict the image with wrong categories, attributes, or relations, and introduces the False Positive Rate as a detailed sub-metric for each type. On this benchmark, our approach demonstrates a +8.4% improvement over the original LLaVA and achieves widespread performance improvements across other models.
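    A per-type False Positive Rate, as used by RAH-Bench, is FP / (FP + TN) computed separately for each hallucination type. A minimal sketch, assuming each benchmark record has been reduced to a hypothetical `(type, truth_is_yes, model_said_yes)` tuple and that, by the usual definition of FPR, only questions whose correct answer is "no" count toward the rate; the benchmark's actual record schema and answer parsing may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (hallucination_type, ground truth is "yes", model answered "yes")
Record = Tuple[str, bool, bool]


def false_positive_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Return FP / (FP + TN) for each hallucination type (e.g. category, attribute, relation)."""
    fp: Dict[str, int] = defaultdict(int)
    tn: Dict[str, int] = defaultdict(int)
    for htype, truth_is_yes, said_yes in records:
        if truth_is_yes:
            continue  # FPR only considers negatives: questions whose correct answer is "no"
        if said_yes:
            fp[htype] += 1  # model affirmed something absent from the image
        else:
            tn[htype] += 1  # model correctly rejected it
    return {t: fp[t] / (fp[t] + tn[t]) for t in set(fp) | set(tn)}
```

    Reporting the rate per type rather than one pooled number shows whether a model tends to invent objects, misstate attributes, or fabricate relations.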

    Effects of tumor necrosis factor-α polymorphism on the brain structural changes of the patients with major depressive disorder

    Single nucleotide polymorphisms (SNPs) in genes for proinflammatory cytokines such as Tumor Necrosis Factor-α (TNF-α) have been reported to be closely associated with major depressive disorder (MDD). However, it is unclear whether proinflammatory genetic burden adversely affects regional gray matter volume in patients with MDD. The aim of this study was to test whether rs1799724, an SNP of TNF-α, contributes to neuroanatomical changes in MDD. In this cross-sectional study, a total of 144 MDD patients and 111 healthy controls (HC), well matched for age, sex and education, were recruited from Shanghai Mental Health Center. Voxel-based morphometry (VBM) followed by graph-theory-based structural covariance analysis was applied to locate diagnosis × genotype interactions. Irrespective of diagnosis, individuals with the high-risk genotype (T-carriers) had reduced volume in the left angular gyrus (main effect of genotype). The diagnosis × genotype interaction was localized exclusively to the visual cortex (right superior occipital gyrus). The same region also showed reduced volume in patients with MDD compared with HC (main effect of diagnosis), with this effect being most pronounced in patients carrying the high-risk genotype. However, no group differences were found in either the global or the regional structural covariance networks. In conclusion, a genetic variation that can increase TNF-α expression selectively affects the anatomy of the visual cortex in depressed subjects, with no effect on the topographical organization of multiple cortical regions. This supports the notion that anatomical changes in depression are partly influenced by the genetic determinants of inflammatory activity.
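    The main effects and the diagnosis × genotype interaction reported above are contrasts over a 2 × 2 design (MDD vs. HC, T-carriers vs. non-carriers). The sketch below computes them from four cell means for a single region, purely to show what each term measures. It assumes balanced, unadjusted cell means, whereas a VBM analysis fits a voxel-wise general linear model with covariates and tests these contrasts statistically; the level labels and function name are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (diagnosis, genotype); hypothetical labels "MDD"/"HC" and "T"/"nonT"


def factorial_2x2_contrasts(means: Dict[Cell, float]) -> Dict[str, float]:
    """Main effects and interaction contrast from the four cell means of a 2 x 2 design."""
    mdd_t, mdd_n = means[("MDD", "T")], means[("MDD", "nonT")]
    hc_t, hc_n = means[("HC", "T")], means[("HC", "nonT")]
    # Main effect of diagnosis: MDD vs. HC, averaged over genotype.
    diagnosis = (mdd_t + mdd_n) / 2 - (hc_t + hc_n) / 2
    # Main effect of genotype: T-carriers vs. non-carriers, averaged over diagnosis.
    genotype = (mdd_t + hc_t) / 2 - (mdd_n + hc_n) / 2
    # Interaction: does the genotype effect differ between patients and controls?
    interaction = (mdd_t - mdd_n) - (hc_t - hc_n)
    return {"diagnosis": diagnosis, "genotype": genotype, "diagnosis_x_genotype": interaction}
```

    A nonzero interaction means the genotype effect is not the same in patients and controls, which is the pattern the abstract reports for the right superior occipital gyrus.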

    Sequential Reassortments Underlie Diverse Influenza H7N9 Genotypes in China

    Initial genetic characterizations have suggested that the influenza A (H7N9) viruses responsible for the current outbreak in China are novel reassortants. However, little is known about the pathways of their evolution and, in particular, the generation of diverse viral genotypes. Here we report an in-depth evolutionary analysis of whole-genome sequence data of 45 H7N9 and 42 H9N2 viruses isolated from humans, poultry, and wild birds during recent influenza surveillance efforts in China. Our analysis shows that the H7N9 viruses were generated by at least two steps of sequential reassortments involving distinct H9N2 donor viruses in different hosts. The first reassortment likely occurred in wild birds and the second in domestic birds in east China in early 2012. Our study identifies the pathways for the generation of diverse H7N9 genotypes in China and highlights the importance of monitoring multiple sources for effective surveillance of potential influenza outbreaks.
    Funding: National Natural Science Foundation (China) (31125016); National Natural Science Foundation (China) (31371338); National Center for Biotechnology Information (U.S.) (Major National Earmark Project for Infectious Diseases, 2013ZX10004611-002); National Basic Research Program of China (973 Program); National Basic Research Program of China (973 Program, grant, 2009CB918503); National Science and Technology Major Projects (2012ZX10004214001002); Jiangsu Sheng (China) (Priority Academic Program Development of Jiangsu Higher Education Institutions); National Natural Science Foundation (China) (31100950); MIT International Science and Technology Initiative

    The applicability and efficacy of Micro-Video Psychological Training Camp in groups with mild to moderate symptoms of depression and anxiety: A prospective and randomized controlled trial protocol

    Background: Mental health is a global issue requiring global attention. Depression and anxiety are two of the most common mental disorders (CMDs) and are characterized by high incidence and high comorbidity. In recent years, the prolonged COVID-19 pandemic and exacerbated social instability have posed significant challenges to the mental resilience and mental health of the global population. Now more than ever, as mental health needs increase, it has become crucial to find an effective way to provide universal mental healthcare. Psychotherapy is vital for people coping with symptoms of depression and anxiety and is used to enhance mental resilience. However, such therapy can be difficult to access in practice. In this context, the Micro-Video Psychological Training Camp (MVPTC) platform has been developed.
    Objectives: As an online self-help platform for psychological intervention, the MVPTC platform was developed for people with mild to moderate symptoms of depression and/or anxiety, with the goal of reducing depressive and anxious symptoms while improving mental resilience. This study will be carried out to verify its efficacy and applicability.
    Methods: In this parallel-group, randomized controlled trial, a total of 200 adults with mild to moderate depression and/or anxiety who are seeking self-help will be recruited and randomly assigned to either the micro-video psychological intervention group or the wait-list control group. Online self-assessment measurements will be taken at baseline, post-intervention, and at 1-month and 3-month follow-up.
    Results: The primary outcomes will be symptoms of depression and anxiety. The secondary outcome will be mental resilience. The analysis will follow the intention-to-treat principle.
    Discussion: This trial will examine whether the MVPTC platform proves effective and applicable for relieving symptoms and enhancing resilience in a population screened for depression and anxiety symptoms. Large-scale resilience enhancement may benefit public mental health through preventive interventions, management of depressive and anxiety symptoms, and promotion of mental health. If the MVPTC-based method is applied, a brief, efficient, and structured intervention model could be established, with the potential to provide necessary and accessible mental support to a broad target group.
    Clinical trial registration: http://www.chictr.org.cn/, identifier ChiCTR2100043725

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, against which newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and blind testing of new methods. We expect this benchmark to stimulate the development of powerful new techniques for title- and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewe
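    Okapi BM25, one of the three baselines named above, scores a document by summing, over query terms, an IDF weight times a saturated, length-normalized term frequency. A minimal sketch, assuming pre-tokenized text, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N − df + 0.5) / (df + 0.5)); the parameters used for the RELISH baselines are not stated in the abstract.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty documents
    df: Counter = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # k1 saturates repeated terms; b penalizes documents longer than average.
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

    Ranking the candidate articles by these scores, with the seed article's title or title and abstract as the query, gives the kind of recommendation list the annotators judged.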

    Key Component Capture and Safety Intelligent Analysis of Beam String Structure Based on Digital Twins

    In the construction of beam string structures, the environmental effects and the corresponding mechanical properties of the structure are complex. Uncertainty in the analysis of structural mechanical parameters under various factors can lead to misjudgment of structural safety performance, a problem that needs to be solved. In this study, a method for capturing key components and intelligently analyzing the safety of beam string structures based on digital twins (DTs) was proposed. Building on the mapping-feedback characteristics of DTs, a framework for component capture and safety analysis was formed. Driven by the twin framework, multi-source data for structural safety analysis were obtained and a parameter association mechanism was established. Considering the space-time evolution of the construction process and the interaction between its virtual and real elements, a multidimensional model was established. Structural mechanical parameters were then fused using Dempster–Shafer (D–S) evidence theory. The safety of the structure was analyzed intelligently by capturing its key components, thereby providing a basis for the safety maintenance of the structure. Integrating DTs modeling with multi-source data improves the accuracy and intelligence of structural construction safety analysis. In the analysis process, capturing the key components of the structure is the core step. The method was applied to the construction of a string-supported beam roof (a symmetrical structure) in a convention and exhibition center. Based on DTs and D–S evidence theory, the degree of variation of the mechanical parameters of various components under temperature was determined. By comprehensively investigating the changes in these mechanical parameters, the key components of the structure were captured, thereby realizing intelligent analysis of structural safety. Comparison of the data verified that the intelligent method can effectively analyze the safety performance of the structure.
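    Dempster's rule of combination is the core operation of D–S evidence fusion: products of masses whose hypotheses intersect are accumulated on that intersection, and the mass falling on empty intersections (the conflict K) is discarded by renormalizing with 1 − K. The sketch below is a generic implementation of that rule, not the authors' code; the frame of discernment {"key", "normal"} and the mass values in the usage note are hypothetical illustrations, not data from the study.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps subsets of the frame of discernment to belief masses summing to 1.
Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Fuse two independent bodies of evidence with Dempster's rule of combination."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # evidence pointing to incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

    For example, combining a hypothetical stress source that assigns 0.6 to {key}, 0.1 to {normal} and 0.3 to {key, normal} with a displacement source that assigns 0.5, 0.3 and 0.2 to the same sets gives a conflict K = 0.23 and a combined mass of 0.57 / 0.77 ≈ 0.740 on {key}, so agreement between sources sharpens the belief that the component is a key one.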

    Intelligent Prediction of Prestressed Steel Structure Construction Safety Based on BP Neural Network

    In the construction of a prestressed steel structure, obtaining the safety state of the structure from its design parameters and working conditions is a topic of research interest. Intelligent prediction of structural construction safety provides the basis for safety control. This study proposes an intelligent prediction method for structural construction safety based on a back propagation (BP) neural network. First, a correlation mechanism for structural construction safety performance parameters, involving structural design parameters and mechanical parameters, is established. Following the basic principle of a BP neural network, the relationship between the design parameters and the mechanical parameters is captured. A virtual model of the structural construction process is established based on digital twins (DTs). The DTs and the BP neural network are combined to form an intelligent structural safety prediction framework and theoretical method: working conditions are set in the twin model to obtain mechanical parameters, and the neural network predicts the mechanical parameters from the design parameters. The safety performance of the structural construction is evaluated according to these mechanical parameters. Finally, the intelligent prediction method is applied to the construction process of a string beam. Based on the DTs and the BP neural network, an intelligent analysis of structural construction safety is carried out, providing a reliable basis for safety control. The feasibility of the method is verified by comparing the results predicted by the theoretical method with data measured on site.
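    The BP (back-propagation) network described above learns a mapping from design parameters to mechanical parameters. Below is a minimal pure-Python sketch of such a network with one sigmoid hidden layer, a linear output, and plain stochastic gradient descent on squared error. The layer sizes, learning rate, and class and function names are assumptions for illustration; the paper's architecture and training data are not given in the abstract, and a practical model would use a numerical library.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], List[float]]  # (normalized design parameters, mechanical parameters)


class BPNet:
    """One-hidden-layer network trained by error back-propagation (sigmoid hidden, linear output)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    @staticmethod
    def _sigmoid(z: float) -> float:
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        """Return hidden activations and network outputs for one input vector."""
        h = [self._sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x: Sequence[float], target: Sequence[float], lr: float) -> float:
        """One SGD update on the loss 0.5 * sum((y - t)^2); returns the loss before the update."""
        h, y = self.forward(x)
        delta_out = [yk - tk for yk, tk in zip(y, target)]  # dLoss/dy for a linear output
        # Back-propagate through w2 and the sigmoid derivative h * (1 - h), before w2 changes.
        delta_hid = [hj * (1 - hj) * sum(delta_out[k] * self.w2[k][j] for k in range(len(delta_out)))
                     for j, hj in enumerate(h)]
        for k, dk in enumerate(delta_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(delta_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return 0.5 * sum(d * d for d in delta_out)


def mean_loss(net: BPNet, data: List[Sample]) -> float:
    """Average squared-error loss of the network over a dataset."""
    return sum(0.5 * sum((yk - tk) ** 2 for yk, tk in zip(net.forward(x)[1], t))
               for x, t in data) / len(data)


def train(net: BPNet, data: List[Sample], epochs: int = 200, lr: float = 0.1) -> float:
    """Run SGD over the data for several epochs and return the final mean loss."""
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, lr)
    return mean_loss(net, data)
```

    In use, each training pair would hold normalized design parameters for one working condition and the mechanical response the twin model computes for it; once trained, `forward` gives a fast prediction for new design parameters.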