24 research outputs found

    CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection

    Task driven object detection aims to detect object instances suitable for affording a task in an image. Its challenge lies in the object categories available for the task being too diverse to be limited to the closed object vocabulary of traditional object detection. Simply mapping categories and visual features of common objects to the task cannot address the challenge. In this paper, we propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task. Moreover, we propose a novel multi-level chain-of-thought prompting (MLCoT) to extract the affordance knowledge from large language models, which contains multi-level reasoning steps from task to object examples to essential visual attributes with rationales. Furthermore, to fully exploit knowledge to benefit object recognition and localization, we propose a knowledge-conditional detection framework, namely CoTDet. It conditions the detector on the knowledge to generate object queries and regress boxes. Experimental results demonstrate that our CoTDet outperforms state-of-the-art methods consistently and significantly (+15.6 box AP and +14.8 mask AP) and can generate rationales for why objects are detected to afford the task. Comment: Accepted by ICCV 202

    Contrastive Grouping with Transformer for Referring Image Segmentation

    Referring image segmentation aims to segment the target referent in an image conditioned on a natural language expression. Existing one-stage methods employ per-pixel classification frameworks, which attempt to align vision and language directly at the pixel level and thus fail to capture critical object-level information. In this paper, we propose a mask classification framework, Contrastive Grouping with Transformer network (CGFormer), which explicitly captures object-level information via a token-based querying and grouping strategy. Specifically, CGFormer first introduces learnable query tokens to represent objects and then alternately queries linguistic features and groups visual features into the query tokens for object-aware cross-modal reasoning. In addition, CGFormer achieves cross-level interaction by jointly updating the query tokens and decoding masks in every two consecutive layers. Finally, CGFormer incorporates contrastive learning into the grouping strategy to identify the token and its mask corresponding to the referent. Experimental results demonstrate that CGFormer outperforms state-of-the-art methods in both segmentation and generalization settings consistently and significantly. Comment: Accepted by CVPR 202

    Outlier-Robust Gromov-Wasserstein for Graph Data

    The Gromov-Wasserstein (GW) distance is a powerful tool for comparing and aligning probability distributions supported on different metric spaces. Recently, GW has become the main modeling technique for aligning heterogeneous data for a wide range of graph learning tasks. However, the GW distance is known to be highly sensitive to outliers, which can result in large inaccuracies if the outliers are given the same weight as other samples in the objective function. To mitigate this issue, we introduce a new and robust version of the GW distance called RGW. RGW features optimistically perturbed marginal constraints within a Kullback-Leibler divergence-based ambiguity set. To make the benefits of RGW more accessible in practice, we develop a computationally efficient and theoretically provable procedure using the Bregman proximal alternating linearized minimization algorithm. Through extensive experimentation, we validate our theoretical results and demonstrate the effectiveness of RGW on real-world graph learning tasks, such as subgraph matching and partial shape correspondence.
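
    For orientation, the quantity being made robust can be illustrated with the standard (non-robust) GW objective under squared loss, evaluated for a fixed coupling. This is only a sketch of the textbook objective, not RGW and not the authors' algorithm; the function name gw_objective and the toy matrices are illustrative assumptions.

```python
def gw_objective(C1, C2, T):
    """Standard squared-loss GW objective for a fixed coupling T.

    C1 (n x n) and C2 (m x m) are intra-space distance or adjacency matrices,
    T (n x m) is a coupling between the two node sets.
    Computes sum_{i,j,k,l} (C1[i][k] - C2[j][l])^2 * T[i][j] * T[k][l].
    """
    n, m = len(C1), len(C2)
    total = 0.0
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    diff = C1[i][k] - C2[j][l]
                    total += diff * diff * T[i][j] * T[k][l]
    return total


# Toy example (assumed data): two identical 2-node graphs.
C = [[0.0, 1.0], [1.0, 0.0]]
T_identity = [[0.5, 0.0], [0.0, 0.5]]    # matches node i to node i
T_uniform = [[0.25, 0.25], [0.25, 0.25]]  # spreads mass over all pairs

print(gw_objective(C, C, T_identity))  # 0.0: a perfect alignment has zero cost
print(gw_objective(C, C, T_uniform))   # 0.5: an uninformative coupling costs more
```

    Every sample enters this sum through the marginals of T with its full weight, which is one way to see why outliers can distort the result.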

    DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

    A long-standing goal of AI systems is to perform complex multimodal reasoning like humans. Recently, large language models (LLMs) have made remarkable strides in such multi-step reasoning on the language modality solely by leveraging the chain of thought (CoT) to mimic human thinking. However, the transfer of these advancements to multimodal contexts introduces heightened challenges, including but not limited to the impractical need for labor-intensive annotation and the limitations in terms of flexibility, generalizability, and explainability. To evoke CoT reasoning in multimodality, this work first conducts an in-depth analysis of these challenges posed by multimodality and presents two key insights: "keeping critical thinking" and "letting everyone do their jobs" in multimodal CoT reasoning. Furthermore, this study proposes a novel DDCoT prompting that maintains a critical attitude through negative-space prompting and incorporates multimodality into reasoning by first dividing the reasoning responsibility of LLMs into reasoning and recognition and then integrating the visual recognition capability of visual models into the joint reasoning process. The rationales generated by DDCoT not only improve the reasoning abilities of both large and small language models in zero-shot prompting and fine-tuning learning, significantly outperforming state-of-the-art methods, but also exhibit impressive generalizability and explainability. Comment: 24 pages, 13 figures, to be published in NeurIPS 202

    Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Data

    In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition (Luo and Tseng, 1992), the two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG), enjoy convergence guarantees. Based on task-specific properties, our analysis further provides novel theoretical insights to guide the selection of the best-fit method. As a result, we are able to provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching. In terms of both wall-clock time and modeling performance, the proposed methods achieve state-of-the-art results.

    Role of melatonin in enhancing arbuscular mycorrhizal symbiosis and mitigating cold stress in perennial ryegrass (Lolium perenne L.)

    Melatonin is a biomolecule that affects plant development and is involved in protecting plants from environmental stress. However, the mechanisms of melatonin’s impact on arbuscular mycorrhizal (AM) symbiosis and cold tolerance in plants are still unclear. In this research, AM fungi inoculation and exogenous melatonin (MT) were applied to perennial ryegrass (Lolium perenne L.) seedlings alone or in combination to investigate their effect on cold tolerance. The study was conducted in two parts. The initial trial examined two variables, AM inoculation and cold stress, to investigate the involvement of the AM fungus Rhizophagus irregularis in endogenous melatonin accumulation and the transcriptional levels of its synthesis genes in the root system of perennial ryegrass under cold stress. The subsequent trial was designed as a three-factor analysis, encompassing AM inoculation, cold stress, and melatonin application, to explore the effects of exogenous melatonin application on plant growth, AM symbiosis, antioxidant activity, and protective molecules in perennial ryegrass subjected to cold stress. The results of the study showed that, compared to non-mycorrhizal (NM) plants, cold stress promoted an increase in the accumulation of melatonin in the AM-colonized counterparts. Acetylserotonin methyltransferase (ASMT) catalyzed the final enzymatic reaction in melatonin production. Melatonin accumulation was associated with the level of expression of the genes LpASMT1 and LpASMT3. Treatment with melatonin can improve the colonization of AM fungi in plants. Simultaneous utilization of AM inoculation and melatonin treatment enhanced the growth, antioxidant activity, and phenylalanine ammonia-lyase (PAL) activity, while simultaneously reducing polyphenol oxidase (PPO) activity and altering osmotic regulation in the roots. These effects are expected to aid in the mitigation of cold stress in Lolium perenne. Overall, melatonin treatment would help Lolium perenne to improve growth by promoting AM symbiosis, improving the accumulation of protective molecules, and triggering antioxidant activity under cold stress.

    Rethinking Graph Neural Networks for Anomaly Detection

    Graph Neural Networks (GNNs) are widely applied for graph anomaly detection. As one of the key components of GNN design is to select a tailored spectral filter, we take the first step towards analyzing anomalies via the lens of the graph spectrum. Our crucial observation is that the existence of anomalies leads to the 'right-shift' phenomenon, that is, the spectral energy distribution concentrates less on low frequencies and more on high frequencies. This fact motivates us to propose the Beta Wavelet Graph Neural Network (BWGNN). Indeed, BWGNN has spectral and spatial localized band-pass filters to better handle the 'right-shift' phenomenon in anomalies. We demonstrate the effectiveness of BWGNN on four large-scale anomaly detection datasets. Our code and data are released at https://github.com/squareRoot3/Rethinking-Anomaly-Detection Comment: Accepted by ICML 2022.
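
    The 'right-shift' idea can be made concrete with a small sketch. For the unnormalized Laplacian L = D - A, the Rayleigh quotient x^T L x / x^T x equals the energy-weighted average of the Laplacian eigenvalues of the signal x, so a larger value means more spectral energy at high frequencies. The code below is an illustrative toy and not the BWGNN implementation; the function name, the path graph, and the signals are assumptions.

```python
def rayleigh_quotient(edges, x):
    """Return x^T L x / x^T x for the unnormalized Laplacian L = D - A.

    For an undirected graph, x^T L x equals the sum of (x[i] - x[j])^2
    over all edges (i, j), so no eigendecomposition is needed.
    """
    numerator = sum((x[i] - x[j]) ** 2 for i, j in edges)
    denominator = sum(v * v for v in x)
    return numerator / denominator


# Toy path graph 0-1-2-3-4 (assumed data).
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
normal_signal = [1.0, 1.1, 1.0, 0.9, 1.0]     # smooth over the graph
anomalous_signal = [1.0, 1.1, 5.0, 0.9, 1.0]  # node 2 deviates from its neighbours

print(rayleigh_quotient(path_edges, normal_signal))
print(rayleigh_quotient(path_edges, anomalous_signal))
```

    In this toy case the second value is larger than the first, matching the description of energy moving from low to high frequencies when an anomaly is present.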

    Recommending third-party APIs via using lightweight graph convolutional neural networks

    Third-party APIs have been widely used to develop various applications. As the number of third-party APIs grows, it becomes increasingly challenging to quickly find suitable APIs that meet users’ requirements. Inspired by recommender systems, API recommendation methods have been proposed to address this issue. However, previous API recommendation methods make insufficient use of the high-order interactions between users and APIs, and thus have limited performance. Based on the lightweight graph convolutional neural network model, this paper proposes an effective API recommendation method that exploits both low-order and high-order interactions between users and APIs. It first learns the embeddings of users and APIs from the user-API interaction graph, and then adopts a weighted summation operator to aggregate the embeddings learned from different propagation layers for API recommendation. Extensive experiments are conducted on a real dataset with 160,309 API users and 21,031 Web APIs, and the results show that our method has significantly better precision and recall than other state-of-the-art methods.
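
    The two steps described above (propagation over the user-API graph, then a weighted sum over layers) can be sketched as follows. The sketch assumes LightGCN-style propagation with symmetric degree normalization and no learned transforms; the abstract does not specify these details, and the function name, layer weights, and toy embeddings are hypothetical.

```python
import math


def layer_combined_embeddings(user_emb, api_emb, edges, layer_weights):
    """Propagate embeddings over the user-API graph and combine the layers.

    user_emb / api_emb: dict mapping id -> list of floats (layer-0 embeddings).
    edges: list of (user_id, api_id) interactions.
    layer_weights: one weight per layer, starting with layer 0 (assumed given).
    """
    deg_u = {u: 0 for u in user_emb}
    deg_a = {a: 0 for a in api_emb}
    for u, a in edges:
        deg_u[u] += 1
        deg_a[a] += 1
    dim = len(next(iter(user_emb.values())))

    cur_u = {u: list(e) for u, e in user_emb.items()}
    cur_a = {a: list(e) for a, e in api_emb.items()}
    out_u = {u: [layer_weights[0] * v for v in e] for u, e in cur_u.items()}
    out_a = {a: [layer_weights[0] * v for v in e] for a, e in cur_a.items()}

    for w in layer_weights[1:]:
        nxt_u = {u: [0.0] * dim for u in cur_u}
        nxt_a = {a: [0.0] * dim for a in cur_a}
        for u, a in edges:
            norm = 1.0 / math.sqrt(deg_u[u] * deg_a[a])
            for d in range(dim):
                nxt_u[u][d] += norm * cur_a[a][d]  # user aggregates its APIs
                nxt_a[a][d] += norm * cur_u[u][d]  # API aggregates its users
        cur_u, cur_a = nxt_u, nxt_a
        # Weighted summation over propagation layers.
        for u in out_u:
            for d in range(dim):
                out_u[u][d] += w * cur_u[u][d]
        for a in out_a:
            for d in range(dim):
                out_a[a][d] += w * cur_a[a][d]
    return out_u, out_a


# Toy example (assumed data): one user, one API, two layers with equal weights.
users, apis = layer_combined_embeddings(
    {"u1": [1.0, 0.0]}, {"a1": [0.0, 1.0]}, [("u1", "a1")], [0.5, 0.5]
)
score = sum(p * q for p, q in zip(users["u1"], apis["a1"]))  # recommendation score
print(users["u1"], apis["a1"], score)
```

    Layer 0 carries each node's own embedding and each further layer reaches one hop further, so the weighted sum mixes low-order and high-order interactions.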

    Field Study on Earth Pressure of Finite Soil Considering Soil Displacement

    The classical earth pressure theory assumes a semi-infinite soil behind the wall, which is no longer applicable to the problem of earth pressure in the case of finite soil. A field study was conducted to investigate the earth pressure of finite soil at different excavation depths. Earth pressure cells were used to measure the change in earth pressure along the depth, and the measured earth pressures were compared with the calculation results for finite soil. Moreover, the influence of the width-to-depth ratio, cohesion, and internal friction angle on the earth pressure of finite soil was also analyzed based on the theoretical calculation method. The research results show that, compared with the Rankine active earth pressure, the active earth pressure of finite soil was more suitable for the calculation of earth pressure in a finite soil situation. The difference in the earth pressure of finite soil under different width-to-depth ratios would increase with depth, while the cohesion and internal friction angle had little effect on the earth pressure of finite soil.
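
    As a point of reference for the comparison above, the classical Rankine active earth pressure for semi-infinite soil is sigma_a = gamma * z * Ka - 2 * c * sqrt(Ka), with Ka = tan^2(45° - phi/2). The sketch below evaluates only this classical baseline, not the finite-soil method of the study; the function name and the soil parameters are illustrative assumptions.

```python
import math


def rankine_active_pressure(gamma, z, phi_deg, c=0.0):
    """Classical Rankine active earth pressure at depth z (semi-infinite soil).

    gamma: unit weight of soil (kN/m^3), z: depth (m),
    phi_deg: internal friction angle (degrees), c: cohesion (kPa).
    Returns the horizontal active pressure in kPa. A negative value
    indicates the tension zone near the surface of a cohesive soil.
    """
    ka = math.tan(math.radians(45.0 - phi_deg / 2.0)) ** 2
    return gamma * z * ka - 2.0 * c * math.sqrt(ka)


# Assumed example values: gamma = 18 kN/m^3, phi = 30 degrees.
for depth in (1.0, 3.0, 5.0):
    cohesionless = rankine_active_pressure(18.0, depth, 30.0)
    cohesive = rankine_active_pressure(18.0, depth, 30.0, c=10.0)
    print(depth, round(cohesionless, 2), round(cohesive, 2))
```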

    Effect of Heat Treatment on Microstructural Evolution and Microhardness Change of Al-5Zn-0.03In-1Er Alloy

    Adding an appropriate amount of Er to Al-Zn-In alloys can improve the electrochemical performance of Al alloys; it is convenient to study the electrochemical behavior of the alloy in the rest of our work. However, Er segregation in solid solutions, which reduces the comprehensive properties of alloys, was difficult to reduce, and there was no report on the homogenization of Al-Zn-In alloys. We found that ultra-high temperature treatment (UHTT) can clearly reduce Er segregation. To explore a better homogenization treatment and the microstructure evolution of Al-5Zn-0.03In-1Er alloy after UHTT, we carried out a series of heat treatments on the alloy and characterized its microstructure by optical microscopy (OM), X-ray diffraction (XRD), scanning electron microscopy (SEM), energy spectrum analysis (EDS) and transmission electron microscopy (TEM). The results showed that the main element Er of the Al-Zn-In-Er alloy was largely enriched in grain boundaries after UHTT; the distribution of Zn and In was almost unchanged. The as-cast Al-Zn-In-Er alloy consisted mainly of α(Al) solid solution and Al3Er phase. As the temperature of UHTT increased and the treatment time was prolonged, the precipitated phase dissolved into the matrix, and there were dispersed Al3Er particles in the crystal. The proper UHTT for reducing the interdendritic segregation of the alloy was 615 °C × 32 h, which was consistent with the results of the evolution of the statistical amount of interdendritic phase, the line scanning analysis and the microhardness. Moreover, the microhardness of the alloy after treatment at 615 °C × 32 h was clearly higher than that of the as-cast alloy because of the anchoring effect of Al3Er nanoparticles on the movement of dislocations.