CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
Task driven object detection aims to detect object instances suitable for
affording a task in an image. Its challenge lies in object categories available
for the task being too diverse to be limited to a closed set of object
vocabulary for traditional object detection. Simply mapping categories and
visual features of common objects to the task cannot address the challenge. In
this paper, we propose to explore fundamental affordances rather than object
categories, i.e., common attributes that enable different objects to accomplish
the same task. Moreover, we propose a novel multi-level chain-of-thought
prompting (MLCoT) to extract the affordance knowledge from large language
models, which contains multi-level reasoning steps from task to object examples
to essential visual attributes with rationales. Furthermore, to fully exploit
knowledge to benefit object recognition and localization, we propose a
knowledge-conditional detection framework, namely CoTDet. It conditions the
detector on the knowledge to generate object queries and regress boxes.
Experimental results demonstrate that our CoTDet outperforms state-of-the-art
methods consistently and significantly (+15.6 box AP and +14.8 mask AP) and can
generate rationales for why objects are detected to afford the task.
Comment: Accepted by ICCV 202
Contrastive Grouping with Transformer for Referring Image Segmentation
Referring image segmentation aims to segment the target referent in an image
conditioning on a natural language expression. Existing one-stage methods
employ per-pixel classification frameworks, which attempt straightforwardly to
align vision and language at the pixel level, thus failing to capture critical
object-level information. In this paper, we propose a mask classification
framework, Contrastive Grouping with Transformer network (CGFormer), which
explicitly captures object-level information via token-based querying and
grouping strategy. Specifically, CGFormer first introduces learnable query
tokens to represent objects and then alternately queries linguistic features
and groups visual features into the query tokens for object-aware cross-modal
reasoning. In addition, CGFormer achieves cross-level interaction by jointly
updating the query tokens and decoding masks in every two consecutive layers.
Finally, CGFormer incorporates contrastive learning into the grouping strategy to
identify the token and its mask corresponding to the referent. Experimental
results demonstrate that CGFormer outperforms state-of-the-art methods in both
segmentation and generalization settings consistently and significantly.
Comment: Accepted by CVPR 202
Outlier-Robust Gromov-Wasserstein for Graph Data
Gromov-Wasserstein (GW) distance is a powerful tool for comparing and
aligning probability distributions supported on different metric spaces.
Recently, GW has become the main modeling technique for aligning heterogeneous
data for a wide range of graph learning tasks. However, the GW distance is
known to be highly sensitive to outliers, which can result in large
inaccuracies if the outliers are given the same weight as other samples in the
objective function. To mitigate this issue, we introduce a new and robust
version of the GW distance called RGW. RGW features optimistically perturbed
marginal constraints within a Kullback-Leibler divergence-based ambiguity set.
To make the benefits of RGW more accessible in practice, we develop a
computationally efficient and theoretically provable procedure using the Bregman
proximal alternating linearized minimization algorithm. Through extensive
experimentation, we validate our theoretical results and demonstrate the
effectiveness of RGW on real-world graph learning tasks, such as subgraph
matching and partial shape correspondence.
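For orientation, the standard (non-robust) GW objective for a fixed coupling can be evaluated as in the sketch below. It assumes a squared loss and small dense distance matrices. The function name `gw_objective` and the toy inputs are illustrative and not taken from the paper, and RGW's perturbed marginal constraints are not modeled here.

```python
import numpy as np


def gw_objective(dist_a, dist_b, coupling):
    """Squared-loss Gromov-Wasserstein objective for a given coupling.

    Computes sum_{i,j,k,l} (dist_a[i,k] - dist_b[j,l])^2 * coupling[i,j] * coupling[k,l].
    Memory grows as O(n^2 m^2), so this is only meant for toy sizes.
    """
    # Broadcast to a 4-D tensor indexed by (i, j, k, l).
    diff = dist_a[:, None, :, None] - dist_b[None, :, None, :]
    return float(np.einsum("ijkl,ij,kl->", diff ** 2, coupling, coupling))


# Toy example (hypothetical data): two 3-point metric spaces.
dist_a = np.array([[0.0, 1.0, 2.0],
                   [1.0, 0.0, 1.0],
                   [2.0, 1.0, 0.0]])
dist_b = 2.0 * dist_a                  # same shape, stretched by a factor of 2
identity_plan = np.eye(3) / 3.0        # matches point i to point i
uniform_plan = np.full((3, 3), 1 / 9)  # independent (product) coupling

print(gw_objective(dist_a, dist_a, identity_plan))
print(gw_objective(dist_a, dist_b, uniform_plan))
```

Because every pair of matched points contributes with the same weight, a single far-away point can dominate the sum, which is the outlier sensitivity the abstract refers to.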
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
A long-standing goal of AI systems is to perform complex multimodal reasoning
like humans. Recently, large language models (LLMs) have made remarkable
strides in such multi-step reasoning on the language modality solely by
leveraging the chain of thought (CoT) to mimic human thinking. However, the
transfer of these advancements to multimodal contexts introduces heightened
challenges, including but not limited to the impractical need for
labor-intensive annotation and the limitations in terms of flexibility,
generalizability, and explainability. To evoke CoT reasoning in multimodality,
this work first conducts an in-depth analysis of these challenges posed by
multimodality and presents two key insights: "keeping critical thinking" and
"letting everyone do their jobs" in multimodal CoT reasoning. Furthermore, this
study proposes a novel DDCoT prompting that maintains a critical attitude
through negative-space prompting and incorporates multimodality into reasoning
by first dividing the reasoning responsibility of LLMs into reasoning and
recognition and then integrating the visual recognition capability of visual
models into the joint reasoning process. The rationales generated by DDCoT not
only improve the reasoning abilities of both large and small language models in
zero-shot prompting and fine-tuning learning, significantly outperforming
state-of-the-art methods, but also exhibit impressive generalizability and
explainability.
Comment: 24 pages, 13 figures, to be published in NeurIPS 202
Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Data
In this paper, we study the design and analysis of a class of efficient
algorithms for computing the Gromov-Wasserstein (GW) distance tailored to
large-scale graph learning tasks. Armed with the Luo-Tseng error bound
condition (Luo and Tseng, 1992), the two proposed algorithms, Bregman
Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient
(hBPG), enjoy convergence guarantees. Based on task-specific properties, our
analysis further provides novel theoretical insights that guide the selection of the
best-fit method. As a result, we are able to provide comprehensive experiments
to validate the effectiveness of our methods on a host of tasks, including
graph alignment, graph partition, and shape matching. In terms of both
wall-clock time and modeling performance, the proposed methods achieve
state-of-the-art results.
Role of melatonin in enhancing arbuscular mycorrhizal symbiosis and mitigating cold stress in perennial ryegrass (Lolium perenne L.)
Melatonin is a biomolecule that affects plant development and is involved in protecting plants from environmental stress. However, the mechanisms of melatonin’s impact on arbuscular mycorrhizal (AM) symbiosis and cold tolerance in plants are still unclear. In this research, AM fungi inoculation and exogenous melatonin (MT) were applied to perennial ryegrass (Lolium perenne L.) seedlings alone or in combination to investigate their effect on cold tolerance. The study was conducted in two parts. The initial trial examined two variables, AM inoculation and cold stress, to investigate the involvement of the AM fungus Rhizophagus irregularis in endogenous melatonin accumulation and the transcriptional levels of its synthesis genes in the root system of perennial ryegrass under cold stress. The subsequent trial was designed as a three-factor analysis, encompassing AM inoculation, cold stress, and melatonin application, to explore the effects of exogenous melatonin application on plant growth, AM symbiosis, antioxidant activity, and protective molecules in perennial ryegrass subjected to cold stress. The results showed that cold stress promoted greater melatonin accumulation in AM-colonized plants than in non-mycorrhizal (NM) plants. Acetylserotonin methyltransferase (ASMT) catalyzes the final enzymatic reaction in melatonin production. Melatonin accumulation was associated with the expression levels of the genes LpASMT1 and LpASMT3. Treatment with melatonin can improve the colonization of AM fungi in plants. Combined AM inoculation and melatonin treatment enhanced growth, antioxidant activity, and phenylalanine ammonia-lyase (PAL) activity, while reducing polyphenol oxidase (PPO) activity and altering osmotic regulation in the roots. These effects are expected to aid in the mitigation of cold stress in Lolium perenne.
Overall, melatonin treatment would help Lolium perenne to improve growth by promoting AM symbiosis, improving the accumulation of protective molecules, and triggering antioxidant activity under cold stress.
Rethinking Graph Neural Networks for Anomaly Detection
Graph Neural Networks (GNNs) are widely applied for graph anomaly detection.
As selecting a tailored spectral filter is one of the key components of GNN
design, we take the first step towards analyzing anomalies via the lens of the
graph spectrum. Our crucial observation is that the existence of anomalies leads
to the 'right-shift' phenomenon, that is, the spectral energy distribution
concentrates less on low frequencies and more on high frequencies. This fact
motivates us to propose the Beta Wavelet Graph Neural Network (BWGNN). Indeed,
BWGNN has spectrally and spatially localized band-pass filters to better handle the
'right-shift' phenomenon in anomalies. We demonstrate the effectiveness of
BWGNN on four large-scale anomaly detection datasets. Our code and data are
released at https://github.com/squareRoot3/Rethinking-Anomaly-Detection
Comment: Accepted by ICML 2022.
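The 'right-shift' described in this abstract can be illustrated on a toy graph. The sketch below is not BWGNN itself. It assumes the unnormalized Laplacian L = D - A and defines the spectral energy at each eigenvalue as the squared graph Fourier coefficient divided by the total. The helper names `path_graph` and `spectral_energy` and the spiked signal are hypothetical.

```python
import numpy as np


def path_graph(n):
    """Adjacency matrix of an undirected path with n nodes."""
    adjacency = np.zeros((n, n))
    idx = np.arange(n - 1)
    adjacency[idx, idx + 1] = 1.0
    adjacency[idx + 1, idx] = 1.0
    return adjacency


def spectral_energy(adjacency, signal):
    """Return Laplacian eigenvalues (ascending) and the normalized energy at each."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    coeffs = eigenvectors.T @ signal          # graph Fourier transform
    energy = coeffs ** 2 / np.sum(coeffs ** 2)
    return eigenvalues, energy


adjacency = path_graph(8)
normal = np.ones(8)            # perfectly smooth signal
anomalous = normal.copy()
anomalous[3] = 6.0             # one node deviates from its neighbours

for name, signal in [("normal", normal), ("anomalous", anomalous)]:
    eigenvalues, energy = spectral_energy(adjacency, signal)
    print(name, "share of energy at the lowest frequency:", round(float(energy[0]), 3))
```

In this toy setting the single deviating node moves part of the energy from the lowest eigenvalue to higher ones, which is the qualitative effect the abstract names.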
Recommending third-party APIs via using lightweight graph convolutional neural networks
Third-party APIs have been widely used to develop various applications. As the number of third-party APIs grows, it becomes increasingly challenging to quickly find suitable APIs that meet users’ requirements. Inspired by recommender systems, API recommendation methods have been proposed to address this issue. However, previous API recommendation methods make insufficient use of the high-order interactions between users and APIs, and thus have limited performance. Based on a lightweight graph convolutional neural network, this paper proposes an effective API recommendation method that exploits both low-order and high-order interactions between users and APIs. It first learns the embeddings of users and APIs from the user-API interaction graph, and then adopts a weighted summation operator to aggregate the embeddings learned from different propagation layers for API recommendation. Extensive experiments are conducted on a real dataset with 160,309 API users and 21,031 Web APIs, and the results show that our method has significantly better precision and recall than other state-of-the-art methods.
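The propagate-then-aggregate idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes LightGCN-style propagation (symmetric degree normalization, no feature transformation or nonlinearity) and fixed layer weights. The name `propagate_and_aggregate` and all values are hypothetical.

```python
import numpy as np


def propagate_and_aggregate(interactions, user_emb, api_emb, layer_weights):
    """Propagate embeddings over the user-API graph and take a weighted sum of layers.

    interactions: (n_users, n_apis) 0/1 matrix of observed user-API calls.
    layer_weights: one weight per layer, starting with layer 0 (the raw embeddings).
    """
    n_users, n_apis = interactions.shape
    size = n_users + n_apis

    # Bipartite adjacency: users occupy the first rows, APIs the rest.
    adjacency = np.zeros((size, size))
    adjacency[:n_users, n_users:] = interactions
    adjacency[n_users:, :n_users] = interactions.T

    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows.
    degree = adjacency.sum(axis=1)
    inv_sqrt = np.zeros(size)
    inv_sqrt[degree > 0] = degree[degree > 0] ** -0.5
    norm_adj = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]

    layer = np.vstack([user_emb, api_emb])
    final = layer_weights[0] * layer
    for weight in layer_weights[1:]:
        layer = norm_adj @ layer        # one more hop of neighbourhood information
        final = final + weight * layer
    return final[:n_users], final[n_users:]


rng = np.random.default_rng(0)
interactions = np.array([[1.0, 0.0, 1.0],
                         [0.0, 1.0, 1.0]])
users, apis = propagate_and_aggregate(
    interactions, rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), [0.4, 0.3, 0.3]
)
scores = users @ apis.T                 # higher score = stronger recommendation
print(scores.shape)
```

Layer 1 captures direct (low-order) user-API interactions, while deeper layers mix in information from users and APIs that are several hops away, which is how the weighted sum combines low-order and high-order signals.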
Field Study on Earth Pressure of Finite Soil Considering Soil Displacement
The classical earth pressure theory assumes a semi-infinite soil behind the wall, which is no longer applicable to the problem of earth pressure in the case of finite soil. A field study was conducted to investigate the earth pressure of finite soil at different excavation depths. Earth pressure cells were used to measure the change in earth pressure along the depth, and the measured earth pressures were compared with the calculation results for finite soil. Moreover, the influence of the width-to-depth ratio, cohesion, and internal friction angle on the earth pressure of finite soil was also analyzed based on the theoretical calculation method. The research results show that, compared with the Rankine active earth pressure, the active earth pressure of finite soil was more suitable for the calculation of earth pressure for a finite soil situation. The difference in the earth pressure of finite soil under different width-to-depth ratios would increase with depth, while the cohesion and internal friction angle had little effect on the earth pressure of finite soil.
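As a point of reference for the comparison in this abstract, the classical Rankine active earth pressure for semi-infinite soil can be computed as in the sketch below. It uses the textbook expression K_a = tan^2(45° - φ/2) and σ_a = γ z K_a - 2c √K_a. The function name and the soil parameters are illustrative assumptions, and the paper's finite-soil calculation method is not reproduced here.

```python
import math


def rankine_active_pressure(unit_weight, depth, friction_angle_deg, cohesion=0.0):
    """Rankine active earth pressure (kPa) at a given depth for semi-infinite soil.

    unit_weight: soil unit weight in kN/m^3
    depth: depth below the ground surface in m
    friction_angle_deg: internal friction angle in degrees
    cohesion: soil cohesion in kPa
    """
    # Active earth pressure coefficient K_a = tan^2(45 deg - phi / 2).
    k_a = math.tan(math.radians(45.0 - friction_angle_deg / 2.0)) ** 2
    return unit_weight * depth * k_a - 2.0 * cohesion * math.sqrt(k_a)


# Hypothetical soil: 18 kN/m^3, friction angle 30 degrees, evaluated at 5 m depth.
print(rankine_active_pressure(18.0, 5.0, 30.0))
print(rankine_active_pressure(18.0, 5.0, 30.0, cohesion=10.0))
```

The expression contains no width term, which is why it cannot reflect the width-to-depth ratio that the study identifies as important for finite soil.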
Effect of Heat Treatment on Microstructural Evolution and Microhardness Change of Al-5Zn-0.03In-1Er Alloy
Adding an appropriate amount of Er to Al-Zn-In alloys can improve the electrochemical performance of Al alloys, which facilitates the study of the alloy's electrochemical behavior in the rest of our work. However, Er segregation in solid solutions, which reduces the comprehensive properties of the alloys, is difficult to reduce, and there has been no report on the homogenization of Al-Zn-In alloys. We found that ultra-high temperature treatment (UHTT) can markedly reduce Er segregation. To explore a better homogenization treatment and the microstructure evolution of Al-5Zn-0.03In-1Er alloy after UHTT, we carried out a series of heat treatments on the alloy and characterized its microstructure by optical microscopy (OM), X-ray diffraction (XRD), scanning electron microscopy (SEM), energy spectrum analysis (EDS) and transmission electron microscopy (TEM). The results showed that the main element Er of the Al-Zn-In-Er alloy was largely enriched in grain boundaries after UHTT; the distribution of Zn and In was almost unchanged. The as-cast Al-Zn-In-Er alloy consisted mainly of α(Al) solid solution and Al3Er phase. As the temperature of UHTT increased and the treatment time was prolonged, the precipitated phase dissolved into the matrix, and there were dispersed Al3Er particles in the crystal. The proper UHTT for reducing the interdendritic segregation of the alloy was 615 °C × 32 h, which was consistent with the results of the evolution of the statistical amount of interdendritic phase, the line scanning analysis and the microhardness. Moreover, the microhardness of the alloy after treatment at 615 °C × 32 h was markedly higher than that of the as-cast alloy because of the anchoring effect of Al3Er nanoparticles on the movement of dislocations.