14 research outputs found
TCEIP: Text Condition Embedded Regression Network for Dental Implant Position Prediction
Although deep neural networks have been proposed to assist dentists in planning
the location of dental implants, most of them target simple cases in which
only one tooth is missing. As a result, existing methods perform poorly
when multiple teeth are missing and easily generate false
predictions when the teeth are sparsely distributed. In this paper, we
integrate a weak supervision text, the target region, into the implant
position regression network to address these issues. We propose a text
condition embedded implant position regression network (TCEIP), which embeds the
text condition into the encoder-decoder framework to improve
regression performance. A cross-modal interaction that consists of cross-modal
attention (CMA) and knowledge alignment module (KAM) is proposed to facilitate
the interaction between features of images and texts. The CMA module performs a
cross-attention between the image feature and the text condition, and the KAM
mitigates the knowledge gap between the image feature and the image encoder of
CLIP. Extensive experiments on a dental implant dataset with five-fold
cross-validation demonstrate that the proposed TCEIP outperforms
existing methods. Comment: MICCAI 202
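The cross-attention step described in the TCEIP abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: image tokens act as queries, text-condition tokens act as keys and values, there are no learned projection matrices, and the result is added back to the image token as a residual. The function name `cross_attention` is hypothetical and is not taken from the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Each image token (query) attends over the text tokens (keys and values).

    Assumption: no learned projections; scaled dot-product scores only.
    """
    dim = len(text_tokens[0])
    fused = []
    for query in image_tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in text_tokens
        ]
        weights = softmax(scores)
        attended = [
            sum(w * value[j] for w, value in zip(weights, text_tokens))
            for j in range(dim)
        ]
        # Residual connection: keep the image feature and add the text context.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused
```

With a single text token the attention weight is 1, so each image token is simply shifted by that text embedding; with several text tokens, each image location picks up a mixture weighted by similarity.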
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Recent text-to-image diffusion models have demonstrated an astonishing
capacity to generate high-quality images. However, research has mainly focused
on synthesizing images from text prompts alone. While some works have
explored using other modalities as conditions, they require considerable paired data, e.g.,
box/mask-image pairs, and fine-tuning time to train the models.
As such paired data is time-consuming and labor-intensive to acquire and
restricted to a closed set, this potentially becomes the bottleneck for
applications in an open world. This paper focuses on the simplest form of
user-provided conditions, e.g., box or scribble. To mitigate the aforementioned
problem, we propose a training-free method to control objects and contexts in
the synthesized images adhering to the given spatial conditions. Specifically,
three spatial constraints, i.e., Inner-Box, Outer-Box, and Corner Constraints,
are designed and seamlessly integrated into the denoising step of diffusion
models, requiring neither additional training nor massive annotated layout data.
Extensive results show that the proposed constraints can control what and where
to present in the images while retaining the ability of the Stable Diffusion
model to synthesize with high fidelity and diverse concept coverage. The code
is publicly available at https://github.com/Sierkinhane/BoxDiff. Comment: Accepted by ICCV 2023. The paper is still being revised for better
organization and comparison.
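To make the idea of box constraints on a denoising step concrete, here is a simplified sketch of what inner-box and outer-box penalties on a single attention map could look like. The abstract does not state the formulas, so this is an assumption: the inner term rewards strong responses inside the box, the outer term penalizes strong responses outside it, and both use the top-k values. The corner constraint is omitted, and `box_losses` is a hypothetical name, not the repository's API.

```python
def box_losses(attn, box, k=2):
    """Return (inner_loss, outer_loss) for one attention map.

    attn: 2D list of attention values in [0, 1], indexed as attn[y][x].
    box:  (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    inside, outside = [], []
    for y, row in enumerate(attn):
        for x, value in enumerate(row):
            if x0 <= x < x1 and y0 <= y < y1:
                inside.append(value)
            else:
                outside.append(value)
    top_in = sorted(inside, reverse=True)[:k]
    top_out = sorted(outside, reverse=True)[:k]
    # Inner term: small when the strongest responses inside the box are near 1.
    inner_loss = 1.0 - sum(top_in) / len(top_in) if top_in else 0.0
    # Outer term: small when the strongest responses outside the box are near 0.
    outer_loss = sum(top_out) / len(top_out) if top_out else 0.0
    return inner_loss, outer_loss
```

In a training-free setting, losses of this kind would be differentiated with respect to the latent at each denoising step and used to nudge it, rather than to update model weights.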
Open-World Weakly-Supervised Object Localization
While remarkable success has been achieved in weakly-supervised object
localization (WSOL), current frameworks are not capable of locating objects of
novel categories in open-world settings. To address this issue, we are the
first to introduce a new weakly-supervised object localization task called
OWSOL (Open-World Weakly-Supervised Object Localization). During training, all
labeled data comes from known categories, while both known and novel categories
exist in the unlabeled data. To handle such data, we propose a novel paradigm
of contrastive representation co-learning using both labeled and unlabeled data
to generate a complete G-CAM (Generalized Class Activation Map) for object
localization, without the requirement of bounding box annotation. As no class
label is available for the unlabeled data, we conduct clustering over the full
training set and design a novel multiple semantic centroids-driven contrastive
loss for representation learning. We re-organize two widely used datasets,
i.e., ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as
evaluation benchmarks for OWSOL. Extensive experiments demonstrate that the
proposed method can surpass all baselines by a large margin. We believe that
this work can shift closed-set localization towards the open-world setting
and serve as a foundation for subsequent works. Code will be released at
https://github.com/ryylcc/OWSOL
Dynamically Masked Discriminator for Generative Adversarial Networks
Training Generative Adversarial Networks (GANs) remains a challenging
problem. The discriminator trains the generator by learning the distribution of
real/generated data. However, the distribution of generated data changes
throughout the training process, which is difficult for the discriminator to
learn. In this paper, we propose a novel method for GANs from the viewpoint of
online continual learning. We observe that the discriminator, trained on
historically generated data, is often slow to adapt to changes in
newly arriving generated data, which in turn degrades the quality of
generated results. By treating the generated data in training as a stream, we
propose to detect whether the discriminator slows down the learning of new
knowledge in generated data. We can then explicitly force the
discriminator to learn new knowledge quickly. In particular, we propose a new
discriminator that automatically detects its retardation and then dynamically
masks its features, so that it can adaptively learn the
temporally varying distribution of generated data. Experimental results show that our
method outperforms state-of-the-art approaches.
VisorGPT: Learning Visual Prior via Generative Pre-Training
Various stuff and things in visual data possess specific traits, which can be
learned by deep neural networks and are implicitly represented as the visual
prior, e.g., object location and shape, in the model. Such a prior potentially
impacts many vision tasks. For example, in conditional image synthesis, spatial
conditions failing to adhere to the prior can result in visually inaccurate
synthetic results. This work aims to explicitly learn the visual prior and
enable the customization of sampling. Inspired by advances in language
modeling, we propose to learn Visual prior via Generative Pre-Training, dubbed
VisorGPT. By discretizing visual locations of objects, e.g., bounding boxes,
human pose, and instance masks, into sequences, VisorGPT can model the visual prior
through likelihood maximization. In addition, prompt engineering is investigated to
unify various visual locations and enable customized sampling of sequential
outputs from the learned prior. Experimental results demonstrate that VisorGPT
can effectively model the visual prior, which can be employed for many vision
tasks, such as customizing accurate human pose for conditional image synthesis
models like ControlNet. Code will be released at
https://github.com/Sierkinhane/VisorGPT. Comment: Project web-page: https://sierkinhane.github.io/visor-gpt
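The discretization step mentioned in the VisorGPT abstract, turning object locations into sequences a language model can be trained on, might look roughly like the sketch below. The token layout, the bin count of 512, and the function name `box_to_tokens` are all assumptions made for illustration; the abstract does not specify the actual sequence format.

```python
def box_to_tokens(category, box, image_size, bins=512):
    """Quantize one bounding box into a flat list of string tokens.

    box:        (x0, y0, x1, y1) in pixels.
    image_size: (width, height) in pixels.
    bins:       number of discrete positions per axis (assumed value).
    """
    x0, y0, x1, y1 = box
    width, height = image_size

    def quantize(value, extent):
        # Map [0, extent] to an integer bin, clamping the right edge.
        return min(bins - 1, int(value / extent * bins))

    return [
        category,
        str(quantize(x0, width)),
        str(quantize(y0, height)),
        str(quantize(x1, width)),
        str(quantize(y1, height)),
    ]
```

Concatenating such token lists for all objects in an image gives a sequence whose likelihood can be maximized in the usual next-token fashion, which is the sense in which the prior is learned "through likelihood maximization".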
C2AM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
While class activation maps (CAMs) generated by image classification networks have been widely used for weakly supervised object localization (WSOL) and semantic segmentation (WSSS), such classifiers usually focus on discriminative object regions. In this paper, we propose Contrastive learning for Class-agnostic Activation Map (C2AM) generation using only unlabeled image data, without the involvement of image-level supervision. The core idea comes from the observation that i) the semantic information of foreground objects usually differs from that of their backgrounds; ii) foreground objects with similar appearance or backgrounds with similar color/texture have similar representations in the feature space. We form positive and negative pairs based on the above relations and force the network to disentangle foreground and background with a class-agnostic activation map using a novel contrastive loss. As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions. From C2AM, we successfully extract class-agnostic object bounding boxes for object localization and background cues to refine the CAMs generated by classification networks for semantic segmentation. Extensive experiments on CUB-200-2011, ImageNet-1K, and PASCAL VOC2012 datasets show that both WSOL and WSSS can benefit from the proposed C2AM. Code will be available at https://github.com/CVI-SZUICCAM
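The foreground-background disentangling idea in the C2AM abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' loss: the activation map is used as a soft mask to pool a foreground vector, its complement pools a background vector, and every cross-image foreground-background pair is pushed apart with a `-log(1 - similarity)` penalty. The positive-pair terms are omitted, and all function names are hypothetical.

```python
import math


def weighted_pool(features, weights):
    """Average per-pixel feature vectors using one weight per pixel."""
    total = sum(weights) or 1.0
    dim = len(features[0])
    return [
        sum(w * f[j] for w, f in zip(weights, features)) / total
        for j in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fg_bg_contrastive_loss(images, eps=1e-6):
    """images: list of (features, activation) pairs, one per image.

    features:   list of per-pixel feature vectors.
    activation: one value in [0, 1] per pixel (the class-agnostic map).
    """
    fgs = [weighted_pool(f, a) for f, a in images]
    bgs = [weighted_pool(f, [1.0 - x for x in a]) for f, a in images]
    losses = []
    for fg in fgs:
        for bg in bgs:
            # Clamp so the logarithm stays finite (assumed safeguard).
            sim = min(max(cosine(fg, bg), 0.0), 1.0 - eps)
            losses.append(-math.log(1.0 - sim))
    return sum(losses) / len(losses)
```

A map that cleanly separates two dissimilar regions yields a loss near zero, while a uniform map of 0.5 makes the foreground and background vectors identical and the loss large, which is the pressure that drives the map toward complete object regions.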
Control of Welding Residual Stress in Large Storage Tank by Finite Element Method
T-joint welding is a key manufacturing process for large storage tanks. However, it generates complex residual stresses that strongly affect the structural integrity of the tanks. The high residual stress caused by welding and the discontinuous structure may result in tank cracking and failure. In this work, the residual stress distributions on the inner surface, on the outer surface, and through the thickness of the T-joint were investigated using the finite element method and the indentation test method. The effect of local post-weld heat treatment (PWHT) with different heating temperatures, heating rates, and heating widths on the residual stress distribution was also discussed. Results show that the residual stress of the T-shaped joint is high due to the serious structural discontinuity, multi-layer welding, and high strength. Among all the stresses, the circumferential residual stress is the highest and is most concentrated in the outer weld connected with the annular plate. The residual stress gradually decreases as the heat treatment temperature increases. When the heating rate is less than 106 °C/h, the residual stress gradually decreases as the heating rate decreases. The large thermal deformation caused by heat treatment can be avoided by heating the inside and outside of the T-joint simultaneously. The residual stress decreases as the width of the heating zone decreases, so it can be regulated by using a smaller heating width. An optimized heat treatment scheme with a heating temperature of 700 °C, a heating rate of 56 °C/h, and a heating width of 200 mm was proposed, which controls residual stresses well and improves the quality of the T-joint. It also has good applicability in engineering.
Failure Analysis of Cracked P110 Repaired Tubing Used for Gas Transmission
With green and low-carbon developments in oil fields, an increasing amount of repaired oil tubing is being used as oil and gas transmission pipelines in China. However, due to differences in manufacturing standards between oil tubing and transmission pipelines, there are inevitably some issues during their use. This paper investigates a case of cracking failure in repaired oil tubing used as a gathering and transportation pipeline. The failure occurred after eight months of operation and was characterized by a circumferential crack at the male thread end of the tubing joint. To determine the root cause of the failure, a series of experiments were conducted on the oil tubing. The experiments included visual inspection, chemical composition analysis, mechanical properties testing, hardness testing, metallographic examination, and microstructure analysis. The results revealed that the thread of the cracked tubing was not tightened to the specified position; the connection between the tubing and the coupling was welded in a circumferential direction; and cracks occurred in the heat-affected zone (HAZ) of the weld. The chemical composition, tensile performance, and Charpy impact performance of the tubing meet the requirements of API 5CT for P110 material, and no abnormalities were found in the metallographic structure. The microstructure at the weld toe of the fracture is martensite, and the hardness is 476 HV10. A thermal simulation verification test showed that when the tubing material cools from 1200 °C, which lies in the coarse HAZ temperature zone, the base metal transforms into martensite with a little granular bainite and reaches its highest hardness value of 371 HV10, which is higher than the allowable hardness for carbon steel and indicates that the material has poor weldability. The tubing cracked and failed because the P110 repaired tubing has a high carbon equivalent and poor weldability. During the welding process, a martensitic structure formed at the weld toe, and cold cracks appeared in the heat-affected zone, resulting in failure. To avoid the recurrence of such failures, recommendations are proposed.
Late Cretaceous tectono-magmatic activity in the Nize region, central Tibet: evidence for lithospheric delamination beneath the Qiangtang–Lhasa collision zone
The results of zircon U–Pb age dating and whole-rock geochemistry for the Late Cretaceous Nize granodiorite porphyries, combined with analysis of near-coeval structural deformation of the Lower Cretaceous Langshan Formation, provide new data to better understand the tectonic evolution of the northern Lhasa subterrane, central Tibet. Zircon U–Pb ages of 89.2 ± 0.3 Ma to 87.8 ± 0.3 Ma indicate emplacement during the Late Cretaceous. Granodiorite porphyry intrusions were contemporaneous with the development of a regional angular unconformity, overlain by the Upper Cretaceous Jingzhushan (or Abushan) Formation, within the collision zone between the South Qiangtang and Lhasa terranes. Geochemical data for Nize granodiorite porphyries indicate that they have a calc-alkaline composition enriched in large-ion lithophile elements and light rare earth elements and depleted in high-field-strength elements and heavy rare earth elements. High Al₂O₃ and Sr contents, low Yb and Y contents, and high Sr/Y ratios are similar to adakitic magmas.
Structural analysis indicates two stages of deformation (D₁ and D₂), with D₁ forming the focus of the present study. The D₁ deformation is represented by large-scale faults and records two periods of faulting. These periods are recognized as early compressional thrust faulting and a dominant late stage characterized by normal faulting and extension, with the latter stages of D₁ being near-coeval with the emplacement of the Nize granodiorite porphyries. The combination of zircon ages, geochemical data, and structural analysis indicates that the Nize granodiorite porphyries formed after collision of the South Qiangtang and Lhasa terranes. Adakitic magma derived from partial melting of the thickened lower or middle crust resulted from lithospheric delamination that may have been promoted by the convective removal of deeper lithospheric mantle.