14 research outputs found

    TCEIP: Text Condition Embedded Regression Network for Dental Implant Position Prediction

    Although deep neural networks have been proposed to assist dentists in designing the location of dental implants, most of them target simple cases where only one tooth is missing. As a result, existing methods perform poorly when there are multiple missing teeth and easily generate false predictions when the teeth are sparsely distributed. In this paper, we integrate a weak supervision text, the target region, into the implant position regression network to address these issues. We propose a text condition embedded implant position regression network (TCEIP) that embeds the text condition into the encoder-decoder framework to improve regression performance. A cross-modal interaction that consists of a cross-modal attention (CMA) module and a knowledge alignment module (KAM) is proposed to facilitate the interaction between image and text features. The CMA module performs cross-attention between the image feature and the text condition, and the KAM mitigates the knowledge gap between the image feature and the image encoder of CLIP. Extensive experiments on a dental implant dataset through five-fold cross-validation demonstrate that the proposed TCEIP outperforms existing methods. Comment: MICCAI 202

    BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

    Recent text-to-image diffusion models have demonstrated an astonishing capacity to generate high-quality images. However, researchers have mainly studied how to synthesize images from text prompts alone. While some works have explored using other modalities as conditions, they require considerable paired data, e.g., box/mask-image pairs, and fine-tuning time to train the models. As such paired data is time-consuming and labor-intensive to acquire and is restricted to a closed set, it can become a bottleneck for applications in an open world. This paper focuses on the simplest form of user-provided conditions, e.g., a box or scribble. To mitigate the aforementioned problem, we propose a training-free method to control objects and contexts in the synthesized images so that they adhere to the given spatial conditions. Specifically, three spatial constraints, i.e., Inner-Box, Outer-Box, and Corner Constraints, are designed and seamlessly integrated into the denoising step of diffusion models, requiring neither additional training nor massive annotated layout data. Extensive results show that the proposed constraints can control what to present in the images and where, while retaining the ability of the Stable Diffusion model to synthesize with high fidelity and diverse concept coverage. The code is publicly available at https://github.com/Sierkinhane/BoxDiff. Comment: Accepted by ICCV 2023. The paper is still being revised for better organization and comparison. Code is available at: https://github.com/Sierkinhane/BoxDif

    Open-World Weakly-Supervised Object Localization

    While remarkable success has been achieved in weakly-supervised object localization (WSOL), current frameworks are not capable of locating objects of novel categories in open-world settings. To address this issue, we are the first to introduce a new weakly-supervised object localization task called OWSOL (Open-World Weakly-Supervised Object Localization). During training, all labeled data come from known categories, while both known and novel categories exist in the unlabeled data. To handle such data, we propose a novel paradigm of contrastive representation co-learning that uses both labeled and unlabeled data to generate a complete G-CAM (Generalized Class Activation Map) for object localization, without requiring bounding box annotations. As no class label is available for the unlabeled data, we conduct clustering over the full training set and design a novel multiple semantic centroids-driven contrastive loss for representation learning. We re-organize two widely used datasets, i.e., ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as evaluation benchmarks for OWSOL. Extensive experiments demonstrate that the proposed method surpasses all baselines by a large margin. We believe that this work can shift closed-set localization towards the open-world setting and serve as a foundation for subsequent works. Code will be released at https://github.com/ryylcc/OWSOL

    Dynamically Masked Discriminator for Generative Adversarial Networks

    Training Generative Adversarial Networks (GANs) remains a challenging problem. The discriminator trains the generator by learning the distribution of real/generated data. However, the distribution of generated data changes throughout the training process, which is difficult for the discriminator to learn. In this paper, we propose a novel method for GANs from the viewpoint of online continual learning. We observe that the discriminator model, trained on historically generated data, often slows down its adaptation to changes in the newly arriving generated data, which accordingly decreases the quality of the generated results. By treating the generated data in training as a stream, we propose to detect whether the discriminator slows down its learning of new knowledge in the generated data, so that we can explicitly force the discriminator to learn new knowledge quickly. In particular, we propose a new discriminator, which automatically detects its retardation and then dynamically masks its features, such that the discriminator can adaptively learn the temporally varying distribution of generated data. Experimental results show that our method outperforms the state-of-the-art approaches.

    VisorGPT: Learning Visual Prior via Generative Pre-Training

    Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented in the model as the visual prior, e.g., object location and shape. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable the customization of sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VisorGPT. By discretizing visual locations of objects, e.g., bounding boxes, human pose, and instance masks, into sequences, VisorGPT can model the visual prior through likelihood maximization. In addition, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate that VisorGPT can effectively model the visual prior, which can be employed for many vision tasks, such as customizing accurate human pose for conditional image synthesis models like ControlNet. Code will be released at https://github.com/Sierkinhane/VisorGPT. Comment: Project web-page: https://sierkinhane.github.io/visor-gpt
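
    The discretization step mentioned in the VisorGPT abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the function names `box_to_tokens` and `scene_to_sequence`, the bin count of 512, and the flat "category followed by four coordinates" layout are all hypothetical.

```python
def box_to_tokens(box, image_size, bins=512):
    """Quantize a pixel-space box (x_min, y_min, x_max, y_max) into integer bins.

    Hypothetical helper: the bin count and coordinate order are assumptions.
    """
    width, height = image_size
    x_min, y_min, x_max, y_max = box

    def quantize(value, extent):
        # Map [0, extent] onto bin indices 0 .. bins-1, clamping the upper edge.
        index = int(value / extent * bins)
        return min(max(index, 0), bins - 1)

    return [
        quantize(x_min, width),
        quantize(y_min, height),
        quantize(x_max, width),
        quantize(y_max, height),
    ]


def scene_to_sequence(objects, image_size, bins=512):
    """Flatten (category, box) pairs into one token sequence.

    A sequence model could then be trained on such sequences by
    maximizing the likelihood of each next token.
    """
    sequence = []
    for category, box in objects:
        sequence.append(category)
        sequence.extend(str(t) for t in box_to_tokens(box, image_size, bins))
    return sequence


if __name__ == "__main__":
    objects = [("dog", (320, 240, 640, 480))]
    print(scene_to_sequence(objects, image_size=(640, 480)))
```

    Quantizing coordinates into a fixed number of bins gives the sequence a finite vocabulary, which is what lets a language-model-style objective apply to spatial data.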

    C²AM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

    While the class activation map (CAM) generated by an image classification network has been widely used for weakly supervised object localization (WSOL) and semantic segmentation (WSSS), such classifiers usually focus on discriminative object regions. In this paper, we propose Contrastive learning for Class-agnostic Activation Map (C²AM) generation using only unlabeled image data, without the involvement of image-level supervision. The core idea comes from the observations that i) the semantic information of foreground objects usually differs from that of their backgrounds; ii) foreground objects with similar appearance, or backgrounds with similar color/texture, have similar representations in the feature space. We form positive and negative pairs based on the above relations and force the network to disentangle foreground and background with a class-agnostic activation map using a novel contrastive loss. As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions. From C²AM we successfully extracted class-agnostic object bounding boxes for object localization and background cues to refine the CAM generated by a classification network for semantic segmentation. Extensive experiments on the CUB-200-2011, ImageNet-1K, and PASCAL VOC2012 datasets show that both WSOL and WSSS can benefit from the proposed C²AM. Code will be available at https://github.com/CVI-SZU/CCAM

    Control of Welding Residual Stress in Large Storage Tank by Finite Element Method

    T-joint welding is a key manufacturing process for large storage tanks. However, complex residual stresses are generated and have a great effect on the structural integrity of storage tanks. The high residual stress caused by welding and the discontinuous structure may result in tank cracking and failure. In this work, the residual stress distributions on the inner surface, on the outer surface, and in the thickness direction of the T-joint were investigated by using the finite element method and the indentation test method. The effect of local post-weld heat treatment (PWHT) with different heating temperatures, heating rates, and heating widths on the residual stress distribution was also discussed. Results show that the residual stress of the T-shaped joint is high due to the serious structural discontinuity, multi-layer welding, and high strength. Among all the stresses, the circumferential residual stress is the highest and is most concentrated in the outer weld connected with the annular plate. The residual stress gradually decreases with the increase in the heat treatment temperature. When the heating rate is less than 106 °C/h, the residual stress gradually decreases with the decrease in the heating rate. The large thermal deformation caused by heat treatment can be avoided by heating the inside and outside of the T-joint simultaneously. The residual stress decreases with the decrease in the width of the heating zone, so it can be regulated by using a smaller heating width. An optimized heat treatment scheme with a heating temperature of 700 °C, a heating rate of 56 °C/h, and a heating width of 200 mm was proposed, which has a good ability to control residual stresses and improve the quality of the T-joint. It also has good applicability in engineering.

    Failure Analysis of Cracked P110 Repaired Tubing Used for Gas Transmission

    With green and low-carbon developments in oil fields, an increasing amount of repaired oil tubing is being used as oil and gas transmission pipelines in China. However, due to differences in manufacturing standards between oil tubing and transmission pipelines, there are inevitably some issues during their use. This paper investigates a case of cracking failure in repaired oil tubing used as a gathering and transportation pipeline. The failure occurred after eight months of operation and was characterized by a circumferential crack at the male thread end of the tubing joint. To determine the root cause of the failure, a series of experiments were conducted on the oil tubing. The experiments included visual inspection, chemical composition analysis, mechanical properties testing, hardness testing, metallographic examination, and microstructure analysis. The results revealed that the thread of the cracked tubing was not tightened to the specified position; the connection between the tubing and the coupling was welded in a circumferential direction; and cracks occurred in the heat-affected zone of the weld. Chemical composition, tensile performance, and the Charpy impact of the tubing meet the requirements of API 5CT for P110 material, and no abnormalities were found in the metallographic structure. The microstructure at the weld toe of the fracture is martensite, and the hardness is 476 HV10. Based on the thermal simulation verification test, when the material of the tubing cools from 1200 °C, which is located in the coarse HAZ temperature zone, the base metal transforms into martensite with a little granular bainite, exhibiting its highest hardness value at 371 HV10, which is higher than the allowable hardness for carbon steel and indicates the material has poor weldability. The reasons for the cracking and failure of the tubing are that the P110 repaired tubing has a high carbon equivalent and poor weldability. During the welding process, martensitic structure was formed at the weld toe, and cold cracks appeared in the heat-affected zone, resulting in failure. To avoid the reoccurrence of such failure, recommendations are proposed.

    Late Cretaceous tectono-magmatic activity in the Nize region, central Tibet: evidence for lithospheric delamination beneath the Qiangtang–Lhasa collision zone

    The results of zircon U–Pb age dating and whole-rock geochemistry for the Late Cretaceous Nize granodiorite porphyries, combined with analysis of near-coeval structural deformation of the Lower Cretaceous Langshan Formation, provide new data to better understand the tectonic evolution of the northern Lhasa subterrane, central Tibet. Zircon U–Pb ages of 89.2 ± 0.3 Ma to 87.8 ± 0.3 Ma indicate emplacement during the Late Cretaceous. Granodiorite porphyry intrusions were contemporaneous with the development of a regional angular unconformity, overlain by the Upper Cretaceous Jingzhushan (or Abushan) Formation, within the collision zone between the South Qiangtang and Lhasa terranes. Geochemical data for Nize granodiorite porphyries indicate that they have a calc-alkaline composition enriched in large-ion lithophile elements and light rare earth elements and depleted in high-field-strength elements and heavy rare earth elements. High Al₂O₃ and Sr contents, low Yb and Y contents, and high Sr/Y ratios are similar to adakitic magmas. Structural analysis indicates two stages of deformation (D₁ and D₂), with D₁ forming the focus of the present study. The D₁ deformation is represented by large-scale faults and records two periods of faulting. These periods are recognized as early compressional thrust faulting and a dominant late stage characterized by normal faulting and extension, with the latter stages of D₁ being near-coeval with the emplacement of the Nize granodiorite porphyries. The combination of zircon ages, geochemical data, and structural analysis indicates that the Nize granodiorite porphyries formed after collision of the South Qiangtang and Lhasa terranes. Adakitic magma derived from partial melting of the thickened lower or middle crust resulted from lithospheric delamination that may have been promoted by the convective removal of deeper lithospheric mantle.