
    Towards Open-Ended Visual Recognition with Large Language Model

    Full text link
    Localizing and recognizing objects in the open-ended physical world is a long-standing challenge in machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal model, complemented by an open-vocabulary classifier (e.g., CLIP) using pre-extracted text embeddings. However, these open-vocabulary recognition models still exhibit limitations in practical applications. On one hand, they rely on the provision of class names at test time, so recognition performance heavily depends on the predefined set of semantic classes supplied by users. On the other hand, when training with multiple datasets, human intervention is required to resolve label definition conflicts between them. In this paper, we introduce the OmniScient Model (OSM), a novel Large Language Model (LLM) based mask classifier, as a straightforward and effective solution to these challenges. Specifically, OSM predicts class labels in a generative manner, removing the need to supply class names during both training and testing. It also enables cross-dataset training without any human interference, exhibiting robust generalization capabilities thanks to the world knowledge acquired by the LLM. By combining OSM with an off-the-shelf mask proposal model, we present promising results on various benchmarks and demonstrate its effectiveness in handling novel concepts. Code/model are available at https://github.com/bytedance/OmniScient-Model
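
    The abstract carries no code, but the generative-labeling idea it describes can be sketched in a few lines. Everything below (function names, the prompt, the returned label) is a hypothetical stand-in, not the OSM API:

```python
# Minimal sketch of generative mask classification as described above:
# masks come from a class-agnostic proposal model, and an LLM *generates*
# each label instead of scoring a fixed class list. All components are
# hypothetical stand-ins, not the OSM implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class MaskProposal:
    mask: object           # binary mask over the image (e.g., an HxW array)
    features: List[float]  # pooled visual features for the masked region

def propose_masks(image) -> List[MaskProposal]:
    """Stand-in for an off-the-shelf class-agnostic mask proposal model."""
    return [MaskProposal(mask=None, features=[0.1, 0.2])]

def llm_generate_label(visual_features: List[float], prompt: str) -> str:
    """Stand-in for the LLM decoder: it conditions on visual tokens plus a
    text prompt and emits a class name token by token."""
    return "zebra"  # generated, not picked from a predefined class list

def classify_open_ended(image) -> List[str]:
    labels = []
    for proposal in propose_masks(image):
        # No class-name vocabulary is supplied: because the label is
        # generated, concepts outside any dataset taxonomy are expressible.
        labels.append(llm_generate_label(proposal.features,
                                         prompt="What is in this region?"))
    return labels

print(classify_open_ended(image=None))  # ['zebra']
```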

    Privacy Preserving Utility Mining: A Survey

    Full text link
    In the big data era, collected data usually contain rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of data containing sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms in detail, along with their advantages and deficiencies. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages
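
    As a quick illustration of the utility measure that utility-oriented mining maximizes (before any privacy mechanism is applied), here is a toy sketch with made-up transaction data; none of it comes from the survey:

```python
# The "utility" that utility-oriented pattern mining maximizes:
# utility(itemset) = sum over transactions containing the itemset of
# quantity * unit profit. All data below is invented for illustration.
from itertools import combinations

# transaction database: each transaction maps item -> purchased quantity
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2, "beer": 6},
    {"beer": 3, "milk": 1},
]
profit = {"bread": 1.0, "milk": 0.5, "beer": 2.0}  # external utility

def utility(itemset, db):
    """Total utility of an itemset across all transactions containing it."""
    total = 0.0
    for t in db:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total

# enumerate high-utility itemsets above a minimum-utility threshold
items = sorted(profit)
min_util = 10.0
for r in range(1, len(items) + 1):
    for iset in combinations(items, r):
        u = utility(iset, transactions)
        if u >= min_util:
            print(iset, u)   # e.g. ('beer',) 18.0
```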

    Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

    Full text link
    Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a shared embedding space, which bridges the gap between closed-vocabulary and open-vocabulary recognition. Hence, existing methods often adopt a two-stage framework to tackle the problem, where the inputs first go through a mask generator and then through the CLIP model along with the predicted masks. This process involves extracting features from images multiple times, which can be ineffective and inefficient. By contrast, we propose to build everything into a single-stage framework using a shared frozen convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline, but also remarkably yields a better accuracy-cost trade-off. The proposed FC-CLIP benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining. When training on COCO panoptic data only and testing in a zero-shot manner, FC-CLIP achieves 26.8 PQ, 16.8 AP, and 34.1 mIoU on ADE20K; 18.2 PQ and 27.9 mIoU on Mapillary Vistas; and 44.0 PQ, 26.8 AP, and 56.2 mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, and +4.2 mIoU on ADE20K, +4.0 PQ on Mapillary Vistas, and +20.1 PQ on Cityscapes, respectively. Additionally, training and testing of FC-CLIP are 7.5x and 6.6x faster than the same prior art, while using 5.9x fewer parameters. FC-CLIP also sets a new state-of-the-art performance across various open-vocabulary semantic segmentation datasets. Code at https://github.com/bytedance/fc-clip. Comment: code and model available at https://github.com/bytedance/fc-clip
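
    A rough sketch of the single-stage design described above, with one shared frozen backbone feeding both mask generation and text-embedding classification; shapes, helpers, and random features are illustrative stand-ins, not the actual FC-CLIP code:

```python
# Single-stage idea: one frozen convolutional CLIP backbone feeds both
# the mask generator and open-vocabulary classification, so dense image
# features are extracted exactly once.
import numpy as np

rng = np.random.default_rng(0)
H, W, D, N_MASKS, N_CLASSES = 16, 16, 64, 5, 3

def frozen_clip_backbone(image):
    """Stand-in for the frozen convolutional CLIP image encoder."""
    return rng.normal(size=(H, W, D))              # dense pixel features

def mask_generator(pixel_feats):
    """Stand-in mask head trained on top of the frozen features."""
    return rng.random(size=(N_MASKS, H, W)) > 0.5  # binary masks

text_embeddings = rng.normal(size=(N_CLASSES, D))  # pre-extracted text feats

def classify_masks(pixel_feats, masks, text_emb):
    """Mask-pool the shared features and score against text embeddings."""
    t = text_emb / (np.linalg.norm(text_emb, axis=1, keepdims=True) + 1e-8)
    logits = []
    for m in masks:
        pooled = pixel_feats[m].mean(axis=0)       # mask pooling
        pooled /= np.linalg.norm(pooled) + 1e-8
        logits.append(t @ pooled)                  # cosine similarity
    return np.stack(logits)                        # (N_MASKS, N_CLASSES)

feats = frozen_clip_backbone(image=None)           # extracted once, reused
masks = mask_generator(feats)
print(classify_masks(feats, masks, text_embeddings).argmax(axis=1))
```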

    Seismic test and numerical verification of the scaled-down reinforced concrete containment vessel

    Get PDF
    According to the ASME-359 code, a scaled-down structure of the Reinforced Concrete Containment Vessel (RCCV) of an Advanced Boiling Water Reactor (ABWR) building is constructed for a seismic test on a shaking table. Several acceleration time histories satisfying the design response spectrum, with different magnitudes, are used in the test. In addition, a numerical finite element model of the RCCV is built in SAP2000 to calculate the dynamic responses numerically.

    Enhancing the Insulation of Wide-Range Spectrum in the PVA/N Thin Film by Doping ZnO Nanowires

    Get PDF
    In this study, polyvinyl alcohol/nitrogen (PVA/N) hybrid thin films doped with sharp, sword-like ZnO nanowires, exhibiting an insulating effect over a wide spectral range, are demonstrated for the first time. The PVA/N-doped ZnO nanocomposites were developed by blending PVA and N-doped ZnO nanowires in water at room temperature. Measurements from field emission scanning electron microscopy (FE-SEM), X-ray diffraction (XRD), Raman spectroscopy, and photoluminescence emission (PL) spectra of the products show that nitrogen is successfully doped into the ZnO wurtzite crystal lattice. In addition, the refractive index of the PVA/N-doped ZnO hybrid thin films can be controlled by varying the NH3 concentration under which the ZnO nanowires are doped. It is believed that PVA/N-doped ZnO hybrid thin films are a suitable candidate for emerging applications such as heat-shielding coatings on smart windows.

    MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

    Full text link
    Video panoptic segmentation requires consistently segmenting (for both 'thing' and 'stuff' classes) and tracking objects in a video over time. In this work, we present MaXTron, a general framework that exploits Mask XFormer with Trajectory Attention to tackle the task. MaXTron enriches an off-the-shelf mask transformer by leveraging trajectory attention. The deployed mask transformer takes as input a short clip consisting of only a few frames and predicts the clip-level segmentation. To enhance the temporal consistency, MaXTron employs within-clip and cross-clip tracking modules, efficiently utilizing trajectory attention. Originally designed for video classification, trajectory attention learns to model the temporal correspondences between neighboring frames and aggregates information along the estimated motion paths. However, it is nontrivial to directly extend trajectory attention to per-pixel dense prediction tasks due to its quadratic dependency on input size. To alleviate the issue, we propose to adapt trajectory attention for both the dense pixel features and the object queries, aiming to improve the short-term and long-term tracking results, respectively. Particularly, in our within-clip tracking module, we propose axial-trajectory attention that effectively computes the trajectory attention for tracking dense pixels sequentially along the height- and width-axes. The axial decomposition significantly reduces the computational complexity for dense pixel features. In our cross-clip tracking module, since the object queries in the mask transformer are learned to encode the object information, we are able to capture long-term temporal connections by applying trajectory attention to the object queries, which learns to track each object across different clips. Without bells and whistles, MaXTron demonstrates state-of-the-art performance on video segmentation benchmarks. Comment: Code at https://github.com/TACJu/MaXTron
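
    To make the axial decomposition concrete, here is a simplified sketch that applies plain self-attention along (time, height) and then (time, width); the real trajectory attention additionally aggregates features along estimated motion paths, which is omitted here:

```python
# Axial decomposition: instead of attending over all T*H*W positions at
# once (quadratic in H*W), attention runs sequentially along the height
# axis and then the width axis across frames. Simplified illustration,
# not the MaXTron code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis):
    """Self-attention along (time, one spatial axis); x: (T, H, W, D)."""
    x = np.moveaxis(x, axis, 1)                # (T, A, B, D)
    T, A, B, D = x.shape
    # one sequence of length T*A per remaining spatial position
    seq = x.transpose(2, 0, 1, 3).reshape(B, T * A, D)
    attn = softmax(seq @ seq.transpose(0, 2, 1) / np.sqrt(D))
    out = (attn @ seq).reshape(B, T, A, D).transpose(1, 2, 0, 3)
    return np.moveaxis(out, 1, axis)           # back to (T, H, W, D)

rng = np.random.default_rng(0)
clip = rng.normal(size=(4, 8, 8, 16))          # (frames, H, W, channels)
# height-axis pass, then width-axis pass: per-query cost O(T*H + T*W)
# instead of O(T*H*W) for full attention over the flattened clip
tracked = axial_attention(axial_attention(clip, axis=1), axis=2)
print(tracked.shape)                           # (4, 8, 8, 16)
```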

    Fluoroquinolones are associated with delayed treatment and resistance in tuberculosis: a systematic review and meta-analysis

    Get PDF
    Background: Current guidelines for treating community-acquired pneumonia recommend the use of fluoroquinolones for high-risk patients. Previous studies have reported controversial results as to whether fluoroquinolones are associated with delayed diagnosis and treatment of pulmonary tuberculosis (TB) and with the development of fluoroquinolone-resistant Mycobacterium tuberculosis. We performed a systematic review and meta-analysis to clarify these issues. Methods: The following databases were searched through September 30, 2010: PubMed, EMBASE, CINAHL, Cochrane Library, Web of Science, BIOSIS Previews, and the ACP Journal Club. We considered studies that addressed the issues of delay in diagnosis and treatment of TB and the development of resistance. Results: Nine eligible studies (four on delays and five on resistance) were included in the meta-analysis from the 770 articles originally identified in the database search. The mean delay in diagnosis and treatment of pulmonary TB in the fluoroquinolone prescription group was 19.03 days, significantly longer than that in the non-fluoroquinolone group (95% confidence interval (CI) 10.87 to 27.18, p<0.001). The pooled odds ratio of developing a fluoroquinolone-resistant M. tuberculosis strain was 2.70 (95% CI 1.30 to 5.60, p=0.008). No significant heterogeneity was found among the studies in the meta-analysis. Conclusions: Empirical fluoroquinolone prescriptions for pneumonia are associated with longer delays in diagnosis and treatment of pulmonary TB and with a higher risk of developing fluoroquinolone-resistant M. tuberculosis.
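
    For readers unfamiliar with how such a summary estimate is obtained, the sketch below pools per-study odds ratios with the standard fixed-effect inverse-variance method; the study numbers are invented for illustration and are not the paper's data:

```python
# Fixed-effect inverse-variance pooling behind a summary odds ratio like
# the 2.70 (95% CI 1.30-5.60) reported above. Per-study values below are
# made up for illustration.
import math

# hypothetical per-study odds ratios with 95% confidence intervals
studies = [(2.1, 0.9, 4.9), (3.5, 1.2, 10.2), (2.4, 0.8, 7.1)]

weights, weighted_logs = [], []
for or_, lo, hi in studies:
    log_or = math.log(or_)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from the CI width
    w = 1 / se**2                                    # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_or)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci = (math.exp(pooled_log - 1.96 * pooled_se),
      math.exp(pooled_log + 1.96 * pooled_se))
print(f"pooled OR = {math.exp(pooled_log):.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```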

    k-means Mask Transformer

    Full text link
    The rise of transformers in vision tasks not only advances network backbone designs, but also opens a brand-new page for end-to-end image recognition (e.g., object detection and panoptic segmentation). Originating from Natural Language Processing (NLP), transformer architectures, consisting of self-attention and cross-attention, effectively learn long-range interactions between elements in a sequence. However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features. This subsequently impedes learning in the cross-attention between pixel features and object queries. In this paper, we rethink the relationship between pixels and object queries and propose to reformulate cross-attention learning as a clustering process. Inspired by the traditional k-means clustering algorithm, we develop a k-means Mask Xformer (kMaX-DeepLab) for segmentation tasks, which not only improves the state of the art, but also enjoys a simple and elegant design. As a result, our kMaX-DeepLab achieves new state-of-the-art performance on the COCO val set with 58.0% PQ, the Cityscapes val set with 68.4% PQ, 44.0% AP, and 83.5% mIoU, and the ADE20K val set with 50.9% PQ and 55.2% mIoU, without test-time augmentation or external datasets. We hope our work can shed some light on designing transformers tailored for vision tasks. Code and models are available at https://github.com/google-research/deeplab2. Comment: ECCV 2022. arXiv v2: add results on ADE20K. arXiv v3: fix appendix. Code and models are available at https://github.com/google-research/deeplab2
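
    The clustering view of cross-attention can be sketched compactly: replace the usual softmax over pixels with a hard per-pixel argmax over queries, so each pixel is assigned to one object query, then update queries as cluster means. This is a simplified illustration (the actual kMaX-DeepLab interleaves these steps with learned projections and residual paths):

```python
# k-means style cross-attention, simplified: pixels are hard-assigned to
# object queries (clusters) via argmax, and queries are updated as the
# mean of their assigned pixel features. Not the kMaX-DeepLab code.
import numpy as np

rng = np.random.default_rng(0)
N_QUERIES, N_PIXELS, D = 8, 1024, 32
queries = rng.normal(size=(N_QUERIES, D))   # cluster centers
pixels = rng.normal(size=(N_PIXELS, D))     # flattened pixel features

for _ in range(3):                          # a few assignment/update rounds
    sims = queries @ pixels.T               # (N_QUERIES, N_PIXELS) affinity
    assign = sims.argmax(axis=0)            # assignment step: each pixel
                                            # goes to exactly one query
    for q in range(N_QUERIES):              # update step: cluster means
        members = pixels[assign == q]
        if len(members):
            queries[q] = members.mean(axis=0)

print(np.bincount(assign, minlength=N_QUERIES))  # pixels per cluster
```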