Towards Open-Ended Visual Recognition with Large Language Model
Localizing and recognizing objects in the open-ended physical world poses a
long-standing challenge within the domain of machine perception. Recent methods
have endeavored to address the issue by employing a class-agnostic mask (or
box) proposal model, complemented by an open-vocabulary classifier (e.g., CLIP)
using pre-extracted text embeddings. However, it is worth noting that these
open-vocabulary recognition models still exhibit limitations in practical
applications. On one hand, they rely on class names being provided during
testing, so the recognition performance heavily depends on the predefined
set of semantic classes supplied by users. On the other hand, when training with
multiple datasets, human intervention is required to alleviate the label
definition conflict between them. In this paper, we introduce the OmniScient
Model (OSM), a novel Large Language Model (LLM) based mask classifier, as a
straightforward and effective solution to the aforementioned challenges.
Specifically, OSM predicts class labels in a generative manner, thus removing
the need to supply class names during both training and testing. It also enables
cross-dataset training without any human interference, exhibiting robust
generalization capabilities due to the world knowledge acquired from the LLM.
By combining OSM with an off-the-shelf mask proposal model, we present
promising results on various benchmarks, and demonstrate its effectiveness in
handling novel concepts. Code/model are available at
https://github.com/bytedance/OmniScient-Model
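The generative classification step can be pictured as pooling region features under a predicted mask, projecting them into the LLM's token space, and letting the LLM write out the class name as text. The sketch below illustrates that flow only; vision_encoder, projector, llm, and tokenizer are generic placeholders assuming a HuggingFace-style interface, not the released OmniScient-Model API.

```python
# A minimal sketch of generative mask classification in the spirit of OSM: pool region
# features under a predicted mask, project them into the LLM's embedding space, and let
# the LLM generate the class name as free-form text. All names (vision_encoder, projector,
# llm, tokenizer) are illustrative placeholders, not the released OmniScient-Model API.
import torch

def mask_pool(features, mask):
    """Average-pool dense image features inside a binary mask.

    features: (C, H, W) feature map from a (frozen) vision encoder.
    mask:     (H, W) binary mask from a class-agnostic proposal model.
    """
    weights = mask.to(features.dtype)
    weights = weights / weights.sum().clamp(min=1.0)
    return torch.einsum("chw,hw->c", features, weights)

@torch.no_grad()
def classify_mask(image, mask, vision_encoder, projector, llm, tokenizer):
    """Predict a class name as generated text instead of picking from a fixed label set."""
    features = vision_encoder(image)                    # (C, H, W) dense features
    region = mask_pool(features, mask)                  # (C,) region descriptor
    prefix = projector(region).view(1, 1, -1)           # map into the LLM token space
    prompt_ids = tokenizer("What is in the region?", return_tensors="pt").input_ids
    prompt_embeds = llm.get_input_embeddings()(prompt_ids)
    inputs = torch.cat([prefix, prompt_embeds], dim=1)  # [region token, prompt tokens]
    out_ids = llm.generate(inputs_embeds=inputs, max_new_tokens=8)
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)  # e.g. "golden retriever"
```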
Privacy Preserving Utility Mining: A Survey
In the big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.
Comment: 2018 IEEE International Conference on Big Data, 10 pages
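As a quick reminder of the quantity that utility-oriented mining maximizes, the utility of an itemset is the sum, over the transactions containing it, of each item's purchased quantity times its unit profit. The toy example below only illustrates that calculation; the items, quantities, and profit values are made up.

```python
# Toy illustration of itemset utility in utility-oriented pattern mining.
# All transactions and profit values below are made-up example data.

profit = {"bread": 1.0, "milk": 2.0, "wine": 12.0}   # external utility (unit profit) per item
transactions = [
    {"bread": 3, "milk": 1},                          # item -> purchased quantity
    {"bread": 1, "wine": 2},
    {"milk": 2, "wine": 1},
]

def utility(itemset, transactions, profit):
    """Total utility of an itemset across all transactions that contain every item in it."""
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total

print(utility({"milk", "wine"}, transactions, profit))  # 2*2.0 + 1*12.0 = 16.0
```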
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Open-vocabulary segmentation is a challenging task requiring segmenting and
recognizing objects from an open set of categories. One way to address this
challenge is to leverage multi-modal models, such as CLIP, to provide image and
text features in a shared embedding space, which bridges the gap between
closed-vocabulary and open-vocabulary recognition. Hence, existing methods
often adopt a two-stage framework to tackle the problem, where the inputs first
go through a mask generator and then through the CLIP model along with the
predicted masks. This process involves extracting features from images multiple
times, which can be ineffective and inefficient. By contrast, we propose to
build everything into a single-stage framework using a shared Frozen
Convolutional CLIP backbone, which not only significantly simplifies the
current two-stage pipeline, but also remarkably yields a better accuracy-cost
trade-off. The proposed FC-CLIP benefits from the following observations: the
frozen CLIP backbone maintains the ability of open-vocabulary classification
and can also serve as a strong mask generator, and the convolutional CLIP
generalizes well to a larger input resolution than the one used during
contrastive image-text pretraining. When training on COCO panoptic data only
and testing in a zero-shot manner, FC-CLIP achieves 26.8 PQ, 16.8 AP, and 34.1
mIoU on ADE20K, 18.2 PQ, 27.9 mIoU on Mapillary Vistas, 44.0 PQ, 26.8 AP, 56.2
mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, +4.2 mIoU
on ADE20K, +4.0 PQ on Mapillary Vistas and +20.1 PQ on Cityscapes,
respectively. Additionally, the training and testing times of FC-CLIP are 7.5x
and 6.6x faster than the same prior art, while using 5.9x fewer
parameters. FC-CLIP also sets a new state-of-the-art performance across various
open-vocabulary semantic segmentation datasets. Code at
https://github.com/bytedance/fc-clip
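The single-stage design can be approximated as one feature-extraction pass through a frozen CLIP backbone whose features feed both the mask predictor and the open-vocabulary classifier. The sketch below is a minimal illustration of that idea; frozen_clip_backbone, mask_decoder, and the "dense" feature key are stand-in names rather than the released fc-clip interfaces, and 0.07 is the common CLIP temperature used here only for illustration.

```python
# Minimal sketch of the single-stage idea: extract features once with a frozen convolutional
# CLIP backbone, predict masks from those features, and classify each mask by pooling the
# same CLIP features against precomputed text embeddings. Names are placeholders, not the
# released fc-clip API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def open_vocab_segment(image, frozen_clip_backbone, mask_decoder, text_embeds):
    """
    image:       (3, H, W) input image
    text_embeds: (K, C) L2-normalized CLIP text embeddings for K candidate class names
    returns:     per-mask class probabilities (1, N, K) and mask proposals (1, N, h, w)
    """
    feats = frozen_clip_backbone(image.unsqueeze(0))     # single feature-extraction pass
    masks = mask_decoder(feats).sigmoid()                # (1, N, h, w) mask proposals
    dense = feats["dense"]                               # (1, C, h, w) CLIP-space features

    # Pool CLIP features inside each predicted (soft) mask.
    weights = masks / masks.sum(dim=(-2, -1), keepdim=True).clamp(min=1e-6)
    region = torch.einsum("bnhw,bchw->bnc", weights, dense)
    region = F.normalize(region, dim=-1)

    logits = region @ text_embeds.T / 0.07               # cosine similarity with temperature
    return logits.softmax(dim=-1), masks
```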
Seismic test and numerical verification of the scaled-down reinforced concrete containment vessel
According to the ASME-359 code, a scaled-down structure of the Reinforced Concrete Containment Vessel (RCCV) of an Advanced Boiling Water Reactor (ABWR) building is constructed for a seismic test on a shaking table. Several acceleration time histories satisfying the design response spectrum with different magnitudes are used in the test. In addition, a numerical finite element model of the RCCV is built in SAP2000 to calculate the dynamic responses numerically.
Enhancing the Insulation of Wide-Range Spectrum in the PVA/N Thin Film by Doping ZnO Nanowires
In this study, polyvinyl alcohol/nitrogen (PVA/N) hybrid thin films doped with sharp-sword ZnO nanowires, which provide an insulating effect over a wide spectral range, are demonstrated for the first time. PVA/N-doped ZnO nanocomposites were developed by blending PVA and N-doped ZnO nanowires in water at room temperature. Measurements from field emission scanning electron microscopy (FE-SEM), X-ray diffraction (XRD), Raman, and photoluminescence emission (PL) spectra of the products show that nitrogen is successfully doped into the ZnO wurtzite crystal lattice. In addition, the refractive index of the PVA/N-doped ZnO hybrid thin films can be controlled by varying the ZnO nanowires doped under different NH3 concentrations. It is believed that PVA/N-doped ZnO hybrid thin films are a suitable candidate for emerging applications such as heat-shielding coatings on smart windows.
MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation
Video panoptic segmentation requires consistently segmenting (for both
`thing' and `stuff' classes) and tracking objects in a video over time. In this
work, we present MaXTron, a general framework that exploits Mask XFormer with
Trajectory Attention to tackle the task. MaXTron enriches an off-the-shelf mask
transformer by leveraging trajectory attention. The deployed mask transformer
takes as input a short clip consisting of only a few frames and predicts the
clip-level segmentation. To enhance the temporal consistency, MaXTron employs
within-clip and cross-clip tracking modules, efficiently utilizing trajectory
attention. Originally designed for video classification, trajectory attention
learns to model the temporal correspondences between neighboring frames and
aggregates information along the estimated motion paths. However, it is
nontrivial to directly extend trajectory attention to the per-pixel dense
prediction tasks due to its quadratic dependency on input size. To alleviate
the issue, we propose to adapt the trajectory attention for both the dense
pixel features and object queries, aiming to improve the short-term and
long-term tracking results, respectively. Particularly, in our within-clip
tracking module, we propose axial-trajectory attention that effectively
computes the trajectory attention for tracking dense pixels sequentially along
the height- and width-axes. The axial decomposition significantly reduces the
computational complexity for dense pixel features. In our cross-clip tracking
module, since the object queries in mask transformer are learned to encode the
object information, we are able to capture the long-term temporal connections
by applying trajectory attention to object queries, which learns to track each
object across different clips. Without bells and whistles, MaXTron demonstrates
state-of-the-art performance on video segmentation benchmarks.
Comment: Code at https://github.com/TACJu/MaXTro
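At its core, the axial decomposition replaces one attention over all T·H·W positions, which is quadratic in the spatial size, with two cheaper attentions computed across frames, one along the height axis and one along the width axis. The following is a simplified sketch of that decomposition only, using plain self-attention without learned projections or the trajectory re-weighting step; it is not the MaXTron implementation.

```python
# Simplified sketch of axial (height-then-width) attention across frames, illustrating how
# the quadratic cost over all T*H*W tokens is avoided. Projections and trajectory
# re-weighting are omitted; this is not the MaXTron code.
import torch
import torch.nn.functional as F

def axial_attention(x, axis):
    """Self-attention over (time, one spatial axis), keeping the other axis in the batch.

    x: (T, H, W, C) clip features; axis: 'h' or 'w'.
    """
    T, H, W, C = x.shape
    if axis == "h":
        seq = x.permute(2, 0, 1, 3).reshape(W, T * H, C)   # batch over width columns
    else:
        seq = x.permute(1, 0, 2, 3).reshape(H, T * W, C)   # batch over height rows
    out = F.scaled_dot_product_attention(seq, seq, seq)    # identity q/k/v for brevity
    if axis == "h":
        return out.reshape(W, T, H, C).permute(1, 2, 0, 3)
    return out.reshape(H, T, W, C).permute(1, 0, 2, 3)

clip = torch.randn(2, 32, 32, 64)             # T=2 frames of 32x32 features, C=64
tracked = axial_attention(axial_attention(clip, "h"), "w")
print(tracked.shape)                          # torch.Size([2, 32, 32, 64])
```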
Fluoroquinolones are associated with delayed treatment and resistance in tuberculosis: a systematic review and meta-analysis
Background: Current guidelines for treating community-acquired pneumonia recommend the use of fluoroquinolones for high-risk patients. Previous studies have reported controversial results as to whether fluoroquinolones are associated with delayed diagnosis and treatment of pulmonary tuberculosis (TB) and the development of fluoroquinolone-resistant Mycobacterium tuberculosis. We performed a systematic review and meta-analysis to clarify these issues.
Methods: The following databases were searched through September 30, 2010: PubMed, EMBASE, CINAHL, Cochrane Library, Web of Science, BIOSIS Previews, and the ACP Journal Club. We considered studies that addressed the issues of delay in diagnosis and treatment of TB and the development of resistance.
Results: Nine eligible studies (four for delays and five for resistance issues) were included in the meta-analysis from the 770 articles originally identified in the database search. The mean duration of delayed diagnosis and treatment of pulmonary TB in the fluoroquinolone prescription group was 19.03 days, significantly longer than that in the non-fluoroquinolone group (95% confidence interval (CI) 10.87 to 27.18, p<0.001). The pooled odds ratio of developing a fluoroquinolone-resistant M. tuberculosis strain was 2.70 (95% CI 1.30 to 5.60, p=0.008). No significant heterogeneity was found among studies in the meta-analysis.
Conclusions: Empirical fluoroquinolone prescriptions for pneumonia are associated with longer delays in diagnosis and treatment of pulmonary TB and a higher risk of developing fluoroquinolone-resistant M. tuberculosis.
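A pooled odds ratio of this kind comes from standard meta-analytic pooling of study-level effect estimates. The snippet below shows inverse-variance (fixed-effect) pooling of log odds ratios on made-up study values, purely to illustrate the calculation; the numbers are not the data from the reviewed studies, and the review's actual pooling model may differ.

```python
# Illustration of inverse-variance (fixed-effect) pooling of odds ratios.
# The study-level values below are placeholders, not data from the reviewed studies.
import math

# (log OR, standard error of log OR) for each hypothetical study
studies = [(math.log(2.1), 0.45), (math.log(3.4), 0.60), (math.log(2.5), 0.50)]

weights = [1.0 / se**2 for _, se in studies]                  # inverse-variance weights
pooled_log_or = sum(w * lor for (lor, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

or_point = math.exp(pooled_log_or)
ci_low, ci_high = (math.exp(pooled_log_or + z * pooled_se) for z in (-1.96, 1.96))
print(f"pooled OR = {or_point:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```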
k-means Mask Transformer
The rise of transformers in vision tasks not only advances network backbone
designs, but also starts a brand-new page to achieve end-to-end image
recognition (e.g., object detection and panoptic segmentation). Originating from
Natural Language Processing (NLP), transformer architectures, consisting of
self-attention and cross-attention, effectively learn long-range interactions
between elements in a sequence. However, we observe that most existing
transformer-based vision models simply borrow the idea from NLP, neglecting the
crucial difference between languages and images, particularly the extremely
large sequence length of spatially flattened pixel features. This subsequently
impedes the learning in cross-attention between pixel features and object
queries. In this paper, we rethink the relationship between pixels and object
queries and propose to reformulate the cross-attention learning as a clustering
process. Inspired by the traditional k-means clustering algorithm, we develop a
k-means Mask Xformer (kMaX-DeepLab) for segmentation tasks, which not only
improves the state-of-the-art, but also enjoys a simple and elegant design. As
a result, our kMaX-DeepLab achieves a new state-of-the-art performance on COCO
val set with 58.0% PQ, Cityscapes val set with 68.4% PQ, 44.0% AP, and 83.5%
mIoU, and ADE20K val set with 50.9% PQ and 55.2% mIoU without test-time
augmentation or external dataset. We hope our work can shed some light on
designing transformers tailored for vision tasks. Code and models are available
at https://github.com/google-research/deeplab2
Comment: ECCV 2022. arXiv v2: add results on ADE20K. arXiv v3: fix appendix.
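The clustering view can be illustrated in a few lines: treat the object queries as cluster centers, hard-assign each pixel to its closest query (the assignment step), and update each query from the pixels assigned to it (the update step). The sketch below is a bare-bones illustration under those assumptions, omitting projection layers, residual connections, and auxiliary losses; it is not the released kMaX-DeepLab code.

```python
# Bare-bones sketch of k-means-style cross-attention: hard pixel-to-query assignment
# (argmax over queries instead of softmax over the long spatial dimension), followed by a
# per-cluster average to update the queries. Not the released kMaX-DeepLab implementation.
import torch
import torch.nn.functional as F

def kmeans_cross_attention(queries, pixel_feats):
    """
    queries:     (N, C) object queries acting as cluster centers
    pixel_feats: (HW, C) flattened pixel features
    returns:     updated queries of shape (N, C)
    """
    affinity = queries @ pixel_feats.T                    # (N, HW) query-pixel similarity
    # Assignment step: each pixel picks one cluster (hard argmax over queries).
    assign = F.one_hot(affinity.argmax(dim=0), num_classes=queries.shape[0]).float().T  # (N, HW)
    # Update step: average the pixels assigned to each cluster.
    updated = (assign @ pixel_feats) / assign.sum(dim=1, keepdim=True).clamp(min=1.0)
    return updated

q = torch.randn(128, 256)                  # 128 object queries, 256-d
p = torch.randn(64 * 64, 256)              # 64x64 flattened pixel features
print(kmeans_cross_attention(q, p).shape)  # torch.Size([128, 256])
```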