
    Examining CNN Representations with respect to Dataset Bias

    Given a pre-trained CNN and without using any testing samples, this paper proposes a simple yet effective method to diagnose the CNN's feature representations. We aim to discover representation flaws caused by potential dataset bias. More specifically, when the CNN is trained to estimate image attributes, we mine latent relationships between representations of different attributes inside the CNN. Then, we compare the mined attribute relationships with ground-truth attribute relationships to discover the CNN's blind spots and failure modes due to dataset bias. In fact, representation flaws caused by dataset bias cannot be examined by conventional evaluation strategies based on testing images, because the testing images may carry a similar bias. Experiments have demonstrated the effectiveness of our method. Comment: in AAAI 201
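
    As a rough illustration of this kind of diagnosis (not the authors' implementation), the sketch below compares attribute relationships estimated from a CNN's attribute scores with relationships computed from ground-truth labels and flags the most discordant attribute pairs; all function names and the random stand-in data are hypothetical.

        import numpy as np

        def attribute_correlations(scores):
            # scores: (num_images, num_attributes); one column of scores per attribute
            return np.corrcoef(scores, rowvar=False)

        def flag_suspect_pairs(cnn_scores, gt_labels, threshold=0.5):
            # compare relationships mined from the CNN's responses with the
            # ground-truth relationships and flag the largest disagreements
            mined = attribute_correlations(cnn_scores)
            truth = attribute_correlations(gt_labels.astype(float))
            gap = np.abs(mined - truth)
            n = gap.shape[0]
            pairs = [(i, j, gap[i, j]) for i in range(n) for j in range(i + 1, n)
                     if gap[i, j] > threshold]
            return sorted(pairs, key=lambda p: -p[2])

        # toy usage with random stand-ins for real attribute scores and labels
        rng = np.random.default_rng(0)
        cnn_scores = rng.normal(size=(200, 6))
        gt_labels = rng.random(size=(200, 6)) > 0.5
        print(flag_suspect_pairs(cnn_scores, gt_labels, threshold=0.3)[:3])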

    Retrosynthesis prediction enhanced by in-silico reaction data augmentation

    Recent advances in machine learning (ML) have expedited retrosynthesis research by helping chemists design experiments more efficiently. However, all ML-based methods consume substantial amounts of paired training data (i.e., chemical reactions as product-reactant(s) pairs), which is costly to obtain. Moreover, companies view reaction data as a valuable asset and restrict its accessibility to researchers. These issues prevent the creation of more powerful retrosynthesis models because of their data-driven nature. As a response, we exploit easy-to-access unpaired data (i.e., one component of a product-reactant(s) pair) to generate in-silico paired data that facilitates model training. Specifically, we present RetroWISE, a self-boosting framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation using unpaired data, ultimately leading to a superior model. On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models (e.g., +8.6% top-1 accuracy on the USPTO-50K test dataset). Moreover, it consistently improves the prediction accuracy of rare transformations. These results show that RetroWISE overcomes the training bottleneck via in-silico reactions, thereby paving the way toward more effective ML-based retrosynthesis models.
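
    A minimal structural sketch of such a self-boosting loop is given below, with a toy lookup model standing in for the real retrosynthesis model; the class names, confidence filtering, and SMILES strings are illustrative assumptions, not details taken from the paper.

        from dataclasses import dataclass, field
        from typing import List, Tuple

        Pair = Tuple[str, str]  # (product SMILES, reactant SMILES)

        @dataclass
        class ToyRetroModel:
            # stand-in for a real sequence-to-sequence retrosynthesis model
            memory: List[Pair] = field(default_factory=list)

            def fit(self, pairs: List[Pair]) -> "ToyRetroModel":
                self.memory = list(pairs)
                return self

            def predict(self, product: str) -> Tuple[str, float]:
                # return a (reactant, confidence) guess for a product
                for prod, react in self.memory:
                    if prod == product:
                        return react, 0.9
                return product, 0.1  # low-confidence fallback

        def self_boost(real_pairs: List[Pair], unpaired_products: List[str],
                       rounds: int = 2, conf_thresh: float = 0.5):
            # train a base model on real pairs, then repeatedly generate in-silico
            # pairs from unpaired products and retrain on the augmented set
            model = ToyRetroModel().fit(real_pairs)
            for _ in range(rounds):
                in_silico = []
                for product in unpaired_products:
                    reactant, conf = model.predict(product)
                    if conf >= conf_thresh:            # keep only confident generations
                        in_silico.append((product, reactant))
                model = ToyRetroModel().fit(real_pairs + in_silico)
            return model

        model = self_boost([("CCO", "CC=O")], ["CCO", "CCN"])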

    Boosting Video Object Segmentation via Space-time Correspondence Learning

    Current top-leading solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed frames and the first annotated frame. They simply exploit the supervisory signals from the ground-truth masks for learning mask prediction only, without posing any constraint on the space-time correspondence matching, which, however, is the fundamental building block of such a regime. To alleviate this crucial yet commonly ignored issue, we devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching during network learning. By comprehensively exploring the intrinsic coherence in videos at the pixel and object levels, our algorithm reinforces the standard, fully supervised training of mask segmentation with label-free, contrastive correspondence learning. Without requiring extra annotation cost during training, causing speed delays during deployment, or incurring architectural modification, our algorithm provides solid performance gains on four widely used benchmarks, i.e., DAVIS2016&2017 and YouTube-VOS2018&2019, on top of famous matching-based VOS solutions. Comment: CVPR 2023; Project page: https://github.com/wenguanwang/VOS_Correspondenc
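
    The following is a minimal sketch of a label-free, InfoNCE-style correspondence term of this flavour, assuming pixel embeddings from two frames whose corresponding positions are already aligned row by row (e.g., via augmentation or tracking); the paper's actual pixel- and object-level formulation is more elaborate.

        import numpy as np

        def correspondence_loss(feat_t, feat_t1, temperature=0.07):
            # feat_t, feat_t1: (num_pixels, dim); row i of both maps is assumed to
            # be a corresponding pair (positives on the diagonal, rest negatives)
            a = feat_t / np.linalg.norm(feat_t, axis=1, keepdims=True)
            b = feat_t1 / np.linalg.norm(feat_t1, axis=1, keepdims=True)
            logits = a @ b.T / temperature
            logits -= logits.max(axis=1, keepdims=True)          # numerical stability
            log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
            return -np.mean(np.diag(log_prob))

        rng = np.random.default_rng(0)
        f_t = rng.normal(size=(64, 128))
        f_t1 = f_t + 0.05 * rng.normal(size=(64, 128))  # perturbed "next frame" features
        print(correspondence_loss(f_t, f_t1))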

    Towards Interpretable Video Super-Resolution via Alternating Optimization

    In this paper, we study a practical space-time video super-resolution (STVSR) problem, which aims at generating a high-framerate, high-resolution sharp video from a low-framerate, low-resolution blurry video. Such a problem often occurs when recording a fast dynamic event with a low-framerate, low-resolution camera; the captured video then suffers from three typical issues: i) motion blur occurs due to object/camera motion during the exposure time; ii) motion aliasing is unavoidable when the event's temporal frequency exceeds the Nyquist limit of temporal sampling; iii) high-frequency details are lost because of the low spatial sampling rate. These issues can be alleviated by a cascade of three separate sub-tasks, namely video deblurring, frame interpolation, and super-resolution, which, however, fails to capture the spatial and temporal correlations among video sequences. To address this, we propose an interpretable STVSR framework by leveraging both model-based and learning-based methods. Specifically, we formulate STVSR as a joint video deblurring, frame interpolation, and super-resolution problem, and solve it by alternating between two sub-problems. For the first sub-problem, we derive an interpretable analytical solution and use it as a Fourier data transform layer. Then, we propose a recurrent video enhancement layer for the second sub-problem to further recover high-frequency details. Extensive experiments demonstrate the superiority of our method in terms of quantitative metrics and visual quality. Comment: ECCV 202
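
    Below is a rough numpy sketch of such an alternation for the deblurring term alone, assuming a known blur kernel and a half-quadratic-style splitting, with a simple sharpening filter standing in for the learned recurrent enhancement layer; it illustrates the idea, not the paper's exact formulation (which also handles frame interpolation and super-resolution).

        import numpy as np

        def fourier_data_step(blurry_fft, kernel_fft, z, lam=1e-2):
            # closed-form minimiser of ||k * x - y||^2 + lam * ||x - z||^2,
            # computed in the Fourier domain
            z_fft = np.fft.fft2(z)
            x_fft = ((np.conj(kernel_fft) * blurry_fft + lam * z_fft)
                     / (np.abs(kernel_fft) ** 2 + lam))
            return np.real(np.fft.ifft2(x_fft))

        def enhancement_step(x):
            # stand-in for the learned recurrent enhancement: a mild sharpening
            lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
                   np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4.0 * x)
            return x - 0.1 * lap

        def alternate(blurry, kernel_fft, iters=3, lam=1e-2):
            blurry_fft = np.fft.fft2(blurry)
            x = blurry.copy()
            for _ in range(iters):
                x = fourier_data_step(blurry_fft, kernel_fft, x, lam)  # analytical sub-problem
                x = enhancement_step(x)                                # learned sub-problem (stubbed)
            return x

        # toy usage: restore a 64x64 image blurred by a 5x5 box kernel
        rng = np.random.default_rng(0)
        img = rng.random((64, 64))
        k_fft = np.fft.fft2(np.ones((5, 5)) / 25.0, s=img.shape)
        blurry = np.real(np.fft.ifft2(np.fft.fft2(img) * k_fft))
        restored = alternate(blurry, k_fft)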

    ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection

    Existing approaches for unsupervised point cloud pre-training are constrained to either scene-level or point/voxel-level instance discrimination. Scene-level methods tend to lose the local details that are crucial for recognizing road objects, while point/voxel-level methods inherently suffer from a limited receptive field that cannot perceive large objects or contextual environments. Considering that region-level representations are more suitable for 3D object detection, we devise a new unsupervised point cloud pre-training framework, called ProposalContrast, that learns robust 3D representations by contrasting region proposals. Specifically, with an exhaustive set of region proposals sampled from each point cloud, geometric point relations within each proposal are modeled to create expressive proposal representations. To better accommodate 3D detection properties, ProposalContrast optimizes with both inter-cluster and inter-proposal separation, i.e., sharpening the discriminativeness of proposal representations across semantic classes and object instances. The generalizability and transferability of ProposalContrast are verified on various 3D detectors (i.e., PV-RCNN, CenterPoint, PointPillars, and PointRCNN) and datasets (i.e., KITTI, Waymo, and ONCE). Comment: Accepted to ECCV 2022. Code: https://github.com/yinjunbo/ProposalContras
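
    As a simplified illustration of contrasting region proposals (not the released code), the sketch below mean-pools point features inside each sampled proposal and applies an InfoNCE-style loss across two augmented views of the same point cloud; the geometric point-relation modelling and the additional inter-cluster separation used by ProposalContrast are omitted here, and all names are assumptions.

        import numpy as np

        def pool_proposals(point_feats, proposal_indices):
            # average-pool the point features inside each sampled region proposal
            return np.stack([point_feats[idx].mean(axis=0) for idx in proposal_indices])

        def proposal_contrast_loss(view_a, view_b, proposals, temperature=0.1):
            # matching proposals across two augmented views are positives,
            # all other proposals act as negatives
            za = pool_proposals(view_a, proposals)
            zb = pool_proposals(view_b, proposals)
            za /= np.linalg.norm(za, axis=1, keepdims=True)
            zb /= np.linalg.norm(zb, axis=1, keepdims=True)
            logits = za @ zb.T / temperature
            logits -= logits.max(axis=1, keepdims=True)
            log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
            return -np.mean(np.diag(log_prob))

        rng = np.random.default_rng(0)
        feats_a = rng.normal(size=(1024, 32))                  # per-point features, view A
        feats_b = feats_a + 0.1 * rng.normal(size=(1024, 32))  # augmented view B
        proposals = [rng.choice(1024, size=64, replace=False) for _ in range(16)]
        print(proposal_contrast_loss(feats_a, feats_b, proposals))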

    Organic-Inorganic Perovskite Light-Emitting Electrochemical Cells with a Large Capacitance

    Perovskite light-emitting diodes, typically made with a high work function anode and a low work function cathode, have recently attracted intense interest. Here, perovskite light-emitting devices with two high work function electrodes are demonstrated and show several interesting features. Firstly, electroluminescence is easily obtained under both forward and reverse bias. Secondly, impedance spectroscopy indicates that the ionic conductivity in the iodide perovskite (CH3NH3PbI3) is large, with a value of ≈10^-8 S cm^-1. Thirdly, the shift of the emission spectrum in the mixed halide perovskite (CH3NH3PbI3-xBrx) light-emitting devices indicates that I- ions are mobile in the perovskites. Fourthly, the accumulated ions at the interfaces result in a large capacitance (≈100 μF cm^-2). These results conclusively show that organic-inorganic halide perovskites are solid electrolytes with mixed ionic and electronic conductivity and that the light-emitting device is a light-emitting electrochemical cell (LEC). The work also suggests that organic-inorganic halide perovskites are potential energy-storage materials, which may find application in solid-state supercapacitors and batteries.

    Reference-Based Image Super-Resolution with Deformable Attention Transformer

    Reference-based image super-resolution (RefSR) aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images. Recently, RefSR has been attracting great attention as it provides an alternative way to surpass single-image SR. However, the RefSR problem poses two critical challenges: (i) it is difficult to match the correspondence between LR and Ref images when they differ significantly; (ii) it is very challenging to transfer relevant textures from Ref images to compensate for the missing details in LR images. To address these issues, this paper proposes a deformable attention Transformer, namely DATSR, with multiple scales, each of which consists of a texture feature encoder (TFE) module, a reference-based deformable attention (RDA) module, and a residual feature aggregation (RFA) module. Specifically, TFE first extracts features that are insensitive to image transformations (e.g., brightness changes) for the LR and Ref images, RDA then exploits multiple relevant textures to provide more information for the LR features, and RFA finally aggregates the LR features and relevant textures to obtain a more visually pleasing result. Extensive experiments demonstrate that our DATSR achieves state-of-the-art performance on benchmark datasets both quantitatively and qualitatively.
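
    A toy illustration of reference-based texture transfer via cross-attention is sketched below; it uses plain dense attention in place of deformable attention and omits the multi-scale TFE/RDA/RFA structure, so all names, shapes, and parameters are assumptions for illustration only.

        import numpy as np

        def softmax(x, axis=-1):
            x = x - x.max(axis=axis, keepdims=True)
            e = np.exp(x)
            return e / e.sum(axis=axis, keepdims=True)

        def texture_transfer(lr_feats, ref_feats, temperature=1.0):
            # plain cross-attention: each LR position aggregates the Ref textures most
            # similar to it, and the result is added back to the LR features
            q = lr_feats / np.linalg.norm(lr_feats, axis=1, keepdims=True)
            k = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
            attn = softmax(q @ k.T / temperature, axis=1)   # (num_lr, num_ref)
            transferred = attn @ ref_feats                  # weighted Ref texture
            return lr_feats + transferred                   # residual aggregation

        rng = np.random.default_rng(0)
        lr_feats = rng.normal(size=(256, 64))    # flattened LR feature map
        ref_feats = rng.normal(size=(400, 64))   # flattened Ref feature map
        print(texture_transfer(lr_feats, ref_feats).shape)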