
    Use the Detection Transformer as a Data Augmenter

    Detection Transformer (DETR) is a Transformer-based object detection model. In this paper, we demonstrate that it can also be used as a data augmenter. We term our approach DETR-assisted CutMix, or DeMix for short. DeMix builds on CutMix, a simple yet highly effective data augmentation technique that has gained popularity in recent years. CutMix improves model performance by cutting a patch from one image and pasting it onto another, yielding a new image. The label for this new example is the weighted average of the original labels, where the weights are proportional to the areas of the patches. CutMix selects the patch to be cut at random. In contrast, DeMix deliberately selects a semantically rich patch, located by a pre-trained DETR. The label of the new image is specified in the same way as in CutMix. Experimental results on benchmark datasets for image classification demonstrate that DeMix significantly outperforms prior-art data augmentation methods, including CutMix. Comment: 13 pages
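The cut-and-paste step and the area-weighted label described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: images are plain nested lists, the function names `cutmix` and `random_box` are hypothetical, and the random box is a simplified stand-in for CutMix's sampling scheme. For DeMix, the box would instead come from a pre-trained DETR detection, which is not modeled here.

```python
import random

def cutmix(image_a, image_b, label_a, label_b, box):
    """Paste the region `box` of image_b onto image_a and mix the labels.

    Images are H x W nested lists; labels are one-hot (or soft) lists.
    `box` is (top, left, bottom, right), with bottom/right exclusive.
    """
    height, width = len(image_a), len(image_a[0])
    top, left, bottom, right = box
    mixed = [row[:] for row in image_a]  # copy so image_a is not modified
    for r in range(top, bottom):
        mixed[r][left:right] = image_b[r][left:right]
    # Weight of image_b's label = fraction of the image covered by the patch.
    lam = (bottom - top) * (right - left) / (height * width)
    label = [(1 - lam) * a + lam * b for a, b in zip(label_a, label_b)]
    return mixed, label

def random_box(height, width, rng):
    """Simplified CutMix-style choice: a random non-empty axis-aligned box."""
    top, bottom = sorted(rng.sample(range(height + 1), 2))
    left, right = sorted(rng.sample(range(width + 1), 2))
    return top, left, bottom, right

# Usage: paste the top-left 2 x 2 patch of an all-ones image onto zeros.
a = [[0] * 4 for _ in range(4)]
b = [[1] * 4 for _ in range(4)]
mixed, label = cutmix(a, b, [1.0, 0.0], [0.0, 1.0], (0, 0, 2, 2))
```

In the usage above the patch covers 4 of 16 pixels, so the second class receives a label weight of 0.25.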

    Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

    Owing to their causal semantics, Bayesian networks (BNs) have been widely employed to discover underlying relationships in data in exploratory studies, such as brain research. Despite its success in modeling the probability distribution of variables, a BN is by nature a generative model, which is not necessarily discriminative. As a result, subtle but critical network changes that are worth investigating across populations may be overlooked. In this paper, we propose to improve the discriminative power of BN models for continuous variables from two different perspectives, which yields two general discriminative learning frameworks for Gaussian Bayesian networks (GBNs). In the first framework, we employ the Fisher kernel to bridge the generative models of GBNs and the discriminative classifiers of SVMs, and convert GBN parameter learning into Fisher kernel learning by minimizing a generalization error bound of SVMs. In the second framework, we employ the max-margin criterion and build it directly upon GBN models to explicitly optimize the classification performance of the GBNs. The advantages and disadvantages of the two frameworks are discussed and experimentally compared. Both demonstrate strong power in learning discriminative parameters of GBNs for neuroimaging-based brain network analysis, while maintaining reasonable representation capacity. The contributions of this paper also include a new Directed Acyclic Graph (DAG) constraint with a theoretical guarantee to ensure the graph validity of GBNs. Comment: 16 pages and 5 figures for the article (excluding appendix)
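For readers unfamiliar with the Fisher kernel mentioned in the first framework, its standard textbook definition is given below. The symbols are assumptions for illustration: $p(x \mid \theta)$ stands for the generative model (here, a GBN with parameters $\theta$); the abstract does not state the exact parameterization or normalization the paper uses.

```latex
% Fisher score: gradient of the log-likelihood with respect to the parameters
U_x = \nabla_\theta \log p(x \mid \theta)

% Fisher information matrix
I(\theta) = \mathbb{E}_{x \sim p(\cdot \mid \theta)} \left[ U_x U_x^\top \right]

% Fisher kernel between two samples x and x'
K(x, x') = U_x^\top \, I(\theta)^{-1} \, U_{x'}
```

Because $K$ depends on $\theta$, adjusting the generative parameters changes the kernel that the SVM sees, which is what makes learning the parameters through an SVM error bound possible.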

    Adversarial Feature Stacking for Accurate and Robust Predictions

    Deep Neural Networks (DNNs) have achieved remarkable performance on a variety of applications but are extremely vulnerable to adversarial perturbation. To address this issue, various defense methods have been proposed to enhance model robustness. Unfortunately, the most representative and promising methods, such as adversarial training and its variants, usually degrade model accuracy on benign samples, limiting their practical utility. This indicates that it is difficult to extract both robust and accurate features using a single network under certain conditions, such as limited training data, resulting in a trade-off between accuracy and robustness. To tackle this problem, we propose an Adversarial Feature Stacking (AFS) model that can jointly take advantage of features with varied levels of robustness and accuracy, thus significantly alleviating this trade-off. Specifically, we adopt multiple networks adversarially trained with different perturbation budgets to extract either more robust features or more accurate features. These features are then fused by a learnable merger to give the final predictions. We evaluate the AFS model on the CIFAR-10 and CIFAR-100 datasets with strong adaptive attack methods, where it significantly advances the state of the art in terms of the trade-off. Without extra training data, the AFS model achieves a benign accuracy improvement of 6% on CIFAR-10 and 9% on CIFAR-100, with comparable or even stronger robustness than the state-of-the-art adversarial training methods. This work demonstrates the feasibility of obtaining both accurate and robust models when training data are limited.

    ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification

    Because Whole-Slide Image (WSI) samples are scarce and only weakly labeled, pseudo-bag-based multiple instance learning (MIL) has emerged as a promising approach to WSI classification. However, the pseudo-bag dividing scheme, which is often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, which uses a bag prototype to guide the division of WSI pseudo-bags. Rather than designing a complex network architecture, this scheme takes a plug-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we devise an attention-based prototype that can be optimized dynamically during training to adapt to a classification task. We apply our ProtoDiv scheme to seven baseline models and carry out a group of comparison experiments on two public WSI datasets. The experiments confirm that ProtoDiv usually brings clear performance improvements to WSI classification. Comment: 12 pages, 5 figures, and 3 tables

    Catalytic Asymmetric Reactions between Alkenes and Aldehydes

    This doctoral work describes catalytic asymmetric reactions between alkenes and aldehydes, enabled by the development of chiral Brønsted acids. Valuable, functionalized, enantiomerically enriched cyclic compounds were efficiently furnished from inexpensive and commercially available reagents with high degrees of atom economy. In the first part of this thesis, the first highly enantioselective organocatalytic intramolecular carbonyl−ene cyclization of olefinic aldehydes is presented. In the second part, asymmetric cyclizations via oxocarbenium ions are described. One is a general asymmetric catalytic Prins cyclization of aldehydes with homoallylic alcohols, in which the oxocarbenium ion is attacked intramolecularly by a pendent alkene. The other is an asymmetric oxa-Pictet−Spengler reaction between aldehydes and homobenzyl alcohols, in which the oxocarbenium ion is trapped by an intramolecular arene. The first general asymmetric [4+2]-cycloaddition of simple and unactivated dienes with aldehydes is developed in the last part of this thesis. This methodology is extremely robust and scalable. Valuable enantiomerically enriched dihydropyran compounds could be readily obtained from inexpensive and abundant dienes and aldehydes. New types of confined Brønsted acids were rationally designed and synthesized, including imino-imidodiphosphates (iIDPs), nitrated imidodiphosphates (nIDPs), and imidodiphosphorimidates (IDPis). Beyond the application of these catalysts in various asymmetric reactions between simple alkenes and aldehydes, mechanistic investigations are also disclosed in this doctoral work.

    Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder

    Medical Visual Question Answering (VQA) systems play a supporting role in understanding clinically relevant information carried by medical images. The questions about a medical image fall into two categories: closed-ended (such as yes/no questions) and open-ended. To obtain answers, the majority of existing medical VQA methods rely on classification approaches, while a few works attempt to use generation approaches or a mixture of the two. The classification approaches are relatively simple but perform poorly on long open-ended questions. To bridge this gap, in this paper we propose a new Transformer-based framework for medical VQA, named Q2ATransformer, which integrates the advantages of both the classification and the generation approaches and provides a unified treatment of closed-ended and open-ended questions. Specifically, we introduce an additional Transformer decoder with a set of learnable candidate answer embeddings to query the existence of each answer class for a given image-question pair. Through the Transformer attention, the candidate answer embeddings interact with the fused features of the image-question pair to make the decision. In this way, despite being a classification-based approach, our method provides a mechanism to interact with the answer information for prediction, as generation-based approaches do. At the same time, by using classification, we reduce the task difficulty by shrinking the search space of answers. Our method achieves new state-of-the-art performance on two medical VQA benchmarks. In particular, for open-ended questions, we achieve 79.19% on VQA-RAD and 54.85% on PathVQA, absolute improvements of 16.09% and 41.45%, respectively.

    Deep Joint Source-Channel Coding for DNA Image Storage: A Novel Approach with Enhanced Error Resilience and Biological Constraint Optimization

    DeoxyriboNucleic Acid (DNA) based data storage has emerged as an intriguing approach that is attracting substantial academic interest. This paper introduces a novel deep joint source-channel coding (DJSCC) scheme for DNA image storage, designated DJSCC-DNA. The scheme differs from conventional DNA storage techniques in three key ways: 1) it uses deep learning, employing convolutional neural networks for the DNA encoding and decoding processes; 2) it integrates DNA polymerase chain reaction (PCR) amplification into the network architecture, thereby improving data recovery precision; and 3) it restructures the loss function to optimize for biological constraints. The performance of the proposed model is demonstrated via numerical results from specific channel testing, which suggest that it surpasses conventional deep learning methodologies in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Additionally, the model effectively ensures that the constraints on both homopolymer run-length and GC content are met.
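The two biological constraints named in this abstract, homopolymer run-length and GC content, can be checked on a candidate strand with a few lines of code. The sketch below is only an illustration of what the constraints measure; the function names and the thresholds (`max_run=3`, GC fraction between 0.4 and 0.6) are assumptions chosen for the example and are not taken from the paper, which optimizes the constraints through its loss function rather than by filtering.

```python
def gc_content(seq):
    """Fraction of bases in `seq` that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq) / len(seq)

def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def satisfies_constraints(seq, max_run=3, gc_range=(0.4, 0.6)):
    """Check both constraints; the default thresholds are illustrative only."""
    low, high = gc_range
    return max_homopolymer_run(seq) <= max_run and low <= gc_content(seq) <= high

# Usage: a balanced strand passes, a strand with a run of four A's does not.
ok = satisfies_constraints("ACGTACGT")
too_long_run = satisfies_constraints("AAAACGCG")
```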