The iNaturalist Species Classification and Detection Dataset
Existing image classification datasets used in computer vision tend to have a
uniform distribution of images across object categories. In contrast, the
natural world is heavily imbalanced, as some species are more abundant and
easier to photograph than others. To encourage further progress in challenging
real world conditions we present the iNaturalist species classification and
detection dataset, consisting of 859,000 images from over 5,000 different
species of plants and animals. It features visually similar species, captured
in a wide variety of situations, from all over the world. Images were collected
with different camera types, have varying image quality, feature a large class
imbalance, and have been verified by multiple citizen scientists. We discuss
the collection of the dataset and present extensive baseline experiments using
state-of-the-art computer vision classification and detection models. Results
show that current non-ensemble-based methods achieve only 67% top-1
classification accuracy, illustrating the difficulty of the dataset.
Specifically, we observe poor results for classes with small numbers of
training examples, suggesting more attention is needed in low-shot learning. Comment: CVPR 201
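The gap between overall accuracy and accuracy on rare classes can be illustrated with a minimal sketch. The function names and the toy labels below are hypothetical and are not taken from the dataset or the authors' evaluation code; the sketch only shows why a single top-1 figure can hide poor results on classes with few examples.

```python
from collections import defaultdict


def overall_top1(labels, predictions):
    """Fraction of samples whose top prediction matches the label."""
    hits = sum(1 for y, p in zip(labels, predictions) if y == p)
    return hits / len(labels)


def per_class_top1(labels, predictions):
    """Top-1 accuracy computed separately for each ground-truth class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {cls: correct[cls] / total[cls] for cls in total}


def macro_top1(labels, predictions):
    """Mean of per-class accuracies, so rare classes count as much as common ones."""
    scores = per_class_top1(labels, predictions)
    return sum(scores.values()) / len(scores)


# Toy imbalanced example (hypothetical data): class "a" is common, class "b" is rare.
labels = ["a"] * 8 + ["b"] * 2
predictions = ["a"] * 8 + ["a", "b"]
overall = overall_top1(labels, predictions)
by_class = per_class_top1(labels, predictions)
macro = macro_top1(labels, predictions)
```

In this toy case the overall figure is dominated by the common class, while the per-class and macro-averaged figures expose the weaker result on the rare class.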
Traceable and Authenticable Image Tagging for Fake News Detection
To prevent fake news images from misleading the public, it is desirable not
only to verify the authenticity of news images but also to trace the source of
fake news, so as to provide a complete forensic chain for reliable fake news
detection. To simultaneously achieve the goals of authenticity verification and
source tracing, we propose a traceable and authenticable image tagging approach
that is based on a design of Decoupled Invertible Neural Network (DINN). The
designed DINN can simultaneously embed the dual-tags, i.e.,
authenticable tag and traceable tag, into each news image before publishing,
and then separately extract them for authenticity verification and source
tracing. Moreover, to improve the accuracy of dual-tags extraction, we design a
parallel Feature Aware Projection Model (FAPM) to help the DINN preserve
essential tag information. In addition, we define a Distance Metric-Guided
Module (DMGM) that learns asymmetric one-class representations to enable the
dual-tags to achieve different robustness performances under malicious
manipulations. Extensive experiments, on diverse datasets and unseen
manipulations, demonstrate that the proposed tagging approach achieves
excellent performance in the aspects of both authenticity verification and
source tracing for reliable fake news detection and outperforms prior
work.
Large-Signal Stability Criteria in DC Power Grids with Distributed-Controlled Converters and Constant Power Loads
The increasing adoption of power electronic devices may lead to large
disturbance and destabilization of future power systems. However, stability
criteria are still an unsolved puzzle, since traditional small-signal stability
analysis is not applicable to power electronics-enabled power systems when a
large disturbance occurs, such as a fault, a pulse power load, or load
switching. To address this issue, this paper presents for the first time the
rigorous derivation of the sufficient criteria for large-signal stability in DC
microgrids with distributed-controlled DC-DC power converters. A novel type of
closed-loop converter controllers is designed and considered. Moreover, this
paper is the first to prove that the well-known and frequently cited
Brayton-Moser mixed potential theory (published in 1964) is incomplete. Case
studies are carried out to illustrate the defects of Brayton-Moser mixed
potential theory and verify the effectiveness of the proposed novel stability
criteria.
Hyperbolic Face Anti-Spoofing
Learning generalized face anti-spoofing (FAS) models against presentation
attacks is essential for the security of face recognition systems. Previous FAS
methods usually encourage models to extract discriminative features, of which
the distances within the same class (bonafide or attack) are pushed close while
those between bonafide and attack are pulled away. However, these methods are
designed based on Euclidean distance, which lacks generalization ability for
unseen attack detection due to poor hierarchy embedding ability. According to
the evidence that different spoofing attacks are intrinsically hierarchical, we
propose to learn richer hierarchical and discriminative spoofing cues in
hyperbolic space. Specifically, for unimodal FAS learning, the feature
embeddings are projected into the Poincaré ball, and then the hyperbolic
binary logistic regression layer is cascaded for classification. To further
improve generalization, we conduct hyperbolic contrastive learning for the
bonafide only while relaxing the constraints on diverse spoofing attacks. To
alleviate the vanishing gradient problem in hyperbolic space, a new feature
clipping method is proposed to enhance the training stability of hyperbolic
models. Besides, we further design a multimodal FAS framework with Euclidean
multimodal feature decomposition and hyperbolic multimodal feature fusion &
classification. Extensive experiments on three benchmark datasets (i.e., WMCA,
PADISI-Face, and SiW-M) with diverse attack types demonstrate that the proposed
method can bring significant improvement compared to the Euclidean baselines on
unseen attack detection. In addition, the proposed framework also
generalizes well to four benchmark datasets (i.e., MSU-MFSD, IDIAP
REPLAY-ATTACK, CASIA-FASD, and OULU-NPU) with a limited number of attack types.
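The projection of Euclidean features into the Poincaré ball can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard exponential map at the origin and a plain norm-clipping step as a stand-in for the paper's new feature clipping method, whose details the abstract does not give. The function names, the clipping radius `max_norm`, and the curvature parameter `c` are assumptions.

```python
import math


def clip_norm(x, max_norm=1.0):
    """Rescale x so its Euclidean norm is at most max_norm (assumed clipping rule)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= max_norm:
        return list(x)
    scale = max_norm / norm
    return [v * scale for v in x]


def expmap0(x, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [v * scale for v in x]


def poincare_distance(u, v):
    """Geodesic distance in the unit Poincare ball (curvature -1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Clip a Euclidean feature to norm 1, then map it into the ball.
feature = [3.0, 4.0]
point = expmap0(clip_norm(feature, max_norm=1.0))
radius = math.sqrt(sum(v * v for v in point))
dist_from_origin = poincare_distance([0.0, 0.0], point)
```

Clipping before the map keeps embedded points away from the boundary of the ball, where distances blow up and gradients tend to vanish; with a clipping radius of 1 the embedded point sits at Euclidean radius tanh(1), which corresponds to a hyperbolic distance of 2 from the origin.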
Characterization of the DNA-unwinding activity of human RECQ1, a helicase specifically stimulated by human replication protein A.
The RecQ helicases are involved in several aspects of DNA metabolism. Five members of the RecQ family have been found in humans, but only two of them, BLM and WRN, have been carefully characterized. In this work, we describe the enzymatic characterization of RECQ1. The helicase has 3' to 5' polarity, cannot start the unwinding from a blunt-ended terminus, and needs a 3'-single-stranded DNA tail longer than 10 nucleotides to open the substrate. However, it was also able to unwind a blunt-ended duplex DNA with a "bubble" of 25 nucleotides in the middle, as previously observed for WRN and BLM. We show that only short DNA duplexes (30 bp) can be unwound by RECQ1 alone, but the addition of human replication protein A (hRPA) increases the processivity of the enzyme (100 bp). Our studies done with Escherichia coli single-strand binding protein (SSB) indicate that the helicase activity of RECQ1 is specifically stimulated by hRPA. This finding suggests that RECQ1 and hRPA may also interact in vivo and function together in DNA metabolism. Comparison of the present results with previous studies on WRN and BLM provides novel insight into the role of the N- and C-terminal domains of these helicases in determining their substrate specificity and in their interaction with hRPA.
Paint by Word
We investigate the problem of zero-shot semantic image painting. Instead of
painting modifications into an image using only concrete colors or a finite set
of semantic concepts, we ask how to create semantic paint based on open
full-text descriptions: our goal is to be able to point to a location in a
synthesized image and apply an arbitrary new concept such as "rustic" or
"opulent" or "happy dog." To do this, our method combines a state-of-the-art
generative model of realistic images with a state-of-the-art text-image
semantic similarity network. We find that, to make large changes, it is
important to use non-gradient methods to explore latent space, and it is
important to relax the computations of the GAN to target changes to a specific
region. We conduct user studies to compare our methods to several baselines. Comment: 10 pages, 9 figures
Zero-shot sampling of adversarial entities in biomedical question answering
The increasing depth of parametric domain knowledge in large language models
(LLMs) is fueling their rapid deployment in real-world applications. In
high-stakes and knowledge-intensive tasks, understanding model vulnerabilities
is essential for quantifying the trustworthiness of model predictions and
regulating their use. The recent discovery of named entities as adversarial
examples in natural language processing tasks raises questions about their
potential guises in other settings. Here, we propose a power-scaled
distance-weighted sampling scheme in embedding space to discover diverse
adversarial entities as distractors. We demonstrate its advantage over random
sampling in adversarial question answering on biomedical topics. Our approach
enables the exploration of different regions on the attack surface, which
reveals two regimes of adversarial entities that markedly differ in their
characteristics. Moreover, we show that the attacks successfully manipulate
token-wise Shapley value explanations, which become deceptive in the
adversarial setting. Our investigations illustrate the brittleness of domain
knowledge in LLMs and reveal a shortcoming of standard evaluations for
high-capacity models. Comment: 20 pages incl. appendix, under review
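One plausible reading of a power-scaled distance-weighted sampling scheme is sketched below. The abstract does not give the exact formula, so this is an assumption: each candidate entity is drawn with probability proportional to its embedding distance from an anchor raised to a power. The function names, the `power` parameter, and the toy embeddings are hypothetical.

```python
import math
import random


def power_scaled_weights(anchor, candidates, power, eps=1e-12):
    """Sampling probabilities proportional to distance ** power (assumed form)."""
    dists = [max(math.dist(anchor, c), eps) for c in candidates]
    raw = [d ** power for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


def sample_distractors(anchor, candidates, power, k, seed=0):
    """Draw k candidate indices (with replacement) under the power-scaled weights."""
    rng = random.Random(seed)
    weights = power_scaled_weights(anchor, candidates, power)
    return rng.choices(range(len(candidates)), weights=weights, k=k)


# Toy 2-D embeddings (hypothetical): candidates at distances 1, 2 and 4 from the anchor.
anchor = (0.0, 0.0)
candidates = [(1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
uniform = power_scaled_weights(anchor, candidates, power=0)
favor_far = power_scaled_weights(anchor, candidates, power=1)
favor_near = power_scaled_weights(anchor, candidates, power=-1)
picks = sample_distractors(anchor, candidates, power=1, k=5)
```

Under this reading, a power of zero recovers uniform random sampling, positive powers favor distant entities, and negative powers favor nearby ones, which is one way a single parameter could steer the search toward different regions of the attack surface.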