Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Visual question answering (Visual QA) has attracted a lot of attention
lately, seen essentially as a form of (visual) Turing test that artificial
intelligence should strive to achieve. In this paper, we study a crucial
component of this task: how can we design good datasets for the task? We focus
on the design of multiple-choice based datasets where the learner has to select
the right answer from a set of candidate ones including the target (i.e., the
correct one) and the decoys (i.e., the incorrect ones). Through careful analysis
of the results attained by state-of-the-art learning models and human
annotators on existing datasets, we show that the design of the decoy answers
has a significant impact on how and what the learning models learn from the
datasets. In particular, the resulting learner can ignore the visual
information, the question, or both while still doing well on the task. Inspired
by this, we propose automatic procedures to remedy such design deficiencies. We
apply the procedures to re-construct decoy answers for two popular Visual QA
datasets as well as to create a new Visual QA dataset from the Visual Genome
project, resulting in the largest dataset for this task. Extensive empirical
studies show that the design deficiencies have been alleviated in the remedied
datasets and the performance on them is likely a more faithful indicator of the
difference among learning models. The datasets are released and publicly
available via http://www.teds.usc.edu/website_vqa/.
Comment: Accepted for Oral Presentation at NAACL-HLT 201
Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning
Leveraging class semantic descriptions and examples of known objects,
zero-shot learning makes it possible to train a recognition model for an object
class whose examples are not available. In this paper, we propose a novel
zero-shot learning model that takes advantage of clustering structures in the
semantic embedding space. The key idea is to impose the structural constraint
that semantic representations must be predictive of the locations of their
corresponding visual exemplars. In practice, this reduces to training multiple
kernel-based regressors on pairs of semantic representations and visual
exemplars drawn from labeled data of the seen object categories. Despite its
simplicity, our
approach significantly outperforms existing zero-shot learning methods on
standard benchmark datasets, including the ImageNet dataset with more than
20,000 unseen categories.
Comment: ICCV2017 camera-read
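The exemplar-prediction idea in the zero-shot abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: kernel ridge regression with an RBF kernel stands in for the unspecified "kernel-based regressors", a visual exemplar is assumed to be a single vector per class (for example a class mean), and the names `fit_exemplar_regressor`, `classify`, `width`, and `ridge` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, width: float) -> float:
    """RBF kernel between two semantic vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def solve(A: List[List[float]], B: List[List[float]]) -> List[List[float]]:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_exemplar_regressor(
    semantic: List[Vector],
    exemplars: List[Vector],
    width: float = 1.0,
    ridge: float = 1e-6,
) -> Callable[[Vector], List[float]]:
    """Fit kernel ridge regression from seen-class semantic vectors to exemplars.

    Assumption: one regressor per exemplar dimension, all sharing one kernel
    matrix, so a single linear solve yields every coefficient column.
    """
    n = len(semantic)
    K = [
        [rbf(semantic[i], semantic[j], width) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(K, [list(e) for e in exemplars])  # n rows, d columns
    dim = len(exemplars[0])

    def predict(s: Vector) -> List[float]:
        k = [rbf(s, t, width) for t in semantic]
        return [sum(k[i] * alpha[i][j] for i in range(n)) for j in range(dim)]

    return predict


def classify(x: Vector, predicted_exemplars: List[Vector]) -> int:
    """Assign x to the unseen class whose predicted exemplar is nearest."""
    return min(
        range(len(predicted_exemplars)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, predicted_exemplars[c])),
    )


# Toy data: three seen classes with 1-D semantics and 2-D exemplars.
seen_semantic = [[0.0], [1.0], [2.0]]
seen_exemplars = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
predict = fit_exemplar_regressor(seen_semantic, seen_exemplars)

# Two unseen classes, described only by their semantic vectors.
unseen_exemplars = [predict([0.5]), predict([1.5])]
label = classify([1.6, 1.6], unseen_exemplars)
```

Classification of a test feature then needs no training images of the unseen classes, only their semantic vectors passed through the fitted regressor.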
Comments on “High-temperature creep resistance and effects on the austenite reversion and precipitation of 18 Ni (300) maraging steel” by dos Reis et al. [Materials Characterization 107 (2015) 350-357]
An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild
Zero-shot learning (ZSL) methods have been studied in the unrealistic setting
where test data are assumed to come from unseen classes only. In this paper, we
advocate studying the problem of generalized zero-shot learning (GZSL) where
the test data's class memberships are unconstrained. We show empirically that
naively using the classifiers constructed by ZSL approaches does not perform
well in the generalized setting. Motivated by this, we propose a simple but
effective calibration method that can be used to balance two conflicting
forces: recognizing data from seen classes versus those from unseen ones. We
develop a performance metric to characterize such a trade-off and examine the
utility of this metric in evaluating various ZSL approaches. Our analysis
further shows that there is a large gap between the performance of existing
approaches and an upper bound established via idealized semantic embeddings,
suggesting that improving class semantic embeddings is vital to GZSL.
Comment: ECCV2016 camera-read
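One simple form that a calibration between seen and unseen classes could take is sketched below. It is an assumption made for illustration: a constant `gamma` is subtracted from the scores of seen classes before taking the argmax, so that larger values shift predictions toward unseen classes. The function name `calibrated_predict` and its parameters are hypothetical, and the abstract itself does not spell out the exact rule.

```python
from typing import Sequence


def calibrated_predict(
    scores: Sequence[float],
    seen_mask: Sequence[bool],
    gamma: float,
) -> int:
    """Return the index of the class with the highest calibrated score.

    scores[c] is a classifier score for class c, seen_mask[c] is True when
    class c was seen during training, and gamma is the calibration constant
    subtracted from seen-class scores only (an assumed calibration rule).
    """
    if len(scores) != len(seen_mask):
        raise ValueError("scores and seen_mask must have the same length")
    adjusted = [s - gamma if seen else s for s, seen in zip(scores, seen_mask)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Two seen classes (indices 0 and 1) and one unseen class (index 2).
scores = [2.0, 1.5, 1.8]
seen_mask = [True, True, False]

uncalibrated = calibrated_predict(scores, seen_mask, gamma=0.0)
calibrated = calibrated_predict(scores, seen_mask, gamma=0.5)
```

Sweeping `gamma` from very negative to very positive values moves the classifier from always predicting seen classes to always predicting unseen ones, which is the kind of trade-off a performance metric for the generalized setting has to summarize.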
Large-Margin Determinantal Point Processes
Determinantal point processes (DPPs) offer a powerful approach to modeling
diversity in many applications where the goal is to select a diverse subset. We
study the problem of learning the parameters (the kernel matrix) of a DPP from
labeled training data. We make two contributions. First, we show how to
reparameterize a DPP's kernel matrix with multiple kernel functions, thus
enhancing modeling flexibility. Second, we propose a novel parameter estimation
technique based on the principle of large margin separation. In contrast to the
state-of-the-art method of maximum likelihood estimation, our large-margin loss
function explicitly models errors in selecting the target subsets, and it can
be customized to trade off different types of errors (precision vs. recall).
Extensive empirical studies validate our contributions, including applications
to challenging document and video summarization tasks, where flexibility in
modeling the kernel matrix and balancing different errors is indispensable.
Comment: 15 page
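To make the role of the kernel matrix concrete, the sketch below evaluates subset probabilities under the standard L-ensemble form of a DPP, P(Y) = det(L_Y) / det(L + I). It assumes that the kernel matrix in the abstract above is such an L-ensemble kernel. The helper names `det` and `dpp_subset_probability` are hypothetical, and the code shows only how a given kernel scores subsets, not the large-margin learning procedure.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting.

    The determinant of the empty (0 x 0) matrix is 1, which gives the
    empty subset the correct unnormalized weight.
    """
    n = len(m)
    a = [list(row) for row in m]
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_subset_probability(L: Matrix, subset: Sequence[int]) -> float:
    """P(Y = subset) = det(L_subset) / det(L + I); subset holds distinct indices."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    L_plus_I = [
        [L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    return det(sub) / det(L_plus_I)


# Two highly similar items: selecting both together is penalized.
L = [[1.0, 0.9], [0.9, 1.0]]
p_single = dpp_subset_probability(L, [0])
p_pair = dpp_subset_probability(L, [0, 1])

# The probabilities over all subsets sum to one.
total = sum(
    dpp_subset_probability(L, list(s))
    for k in range(len(L) + 1)
    for s in combinations(range(len(L)), k)
)
```

Because the off-diagonal entries encode similarity, the pair of near-duplicate items receives a much lower probability than either item alone, which is how a DPP favors diverse subsets.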
Maxwell-Hydrodynamic Model for Simulating Nonlinear Terahertz Generation from Plasmonic Metasurfaces
The interaction between the electromagnetic field and plasmonic
nanostructures leads to both a strong linear response and inherently nonlinear
behavior. In this paper, a time-domain hydrodynamic model for describing the
motion of electrons in plasmonic nanostructures is presented, in which both
surface and bulk contributions of nonlinearity are considered. A coupled
Maxwell-hydrodynamic system capturing full-wave physics and free electron
dynamics is numerically solved with the parallel finite-difference time-domain
(FDTD) method. The proposed method is validated by simulating the linear and
nonlinear responses of a plasmonic metasurface. The linear
response is compared with the Drude dispersion model and the nonlinear
terahertz emission from a difference-frequency generation process is validated
with theoretical analyses. The proposed scheme is fundamentally important for
designing nonlinear plasmonic nanodevices, especially efficient and broadband
THz emitters.
Comment: 8 pages, 7 figures, IEEE Journal on Multiscale and Multiphysics
Computational Techniques, 201
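For reference, the Drude dispersion model and the difference-frequency relation mentioned in the abstract above are commonly written as below. The symbols are assumptions, since the abstract does not define them: an $e^{-i\omega t}$ time convention, background permittivity $\varepsilon_\infty$, plasma frequency $\omega_p$, damping rate $\gamma$, and pump frequencies $\omega_1 > \omega_2$.

```latex
\varepsilon(\omega) = \varepsilon_\infty - \frac{\omega_p^{2}}{\omega\,(\omega + i\gamma)},
\qquad
\omega_{\mathrm{THz}} = \omega_1 - \omega_2 .
```

With the opposite time convention, $e^{+i\omega t}$, the sign in front of $i\gamma$ flips.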
Exploring the Way to Approach the Efficiency Limit of Perovskite Solar Cells by Drift-Diffusion Model
The drift-diffusion model is an indispensable tool for understanding the
carrier dynamics (transport, recombination, and collection) and simulating the
practical efficiency of solar cells (SCs) by taking into account the various
carrier recombination losses in multilayered device structures.
Exploring how to predict and approach the SC efficiency limit with the
drift-diffusion model yields more physical insight and design guidelines for
emerging photovoltaics, particularly perovskite solar cells. We find that two
procedures are prerequisites for predicting and approaching the SC efficiency
limit. First, the intrinsic radiative recombination needs to be corrected after
adopting optical designs, which significantly affect the open-circuit voltage
at its Shockley-Queisser limit. By considering the detailed balance between
emission and absorption in semiconductor materials at thermal equilibrium, and
Boltzmann statistics away from equilibrium, we offer a different approach to
deriving an accurate expression for the intrinsic radiative recombination with
optical corrections for semiconductor materials. The new expression captures
light trapping of the absorbed photons and angular restriction of the emitted
photons simultaneously, both of which are ignored in the traditional
Roosbroeck-Shockley expression. Second, the contact characteristics of the
electrodes need to be carefully engineered to eliminate charge accumulation and
surface recombination at the electrodes. A selective contact, or a nonselective
contact incorporating a blocking layer, that inhibits surface recombination at
the electrode is the other important prerequisite. With these two procedures,
accurate prediction of the efficiency limit and precise evaluation of
efficiency degradation for perovskite solar cells are attainable with the
drift-diffusion model.
Comment: 32 pages, 11 figure
