
    Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

    Visual question answering (Visual QA) has attracted a lot of attention lately, seen essentially as a form of (visual) Turing test that artificial intelligence should strive to achieve. In this paper, we study a crucial component of this task: how can we design good datasets for the task? We focus on the design of multiple-choice based datasets where the learner has to select the right answer from a set of candidate ones including the target (i.e., the correct one) and the decoys (i.e., the incorrect ones). Through careful analysis of the results attained by state-of-the-art learning models and human annotators on existing datasets, we show that the design of the decoy answers has a significant impact on how and what the learning models learn from the datasets. In particular, the resulting learner can ignore the visual information, the question, or both while still doing well on the task. Inspired by this, we propose automatic procedures to remedy such design deficiencies. We apply the procedures to re-construct decoy answers for two popular Visual QA datasets as well as to create a new Visual QA dataset from the Visual Genome project, resulting in the largest dataset for this task. Extensive empirical studies show that the design deficiencies have been alleviated in the remedied datasets and the performance on them is likely a more faithful indicator of the difference among learning models. The datasets are released and publicly available via http://www.teds.usc.edu/website_vqa/. Comment: Accepted for Oral Presentation at NAACL-HLT 201

    Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning

    Leveraging class semantic descriptions and examples of known objects, zero-shot learning makes it possible to train a recognition model for an object class whose examples are not available. In this paper, we propose a novel zero-shot learning model that takes advantage of clustering structures in the semantic embedding space. The key idea is to impose the structural constraint that semantic representations must be predictive of the locations of their corresponding visual exemplars. This reduces to training multiple kernel-based regressors on semantic representation-exemplar pairs drawn from labeled data of the seen object categories. Despite its simplicity, our approach significantly outperforms existing zero-shot learning methods on standard benchmark datasets, including the ImageNet dataset with more than 20,000 unseen categories. Comment: ICCV2017 camera-ready

    An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild

    Zero-shot learning (ZSL) methods have been studied in the unrealistic setting where test data are assumed to come from unseen classes only. In this paper, we advocate studying the problem of generalized zero-shot learning (GZSL), where the test data's class memberships are unconstrained. We show empirically that naively using the classifiers constructed by ZSL approaches does not perform well in the generalized setting. Motivated by this, we propose a simple but effective calibration method that can be used to balance two conflicting forces: recognizing data from seen classes versus those from unseen ones. We develop a performance metric to characterize such a trade-off and examine the utility of this metric in evaluating various ZSL approaches. Our analysis further shows that there is a large gap between the performance of existing approaches and an upper bound established via idealized semantic embeddings, suggesting that improving class semantic embeddings is vital to GZSL. Comment: ECCV2016 camera-ready
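The abstract above does not spell out the calibration rule. One simple form such a calibration could take, assumed here purely for illustration, is to subtract a constant from the scores of seen classes before taking the arg max. The function name, the scores, and the value of `gamma` below are all hypothetical.

```python
import numpy as np

def calibrated_predict(scores, seen_mask, gamma):
    """Return the top class index after penalizing seen-class scores by gamma.

    This is an assumed calibration rule, not one taken from the abstract.
    """
    scores = np.asarray(scores, dtype=float)
    # Subtract gamma only where seen_mask is True (seen classes).
    adjusted = scores - gamma * np.asarray(seen_mask, dtype=float)
    return int(np.argmax(adjusted))

# Hypothetical scores for three classes; the first two were seen in training.
scores = [0.60, 0.55, 0.50]
seen_mask = [True, True, False]

uncalibrated = calibrated_predict(scores, seen_mask, gamma=0.0)
calibrated = calibrated_predict(scores, seen_mask, gamma=0.2)
print(uncalibrated, calibrated)
```

With `gamma = 0` the classifier favors a seen class; a positive `gamma` shifts predictions toward unseen classes, so sweeping `gamma` traces out the seen-versus-unseen trade-off the abstract describes.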

    Large-Margin Determinantal Point Processes

    Determinantal point processes (DPPs) offer a powerful approach to modeling diversity in many applications where the goal is to select a diverse subset. We study the problem of learning the parameters (the kernel matrix) of a DPP from labeled training data. We make two contributions. First, we show how to reparameterize a DPP's kernel matrix with multiple kernel functions, thus enhancing modeling flexibility. Second, we propose a novel parameter estimation technique based on the principle of large margin separation. In contrast to the state-of-the-art method of maximum likelihood estimation, our large-margin loss function explicitly models errors in selecting the target subsets, and it can be customized to trade off different types of errors (precision vs. recall). Extensive empirical studies validate our contributions, including applications on challenging document and video summarization, where flexibility in modeling the kernel matrix and balancing different errors is indispensable. Comment: 15 pages
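As background for how a DPP's kernel matrix encodes diversity, the sketch below evaluates the standard L-ensemble probability P(Y) = det(L_Y) / det(L + I) on a toy kernel. It illustrates only the basic model, not the large-margin estimator proposed in the paper; the feature vectors and function name are made up for the example.

```python
from itertools import combinations

import numpy as np

def dpp_subset_probability(L, subset):
    """Probability that an L-ensemble DPP with kernel L selects exactly `subset`."""
    L = np.asarray(L, dtype=float)
    idx = list(subset)
    # The determinant of the empty submatrix is 1 by convention.
    numerator = np.linalg.det(L[np.ix_(idx, idx)]) if idx else 1.0
    return numerator / np.linalg.det(L + np.eye(L.shape[0]))

# Toy features: items 0 and 1 are near-duplicates, item 2 is different.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
L = features @ features.T  # positive semidefinite kernel

p_similar = dpp_subset_probability(L, [0, 1])
p_diverse = dpp_subset_probability(L, [0, 2])

# Sanity check: probabilities over all subsets should sum to 1.
n = L.shape[0]
total = sum(
    dpp_subset_probability(L, subset)
    for size in range(n + 1)
    for subset in combinations(range(n), size)
)
print(p_similar, p_diverse, total)
```

The near-duplicate pair receives far less probability than the dissimilar pair, which is the diversity-seeking behavior that makes learning the kernel worthwhile.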

    Maxwell-Hydrodynamic Model for Simulating Nonlinear Terahertz Generation from Plasmonic Metasurfaces

    The interaction between the electromagnetic field and plasmonic nanostructures leads to both a strong linear response and inherent nonlinear behavior. In this paper, a time-domain hydrodynamic model for describing the motion of electrons in plasmonic nanostructures is presented, in which both surface and bulk contributions to the nonlinearity are considered. A coupled Maxwell-hydrodynamic system capturing full-wave physics and free-electron dynamics is solved numerically with the parallel finite-difference time-domain (FDTD) method. The proposed method is validated by simulating the linear and nonlinear responses of a plasmonic metasurface. The linear response is compared with the Drude dispersion model, and the nonlinear terahertz emission from a difference-frequency generation process is validated against theoretical analyses. The proposed scheme is fundamentally important for designing nonlinear plasmonic nanodevices, especially efficient and broadband THz emitters. Comment: 8 pages, 7 figures, IEEE Journal on Multiscale and Multiphysics Computational Techniques, 201

    Exploring the Way to Approach the Efficiency Limit of Perovskite Solar Cells by Drift-Diffusion Model

    The drift-diffusion model is an indispensable tool for understanding carrier dynamics (transport, recombination, and collection) and for simulating the practical efficiency of solar cells (SCs) by taking into account the various carrier recombination losses in multilayered device structures. Exploring how to predict and approach the SC efficiency limit with the drift-diffusion model will enable us to gain more physical insights and design guidelines for emerging photovoltaics, particularly perovskite solar cells. We find that two procedures are prerequisites for predicting and approaching the SC efficiency limit. First, the intrinsic radiative recombination needs to be corrected after adopting optical designs which will significantly affect the open-circuit voltage at its Shockley-Queisser limit. By considering the detailed balance between emission and absorption of semiconductor materials at thermal equilibrium, and Boltzmann statistics at non-equilibrium, we offer a different approach to deriving an accurate expression for intrinsic radiative recombination with optical corrections for semiconductor materials. The new expression simultaneously captures light trapping of the absorbed photons and angular restriction of the emitted photons, which are ignored in the traditional Roosbroeck-Shockley expression. Second, the contact characteristics of the electrodes need to be carefully engineered to eliminate charge accumulation and surface recombination at the electrodes. A selective contact, or a nonselective contact incorporating a blocking layer, that inhibits surface recombination at the electrode is another important prerequisite. With these two procedures, the drift-diffusion model can accurately predict the efficiency limit and precisely evaluate efficiency degradation for perovskite solar cells. Comment: 32 pages, 11 figures