347 research outputs found

    Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

    Visual question answering (Visual QA) has attracted a lot of attention lately, seen essentially as a form of (visual) Turing test that artificial intelligence should strive to achieve. In this paper, we study a crucial component of this task: how can we design good datasets for the task? We focus on the design of multiple-choice based datasets where the learner has to select the right answer from a set of candidate ones including the target (i.e., the correct one) and the decoys (i.e., the incorrect ones). Through careful analysis of the results attained by state-of-the-art learning models and human annotators on existing datasets, we show that the design of the decoy answers has a significant impact on how and what the learning models learn from the datasets. In particular, the resulting learner can ignore the visual information, the question, or both while still doing well on the task. Inspired by this, we propose automatic procedures to remedy such design deficiencies. We apply the procedures to re-construct decoy answers for two popular Visual QA datasets as well as to create a new Visual QA dataset from the Visual Genome project, resulting in the largest dataset for this task. Extensive empirical studies show that the design deficiencies have been alleviated in the remedied datasets and the performance on them is likely a more faithful indicator of the difference among learning models. The datasets are released and publicly available via http://www.teds.usc.edu/website_vqa/. Comment: Accepted for Oral Presentation at NAACL-HLT 201

    Evaluating Text-to-Image Matching using Binary Image Selection (BISON)

    Providing systems with the ability to relate linguistic and visual content is one of the hallmarks of computer vision. Tasks such as text-based image retrieval and image captioning were designed to test this ability but come with evaluation measures that have a high variance or are difficult to interpret. We study an alternative task for systems that match text and images: given a text query, the system is asked to select the image that best matches the query from a pair of semantically similar images. The system's accuracy on this Binary Image SelectiON (BISON) task is interpretable, eliminates the reliability problems of retrieval evaluations, and focuses on the system's ability to understand fine-grained visual structure. We gather a BISON dataset that complements the COCO dataset and use it to evaluate modern text-based image retrieval and image captioning systems. Our results provide novel insights into the performance of these systems. The COCO-BISON dataset and corresponding evaluation code are publicly available from http://hexianghu.com/bison/.
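The binary selection metric described in this abstract can be illustrated with a minimal sketch. This is not the released COCO-BISON evaluation code: the function names, the tuple layout of each example, and the toy word-overlap scorer (which treats an image as a set of tags) are all assumptions made for illustration. Ties are counted as errors here, which is also an assumption.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical example layout: (query text, matching image, distractor image).
# Images are stood in for by sets of tags purely for this sketch.
Example = Tuple[str, FrozenSet[str], FrozenSet[str]]


def bison_accuracy(
    examples: Iterable[Example],
    score: Callable[[str, FrozenSet[str]], float],
) -> float:
    """Fraction of pairs where the matching image outscores the distractor."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(
        1
        for query, positive, negative in examples
        if score(query, positive) > score(query, negative)  # ties count as wrong
    )
    return correct / len(examples)


def tag_overlap_score(query: str, image_tags: FrozenSet[str]) -> float:
    """Toy stand-in for a text-image matching model: count shared words."""
    return float(len(set(query.lower().split()) & image_tags))


toy_examples = [
    ("a dog on a red couch",
     frozenset({"dog", "couch", "red"}), frozenset({"dog", "couch", "blue"})),
    ("two cats near a window",
     frozenset({"cats", "window", "two"}), frozenset({"cats", "door", "two"})),
    # Deliberately mislabeled pair so the toy scorer gets one example wrong.
    ("a man riding a horse",
     frozenset({"man", "bicycle"}), frozenset({"man", "horse"})),
]

accuracy = bison_accuracy(toy_examples, tag_overlap_score)
print(f"BISON-style accuracy: {accuracy:.3f}")
```

Because each decision is between exactly two images, chance performance is 50%, which is part of what makes the number easy to interpret.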

    Model-based Control of the Current Density Profile in the Experimental Advanced Superconducting Tokamak (EAST)

    As worldwide energy consumption increases, the world is facing the possibility of an energy shortage problem. While several approaches have been proposed to slow down this process, which include the improvement of the combustion efficiency of fossil fuels and the introduction of nuclear energy and renewable energy, such as solar, wind, and geothermal energy, a replacement for fossil fuels will eventually be needed. The energy that comes from a nuclear reaction, which includes nuclear fission and nuclear fusion, has a high energy production density (rate of energy produced divided by the area of the land needed to produce it) and produces no air pollution or greenhouse gases, which makes it a strong and attractive candidate. Compared with nuclear fission, the radioactive waste from nuclear fusion can be more easily disposed of, the reactants in a nuclear fusion reaction are abundantly available in nature, and nuclear fusion poses no risk of a nuclear accident. For all these reasons, nuclear fusion is a potential solution for the energy shortage problem. However, there are many challenges that need to be overcome to achieve nuclear fusion. The primary challenge is to confine the hot reactants, whose temperatures are about one hundred million kelvin. At these temperatures, the reactants are in the plasma state and have enough kinetic energy to overcome the repelling electrostatic forces and fuse. One of the most promising approaches to confine the fusion plasma is magnetic confinement, where magnetic fields are used to confine the plasma through the Lorentz force. The tokamak is one of the fusion devices that exploit magnetic confinement. To demonstrate the viability of a nuclear fusion power plant, the International Thermonuclear Experimental Reactor (ITER) tokamak project is aimed at producing 500 megawatts of power with 50 megawatts of input power, which will make it the first tokamak with net energy output.
To obtain the desired fusion gain, the ITER tokamak will need to operate at a temperature and a pressure so high that the plasma has a good chance of becoming unstable and difficult to confine. To address this issue, extensive research has been conducted on different fusion tokamaks around the world to find high performance operating scenarios characterized by a high fusion gain, good plasma confinement, plasma stability, and a dominant self-generated plasma current with the goal of developing candidate scenarios for ITER. The shape of the toroidal current density profile, or the safety factor profile (q-profile), impacts steady-state operation, magnetohydrodynamic (MHD) stability, and plasma performance. The plasma β, which is the ratio of the kinetic pressure of the plasma to the magnetic pressure (pressure exerted on plasma by the magnetic field), acts as an important economic factor in fusion power generation. Therefore, active control of the toroidal current density profile and plasma β is one path towards advanced scenarios. This dissertation focuses on developing control solutions for regulating the current density profile, and to some extent the normalized plasma β (denoted as β_N), on the Experimental Advanced Superconducting Tokamak (EAST) located at the Institute of Plasma Physics, Chinese Academy of Sciences (ASIPP), in Hefei, China. Towards this goal, a control-oriented, physics-based model has been developed for the current density profile evolution in EAST in response to available heating and current-drive (H&CD) systems. The feasibility of reconstructing the internal plasma states, which may be crucial for feedback control, from measurements at the magnetic axis and at the plasma edge has been studied by using experimental data and exploiting the response model. Target scenarios (characterized by desired q-profile and β_N) have been developed by following a model-based finite-time optimization approach.
Feedback controllers ranging from simpler Proportional-Integral-Derivative (PID) controllers to more complex model-based optimal controllers, derived from Linear-Quadratic-Regulator (LQR), H∞, and Model Predictive Control (MPC) theories, have been synthesized to counteract deviations from the desired target scenario. The overall control solution has been implemented in the Plasma Control System (PCS) and closed-loop q-profile regulation has been demonstrated for the first time ever in EAST in disturbance rejection and target tracking experiments.
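As a reading aid, the quantities named in this abstract can be written out using their standard textbook definitions. The symbols below ($Q$, $P_{\mathrm{fus}}$, $P_{\mathrm{in}}$, $\langle p \rangle$, $B$, $\mu_0$, $a$, $B_T$, $I_p$) are assumptions introduced here and do not appear in the abstract itself; only the 500 MW and 50 MW figures come from the text.

```latex
% Fusion gain implied by the ITER figures quoted in the abstract
Q = \frac{P_{\mathrm{fus}}}{P_{\mathrm{in}}}
  = \frac{500\ \mathrm{MW}}{50\ \mathrm{MW}} = 10

% Plasma beta: kinetic pressure over magnetic pressure
\beta = \frac{\langle p \rangle}{B^{2} / (2\mu_0)}

% Normalized beta, with \beta in percent, minor radius a in m,
% toroidal field B_T in T, and plasma current I_p in MA
\beta_N = \beta\,[\%]\;\frac{a\,B_T}{I_p}
```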

    Learning Structured Inference Neural Networks with Label Relations

    Images of scenes have various objects as well as abundant attributes, and diverse levels of visual categorization are possible. A natural image could be assigned fine-grained labels that describe major components, coarse-grained labels that depict high-level abstraction, or a set of labels that reveal attributes. Such categorization at different concept layers can be modeled with label graphs encoding label information. In this paper, we exploit this rich information with a state-of-the-art deep learning framework, and propose a generic structured model that leverages diverse label relations to improve image classification performance. Our approach employs a novel stacked label prediction neural network, capturing both inter-level and intra-level label semantics. We evaluate our method on benchmark image datasets, and empirical results illustrate the efficacy of our model. Comment: Conference on Computer Vision and Pattern Recognition (CVPR) 201

    Compressed Video Action Recognition

    Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and the high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that superfluous information can be reduced by up to two orders of magnitude by video compression (using H.264, HEVC, etc.), we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In addition, the signals in a compressed video provide free, albeit noisy, motion information. We propose novel techniques to use them effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times faster than ResNet-152. On the task of action recognition, our approach outperforms all the other methods on the UCF-101, HMDB-51, and Charades datasets. Comment: CVPR 2018 (Selected for spotlight presentation)