    Semantic Image Synthesis via Adversarial Learning

    In this paper, we propose a way of synthesizing realistic images directly from natural language descriptions, which has many useful applications, e.g. intelligent image manipulation. Given a source image and a target text description, our model synthesizes images that meet two requirements: 1) being realistic while matching the target text description; 2) maintaining other image features that are irrelevant to the text description. The model should be able to disentangle the semantic information from the two modalities (image and text), and generate new images from the combined semantics. To achieve this, we propose an end-to-end neural architecture that leverages adversarial learning to automatically learn implicit loss functions, which are optimized to fulfill the aforementioned two requirements. We have evaluated our model by conducting experiments on the Caltech-200 bird dataset and the Oxford-102 flower dataset, and have demonstrated that our model is capable of synthesizing realistic images that match the given descriptions while still maintaining other features of the original images. Comment: Accepted to ICCV 201

    Exploiting Sentence Embedding for Medical Question Answering

    Despite the great success of word embedding, sentence embedding remains a problem that is not well solved. In this paper, we present a supervised learning framework that exploits sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence embedding module, and 2) a scoring module. The former uses contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor. This module is called Contextual self-Attention Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is evaluated on two Medical Question Answering (MedicalQA) datasets collected from real-world applications: medical exams and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieves significant improvements over competitive baseline approaches. Additionally, a series of controlled experiments illustrates that the multi-scale strategy and the contextual self-attention layer play important roles in producing effective sentence embeddings, and that the two scoring strategies are highly complementary to each other for question answering problems. Comment: 8 page

    Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

    Modern object detectors often suffer from low accuracy, as foreground objects are swamped by large amounts of background and become hard examples during training. Compared with proposal-based detectors, real-time detectors are affected far more severely, since, to achieve real-time rates, they drop the region-proposal stage that filters out a majority of the background. Although foreground hard examples urgently need to be mined from the background, a considerable number of state-of-the-art real-time detectors, such as the YOLO series, have yet to profit from existing hard example mining methods, because these methods require detectors to meet a series of prerequisites. In this paper, we propose a general hard example mining method named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for real-time detectors, as it mines hard examples from the final feature map, which exists in all real-time detectors. With LRM, some elements representing easy examples in the final feature map are filtered out, and detectors are forced to concentrate on hard examples during training. Extensive experiments validate the effectiveness of our method. With our method, the YOLOv2 detector improves by over 5% mAP on the autonomous-driving dataset KITTI and by over 2% mAP on the more general dataset PASCAL VOC. In addition, LRM is the first hard example mining strategy that fits YOLOv2 perfectly, making it better suited to a range of real scenarios where both real-time rates and accurate detection are strongly demanded. Comment: 8 pages, 6 figure
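
The filtering idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how elements are ranked or how many are kept, so the code below assumes a flattened list of per-element losses and a hypothetical `keep` parameter, and simply retains the `keep` elements with the largest loss so that only they contribute to the training loss.

```python
def loss_rank_mask(losses, keep):
    """Return a 0/1 mask that keeps the `keep` elements with the largest loss.

    `losses` stands in for per-element losses on a flattened final feature
    map; `keep` is a hypothetical hyperparameter (assumption, not from the
    abstract).
    """
    if not 0 <= keep <= len(losses):
        raise ValueError("keep must be between 0 and len(losses)")
    # Rank element indices from highest loss (hardest) to lowest (easiest).
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(losses))]


def mined_loss(losses, keep):
    """Sum only the losses of the retained (hard) elements."""
    mask = loss_rank_mask(losses, keep)
    return sum(loss * m for loss, m in zip(losses, mask))


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    example_losses = [0.1, 2.0, 0.05, 1.2, 0.3]
    print(loss_rank_mask(example_losses, keep=2))
    print(mined_loss(example_losses, keep=2))
```

In a real detector the mask would be applied to the loss tensor before backpropagation, so that filtered elements produce no gradient.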

    Phenotype-based and Self-learning Inter-individual Sleep Apnea Screening with a Level IV Monitoring System

    Purpose: We propose a phenotype-based artificial intelligence system that can self-learn and is accurate for screening purposes, and test it on a Level IV monitoring system. Methods: Based on physiological knowledge, we hypothesize that phenotype information will allow us to find subjects from a well-annotated database who share similar sleep apnea patterns. Therefore, for a newly arriving subject, we can establish a prediction model from the existing database that is adapted to the subject. We test the proposed algorithm on a database consisting of 62 subjects, with signals recorded from a Level IV wearable device measuring thoracic and abdominal movements and SpO2. Results: With leave-one-out cross-validation, the accuracy of the proposed algorithm in screening subjects with an apnea-hypopnea index greater than or equal to 15 is 93.6%, the positive likelihood ratio is 6.8, and the negative likelihood ratio is 0.03. Conclusion: The results confirm the hypothesis and show that the proposed algorithm has great potential to screen patients with SAS.
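
For readers unfamiliar with the screening metrics reported here, the sketch below shows the standard definitions: the positive likelihood ratio is sensitivity / (1 - specificity) and the negative likelihood ratio is (1 - sensitivity) / specificity. The confusion-matrix counts in the example are hypothetical and are not taken from the paper, which reports only the final figures.

```python
def likelihood_ratios(tp, fp, tn, fn):
    """Return (sensitivity, specificity, LR+, LR-) from confusion-matrix counts.

    tp/fn count subjects who truly have the condition; tn/fp count those
    who do not.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative subject")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ is unbounded when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- is unbounded when there are no true negatives.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return sensitivity, specificity, lr_pos, lr_neg


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data).
    sens, spec, lr_pos, lr_neg = likelihood_ratios(tp=18, fp=4, tn=36, fn=2)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
          f"LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
```

A large LR+ means a positive screen strongly raises the odds of the condition, while an LR- close to zero means a negative screen strongly lowers them.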