121,724 research outputs found

    Generative Adversarial Text to Image Synthesis

    Full text link
    Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image model- ing, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.Comment: ICML 201

    PaperRobot: Incremental Draft Generation of Scientific Ideas

    Full text link
    We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper. Turing Tests, where a biomedical domain expert is asked to compare a system output and a human-authored string, show PaperRobot generated abstracts, conclusion and future work sections, and new titles are chosen over human-written ones up to 30%, 24% and 12% of the time, respectively.Comment: 12 pages. Accepted by ACL 2019 Code and resource is available at https://github.com/EagleW/PaperRobo

    Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

    Full text link
    Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory (LSTM) deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.Comment: To appear in IJC

    Movie indexing via event detection

    Get PDF
    The past number of years has seen a large increase in the number of movies, and therefore movie databases, created. As movies are typically quite long, locating relevant clips in these databases is quite difficult unless a well defined index is in place. As movies are creatively made, creating automatic indexing algorithms is a challenging task. However, there are a number of underlying film grammar principles that are universally followed. By detecting and examining the use of these principles, it is possible to extract information about the occurrences of specific events in a movie. This work attempts to completely index a movie by detecting all of the relevant events. The event detection process involves examining the underlying structure of a movie and utilising audiovisual analysis techniques, supported by machine learning algorithms, to extract information based on this structure. This results in a summarised and indexed movie
    corecore