13 research outputs found

    Text to image synthesis for improved image captioning

    Generating textual descriptions of images has been an important topic in computer vision and natural language processing, and a number of deep-learning-based techniques have been proposed for it. These techniques use human-annotated images for training and testing, and they require a large amount of training data to perform at their full potential. Collecting images with human-generated captions is expensive and time-consuming. In this paper, we propose an image captioning method that uses both real and synthetic data for training and testing the model. We use a Generative Adversarial Network (GAN) based text-to-image generator to produce synthetic images, and an attention-based image captioning method trained on both real and synthetic images to generate the captions. We demonstrate the results of our models using both qualitative and quantitative analysis on popular evaluation metrics. Our experimental results show a twofold benefit of the proposed work: i) it demonstrates the effectiveness of image captioning for synthetic images, and ii) it further improves the quality of the generated captions for real images, understandably because additional images are used for training.
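A minimal sketch of the data-mixing idea described above (not the authors' code): captions from the real dataset are fed to a text-to-image generator, and the resulting synthetic image-caption pairs are pooled with the real pairs before training the captioner. The function and variable names here are illustrative, and `generate_image` is a stand-in for the GAN.

```python
def generate_image(caption):
    # Placeholder for a GAN text-to-image generator; in the paper this
    # would return an actual synthesized image for the caption.
    return f"synthetic_image_for({caption})"

def build_training_set(real_pairs):
    """Return the real (image, caption) pairs plus one synthetic pair
    generated from each real caption."""
    synthetic_pairs = [(generate_image(cap), cap) for _, cap in real_pairs]
    return real_pairs + synthetic_pairs

real = [("img_001.jpg", "a dog playing in the park"),
        ("img_002.jpg", "two birds on a wire")]
training_set = build_training_set(real)
print(len(training_set))  # 4: two real pairs plus two synthetic pairs
```

The captioning model then trains on `training_set` as a whole, which is how the extra synthetic images can improve captions for real images as well.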

    Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning

    In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTMs with an attention mechanism have shown remarkable performance on sequential data, including image captioning, because they can retain long-range dependencies in sequential data. However, it is hard to parallelize the computations of an LSTM because of its inherently sequential nature. To address this issue, recent works have shown benefits in using self-attention, which is highly parallelizable without requiring any temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention in both the forward and backward directions, and achieves high performance comparable to state-of-the-art methods.
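The forward/backward idea can be sketched as two masked scaled dot-product self-attention passes whose outputs are concatenated. This is a simplified single-head illustration under assumed conventions (no learned projections), not the paper's implementation:

```python
import numpy as np

def scaled_dot_attention(x, mask):
    # x: (seq_len, d). Masked self-attention: positions where mask is
    # False are excluded before the softmax.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def bi_directional_self_attention(x):
    n = x.shape[0]
    fwd_mask = np.tril(np.ones((n, n), dtype=bool))  # attend to self and earlier words
    bwd_mask = np.triu(np.ones((n, n), dtype=bool))  # attend to self and later words
    return np.concatenate([scaled_dot_attention(x, fwd_mask),
                           scaled_dot_attention(x, bwd_mask)], axis=-1)

x = np.random.default_rng(0).standard_normal((5, 8))
out = bi_directional_self_attention(x)
print(out.shape)  # (5, 16)
```

Because the first word sees nothing earlier and the last word sees nothing later, the forward half of the first row and the backward half of the last row simply reproduce the corresponding inputs.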

    Attention-Based image captioning using DenseNet features

    We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information from the whole scene to generate image captions. Such a mechanism often misses the salient objects and cannot generate semantically correct captions. We consider an attention mechanism that can focus on relevant parts of the image to generate a fine-grained description of that image, using image features from DenseNet. We conduct our experiments on the MSCOCO dataset. Our proposed method achieved BLEU-2, BLEU-3, and BLEU-4 scores of 53.6, 39.8, and 29.5, respectively, which are superior to state-of-the-art methods.
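A common way to realize "focus on relevant parts of the image" is soft attention over a spatial grid of CNN features: the decoder state scores each region, a softmax turns the scores into weights, and the weighted sum becomes the context vector for the next word. The sketch below illustrates that generic mechanism with assumed shapes (a 7x7 DenseNet feature grid); all parameter names are hypothetical:

```python
import numpy as np

def soft_attention(features, hidden, w_f, w_h, v):
    # features: (num_regions, feat_dim) spatial CNN features
    # hidden:   (hid_dim,) current decoder state
    scores = np.tanh(features @ w_f + hidden @ w_h) @ v  # (num_regions,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over regions
    context = weights @ features                         # (feat_dim,) weighted sum
    return context, weights

rng = np.random.default_rng(1)
features = rng.standard_normal((49, 32))  # e.g. a 7x7 grid of DenseNet features
hidden = rng.standard_normal(16)
w_f = rng.standard_normal((32, 24))       # hypothetical learned projections
w_h = rng.standard_normal((16, 24))
v = rng.standard_normal(24)
context, weights = soft_attention(features, hidden, w_f, w_h, v)
print(context.shape)  # (32,)
```

At each decoding step the context vector is fed to the language decoder alongside the previous word, so different words can attend to different image regions.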

    A usability evaluation of Neuromender's upper limb game-based rehabilitation system for stroke survivors

    Game-based technologies have been widely used as part of stroke rehabilitation. The Neuromender system utilises such technologies and consists of serious games designed and developed for the rehabilitation of stroke survivors. In this paper, the "upper limb" module of the Neuromender system, which focuses on the rehabilitation of the upper-body extremities of stroke survivors, is described and tested for its usability. An experimental study is designed to test the usability of the module. Various metrics are evaluated, including the optimal distance between the 3D depth sensor and the survivor, the optimal position of the sensor with respect to the survivor, and the response time of the gestures made by the survivors as a function of their distance to the sensor. Based on the experiments, the optimal distance and position for survivors to utilise the upper limb module are determined.

    Seeing Through The Kinect: A Survey On Heuristics For Building Natural User Interfaces Environments

    The idea of interacting with technologies through touch-less and body-based interfaces has caused great excitement amongst users, but for designers it has created various new challenges. Usability encompasses part of these challenges, and there have been attempts at creating heuristics for NUI design. However, most of these heuristics consider using a device such as the Kinect for the recognition of gestures, not of people or objects. Therefore, in this paper we investigate the subject by presenting a systematic literature review aimed at finding heuristics for the design and evaluation of NUIs. Our analysis focuses on the scenario of helping people with visual disabilities in their daily activities. By looking at the state of the art, we intend to verify how many and which heuristics fit in this context.