Text to image synthesis for improved image captioning
Generating textual descriptions of images has been an important topic in computer vision and natural language processing, and a number of deep-learning-based techniques have been proposed for it. These techniques use human-annotated images for training and testing, and the models require a large amount of training data to perform at their full potential. Collecting images with human-written captions is expensive and time-consuming. In this paper, we propose an image captioning method that uses both real and synthetic data for training and testing the model. We use a Generative Adversarial Network (GAN) based text-to-image generator to produce synthetic images, and an attention-based image captioning method trained on both real and synthetic images to generate the captions. We demonstrate the results of our models through qualitative and quantitative analysis on widely used evaluation metrics. Our experimental results show a two-fold benefit of the proposed work: i) it demonstrates the effectiveness of image captioning for synthetic images, and ii) it further improves the quality of the generated captions for real images, understandably because additional images are available for training.
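The mixed-data training strategy described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' code; the function name, the sampling ratio, and the placeholder (image, caption) pairs are all assumptions made for the sketch.

```python
import random

def mix_datasets(real_pairs, synthetic_pairs, synthetic_ratio=0.5, seed=0):
    """Combine real and GAN-generated (image, caption) pairs into one training set.

    synthetic_ratio: how many synthetic pairs to add, as a fraction of the
    number of real pairs (an assumption for illustration, not a value from
    the paper).
    """
    rng = random.Random(seed)
    n_synth = int(len(real_pairs) * synthetic_ratio)
    sampled = rng.sample(synthetic_pairs, min(n_synth, len(synthetic_pairs)))
    mixed = list(real_pairs) + sampled
    rng.shuffle(mixed)  # interleave real and synthetic examples
    return mixed

# Hypothetical placeholder data standing in for real and GAN-synthesised images.
real = [(f"real_img_{i}", f"caption {i}") for i in range(100)]
synth = [(f"gan_img_{i}", f"caption {i}") for i in range(100)]
train_set = mix_datasets(real, synth, synthetic_ratio=0.5)
print(len(train_set))  # 150
```

All real pairs are kept and a sampled fraction of synthetic pairs is added, reflecting the abstract's point that the synthetic images supplement, rather than replace, the human-annotated data.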
Assessing the impact of a construction virtual reality game on design review skills of construction students
Being able to perform an effective design review is a fundamental skill that every design or construction student should possess. Performing a design review requires critical thinking, analysis, and communication skills. To support the acquisition of these skills, this study investigated the impact of a virtual reality game, the design review simulator (DRS), in a classroom environment. To assess the impact of the game, the authors performed a crossover experiment with 120 students. The students were split into groups and were asked to perform design reviews under two treatment conditions: 2D drawings or the DRS. A knowledge test was administered at the end of each treatment to collect quantitative data on the students' performance. Results from this research indicate that the DRS supported students in identifying a higher number of design mistakes. Furthermore, students who used the DRS after performing 2D drawing reviews scored significantly higher in the knowledge test. The research results contribute to the growing knowledge base on the implementation of VR in the classroom and to supporting students in meeting learning objectives related to design review skills. In particular, the research illustrated that the benefits of VR found in the construction industry in terms of improved communication, user involvement, and feedback collection can be translated into the classroom environment.
Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTM with an attention mechanism has shown remarkable performance on sequential data, including image captioning. LSTM can retain long-range dependencies of sequential data. However, its computations are hard to parallelize because of its inherently sequential characteristics. To address this issue, recent works have shown the benefits of self-attention, which is highly parallelizable and requires no temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention in both the forward and backward directions, and achieves performance comparable to state-of-the-art methods.
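The forward/backward attention idea in this abstract can be sketched with masked scaled dot-product attention: a causal (lower-triangular) mask for the forward direction and an anti-causal (upper-triangular) mask for the backward direction. This is a minimal sketch in the spirit of Bi-SAN, not the paper's model; identity query/key/value projections and averaging the two directions are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(X, mask):
    """Scaled dot-product self-attention; projections omitted for brevity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)          # (T, T) pairwise similarities
    scores = np.where(mask, scores, -1e9)  # block disallowed positions
    return softmax(scores) @ X

def bi_directional_self_attention(X):
    """Average of forward (past + self) and backward (future + self) attention."""
    T = X.shape[0]
    fwd_mask = np.tril(np.ones((T, T), dtype=bool))  # attend to past and self
    bwd_mask = np.triu(np.ones((T, T), dtype=bool))  # attend to future and self
    return 0.5 * (masked_self_attention(X, fwd_mask) +
                  masked_self_attention(X, bwd_mask))

X = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, 8-dim features
Y = bi_directional_self_attention(X)
print(Y.shape)  # (5, 8)
```

Because neither direction involves a recurrence, both attention passes are plain matrix products and can run in parallel, which is the parallelism advantage over LSTM the abstract refers to.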
Attention-based image captioning using DenseNet features
We present an attention-based image captioning method using DenseNet features. Conventional image captioning methods depend on visual information from the whole scene to generate image captions. Such a mechanism often fails to capture salient objects and thus cannot generate semantically correct captions. We consider an attention mechanism that can focus on relevant parts of the image to generate a fine-grained description of that image. We use image features from DenseNet and conduct our experiments on the MSCOCO dataset. Our proposed method achieved 53.6, 39.8, and 29.5 on the BLEU-2, 3, and 4 metrics, respectively, which are superior to state-of-the-art methods.
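The mechanism this abstract describes, attending over a spatial grid of CNN features at each decoding step, can be sketched with additive (Bahdanau-style) attention. This is an illustrative sketch under stated assumptions, not the paper's implementation: the parameter names, the 7x7x64 feature grid standing in for a DenseNet feature map, and the random initialisation are all hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, hidden, W_f, W_h, v):
    """Additive attention over a grid of image region features.

    features: (N, D) region features (e.g. a flattened CNN feature map).
    hidden:   (H,) current decoder state.
    W_f, W_h, v: learned parameters (randomly initialised here).
    Returns the attention-weighted context vector and the weights.
    """
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (N,) relevance scores
    alpha = softmax(scores)                              # weights sum to 1
    context = alpha @ features                           # (D,) weighted average
    return context, alpha

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 64))   # 7x7 spatial grid, 64 channels (assumed)
h = rng.normal(size=(32,))          # decoder hidden state
W_f = rng.normal(size=(64, 16))
W_h = rng.normal(size=(32, 16))
v = rng.normal(size=(16,))
ctx, alpha = attend(feats, h, W_f, W_h, v)
print(ctx.shape, round(alpha.sum(), 6))  # (64,) 1.0
```

The weights alpha show which image regions drive each generated word, which is how attention lets the decoder focus on salient objects rather than the whole scene.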
A usability evaluation of Neuromender's upper limb game-based rehabilitation system for stroke survivors
Game-based technologies have been widely used as part of stroke rehabilitation. The Neuromender system utilises game-based technologies and consists of serious games designed and developed for the rehabilitation of stroke survivors. In this paper, one of the modules of the Neuromender system, the "upper limb" module, is described and tested for its usability. The upper limb module primarily focuses on the rehabilitation of the upper body extremities of stroke survivors. An experimental study is designed to test the usability of the upper limb module. Various metrics are evaluated, including the optimal distance between the 3D depth sensor and the survivor, the optimal position of the 3D depth sensor with respect to the survivor, and the response time of the gestures made by the survivors based on their distance to the sensor. At the end of the experiments, the optimal distance and optimal position for the survivors to utilise the upper limb module are determined.
Seeing Through The Kinect: A Survey On Heuristics For Building Natural User Interfaces Environments
The idea of interacting with technologies through touch-less and body-based interfaces has caused great excitement amongst users, but for designers it has created various new challenges. Usability encompasses part of these challenges, and there have been attempts at creating heuristics for NUI design. However, most of these heuristics consider using a device such as the Kinect for the recognition of gestures, not of people or objects. Therefore, in this paper we investigate the subject by presenting a systematic literature review aimed at finding heuristics for the design and evaluation of NUIs. Our analysis focuses on the scenario of helping people with visual disabilities in their daily activities. By looking at the state of the art, we intend to verify how many and which heuristics fit in this context.