12,919 research outputs found

    A Deep Learning Approach to Automatic Caption Generation for News Images

    Automatic caption generation for images has gained significant interest and gives rise to many image-related applications. For example, it could help in image/video retrieval and in managing the vast amount of multimedia data available on the Internet. It could also help in the development of tools that aid visually impaired individuals in accessing multimedia content. In this paper, we focus on news images and propose a methodology for automatically generating captions for newspaper articles consisting of a text paragraph and an image. We propose several deep neural network architectures built upon Recurrent Neural Networks. Results on a BBC News dataset show that our proposed approach outperforms a traditional method based on Latent Dirichlet Allocation in both automatic evaluation based on BLEU scores and human evaluation.

    Using remote vision: The effects of video image frame rate on visual object recognition performance

    This is the author's accepted manuscript. Copyright © 2010 IEEE. The process of using remote vision was simulated in order to determine the effects of video image frame rate on performance in the visual recognition of stationary environmental hazards in dynamic video footage of the pedestrian travel environment. Recognition performance was assessed at two video image frame rates: 25 and 2 fps. The assessment included a range of objective and subjective criteria. The results show that the effects of the frame rate variations on performance are statistically insignificant. This paper is part of the development of a novel system for the navigation of visually impaired pedestrians. The navigation system includes a remote vision facility, and the visual recognition of environmental hazards by the sighted human guide is a basic activity in aiding the mobility of the system's visually impaired user.

    An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents to students and staff are also discussed and evaluated: providing captioning of speech online or in classrooms for deaf or hard of hearing students, and assisting blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded speech. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners who are unable to attend the lecture or who find it difficult or impossible to take notes while listening, watching and thinking.

    Accessibility-based reranking in multimedia search engines

    Traditional multimedia search engines retrieve results based mostly on the query submitted by the user, or use a log of previous searches to provide personalized results, without considering the accessibility of the results for users with vision or other types of impairments. In this paper, a novel approach is presented which incorporates the accessibility of images for users with various vision impairments, such as color blindness, cataract and glaucoma, in order to rerank the results of an image search engine. The accessibility of individual images is measured through the use of vision simulation filters. Multi-objective optimization techniques utilizing the image accessibility scores are used to handle users with multiple vision impairments, while the impairment profile of a specific user is used to select one of the Pareto-optimal solutions. The proposed approach has been tested with two image datasets, using both simulated and real impaired users, and the results verify its applicability. Although the proposed method has been used for vision accessibility-based reranking, it can also be extended to other types of personalization context.

    A survey of comics research in computer science

    Graphic novels such as comics and manga are well known all over the world. The digital transition has started to change the way people read comics, more and more on smartphones and tablets and less and less on paper. In recent years, a wide variety of research about comics has been proposed that might change the way comics are created, distributed and read in the future. Early work focuses on low-level document image analysis: comic books are complex, containing text, drawings, balloons, panels, onomatopoeia, etc. Different fields of computer science, such as multimedia, artificial intelligence and human-computer interaction, have covered research about user interaction and content generation, each with different sets of values. In this paper we review the previous research about comics in computer science, state what has been done, and give some insights into the main outlooks.

    ICT-related skills and needs of blind and visually impaired people

    This study focuses on the relationship between the ICT-related training offered to blind and visually impaired people and their actual, self-reported and demonstrated, competencies for online activities and information processing. The findings of the study can shed light on how people with severe visual disabilities are prepared to access the web for educational, institutional and social participation. The study also gives insight into the validity of instruments to measure ICT-linked skills for the target group and creates an empirical foundation for improvements of ICT-related training. The first phase of the study investigated how blind and visually impaired people perceive their participation in society through ICT. An extensive interview showed how this audience perceives the frequency and quality of their Internet use (or absence thereof) and how they acquired these skills.

    Comparing objective visual quality impairment detection in 2D and 3D video sequences

    The skill level of the teleoperator plays a key role in telerobotic operation. However, conventional assessment requires many experiments to evaluate the skill level. In this paper, a novel brain-based method of skill assessment is introduced, and the relationship between the teleoperator's brain states and skill level is investigated for the first time using a kernel canonical correlation analysis (KCCA) method. The skill of teleoperator (SoT) is defined by a statistical method using the cumulative probability function (CDF). Five indicators are extracted from the electroencephalograph (EEG) of the teleoperator to represent the brain states during the telerobotic operation. By using the KCCA algorithm to model the relationship between the SoT and the brain states, the correlation is demonstrated. During the telerobotic operation, the skill level of the teleoperator can be well predicted from the brain states. © 2013 IEEE.