
    DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

    Despite recent progress in computer vision and natural language processing, video understanding intelligence remains hard to achieve because of the intrinsic difficulty of stories in video. Moreover, there is no theoretical metric for evaluating the degree of video understanding. In this paper, we propose a novel video question answering (Video QA) task, DramaQA, for a comprehensive understanding of the video story. DramaQA focuses on two perspectives: 1) hierarchical QAs as an evaluation metric based on the cognitive developmental stages of human intelligence, and 2) character-centered video annotations to model local coherence of the story. Our dataset is built upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928 video clips of various lengths, with each QA pair belonging to one of four difficulty levels. We provide 217,308 annotated images with rich character-centered annotations, including visual bounding boxes, behaviors, and emotions of main characters, and coreference-resolved scripts. Additionally, we provide analyses of the dataset as well as a Dual Matching Multistream model, which effectively learns character-centered representations of video to answer questions about the video. We plan to release our dataset and model publicly for research purposes and expect that our work will provide a new perspective on video story understanding research. Comment: 21 pages, 10 figures, submitted to ECCV 202

    Gaussian Processes with Context-Supported Priors for Active Object Localization

    We devise an algorithm that uses a Bayesian optimization framework together with contextual visual data to localize objects efficiently in still images. Recent research has demonstrated substantial progress in object localization and related computer vision tasks. However, many current state-of-the-art object localization procedures still suffer from inaccuracy and inefficiency, and they fail to provide a principled and interpretable system amenable to high-level vision tasks. We address these issues in the present work. Our method comprises an active search procedure that uses contextual data to generate initial bounding-box proposals for a target object. We train a convolutional neural network to approximate an offset distance from the target object. Next, we use a Gaussian Process to model this offset response signal over the search space of the target. We then employ a Bayesian active search for accurate localization of the target. In experiments, we compare our approach to a state-of-the-art bounding-box regression method on a challenging pedestrian localization task. Our method exhibits a substantial improvement over this baseline regression method. Comment: 10 pages, 4 figures
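
    The search loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, its length scale, the lower-confidence-bound acquisition rule, and the random initial proposals (standing in for the context-driven proposals) are all assumptions, and `offset_fn` is a hypothetical stand-in for the trained network that predicts an offset distance.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=15.0):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3):
    """GP posterior mean and standard deviation at x_query."""
    # Standardise the observations so the unit-variance prior is reasonable.
    shift = y_obs.mean()
    scale = y_obs.std() or 1.0
    z = (y_obs - shift) / scale
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, np.column_stack([z, k_oq]))
    alpha, v = solved[:, 0], solved[:, 1:]
    mean = k_oq.T @ alpha
    var = np.clip(1.0 - np.sum(k_oq * v, axis=0), 0.0, None)
    return shift + scale * mean, scale * np.sqrt(var)

def active_search(offset_fn, candidates, n_init=5, n_steps=25, kappa=2.0, seed=0):
    """Query candidate locations one at a time, guided by the GP.

    offset_fn  -- hypothetical stand-in for the network's offset prediction
    candidates -- (n, 2) array of candidate box centres
    """
    rng = np.random.default_rng(seed)
    # Assumption: random initial proposals replace the context-based ones.
    idx = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    y = [offset_fn(candidates[i]) for i in idx]
    for _ in range(n_steps):
        mean, std = gp_posterior(candidates[idx], np.array(y), candidates)
        acq = mean - kappa * std      # low predicted offset or high uncertainty
        acq[idx] = np.inf             # never re-query an evaluated location
        nxt = int(np.argmin(acq))
        idx.append(nxt)
        y.append(offset_fn(candidates[nxt]))
    best = int(np.argmin(y))
    return candidates[idx[best]], y[best]
```

    With a synthetic `offset_fn` such as the Euclidean distance to a known point, `active_search` returns the evaluated candidate with the smallest offset. The acquisition rule trades off exploiting low predicted offsets against exploring uncertain regions through the assumed parameter `kappa`.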

    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    This paper presents an overview of human-robot interactive communication, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction and a motivation for fluid human-robot communication, ten desiderata are proposed, which provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

    Internet of robotic things : converging sensing/actuating, hypoconnectivity, artificial intelligence and IoT Platforms

    The Internet of Things (IoT) concept is evolving rapidly and influencing new developments in various application domains, such as the Internet of Mobile Things (IoMT), Autonomous Internet of Things (A-IoT), Autonomous System of Things (ASoT), Internet of Autonomous Things (IoAT), Internet of Things Clouds (IoT-C) and the Internet of Robotic Things (IoRT), that are advancing by using IoT technology. The IoT influence represents new development and deployment challenges in different areas such as seamless platform integration, context-based cognitive network integration, new mobile sensor/actuator network paradigms, things identification (addressing, naming in IoT), dynamic things discoverability and many others. The IoRT represents new convergence challenges that need to be addressed, on one side the programmability and the communication of multiple heterogeneous mobile/autonomous/robotic things for cooperating, their coordination, configuration, exchange of information, security, safety and protection. Developments in IoT heterogeneous parallel processing/communication and dynamic systems based on parallelism and concurrency require new ideas for integrating the intelligent “devices”, collaborative robots (COBOTS), into IoT applications. Dynamic maintainability, self-healing, self-repair of resources, changing resource state, (re-)configuration and context-based IoT systems for service implementation and integration with IoT network service composition are of paramount importance when new “cognitive devices” are becoming active participants in IoT applications. This chapter aims to be an overview of the IoRT concept, technologies, architectures and applications and to provide comprehensive coverage of future challenges, developments and applications.

    Virtual Reality applied to biomedical engineering

    Virtual reality is currently trending and expanding into the medical field, enabling numerous applications designed to train doctors and treat patients more efficiently, as well as to optimize surgical planning processes. The medical need and objective of this project is to optimize the surgical planning process for congenital heart disease, which comprises the 3D reconstruction of the patient's heart and its integration into a virtual reality application. Along these lines, a 3D modeling process for heart images, obtained thanks to Hospital Sant Joan de Déu, was combined with the design of the application using the Unity 3D software, thanks to the company VISYON. Improvements were achieved in the software used for segmentation and reconstruction, and basic functionality was implemented in the application, such as importing, moving, rotating, and taking 3D screenshots of the cardiac organ, in order to better understand the heart disease to be treated. The result is an optimized process in which the 3D reconstruction is fast and accurate and the method for importing into the designed app is very simple, together with an application that allows attractive and intuitive interaction through an immersive and realistic experience that meets the efficiency and precision requirements of the medical field.

    Categorization of indoor places by combining local binary pattern histograms of range and reflectance data from laser range finders

    This paper presents an approach to categorize typical places in indoor environments using 3D scans provided by a laser range finder. Examples of such places are offices, laboratories, or kitchens. In our method, we combine the range and reflectance data from the laser scan for the final categorization of places. Range and reflectance images are transformed into histograms of local binary patterns and combined into a single feature vector. This vector is then classified using support vector machines. The results of the presented experiments demonstrate the capability of our technique to categorize indoor places with high accuracy. We also show that combining range and reflectance information improves the final categorization results compared with using a single modality.
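
    The feature construction described in this abstract can be sketched as follows. This is an illustration under assumptions, not the paper's code: it uses the basic 8-neighbour, 3x3 local binary pattern with a 256-bin histogram (the abstract does not say which LBP variant is used), and the function names are hypothetical.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(image):
    """Basic 3x3 LBP code (0..255) for every interior pixel."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as large as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes (image must be at least 3x3)."""
    hist = np.bincount(lbp_codes(image).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def place_descriptor(range_image, reflectance_image):
    """Concatenate the range and reflectance histograms into one feature vector."""
    return np.concatenate([lbp_histogram(range_image),
                           lbp_histogram(reflectance_image)])
```

    Each scan then yields a 512-dimensional vector that could be passed to a support vector machine, for example scikit-learn's `SVC`, for the final place classification.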