DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Despite recent progress in computer vision and natural language processing,
video understanding intelligence remains hard to achieve due to the intrinsic
difficulty of stories in video. Moreover, there is no theoretical metric for
evaluating the degree of video understanding. In this paper, we propose a novel
video question answering (Video QA) task, DramaQA, for comprehensive
understanding of video stories. DramaQA focuses on two perspectives:
1) hierarchical QAs as an evaluation metric based on the cognitive
developmental stages of human intelligence, and 2) character-centered video
annotations to model the local coherence of the story. Our dataset is built
upon the TV drama "Another Miss Oh" and contains 16,191 QA pairs from 23,928
video clips of varying length, with each QA pair belonging to one of four
difficulty levels. We provide 217,308 annotated images with rich
character-centered annotations, including visual bounding boxes, behaviors and
emotions of main characters, and coreference-resolved scripts. Additionally, we
provide analyses of the dataset as well as a Dual Matching Multistream model,
which effectively learns character-centered representations of the video to
answer questions about it. We plan to release our dataset and model publicly
for research purposes and expect that our work will provide a new perspective
on video story understanding research.
Comment: 21 pages, 10 figures, submitted to ECCV 202
Gaussian Processes with Context-Supported Priors for Active Object Localization
We devise an algorithm using a Bayesian optimization framework in conjunction
with contextual visual data for the efficient localization of objects in still
images. Recent research has demonstrated substantial progress in object
localization and related tasks for computer vision. However, many current
state-of-the-art object localization procedures still suffer from inaccuracy
and inefficiency, in addition to failing to provide a principled and
interpretable system amenable to high-level vision tasks. We address these
issues in the present work.
Our method encompasses an active search procedure that uses contextual data
to generate initial bounding-box proposals for a target object. We train a
convolutional neural network to approximate an offset distance from the target
object. Next, we use a Gaussian Process to model this offset response signal
over the search space of the target. We then employ a Bayesian active search
for accurate localization of the target.
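The loop the abstract describes (offset predictions from a learned model, a Gaussian Process surrogate over the search space, and an acquisition-driven query step) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `offset_response` stand-in replaces the trained CNN, and the kernel, noise level, and candidate count are assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

# Hypothetical target location; offset_response stands in for the CNN,
# returning a noisy estimate of the distance from a proposed bounding-box
# center to the target object.
target = np.array([70.0, 40.0])

def offset_response(center):
    return np.linalg.norm(center - target) + rng.normal(0.0, 1.0)

# Initial bounding-box center proposals (in the paper these come from
# contextual data; here they are random points in a 100x100 image).
X = rng.uniform(0, 100, size=(5, 2))
y = np.array([offset_response(c) for c in X])

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=20.0),
                              alpha=1.0, normalize_y=True)

# Bayesian active search: repeatedly fit the GP to observed offsets and
# query the candidate minimizing a lower confidence bound (we seek the
# smallest offset, i.e. the point closest to the target).
candidates = rng.uniform(0, 100, size=(500, 2))
for _ in range(15):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmin(mu - 1.96 * sigma)]
    X = np.vstack([X, nxt])
    y = np.append(y, offset_response(nxt))

best = X[np.argmin(y)]
print(best)  # estimated target center
```

The lower-confidence-bound acquisition trades off exploiting regions the GP already believes are near the target against exploring regions of high posterior uncertainty.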
In experiments, we compare our approach to a state-of-the-art bounding-box
regression method on a challenging pedestrian localization task. Our method
exhibits a substantial improvement over this baseline regression method.
Comment: 10 pages, 4 figures
A Review of Verbal and Non-Verbal Human-Robot Interactive Communication
In this paper, an overview of human-robot interactive communication is
presented, covering verbal as well as non-verbal aspects of human-robot
interaction. Following a historical introduction, and motivation towards fluid
human-robot communication, ten desiderata are proposed, which provide an
organizational axis for both recent and future research on human-robot
communication. Then, the ten desiderata are examined in detail, culminating in
a unifying discussion and a forward-looking conclusion.
Internet of Robotic Things: Converging Sensing/Actuating, Hyperconnectivity, Artificial Intelligence and IoT Platforms
The Internet of Things (IoT) concept is evolving rapidly and influencing new developments in various application domains, such as the Internet of Mobile Things (IoMT), Autonomous Internet of Things (A-IoT), Autonomous System of Things (ASoT), Internet of Autonomous Things (IoAT), Internet of Things Clouds (IoT-C) and the Internet of Robotic Things (IoRT), all of which are advancing by using IoT technology. The IoT influence presents new development and deployment challenges in different areas, such as seamless platform integration, context-based cognitive network integration, new mobile sensor/actuator network paradigms, things identification (addressing and naming in IoT), dynamic things discoverability and many others. The IoRT presents new convergence challenges that need to be addressed: on one side, the programmability and communication of multiple heterogeneous mobile/autonomous/robotic things for cooperation, along with their coordination, configuration, exchange of information, security, safety and protection. Developments in IoT heterogeneous parallel processing/communication and dynamic systems based on parallelism and concurrency require new ideas for integrating the intelligent "devices", collaborative robots (COBOTS), into IoT applications. Dynamic maintainability, self-healing, self-repair of resources, changing resource state, (re-)configuration and context-based IoT systems for service implementation and integration with IoT network service composition are of paramount importance when new "cognitive devices" become active participants in IoT applications. This chapter aims to provide an overview of the IoRT concept, technologies, architectures and applications, and to provide comprehensive coverage of future challenges, developments and applications.
Learning To Grasp
Providing robots with the ability to grasp objects has, despite decades of research, remained a challenging problem. The problem is approachable in constrained environments where there is ample prior knowledge of the scene and objects that will be manipulated. The challenge is in building systems that scale beyond specific situational instances and gracefully operate in novel conditions. In the past, heuristic and simple rule-based strategies were used to accomplish tasks such as scene segmentation or reasoning about occlusion. These heuristic strategies work in constrained environments where a roboticist can make simplifying assumptions about everything from the geometries of the objects to be interacted with, level of clutter, camera position, lighting, and a myriad of other relevant variables. With these assumptions in place, it becomes tractable for a roboticist to hardcode desired behavior and build a robotic system capable of completing repetitive tasks. These hardcoded behaviors will quickly fail if the assumptions about the environment are invalidated. In this thesis we will demonstrate how a robust grasping system can be built that is capable of operating under a more variable set of conditions without requiring significant engineering of behavior by a roboticist.
This robustness is enabled by a newfound ability to empower novel machine learning techniques with massive amounts of synthetic training data. The ability of simulators to create realistic sensory data enables the generation of massive corpora of labeled training data for various grasping-related tasks. The use of simulation allows for the creation of a wide variety of environments and experiences, exposing the robotic system to a large number of scenarios before it ever operates in the real world. This thesis demonstrates that it is now possible to build systems that work in the real world trained using deep learning on synthetic data. The sheer volume of data that can be produced via simulation enables the use of powerful deep learning techniques whose performance scales with the amount of data available. This thesis will explore how deep learning and other techniques can be used to encode these massive datasets for efficient runtime use. The ability to train and test on synthetic data allows for quick iterative development of new perception, planning and grasp execution algorithms that work in a large number of environments. Creative applications of machine learning and massive synthetic datasets are allowing robotic systems to learn skills and move beyond repetitive hardcoded tasks.
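The pipeline the abstract describes, generating labeled training data from a simulator and fitting a learner whose accuracy grows with the amount of data, can be caricatured in a few lines. Everything here is an illustrative assumption rather than the thesis's setup: the "simulator" is a toy geometric rule for grasp success, and the learner is a small off-the-shelf classifier.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def simulate_grasp(n):
    """Toy 'simulator': a grasp on a unit-width box succeeds when the
    gripper opens wide enough and is roughly centred and aligned."""
    width = rng.uniform(0.5, 2.0, n)            # gripper opening
    offset = rng.uniform(-1.0, 1.0, n)          # lateral offset from centre
    angle = rng.uniform(-np.pi / 2, np.pi / 2, n)  # approach angle
    success = (width > 1.0) & (np.abs(offset) < 0.3) & (np.abs(angle) < 0.4)
    return np.column_stack([width, offset, angle]), success.astype(int)

# Held-out synthetic test set, plus training sets of increasing size:
X_test, y_test = simulate_grasp(2000)
scores = {}
for n in (100, 10000):
    X, y = simulate_grasp(n)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(X, y)
    scores[n] = clf.score(X_test, y_test)
    print(n, scores[n])
```

Because the simulator is cheap to query, the training set can be made arbitrarily large, which is exactly the scaling property the thesis exploits with deep networks on realistic rendered data.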
Virtual Reality applied to biomedical engineering
Virtual reality is currently trending and expanding into the medical field, enabling numerous applications designed to train physicians and treat patients more efficiently, as well as to optimize surgical planning processes. The medical need and objective of this project is to optimize the surgical planning process for congenital heart disease, which comprises the 3D reconstruction of the patient's heart and its integration into a virtual reality application. Along these lines, a 3D modelling process for heart images obtained thanks to Hospital Sant Joan de Déu was combined with the design of the application in the Unity 3D software, thanks to the company VISYON. Improvements were achieved in the software used for segmentation and reconstruction, and basic functionalities were implemented in the application, such as importing, moving, rotating and taking 3D screenshots of the cardiac organ, in order to better understand the heart disease to be treated. The result is an optimized process in which the 3D reconstruction is fast and accurate, the method for importing into the designed app is very simple, and the application allows attractive and intuitive interaction through an immersive and realistic experience, meeting the efficiency and precision requirements demanded in the medical field.
Categorization of indoor places by combining local binary pattern histograms of range and reflectance data from laser range finders
This paper presents an approach to categorize typical places in indoor environments using 3D scans provided by a laser range finder. Examples of such places are offices, laboratories, or kitchens. In our method, we combine the range and reflectance data from the laser scan for the final categorization of places. Range and reflectance images are transformed into histograms of local binary patterns and combined into a single feature vector. This vector is later classified using support vector machines. The results of the presented experiments demonstrate the capability of our technique to categorize indoor places with high accuracy. We also show that the combination of range and reflectance information improves the final categorization results in comparison with a single modality.
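A minimal sketch of the feature pipeline the abstract describes: compute a local binary pattern histogram for a range image and for a reflectance image, concatenate the two histograms into one feature vector, and classify with an SVM. The toy textures, image sizes, and hand-rolled 8-neighbour LBP here are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC

def lbp_histogram(img, bins=256):
    """8-neighbour local binary pattern histogram of a 2D image.
    Each interior pixel gets an 8-bit code: one bit per neighbour,
    set when the neighbour is >= the centre pixel."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c) << bit).astype(np.uint8)
    hist, _ = np.histogram(code, bins=bins, range=(0, bins))
    return hist / hist.sum()

# Toy stand-ins for range and reflectance images of two place classes:
# high-frequency texture vs smooth (integrated) texture.
rng = np.random.default_rng(1)

def sample(label):
    base = rng.normal(size=(32, 32))
    img = base if label else np.cumsum(np.cumsum(base, 0), 1)  # smooth class
    range_img = img
    refl_img = img * 0.5 + rng.normal(size=img.shape)
    # Combine both modalities into a single feature vector, as in the paper.
    return np.concatenate([lbp_histogram(range_img), lbp_histogram(refl_img)])

X = np.array([sample(i % 2) for i in range(40)])
y = np.array([i % 2 for i in range(40)])
clf = SVC(kernel="linear").fit(X[:30], y[:30])
acc = clf.score(X[30:], y[30:])
print(acc)
```

Since LBP histograms describe micro-texture regardless of absolute intensity, the same descriptor works on both range and reflectance channels, and concatenation lets the SVM weight whichever modality separates the classes better.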