5 research outputs found
Virtual Movement from Natural Language Text
It is a challenging task for machines to follow a textual instruction. Properly understanding and using the meaning of the textual instruction in some application areas, such as robotics, animation, etc. is very difficult for machines. The interpretation of textual instructions for the automatic generation of the corresponding motions (e.g. exercises) and the validation of these movements are difficult tasks. To achieve our initial goal of having machines properly understand textual instructions and generate some motions accordingly, we recorded five different exercises in random order with the help of seven amateur performers using a Microsoft Kinect device. During the recording, we found that the same exercise was interpreted differently by each human performer even though they were given identical textual instructions. We performed a quality assessment study based on the derived data using a crowdsourcing approach. Later, we tested the inter-rater agreement for different types of visualization, and found the RGB-based visualization showed the best agreement among the annotatorsa animation with a virtual character standing in second position. In the next phase we worked with physical exercise instructions. Physical exercise is an everyday activity domain in which textual exercise descriptions are usually focused on body movements. Body movements are considered to be a common element across a broad range of activities that are of interest for robotic automation. Our main goal is to develop a text-to-animation system which we can use in different application areas and which we can also use to develop multiple-purpose robots whose operations are based on textual instructions. This system could be also used in different text to scene and text to animation systems. To generate a text-based animation system for physical exercises the process requires the robot to have natural language understanding (NLU) including understanding non-declarative sentences. It also requires the extraction of semantic information from complex syntactic structures with a large number of potential interpretations. Despite a comparatively high density of semantic references to body movements, exercise instructions still contain large amounts of underspecified information. Detecting, and bridging and/or filling such underspecified elements is extremely challenging when relying on methods from NLU alone. However, humans can often add such implicit information with ease due to its embodied nature. We present a process that contains the combination of a semantic parser and a Bayesian network. In the semantic parser, the system extracts all the information present in the instruction to generate the animation. The Bayesian network adds some brain to the system to extract the information that is implicit in the instruction. This information is very important for correctly generating the animation and is very easy for a human to extract but very difficult for machines. Using crowdsourcing, with the help of human brains, we updated the Bayesian network. The combination of the semantic parser and the Bayesian network explicates the information that is contained in textual movement instructions so that an animation execution of the motion sequences performed by a virtual humanoid character can be rendered. To generate the animation from the information we basically used two different types of Markup languages. Behaviour Markup Language is used for 2D animation. Humanoid Animation uses Virtual Reality Markup Language for 3D animation
Recommended from our members
Technological framework for ubiquitous interactions using context–aware mobile devices
This report presents research and development of dedicated system architecture, designed to enable its users to interact with each other as well as to access information on Points of Interest that exist in their immediate environment. This is accomplished through managing personal preferences and contextual information in a distributed manner and in real-time. The advantage of this system architecture is that it uses mobile devices, heterogeneous sensors and a selection of user interface paradigms to produce a sociotechnical framework to enhance the perception of the environment and promote intuitive interactions. The thrust of the work has been on software development and component integration. Iterative prototyping was adopted as a development method in order to effectively implement the users’ feedback and establish a platform for collaboration that closely meets the requirements and aids their decision-making process. The requirement acquisition was followed by the system-modelling phase in order to produce a robust software prototype. The implementation includes component-based development and extensive use of design patterns over native programming. Conclusively, the software product has become the means to evaluate differences in the use of mixed reality technologies in a ubiquitous scenario.
The prototype can query a number of context sources such as sensors, or details of the personal profile, to acquire relevant data. The data (and metadata) is stored in opensource structures, so that they are accessible at every layer of the system architecture and at any time. By proactively processing the acquired context, the system can assist the users in their tasks (e.g. navigation) without explicit input – e.g. by simply creating a gesture with the device. However, advanced interaction with the application via the user interface is available for requests that are more complex.
Representations of the real world objects, their spatial relations and other captured features of interest are visualised on scalable interfaces, ranging from 2D to 3D models and from photorealism to stylised clues and symbols. Two principal modes of operation have been implemented; one, using geo-referenced virtual reality models of the environment, updated in real time, and second, using the overlay of descriptive annotations and graphics on the video images of the surroundings, captured by a video camera. The latter is referred to as augmented reality.
The continuous feed of the device position and orientation data, from the GPS receiver and the digital compass, into the application, makes the framework fit for use in unknown environments and therefore suitable for ubiquitous operation. This is one of the novelties of the proposed framework, because it enables a whole range of social, peer-to-peer interactions to take place. The scenarios of how the system could be employed to pursue these remote interactions and collaborative efforts on mobile devices are addressed in the context of urban navigation. The conceptual design and implementation of the novel location and orientation based algorithm for mobile AR are presented in detail. The system is, however, multifaceted and capable of supporting peer-to-peer exchange of information in a pervasive fashion, usable in various contexts. The modalities of these interactions are explored and laid out in several scenarios, but particularly in the context of user adoption. Two evaluation tasks took place. The preliminary evaluation examined certain aspects that influence user interaction while being immersed in a virtual environment, whereas the second summative evaluation compared the utility and certain usability aspects of the AR and VR interfaces
CONVR 2011 : Proceedings of the 11th International Conference on Construction Applications of Virtual Reality
Proceedings of the 11th International Conference on Construction Applications of Virtual Realit