527 research outputs found

    Integration of Computer Vision and Natural Language Processing in Multimedia Robotics Application

    Get PDF
    Computer vision and natural language processing (NLP) are two active machine learning research areas. However, the integration of these two areas gives rise to a new interdisciplinary field, which is currently attracting more attention of researchers. Research has been carried out to extract the text associated with an image or a video that can assist in making computer vision effective. Moreover, researchers focus on utilizing NLP to extract the meaning of words through the use of computer vision. This concept is widely used in robotics. Although robots should observe the surroundings from different ways of interactions, natural gestures and spoken languages are the most convenient way for humans to interact with the robots. This would be possible only if the robots can understand such types of interactions. In the present paper, the proposed integrated application is utilized for guiding vision-impaired people. As vision is the most essential in the life of a human being, an alternative source that helps in guiding the blind in their movements is highly important. For this purpose, the current paper uses a smartphone with the capabilities of vision, language, and intelligence which has been attached to the blind person to capture the images of their surroundings, and it is associated with a Faster Region Convolutional Neural Network (F-RCNN) based central server to detect the objects in the image to inform the person about them and avoid obstacles in their way. These results are passed to the smartphone which produces a speech output for the guidance of the blinds

    Videos in Context for Telecommunication and Spatial Browsing

    Get PDF
    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video-collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. At the same time, on a spectrum between 3D VEs and 2D images, panoramas lie in between, as they offer the same 2D images accessibility while preserving 3D virtual environments surrounding representation. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video mediated communication, and if this improves quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and if this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether there is an impact of display type on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos in context interface with fully-panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video-collection in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. To this end, videos in context are suitable alternative to more difficult, and often expensive solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance

    Towards a Solution to Create, Test and Publish Mixed Reality Experiences for Occupational Safety and Health Learning: Training-MR

    Get PDF
    Artificial intelligence, Internet of Things, Human Augmentation, virtual reality, or mixed reality have been rapidly implemented in Industry 4.0, as they improve the productivity of workers. This productivity improvement can come largely from modernizing tools, improving training, and implementing safer working methods. Human Augmentation is helping to place workers in unique environments through virtual reality or mixed reality, by applying them to training actions in a totally innovative way. Science still has to overcome several technological challenges to achieve widespread application of these tools. One of them is the democratisation of these experiences, for which is essential to make them more accessible, reducing the cost of creation that is the main barrier to entry. The cost of these mixed reality experiences lies in the effort required to design and build these mixed reality training experiences. Nevertheless, the tool presented in this paper is a solution to these current limitations. A solution for designing, building and publishing experiences is presented in this paper. With the solution, content creators will be able to create their own training experiences in a semiassisted way and eventually publish them in the Cloud. Students will be able to access this training offered as a service, using Microsoft HoloLens2. In this paper, the reader will find technical details of the Training-MR, its architecture, mode of operation and communicatio

    Swarm eye : a distributed autonomous surveillance system

    Get PDF
    Conventional means such as Global Positioning System (GPS) and satellite imaging are important information sources but provide only limited and static information. In tactical situations rich 3D images and dynamically self-adapting information are needed to overcome this restriction; this information should be collected where it is available. Swarms are sets of interconnected units that can be arranged and coordinated in any flexible way to execute a specific task in a distributed manner. This paper introduces Swarm Eye – a concept that provides a platform for combining the powerful techniques of swarm intelligence, emergent behaviour and computer graphics in one system. It allows the testing of new image processing concepts for a better and well informed decision making process. By using advanced collaboratively acting eye units, the system can observe, gather and process images in parallel to provide high value information. To capture visual data from an autonomous UAV unit, the unit has to be in the right position in order to get the best visual sight. The developed system also provides autonomous adoption of formations for UAVs in an autonomous and distributed manner in accordance with the tactical situation

    An event-driven probabilistic model of sound source localization using cochlea spikes

    Full text link
    This work presents a probabilistic model that estimates the location of sound sources using the output spikes of a silicon cochlea such as the Dynamic Audio Sensor. Unlike previous work which estimated the source locations directly from the interaural time differences (ITDs) extracted from the timing of the cochlea spikes, the spikes are used instead to support a distribution model of the ITDs representing possible locations of sound sources. Results on noisy single speaker recordings show average accuracies of approximately 80% on detecting the correct source locations and an estimation lag of <;100ms

    Building a semi-autonomous sociable robot platform for robust interpersonal telecommunication

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (p. 73-74).This thesis presents the design of a software platform for the Huggable project. The Huggable is a new kind of robotic companion being developed at the MIT Media Lab for health care, education, entertainment and social communication applications. This work focuses on the social communication application as it pertains to using a semi-autonomous robotic avatar in a remote environment. The software platform consists of an extensible and robust distributed software system that connects a remote human puppeteer to the Huggable robot via internet. The paper discusses design decisions made in building the software platform and describes the technologies created for the social communication application. An informal trial of the system reveals how the system's puppeteering interface can be improved, and pinpoints where performance enhancements are needed for this particular application.by Robert Lopez Toscano.M.Eng

    Vision-based methods for state estimation and control of robotic systems with application to mobile and surgical robots

    Get PDF
    For autonomous systems that need to perceive the surrounding environment for the accomplishment of a given task, vision is a highly informative exteroceptive sensory source. When gathering information from the available sensors, in fact, the richness of visual data allows to provide a complete description of the environment, collecting geometrical and semantic information (e.g., object pose, distances, shapes, colors, lights). The huge amount of collected data allows to consider both methods exploiting the totality of the data (dense approaches), or a reduced set obtained from feature extraction procedures (sparse approaches). This manuscript presents dense and sparse vision-based methods for control and sensing of robotic systems. First, a safe navigation scheme for mobile robots, moving in unknown environments populated by obstacles, is presented. For this task, dense visual information is used to perceive the environment (i.e., detect ground plane and obstacles) and, in combination with other sensory sources, provide an estimation of the robot motion with a linear observer. On the other hand, sparse visual data are extrapolated in terms of geometric primitives, in order to implement a visual servoing control scheme satisfying proper navigation behaviours. This controller relies on visual estimated information and is designed in order to guarantee safety during navigation. In addition, redundant structures are taken into account to re-arrange the internal configuration of the robot and reduce its encumbrance when the workspace is highly cluttered. Vision-based estimation methods are relevant also in other contexts. In the field of surgical robotics, having reliable data about unmeasurable quantities is of great importance and critical at the same time. In this manuscript, we present a Kalman-based observer to estimate the 3D pose of a suturing needle held by a surgical manipulator for robot-assisted suturing. The method exploits images acquired by the endoscope of the robot platform to extrapolate relevant geometrical information and get projected measurements of the tool pose. This method has also been validated with a novel simulator designed for the da Vinci robotic platform, with the purpose to ease interfacing and employment in ideal conditions for testing and validation. The Kalman-based observers mentioned above are classical passive estimators, whose system inputs used to produce the proper estimation are theoretically arbitrary. This does not provide any possibility to actively adapt input trajectories in order to optimize specific requirements on the performance of the estimation. For this purpose, active estimation paradigm is introduced and some related strategies are presented. More specifically, a novel active sensing algorithm employing visual dense information is described for a typical Structure-from-Motion (SfM) problem. The algorithm generates an optimal estimation of a scene observed by a moving camera, while minimizing the maximum uncertainty of the estimation. This approach can be applied to any robotic platforms and has been validated with a manipulator arm equipped with a monocular camera

    Adaptive Sampling with Mobile Sensor Networks

    Get PDF
    Mobile sensor networks have unique advantages compared with wireless sensor networks. The mobility enables mobile sensors to flexibly reconfigure themselves to meet sensing requirements. In this dissertation, an adaptive sampling method for mobile sensor networks is presented. Based on the consideration of sensing resource constraints, computing abilities, and onboard energy limitations, the adaptive sampling method follows a down sampling scheme, which could reduce the total number of measurements, and lower sampling cost. Compressive sensing is a recently developed down sampling method, using a small number of randomly distributed measurements for signal reconstruction. However, original signals cannot be reconstructed using condensed measurements, as addressed by Shannon Sampling Theory. Measurements have to be processed under a sparse domain, and convex optimization methods should be applied to reconstruct original signals. Restricted isometry property would guarantee signals can be recovered with little information loss. While compressive sensing could effectively lower sampling cost, signal reconstruction is still a great research challenge. Compressive sensing always collects random measurements, whose information amount cannot be determined in prior. If each measurement is optimized as the most informative measurement, the reconstruction performance can perform much better. Based on the above consideration, this dissertation is focusing on an adaptive sampling approach, which could find the most informative measurements in unknown environments and reconstruct original signals. With mobile sensors, measurements are collect sequentially, giving the chance to uniquely optimize each of them. When mobile sensors are about to collect a new measurement from the surrounding environments, existing information is shared among networked sensors so that each sensor would have a global view of the entire environment. Shared information is analyzed under Haar Wavelet domain, under which most nature signals appear sparse, to infer a model of the environments. The most informative measurements can be determined by optimizing model parameters. As a result, all the measurements collected by the mobile sensor network are the most informative measurements given existing information, and a perfect reconstruction would be expected. To present the adaptive sampling method, a series of research issues will be addressed, including measurement evaluation and collection, mobile network establishment, data fusion, sensor motion, signal reconstruction, etc. Two dimensional scalar field will be reconstructed using the method proposed. Both single mobile sensors and mobile sensor networks will be deployed in the environment, and reconstruction performance of both will be compared.In addition, a particular mobile sensor, a quadrotor UAV is developed, so that the adaptive sampling method can be used in three dimensional scenarios
    • …
    corecore