
    End-to-end people detection in crowded scenes

    Current people detectors operate either by scanning an image in a sliding-window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
    Comment: 9 pages, 7 figures. Submitted to NIPS 2015. Supplementary material video: http://www.youtube.com/watch?v=QeWl0h3kQ2
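    A "loss function that operates on sets of detections" implies pairing each predicted box with at most one ground-truth box before computing errors. Below is a minimal sketch of that idea using Hungarian (bipartite) matching; the box format, cost terms, and weights are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a set-based detection loss built on Hungarian matching.
# Shapes, cost terms, and weights are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_matching_loss(pred_boxes, pred_conf, gt_boxes, conf_weight=1.0):
    """pred_boxes: (N, 4); pred_conf: (N,) in (0, 1); gt_boxes: (M, 4)."""
    # Pairwise cost: L1 distance between boxes, minus a reward for confidence.
    l1 = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(axis=-1)
    cost = l1 - conf_weight * np.log(pred_conf + 1e-8)[:, None]
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    loss = cost[rows, cols].sum()
    # Predictions left unmatched are penalized for being confident.
    unmatched = np.setdiff1d(np.arange(len(pred_boxes)), rows)
    loss += -np.log(1.0 - pred_conf[unmatched] + 1e-8).sum()
    return loss
```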

    Real Time Panoramic Image Processing

    Image stitching algorithms join sets of images together and provide a wider field of view than an image from a single standard camera. Traditional techniques adequately produce a stitch for a static set of images, but suffer when lighting conditions differ between the images, and their processing times are too slow for real-time use cases. We propose a solution that resolves these issues. To handle lighting differences, two blending schemes have been implemented: a standard approach and a superpixel approach. To meet real-time constraints, the computed stitching solution is cached and reused across frames; to verify the integrity of the cached solution, a validation scheme has been implemented. Using this scheme, invalid solutions can be detected and the cache regenerated. Finally, these components are packaged together in a parallel processing architecture to ensure that frame processing is never interrupted.
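    A minimal sketch of the cache-and-validate idea using OpenCV: a homography is estimated once, reused for subsequent frames, and regenerated when a validity check fails. The RANSAC inlier ratio as the validity criterion, the feature type, and all thresholds are illustrative assumptions, not the thesis's implementation.

```python
# Sketch of caching a stitching solution and re-validating it with OpenCV.
# The inlier-ratio check and all thresholds are illustrative assumptions.
import cv2
import numpy as np

cached_H = None  # homography reused across frames

def estimate_homography(img_a, img_b):
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(img_a, None)
    k2, d2 = orb.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H, float(inlier_mask.mean())  # inlier ratio as a validity score

def stitch_frame(img_a, img_b, revalidate=False, min_inlier_ratio=0.5):
    global cached_H
    if cached_H is None or revalidate:
        H, score = estimate_homography(img_a, img_b)
        if cached_H is None or score >= min_inlier_ratio:
            cached_H = H  # regenerate the cache only when the fit is sound
    h, w = img_a.shape[:2]
    canvas = cv2.warpPerspective(img_a, cached_H, (w + img_b.shape[1], h))
    canvas[:img_b.shape[0], :img_b.shape[1]] = img_b  # naive overlay composite
    return canvas
```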

    Dynamic Image Stitching for Panoramic Video

    This paper presents dynamic image stitching for panoramic video. Using the OpenCV function library and the SIFT algorithm as a basis, it proposes a Gaussian second-difference (MoG) scheme, derived from the DoG (difference-of-Gaussians) map, to reduce the computational order when synthesizing dynamic images and to simplify the Gaussian pyramid structure. MSIFT is combined with an overlapping segmentation method that narrows the scope of feature extraction and increases speed. With this method, traditional image synthesis can be improved without lengthy calculation or limitations of space and angle. The setup uses four standard webcams and two IP cameras fitted with wide-angle lenses: the wide-angle lenses monitor a large area, and image stitching produces the panoramic effect. For the overall image application and control interface, Microsoft Visual Studio C# is used to construct the software interface. On a personal computer with a 2.4-GHz CPU and 2 GB of RAM, with the cameras attached, the execution speed is three frames per second, which reduces the calculation time of the traditional algorithm.
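    The DoG map this work builds on is the difference of two Gaussian blurs computed at successive scales of an image pyramid. A short sketch of that baseline structure follows; octave count, sigma, and scale factor are conventional SIFT-style assumptions, not the paper's values.

```python
# Sketch of the baseline difference-of-Gaussians (DoG) pyramid that the
# paper's MoG modification simplifies. Parameters are conventional assumptions.
import cv2

def dog_pyramid(gray, octaves=4, sigma=1.6, k=2 ** 0.5):
    levels = []
    img = gray.astype("float32")
    for _ in range(octaves):
        a = cv2.GaussianBlur(img, (0, 0), sigma)      # blur at scale sigma
        b = cv2.GaussianBlur(img, (0, 0), sigma * k)  # blur at scale k*sigma
        levels.append(b - a)                          # DoG response
        img = cv2.pyrDown(img)                        # next octave, half size
    return levels
```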

    Webcam Image Alignment


    i-Car: An Intelligent and Interactive Interface for Driver Assistance System

    The aim of the present research was to reduce accidents by assisting the driver in various aspects of driving, such as lane detection, pedestrian and car detection, driver drowsiness detection, and rear-view parking assistance. The methodology combines computer vision techniques with pattern recognition, feature extraction, machine learning, object recognition, human-computer interaction, and parallel processing. The proposed system robustly extracts lane markings of various types and alerts a driver attempting to drift from the lane. It also detects pedestrians and cars within dangerous range of the vehicle and alarms the driver well ahead of time. The system uses an eye-closure-based decision algorithm to detect driver drowsiness in all conditions and warns the driver by interactive voice early enough to avoid accidents. It also assists the driver while reversing the vehicle by providing a clear view of blind-spot areas. Computer vision algorithms such as the Hough transform, Canny edge detection, and HAAR classifiers were applied to meet these objectives. The integrated module was analyzed and tested in different terrains and various lighting conditions to produce an accurate and robust real-time assistance system (Sivaraman et al., 2014). iCar is an innovative prototype built with minimal hardware, such as low-cost webcams, and it provides interactive audio, visual, touch, and touchless interfaces. The approach can help avoid accidents while dispensing with hardware sensors such as IR, UV, acoustic, and proximity sensors, and with costlier mechanical devices such as the LIDAR (Light Detection and Ranging) fitted in the Google car. The present findings outperform state-of-the-art research such as CalTech (Aly et al., 1997). Even depth sensing with Microsoft Kinect can be forgone by the present technology, the iCar.
    Keywords: iCar; Canny edge detection; HAAR classifier; probabilistic Hough transform
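    The lane-marking step named above typically chains Canny edge detection with the probabilistic Hough transform, two of the techniques the abstract cites. A minimal OpenCV sketch follows; the region-of-interest mask and all thresholds are illustrative assumptions, not the system's tuned values.

```python
# Sketch of lane-marking extraction with Canny edges and the probabilistic
# Hough transform. Thresholds and the ROI mask are illustrative assumptions.
import cv2
import numpy as np

def detect_lane_segments(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # Keep only the lower half of the image, where lane markings appear.
    mask = np.zeros_like(edges)
    mask[edges.shape[0] // 2:, :] = 255
    edges = cv2.bitwise_and(edges, mask)
    # Returns an array of line segments [[x1, y1, x2, y2]], or None.
    return cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                           minLineLength=40, maxLineGap=20)
```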

    Virtual Reality: Locomotion, Multi-player, and External Camera Integration

    The field of virtual reality (VR) continues to expand and evolve with each passing year, and in doing so continues to attract wide interest. This thesis looks at three areas. The first is a case study comparing two types of locomotion, a natural and an inorganic method of movement; completion time and efficiency are measured, along with participants' preferred form of locomotion. The second describes in detail the creation of a multi-user cooperative VR game, built with the Unity game engine and with 3D models created in the Blender modeling tool. The third focuses on two methods for merging live webcam feeds into a single panoramic frame to be viewed in a VR environment, with the eventual goal of controlling a remote mining machine.

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing visual representations of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render, and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type affects reasoning about events within videos in panoramic context. These questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object-placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks; to support it, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism, and remote assistance.

    Capturing Synchronous Collaborative Design Activities: A State-Of-The-Art Technology Review


    Sign Language Recognition using Machine Learning

    Deaf and mute people communicate with others and within their own groups by using sign language. Computer recognition of sign language begins with the acquisition of sign gestures and ends with the production of text or speech. Sign gestures are of two types, static and dynamic, and both kinds of recognition system are important, although static gesture recognition is simpler than dynamic gesture recognition. This survey details the steps of sign language recognition, examining data collection, preprocessing, transformation, feature extraction, classification, and results.
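    The surveyed pipeline (preprocessing, feature extraction, classification) can be illustrated for the static case. The sketch below pairs HOG features with an SVM, one common choice among the many methods such surveys cover, not a recommendation from this one; the image size and kernel are assumptions.

```python
# Illustrative static-gesture pipeline: preprocessing, feature extraction,
# and classification. HOG + SVM is one common pairing, assumed here.
import cv2
import numpy as np
from sklearn.svm import SVC

hog = cv2.HOGDescriptor()  # default 64x128 detection window

def extract_features(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # preprocessing
    gray = cv2.resize(gray, (64, 128))                 # match HOG window
    return hog.compute(gray).ravel()                   # feature extraction

def train(images, labels):
    X = np.stack([extract_features(im) for im in images])
    return SVC(kernel="rbf").fit(X, labels)            # classification stage
```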

    Inexpensive solution for real-time video and image stitching

    Image stitching is the process of joining several images to obtain a bigger view of a scene. It is used, for example, in tourism to transmit to the viewer the sensation of being in another place. I present an inexpensive solution for automatic real-time video and image stitching with two web cameras as the video/image sources. The proposed solution relies on several markers in the scene as reference points for the stitching algorithm. The implemented algorithm is divided into four main steps: marker detection, camera pose determination (with reference to the markers), video/image scaling and 3D transformation, and image translation. Wii remote controllers support several steps in the process; the built-in IR camera provides clean marker detection, which facilitates camera pose determination. The only restriction in the algorithm is that the markers have to be in the field of view when capturing the scene. Several tests were made to evaluate the final algorithm. It performs video stitching at a frame rate between 8 and 13 fps. The joining of the two videos/images is good, with minor misalignments in objects at the same depth as the markers; misalignments in the background and foreground are larger. The capture process is simple enough that anyone can perform a stitch after a very short explanation. Although real-time video stitching can be achieved by this affordable approach, there are a few shortcomings in the current version. For example, contrast inconsistency along the stitching line could be reduced by applying a color correction algorithm to each source video, and misalignments in stitched images due to camera lens distortion could be eased by an optical correction algorithm. The work was developed in Apple's Quartz Composer, a visual programming environment, and a library of extended functions was developed using Apple's Xcode tools.
    Advisor: Mon‐Chu Che
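    The camera-pose step can be sketched with OpenCV's solvePnP: given the known physical layout of the markers and their 2D detections (as a Wii remote's IR camera would report them), the pose follows. All coordinates and intrinsics below are hypothetical values for illustration, not the thesis's calibration.

```python
# Sketch of camera pose determination from known markers with cv2.solvePnP.
# Marker layout, pixel detections, and intrinsics are hypothetical values.
import cv2
import numpy as np

# Four markers on a plane, in metres (assumed layout).
object_pts = np.array([[0, 0, 0], [0.5, 0, 0], [0.5, 0.3, 0], [0, 0.3, 0]],
                      dtype=np.float32)
# Their pixel coordinates as reported by the IR camera (hypothetical).
image_pts = np.array([[312, 240], [690, 255], [705, 470], [300, 460]],
                     dtype=np.float32)
# Intrinsics of the capturing camera (assumed focal length and centre).
K = np.array([[760, 0, 512], [0, 760, 384], [0, 0, 1]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
# rvec/tvec give the camera pose relative to the markers, the reference
# frame used to transform and translate one video into the other's view.
```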