End-to-end people detection in crowded scenes
Current people detectors operate either by scanning an image in a sliding
window fashion or by classifying a discrete set of proposals. We propose a
model that is based on decoding an image into a set of people detections. Our
system takes an image as input and directly outputs a set of distinct detection
hypotheses. Because we generate predictions jointly, common post-processing
steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM
layer for sequence generation and train our model end-to-end with a new loss
function that operates on sets of detections. We demonstrate the effectiveness
of our approach on the challenging task of detecting people in crowded scenes.
Comment: 9 pages, 7 figures. Submitted to NIPS 2015. Supplementary material video: http://www.youtube.com/watch?v=QeWl0h3kQ2
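The set-based loss described above can be illustrated with a toy sketch: predictions are matched one-to-one to ground-truth detections and the minimum total matching cost is taken, so no detection is counted twice and non-maximum suppression becomes unnecessary. This is a hypothetical, brute-force illustration only (the paper's actual loss also handles confidences and uses an efficient matching); all names here are invented for the example.

```python
from itertools import permutations

def match_cost(pred, gt):
    """Squared distance between a predicted and a ground-truth box centre."""
    return (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2

def set_loss(preds, gts):
    """Minimum total cost over all one-to-one assignments of predictions
    to ground-truth detections (brute force; fine for small sets)."""
    best = float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        best = min(best, cost)
    return best
```

Because the loss is defined over the matched set, permuting the order of the predictions does not change it, which is what lets the model emit detections as an unordered set.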
Real Time Panoramic Image Processing
Image stitching algorithms are able to join sets of images together and provide a wider field of view than an image from a single standard camera. Traditional techniques can adequately produce a stitch for a static set of images, but suffer when lighting conditions differ between the images. Additionally, traditional techniques suffer from processing times that are too slow for real-time use cases. We propose a solution which resolves the issues encountered by traditional image stitching techniques. To resolve the issues with lighting differences, two blending schemes have been implemented: a standard approach and a superpixel approach. To achieve real-time performance, the stitching solution is cached and reused across frames; to verify the integrity of the cached solution, a validation scheme has been implemented. Using this scheme, invalid solutions can be detected and the cache regenerated. Finally, these components are packaged together in a parallel processing architecture to ensure that frame processing is never interrupted.
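The "standard" blending approach mentioned above can be sketched, under assumptions, as simple linear feathering across the overlap region: pixel weight ramps from the left image to the right image so the seam and any lighting difference fade gradually. This is a minimal one-scanline illustration, not the thesis's implementation.

```python
def linear_blend(left, right, overlap):
    """Feather-blend two scanlines of intensities that share `overlap` pixels:
    left's last `overlap` pixels coincide with right's first `overlap` pixels."""
    out = left[:-overlap] if overlap else left[:]
    for k in range(overlap):
        alpha = (k + 1) / (overlap + 1)  # weight ramps toward the right image
        out.append((1 - alpha) * left[len(left) - overlap + k] + alpha * right[k])
    out.extend(right[overlap:])
    return out
```

With differing exposures, the ramp turns an abrupt intensity jump at the seam into a smooth gradient, which is why feathering masks moderate lighting differences between the sources.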
Dynamic Image Stitching for Panoramic Video
This paper presents dynamic image stitching for panoramic video. Building on the OpenCV library and the SIFT algorithm, it introduces a second-order Gaussian difference (MoG), derived from the DoG (Difference of Gaussians) map, to reduce the order of dynamic image synthesis and simplify the Gaussian pyramid structure. MSIFT, combined with an overlapping-segmentation method, narrows the scope of feature extraction to increase speed. With this method, traditional image synthesis can be improved without extensive computation and without being limited by space and angle. The research uses four ordinary webcams and two IP cameras fitted with wide-angle lenses. The wide-angle lenses monitor a wide area, and image stitching then produces the panoramic effect. For the overall image application and control interface, Microsoft Visual Studio C# is used to construct the software interface. On a personal computer with a 2.4-GHz CPU and 2 GB of RAM, with the cameras attached, the execution speed is three images per second, which reduces the calculation time of the traditional algorithm.
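The DoG map at the core of the SIFT-style pipeline above is just the difference of two Gaussian blurs at different scales, which responds strongly near edges and blobs. A minimal one-dimensional sketch (pure Python, illustrative names; the paper works on 2D image pyramids):

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian kernel of width 2*radius + 1."""
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """Convolve with border clamping (edge pixels are repeated)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma1, sigma2, radius=4):
    """DoG: subtract a coarse blur from a fine blur; peaks near edges."""
    g1 = convolve(signal, gaussian_kernel(sigma1, radius))
    g2 = convolve(signal, gaussian_kernel(sigma2, radius))
    return [a - b for a, b in zip(g1, g2)]
```

On a step edge the response is near zero in flat regions and largest next to the edge, which is what makes DoG extrema useful as candidate keypoints for matching.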
i-Car: An Intelligent and Interactive Interface for Driver Assistance System
The aim of the present research was to reduce accidents by assisting the driver in various aspects of driving: lane detection, pedestrian and car detection, driver drowsiness detection, and rear-view parking assistance. The methodology combines computer vision techniques with pattern recognition, feature extraction, machine learning, object recognition, human-computer interaction, and parallel processing. The proposed system robustly extracts lane markings of various types and alerts a driver attempting to drift from the lane. It also detects pedestrians and cars close enough to be hit by the vehicle and alarms the driver well ahead of time. The system uses an eye-closure-based decision algorithm to detect driver drowsiness in all conditions and warns the driver by interactive voice early enough to avoid an accident. It also assists the driver while reversing the vehicle by providing a clear view of blind-spot areas. Computer vision algorithms such as the Hough transform, Canny edge detection, and Haar classifiers were applied to meet these objectives. The integrated module was analysed and tested in different terrains and various lighting conditions to produce an accurate and robust real-time assistance system (Sivaraman et al., 2014). iCar is an innovative prototype built with minimal hardware, such as low-cost webcams. It emerged as an interactive technology with audio, visual, touch, and touch-less interfaces. These can help avoid accidents while dispensing with hardware sensors such as IR, UV, acoustic, and proximity sensors, and with costlier mechanical devices such as the LIDAR (Light Detection and Ranging) fitted in the Google Car. The present research findings outperform state-of-the-art research such as CalTech's (Aly et al., 1997). The present technology, iCar, likewise avoids the need for depth sensing with devices such as the Microsoft Kinect.
Keywords: iCar; Canny edge detection; Haar classifier; Probabilistic Hough Transform
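The eye-closure-based drowsiness decision mentioned above can be sketched as a simple rule: an alert fires once the eyes have been classified as closed for several consecutive frames, so ordinary blinks (one or two frames) are ignored. This is a hypothetical minimal version; the threshold and the per-frame eye classifier are assumptions, not the paper's exact algorithm.

```python
def drowsiness_alerts(eye_closed, threshold=3):
    """Given a per-frame sequence of booleans (True = eye closed), return the
    frame indices at which an alert fires: the eye has been closed for
    `threshold` consecutive frames, longer than a normal blink."""
    alerts, run = [], 0
    for i, closed in enumerate(eye_closed):
        run = run + 1 if closed else 0
        if run == threshold:  # fire once per closure episode
            alerts.append(i)
    return alerts
```

Firing only when the run length first reaches the threshold gives one alert per closure episode rather than one per frame, which keeps the interactive voice warning from repeating continuously.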
Virtual Reality: Locomotion, Multi-player, and External Camera Integration
The field of virtual reality (VR) continues to expand and evolve with each passing year, and in doing so continues to attract wide interest. This thesis examines three areas. The first is a case study comparing two types of locomotion, a natural and an inorganic method of movement; completion time and efficiency are measured, along with participants' preferences for their favourite form of locomotion. The second describes in detail the creation of a multi-user cooperative VR game, built with the Unity game engine and with 3D models created in the Blender modelling tool. The third focuses on two methods of merging live webcam feeds into a single panoramic frame to be viewed in a VR environment, with the eventual goal of controlling a remote mining machine.
Videos in Context for Telecommunication and Spatial Browsing
The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing visual representations of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of a 3D virtual environment. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type affects reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localisation tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
Sign Language Recognition using Machine Learning
Deaf and speech-impaired people communicate with others, and within their own communities, using sign language. Computer recognition of sign language begins with the acquisition of sign gestures and ends with the production of text or speech. Sign gestures are of two types: static and dynamic. Both kinds of gesture recognition are important, although static gesture recognition is simpler than dynamic gesture recognition. This survey details the steps of sign language recognition, examining data collection, preprocessing, transformation, feature extraction, classification, and results, and offers recommendations for furthering this field of study.
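The classification stage of the pipeline surveyed above can be illustrated with one of the simplest possible classifiers for static gestures: nearest neighbours over extracted feature vectors. The feature vectors, labels and choice of k below are invented for the example; the survey covers a range of real classifiers.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify a gesture feature vector by majority vote among its k
    nearest training examples. `train` is a list of (features, label)."""
    dists = sorted((math.dist(vec, query), label) for vec, label in train)
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]
```

In a real system the feature vectors would come from the preceding extraction step (e.g. hand-shape descriptors), and the same skeleton applies whatever the features are.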
Inexpensive solution for real-time video and image stitching
Image stitching is the process of joining several images to obtain a wider view of a scene. It is used, for example, in tourism to give the viewer the sensation of being in another place. I present an inexpensive solution for automatic real-time video and image stitching, with two web cameras as the video/image sources. The proposed solution relies on several markers in the scene as reference points for the stitching algorithm. The implemented algorithm is divided into four main steps: marker detection; camera pose determination (with reference to the markers); video/image sizing and 3D transformation; and image translation. Wii remote controllers support several steps in the process: the built-in IR camera provides clean marker detection, which facilitates camera pose determination. The only restriction of the algorithm is that the markers must be in the field of view when capturing the scene.
Several tests were made to evaluate the final algorithm. The algorithm performs video stitching at a frame rate between 8 and 13 fps. The joining of the two videos/images is good, with minor misalignments in objects at the same depth as the markers; misalignments in the background and foreground are larger. The capture process is simple enough that anyone can perform a stitching after a very short explanation.
Although real-time video stitching can be achieved by this affordable approach, the current version has a few shortcomings. For example, contrast inconsistency along the stitching line could be reduced by applying a colour correction algorithm to every source video. In addition, misalignments in stitched images due to camera lens distortion could be eased by an optical correction algorithm.
The work was developed in Apple's Quartz Composer, a visual programming environment. A library of extended functions was developed using Xcode tools, also from Apple.
Advisor: Mon-Chu Che
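One common form of the colour correction suggested above is simple gain compensation: scale one source so that the mean intensity of its overlap region matches the other's, removing the contrast jump along the stitching line. This is a hypothetical sketch on flat intensity lists, not the thesis's implementation.

```python
def gain_compensate(left_overlap, right_overlap, right_image):
    """Scale the right image so the mean intensity of its overlap region
    matches the left image's overlap mean (simple gain compensation)."""
    gain = (sum(left_overlap) / len(left_overlap)) / (
        sum(right_overlap) / len(right_overlap))
    # Apply the gain to every pixel, clipping to the valid 8-bit range.
    return [min(255.0, p * gain) for p in right_image]
```

A single multiplicative gain per source is crude but cheap, which matters when every frame of a live video pair must be corrected in real time.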