End-to-end people detection in crowded scenes
Current people detectors operate either by scanning an image in a sliding
window fashion or by classifying a discrete set of proposals. We propose a
model that is based on decoding an image into a set of people detections. Our
system takes an image as input and directly outputs a set of distinct detection
hypotheses. Because we generate predictions jointly, common post-processing
steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM
layer for sequence generation and train our model end-to-end with a new loss
function that operates on sets of detections. We demonstrate the effectiveness
of our approach on the challenging task of detecting people in crowded scenes.
Comment: 9 pages, 7 figures. Submitted to NIPS 2015. Supplementary material video: http://www.youtube.com/watch?v=QeWl0h3kQ2
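The set-based loss described above can be illustrated with a toy sketch: predictions are matched one-to-one to ground-truth detections and the minimum total matching cost is taken, so no detection is counted twice and non-maximum suppression becomes unnecessary. This is a hypothetical, brute-force illustration only (the paper's actual loss also handles confidences and uses an efficient matching); all names here are invented for the example.

```python
from itertools import permutations

def match_cost(pred, gt):
    """Squared distance between a predicted and a ground-truth box centre."""
    return (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2

def set_loss(preds, gts):
    """Minimum total cost over all one-to-one assignments of predictions
    to ground-truth detections (brute force; fine for small sets)."""
    best = float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        best = min(best, cost)
    return best
```

Because the loss is defined over the matched set, permuting the order of the predictions does not change it, which is what lets the model emit detections as an unordered set.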
Real Time Panoramic Image Processing
Image stitching algorithms are able to join sets of images together and provide a wider field of view than an image from a single standard camera. Traditional techniques can adequately produce a stitch for a static set of images, but suffer when lighting conditions differ between the images. Additionally, traditional techniques suffer from processing times that are too slow for real-time use cases. We propose a solution which resolves the issues encountered by traditional image stitching techniques. To resolve the issues with lighting differences, two blending schemes have been implemented: a standard approach and a superpixel approach. To achieve real-time performance, the stitching solution is cached and reused across frames; to verify the integrity of the cached solution, a validation scheme has been implemented. Using this scheme, invalid solutions can be detected and the cache regenerated. Finally, these components are packaged together in a parallel processing architecture to ensure that frame processing is never interrupted.
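The "standard" blending approach mentioned above can be sketched, under assumptions, as simple linear feathering across the overlap region: pixel weight ramps from the left image to the right image so the seam and any lighting difference fade gradually. This is a minimal one-scanline illustration, not the thesis's implementation.

```python
def linear_blend(left, right, overlap):
    """Feather-blend two scanlines of intensities that share `overlap` pixels:
    left's last `overlap` pixels coincide with right's first `overlap` pixels."""
    out = left[:-overlap] if overlap else left[:]
    for k in range(overlap):
        alpha = (k + 1) / (overlap + 1)  # weight ramps toward the right image
        out.append((1 - alpha) * left[len(left) - overlap + k] + alpha * right[k])
    out.extend(right[overlap:])
    return out
```

With differing exposures, the ramp turns an abrupt intensity jump at the seam into a smooth gradient, which is why feathering masks moderate lighting differences between the sources.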
Dynamic Image Stitching for Panoramic Video
This paper presents dynamic image stitching for panoramic video. Building on the OpenCV library and the SIFT algorithm, it introduces a second-order Gaussian difference (MoG), derived from the DoG (Difference of Gaussians) map, to reduce the order of dynamic image synthesis and simplify the Gaussian pyramid structure. MSIFT, combined with an overlapping-segmentation method, narrows the scope of feature extraction to increase speed. With this method, traditional image synthesis can be improved without extensive computation and without being limited by space and angle. The research uses four ordinary webcams and two IP cameras fitted with wide-angle lenses. The wide-angle lenses monitor a wide area, and image stitching then produces the panoramic effect. For the overall image application and control interface, Microsoft Visual Studio C# is used to construct the software interface. On a personal computer with a 2.4-GHz CPU and 2 GB of RAM, with the cameras attached, the execution speed is three images per second, which reduces the calculation time of the traditional algorithm.
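The DoG map at the core of the SIFT-style pipeline above is just the difference of two Gaussian blurs at different scales, which responds strongly near edges and blobs. A minimal one-dimensional sketch (pure Python, illustrative names; the paper works on 2D image pyramids):

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian kernel of width 2*radius + 1."""
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """Convolve with border clamping (edge pixels are repeated)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma1, sigma2, radius=4):
    """DoG: subtract a coarse blur from a fine blur; peaks near edges."""
    g1 = convolve(signal, gaussian_kernel(sigma1, radius))
    g2 = convolve(signal, gaussian_kernel(sigma2, radius))
    return [a - b for a, b in zip(g1, g2)]
```

On a step edge the response is near zero in flat regions and largest next to the edge, which is what makes DoG extrema useful as candidate keypoints for matching.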
i-Car: An Intelligent and Interactive Interface for Driver Assistance System
The aim of the present research was to reduce accidents by assisting the driver in various aspects of driving: lane detection, pedestrian and car detection, driver drowsiness detection, and rear-view parking assistance. The methodology combines computer vision techniques with pattern recognition, feature extraction, machine learning, object recognition, human-computer interaction, and parallel processing. The proposed system robustly extracts lane markings of various types and alerts a driver attempting to drift from the lane. It also detects pedestrians and cars close enough to be hit by the vehicle and alarms the driver well ahead of time. The system uses an eye-closure-based decision algorithm to detect driver drowsiness in all conditions and warns the driver by interactive voice early enough to avoid an accident. It also assists the driver while reversing the vehicle by providing a clear view of blind-spot areas. Computer vision algorithms such as the Hough transform, Canny edge detection, and Haar classifiers were applied to meet these objectives. The integrated module was analysed and tested in different terrains and various lighting conditions to produce an accurate and robust real-time assistance system (Sivaraman et al., 2014). iCar is an innovative prototype built with minimal hardware, such as low-cost webcams. It emerged as an interactive technology with audio, visual, touch, and touch-less interfaces. These can help avoid accidents while dispensing with hardware sensors such as IR, UV, acoustic, and proximity sensors, and with costlier mechanical devices such as the LIDAR (Light Detection and Ranging) fitted in the Google Car. The present research findings outperform state-of-the-art research such as CalTech's (Aly et al., 1997). The present technology, iCar, likewise avoids the need for depth sensing with devices such as the Microsoft Kinect.
Keywords: iCar; Canny edge detection; Haar classifier; Probabilistic Hough Transform
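The eye-closure-based drowsiness decision mentioned above can be sketched as a simple rule: an alert fires once the eyes have been classified as closed for several consecutive frames, so ordinary blinks (one or two frames) are ignored. This is a hypothetical minimal version; the threshold and the per-frame eye classifier are assumptions, not the paper's exact algorithm.

```python
def drowsiness_alerts(eye_closed, threshold=3):
    """Given a per-frame sequence of booleans (True = eye closed), return the
    frame indices at which an alert fires: the eye has been closed for
    `threshold` consecutive frames, longer than a normal blink."""
    alerts, run = [], 0
    for i, closed in enumerate(eye_closed):
        run = run + 1 if closed else 0
        if run == threshold:  # fire once per closure episode
            alerts.append(i)
    return alerts
```

Firing only when the run length first reaches the threshold gives one alert per closure episode rather than one per frame, which keeps the interactive voice warning from repeating continuously.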
Virtual Reality: Locomotion, Multi-player, and External Camera Integration
The field of virtual reality (VR) continues to expand and evolve with each passing year, and in doing so continues to attract wide interest. This thesis examines three areas. The first is a case study comparing two types of locomotion, a natural and an inorganic method of movement; completion time and efficiency are measured, along with participants' preferences for their favourite form of locomotion. The second describes in detail the creation of a multi-user cooperative VR game, built with the Unity game engine and with 3D models created in the Blender modelling tool. The third focuses on two methods of merging live webcam feeds into a single panoramic frame to be viewed in a VR environment, with the eventual goal of controlling a remote mining machine.
Videos in Context for Telecommunication and Spatial Browsing
The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing visual representations of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of a 3D virtual environment. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type affects reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localisation tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
Sign Language Recognition using Machine Learning
Deaf and speech-impaired people communicate with others, and within their own communities, using sign language. Computer recognition of sign language begins with the acquisition of sign gestures and ends with the production of text or speech. Sign gestures are of two types: static and dynamic. Both kinds of gesture recognition are important, although static gesture recognition is simpler than dynamic gesture recognition. This survey details the steps of sign language recognition, examining data collection, preprocessing, transformation, feature extraction, classification, and results, and offers recommendations for furthering this field of study.
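The classification stage of the pipeline surveyed above can be illustrated with one of the simplest possible classifiers for static gestures: nearest neighbours over extracted feature vectors. The feature vectors, labels and choice of k below are invented for the example; the survey covers a range of real classifiers.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify a gesture feature vector by majority vote among its k
    nearest training examples. `train` is a list of (features, label)."""
    dists = sorted((math.dist(vec, query), label) for vec, label in train)
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]
```

In a real system the feature vectors would come from the preceding extraction step (e.g. hand-shape descriptors), and the same skeleton applies whatever the features are.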
Inexpensive solution for real-time video and image stitching
Image stitching is the process of joining several images to obtain a wider view of a scene. It is used, for example, in tourism to give the viewer the sensation of being in another place. I present an inexpensive solution for automatic real-time video and image stitching, with two web cameras as the video/image sources. The proposed solution relies on several markers in the scene as reference points for the stitching algorithm. The implemented algorithm is divided into four main steps: marker detection; camera pose determination (with reference to the markers); video/image sizing and 3D transformation; and image translation. Wii remote controllers support several steps in the process: the built-in IR camera provides clean marker detection, which facilitates camera pose determination. The only restriction of the algorithm is that the markers must be in the field of view when capturing the scene.
Several tests were made to evaluate the final algorithm. The algorithm performs video stitching at a frame rate between 8 and 13 fps. The joining of the two videos/images is good, with minor misalignments in objects at the same depth as the markers; misalignments in the background and foreground are larger. The capture process is simple enough that anyone can perform a stitching after a very short explanation.
Although real-time video stitching can be achieved by this affordable approach, the current version has a few shortcomings. For example, contrast inconsistency along the stitching line could be reduced by applying a colour correction algorithm to every source video. In addition, misalignments in stitched images due to camera lens distortion could be eased by an optical correction algorithm.
The work was developed in Apple's Quartz Composer, a visual programming environment. A library of extended functions was developed using Xcode tools, also from Apple.
Advisor: Mon-Chu Che
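One common form of the colour correction suggested above is simple gain compensation: scale one source so that the mean intensity of its overlap region matches the other's, removing the contrast jump along the stitching line. This is a hypothetical sketch on flat intensity lists, not the thesis's implementation.

```python
def gain_compensate(left_overlap, right_overlap, right_image):
    """Scale the right image so the mean intensity of its overlap region
    matches the left image's overlap mean (simple gain compensation)."""
    gain = (sum(left_overlap) / len(left_overlap)) / (
        sum(right_overlap) / len(right_overlap))
    # Apply the gain to every pixel, clipping to the valid 8-bit range.
    return [min(255.0, p * gain) for p in right_image]
```

A single multiplicative gain per source is crude but cheap, which matters when every frame of a live video pair must be corrected in real time.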