
    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of those changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have great potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, or high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, through the actual sensors that are available, to the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
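The event representation described above (time, location, and sign of each brightness change) can be sketched as a stream of tuples; one common first step when reusing frame-based algorithms is to integrate the signed events of a time window into a 2D frame. A minimal Python/NumPy sketch, with a hypothetical `(t, x, y, polarity)` tuple layout assumed here:

```python
import numpy as np

# Hypothetical minimal event layout: (t, x, y, polarity), matching the
# survey's description of time, location and sign of brightness change.
def accumulate_events(events, height, width, t_start, t_end):
    """Integrate the signed polarities of all events falling inside a
    time window into a single 2D frame (a common event representation)."""
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, polarity in events:
        if t_start <= t < t_end:
            frame[y, x] += 1 if polarity > 0 else -1
    return frame

# Toy stream: two positive events at pixel (x=3, y=2), one negative at (0, 0).
events = [(0.001, 3, 2, +1), (0.002, 3, 2, +1), (0.003, 0, 0, -1)]
frame = accumulate_events(events, height=4, width=4, t_start=0.0, t_end=0.01)
```

The resulting frame has +2 where brightness increased twice and -1 where it decreased once; shorter windows preserve more of the sensor's temporal resolution at the cost of sparser frames.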

    Extracting field hockey player coordinates using a single wide-angle camera

    In elite-level sport, coaches are always trying to develop tactics to better their opposition. In a team sport such as field hockey, a coach must consider the strengths and weaknesses of both their own team and the opposition to develop an effective tactic. Previous work has shown that the spatiotemporal coordinates of the players are a good indicator of team performance, yet the manual extraction of player coordinates is a laborious process that is impractical for a performance analyst. The key motivation of this work was therefore to use a single camera to capture two-dimensional position information for all players on a field hockey pitch. The study developed an algorithm to automatically extract the coordinates of the players on a field hockey pitch using a single wide-angle camera. This is a non-trivial problem that requires: (1) segmentation and classification of a set of players that are relatively small compared to the image size, and (2) transformation from image coordinates to world coordinates, considering the effects of lens distortion due to the wide-angle lens. The algorithm addressed these two points in two sub-algorithms: Player Feature Extraction and Reconstruct World Points. Player Feature Extraction used background subtraction to segment player blob candidates in the frame. 61% of blobs in the dataset were correctly segmented, while a further 15% were over-segmented. A Convolutional Neural Network was then trained to classify the contents of blobs; its classification accuracy on the test set was 85.9%. This classifier was used to eliminate non-player blobs and reform over-segmented blobs. The Reconstruct World Points sub-algorithm transformed the image coordinates into world coordinates. To do so, the intrinsic and extrinsic parameters were estimated using planar camera calibration.
Traditionally, the extrinsic parameters are optimised by minimising the projection error of a set of control points; it was shown that this calibration method is sub-optimal due to the extreme camera pose. Instead, the extrinsic parameters were estimated by minimising the world reconstruction error. For a 1:100 scale model, the median reconstruction error was 0.0043 m and the distribution of errors had an interquartile range of 0.0025 m. The Acceptable Error Rate, the percentage of points reconstructed with less than 0.005 m of error, was found to be 63.5%. The overall accuracy of the algorithm was assessed using precision and recall. It was found that players could be extracted within 1 m of their ground-truth coordinates with a precision of 75% and a recall of 66%, a respective improvement of 20% and 16% on the state of the art. However, it was also found that the likelihood of extraction decreases the further a player is from the camera, reducing to close to zero in the parts of the pitch furthest from the camera. These results suggest that the developed algorithm is unsuitable for identifying player coordinates in the extreme regions of a full field hockey pitch; however, this limitation may be overcome by using multiple collocated cameras focussed on different regions of the pitch. Equally, the algorithm is sport-agnostic, so it could be used in a sport that uses a smaller pitch.
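The Reconstruct World Points step maps image coordinates onto the pitch plane through a planar calibration. The core planar mapping can be sketched as a homography fitted with the Direct Linear Transform; the point values and the 1 px : 0.01 m scale below are illustrative assumptions, not the thesis's calibration data, and the sketch omits the lens-distortion and extrinsic-optimisation details the thesis actually addresses:

```python
import numpy as np

def fit_homography(img_pts, world_pts):
    """Fit a 3x3 homography H with the Direct Linear Transform:
    stack two linear constraints per correspondence and take the
    null-space vector of the stacked system via SVD."""
    A = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        A.append([-x, -y, -1, 0, 0, 0, X * x, X * y, X])
        A.append([0, 0, 0, -x, -y, -1, Y * x, Y * y, Y])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)       # right-singular vector of smallest sigma
    return H / H[2, 2]             # fix the arbitrary scale

def image_to_world(H, pt):
    """Apply H to an image point in homogeneous coordinates."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

# Toy correspondences: a 100 x 50 px image rectangle mapped to a
# 1.0 x 0.5 m region of the (hypothetical) pitch plane.
img_pts = [(0, 0), (100, 0), (100, 50), (0, 50)]
world_pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (0.0, 0.5)]
H = fit_homography(img_pts, world_pts)
center = image_to_world(H, (50, 25))
```

With this setup, the world reconstruction error the thesis minimises would be the distance between `image_to_world(H, p)` and the known world position of each control point, rather than the reprojection error in pixels.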

    An Efficient Algorithm Proposed For Smoke Detection in Video Using Hybrid Feature Selection Techniques

    As an emerging development in the digital technology era, video processing is useful in a wide range of applications. In this paper, an algorithm is proposed for smoke detection in video. The algorithm quickly detects fire by eliminating common interruptions such as noise and overlap due to collisions. The proposed algorithm combines several techniques, namely Haar features, the Bhattacharyya distance, SIFT descriptors, Gabor wavelets and an SVM classifier, to identify smoke in video. The foreground object is identified using a moving-object algorithm that predicts the movement of smoke across otherwise stable images. The implementation has been carried out in MATLAB.
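Of the techniques listed, the Bhattacharyya distance is compact enough to sketch on its own. The standard histogram-comparison form is shown below; how the paper actually applies it to candidate smoke regions is not specified in the abstract, so this is only an illustration of the measure itself:

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two histograms:
    BC(p, q) = sum_i sqrt(p_i * q_i),  D_B = -ln(BC).
    Histograms are normalized to sum to 1 first."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))    # Bhattacharyya coefficient in [0, 1]
    return -np.log(bc)

# Identical histograms give distance 0; dissimilar ones give a positive value.
d_same = bhattacharyya_distance([1, 2, 3], [1, 2, 3])
d_diff = bhattacharyya_distance([1, 1], [1, 3])
```

In detection pipelines this is typically used to compare the color or intensity histogram of a candidate region against a reference model between frames.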

    Change detection in combination with spatial models and its effectiveness on underwater scenarios

    This thesis proposes a novel change detection approach for underwater scenarios and combines it with several specially developed spatial models; this allows accurate and spatially coherent detection of any moving objects with a static camera in arbitrary environments. To deal with the special problems of underwater imaging, pre-segmentations based on optical flow and other special adaptations were added to the change detection algorithm so that it can better handle typical underwater scenarios, such as a scene crowded by an entire fish swarm.

    Deep Learning for Crowd Anomaly Detection

    Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage has produced an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have brought attention to automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those fusing dense optical flow into a feature extraction stage for the crowd anomaly detection problem. To this end, five different deep learning architectures (a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder) are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.
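As a rough illustration of how estimated flow maps can be fed to such networks, the sketch below converts a dense (H, W, 2) flow map into a magnitude/orientation representation; whether this thesis uses exactly this encoding is an assumption, since the abstract only says flow maps are used as network input:

```python
import numpy as np

def flow_to_features(flow):
    """Convert a dense optical flow map of shape (H, W, 2), holding
    per-pixel (u, v) displacements, into a stacked (H, W, 2) array of
    motion magnitude and orientation, a common input encoding."""
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(u, v)            # per-pixel speed
    orientation = np.arctan2(v, u)        # per-pixel motion direction
    return np.stack([magnitude, orientation], axis=-1)

# Toy flow map: one pixel moves by (u=3, v=4), the rest are static.
flow = np.zeros((2, 2, 2))
flow[0, 0] = [3.0, 4.0]
feats = flow_to_features(flow)
```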

    Online Mutual Foreground Segmentation for Multispectral Stereo Videos

    The segmentation of video sequences into foreground and background regions is a low-level process commonly used in video content analysis and smart surveillance applications. Using a multispectral camera setup can improve this process by providing more diverse data to help identify objects despite adverse imaging conditions. The registration of several data sources is however not trivial if the appearance of objects produced by each sensor differs substantially. This problem is further complicated when parallax effects cannot be ignored when using close-range stereo pairs. In this work, we present a new method to simultaneously tackle multispectral segmentation and stereo registration. Using an iterative procedure, we estimate the labeling result for one problem using the provisional result of the other. Our approach is based on the alternating minimization of two energy functions that are linked through the use of dynamic priors. We rely on the integration of shape and appearance cues to find proper multispectral correspondences, and to properly segment objects in low contrast regions. We also formulate our model as a frame processing pipeline using higher order terms to improve the temporal coherence of our results. Our method is evaluated under different configurations on multiple multispectral datasets, and our implementation is available online. Comment: Preprint accepted for publication in IJCV (December 2018).

    Recent Developments in Video Surveillance

    With surveillance cameras installed everywhere and continuously streaming thousands of hours of video, how can that huge amount of data be analyzed or even be useful? Is it possible to search those countless hours of video for subjects or events of interest? Shouldn't the presence of a car stopped at a railroad crossing trigger an alarm system to prevent a potential accident? In the chapters selected for this book, experts in video surveillance provide answers to these questions and other interesting problems, skillfully blending research experience with practical real-life applications. Academic researchers will find a reliable compilation of relevant literature in addition to pointers to current advances in the field. Industry practitioners will find useful hints about state-of-the-art applications. The book also provides directions for open problems where further advances can be pursued.

    Computer Vision Techniques for Background Modeling in Urban Traffic Monitoring

    Jose Manuel Milla, Sergio Luis Toral, Manuel Vargas and Federico Barrero (2010). Computer Vision Techniques for Background Modeling in Urban Traffic Monitoring, Urban Transport and Hybrid Vehicles, Seref Soylu (Ed.), ISBN: 978-953-307-100-8, InTech, DOI: 10.5772/10179. Available from: http://www.intechopen.com/books/urban-transport-and-hybrid-vehicles/computer-vision-techniques-for-background-modeling-in-urban-traffic-monitoring

    In this chapter, several background modelling techniques are described, analyzed and tested. In particular, different algorithms based on the sigma-delta filter are considered because of their suitability for embedded systems, where computational limitations constrain real-time implementation. A qualitative and a quantitative comparison were performed among the different algorithms. The results show that the sigma-delta algorithm with confidence measurement exhibits the best performance, both in its adaptation to the particular characteristics of urban traffic scenes and in its computational requirements. A prototype based on an ARM processor was implemented to test the different versions of the sigma-delta algorithm and to illustrate several applications related to vehicle traffic monitoring, along with implementation details.
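The basic sigma-delta filter mentioned above is compact enough to sketch. The version below is the plain filter, following the usual non-linear ±1 update rule that makes it attractive for embedded hardware; it omits the confidence measurement the chapter finds best, so treat it as a baseline sketch rather than the chapter's final algorithm:

```python
import numpy as np

def sigma_delta_step(M, V, I, N=4):
    """One sigma-delta background update. The background estimate M
    steps by +/-1 toward the current frame I; the dispersion estimate V
    steps by +/-1 toward N times the absolute deviation |I - M|.
    Pixels whose deviation reaches V are flagged as foreground."""
    I = I.astype(np.int32)
    M = M + np.sign(I - M)              # background converges at 1 level/frame
    delta = np.abs(I - M)
    V = V + np.sign(N * delta - V)      # slowly adapting per-pixel threshold
    V = np.maximum(V, 1)                # keep a minimum detection threshold
    foreground = delta >= V
    return M, V, foreground

# Toy run: a static 2x2 scene at intensity 10; the background estimate
# climbs from 0 and settles, after which nothing is flagged.
M = np.zeros((2, 2), dtype=np.int32)
V = np.ones((2, 2), dtype=np.int32)
scene = np.full((2, 2), 10, dtype=np.int32)
for _ in range(20):
    M, V, fg = sigma_delta_step(M, V, scene)
```

The appeal for ARM-class embedded targets is that every operation is an integer increment, comparison, or absolute difference per pixel, with no floating-point state to maintain.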

    Detecção de eventos complexos em vídeos baseada em ritmos visuais (Detection of Complex Events in Videos Based on Visual Rhythms)

    Advisor: Hélio Pedrini. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: The recognition of complex events in videos currently has several important applications, particularly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, hospitals, schools, buildings, and roads, among others. Moreover, advances in digital technology have enhanced the capability to detect video events through the development of devices with high resolution, small physical size, and high sampling rates. Many works in the literature have explored the subject from different perspectives. This work presents and evaluates a methodology for extracting a feature descriptor from the visual rhythms of video sequences in order to address the video event detection problem. A visual rhythm can be seen as the projection of a video onto an image, such that the video analysis task is reduced to an image analysis problem, benefiting from its low processing cost in terms of time and complexity. To demonstrate the potential of the visual rhythm in the analysis of complex videos, three computer vision problems are addressed: abnormal event detection, human action classification, and gesture recognition. The first problem learns a normalcy model from the traces that people leave as they walk, whereas the other two extract representative patterns from actions. Our hypothesis is that similar videos produce similar patterns, so the action classification problem can be reduced to an image classification task. Experiments conducted on well-known public datasets demonstrate that the method produces promising results at high processing rates, making real-time operation possible. Although the visual rhythm features are mainly extracted as histograms of gradients, some attempts to add optical flow features are discussed, as well as strategies for obtaining alternative visual rhythms.
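The projection of a video onto an image described above is simple to sketch. Sampling the central column of every frame is one common choice of visual rhythm; the dissertation also discusses alternative rhythms, so this is only one instance of the idea:

```python
import numpy as np

def visual_rhythm(frames):
    """Build a visual rhythm image from a grayscale video of shape
    (T, H, W): sample the central column of every frame and stack the
    samples left to right, so each column of the (H, T) output
    summarizes one frame and the whole video becomes one image."""
    frames = np.asarray(frames)
    t, h, w = frames.shape[:3]
    return frames[:, :, w // 2].T

# Toy video: 5 frames of 4x6 pixels; the rhythm is a 4x5 image whose
# k-th column is the central column of frame k.
frames = np.arange(5 * 4 * 6).reshape(5, 4, 6)
rhythm = visual_rhythm(frames)
```

Once the video is collapsed this way, 2D descriptors such as histograms of oriented gradients can be computed on the rhythm image instead of on every frame, which is where the low processing cost comes from.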

    Vision-Based 2D and 3D Human Activity Recognition
