Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
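The event stream described in this abstract (time, location and sign of each brightness change) can be illustrated with a small frame-driven simulation. This is only a sketch under stated assumptions: a real event camera is asynchronous, while the hypothetical `events_from_frames` below compares log intensities between successive frames, and the `threshold` value of 0.2 is an illustrative choice, not a sensor specification.

```python
import math
from typing import List, NamedTuple


class Event(NamedTuple):
    t: float       # timestamp of the frame that triggered the event
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def events_from_frames(frames: List[List[List[float]]],
                       timestamps: List[float],
                       threshold: float = 0.2) -> List[Event]:
    """Emit an event when a pixel's log intensity has moved by at least
    `threshold` since the last event at that pixel (assumed model)."""
    eps = 1e-6  # avoids log(0) on black pixels
    # Per-pixel reference level, initialised from the first frame.
    reference = [[math.log(v + eps) for v in row] for row in frames[0]]
    events: List[Event] = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                log_value = math.log(value + eps)
                delta = log_value - reference[y][x]
                if abs(delta) >= threshold:
                    events.append(Event(t, x, y, 1 if delta > 0 else -1))
                    reference[y][x] = log_value  # reset the reference
    return events
```

Pixels whose brightness does not change produce no events, which is why a static scene yields an empty stream in this sketch.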
Extracting field hockey player coordinates using a single wide-angle camera
In elite-level sport, coaches are always trying to develop tactics to better their
opposition. In a team sport such as field hockey, a coach must consider the
strengths and weaknesses of both their own team and the opposition to
develop an effective tactic. Previous work has shown that spatiotemporal coordinates
of the players are a good indicator of team performance, yet the manual extraction of
player coordinates is a laborious process that is impractical for a performance analyst.
Consequently, the key motivation of this work was to use a single camera to capture
two-dimensional position information for all players on a field hockey pitch.
The study developed an algorithm to automatically extract the coordinates of the
players on a field hockey pitch using a single wide-angle camera. This is a non-trivial
problem that requires: 1. Segmentation and classification of a set of players that are
relatively small compared to the image size, and 2. Transformation from image
coordinates to world coordinates, considering the effects of the lens distortion due to
the wide-angle lens. Accordingly, the algorithm addressed these two points in two
sub-algorithms: Player Feature Extraction and Reconstruct World Points.
Player Feature Extraction used background subtraction to segment player blob
candidates in the frame. 61% of blobs in the dataset were correctly segmented, while a
further 15% were over-segmented. Subsequently a Convolutional Neural Network was
trained to classify the contents of blobs. The classification accuracy on the test set was
85.9%. This was used to eliminate non-player blobs and reform over-segmented blobs.
The Reconstruct World Points sub-algorithm transformed the image coordinates into
world coordinates. To do so the intrinsic and extrinsic parameters were estimated
using planar camera calibration. Traditionally the extrinsic parameters are optimised
by minimising the projection error of a set of control points; it was shown that this
calibration method is sub-optimal due to the extreme camera pose. Instead the
extrinsic parameters were estimated by minimising the world reconstruction error. For
a 1:100 scale model the median reconstruction error was 0.0043 m and the
distribution of errors had an interquartile range of 0.0025 m. The Acceptable Error
Rate, the percentage of points that were reconstructed with less than 0.005 m of
error, was found to be 63.5%.
The overall accuracy of the algorithm was assessed using the precision and the recall. It
was found that players could be extracted within 1 m of their ground truth coordinates
with a precision of 75% and a recall of 66%. These are respective improvements of 20%
and 16% on the state-of-the-art. However, it was also found that the
likelihood of extraction decreases the further a player is from the camera, reducing to
close to zero in parts of the pitch furthest from the camera. These results suggest that
the developed algorithm is unsuitable to identify player coordinates in the extreme
regions of a full field hockey pitch; however this limitation may be overcome by using
multiple collocated cameras focussed on different regions of the pitch. Equally, the
algorithm is sport agnostic, so could be used in a sport that uses a smaller pitch.
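As an illustration of how precision and recall "within 1 m of ground truth" might be computed, the sketch below matches extracted coordinates to ground-truth players. The greedy nearest-neighbour matching rule and the function name `precision_recall` are assumptions made for this example; the abstract does not state how detections were paired with ground truth.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # world coordinates in metres


def precision_recall(detections: List[Point],
                     ground_truth: List[Point],
                     max_dist: float = 1.0) -> Tuple[float, float]:
    """Greedily match each detection to the nearest unmatched
    ground-truth player; a match counts if it lies within max_dist."""
    unmatched = list(ground_truth)
    true_positives = 0
    for det in detections:
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda gt: math.dist(det, gt))
        if math.dist(det, nearest) <= max_dist:
            unmatched.remove(nearest)  # each player is matched at most once
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Precision is the share of extracted coordinates that correspond to a real player, and recall is the share of real players that were extracted.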
An Efficient Algorithm Proposed For Smoke Detection in Video Using Hybrid Feature Selection Techniques
As an emerging development in the digital technology era, video processing is useful in a wide range of applications. This paper proposes an algorithm for smoke detection in video. The algorithm quickly detects fire by eliminating common interruptions such as noise and overlapping due to collision. It combines several techniques, namely Haar features, the Bhattacharyya distance, SIFT descriptors, Gabor wavelets and an SVM classifier, to identify smoke in video. The foreground object is identified using a moving-object algorithm that predicts the movement of smoke in stable images. The implementation was carried out in MATLAB.
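Of the techniques listed in this abstract, the Bhattacharyya distance has a compact closed form for discrete distributions. The sketch below applies it to two histograms; the abstract does not say what the paper compares, so the histogram inputs and the function name `bhattacharyya_distance` are assumptions for illustration only.

```python
import math
from typing import Sequence


def bhattacharyya_distance(hist_a: Sequence[float],
                           hist_b: Sequence[float]) -> float:
    """Distance -ln(BC), where BC = sum_i sqrt(p_i * q_i) is the
    Bhattacharyya coefficient of the two normalised histograms."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive mass")
    coefficient = sum(math.sqrt((a / total_a) * (b / total_b))
                      for a, b in zip(hist_a, hist_b))
    if coefficient <= 0.0:
        return math.inf  # the histograms share no occupied bin
    # Clamp to 1.0 so rounding error cannot yield a negative distance.
    return -math.log(min(coefficient, 1.0))
```

Identical distributions give a distance of zero, and the distance grows as the histograms overlap less.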
Change detection in combination with spatial models and its effectiveness on underwater scenarios
This thesis proposes a novel change detection approach for underwater scenarios and combines it with several specially developed spatial models. This allows accurate and spatially coherent detection of any moving objects with a static camera in arbitrary environments. To deal with the special problems of underwater imaging, pre-segmentations based on optical flow and other special adaptations were added to the change detection algorithm so that it can better handle typical underwater scenarios, such as a scene crowded by a whole fish swarm.
Deep Learning for Crowd Anomaly Detection
Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage has produced an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have drawn attention to automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those that apply dense optical flow in a feature extraction stage to the crowd anomaly detection problem. To this end, five different deep learning architectures are trained using optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.
Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is however not
trivial if the appearance of objects produced by each sensor differs
substantially. This problem is further complicated when parallax effects cannot
be ignored when using close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online.
Comment: Preprint accepted for publication in IJCV (December 2018)
Recent Developments in Video Surveillance
With surveillance cameras installed everywhere and continuously streaming thousands of hours of video, how can that huge amount of data be analyzed or even be useful? Is it possible to search those countless hours of videos for subjects or events of interest? Shouldn’t the presence of a car stopped at a railroad crossing trigger an alarm system to prevent a potential accident? In the chapters selected for this book, experts in video surveillance provide answers to these questions and other interesting problems, skillfully blending research experience with practical real-life applications. Academic researchers will find a reliable compilation of relevant literature in addition to pointers to current advances in the field. Industry practitioners will find useful hints about state-of-the-art applications. The book also provides directions for open problems where further advances can be pursued.
Computer Vision Techniques for Background Modeling in Urban Traffic Monitoring
Jose Manuel Milla, Sergio Luis Toral, Manuel Vargas and Federico Barrero (2010). Computer Vision Techniques for Background Modeling in Urban Traffic Monitoring, Urban Transport and Hybrid Vehicles, Seref Soylu (Ed.), ISBN: 978-953-307-100-8, InTech, DOI: 10.5772/10179. Available from: http://www.intechopen.com/books/urban-transport-and-hybrid-vehicles/computer-vision-techniques-for-background-modeling-in-urban-traffic-monitoring
In this chapter, several background modelling techniques have been described, analyzed and tested. In particular, different algorithms based on the sigma-delta filter have been considered due to their suitability for embedded systems, where computational limitations affect a real-time implementation. A qualitative and a quantitative comparison have been performed among the different algorithms. The results show that the sigma-delta algorithm with confidence measurement exhibits the best performance in terms of adaptation to the specificities of urban traffic scenes and in terms of computational requirements. A prototype based on an ARM processor has been implemented to test the different versions of the sigma-delta algorithm and to illustrate several applications related to vehicle traffic monitoring and implementation details.
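For readers unfamiliar with sigma-delta background estimation, the sketch below shows one update step of the basic filter: the background and a per-pixel variance estimate each move by a single grey level per frame, which is why the method suits integer-only embedded hardware. This is the plain sigma-delta scheme, not the confidence-measurement variant the chapter favours; the function name `sigma_delta_step` and the amplification factor `n = 2` are assumptions for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of integer grey levels


def sigma_delta_step(frame: Image, background: Image, variance: Image,
                     n: int = 2) -> Image:
    """Update `background` and `variance` in place for one frame and
    return a binary foreground mask (1 = moving pixel)."""
    mask: Image = []
    for y, row in enumerate(frame):
        mask_row = []
        for x, pixel in enumerate(row):
            # Move the background one grey level towards the current pixel.
            if background[y][x] < pixel:
                background[y][x] += 1
            elif background[y][x] > pixel:
                background[y][x] -= 1
            diff = abs(pixel - background[y][x])
            # Move the variance one level towards n * diff,
            # but only where the difference is non-zero.
            if diff != 0:
                if variance[y][x] < n * diff:
                    variance[y][x] += 1
                elif variance[y][x] > n * diff:
                    variance[y][x] -= 1
            # A pixel is foreground when it differs by more than the variance.
            mask_row.append(1 if diff > variance[y][x] else 0)
        mask.append(mask_row)
    return mask
```

Calling the function once per incoming frame keeps the model adapting slowly to illumination changes while fast changes show up in the mask.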
Detecção de eventos complexos em vídeos baseada em ritmos visuais (Complex event detection in videos based on visual rhythms)
Advisor: Hélio Pedrini. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: The recognition of complex events in videos currently has several important applications, particularly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, hospitals, schools, buildings and roads, among others. Moreover, advances in digital technology have enhanced the capabilities for detection of video events through the development of devices with high resolution, small physical size, and high sampling rates. Many works available in the literature have explored the subject from different perspectives. This work presents and evaluates a methodology for extracting a feature descriptor from visual rhythms of video sequences in order to address the video event detection problem. A visual rhythm can be seen as the projection of a video onto an image, such that the video analysis task can be reduced to an image analysis problem, benefiting from its low processing cost in terms of time and complexity. To demonstrate the potential of the visual rhythm in the analysis of complex videos, three computer vision problems are selected in this work: abnormal event detection, human action classification, and gesture recognition. In the first problem, a normalcy model is learned from the traces that people leave when they walk, whereas in the other two problems representative patterns are extracted from the actions. Our hypothesis is that similar videos produce similar patterns; therefore, the action classification problem can be reduced to an image classification task. Experiments conducted on well-known public datasets demonstrate that the method produces promising results at high processing rates, making it possible to work in real time.
Even though the visual rhythm features are mainly extracted as histograms of gradients, some attempts at adding optical flow features are discussed, as well as strategies for obtaining alternative visual rhythms.
Master's degree, Computer Science (Mestre em Ciência da Computação). 1570507, 1406910, 1374943. CAPE
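The idea of a visual rhythm as "the projection of a video onto an image" can be sketched by taking one line of pixels from every frame and stacking the lines over time. Sampling the central row is an assumption made here for illustration; the abstract does not say which lines or paths the dissertation samples, and `visual_rhythm` is a hypothetical name.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of pixel values


def visual_rhythm(frames: List[Frame]) -> List[List[int]]:
    """Stack the central row of every frame; row t of the result
    comes from frame t, so the vertical axis of the image is time."""
    if not frames:
        return []
    centre = len(frames[0]) // 2  # index of the sampled row (assumed choice)
    return [list(frame[centre]) for frame in frames]
```

The resulting image has one row per frame, so image descriptors such as histograms of gradients can then be computed on it as on any still image.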