
    Object detection, recognition and re-identification in video footage

    Security concerns have grown significantly in recent times; as a result, security cameras have been installed in most public places to monitor activities and prevent crime. The footage is analysed either through video analytics or through forensic analysis based on human observation. To this end, within the research context of this thesis, a proactive machine-vision-based military recognition system has been developed to help monitor activities in the military environment. The thesis presents the proposed object detection, recognition and re-identification systems. A novel technique for military personnel recognition is presented. Initially, the detected camouflaged personnel are segmented using the GrabCut segmentation algorithm. Since a camouflage uniform generally appears similar at the top and the bottom of the body, an image patch is extracted from the segmented foreground image and used as the region of interest. Subsequently, colour and texture features are extracted from each patch and used for classification. A second approach to personnel recognition is based on recognising the badge on the cap of a military person. A feature matching metric based on Speeded-Up Robust Features (SURF) extracted from the badge on a person's cap enables recognition of that person's arm of service. A state-of-the-art technique for recognising vehicle types irrespective of their view angle is also presented. Vehicles are initially detected and segmented using a Gaussian Mixture Model (GMM) based foreground/background segmentation algorithm. A Canny Edge Detection (CED) stage followed by morphological operations is used as a pre-processing stage to enhance foreground vehicular object detection and segmentation.
Subsequently, region, Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) features are extracted from the refined foreground vehicle object and used for vehicle type recognition. Two different datasets, covering front/rear and angled views, are used and combined for testing the proposed technique. For night-time video analytics and forensics, the thesis presents a novel approach to pedestrian detection and vehicle type recognition based on a novel feature acquisition technique named CENTROG. Thermal images containing pedestrians and vehicular objects are used to analyse the performance of the proposed algorithms. The video is initially segmented using a GMM-based foreground object segmentation algorithm. A CED-based pre-processing step is used to enhance segmentation accuracy prior to using Census Transforms for initial feature extraction. HOG features are then extracted from the Census-transformed images and used for detection of human objects and recognition of vehicular objects in thermal images. Finally, a novel technique for people re-identification is proposed, based on low-level colour features and mid-level attributes. The low-level colour histogram bin values were normalised to the range 0 to 1. A publicly available dataset (VIPeR) and a self-constructed dataset were used in the experiments, which were conducted with 7 clothing attributes and low-level colour histogram features. The 7 attributes are detected by an SVM classifier using features extracted from 5 different regions of a detected human object. These 5 regions are obtained by human object segmentation and subsequent body part sub-division, and the low-level colour features are extracted from the same regions. People are re-identified by computing the Euclidean distance between a probe and the gallery image sets.
The experiments conducted using the SVM classifier and Euclidean distance have shown that the proposed techniques attained all of the aforementioned goals. The colour and texture features proposed for camouflaged military personnel recognition surpass the state-of-the-art methods. Similarly, experiments show that combining features performed best when recognising vehicles in different views after initial training on multiple views. In the same vein, the proposed CENTROG technique performed better than the state-of-the-art CENTRIST technique for both pedestrian detection and vehicle type recognition at night-time using thermal images. Finally, we show that the proposed 7 mid-level attributes together with the low-level features result in improved accuracy for people re-identification.
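The re-identification step described in this abstract (ranking gallery images by Euclidean distance to a probe) can be illustrated with a minimal sketch. It assumes that the histogram normalisation is min-max scaling into [0, 1]; the function names and the toy gallery are hypothetical, and the mid-level attribute scores used in the thesis are omitted.

```python
import math

def normalise(hist):
    """Min-max scale histogram bin values into [0, 1] (assumed normalisation)."""
    lo, hi = min(hist), max(hist)
    if hi == lo:
        return [0.0 for _ in hist]
    return [(v - lo) / (hi - lo) for v in hist]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(probe, gallery):
    """Return gallery identities sorted by ascending distance to the probe."""
    p = normalise(probe)
    scored = [(euclidean(p, normalise(feat)), ident)
              for ident, feat in gallery.items()]
    return [ident for _, ident in sorted(scored)]

# Toy colour histograms (hypothetical data, 4 bins each).
gallery = {
    "person_a": [10, 2, 0, 4],
    "person_b": [1, 9, 8, 0],
    "person_c": [9, 3, 1, 5],
}
ranking = rank_gallery([10, 2, 1, 4], gallery)
```

The first entry of `ranking` is the gallery identity closest to the probe; in a real system each feature vector would concatenate the histograms of all 5 body regions.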

    RGB-NIR image categorization with prior knowledge transfer

    Abstract: Recent developments in image categorization, especially scene categorization, show that combining standard visible RGB image data with near-infrared (NIR) image data performs better than using RGB image data alone. However, the size of RGB-NIR image collections is often limited by the difficulty of acquisition, and with limited data it is difficult to extract effective features using common deep learning networks. Humans are able to learn prior knowledge from other tasks or from a good mentor, which helps them solve learning problems with limited training samples. Inspired by this observation, we propose a novel training methodology for introducing prior knowledge into a deep architecture, which allows us to bypass the burdensome labeling of the large quantities of image data that deep learning normally requires. First, transfer learning is adopted to learn single-modal features from a large source database, such as ImageNet. Then, a knowledge distillation method is explored to fuse the RGB and NIR features. Finally, a global optimization method is employed to fine-tune the entire network. The experimental results on two RGB-NIR datasets demonstrate the effectiveness of our proposed approach in comparison with state-of-the-art multi-modal image categorization methods. https://deepblue.lib.umich.edu/bitstream/2027.42/146762/1/13640_2018_Article_388.pd

    Violent event detection in video sequences based on the census transform histogram operator

    Advisor: Hélio Pedrini. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Video surveillance systems have been widely used to monitor scenes in various environments, such as airports, banks, schools, industrial sites, bus and train stations, highways and stores. Due to the large amount of information captured by surveillance cameras, visual inspection by camera operators becomes a tiring, error-prone and time-consuming task. One challenge is the development of intelligent surveillance systems capable of analyzing long video sequences captured by a network of cameras in order to identify a certain behavior. In this work, we propose and evaluate several classification techniques based on the CENTRIST (Census Transform Histogram) operator in the context of identifying violent events in video scenes. Additionally, we evaluated other traditional descriptors, such as HoG (Histogram of Oriented Gradients) and HOF (Histogram of Optical Flow), as well as descriptors extracted from pre-trained deep learning models. To restrict the evaluation to regions of interest in the video frames, we investigated techniques for removing the background from the scene. A sliding-window approach was used to assess smaller regions of the scene in combination with a voting criterion. The sliding window is then applied along with block filtering based on the optical flow of the scene. To demonstrate the effectiveness of our method for discriminating violence in crowd scenes, we compared the results with other approaches available in the literature on two public datasets (Violence in Crowds and Hockey Fights). The effectiveness of combining CENTRIST and HoG was demonstrated in comparison with using these operators individually: the combination obtained approximately 88%, against 81% using only HoG and 86% using only CENTRIST. While refining the proposed method, we found that evaluating blocks of the frame with the sliding-window approach made the method more effective. Techniques for generating a codebook of visual words with sparse coding, distance measurement with a Gaussian mixture model and distance measurement between clusters were also evaluated and discussed. We also evaluated computing the voting threshold dynamically, which gave better results in some cases. Finally, strategies for restricting the actors present in the scenes using optical flow were analyzed. Using Otsu's method to compute the optical flow threshold for the scene, the effectiveness surpasses our most competitive results: 91.46% accuracy for the Violence in Crowds dataset and 92.79% for the Hockey Fights dataset. Degree: Master's (Mestrado) in Computer Science.
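As a rough illustration of the census transform histogram that CENTRIST is built on, the sketch below computes a 3x3 census transform and its 256-bin histogram in plain Python. The bit ordering and the "centre >= neighbour sets the bit" convention are assumptions, the function names are hypothetical, and this is not the dissertation's implementation.

```python
# Neighbour offsets in row-major order; the first neighbour maps to the
# most significant bit (an assumed convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def census_transform(img):
    """Map each interior pixel of a 2D grayscale image (list of lists)
    to an 8-bit code: one bit per neighbour, set when centre >= neighbour."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            code = 0
            for dy, dx in OFFSETS:
                bit = 1 if img[y][x] >= img[y + dy][x + dx] else 0
                code = (code << 1) | bit
            row.append(code)
        out.append(row)
    return out

def census_histogram(codes):
    """256-bin histogram of census codes, used as the image descriptor."""
    hist = [0] * 256
    for row in codes:
        for c in row:
            hist[c] += 1
    return hist

# Toy 3x3 image: the single interior pixel (value 5) is >= its first
# four neighbours (1, 2, 3, 4) and smaller than the last four (6, 7, 8, 9).
codes = census_transform([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])
hist = census_histogram(codes)
```

In a sliding-window setting such as the one described above, a histogram like this would be computed per window and passed to a classifier, with the per-window decisions combined by voting.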

    Pedestrian Detection and Tracking in Urban Context Using a Mono-camera

    Pedestrian detection and tracking is one of the most important aspects of Advanced Driver Assistance Systems. These systems help to avoid dangerous situations by guiding drivers and warning them about upcoming risks. The main idea of pedestrian detection and tracking is to detect pedestrians while they are still in the safe zone and to predict their position and direction. The goal of this thesis is to examine possible methods and, based on these, to develop a good pedestrian detection and tracking algorithm. The solution developed in this thesis focuses on accurately detecting and tracking a pedestrian. To estimate the accuracy of the system, the results obtained are compared with existing solutions.

    Indoor place classification for intelligent mobile systems

    University of Technology Sydney. Faculty of Engineering and Information Technology. Place classification is an emerging theme in the study of human-robot interaction, which requires a common understanding of human-defined concepts between humans and machines. This requirement poses a significant challenge to current intelligent mobile systems, which are more likely to operate in absolute coordinate systems and are hence unaware of semantic labels. Aiming to fill this gap, the objective of the research is to develop an approach for intelligent mobile systems to understand and label indoor environments in a holistic way based on sensory observations. Focusing on commonly available sensors and on machine learning based solutions, which play a significant role in place classification research, this work proposes solutions for training a machine to assign unknown instances to concepts understandable to human beings, such as room, office and corridor, using both independent and structured prediction. The solution that models dependencies between random variables, and thereby takes the spatial relationship between observations into consideration, is further extended by integrating the logical coexistence of objects and places to give the machine an additional object detection ability. The main techniques involve logistic regression, support vector machines and conditional random fields, in both supervised and semi-supervised learning frameworks.
Experiments in a variety of environments show three things. First, machine learning based approaches give convincing place classification results on data collected with either single or multiple sensory modalities. Second, modelling spatial dependencies and introducing a semi-supervised learning paradigm further improve the prediction accuracy and the generalisation ability of the system. Third, vision-based object detection can be seamlessly integrated into the learning framework to enhance the discrimination ability and flexibility of the system. The contributions of this research lie in the in-depth studies of place classification solutions with independent predictions, the improvements in the generalisation ability of the system through a semi-supervised learning paradigm, the formulation of training a conditional random field with partially labelled data, and the integration of multiple cues in two sensory modalities to improve the system's functionality. It is anticipated that the findings of this research will significantly enhance current capabilities in human-robot interaction and robot-environment interaction.