47 research outputs found

    A review of content-based video retrieval techniques for person identification

    The rise of technology spurs advancement in the surveillance field. Many commercial spaces have reduced patrol guards in favor of Closed-Circuit Television (CCTV) installations, and some countries already use surveillance drones, which offer greater mobility. In recent years, CCTV footage has also been used by law enforcement for crime investigation, as in the 2013 Boston Marathon bombing. However, this has produced huge, unmanageable footage collections, a common issue of the Big Data era. While there is more information available to identify a potential suspect, manually reviewing such a massive volume of data is extremely laborious. Therefore, some researchers have proposed Content-Based Video Retrieval (CBVR) methods that enable querying for a specific feature of an object or a person. Due to limitations such as the visibility and quality of video footage, only certain features are selected for recognition, based on Chicago Police Department guidelines. This paper presents a comprehensive review of CBVR techniques used for clothing, gender and ethnicity recognition of a person of interest and of how they can be applied in crime investigation. From the findings, the three recognition types can be combined to create a Content-Based Video Retrieval system for person identification.
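    A minimal sketch of the combination idea, assuming each of the three recognizers emits softmax-style attribute scores per frame; the classifiers, frame names and scores below are hypothetical placeholders, not any system from the review.

    ```python
    # Rank CCTV frames against a witness description by averaging the
    # match probabilities from three hypothetical attribute recognizers
    # (clothing, gender, ethnicity).
    import numpy as np

    def score_frame(frame_attrs, query):
        """Average the per-attribute match probabilities for one frame."""
        scores = [frame_attrs[k].get(query[k], 0.0) for k in query]
        return float(np.mean(scores))

    # Each frame carries softmax-style scores from the three recognizers.
    frames = {
        "cam1_f0412": {"clothing": {"red_jacket": 0.91, "blue_shirt": 0.04},
                       "gender": {"male": 0.88, "female": 0.12},
                       "ethnicity": {"asian": 0.15, "caucasian": 0.80}},
        "cam2_f0033": {"clothing": {"red_jacket": 0.08, "blue_shirt": 0.85},
                       "gender": {"male": 0.35, "female": 0.65},
                       "ethnicity": {"asian": 0.70, "caucasian": 0.25}},
    }

    query = {"clothing": "red_jacket", "gender": "male", "ethnicity": "caucasian"}
    ranked = sorted(frames, key=lambda f: score_frame(frames[f], query), reverse=True)
    print(ranked)  # frames most consistent with the description come first
    ```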

    A survey on generative adversarial networks for imbalance problems in computer vision tasks

    Any computer vision application development starts with acquiring images and data, followed by preprocessing and pattern recognition steps to perform a task. When the acquired images are highly imbalanced or inadequate, the desired task may not be achievable. Unfortunately, imbalance problems in acquired image datasets are inevitable in certain complex real-world problems such as anomaly detection, emotion recognition, medical image analysis, fraud detection, metallic surface defect detection and disaster prediction. The performance of computer vision algorithms can deteriorate significantly when the training dataset is imbalanced. In recent years, Generative Adversarial Networks (GANs) have gained immense attention from researchers across a variety of application domains due to their capability to model complex real-world image data. Notably, GANs can not only generate synthetic images; their adversarial learning idea has also shown good potential for restoring balance to imbalanced datasets. In this paper, we examine the most recent developments in GAN-based techniques for addressing imbalance problems in image data. The real-world challenges and implementations of GAN-based synthetic image generation are extensively covered in this survey. Our survey first introduces various imbalance problems in computer vision tasks and their existing solutions, and then examines key concepts such as deep generative image models and GANs. We then propose a taxonomy that organizes GAN-based techniques for addressing imbalance problems in computer vision tasks into three major categories: (1) image-level imbalances in classification, (2) object-level imbalances in object detection and (3) pixel-level imbalances in segmentation tasks. We elaborate on the imbalance problems of each group and present the GAN-based solutions in each group. Readers will understand how GAN-based techniques can handle imbalance problems and boost the performance of computer vision algorithms.
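    A minimal sketch (PyTorch, random stand-in data) of the rebalancing idea behind the first taxonomy category: train a small GAN only on minority-class images, then add its synthetic samples to the training set. The architecture sizes and training schedule are illustrative assumptions, not a prescription from the survey.

    ```python
    import torch
    import torch.nn as nn

    latent, img_dim = 64, 28 * 28
    G = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, img_dim), nn.Tanh())
    D = nn.Sequential(nn.Linear(img_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    minority = torch.rand(256, img_dim) * 2 - 1  # stand-in for the rare class

    for step in range(200):
        real = minority[torch.randint(0, 256, (32,))]
        fake = G(torch.randn(32, latent))
        # Discriminator: push real toward 1, fake toward 0.
        loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator: fool the discriminator.
        loss_g = bce(D(fake), torch.ones(32, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Oversample the minority class with synthetic images to restore balance.
    synthetic = G(torch.randn(1000, latent)).detach()
    ```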

    A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition

    Human gait recognition (HGR) is highly important in the area of video surveillance due to remote access and security threats. HGR is a technique commonly used to identify humans by their walking style in daily life. However, many typical situations, such as changes in clothing and variation in view angle, degrade system performance. Lately, different machine learning (ML) techniques have been introduced for video surveillance with promising results, among which deep learning (DL) performs best in complex scenarios. In this article, an integrated framework is proposed for HGR using a deep neural network and a fuzzy entropy controlled skewness (FEcS) approach. The proposed technique works in two phases. In the first phase, deep convolutional neural network (DCNN) features are extracted by pre-trained CNN models (VGG19 and AlexNet) and their information is combined by a parallel fusion approach. In the second phase, entropy and skewness vectors are calculated from the fused feature vector (FV) to select the best subsets of features with the suggested FEcS approach. The selected feature subsets are finally fed to multiple classifiers, and the best one is chosen on the basis of accuracy. Experiments were carried out on four well-known datasets: AVAMVG gait, CASIA A, CASIA B and CASIA C. The achieved accuracies were 99.8%, 99.7%, 93.3% and 92.2%, respectively. The overall recognition results therefore lead to the conclusion that the proposed system is very promising.
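    A loose sketch of the two-phase pipeline: parallel fusion of two CNN feature matrices, then a feature ranking driven by entropy and skewness. The exact FEcS selection rule is the paper's contribution; the element-wise maximum fusion and the scoring below are simplified stand-ins built from per-feature statistics.

    ```python
    import numpy as np
    from scipy.stats import skew, entropy

    rng = np.random.default_rng(0)
    vgg_feats = rng.random((100, 4096))   # stand-in for VGG19 features
    alex_feats = rng.random((100, 4096))  # stand-in for AlexNet features

    fused = np.maximum(vgg_feats, alex_feats)  # one simple parallel-fusion choice

    def feature_scores(X, bins=16):
        """Score each feature column by histogram entropy times |skewness|."""
        ent = np.array([entropy(np.histogram(X[:, j], bins=bins)[0] + 1e-9)
                        for j in range(X.shape[1])])
        return ent * np.abs(skew(X, axis=0))

    scores = feature_scores(fused)
    best = np.argsort(scores)[-1000:]  # keep the top-1000 features
    selected = fused[:, best]          # fed to the downstream classifiers
    print(selected.shape)              # (100, 1000)
    ```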

    Towards Better Image Embeddings Using Neural Networks

    The primary focus of this dissertation is the study of image embeddings extracted by neural networks. Deep Learning (DL) is preferred over traditional Machine Learning (ML) because feature representations can be constructed automatically from data without human involvement. Owing to the effectiveness of deep features, the last decade has witnessed unprecedented advances in Computer Vision (CV), and more real-world applications are expected in the coming years. A diverse collection of studies is included, covering areas such as person re-identification, vehicle attribute recognition, neural image compression, clustering and unsupervised anomaly detection. More specifically, three aspects of feature representations are analyzed thoroughly. Firstly, features should be distinctive, i.e., features of samples from distinct categories ought to differ significantly. Extracting distinctive features is essential for image retrieval systems, in which an algorithm finds the gallery sample closest to a query sample. Secondly, features should be privacy-preserving, i.e., inferring sensitive information from features must be infeasible. With the widespread adoption of Machine Learning as a Service (MLaaS), privacy-preserving features prevent privacy violations even if the server has been compromised. Thirdly, features should be compressible, i.e., compact features are preferable as they require less storage space. Obtaining compressible features plays a vital role in data compression. Towards the goal of deriving distinctive, privacy-preserving and compressible feature representations, the research articles included in this dissertation present different approaches to improving image embeddings learned by neural networks. This topic remains a fundamental challenge in Machine Learning, and further research is needed to gain a deeper understanding.
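    A minimal sketch of one standard route to distinctive embeddings: a triplet loss that pulls same-category features together and pushes different-category features apart, so nearest-neighbor retrieval works. The tiny encoder and batch construction are illustrative assumptions, not any specific architecture from the dissertation.

    ```python
    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Flatten(),
                            nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                            nn.Linear(256, 128))  # 128-d embedding
    triplet = nn.TripletMarginLoss(margin=0.3)
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

    # Stand-in batches: anchor/positive share a category, negative does not.
    anchor, positive, negative = (torch.rand(32, 3, 64, 64) for _ in range(3))

    for step in range(100):
        loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
        opt.zero_grad(); loss.backward(); opt.step()

    # At retrieval time, the gallery sample whose embedding is nearest to
    # the query embedding is returned.
    ```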

    On Deep Machine Learning Methods for Anomaly Detection within Computer Vision

    This thesis concerns deep learning approaches for anomaly detection in images. Anomaly detection addresses how to find any kind of pattern that differs from the regularities found in normal data and is receiving increasing attention in deep learning research. This is due in part to its wide set of potential applications, ranging from automated CCTV surveillance to quality control across a range of industries. We introduce three original methods for anomaly detection applicable to two specific deployment scenarios. In the first, we detect anomalous activity in potentially crowded scenes through imagery captured via CCTV or other video recording devices. In the second, we segment defects in textures and demonstrate use cases representative of automated quality inspection on industrial production lines. In the context of detecting anomalous activity in scenes, we take an existing state-of-the-art method and introduce several enhancements, including the use of a region proposal network for region extraction and a more information-preserving feature preprocessing strategy. This results in a simpler method that is significantly faster and suitable for real-time application. In addition, the increased efficiency facilitates building higher-dimensional models capable of improved anomaly detection performance, which we demonstrate on the pedestrian-based UCSD Ped2 dataset. In the context of texture defect detection, we introduce a method based on the idea of texture restoration that surpasses all state-of-the-art methods on the texture classes of the challenging MVTecAD dataset. In the same context, we additionally introduce a method that utilises transformer networks for future pixel and feature prediction. This novel method performs competitive anomaly detection on most of the challenging MVTecAD texture classes and illustrates both the promise and the limitations of state-of-the-art deep learning transformers for the task of texture anomaly detection.
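    A compact sketch of the restoration principle behind texture defect detection: a model trained only on normal textures restores defective regions poorly, so the per-pixel restoration error localises anomalies. The convolutional autoencoder and threshold below are generic stand-ins, not the thesis's restoration or transformer models.

    ```python
    import torch
    import torch.nn as nn

    ae = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
    )
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    normal = torch.rand(64, 3, 128, 128)  # stand-in for defect-free textures

    for step in range(50):  # train on normal data only
        recon = ae(normal)
        loss = ((recon - normal) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    test = torch.rand(1, 3, 128, 128)
    error_map = ((ae(test) - test) ** 2).mean(dim=1)  # per-pixel anomaly score
    # Simple statistical threshold; real systems calibrate on validation data.
    anomaly_mask = error_map > error_map.mean() + 3 * error_map.std()
    ```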

    Adversarial content manipulation for analyzing and improving model robustness

    The recent rapid progress in machine learning systems has opened up many real-world applications, from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution; for example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet doing this with real-world data is not scalable. In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and testing the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of controlled manipulation of data, such as changing the visual context, editing the appearance of an object in an image or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and to systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and to perform writing-style manipulation that obfuscates private attributes of the user. Our studies quantifying model robustness explore two kinds of input manipulations: model-agnostic and model-targeted. Model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested; this includes automatically editing images to remove objects not directly relevant to the task and to create variations in visual context. Alternatively, in the model-targeted approach, the input variations are directly adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems, specifically image classification, segmentation, object detection and visual question answering systems, to semantic input variations.
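    A minimal sketch of the model-targeted idea: perturb the input in the direction of the target model's gradient so its prediction changes. This is plain FGSM on a toy classifier; the thesis applies gradient guidance to semantic object-appearance edits rather than raw pixel noise.

    ```python
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in target
    image = torch.rand(1, 3, 32, 32, requires_grad=True)
    label = torch.tensor([3])

    # Gradient of the loss with respect to the input pixels.
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()

    # One signed-gradient step increases the loss, i.e., degrades the prediction.
    epsilon = 0.03
    adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
    print(model(adversarial).argmax(dim=1))  # often no longer class 3
    ```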

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection in indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, used to propose regions of interest where objects may be found, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, covering the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
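    A small sketch of the two ingredients named above: warping a detected box into the next frame with the inter-frame planar homography H, and recursively updating the per-class belief from repeated detector observations. H, the box and the likelihood values are illustrative stand-ins, not the paper's data.

    ```python
    import numpy as np

    def warp_box(box, H):
        """Propagate box corners (x1, y1, x2, y2) through homography H."""
        x1, y1, x2, y2 = box
        corners = np.array([[x1, y1, 1], [x2, y1, 1], [x2, y2, 1], [x1, y2, 1]]).T
        warped = H @ corners
        warped /= warped[2]  # back from homogeneous coordinates
        return warped[0].min(), warped[1].min(), warped[0].max(), warped[1].max()

    def bayes_update(prior, likelihood):
        """Fuse a new detector observation into the class belief."""
        post = prior * likelihood
        return post / post.sum()

    H = np.array([[1.0, 0.0, 4.0],
                  [0.0, 1.0, -2.0],
                  [0.0, 0.0, 1.0]])              # small camera translation
    roi = warp_box((120, 80, 200, 160), H)        # region proposal in the next frame

    belief = np.full(9, 1 / 9)                    # uniform prior over nine classes
    obs = np.array([.05, .05, .4, .1, .1, .1, .1, .05, .05])
    for _ in range(3):                            # three consecutive observations
        belief = bayes_update(belief, obs)
    print(roi, belief.argmax())  # entropy of `belief` drops with each observation
    ```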

    High Performance Video Stream Analytics System for Object Detection and Classification

    Due to recent advances in cameras, cell phones and camcorders, particularly the resolution at which they can record images and video, large amounts of data are generated daily. This video data is often so large that manually inspecting it for object detection and classification is time-consuming and error-prone, so automated analysis is required to extract useful information and metadata. Automated analysis of video streams also comes with numerous challenges, such as blurred content and variations in illumination conditions and poses. In this thesis, we investigate an automated video analytics system that takes into account characteristics from both the shallow and deep learning domains. We propose the fusion of features from the spatial frequency domain to perform highly accurate blur- and illumination-invariant object classification using deep learning networks. We also propose tuning the hyper-parameters associated with the deep learning network through a mathematical model, which improved the performance of the proposed system during training. The effects of various hyper-parameters on the system's performance are compared, and the parameters that contribute to the most optimal performance are selected for video object classification. The proposed video analytics system has been demonstrated to process a large number of video streams, and the underlying infrastructure is able to scale with the number and size of the video streams being processed. Extensive experimentation on publicly available image and video datasets reveals that the proposed system is significantly more accurate and scalable and can be used as a general-purpose video analytics system.
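    A rough sketch of fusing spatial-domain and frequency-domain descriptors before classification, the intuition being that frequency statistics shift less under blur and illumination changes than raw pixels. The pooling choices and feature layout are illustrative assumptions, not the thesis's exact feature design.

    ```python
    import numpy as np

    def spatial_frequency_features(img):
        """Concatenate coarse spatial stats with FFT magnitude statistics."""
        spatial = np.array([img.mean(), img.std()])
        mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
        h, w = mag.shape
        center = np.zeros_like(mag, dtype=bool)
        center[h // 4: 3 * h // 4, w // 4: 3 * w // 4] = True
        low, high = mag[center].mean(), mag[~center].mean()  # low/high-frequency energy
        return np.concatenate([spatial, [low, high]])

    frame = np.random.rand(224, 224)  # stand-in for a decoded grayscale video frame
    features = spatial_frequency_features(frame)
    # `features` would be fused with CNN features and fed to the classifier.
    print(features.shape)
    ```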

    Edge Artificial Intelligence for Real-Time Target Monitoring

    Location-based services are a key enabling technology for the exponentially growing cellular communications sector. The need for location-aware services has increased along with the number of wireless and mobile devices. Estimation problems, and particularly parameter estimation, have drawn a lot of interest because of their relevance and engineers' ongoing need for higher performance. As applications have expanded, the accurate assessment of temporal and spatial properties has generated a lot of interest. In this thesis, two different approaches to subject monitoring are thoroughly addressed. This kind of activity is crucial for military applications, medical tracking, industrial workers, and providing location-based services to the ever-growing mobile user community. In-depth consideration is given to the viability of applying the Angle of Arrival (AoA) and Received Signal Strength Indication (RSSI) localization algorithms in real-world situations. We present two prospective systems, discuss them, and report specific assessments and tests. These systems were put to the test in diverse contexts (e.g., indoor, outdoor, in water...). The findings demonstrated the localization capability, but because of the low-cost antenna we employed, this method is only practical up to a distance of roughly 150 meters. Consequently, depending on the use case, this method may or may not be advantageous. An estimation algorithm that enhances the performance of the AoA technique was implemented on an edge device. Another approach was also considered. Radar sensors have been shown to be robust in inclement weather and bad lighting conditions. Among the several sorts of radar technologies, Frequency Modulated Continuous Wave (FMCW) radars are the most frequently employed for these kinds of applications, because they are low-cost and can simultaneously provide range and Doppler data. In comparison to pulse and Ultra Wide Band (UWB) radar sensors, they also need a lower sampling rate and a lower peak-to-average ratio. The system employs a cutting-edge surveillance method based on widely available FMCW radar technology. The data processing approach is built on an ad hoc chain of blocks that transforms data, extracts features and makes a classification decision: clutter and leakage are first cancelled using a frame-subtraction technique, DL algorithms are then applied to Range-Doppler (RD) maps, and a peak-to-cluster assignment step is added before tracking targets. In conclusion, the FMCW radar and the DL technique applied to RD maps performed well together for indoor use cases. The aforementioned tests used an edge device and Infineon Technologies' Position2Go FMCW radar tool-set.
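    A condensed sketch of the radar processing chain described above: frame subtraction to cancel static clutter and leakage, then a 2D FFT over (fast time, slow time) to obtain the Range-Doppler map that the DL classifier consumes. Array sizes and the random IF data are illustrative stand-ins.

    ```python
    import numpy as np

    samples, chirps = 256, 64                      # fast time x slow time
    frame_prev = np.random.randn(chirps, samples)  # stand-ins for raw IF data
    frame_curr = np.random.randn(chirps, samples)

    moving_only = frame_curr - frame_prev          # static clutter and leakage cancel out

    # Range FFT along fast time, Doppler FFT along slow time, done jointly;
    # shift the Doppler axis so zero velocity sits in the middle.
    rd_map = np.fft.fftshift(np.fft.fft2(moving_only), axes=0)
    rd_db = 20 * np.log10(np.abs(rd_map) + 1e-12)  # log-magnitude RD map

    peak = np.unravel_index(rd_db.argmax(), rd_db.shape)
    print(peak)  # (Doppler bin, range bin) of the strongest moving target
    ```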

    Emotion-aware cross-modal domain adaptation in video sequences
