
    Frequency Domain Decomposition of Digital Video Containing Multiple Moving Objects

    Motion estimation has been dominated by time domain methods such as block matching and optical flow. However, these methods have problems with multiple moving objects in the video scene, moving backgrounds, noise, and fractional pixel/frame motion. This dissertation proposes a frequency domain method (FDM) that solves these problems. The methodology introduced here handles multiple moving objects, with or without a moving background, through a 3-D frequency domain decomposition of digital video as the sum of locally translational motions (or, in the case of the background, a globally translational motion), with high noise rejection. Additionally, fractional pixel/frame motion is detected and quantified via a version of the chirp-Z transform. Furthermore, images of particular moving objects can be extracted and reconstructed from the frequency domain. Finally, the method can be integrated into a larger system to support motion analysis. The method has been tested with synthetic data, realistic high-fidelity simulations, and actual data from established video archives to verify the claims made for it, all presented here. In addition, a comparison with an up-and-coming spatial domain method, incremental principal component pursuit (iPCP), is presented, in which the FDM performs markedly better than its competition.
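
    The core observation behind such a decomposition is that a pattern translating at a constant velocity (vx, vy) concentrates its 3-D spectral energy on a plane through the origin, w_t + vx*w_x + vy*w_y = 0. The sketch below illustrates this property on a synthetic sequence and recovers the velocity by fitting that plane to the strongest frequency bins; it is a minimal illustration of the principle under stated assumptions, not the dissertation's FDM.

```python
import numpy as np

T, H, W = 32, 64, 64
vx, vy = 2.0, 1.0                       # ground-truth motion, px/frame
y, x = np.mgrid[0:H, 0:W]
frames = []
for t in range(T):
    cx, cy = (20 + vx * t) % W, (20 + vy * t) % H
    dx = (x - cx + W / 2) % W - W / 2   # wrapped distances keep the
    dy = (y - cy + H / 2) % H - H / 2   # translation exactly periodic
    frames.append(np.exp(-(dx ** 2 + dy ** 2) / 20.0))
video = np.stack(frames)

# A translating pattern concentrates its 3-D spectral energy on the
# plane w_t + vx * w_x + vy * w_y = 0.
F = np.fft.fftn(video)
mag = np.abs(F)
mag[0, 0, 0] = 0                        # ignore the DC component
idx = np.argsort(mag.ravel())[-500:]    # strongest frequency bins
ti, yi, xi = np.unravel_index(idx, F.shape)

# Least-squares fit of the plane w_t = -(vx * w_x + vy * w_y).
A = np.stack([np.fft.fftfreq(W)[xi], np.fft.fftfreq(H)[yi]], axis=1)
b = -np.fft.fftfreq(T)[ti]
v_est, *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated (vx, vy):", v_est)     # close to (2.0, 1.0)
```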

    Big data analytics in high-throughput phenotyping

    Doctor of Philosophy, Department of Computer Science, Mitchell L. Neilsen. As the global population rises, advancements in plant diversity and crop yield are necessary for resource stability and nutritional security. In the next thirty years, the global population will pass 9 billion. Genetic advancements have become inexpensive and widely available to address this issue; however, phenotypic acquisition development has stagnated. Plant breeding programs have begun to support efforts in data mining, computer vision, and graphics to close the gap with genetic advancements. This dissertation creates a bridge between computer vision research and phenotyping by designing and analyzing various deep neural networks for concrete applications while presenting new and novel approaches. The significant contributions are research advancements to the current state of the art in mobile high-throughput phenotyping (HTP), which promotes more efficient plant science workflow tasks. Novel tools and utilities created for automatic code generation, maintenance, and source translation are featured; these tools replace boilerplate segments and redundant tasks. Finally, this research investigates various state-of-the-art deep neural network architectures to derive methods for object identification and enumeration. Seed kernel counting is a crucial task in the plant research workflow. This dissertation explains techniques and tools for generating data to scale training. New dataset creation methodologies are debuted and aim to replace the classical approach to labeling data. Although HTP is a general topic, this research focuses on various grains and plant-seed phenotypes. Applying deep neural networks to seed kernels for classification and object detection is a relatively new topic. This research uses a novel open-source dataset that supports future architectures for detecting kernels. State-of-the-art pre-trained regional convolutional neural networks (RCNNs) perform poorly on seeds. The proposed counting architectures outperform these models by learning a labeled integer count rather than anchor points for localization. Concurrently, models pre-trained on the seed dataset, a composition of geometrically primitive objects, boast improved evaluation metrics compared to models pre-trained on the Common Objects in Context (COCO) dataset. A widely accepted problem in image processing is the segmentation of foreground objects from the background. This dissertation shows that state-of-the-art RCNNs perform poorly in cases where foreground objects are similar to the background; instead, transfer learning leverages salient features and boosts performance on noisy-background datasets. The accumulation of new ideas and evidence of growth for mobile computer vision suggests a bright future for data acquisition in various fields of HTP. The results obtained provide horizons and a solid foundation for future research to stabilize and continue the growth of phenotypic acquisition and crop yield.
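
    To illustrate the counting-by-regression idea (a network learns to map an image directly to an integer count instead of predicting anchor boxes for localization), here is a minimal sketch; the architecture, layer sizes, and data are hypothetical stand-ins, not the dissertation's models.

```python
import torch
import torch.nn as nn

class CountRegressor(nn.Module):
    """Toy CNN that regresses a seed-kernel count from an image.

    Maps an image directly to a scalar count instead of predicting
    anchor boxes; layer sizes are hypothetical stand-ins."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),         # global pooling -> (B, 32, 1, 1)
        )
        self.head = nn.Linear(32, 1)         # scalar count per image

    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(-1)

model = CountRegressor()
images = torch.randn(8, 3, 128, 128)         # stand-in image batch
counts = torch.randint(0, 50, (8,)).float()  # labeled integer counts
loss = nn.functional.mse_loss(model(images), counts)
loss.backward()                              # regression, no anchors involved
```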

    Tracking interacting targets in multi-modal sensors

    PhD. Object tracking is one of the fundamental tasks in various applications such as surveillance, sports, video conferencing and activity recognition. Factors such as occlusions, illumination changes and the limited field of observance of the sensor make tracking a challenging task. To overcome these challenges, the focus of this thesis is on using multiple modalities, such as audio and video, for multi-target, multi-modal tracking. In particular, this thesis presents contributions to four related research topics, namely, pre-processing of input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking, and interaction recognition. To improve the performance of detection algorithms, especially in the presence of noise, this thesis investigates filtering of the input data through spatio-temporal feature analysis as well as through frequency band analysis. The pre-processed data from multiple modalities is then fused within Particle filtering (PF). To further minimise the discrepancy between the real and the estimated positions, we propose a strategy that associates the hypotheses and the measurements with a real target using Weighted Probabilistic Data Association (WPDA). Since the filtering involved in the detection process reduces the available information and is inapplicable to low signal-to-noise-ratio data, we investigate simultaneous detection and tracking approaches and propose a multi-target track-before-detect Particle filtering (MT-TBD-PF) algorithm. The proposed MT-TBD-PF algorithm bypasses the detection step and performs tracking on the raw signal. Finally, we apply the proposed multi-modal tracking to recognise interactions between targets in regions within, as well as outside, the cameras' fields of view. The efficiency of the proposed approaches is demonstrated on large uni-modal, multi-modal and multi-sensor scenarios from real-world detection, tracking and event recognition datasets and through participation in evaluation campaigns.
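
    The tracking approaches above all build on the particle filter's predict-weight-resample loop. The sketch below shows that generic machinery for a single 2-D target with a Gaussian likelihood; the audio-visual fusion, WPDA association and track-before-detect extensions of the thesis are not reproduced here, and all noise parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, z, motion_std=1.0, meas_std=2.0):
    """One bootstrap particle-filter iteration: predict, weight, resample."""
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight by a Gaussian likelihood of the measurement z.
    d2 = np.sum((particles - z) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std ** 2) + 1e-300
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

particles = rng.uniform(0.0, 100.0, size=(500, 2))    # 2-D positions
weights = np.full(500, 1.0 / 500)
for z in [(10.0, 10.0), (12.0, 11.0), (14.0, 13.0)]:  # toy measurements
    particles, weights = pf_step(particles, weights, np.asarray(z))
print("estimate:", np.average(particles, axis=0, weights=weights))
```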

    Taking the Temperature of Sports Arenas: Automatic Analysis of People


    Development of Motion Trackers for Space Debris Research

    Space debris consists of defunct artificial objects orbiting the Earth. Miniaturization and advances in space technology have encouraged an increase in the number of small-satellite constellations. Over the years, catastrophic in-orbit events have resulted in an exponential increase in space pollution, with debris covering an ever-expanding area. An international consortium of private institutions and space agencies is addressing the concern through extensive research and development of active debris tracking and removal methods. On the same grounds, the Institute of Technical Physics of the German Aerospace Center is developing a ground-based high-energy laser facility and optical instruments to track and remove space debris from Low Earth Orbit. This internship project aims to develop motion-tracking software to track the sample in a technology demonstration experiment on impulse generation through laser-matter interaction. Several object detection and motion tracking algorithms in computer vision were reviewed and analyzed to accomplish this. For object detection, the Harris Corner Detector and Scale Invariant Feature Transform algorithms exhibit a decent success rate. Optical-flow point-based tracking was the most promising approach for obtaining a 3-dimensional sample track, specifically in a multi-view camera configuration. The reference data used for software development throughout the project are high-speed videos originally recorded during the laser-matter interaction experiment.
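
    As a concrete illustration of the point-based tracking pipeline the review converged on, the sketch below seeds corner features in the first frame and follows them with pyramidal Lucas-Kanade optical flow using OpenCV; "debris_shot.avi" is a hypothetical file name standing in for the high-speed experiment videos, and the multi-view triangulation needed for a 3-D track is omitted.

```python
import cv2

# Sparse point tracking with pyramidal Lucas-Kanade optical flow.
cap = cv2.VideoCapture("debris_shot.avi")   # hypothetical input video
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Seed the tracker with strong corners (Shi-Tomasi / Harris-style features).
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                              qualityLevel=0.3, minDistance=7)
tracks = [pts]
while pts is not None and len(pts) > 0:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Follow each point into the new frame; status flags failed points.
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)
    tracks.append(pts)
    prev_gray = gray
cap.release()
```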

    Video analytics for security systems

    This study has been conducted to develop robust event detection and object tracking algorithms that can be implemented in real-time video surveillance applications. The aim of the research has been to produce an automated video surveillance system that is able to detect and report potential security risks with minimum human intervention. Since the algorithms are designed to be implemented in real-life scenarios, they must be able to cope with strong illumination changes and occlusions. The thesis is divided into two major sections. The first section deals with event detection and edge-based tracking, while the second section describes colour measurement methods developed to track objects in crowded environments. The event detection methods presented in the thesis mainly focus on detection and tracking of objects that become stationary in the scene. Objects such as baggage left in public places or vehicles parked illegally can cause a serious security threat. A new pixel-based classification technique has been developed to detect objects of this type in cluttered scenes. Once detected, edge-based object descriptors are obtained and stored as templates for tracking purposes. The consistency of these descriptors is examined using an adaptive edge orientation based technique. Objects are tracked and alarm events are generated if the objects are found to be stationary in the scene after a certain period of time. To evaluate the full capabilities of the pixel-based classification and adaptive edge orientation based tracking methods, the model is tested using several hours of real-life video surveillance scenarios recorded at different locations and times of day, from our own and publicly available databases (i-LIDS, PETS, MIT, ViSOR). The performance results demonstrate that the combination of pixel-based classification and adaptive edge orientation based tracking achieved a success rate of over 95%. The method also yields better detection and tracking results than other available state-of-the-art methods. In the second part of the thesis, colour-based techniques are used to track objects in crowded video sequences under severe occlusion. A novel Adaptive Sample Count Particle Filter (ASCPF) technique is presented that improves on the standard Sample Importance Resampling Particle Filter by up to 80% in terms of computational cost. An appropriate particle range is obtained for each object, and the concept of adaptive samples is introduced to keep the computational cost down. The objective is to keep the number of particles at a minimum and only to increase them up to the maximum as and when required. Variable standard deviation values for state vector elements have been exploited to cope with heavy occlusion. The technique has been tested on different video surveillance scenarios with variable object motion, strong occlusion and changes in object scale. Experimental results show that the proposed method not only tracks the object with comparable accuracy to existing particle filter techniques but is up to five times faster. Tracking objects in a multi-camera environment is discussed in the final part of the thesis. The ASCPF technique is deployed within a multi-camera environment to track objects across different camera views. Such environments can pose difficult challenges, such as changes in object scale and colour features as the objects move from one camera view to another. Variable standard deviation values of the ASCPF have been utilised to cope with sudden colour and scale changes. As the object moves from one scene to another, the number of particles, together with the spread value, is increased to a maximum to reduce any effects of scale and colour change. Promising results are obtained when the ASCPF technique is tested on live feeds from four different camera views. It was found that not only did the ASCPF method successfully track the moving object across different views, but it also maintained a real-time frame rate due to its reduced computational cost, indicating that the method is a potential practical solution for multi-camera tracking applications.
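
    The abstract describes the ASCPF's core idea as keeping the particle count at a minimum while tracking is confident and growing it toward a maximum under occlusion or sudden appearance change. The sketch below shows one plausible way to drive such a decision from the effective sample size of the particle weights; the thresholds and the halving/doubling schedule are assumptions for illustration, not the thesis's actual rule.

```python
import numpy as np

def adapt_particle_count(weights, n_min=50, n_max=1000):
    """Choose the next frame's particle count from the weight distribution.

    The effective sample size (ESS) is high when weights are well spread
    (confident tracking) and collapses under occlusion or sudden appearance
    change. Thresholds and the halving/doubling schedule are illustrative."""
    n = len(weights)
    ess = 1.0 / np.sum(np.asarray(weights) ** 2)   # effective sample size
    confidence = ess / n                           # 1.0 = perfectly spread
    if confidence > 0.5:        # confident: shrink to cut computational cost
        return max(n_min, n // 2)
    if confidence < 0.1:        # degenerate: spread more particles
        return min(n_max, n * 2)
    return n

# Example: uniform weights (confident) vs. a single dominant particle.
print(adapt_particle_count(np.full(400, 1 / 400)))   # -> 200
w = np.zeros(400); w[0] = 1.0
print(adapt_particle_count(w))                       # -> 800
```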

    Towards automatic model specialization for edge video analytics

    The number of cameras deployed at the edge of the network increases by the day, while emerging use cases, such as smart cities or autonomous driving, grow to expect images to be analyzed in real time by increasingly accurate and complex neural networks. Unfortunately, state-of-the-art accuracy comes at a computational cost rarely available in the edge cloud. At the same time, due to strict latency constraints and the vast amount of traffic edge cameras generate, we can no longer rely on offloading the task to a centralized cloud. Consequently, there is a need for a meeting point between the resource-constrained edge cloud and accurate real-time video analytics. If state-of-the-art models are too expensive to run on the edge, and lightweight models are not accurate enough for edge use cases, one solution is to demand less of the lightweight model and specialize it to a narrower scope of the problem, a technique known as model specialization. By specializing a model to the context of a single camera, we can boost its accuracy while keeping its computational cost constant. However, this also involves one training run per camera, which quickly becomes infeasible unless the entire process is fully automated. In this paper, we present and evaluate COVA (Contextually Optimized Video Analytics), a framework to assist in the automatic specialization of models for video analytics in edge cloud cameras. COVA aims to automatically improve the accuracy of lightweight models by specializing them to the context in which they will be deployed. Moreover, we discuss and analyze each step involved in the process to understand the different trade-offs that each one entails. Using COVA, we demonstrate that the whole pipeline can be effectively automated by leveraging large neural networks as teachers whose predictions are used to train and specialize lightweight neural networks. Results show that COVA can automatically improve pre-trained models by an average of 21% mAP on the different scenes of the VIRAT dataset. This work has been partially supported by the Spanish Government (contract PID2019-107255GB) and by Generalitat de Catalunya, Spain (contract 2014-SGR-1051).
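
    The teacher-student pattern described above can be sketched with off-the-shelf torchvision detectors: a heavy teacher pseudo-labels frames from one camera, and a lightweight student is trained on the confident labels. This is a minimal, hypothetical rendering of the pattern, not COVA's actual pipeline; the person-only class mapping, the score threshold and the random stand-in frames are all assumptions.

```python
import torch
from torchvision.models import detection

# Heavy teacher: pseudo-labels frames from the camera to be specialized.
teacher = detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def pseudo_label(frames, score_thr=0.7):
    """Keep only confident 'person' detections (COCO class 1) as labels."""
    targets = []
    for out in teacher(frames):
        keep = (out["scores"] > score_thr) & (out["labels"] == 1)
        targets.append({"boxes": out["boxes"][keep],
                        "labels": torch.ones(int(keep.sum()),
                                             dtype=torch.int64)})
    return targets

# Lightweight student specialized to two classes (background + person).
student = detection.ssdlite320_mobilenet_v3_large(num_classes=2).train()
opt = torch.optim.SGD(student.parameters(), lr=1e-3)

frames = [torch.rand(3, 320, 320) for _ in range(4)]  # stand-in camera frames
targets = pseudo_label(frames)
# Train the student only on frames where the teacher found something.
batch = [(f, t) for f, t in zip(frames, targets) if len(t["boxes"])]
if batch:
    imgs, tgts = map(list, zip(*batch))
    losses = student(imgs, tgts)     # torchvision detectors return loss dicts
    sum(losses.values()).backward()
    opt.step()
```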

    Advances in Object and Activity Detection in Remote Sensing Imagery

    The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection tries to find objects of target classes with precise localisation in an image and assigns each object instance a corresponding class label. At the same time, activity recognition aims to determine the actions or activities of an agent or group of agents based on sensor or video observation data. Detecting, identifying, tracking, and understanding the behaviour of objects through images and videos taken by various cameras is a very important and challenging problem. Together, the recognition of objects and their activities in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in the field of object and activity recognition. In particular, many researchers have proposed application domains for identifying objects and their specific behaviours from airborne and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms.

    Automatic crowdflow estimation enhanced by crowdsourcing

    Video surveillance systems are evolving from simple closed-circuit television (CCTV) towards intelligent systems capable of understanding the recorded scenes. This trend is accompanied by a widespread increase in the number of cameras, which makes continuous monitoring of video feeds a practically impossible task. In this scenario, video surveillance systems make intensive use of video analytics and image processing in order to allow their scalability and boost their effectiveness. One such video analytic performed in video surveillance systems is crowd analysis. Crowd analysis plays a fundamental role in security applications. For instance, keeping a rough estimate of the number of people present in a given area or inside a building is critical to prevent jams in an emergency or when planning the distribution of entry and exit nodes. In this thesis, we focus on crowd flow estimation. Crowd flow is defined as the number of people that have crossed a specific region over time. Hence, the goal of the method is to estimate the crowd flow as accurately as possible in real time. Many automatic methods have been proposed in the literature to estimate crowd flow. However, video analytics techniques often face a wide range of difficulties such as occlusions, shadows, changes in environmental conditions or distortions in the video, and existing methods struggle to maintain high accuracy in such situations. Crowdsourcing has been shown to be an effective solution to problems that involve complex cognitive tasks; by incorporating human assistance, the performance of automatic methods can be enhanced in adverse situations. In this thesis, an automatic crowd flow estimation method, previously developed in the Video and Image Processing Laboratory at Purdue University, is implemented, and crowdsourcing is used to enhance its performance. A web platform is also developed that lets the system operator control the whole system remotely and lets crowdsourcing members perform their tasks.
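
    Once individual people have been tracked, the crowd flow across a virtual line reduces to bookkeeping over each trajectory. The sketch below counts signed crossings of a line segment for one track; summing this over all tracked people per time window gives the flow estimate. It is a generic illustration of the definition, not the Purdue method; the helper and toy trajectory are assumptions.

```python
import numpy as np

def count_line_crossings(track, a, b):
    """Net signed crossings of segment a->b by one trajectory.

    `track` is an (N, 2) sequence of per-frame positions; crossings in the
    direction of the line's normal n count as inflow, the opposite as
    outflow. Positions are assumed never to land exactly on the line."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = np.array([(b - a)[1], -(b - a)[0]])          # normal to the line
    side = np.sign((np.asarray(track, float) - a) @ n)
    flips = np.diff(side)                            # +-2 marks a crossing
    inflow = int(np.sum(flips > 0))
    outflow = int(np.sum(flips < 0))
    return inflow - outflow

# Toy trajectory crossing a vertical line at x = 4, left to right.
track = [(0, 5), (2, 5), (6, 5)]
print(count_line_crossings(track, a=(4, 0), b=(4, 10)))   # -> 1
```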