478 research outputs found

    Application of improved you only look once model in road traffic monitoring system

    Get PDF
    The present research focuses on developing an intelligent traffic management solution for tracking the vehicles on roads. Our proposed work focuses on a much better you only look once (YOLOv4) traffic monitoring system that uses the CSPDarknet53 architecture as its foundation. Deep-sort learning methodology for vehicle multi-target detection from traffic video is also part of our research study. We have included features like the Kalman filter, which estimates unknown objects and can track moving targets. Hungarian techniques identify the correct frame for the object. We are using enhanced object detection network design and new data augmentation techniques with YOLOv4, which ultimately aids in traffic monitoring. Until recently, object identification models could either perform quickly or draw conclusions quickly. This was a big improvement, as YOLOv4 has an astoundingly good performance for a very high frames per second (FPS). The current study is focused on developing an intelligent video surveillance-based vehicle tracking system that tracks the vehicles using a neural network, image-based tracking, and YOLOv4. Real video sequences of road traffic are used to test the effectiveness of the method that has been suggested in the research. Through simulations, it is demonstrated that the suggested technique significantly increases graphics processing unit (GPU) speed and FSP as compared to baseline algorithms

    Contributions to improve the technologies supporting unmanned aircraft operations

    Get PDF
    Mención Internacional en el título de doctorUnmanned Aerial Vehicles (UAVs), in their smaller versions known as drones, are becoming increasingly important in today's societies. The systems that make them up present a multitude of challenges, of which error can be considered the common denominator. The perception of the environment is measured by sensors that have errors, the models that interpret the information and/or define behaviors are approximations of the world and therefore also have errors. Explaining error allows extending the limits of deterministic models to address real-world problems. The performance of the technologies embedded in drones depends on our ability to understand, model, and control the error of the systems that integrate them, as well as new technologies that may emerge. Flight controllers integrate various subsystems that are generally dependent on other systems. One example is the guidance systems. These systems provide the engine's propulsion controller with the necessary information to accomplish a desired mission. For this purpose, the flight controller is made up of a control law for the guidance system that reacts to the information perceived by the perception and navigation systems. The error of any of the subsystems propagates through the ecosystem of the controller, so the study of each of them is essential. On the other hand, among the strategies for error control are state-space estimators, where the Kalman filter has been a great ally of engineers since its appearance in the 1960s. Kalman filters are at the heart of information fusion systems, minimizing the error covariance of the system and allowing the measured states to be filtered and estimated in the absence of observations. State Space Models (SSM) are developed based on a set of hypotheses for modeling the world. Among the assumptions are that the models of the world must be linear, Markovian, and that the error of their models must be Gaussian. In general, systems are not linear, so linearization are performed on models that are already approximations of the world. In other cases, the noise to be controlled is not Gaussian, but it is approximated to that distribution in order to be able to deal with it. On the other hand, many systems are not Markovian, i.e., their states do not depend only on the previous state, but there are other dependencies that state space models cannot handle. This thesis deals a collection of studies in which error is formulated and reduced. First, the error in a computer vision-based precision landing system is studied, then estimation and filtering problems from the deep learning approach are addressed. Finally, classification concepts with deep learning over trajectories are studied. The first case of the collection xviiistudies the consequences of error propagation in a machine vision-based precision landing system. This paper proposes a set of strategies to reduce the impact on the guidance system, and ultimately reduce the error. The next two studies approach the estimation and filtering problem from the deep learning approach, where error is a function to be minimized by learning. The last case of the collection deals with a trajectory classification problem with real data. This work completes the two main fields in deep learning, regression and classification, where the error is considered as a probability function of class membership.Los vehículos aéreos no tripulados (UAV) en sus versiones de pequeño tamaño conocidos como drones, van tomando protagonismo en las sociedades actuales. Los sistemas que los componen presentan multitud de retos entre los cuales el error se puede considerar como el denominador común. La percepción del entorno se mide mediante sensores que tienen error, los modelos que interpretan la información y/o definen comportamientos son aproximaciones del mundo y por consiguiente también presentan error. Explicar el error permite extender los límites de los modelos deterministas para abordar problemas del mundo real. El rendimiento de las tecnologías embarcadas en los drones, dependen de nuestra capacidad de comprender, modelar y controlar el error de los sistemas que los integran, así como de las nuevas tecnologías que puedan surgir. Los controladores de vuelo integran diferentes subsistemas los cuales generalmente son dependientes de otros sistemas. Un caso de esta situación son los sistemas de guiado. Estos sistemas son los encargados de proporcionar al controlador de los motores información necesaria para cumplir con una misión deseada. Para ello se componen de una ley de control de guiado que reacciona a la información percibida por los sistemas de percepción y navegación. El error de cualquiera de estos sistemas se propaga por el ecosistema del controlador siendo vital su estudio. Por otro lado, entre las estrategias para abordar el control del error se encuentran los estimadores en espacios de estados, donde el filtro de Kalman desde su aparición en los años 60, ha sido y continúa siendo un gran aliado para los ingenieros. Los filtros de Kalman son el corazón de los sistemas de fusión de información, los cuales minimizan la covarianza del error del sistema, permitiendo filtrar los estados medidos y estimarlos cuando no se tienen observaciones. Los modelos de espacios de estados se desarrollan en base a un conjunto de hipótesis para modelar el mundo. Entre las hipótesis se encuentra que los modelos del mundo han de ser lineales, markovianos y que el error de sus modelos ha de ser gaussiano. Generalmente los sistemas no son lineales por lo que se realizan linealizaciones sobre modelos que a su vez ya son aproximaciones del mundo. En otros casos el ruido que se desea controlar no es gaussiano, pero se aproxima a esta distribución para poder abordarlo. Por otro lado, multitud de sistemas no son markovianos, es decir, sus estados no solo dependen del estado anterior, sino que existen otras dependencias que los modelos de espacio de estados no son capaces de abordar. Esta tesis aborda un compendio de estudios sobre los que se formula y reduce el error. En primer lugar, se estudia el error en un sistema de aterrizaje de precisión basado en visión por computador. Después se plantean problemas de estimación y filtrado desde la aproximación del aprendizaje profundo. Por último, se estudian los conceptos de clasificación con aprendizaje profundo sobre trayectorias. El primer caso del compendio estudia las consecuencias de la propagación del error de un sistema de aterrizaje de precisión basado en visión artificial. En este trabajo se propone un conjunto de estrategias para reducir el impacto sobre el sistema de guiado, y en última instancia reducir el error. Los siguientes dos estudios abordan el problema de estimación y filtrado desde la perspectiva del aprendizaje profundo, donde el error es una función que minimizar mediante aprendizaje. El último caso del compendio aborda un problema de clasificación de trayectorias con datos reales. Con este trabajo se completan los dos campos principales en aprendizaje profundo, regresión y clasificación, donde se plantea el error como una función de probabilidad de pertenencia a una clase.I would like to thank the Ministry of Science and Innovation for granting me the funding with reference PRE2018-086793, associated to the project TEC2017-88048-C2-2-R, which provide me the opportunity to carry out all my PhD. activities, including completing an international research internship.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Antonio Berlanga de Jesús.- Secretario: Daniel Arias Medina.- Vocal: Alejandro Martínez Cav

    Shaped-based IMU/Camera Tightly Coupled Object-level SLAM using Rao-Blackwellized Particle Filtering

    Get PDF
    Simultaneous Localization and Mapping (SLAM) is a decades-old problem. The classical solution to this problem utilizes entities such as feature points that cannot facilitate the interactions between a robot and its environment (e.g., grabbing objects). Recent advances in deep learning have paved the way to accurately detect objects in the image under various illumination conditions and occlusions. This led to the emergence of object-level solutions to the SLAM problem. Current object-level methods depend on an initial solution using classical approaches and assume that errors are Gaussian. This research develops a standalone solution to object-level SLAM that integrates the data from a monocular camera and an IMU (available in low-end devices) using Rao Blackwellized Particle Filter (RBPF). RBPF does not assume Gaussian distribution for the error; thus, it can handle a variety of scenarios (such as when a symmetrical object with pose ambiguities is encountered). The developed method utilizes shape instead of texture; therefore, texture-less objects can be incorporated into the solution. In the particle weighing process, a new method is developed that utilizes the Intersection over the Union (IoU) area of the observed and projected boundaries of the object that does not require point-to-point correspondence. Thus, it is not prone to false data correspondences. Landmark initialization is another important challenge for object-level SLAM. In the state-of-the-art delayed initialization, the trajectory estimation only relies on the motion model provided by IMU mechanization (during the initialization), leading to large errors. In this thesis, two novel undelayed initializations are developed. One relies only on a monocular camera and IMU, and the other utilizes an ultrasonic rangefinder as well. The developed object-level SLAM is tested using wheeled robots and handheld devices, and an error (in the position) of 4.1 to 13.1 cm (0.005 to 0.028 of the total path length) has been obtained through extensive experiments using only a single object. These experiments are conducted in different indoor environments under different conditions (e.g. illumination). Further, it is shown that undelayed initialization using an ultrasonic sensor can reduce the algorithm's runtime by half

    Ghost on the Shell: An Expressive Representation of General 3D Shapes

    Full text link
    The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open, surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.Comment: Technical Report (26 pages, 16 figures, Project Page: https://gshell3d.github.io/

    Mixture-Based Clustering and Hidden Markov Models for Energy Management and Human Activity Recognition: Novel Approaches and Explainable Applications

    Get PDF
    In recent times, the rapid growth of data in various fields of life has created an immense need for powerful tools to extract useful information from data. This has motivated researchers to explore and devise new ideas and methods in the field of machine learning. Mixture models have gained substantial attention due to their ability to handle high-dimensional data efficiently and effectively. However, when adopting mixture models in such spaces, four crucial issues must be addressed, including the selection of probability density functions, estimation of mixture parameters, automatic determination of the number of components, identification of features that best discriminate the different components, and taking into account the temporal information. The primary objective of this thesis is to propose a unified model that addresses these interrelated problems. Moreover, this thesis proposes a novel approach that incorporates explainability. This thesis presents innovative mixture-based modelling approaches tailored for diverse applications, such as household energy consumption characterization, energy demand management, fault detection and diagnosis and human activity recognition. The primary contributions of this thesis encompass the following aspects: Initially, we propose an unsupervised feature selection approach embedded within a finite bounded asymmetric generalized Gaussian mixture model. This model is adept at handling synthetic and real-life smart meter data, utilizing three distinct feature extraction methods. By employing the expectation-maximization algorithm in conjunction with the minimum message length criterion, we are able to concurrently estimate the model parameters, perform model selection, and execute feature selection. This unified optimization process facilitates the identification of household electricity consumption profiles along with the optimal subset of attributes defining each profile. Furthermore, we investigate the impact of household characteristics on electricity usage patterns to pinpoint households that are ideal candidates for demand reduction initiatives. Subsequently, we introduce a semi-supervised learning approach for the mixture of mixtures of bounded asymmetric generalized Gaussian and uniform distributions. The integration of the uniform distribution within the inner mixture bolsters the model's resilience to outliers. In the unsupervised learning approach, the minimum message length criterion is utilized to ascertain the optimal number of mixture components. The proposed models are validated through a range of applications, including chiller fault detection and diagnosis, occupancy estimation, and energy consumption characterization. Additionally, we incorporate explainability into our models and establish a moderate trade-off between prediction accuracy and interpretability. Finally, we devise four novel models for human activity recognition (HAR): bounded asymmetric generalized Gaussian mixture-based hidden Markov model with feature selection~(BAGGM-FSHMM), bounded asymmetric generalized Gaussian mixture-based hidden Markov model~(BAGGM-HMM), asymmetric generalized Gaussian mixture-based hidden Markov model with feature selection~(AGGM-FSHMM), and asymmetric generalized Gaussian mixture-based hidden Markov model~(AGGM-HMM). We develop an innovative method for simultaneous estimation of feature saliencies and model parameters in BAGGM-FSHMM and AGGM-FSHMM while integrating the bounded support asymmetric generalized Gaussian distribution~(BAGGD), the asymmetric generalized Gaussian distribution~(AGGD) in the BAGGM-HMM and AGGM-HMM respectively. The aforementioned proposed models are validated using video-based and sensor-based HAR applications, showcasing their superiority over several mixture-based hidden Markov models~(HMMs) across various performance metrics. We demonstrate that the independent incorporation of feature selection and bounded support distribution in a HAR system yields benefits; Simultaneously, combining both concepts results in the most effective model among the proposed models

    Interdisciplinarity in the Age of the Triple Helix: a Film Practitioner's Perspective

    Get PDF
    This integrative chapter contextualises my research including articles I have published as well as one of the creative artefacts developed from it, the feature film The Knife That Killed Me. I review my work considering the ways in which technology, industry methods and academic practice have evolved as well as how attitudes to interdisciplinarity have changed, linking these to Etzkowitz and Leydesdorff’s ‘Triple Helix’ model (1995). I explore my own experiences and observations of opportunities and challenges that have been posed by the intersection of different stakeholder needs and expectations, both from industry and academic perspectives, and argue that my work provides novel examples of the applicability of the ‘Triple Helix’ to the creative industries. The chapter concludes with a reflection on the evolution and direction of my work, the relevance of the ‘Triple Helix’ to creative practice, and ways in which this relationship could be investigated further

    Methods, Models, and Datasets for Visual Servoing and Vehicle Localisation

    Get PDF
    Machine autonomy has become a vibrant part of industrial and commercial aspirations. A growing demand exists for dexterous and intelligent machines that can work in unstructured environments without any human assistance. An autonomously operating machine should sense its surroundings, classify different kinds of observed objects, and interpret sensory information to perform necessary operations. This thesis summarizes original methods aimed at enhancing machine’s autonomous operation capability. These methods and the corresponding results are grouped into two main categories. The first category consists of research works that focus on improving visual servoing systems for robotic manipulators to accurately position workpieces. We start our investigation with the hand-eye calibration problem that focuses on calibrating visual sensors with a robotic manipulator. We thoroughly investigate the problem from various perspectives and provide alternative formulations of the problem and error objectives. The experimental results demonstrate that the proposed methods are robust and yield accurate solutions when tested on real and simulated data. The work package is bundled as a toolkit and available online for public use. In an extension, we proposed a constrained multiview pose estimation approach for robotic manipulators. The approach exploits the available geometric constraints on the robotic system and infuses them directly into the pose estimation method. The empirical results demonstrate higher accuracy and significantly higher precision compared to other studies. In the second part of this research, we tackle problems pertaining to the field of autonomous vehicles and its related applications. First, we introduce a pose estimation and mapping scheme to extend the application of visual Simultaneous Localization and Mapping to unstructured dynamic environments. We identify, extract, and discard dynamic entities from the pose estimation step. Moreover, we track the dynamic entities and actively update the map based on changes in the environment. Upon observing the limitations of the existing datasets during our earlier work, we introduce FinnForest, a novel dataset for testing and validating the performance of visual odometry and Simultaneous Localization and Mapping methods in an un-structured environment. We explored an environment with a forest landscape and recorded data with multiple stereo cameras, an IMU, and a GNSS receiver. The dataset offers unique challenges owing to the nature of the environment, variety of trajectories, and changes in season, weather, and daylight conditions. Building upon the future works proposed in FinnForest Dataset, we introduce a novel scheme that can localize an observer with extreme perspective changes. More specifically, we tailor the problem for autonomous vehicles such that they can recognize a previously visited place irrespective of the direction it previously traveled the route. To the best of our knowledge, this is the first study that accomplishes bi-directional loop closure on monocular images with a nominal field of view. To solve the localisation problem, we segregate the place identification from the pose regression by using deep learning in two steps. We demonstrate that bi-directional loop closure on monocular images is indeed possible when the problem is posed correctly, and the training data is adequately leveraged. All methodological contributions of this thesis are accompanied by extensive empirical analysis and discussions demonstrating the need, novelty, and improvement in performance over existing methods for pose estimation, odometry, mapping, and place recognition

    Multi-Domain Adaptation for Image Classification, Depth Estimation, and Semantic Segmentation

    Get PDF
    The appearance of scenes may change for many reasons, including the viewpoint, the time of day, the weather, and the seasons. Traditionally, deep neural networks are trained and evaluated using images from the same scene and domain to avoid the domain gap. Recent advances in domain adaptation have led to a new type of method that bridges such domain gaps and learns from multiple domains. This dissertation proposes methods for multi-domain adaptation for various computer vision tasks, including image classification, depth estimation, and semantic segmentation. The first work focuses on semi-supervised domain adaptation. I address this semi-supervised setting and propose to use dynamic feature alignment to address both inter- and intra-domain discrepancy. The second work addresses the task of monocular depth estimation in the multi-domain setting. I propose to address this task with a unified approach that includes adversarial knowledge distillation and uncertainty-guided self-supervised reconstruction. The third work considers the problem of semantic segmentation for aerial imagery with diverse environments and viewing geometries. I present CrossSeg: a novel framework that learns a semantic segmentation network that can generalize well in a cross-scene setting with only a few labeled samples. I believe this line of work can be applicable to many domain adaptation scenarios and aerial applications

    CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation

    Full text link
    Realistic simulation is key to enabling safe and scalable development of % self-driving vehicles. A core component is simulating the sensors so that the entire autonomy system can be tested in simulation. Sensor simulation involves modeling traffic participants, such as vehicles, with high quality appearance and articulated geometry, and rendering them in real time. The self-driving industry has typically employed artists to build these assets. However, this is expensive, slow, and may not reflect reality. Instead, reconstructing assets automatically from sensor data collected in the wild would provide a better path to generating a diverse and large set with good real-world coverage. Nevertheless, current reconstruction approaches struggle on in-the-wild sensor data, due to its sparsity and noise. To tackle these issues, we present CADSim, which combines part-aware object-class priors via a small set of CAD models with differentiable rendering to automatically reconstruct vehicle geometry, including articulated wheels, with high-quality appearance. Our experiments show our method recovers more accurate shapes from sparse data compared to existing approaches. Importantly, it also trains and renders efficiently. We demonstrate our reconstructed vehicles in several applications, including accurate testing of autonomy perception systems.Comment: CoRL 2022. Project page: https://waabi.ai/cadsim
    corecore