59 research outputs found

    Collision Avoidance on Unmanned Aerial Vehicles using Deep Neural Networks

    Get PDF
    Unmanned Aerial Vehicles (UAVs), although hardly a new technology, have recently gained a prominent role in many industries, being widely used not only among enthusiastic consumers but also in highly demanding professional situations, and will have a massive societal impact over the coming years. However, the operation of UAVs carries serious safety risks, such as collisions with dynamic obstacles (birds, other UAVs, or randomly thrown objects). These collision scenarios are complex to analyze in real time, sometimes being computationally impossible to solve with existing State of the Art (SoA) algorithms, making the use of UAVs an operational hazard and therefore significantly reducing their commercial applicability in urban environments. In this work, a conceptual framework for both stand-alone and swarm (networked) UAVs is introduced, focusing on the architectural requirements of the collision avoidance subsystem needed to achieve acceptable levels of safety and reliability. First, the SoA principles for collision avoidance against stationary objects are reviewed. Afterward, a novel image processing approach that uses deep learning and optical flow is presented; it is capable of detecting potential collisions with dynamic objects and generating escape trajectories. Finally, novel combinations of models and algorithms were tested, providing a new approach to UAV collision avoidance using Deep Neural Networks. The feasibility of the proposed approach was demonstrated through experimental tests using a UAV built from scratch with the developed framework.
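    The optical-flow component lends itself to a brief sketch. The snippet below uses OpenCV's dense Farneback flow to flag a looming obstacle by the expansion (positive divergence) of its motion field and picks an escape direction away from it; the thresholds, region of interest, and escape heuristic are illustrative assumptions, not the thesis' trained DNN pipeline.

```python
# Optical-flow-based dynamic-obstacle check: a minimal sketch, assuming two
# consecutive grayscale frames from the UAV camera. Thresholds are placeholders.
import cv2
import numpy as np

def detect_and_escape(prev_gray, curr_gray, expansion_thresh=0.5):
    """Return (collision_risk, unit escape vector) from two consecutive frames."""
    # Dense Farneback optical flow: a per-pixel (dx, dy) motion field.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    h, w = flow.shape[:2]
    # Approaching objects expand in the image, so their flow field diverges;
    # estimate the divergence numerically and average it over a central ROI.
    div = np.gradient(flow[..., 0], axis=1) + np.gradient(flow[..., 1], axis=0)
    roi = div[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
    collision_risk = float(roi.mean()) > expansion_thresh
    escape = np.zeros(2)
    if collision_risk:
        # Hypothetical rule: steer away from the point of strongest expansion.
        y, x = np.unravel_index(np.argmax(div), div.shape)
        escape = np.array([w // 2 - x, h // 2 - y], dtype=float)
        norm = np.linalg.norm(escape)
        if norm > 0:
            escape /= norm
    return collision_risk, escape
```

    In a full system this check would run once per frame pair and feed the escape vector to the trajectory generator.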

    Crowd simulation and visualization

    Get PDF
    Large-scale simulation and visualization are essential topics in areas as different as sociology, physics, urbanism, training, and entertainment, among others. These kinds of systems require the vast computational power and memory resources commonly available on High Performance Computing (HPC) platforms. Currently, the most powerful clusters have heterogeneous architectures with hundreds of thousands and even millions of cores, and industry trends suggest that exascale clusters will have billions. The technical challenges for the simulation and visualization process in the exascale era are intertwined with difficulties in other areas of research, including storage, communication, programming models, and hardware. For this reason, it is necessary to prototype, test, and deploy a variety of approaches to address the technical challenges identified and to evaluate the advantages and disadvantages of each proposed solution. The focus of this research is interactive large-scale crowd simulation and visualization, aiming to exploit the capacity of the current HPC infrastructure to the fullest and to be prepared to take advantage of the next generation. The project develops a new approach for scaling crowd simulation and visualization on heterogeneous computing clusters using a task-based technique. Its main characteristic is that it is hardware-agnostic: it abstracts away the difficulties implied by heterogeneous architectures, such as memory management, scheduling, communication, and synchronization, thereby facilitating development, maintenance, and scalability. With the goal of flexibility and making the best possible use of computing resources, the project explores different configurations for connecting the simulation with the visualization engine. This kind of system has an essential use in emergencies; therefore, urban scenes were implemented as realistically as possible so that users will be ready to face real events. Path planning for large-scale crowds is a challenging problem due to the inherent dynamism of the scenes and the vast search space. A new path-finding algorithm was developed (a minimal sketch of its key idea follows this abstract). Its hierarchical approach offers several advantages: it divides the search space, reducing the problem's complexity; it can return a partial path instead of waiting for the complete one, which allows a character to start moving while the rest is computed asynchronously; and it can reprocess only a part of the path, if necessary, at different levels of abstraction. A case study is presented for a crowd simulation in urban scenarios. Geolocated data produced by mobile devices are used to predict individual and crowd behavior and to detect abnormal situations in the presence of specific events. The challenge of combining all these individuals' locations with a 3D rendering of the urban environment was also addressed. The data processing and simulation approach is computationally expensive and time-critical; it therefore relies on a hybrid Cloud-HPC architecture to produce an efficient solution. Within the project, new behavior models based on data analytics were developed, together with the infrastructure to query various data sources such as social networks, government agencies, or transport companies such as Uber. Ever more geolocation data and better computing resources are available, allowing analyses of greater depth; this lays the foundation for improving current crowd simulation models.
The use of simulations and their visualization allows crowds to be observed and organized in real time. Analysis before, during, and after daily mass events can reduce risks and the associated logistics costs.
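    A minimal sketch of the hierarchical, partial-path idea referenced above: run A* over coarse clusters of the grid first, then refine only the first leg so an agent can start moving while the remainder is computed asynchronously. The grid encoding, cluster size, and the simplifying assumption that all clusters are mutually reachable are illustrative, not the project's actual algorithm.

```python
# Hierarchical path finding with partial refinement: an illustrative sketch.
import heapq

def astar(start, goal, neighbors):
    """Generic A* over integer 2D nodes with a Manhattan heuristic."""
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]
    best = {start: 0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for nb in neighbors(node):
            ng = g + 1
            if ng < best.get(nb, float("inf")):
                best[nb] = ng
                heapq.heappush(open_set, (ng + h(nb), ng, nb, path + [nb]))
    return None

def partial_path(grid, start, goal, cluster=8):
    """Plan over clusters, then refine only the first leg of the route."""
    rows, cols = len(grid), len(grid[0])
    to_cluster = lambda p: (p[0] // cluster, p[1] // cluster)

    def coarse_nb(c):  # 4-connected cluster graph (clusters assumed passable)
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (c[0] + d[0], c[1] + d[1])
            if 0 <= n[0] < rows // cluster and 0 <= n[1] < cols // cluster:
                yield n

    def fine_nb(p):  # 4-connected free cells (0 = free, 1 = blocked)
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (p[0] + d[0], p[1] + d[1])
            if 0 <= n[0] < rows and 0 <= n[1] < cols and grid[n[0]][n[1]] == 0:
                yield n

    coarse = astar(to_cluster(start), to_cluster(goal), coarse_nb)
    if not coarse or len(coarse) < 2:
        return astar(start, goal, fine_nb), coarse
    # Refine only as far as the center of the next cluster; the agent can start
    # moving along this leg while later legs are refined asynchronously.
    nxt = coarse[1]
    waypoint = (nxt[0] * cluster + cluster // 2, nxt[1] * cluster + cluster // 2)
    return astar(start, waypoint, fine_nb), coarse

grid = [[0] * 32 for _ in range(32)]          # empty 32x32 map for the demo
leg, route = partial_path(grid, (1, 1), (30, 30))
print("coarse route:", route, "first leg:", leg[:5], "...")
```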

    Electric Vehicle Battery Module Dismantling "Analysis and Evaluation of Robotic Dismantling Techniques for Irreversible Fasteners, including Object Detection of Components."

    Get PDF
    This thesis presents a study of the lithium-ion battery (LIB) from an electric vehicle and its recycling processes. A battery module (BM) from the LIB is shredded when considered an end-of-life product, which motivates automated dismantling concepts that separate the components to save raw materials. Based on state-of-the-art (SoA) research projects and background theory, automatic module dismantling concepts were evaluated for a Volkswagen E-Golf 2019 battery module. The presence of irreversible fasteners makes the use of destructive dismantling techniques necessary. This study evaluates two different concepts for disconnecting the laser welds that hold together the compressive steel plates. A hydraulically actuated concept is first investigated to separate the welded compressive plates within the casing. An FEM analysis with different configurations is performed to identify the most effective hydraulic solution based on the Von Mises stress. This solution is then compared with another automatic dismantling concept, namely milling. For the automated milling concept, manipulators from ABB are assessed, and feasibility is verified based on results from a manual milling operation. The proposed dismantling operation is made possible by developing a system architecture that combines robotic control and computer vision. Open-source software based on the Robot Operating System (ROS) and MoveIt connects to and controls an ABB IRB4400 industrial robot, while the computer vision setup involves a cutting-edge 3D camera, Zivid, and the object detection algorithm YOLOv5, which is well suited for this task. Adjustable acquisition settings in services from Zivid's ROS driver are tested to find the optimal capture configuration. Two datasets generated with Roboflow were exported in the YOLOv5 PyTorch format. Custom object detection models with annotated components from the BM were trained and tested on image captures. All in all, this study demonstrates that the automatic dismantling of battery modules can be achieved even though they include irreversible fasteners. The proposed methods are verified on a specific battery module (E-Golf 2019) but are flexible enough to be easily extended to a large variety of EV battery modules.
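    The detection side of this setup can be illustrated with the standard Ultralytics hub API for custom-trained YOLOv5 weights. The weights file and image name below are hypothetical placeholders, and the Zivid capture and ROS/MoveIt robot control stages are out of scope for the snippet.

```python
# Running a custom YOLOv5 detector on a 2D image capture: a minimal sketch.
# 'battery_module_best.pt' and 'module_capture.png' are hypothetical names.
import torch

# Load custom weights through the Ultralytics YOLOv5 hub entry point
# (the repository is fetched automatically on first use).
model = torch.hub.load("ultralytics/yolov5", "custom", path="battery_module_best.pt")
model.conf = 0.5  # confidence threshold for reported detections

results = model("module_capture.png")
for *xyxy, conf, cls in results.xyxy[0].tolist():
    # Each detection row: bounding-box corners, confidence, integer class id.
    print(f"class={int(cls)} conf={conf:.2f} box={[round(v, 1) for v in xyxy]}")
```

    The detected boxes, combined with the 3D camera's depth data, would give the manipulator its grasp or milling targets.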

    Proceedings, MSVSCC 2016

    Get PDF
    Proceedings of the 10th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 14, 2016 at VMASC in Suffolk, Virginia

    Will they buy?

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 127-137). The proliferation of inexpensive video recording hardware and enormous storage capacity has enabled the collection of retail customer behavior at an unprecedented scale. The vast majority of this data is used for theft prevention and never used to better understand the customer. In what ways can this huge corpus be leveraged to improve the experience of the customer and the performance of the store? This thesis presents MIMIC, a system that processes video captured in a retail store into predictions about customer proclivity to purchase. MIMIC relies on the observation that the aggregate patterns of all of a store's patrons (the gestalt) capture behavior indicative of an imminent transaction. Video is distilled into a homogeneous feature vector that captures the activity distribution by first tracking the locations of customers, then discretizing their movements into a feature vector using a collection of functional locations: areas of the store relevant to the tasks of patrons and employees. A time series of these feature vectors can then be classified as predictive-of-transaction using a Hidden Markov Model. MIMIC is evaluated on a small operational retail store located in the Mall of America near Minneapolis, Minnesota. Its performance is characterized across a wide cross-section of the model's parameters. Through manipulation of the training data supplied to MIMIC, the behavior of customers in the store can be examined at fine levels of detail without foregoing the potential afforded by big data. MIMIC enables a suite of valuable tools. For ethnographic researchers, it offers a technique for identifying key moments in hundreds or thousands of hours of raw video. Retail managers gain a fine-grained metric to evaluate the performance of their stores, and interior designers acquire a critical component in a store layout optimization framework. by Rony Daniel Kubat. Ph.D.
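    The core classification step lends itself to a short sketch: fit one Gaussian HMM per class on time series of activity-distribution feature vectors, then label a new sequence by comparing log-likelihoods. The feature dimensionality, state count, and random placeholder data below are assumptions for illustration; extracting the feature vectors from video tracks is out of scope.

```python
# Two-class sequence labelling with per-class Gaussian HMMs: a minimal sketch.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
N_FEATURES = 8  # one dimension per functional location (assumed)

def fit_class_model(sequences, n_states=4):
    """Fit one HMM on a list of (T, N_FEATURES) observation sequences."""
    X = np.vstack(sequences)                 # hmmlearn takes stacked sequences
    lengths = [len(s) for s in sequences]    # plus their individual lengths
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

# Placeholder training sets standing in for labelled customer-track features.
buy = [rng.random((30, N_FEATURES)) + 0.5 for _ in range(20)]
no_buy = [rng.random((30, N_FEATURES)) for _ in range(20)]
m_buy, m_no = fit_class_model(buy), fit_class_model(no_buy)

# A new sequence gets the label of the model under which it is more likely.
seq = rng.random((30, N_FEATURES)) + 0.5
print("predictive-of-transaction" if m_buy.score(seq) > m_no.score(seq)
      else "not predictive")
```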

    A Pipeline of 3D Scene Reconstruction from Point Clouds

    Get PDF
    3D technologies are becoming increasingly popular as their applications in industrial, consumer, entertainment, healthcare, education, and governmental domains increase in number. According to market predictions, the total 3D modeling and mapping market is expected to grow from $1.1 billion in 2013 to $7.7 billion by 2018. Thus, 3D modeling techniques for different data sources are urgently needed. This thesis addresses techniques for automated point cloud classification and the reconstruction of 3D scenes (including terrain models, 3D buildings, and 3D road networks). First, georeferenced binary image processing techniques were developed for various point cloud classifications. Second, robust methods for the pipeline from the original point cloud to 3D model construction were proposed. Third, the reconstruction of 3D models at levels of detail (LoDs) 1-3, as defined by CityGML, was demonstrated. Fourth, different data sources for 3D model reconstruction were studied, and the strengths and weaknesses of each were addressed. Mobile laser scanning (MLS), unmanned aerial vehicle (UAV) images, airborne laser scanning (ALS), and the Finnish National Land Survey's open geospatial data sources, e.g. a topographic database, were employed as test data. Among these data sources, MLS data from three different systems were explored, and three different densities of ALS point clouds (0.8, 8, and 50 points/m²) were studied. The results were compared with reference data, such as an orthophoto with a ground sample distance of 20 cm or reference points measured in existing software, to evaluate their quality. The results showed that 74.6% of building roofs were reconstructed with the automated process, and the resulting building models had an average height deviation of 15 cm. A total of 6% of model points deviated by more than one pixel from the laser points, and 2.5% deviated by more than two pixels; the pixel size was determined by the average spacing of the input laser points. The 3D roads were reconstructed with an average width deviation of 22 cm and an average height deviation of 14 cm. The results also demonstrated that 93.4% of building roofs were correctly classified from sparse ALS and that 93.3% of power line points were detected in the six sets of dense ALS data located in forested areas. This study demonstrates the operability of 3D model construction at LoDs 1-3 via the proposed methodologies and datasets. The study is beneficial to future applications, such as 3D-model-based navigation, the updating of 2D topographic databases into 3D maps, and rapid, large-area 3D scene reconstruction.
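    The first step, georeferenced binary image processing, can be illustrated with a small sketch: rasterise the point cloud onto a ground-plane grid so that standard binary image operations (morphology, connected components) can drive classification. The cell size and the random example cloud are illustrative assumptions.

```python
# Point cloud to georeferenced binary occupancy image: a minimal sketch.
import numpy as np

def rasterize(points_xy, cell=0.5):
    """Map 2D point coordinates (meters) onto a binary occupancy image.

    The returned origin keeps every pixel georeferenced:
    world_x = x0 + col * cell, world_y = y0 + row * cell.
    """
    x0, y0 = points_xy.min(axis=0)
    cols = np.floor((points_xy[:, 0] - x0) / cell).astype(int)
    rows = np.floor((points_xy[:, 1] - y0) / cell).astype(int)
    img = np.zeros((rows.max() + 1, cols.max() + 1), dtype=np.uint8)
    img[rows, cols] = 1
    return img, (x0, y0, cell)

# Example: 1,000 random points standing in for one 20 m x 20 m laser tile.
pts = np.random.default_rng(1).random((1000, 2)) * 20.0
image, origin = rasterize(pts)
print(image.shape, int(image.sum()), origin)
```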

    Play Among Books

    Get PDF
    How does coding change the way we think about architecture? Miro Roman and his AI Alice_ch3n81 develop a playful scenario in which they propose coding as the new literacy of information. They convey knowledge in the form of a project model that links the fields of architecture and information through two interwoven narrative strands in an “infinite flow” of real books

    Modelling socio-spatial dynamics from real-time data

    Get PDF
    This thesis introduces a framework for modelling the social dynamic of an urban landscape from multiple and disparate real-time datasets. It seeks to bridge the gap between artificial simulations of human behaviour and periodic real-world observations. The approach is data-intensive, adopting open-source programmatic and visual analytics. The result is a framework that can rapidly produce contextual insights from samples of real-world human activity – behavioural data traces. The framework can be adopted standalone or integrated with other models to produce a more comprehensive understanding of people-place experiences and how context affects behaviour. The research is interdisciplinary. It applies emerging techniques in cognitive and spatial data sciences to extract and analyse latent information from behavioural data traces located in space and time. Three sources are evaluated: mobile device connectivity to a public Wi-Fi network, readings emitted by an installed mobile app, and volunteered status updates. The outcome is a framework that can sample data about real-world activities at street-level and reveal contextual variations in people-place experiences, from cultural and seasonal conditions that create the ‘social heartbeat’ of a landscape to the arrhythmic impact of abnormal events. By continuously or frequently sampling reality, the framework can become self-calibrating, adapting to developments in land-use potential and cultural influences over time. It also enables ‘opportunistic’ geographic information science: the study of unexpected real-world phenomena as and when they occur. The novel contribution of this thesis is to demonstrate the need to improve understanding of and theories about human-environment interactions by incorporating context-specific learning into urban models of behaviour. The framework presents an alternative to abstract generalisations by revealing the variability of human behaviour in public open spaces, where conditions are uncertain and changeable. It offers the potential to create a closer representation of reality and anticipate or recommend behaviour change in response to conditions as they emerge
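    As a minimal sketch of turning one such behavioural data trace into a 'social heartbeat', the snippet below resamples time-stamped Wi-Fi connection events into hourly unique-device counts and flags hours that deviate strongly from the typical hour-of-week profile. The column names, synthetic events, and anomaly threshold are illustrative assumptions, not the thesis' framework.

```python
# Hourly activity profile and simple anomaly flagging from Wi-Fi events.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
events = pd.DataFrame({
    "timestamp": pd.to_datetime("2024-06-01")
                 + pd.to_timedelta(rng.integers(0, 7 * 24 * 3600, 5000), unit="s"),
    "device_id": rng.integers(0, 800, 5000),  # stands in for a hashed MAC
})

# Hourly unique devices: a proxy for footfall at the sensed location.
heartbeat = (events.set_index("timestamp")
                   .resample("1h")["device_id"]
                   .nunique())

# Compare each hour against the typical value for that hour of the week;
# large deviations hint at abnormal events worth inspecting.
hour_of_week = heartbeat.index.dayofweek * 24 + heartbeat.index.hour
baseline = heartbeat.groupby(hour_of_week).transform("median")
anomalies = heartbeat[(heartbeat - baseline).abs() > 3 * heartbeat.std()]
print(anomalies.head())
```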

    Automatic movie analysis and summarisation

    Get PDF
    Automatic movie analysis is the task of applying Machine Learning methods to screenplays, movie scripts, and motion pictures to facilitate or enable various tasks throughout the entirety of a movie’s life-cycle. From helping with making informed decisions about a new movie script with respect to aspects such as its originality, similarity to other movies, or even commercial viability, all the way to offering consumers new and interesting ways of viewing the final movie, many stages in the life-cycle of a movie stand to benefit from Machine Learning techniques that promise to reduce human effort, time, or both. Within this field of automatic movie analysis, this thesis addresses the task of summarising the content of screenplays, enabling users at any stage to gain a broad understanding of a movie from greatly reduced data. The contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale data set of original movie scripts, annotated with additional meta-information such as genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase is the largest data set of its kind, containing scripts and information for almost 1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the screenplay domain, which allows for the extraction of highly informative and important scenes from movie scripts (an illustrative extractive baseline is sketched below). The extracted summaries allow the content of the original script to stay largely intact and provide the user with its important parts, while greatly reducing the script-reading time. (iii) We extend our summarisation model to capture additional modalities beyond the screenplay text. The model is rendered multi-modal by introducing visual information obtained from the actual movie and by extracting scenes from the movie, allowing users to generate visual summaries of motion pictures. (iv) We devise a novel end-to-end neural network model for generating natural language screenplay overviews. This model enables the user to generate short descriptive and informative texts that capture certain aspects of a movie script, such as its genres, approximate content, or style, allowing them to gain a fast, high-level understanding of the screenplay. Multiple automatic and human evaluations were carried out to assess the performance of our models, demonstrating that they are well-suited for the tasks set out in this thesis, outperforming strong baselines. Furthermore, the ScriptBase data set has started to gain traction, and is currently used by a number of other researchers in the field to tackle various tasks relating to screenplays and their analysis
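    As an illustrative baseline for the extractive setting in contribution (ii), a scene can be scored by its TF-IDF centrality within the script and the top-k scenes kept in narrative order. This sketches the general technique only; it is not the thesis' dynamic summarisation model.

```python
# Extractive scene selection by TF-IDF centrality: a minimal sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarise_script(scenes, k=3):
    """Return the indices of the k most central scenes, in script order."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(scenes)
    # A scene's centrality is its mean similarity to every scene in the script.
    centrality = cosine_similarity(tfidf).mean(axis=1)
    top = sorted(range(len(scenes)), key=lambda i: centrality[i], reverse=True)[:k]
    return sorted(top)  # keep narrative order in the summary

scenes = [
    "INT. HOUSE - NIGHT. The detective studies the board of photographs.",
    "EXT. DOCKS - DAY. A chase through shipping containers ends at the water.",
    "INT. HOUSE - NIGHT. The detective confronts the suspect with the photos.",
    "INT. DINER - DAY. Small talk over coffee; the case is barely mentioned.",
]
print(summarise_script(scenes, k=2))
```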