    OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

    This paper presents OmniDataComposer, an innovative approach to multimodal data fusion and unlimited data generation, intended to refine and simplify the interplay among diverse data modalities. At its core, it introduces a cohesive data structure capable of processing and merging multimodal inputs, including video, audio, and text. Our algorithm leverages advances across multiple operations such as video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), the Recognize Anything Model (RAM), and object tracking. OmniDataComposer can identify over 6400 categories of objects, substantially broadening the spectrum of visual information. It amalgamates these diverse modalities, promoting reciprocal enhancement among them and facilitating cross-modal data correction. The final output transforms each video input into an elaborate sequential document, effectively turning videos into thorough narratives that are easier for large language models to process. Future prospects include optimizing datasets for each modality to encourage unlimited data generation. This robust foundation will offer valuable insights to models like ChatGPT, enabling them to create higher-quality datasets for video captioning and easing question-answering tasks based on video content. OmniDataComposer inaugurates a new stage in multimodal learning, offering enormous potential for augmenting AI's understanding and generation of complex, real-world data.
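    The abstract does not publish the underlying schema; the sketch below is a minimal guess at what such a unified, time-aligned per-video record might look like. All class and field names (Segment, UnifiedVideoDocument, to_narrative) are hypothetical illustrations, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    """One time-aligned slice of a video; all field names are hypothetical."""
    start_s: float                   # segment start time in seconds
    end_s: float                     # segment end time in seconds
    caption: str = ""                # dense video/image caption for this span
    asr_text: str = ""               # transcript from Automatic Speech Recognition
    ocr_text: str = ""               # on-screen text from OCR
    objects: List[str] = field(default_factory=list)  # labels from RAM / tracking

@dataclass
class UnifiedVideoDocument:
    """Sequential-document representation of one video input."""
    video_id: str
    segments: List[Segment] = field(default_factory=list)

    def to_narrative(self) -> str:
        """Flatten the record into the plain-text narrative an LLM can consume."""
        lines = []
        for s in self.segments:
            lines.append(
                f"[{s.start_s:.1f}-{s.end_s:.1f}s] {s.caption} | "
                f"speech: {s.asr_text} | on-screen text: {s.ocr_text} | "
                f"objects: {', '.join(s.objects)}"
            )
        return "\n".join(lines)
```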

    Vehicular Instrumentation and Data Processing for the Study of Driver Intent

    The primary goal of this thesis is to provide the processed experimental data needed to determine whether driver intentionality and driving-related actions can be predicted from quantitative and qualitative analysis of driver behaviour. Towards this end, an instrumented experimental vehicle was designed and developed, capable of recording several synchronized data streams in a naturalistic driving environment: the surroundings of the vehicle, the driver's gaze and head pose, and the vehicle state. Several driving data sequences in both urban and rural environments were recorded with the instrumented vehicle. These sequences were automatically annotated for relevant artifacts such as lanes, vehicles and safely driveable areas within road lanes. A framework and associated algorithms for cross-calibrating the gaze tracking system with the world coordinate system of the outdoor stereo system were also designed and implemented, allowing the driver's gaze to be mapped onto the surrounding environment. This instrumentation is currently being used for the study of driver intent, geared towards the development of driver maneuver prediction models.
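    The thesis's calibration framework is not reproduced in the abstract; the following is a minimal sketch of the end goal of such a cross-calibration, mapping a gaze ray from the gaze tracker's coordinate frame into the world frame of the outdoor stereo system via an assumed rigid transform. The rotation and translation values here are placeholders, not the thesis's results.

```python
import numpy as np

# Assumed output of the cross-calibration step: a rigid transform taking
# points from the gaze-tracker frame to the world frame of the stereo system.
R_world_from_gaze = np.eye(3)     # 3x3 rotation (placeholder)
t_world_from_gaze = np.zeros(3)   # translation in metres (placeholder)

def gaze_ray_to_world(origin_gaze: np.ndarray, dir_gaze: np.ndarray):
    """Map a gaze ray (origin point + unit direction) into world coordinates."""
    origin_world = R_world_from_gaze @ origin_gaze + t_world_from_gaze
    dir_world = R_world_from_gaze @ dir_gaze  # directions rotate, never translate
    return origin_world, dir_world / np.linalg.norm(dir_world)
```

    Intersecting the resulting world-frame ray with annotated scene geometry (lanes, vehicles, driveable areas) is what allows the driver's gaze to be associated with specific objects in the surroundings.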

    Dynamic Data Assimilation

    Data assimilation is a process of fusing data with a model for the singular purpose of estimating unknown variables. It can be used, for example, to predict the evolution of the atmosphere at a given point and time. This book examines data assimilation methods including Kalman filtering, artificial intelligence, neural networks, machine learning, and cognitive computing.
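    Of the methods listed, Kalman filtering is the most compact to illustrate. The following is a minimal, generic linear Kalman predict/update step in standard textbook form, not the book's own code or notation.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x: state estimate      P: state covariance     z: new measurement
    F: state transition    H: observation model    Q, R: process/measurement noise
    """
    # Predict: propagate the state and its uncertainty through the model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: blend the prediction with the measurement via the Kalman gain
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)   # corrected state estimate
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```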

    Non-Invasive Data Acquisition and IoT Solution for Human Vital Signs Monitoring: Applications, Limitations and Future Prospects

    The rapid development of technology has brought about a revolution in healthcare, stimulating a wide range of smart and autonomous applications in homes, clinics, surgeries and hospitals. Smart healthcare opens the opportunity for a qualitative advance in the relations between healthcare providers and end-users, for example by enabling doctors to diagnose remotely while optimizing the accuracy of the diagnosis, and by maximizing the benefits of treatment through close patient monitoring. This paper presents a comprehensive review of non-invasive vital data acquisition and the Internet of Things (IoT) in healthcare informatics, reports the open challenges in the field, and suggests future work that would lead to solutions addressing them. In particular, the review reveals that the development of multi-frequency vital IoT systems remains a daunting challenge; addressing this issue would enable a vital IoT node to be reachable by the broker across multiple area ranges. Furthermore, multi-camera systems have shown high potential to increase the accuracy of vital data acquisition, but their implementation has not been fully developed, leaving gaps to be bridged. Moreover, applying deep learning to the real-time analysis of vital data on the node/edge side would enable optimal, instant, offline decision making. Finally, the synergistic integration of reliable power management and energy harvesting into non-invasive data acquisition has been overlooked so far; its successful implementation would lead to a smart, robust, sustainable and self-powered healthcare system.
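    The review discusses broker-based vital IoT nodes in general terms; as a concrete illustration, the sketch below shows a node publishing one vital-sign reading to an MQTT broker. The broker address, topic layout and reading values are hypothetical, and the snippet assumes the widely used paho-mqtt client.

```python
import json
import time

import paho.mqtt.client as mqtt  # pip install "paho-mqtt<2" for this Client() form

BROKER_HOST = "broker.example.local"  # hypothetical broker address
TOPIC = "clinic/ward1/bed3/vitals"    # hypothetical topic layout

client = mqtt.Client()
client.connect(BROKER_HOST, 1883)

reading = {
    "ts": time.time(),  # epoch timestamp of the measurement
    "hr_bpm": 72,       # heart rate from a non-invasive sensor (illustrative value)
    "spo2_pct": 98,     # blood-oxygen saturation (illustrative value)
}
client.publish(TOPIC, json.dumps(reading), qos=1)  # QoS 1: at-least-once delivery
client.disconnect()
```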

    3D facial performance capture from monocular RGB video.

    3D facial performance capture is an essential technique for animation production in feature films, video gaming, human-computer interaction, VR/AR asset creation and digital heritage, all of which have a huge impact on our daily lives. Traditionally, dedicated hardware such as depth sensors, laser scanners and camera arrays has been developed to acquire depth information for this purpose. However, such sophisticated instruments can only be operated by trained professionals. In recent years, the widespread availability of mobile devices, and the increased interest of casual untrained users in applications such as image and video editing and virtual and facial model creation, have sparked interest in 3D facial reconstruction from 2D RGB input. Due to depth ambiguity and facial appearance variation, 3D facial performance capture and modelling from 2D images are inherently ill-posed problems. However, with strong prior knowledge of the human face, it is possible to accurately infer the true 3D facial shape and performance from multiple observations captured at different viewing angles. Various 3D-from-2D methods have been proposed and proven to work well in controlled environments. Nevertheless, many issues remain unexplored in uncontrolled, in-the-wild environments. To achieve the same level of performance as in controlled environments, interfering factors such as varying illumination, partial occlusion and facial variation not captured by prior knowledge require the development of new techniques. This thesis addresses these challenges and proposes novel methods involving 2D landmark detection, 3D facial reconstruction and 3D performance tracking, validated through theoretical research and experimental studies. 3D facial performance tracking is a multidisciplinary problem involving areas such as computer vision, computer graphics and machine learning. To deal with the large variations within a single image, we present new machine learning techniques for facial landmark detection, based on our observation of facial features in challenging scenarios, to increase robustness. To take advantage of the evidence aggregated from multiple observations, we present new robust and efficient optimisation techniques that impose consistency constraints to help filter out outliers. To exploit person-specific model generation and the temporal and spatial coherence of continuous video input, we present new methods that improve performance via optimisation. A fundamental prerequisite for good 3D facial performance tracking is an accurate underlying 3D model of the actor; we therefore present new methods targeted at 3D facial geometry reconstruction that are more efficient than existing generic 3D geometry reconstruction methods. Evaluation and validation were obtained and analysed from substantial experiments, which show that the proposed methods outperform state-of-the-art methods and enable us to generate high-quality results with fewer constraints.
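    The thesis's optimisation techniques are not spelled out in the abstract; the sketch below illustrates only the underlying consistency idea, scoring how well a candidate set of 3D facial landmarks agrees with 2D detections across multiple views. The function name and inputs are illustrative; replacing the squared residual with a robust loss (e.g. Huber) is one standard way to filter out outliers.

```python
import numpy as np

def reprojection_error(X_3d, detections_2d, projections):
    """Sum of squared 2D reprojection residuals over all views.

    X_3d:          (N, 3) candidate 3D landmark positions.
    detections_2d: list of (N, 2) detected 2D landmarks, one array per view.
    projections:   list of 3x4 camera projection matrices, one per view.
    """
    X_h = np.hstack([X_3d, np.ones((len(X_3d), 1))])  # homogeneous coordinates
    total = 0.0
    for P, uv in zip(projections, detections_2d):
        proj = (P @ X_h.T).T               # project landmarks into this view
        proj = proj[:, :2] / proj[:, 2:3]  # perspective divide to pixels
        total += np.sum((proj - uv) ** 2)  # squared pixel residuals
    return total
```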

    Ocular motion classification for mobile device presentation attack detection

    Title from PDF of title page, viewed February 25, 2021. Dissertation advisor: Reza Derakhshani. Vita. Includes bibliographical references (pages 105-129). Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2020.

    As a practical pursuit of quantified uniqueness, biometrics explores the parameters that make us who we are and provides the tools we need to secure the integrity of that identity. In our culture of constant connectivity, an increasing reliance on biometrically secured mobile devices is transforming them into a target for bad actors. While no system will ever prevent all forms of intrusion, even state-of-the-art biometric methods remain vulnerable to spoof attacks. As these attacks become more sophisticated, ocular-motion-based presentation attack detection (PAD) methods provide a potential deterrent. This dissertation presents the methods and evaluation of a novel optokinetic nystagmus (OKN) based PAD system for mobile device applications, which leverages phase-locked temporal features of a unique reflexive behavioral response. Historical and literary background on eye motion and ocular tracking is provided to give context to the objectives and accomplishments of this work. An evaluation of the improved methods for sample processing and sequential stability is provided, highlighting the presented improvements to the stability of convolutional facial landmark localization and to the automated spatiotemporal feature extraction and classification models. Insights gleaned from this work elucidate some of the major challenges of mobile ocular motion feature extraction, as well as future considerations for the refinement and application of OKN motion signatures as a novel mobile-device-based PAD method.

    Contents: Introduction -- Retrospective, Contextual and Contemporary Analysis -- Experimental Design -- Methods and Results -- Discussion -- Conclusion.
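    The dissertation's feature extraction and classifiers are more elaborate than the abstract can convey; the sketch below illustrates only the phase-locking intuition behind an OKN-based liveness check, using the peak of a normalized cross-correlation between the recorded eye trace and the on-screen stimulus. The function and any decision threshold built on it are illustrative assumptions, not the author's method.

```python
import numpy as np

def phase_lock_score(eye_x: np.ndarray, stimulus_x: np.ndarray) -> float:
    """Peak normalized cross-correlation between an eye trace and the stimulus.

    A genuine optokinetic response tracks the moving stimulus with a short,
    consistent lag; a static spoof (photo or replayed face) should not.
    Both inputs are 1-D horizontal-position traces at the same sample rate.
    """
    eye = (eye_x - eye_x.mean()) / (eye_x.std() + 1e-9)      # z-score the traces
    stim = (stimulus_x - stimulus_x.mean()) / (stimulus_x.std() + 1e-9)
    xcorr = np.correlate(eye, stim, mode="full") / len(eye)  # search over all lags
    return float(xcorr.max())  # near 1.0 indicates a strongly phase-locked response
```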

    Enabling Multi-LiDAR Sensing in GNSS-Denied Environments: SLAM Dataset, Benchmark, and UAV Tracking with LiDAR-as-a-camera

    The rise of Light Detection and Ranging (LiDAR) sensors has profoundly impacted industries ranging from automotive to urban planning. As these sensors become increasingly affordable and compact, their applications are diversifying, driving precision and innovation. This thesis examines LiDAR's advancements in autonomous robotic systems, with a focus on its role in simultaneous localization and mapping (SLAM) methodologies and in LiDAR-as-a-camera tracking of Unmanned Aerial Vehicles (UAVs). Our contributions span two primary domains: a multi-modal LiDAR SLAM benchmark and LiDAR-as-a-camera UAV tracking. In the former, we have expanded our previous multi-modal LiDAR dataset by adding data sequences from additional scenarios. In contrast to the previous dataset, we employ different ground-truth-generation approaches: we propose a new multi-modal, multi-LiDAR, SLAM-assisted, ICP-based sensor fusion method for generating ground truth maps, and we supplement our data with new open-road sequences with GNSS-RTK. This enriched dataset, supported by high-resolution LiDAR, provides detailed insights through an evaluation of ten configurations pairing diverse LiDAR sensors with state-of-the-art SLAM algorithms. In the latter contribution, we leverage a custom YOLOv5 model trained on panoramic low-resolution images from LiDAR reflectivity (LiDAR-as-a-camera) to detect UAVs, demonstrating the superiority of this approach over point-cloud-only or image-only methods. Additionally, we evaluate the real-time performance of our approach on the Nvidia Jetson Nano, a popular mobile computing platform. Overall, our research underscores the transformative potential of integrating advanced LiDAR sensors with autonomous robotics. By bridging the gaps between different technological approaches, we pave the way for more versatile and efficient applications in the future.
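    The thesis's ground-truth pipeline combines SLAM assistance with ICP-based fusion; the sketch below shows only the generic ICP building block, aligning one LiDAR scan to another with Open3D's point-to-point ICP. The voxel size and correspondence distance are illustrative defaults, not the thesis's parameters.

```python
import numpy as np
import open3d as o3d

def align_scans(source_pcd, target_pcd, voxel=0.2, max_corr_dist=0.5):
    """Align one LiDAR point cloud to another with point-to-point ICP.

    Returns the 4x4 transform mapping the source scan into the target frame.
    """
    src = source_pcd.voxel_down_sample(voxel)  # downsample for speed/robustness
    tgt = target_pcd.voxel_down_sample(voxel)
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```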

    Human-vehicle collaborative driving to improve transportation safety

    This dissertation proposes a collaborative driving framework based on assessments of both the internal and external risks involved in vehicle driving. The internal risk analysis includes driver drowsiness detection, driver distraction detection, and driver intention recognition, which help us better understand the human driver's behavior. Steering-wheel data and facial expressions are used to detect drowsiness. Images from a camera observing the driver are used to detect various types of driver distraction using a deep learning approach. A Hidden Markov Model (HMM) is implemented to recognize the driver's intention from the vehicle's lane position, control and state data. For the external risk analysis, the co-pilot utilizes a Collision Avoidance System (CAS) to estimate the collision probability between the ego vehicle and other vehicles. Based on these two risk analyses, a novel collaborative driving scheme is proposed that fuses the control inputs from the human driver and the co-pilot to obtain the final control input for the vehicle under different circumstances. The proposed collaborative driving framework is validated in an Intelligent Transportation System (ITS) testbed which enables both autonomous and manual driving capabilities.
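    The dissertation's exact fusion rule is not given in the abstract; the sketch below is one plausible risk-weighted blending scheme, shifting control authority toward the co-pilot as either the internal (drowsiness/distraction) risk or the external collision probability rises. The quadratic weighting is an illustrative choice, not the proposed method.

```python
import numpy as np

def fuse_controls(u_human: float, u_copilot: float,
                  p_collision: float, internal_risk: float) -> float:
    """Blend human and co-pilot commands (e.g. steering) by assessed risk.

    Both risk inputs are assumed normalized to [0, 1]; authority shifts
    smoothly from the human driver to the co-pilot as risk grows.
    """
    risk = float(np.clip(max(p_collision, internal_risk), 0.0, 1.0))
    w = risk ** 2  # gentle at low risk, decisive near collision
    return (1.0 - w) * u_human + w * u_copilot

# Example: drowsy driver (internal risk 0.8) with moderate collision probability
u_final = fuse_controls(u_human=0.3, u_copilot=-0.1,
                        p_collision=0.4, internal_risk=0.8)
```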

    Human-Robot Collaborations in Industrial Automation

    Technology is changing the manufacturing world. For example, sensors are being used to track inventories from the manufacturing floor up to a retail shelf or a customer's door. These types of interconnected systems constitute the fourth industrial revolution, also known as Industry 4.0, and are projected to lower manufacturing costs. As industry moves toward these integrated technologies and lower costs, engineers will need to connect these systems via the Internet of Things (IoT), and to design how these connected systems interact with humans. The focus of this Special Issue is the smart sensors used in these human–robot collaborations.