127 research outputs found

    HUMAN ACTIVITY RECOGNITION FROM EGOCENTRIC VIDEOS AND ROBUSTNESS ANALYSIS OF DEEP NEURAL NETWORKS

    Get PDF
    In recent years, there has been a significant amount of research on human activity classification relying either on Inertial Measurement Unit (IMU) data or on data from static cameras providing a third-person view. There has been relatively less work using wearable cameras, which provide an egocentric (first-person) view of the environment as seen by the wearer. Using only IMU data limits the variety and complexity of the activities that can be detected. Deep machine learning has achieved great success in image and video processing in recent years, and neural-network-based models provide improved accuracy in multiple areas of computer vision. However, relatively little work has focused on designing models specifically to improve the performance of egocentric image and video tasks. Moreover, as deep neural networks keep improving accuracy on computer vision tasks, their robustness and resilience must improve as well before they can be applied in safety-critical areas such as autonomous driving. Motivated by these considerations, the first part of the thesis addresses the problem of human activity detection and classification from egocentric cameras. First, a new method is presented to count the number of footsteps and compute the total traveled distance by using data from the IMU sensors and camera of a smartphone. By incorporating data from multiple sensor modalities and calculating the length of each step, instead of using preset stride lengths and assuming equal-length steps, the proposed method provides much higher accuracy than commercially available step-counting apps. After footstep counting, more complicated human activities, such as the steps of preparing a recipe or sitting on a sofa, are considered. Multiple classification methods, both non-deep-learning and deep-learning-based, are presented, which employ both egocentric camera and IMU data.
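The step-length-aware counting above is the thesis's contribution; as a loose illustration of the simpler peak-detection baseline it improves on, footsteps can be counted as peaks in the accelerometer magnitude (the thresholds and the synthetic signal below are invented for this sketch, not taken from the thesis):

```python
import numpy as np

def count_steps(accel, fs=50.0, min_peak=1.5, min_gap_s=0.3):
    """Count footsteps as peaks of the acceleration magnitude.

    accel: (N, 3) array of linear acceleration with gravity removed.
    A step is a local maximum exceeding `min_peak` (m/s^2) that is at
    least `min_gap_s` seconds after the previous detected step.
    """
    mag = np.linalg.norm(accel, axis=1)
    min_gap = int(min_gap_s * fs)
    steps, last = 0, -min_gap
    for i in range(1, len(mag) - 1):
        if (mag[i] > min_peak and mag[i] >= mag[i - 1]
                and mag[i] > mag[i + 1] and i - last >= min_gap):
            steps += 1
            last = i
    return steps

# Synthetic walk: 2 steps per second for 5 s, sampled at 50 Hz
t = np.arange(0, 5, 1 / 50.0)
accel = np.zeros((len(t), 3))
accel[:, 2] = 2.0 * np.maximum(0.0, np.sin(2 * np.pi * 2 * t))
print(count_steps(accel))  # 10
```

Such a fixed-threshold counter assumes equal-length steps, which is exactly the limitation the multi-sensor method described above avoids.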
    Then, a Genetic Algorithm (GA)-based approach is employed to set the parameters of an activity classification network autonomously, and its performance is compared with empirically set parameters. Next, a new framework is introduced to reduce the computational cost of human temporal activity recognition from egocentric videos while maintaining accuracy at a comparable level. The actor-critic model of reinforcement learning is applied to optical flow data to locate a bounding box around a region of interest, which is then used to clip a sub-image from a video frame. A shallow and a deeper 3D convolutional neural network are designed to process the original image and the clipped image region, respectively. Next, a systematic method is introduced that autonomously and simultaneously optimizes multiple parameters of any deep neural network by using a bi-generative adversarial network (Bi-GAN) guiding a GA. The proposed Bi-GAN allows the autonomous exploration and choice of the number of neurons for the fully connected layers, and the number of filters for the convolutional layers, from a large range of values. The Bi-GAN involves two generators, and the two models compete with and improve each other progressively with a GAN-based strategy to optimize the networks during a GA evolution. In this analysis, three different types of neural network layers and datasets are considered. First, 3D convolutional layers are optimized for the ModelNet40 dataset, a dataset of 3D point clouds, where the goal is to perform shape classification over 40 shape classes. Second, LSTM layers are optimized for the UCI HAR dataset, which is composed of Inertial Measurement Unit (IMU) data captured during the activities of standing, sitting, laying, walking, walking upstairs and walking downstairs.
    These activities were performed by 30 subjects, and the 3-axial linear acceleration and 3-axial angular velocity were collected at a constant rate of 50 Hz. 2D convolutional layers are optimized for the Chars74k dataset, which contains 64 classes (0-9, A-Z, a-z): 7705 characters obtained from natural images, 3410 hand-drawn characters captured with a tablet PC, and 62992 characters synthesised from computer fonts, giving a total of over 74K images. In the final part of the thesis, the robustness and resilience of neural network models are investigated with respect to adversarial examples (AEs) and autonomous driving conditions. The transferability of adversarial examples across a wide range of real-world computer vision tasks, including image classification, explicit content detection, optical character recognition (OCR), and object detection, is investigated. This represents a cybercriminal’s situation, in which an ensemble of different detection mechanisms must be evaded all at once. A novel Dispersion Reduction (DR) attack is designed: a practical attack that overcomes existing attacks’ limitation of requiring task-specific loss functions by targeting the “dispersion” of internal feature maps. For the autonomous driving scenario, adversarial machine learning attacks against the complete visual perception pipeline in autonomous driving are studied. A novel attack technique, tracker hijacking, that can effectively fool Multi-Object Tracking (MOT) using AEs on object detection is presented. Using this technique, successful AEs on as few as one single frame can move an existing object into or out of the headway of an autonomous vehicle to cause potential safety hazards.
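The dispersion-reduction idea can be sketched on a toy linear feature extractor (this stand-in, the step size, and the L-inf budget are all assumptions for illustration; the actual attack targets internal feature maps of deep networks):

```python
import numpy as np

def dispersion_reduction_attack(x, W, steps=50, eps=0.1, lr=0.01):
    """Toy dispersion-reduction attack on a linear feature extractor.

    Minimises the standard deviation ("dispersion") of the feature map
    f = W @ x by gradient descent on the input, projecting the
    perturbation back into an L-inf ball of radius `eps` each step.
    """
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        f = W @ x_adv
        centred = f - f.mean()
        std = f.std() + 1e-12
        # analytic gradient: d std(f) / d x = W^T (f - mean(f)) / (n * std)
        grad = W.T @ centred / (f.size * std)
        x_adv = x_adv - lr * grad                   # descend on dispersion
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)  # stay within budget
    return x_adv

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))
x = rng.standard_normal(8)
x_adv = dispersion_reduction_attack(x, W)
print((W @ x_adv).std() < (W @ x).std())  # dispersion shrinks
```

Because the loss is a property of the features rather than of any task head, the same perturbation degrades every downstream task that consumes those features, which is the intuition behind the attack's transferability.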

    Machine learning for the automation and optimisation of optical coordinate measurement

    Get PDF
    Camera-based methods for optical coordinate metrology are growing in popularity due to their non-contact probing technique, fast data acquisition, high point density and high surface coverage. However, these optical approaches are often highly user-dependent, depend strongly on accurate system characterisation, and can be slow in processing the raw data acquired during measurement. Machine learning approaches have the potential to remedy the shortcomings of such optical coordinate measurement systems. The aim of this thesis is to remove dependence on the user entirely by enabling full automation and optimisation of optical coordinate measurements for the first time. A novel software pipeline is proposed, built, and evaluated which enables automated and optimised measurements to be conducted; no such automated and optimised system for performing optical coordinate measurements currently exists. The pipeline can be roughly summarised as follows: intelligent characterisation -> view planning -> object pose estimation -> automated data acquisition -> optimised reconstruction. Several novel methods were developed to enable the embodiment of this pipeline. In Chapter 4, an intelligent camera characterisation (the process of determining a mathematical model of the optical system) is performed using a hybrid approach wherein an EfficientNet convolutional neural network provides sub-pixel corrections to feature locations supplied by the popular OpenCV library. The proposed characterisation scheme is shown to robustly refine the characterisation result, as quantified by a 50 % reduction in the mean residual magnitude. The camera characterisation is performed before measurements are taken, and the results are fed as an input to the pipeline. In Chapter 5, a novel genetic optimisation approach is presented to create an imaging strategy, i.e. the positions from which data should be captured relative to a part’s specific geometry.
    This approach exploits the computer-aided design (CAD) data of a given part, ensuring any measurement is optimal for a specific target geometry. This view planning approach is shown to give reconstructions with closer agreement to tactile coordinate measurement machine (CMM) results from 18 images than unoptimised measurements using 60 images. The view planning algorithm assumes the part is perfectly placed in the centre of the measurement volume, so it is first adjusted for an arbitrary placement of the part before being used for data acquisition. In Chapter 6, a generative model for the creation of surface texture data is presented, allowing the generation of synthetic but realistic datasets for the training of statistical models. The surface texture generated by the proposed model is shown to be quantitatively representative of real focus variation microscope measurements, and the model is used to produce large synthetic but realistic datasets for the training of further statistical models. In Chapter 7, an autonomous background removal approach is proposed which removes superfluous data from images captured during a measurement. Using images processed by this algorithm to reconstruct a 3D measurement of an object is shown to be effective in reducing data processing times and improving measurement results. Applying the proposed background removal to images before reconstruction is shown to yield up to a 41 % reduction in data processing times, a reduction in superfluous background points of up to 98 %, an increase in point density on the object surface of up to 10 %, and improved agreement with CMM results, as measured by both a reduction in outliers and a reduction of up to 51 microns in the standard deviation of point-to-mesh distances. The background removal algorithm is used both to improve the final reconstruction and within stereo pose estimation.
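The thesis's background removal is a trained model; as a minimal stand-in that conveys the role of the step, a fixed intensity threshold can mask background pixels before reconstruction (the threshold and toy image are invented for this sketch):

```python
import numpy as np

def remove_background(image, bg_threshold=40):
    """Mask out dark background pixels before 3D reconstruction.

    image: (H, W) greyscale array. Pixels at or below `bg_threshold`
    are treated as background and zeroed, so the reconstruction step
    never generates superfluous background points from them.
    """
    mask = image > bg_threshold
    return image * mask, mask

img = np.zeros((4, 6), dtype=np.uint8)
img[1:3, 2:5] = 200          # bright object on a dark background
cleaned, mask = remove_background(img)
print(int(mask.sum()))       # 6 object pixels kept
```

A learned segmentation replaces the fixed threshold in practice, but the downstream effect is the same: fewer pixels enter reconstruction, so processing is faster and fewer spurious background points are produced.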
    Finally, in Chapter 8, two methods (one monocular and one stereo) for establishing the initial pose of the part to be measured relative to the measurement volume are presented. This is an important step towards enabling automation, as it allows the user to place the object at an arbitrary location in the measurement volume and the pipeline to adjust the imaging strategy to account for this placement, enabling the optimised view plan to be carried out without the need for special part fixturing. It is shown that the monocular method can locate a part to within an average of 13 mm and the stereo method can locate a part to within an average of 0.44 mm, as evaluated on 240 test images. Pose estimation is used to correct the view plan for an arbitrary part placement without the need for specialised fixturing or fiducial marking. This pipeline enables an inexperienced user to place a part anywhere in the measurement volume of a system and, from the part’s associated CAD data, the system will perform an optimal measurement without the need for any user input. Each new method developed as part of this pipeline has been validated against real experimental data from current measurement systems and shown to be effective. In the future work given in Section 9.1, a possible hardware integration of the methods developed in this thesis is presented, although the creation of this hardware is beyond the scope of the thesis.
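The genetic view-planning idea of Chapter 5 can be sketched as evolving a fixed-size subset of candidate camera positions to maximise surface coverage (the coverage model, fitness, and GA settings below are illustrative assumptions, not the thesis's implementation):

```python
import random

def optimise_view_plan(candidate_views, coverage, n_select=4,
                       pop_size=30, generations=40, seed=0):
    """Evolve a set of `n_select` views that maximises surface coverage.

    candidate_views: list of view identifiers.
    coverage: dict mapping each view to the set of surface patches it
    sees (e.g. derived from the part's CAD data). Fitness is the number
    of distinct patches covered by the selected views.
    """
    rng = random.Random(seed)

    def fitness(plan):
        seen = set()
        for v in plan:
            seen |= coverage[v]
        return len(seen)

    population = [tuple(rng.sample(candidate_views, n_select))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]      # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_select)       # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.3:                 # mutation: swap a view
                child[rng.randrange(n_select)] = rng.choice(candidate_views)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)

# Toy example: 10 candidate views, each seeing 3 of 12 surface patches
views = list(range(10))
coverage = {v: {(v * 3 + k) % 12 for k in range(3)} for v in views}
best = optimise_view_plan(views, coverage, n_select=4)
covered = set().union(*(coverage[v] for v in best))
print(len(best), len(covered))
```

In the thesis the fitness would instead score reconstruction quality against the part's CAD geometry, but the selection-crossover-mutation loop is the same, which is how an optimised 18-image plan can outperform an unoptimised 60-image one.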

    Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey

    Full text link
    Autonomous systems possess the features of inferring their own state, understanding their surroundings, and performing autonomous navigation. With the application of learning systems, such as deep learning and reinforcement learning, the visual self-state estimation, environment perception and navigation capabilities of autonomous systems have been efficiently addressed, and many new learning-based algorithms have surfaced for autonomous visual perception and navigation. In this review, we focus on the applications of learning-based monocular approaches to ego-motion perception, environment perception and navigation in autonomous systems, which differs from previous reviews that discussed traditional methods. First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the necessity of integrating deep learning techniques. Second, we review visual environmental perception and understanding methods based on deep learning, including deep-learning-based monocular depth estimation, monocular ego-motion prediction, image enhancement, object detection, semantic segmentation, and their combinations with traditional vSLAM frameworks. Then, we focus on visual navigation based on learning systems, mainly reinforcement learning and deep reinforcement learning. Finally, we examine several challenges and promising directions identified in related research on learning systems in the era of computer science and robotics. Comment: This paper has been accepted by IEEE TNNL

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Full text link
    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate its latest developments and applications in these disciplines. However, the literature is lacking in exploring the applications of deep learning across all potential sectors. This paper therefore extensively investigates the potential applications of deep learning across all major fields of study, as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, making it a powerful computational tool, and has the ability to adapt and optimize itself, making it effective at processing data with little prior training. At the same time, deep learning necessitates massive amounts of data for effective analysis and processing. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures such as LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary. Comment: 64 pages, 3 figures, 3 table

    Searching for Sentient Design Tools for Game Development

    Get PDF

    Revolutionizing Dental Caries Diagnosis through Artificial Intelligence

    Get PDF
    The diagnosis and management of dental caries, a prevalent global oral health issue, have traditionally depended on clinical examination and the interpretation of radiographic images. However, with rapid advancements in technology, the landscape of dental diagnostics is transforming. This chapter delves into the revolutionary impact of artificial intelligence (AI) on detecting and managing dental caries. By harnessing the power of machine learning algorithms and image recognition technologies, dental professionals can now achieve enhanced diagnostic accuracy, even identifying early-stage caries that conventional methods might overlook. The integration of AI into dentistry promises improved patient outcomes by facilitating timely interventions and streamlining clinical workflows, potentially redefining the future of oral healthcare. While the prospects are promising, it is imperative to concurrently address the challenges and ethical considerations accompanying AI-driven diagnostics, to ensure that the technology augments, rather than supplants, the expertise of dental professionals. The chapter serves as a comprehensive overview of the current state of AI in dental caries diagnosis, its potential benefits, and the road ahead.