6,186 research outputs found

    Enhancing robustness in video recognition models : Sparse adversarial attacks and beyond

    Recent years have witnessed increasing interest in adversarial attacks on images, while adversarial video attacks have seldom been explored. In this paper, we propose a sparse adversarial attack strategy on videos (DeepSAVA). Our model aims to add a small, human-imperceptible perturbation to the key frame of the input video to fool the classifiers. To carry out an effective attack that mirrors real-world scenarios, our algorithm integrates spatial transformation perturbations into the frame. Instead of using a norm-based distance to gauge the disparity between the perturbed frame and the original frame, we employ the structural similarity index (SSIM), which has been established as a more suitable metric for quantifying image alterations resulting from spatial perturbations. We employ a unified optimisation framework to combine spatial transformation with additive perturbation, thereby attaining a more potent attack. We design an effective and novel optimisation scheme that alternately utilises Bayesian Optimisation (BO) to identify the most critical frame in a video and stochastic gradient descent (SGD) based optimisation to produce both additive and spatially transformed perturbations. Doing so enables DeepSAVA to perform a very sparse attack on videos while maintaining human imperceptibility and still achieving state-of-the-art performance in terms of both attack success rate and adversarial transferability. Furthermore, building upon the strong perturbations produced by DeepSAVA, we design a novel adversarial training framework to improve the robustness of video classification models. Our extensive experiments on various types of deep neural networks and video datasets confirm the superiority of DeepSAVA in terms of attack performance and efficiency. Compared to the baseline techniques, DeepSAVA exhibits the highest level of performance in generating adversarial videos for three distinct video classifiers. Remarkably, it achieves an impressive fooling rate ranging from 99.5% to 100% for the I3D model, with the perturbation of just a single frame. Additionally, DeepSAVA demonstrates favourable transferability across various time-series models. The proposed adversarial training strategy is also empirically demonstrated to train more robust video classifiers than state-of-the-art adversarial training with a projected gradient descent (PGD) adversary.
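
    A rough sense of the alternating scheme can be conveyed with the sketch below, which is an illustrative simplification rather than the authors' implementation: it replaces Bayesian Optimisation with a gradient-magnitude heuristic for key-frame selection, omits the spatial-transformation component, and uses a simplified whole-frame SSIM penalty. The model, tensor shapes, and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified whole-frame SSIM (the real metric averages over local windows).
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def choose_key_frame(model, video, label):
    # Stand-in for the paper's BO step: rank frames by gradient magnitude.
    v = video.clone().requires_grad_(True)
    F.cross_entropy(model(v.unsqueeze(0)), label).backward()
    return int(v.grad.abs().flatten(1).sum(dim=1).argmax())

def attack_key_frame(model, video, label, steps=50, lr=1e-2, lam=5.0):
    # video: (T, C, H, W) in [0, 1]; label: LongTensor of shape (1,).
    t = choose_key_frame(model, video, label)
    delta = torch.zeros_like(video[t], requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        adv = video.clone()
        adv[t] = (video[t] + delta).clamp(0, 1)
        logits = model(adv.unsqueeze(0))
        # Maximise classification loss while penalising loss of structural similarity.
        loss = -F.cross_entropy(logits, label) + lam * (1 - ssim_global(adv[t], video[t]))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return t, delta.detach()
```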

    Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence

    Recent years have seen tremendous growth in Artificial Intelligence (AI)-based methodological development across a broad range of domains. In this rapidly evolving field, a large number of methods are being reported that use machine learning (ML) and Deep Learning (DL) models. The majority of these models are inherently complex and lack explanations of their decision-making process, causing them to be termed 'black-box' models. One of the major bottlenecks to adopting such models in mission-critical application domains, such as banking, e-commerce, healthcare, and public services and safety, is the difficulty in interpreting them. Due to the rapid proliferation of these AI models, explaining their learning and decision-making processes, which requires transparency and predictability, is becoming harder. Aiming to collate the current state-of-the-art in interpreting black-box models, this study provides a comprehensive analysis of explainable AI (XAI) models. Finding flaws in these black-box models, in order to reduce their false negative and false positive outcomes, remains difficult and inefficient. In this paper, the development of XAI is reviewed meticulously through careful selection and analysis of the current state-of-the-art of XAI research. It also provides a comprehensive and in-depth evaluation of XAI frameworks and their efficacy, serving as a starting point in XAI for applied and theoretical researchers. Towards the end, it highlights emerging and critical issues in XAI research and showcases major, model-specific trends for better explanation, enhanced transparency, and improved prediction accuracy.

    Sustainable Collaboration: Federated Learning for Environmentally Conscious Forest Fire Classification in Green Internet of Things (IoT)

    Forests are an invaluable natural resource, playing a crucial role in the regulation of both local and global climate patterns. Additionally, they offer a plethora of benefits such as medicinal plants, food, and non-timber forest products. However, with the growing global population, the demand for forest resources has escalated, leading to a decline in their abundance. The reduction in forest density has detrimental impacts on global temperatures and raises the likelihood of forest fires. To address these challenges, this paper introduces a Federated Learning framework empowered by the Internet of Things (IoT). The proposed framework integrates an intelligent system that leverages cameras strategically mounted in areas highly vulnerable to forest fires. This integration enables the timely detection and monitoring of forest fire occurrences and helps avert major catastrophes. The framework incorporates the Federated Stochastic Gradient Descent (FedSGD) technique to aggregate the global model in the cloud. The dataset employed in this study comprises two classes, fire and non-fire images, and is distributed among five nodes, allowing each node to independently train the model on its own device. Following local training, the learned parameters are shared with the cloud for aggregation, ensuring a collective and comprehensive global model. The effectiveness of the proposed framework is assessed by comparing its performance metrics with recent work. The proposed algorithm achieved an accuracy of 99.27% and stands out by leveraging the concept of collaborative learning. This approach distributes the workload among nodes, relieving the server of excessive burden. Each node is empowered to obtain the best possible model for classification, even if it possesses limited data. This collaborative learning paradigm enhances the overall efficiency and effectiveness of the classification process, ensuring optimal results in scenarios where data availability may be constrained.
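
    The FedSGD aggregation described above can be sketched in a few lines. The snippet below is a hypothetical, simplified round (the model, node batches, and learning rate are placeholders, and the paper's exact setup may differ): each node computes gradients of the shared global model on its local fire/non-fire batch, and the cloud applies a size-weighted average of those gradients in a single step.

```python
import copy
import torch
import torch.nn.functional as F

def local_gradients(global_model, batch):
    # Each node evaluates the shared global model on its own data and returns gradients.
    model = copy.deepcopy(global_model)
    x, y = batch
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return [p.grad.clone() for p in model.parameters()], len(y)

def fedsgd_round(global_model, node_batches, lr=0.01):
    # Cloud aggregation: size-weighted average of node gradients, then one global step.
    results = [local_gradients(global_model, b) for b in node_batches]
    total = sum(n for _, n in results)
    with torch.no_grad():
        for i, p in enumerate(global_model.parameters()):
            avg_grad = sum(g[i] * (n / total) for g, n in results)
            p -= lr * avg_grad
    return global_model
```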

    Bridging formal methods and machine learning with model checking and global optimisation

    Formal methods and machine learning are two research fields with drastically different foundations and philosophies. Formal methods utilise mathematically rigorous techniques for the specification, development and verification of software and hardware systems. Machine learning focuses on pragmatic approaches that gradually improve a parameterised model by observing a training data set. While the two fields have historically lacked communication, this has changed in the past few years with a surge of research interest in the robustness verification of neural networks. This paper briefly reviews these works and focuses on the urgent need for broader and more in-depth communication between the two fields, with the ultimate goal of developing learning-enabled systems with excellent performance and acceptable safety and security. We present a specification language, MLS2, and show that it can express a set of known safety and security properties, including generalisation, uncertainty, robustness, data poisoning, backdoors, model stealing, membership inference, model inversion, interpretability, and fairness. To verify MLS2 properties, we promote global optimisation-based methods, which have provable guarantees of convergence to the optimal solution; many of them also have theoretical bounds on the gap between the current solution and the optimum.
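
    As a generic illustration of the kind of property such global optimisation methods target (a textbook-style formulation, not the MLS2 syntax), local robustness of a classifier f at an input x with true class y within an epsilon-ball can be written as:

```latex
% Generic robustness-verification objective (illustrative; not the MLS2 notation):
\min_{x' \,:\, \|x' - x\|_p \le \epsilon} \Big( f_y(x') - \max_{j \ne y} f_j(x') \Big) > 0
```

    The property holds exactly when this global minimum is positive, and Lipschitz-based global optimisers can additionally bound how far the current best solution lies from that minimum.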

    Capsule networks with residual pose routing

    Capsule networks (CapsNets) are known to be difficult to scale to the deeper architectures that are desirable for high performance in the deep learning era, owing to their complex capsule routing algorithms. In this article, we present a simple yet effective capsule routing algorithm, residual pose routing. Specifically, the higher-layer capsule pose is obtained through an identity mapping of the adjacent lower-layer capsule pose. This simple residual pose routing has two advantages: 1) it reduces the routing computation complexity and 2) it avoids gradient vanishing thanks to its residual learning framework. On top of that, we explicitly reformulate the capsule layers by building a residual pose block. Stacking multiple such blocks yields a deep residual CapsNet (ResCaps) with a ResNet-like architecture. Results on MNIST, AffNIST, SmallNORB, and CIFAR-10/100 show the effectiveness of ResCaps for image classification. Furthermore, we successfully extend our residual pose routing to large-scale real-world applications, including 3-D object reconstruction and classification and 2-D saliency dense prediction. The source code has been released at https://github.com/liuyi1989/ResCaps
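
    A minimal guess at what a residual pose block might look like is sketched below. It is illustrative only (layer sizes and the residual transform are assumptions); the released repository linked above is authoritative.

```python
import torch
import torch.nn as nn

class ResidualPoseBlock(nn.Module):
    """Higher-layer poses = identity mapping of lower-layer poses + a learned residual."""
    def __init__(self, pose_dim):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(pose_dim, pose_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pose_dim, pose_dim),
        )

    def forward(self, poses):                  # poses: (batch, num_capsules, pose_dim)
        return poses + self.residual(poses)    # identity shortcut replaces iterative routing

# Stacking many such blocks gives a deep, ResNet-like capsule network.
deep_caps = nn.Sequential(*[ResidualPoseBlock(pose_dim=16) for _ in range(10)])
poses = torch.randn(8, 32, 16)                 # 8 images, 32 capsules, 16-dim poses
higher_poses = deep_caps(poses)
```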

    Modelling phenomenological differences in aetiologically distinct visual hallucinations using deep neural networks

    Visual hallucinations (VHs) are perceptions of objects or events in the absence of the sensory stimulation that would normally support such perceptions. Although all VHs share this core characteristic, there are substantial phenomenological differences between VHs that have different aetiologies, such as those arising from neurodegenerative conditions, visual loss, or psychedelic compounds. Here, we examine the potential mechanistic basis of these differences by leveraging recent advances in visualising the learned representations of a coupled classifier and generative deep neural network—an approach we call ‘computational (neuro)phenomenology’. Examining three aetiologically distinct populations in which VHs occur—neurodegenerative conditions (Parkinson’s disease and Lewy body dementia), visual loss (Charles Bonnet Syndrome, CBS), and psychedelics—we identified three dimensions relevant to distinguishing these classes of VHs: realism (veridicality), dependence on sensory input (spontaneity), and complexity. By selectively tuning the parameters of the visualisation algorithm to reflect influence along each of these phenomenological dimensions, we were able to generate ‘synthetic VHs’ that were characteristic of the VHs experienced by each aetiology. We verified the validity of this approach experimentally in two studies that examined the phenomenology of VHs in neurodegenerative and CBS patients, and in people with recent psychedelic experience. These studies confirmed the existence of phenomenological differences across these three dimensions between groups and, crucially, found that the appropriate synthetic VHs were rated as representative of each group’s hallucinatory phenomenology. Together, our findings highlight the phenomenological diversity of VHs associated with distinct causal factors and demonstrate how a neural network model of visual phenomenology can successfully capture the distinctive visual characteristics of hallucinatory experience.

    An embodied approach to informational interventions: using conceptual metaphors to promote sustainable healthy diets

    Poor diet quality and environmental degradation are two major challenges of our times. Unhealthy and unsustainable dietary practices, such as the overconsumption of meat and consumer food waste behaviour, contribute greatly to both issues. Across seventeen online and field experiments, in two different cultures (US and China), this thesis investigates whether the embodied cognition approach, and more specifically research on conceptual metaphors, can be used to develop interventions that promote sustainable healthy diets. Interventions relying on conceptual metaphors have been shown to stimulate attitudinal and behavioural changes in other fields (e.g., marketing and political communications), but are rarely adopted to encourage sustainable healthy diets. To fill this gap in the literature, I conducted five sets of experimental studies examining the effects of different metaphors on specific sustainable healthy dietary practices, each of which forms an independent empirical paper (Chapters 2-6 of the thesis). After introducing the current perspectives on embodied cognition and conceptual metaphors in the context of this research (Chapter 1), Chapter 2 looks into the conceptual metaphor “Healthy is Up”, demonstrating that US people implicitly associate healthiness with verticality, and offering recommendations for healthy eating guidelines. Chapter 3 extends this research to Chinese samples and partially replicates the results. Chapter 4 shows that the anthropomorphic metaphor “Animals are Friends” discourages meat consumption by inducing anticipatory guilt among US omnivores, whereas Chapter 5 reveals that Chinese omnivores are more responsive to another anthropomorphic metaphor, namely “Animals are Family”. Bringing lab insights to the real world, Chapter 6 demonstrates with a longitudinal field experiment that anthropomorphic metaphors together with environmental feedback result in a greater reduction in food waste compared with other feedback interventions. The strengths, limitations and implications of these empirical papers are discussed in the concluding part of the thesis.

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thereafter is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models on a diverse range of downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalises to real-world transformations. This begins to explain the differing empirical performance achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with the downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
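
    To make the augmentation-invariance discussion concrete, the following sketch shows a generic InfoNCE-style contrastive objective over two augmented views. It is a textbook-style illustration rather than code from the thesis, and the batch size, embedding dimension, and temperature are arbitrary.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1, z2: (batch, dim) embeddings of two differently augmented views of the same images.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # cosine similarity of every view-1 to every view-2
    targets = torch.arange(z1.size(0))      # matching pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

# The augmentations that produce the two views determine which invariances
# (spatial vs. appearance-based) the encoder is pushed to learn.
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
loss = info_nce(z1, z2)
```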

    Dermoscopic dark corner artifacts removal: Friend or foe?

    Background and Objectives: One of the more significant obstacles in the classification of skin cancer is the presence of artifacts. This paper investigates the effect of dark corner artifacts, which result from the use of dermoscopes, on the performance of a deep learning binary classification task. Previous research attempted to remove and inpaint dark corner artifacts with the intention of creating ideal conditions for models. However, such research has been shown to be inconclusive due to a lack of available datasets with corresponding labels for dark corner artifact cases. Methods: To address these issues, we label 10,250 skin lesion images from publicly available datasets and introduce a balanced dataset with an equal number of melanoma and non-melanoma cases. The training set comprises 6126 images without artifacts, and the testing set comprises 4124 images with dark corner artifacts. We conduct three experiments to provide new understanding of the effects of dark corner artifacts, including inpainted and synthetically generated examples, on a deep learning method. Results: Our results suggest that superimposing synthetic dark corner artifacts onto the training set improved model performance, particularly in terms of the true negative rate. This indicates that, when dark corner artifacts were introduced into the training set, the model learnt to ignore them rather than treating them as melanoma. Further, we propose a new approach to quantifying heatmaps that indicate network focus, using a root mean square measure of the brightness intensity in different regions of the heatmaps. Conclusions: The proposed artifact methods can be used in future experiments to help alleviate possible impacts on model performance. Additionally, the newly proposed heatmap quantification analysis will help to better understand the relationships between heatmap results and other model performance metrics.
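
    The two methodological ideas above can be illustrated with a short sketch; this is a hypothetical simplification (the paper's exact artifact geometry and heatmap regions may differ): a circular vignette superimposes a synthetic dark corner artifact, and a heatmap region's response is summarised by the root mean square of its intensities.

```python
import numpy as np

def add_dark_corners(image, radius_frac=0.95):
    # image: (H, W, C) float array in [0, 1]; pixels outside a centred disc are set to black,
    # mimicking the vignette left by a dermoscope.
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    r = radius_frac * min(h, w) / 2
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= r ** 2
    return image * mask[..., None]

def region_rms(heatmap, region_mask):
    # heatmap: (H, W) saliency map; region_mask: boolean mask selecting, e.g., lesion or corner pixels.
    values = heatmap[region_mask]
    return float(np.sqrt(np.mean(values ** 2)))
```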