    Machine Learning Models Interpretability for Malware Detection Using Model Agnostic Language for Exploration and Explanation

    The adoption of the internet as a global platform has driven a significant rise in cyber-attacks of various forms, including Trojans, worms, spyware, ransomware, botnet malware, and rootkits. Tackling all these forms of malware requires understanding and detecting them. Malware detection methods include signature-based, behavioral, and machine learning approaches, and machine learning has proven the most effective of these. In this thesis, a system is proposed that combines signature-based and dynamic behavior-based detection techniques with an added machine learning layer that has model explainability capability. This hybrid system provides not only predictions but also their interpretation and explanation for the malware detection task. The machine learning layer can be Logistic Regression, Random Forest, Naive Bayes, Decision Tree, or Support Vector Machine. Empirical performance evaluation results on publicly available datasets and manually acquired samples (both benign and malicious) are used to compare the five machine learning algorithms. DALEX (moDel Agnostic Language for Exploration and eXplanation) is integrated into the proposed hybrid approach to support interpretation and understanding of the predictions and to improve the trust of cyber security stakeholders in complex machine learning predictive models.
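
    As a rough illustration of how such an explainable detection layer can be assembled, the sketch below trains a Random Forest on synthetic features and wraps it in a DALEX explainer. It assumes the Python dalex and scikit-learn packages; the feature names and data are placeholders, not the thesis's actual pipeline.

```python
# Sketch: a machine learning detection layer with DALEX-based explanations.
# The features and labels below are illustrative placeholders.
import numpy as np
import pandas as pd
import dalex as dx
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical signature/behavioral features extracted from samples.
X = pd.DataFrame(rng.random((500, 4)),
                 columns=["api_call_count", "entropy", "num_imports", "signature_hits"])
y = (X["entropy"] + 0.5 * X["signature_hits"] > 1.0).astype(int)  # 1 = malicious

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Wrap the fitted model in a DALEX explainer.
explainer = dx.Explainer(clf, X_train, y_train, label="RF malware detector")

# Global view: permutation-based variable importance.
print(explainer.model_parts().result)

# Local view: break-down explanation of a single prediction.
print(explainer.predict_parts(X_test.iloc[[0]], type="break_down").result)
```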

    Explainable Artificial Intelligence for Healthcare Applications Using Random Forest Classifier with LIME and SHAP

    With the advances in computationally efficient artificial intelligence (AI) techniques and their numerous applications in our everyday life, there is a pressing need to understand, through more detailed explanations, the computational details hidden in black-box AI techniques such as the most popular machine learning and deep learning techniques. Explainable AI (xAI) originated from these challenges and has recently gained more attention from researchers seeking to add explainability comprehensively to traditional AI systems. This has led to the development of appropriate frameworks for successful application of xAI in real-life scenarios with respect to innovation, risk mitigation, ethical issues, and logical value to users. In this book chapter, an in-depth analysis of several xAI frameworks and methods, including LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations), is provided. A Random Forest Classifier is used as the black-box AI on a publicly available diabetes symptoms dataset, with LIME and SHAP for better interpretation. The results obtained are interesting in terms of transparency, validity, and trustworthiness in diabetes disease prediction.
    Comment: Chapter 6, accepted book chapter in Transparent, Interpretable and Explainable AI Systems, BK Tripathy & Hari Seetha (Editors), CRC Press, May 202
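
    A minimal sketch of the chapter's general recipe, assuming the lime, shap, and scikit-learn Python packages; the symptom features and labels below are synthetic stand-ins, not the diabetes dataset used in the chapter.

```python
# Sketch: Random Forest as the black-box model, explained locally with LIME
# and with SHAP. The symptom table below is a synthetic placeholder.
import numpy as np
import pandas as pd
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
features = ["polyuria", "polydipsia", "sudden_weight_loss", "fatigue", "age"]
X = pd.DataFrame(rng.integers(0, 2, size=(600, 4)), columns=features[:4])
X["age"] = rng.integers(20, 80, size=600)
y = ((X["polyuria"] + X["polydipsia"] >= 2) | (X["age"] > 65)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)

# LIME: local surrogate explanation for one patient.
lime_exp = LimeTabularExplainer(X_train.values, feature_names=features,
                                class_names=["negative", "positive"],
                                mode="classification")
print(lime_exp.explain_instance(X_test.values[0], rf.predict_proba).as_list())

# SHAP: exact tree-based Shapley attributions for the same patient
# (the array layout differs slightly across shap versions).
shap_values = shap.TreeExplainer(rf).shap_values(X_test.iloc[[0]])
print(np.asarray(shap_values).squeeze().round(3))
```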

    INSOMNIA: Towards Concept-Drift Robustness in Network Intrusion Detection

    Despite decades of research in network traffic analysis and incredible advances in artificial intelligence, network intrusion detection systems based on machine learning (ML) have yet to prove their worth. One core obstacle is the existence of concept drift, an issue for all adversary-facing security systems. Additionally, specific challenges set intrusion detection apart from other ML-based security tasks, such as malware detection. In this work, we offer a new perspective on these challenges. We propose INSOMNIA, a semi-supervised intrusion detector which continuously updates the underlying ML model as network traffic characteristics are affected by concept drift. We use active learning to reduce latency in the model updates, label estimation to reduce labeling overhead, and apply explainable AI to better interpret how the model reacts to the shifting distribution. To evaluate INSOMNIA, we extend TESSERACT - a framework originally proposed for performing sound time-aware evaluations of ML-based malware detectors - to the network intrusion domain. Our evaluation shows that accounting for drifting scenarios is vital for effective intrusion detection systems.
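
    The sketch below illustrates the general pattern behind such drift-aware updates: pick the most uncertain new flows (active learning), give them cheap estimated labels, and retrain. It is a generic scikit-learn illustration under those assumptions, not INSOMNIA's actual architecture or the TESSERACT evaluation.

```python
# Generic sketch of a drift-aware update step: uncertainty-based sample
# selection plus nearest-centroid label estimation, followed by retraining.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestCentroid

def update_model(model, X_labeled, y_labeled, X_new, budget=100):
    """Select the `budget` most uncertain new flows, estimate their labels
    cheaply, and retrain on the enlarged training set."""
    proba = model.predict_proba(X_new)
    uncertainty = 1.0 - proba.max(axis=1)        # low top-class probability = uncertain
    picked = np.argsort(uncertainty)[-budget:]   # most uncertain flows
    # Label estimation: a cheap stand-in for manual labelling of the picked flows.
    labeller = NearestCentroid().fit(X_labeled, y_labeled)
    y_est = labeller.predict(X_new[picked])
    X_upd = np.vstack([X_labeled, X_new[picked]])
    y_upd = np.concatenate([y_labeled, y_est])
    retrained = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_upd, y_upd)
    return retrained, X_upd, y_upd
```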

    UAV Remote Sensing for High-Throughput Phenotyping and for Yield Prediction of Miscanthus by Machine Learning Techniques

    Miscanthus holds great potential in the frame of the bioeconomy, and yield prediction can help improve the Miscanthus logistic supply chain. Breeding programs in several countries are attempting to produce high-yielding Miscanthus hybrids better adapted to different climates and end-uses. Multispectral images acquired from unmanned aerial vehicles (UAVs) in Italy and in the UK in 2021 and 2022 were used to investigate the feasibility of high-throughput phenotyping (HTP) of novel Miscanthus hybrids for yield prediction and crop trait estimation. An intercalibration procedure was performed using simulated data from the PROSAIL model to link vegetation indices (VIs) derived from two different multispectral sensors. The random forest algorithm estimated yield traits (light interception, plant height, green leaf biomass, and standing biomass) with good accuracy from a VI time series, and predicted yield from a peak descriptor derived from the VI time series with a root mean square error (RMSE) of 2.3 Mg DM ha−1. The study demonstrates the potential of UAV multispectral images in HTP applications and in yield prediction, providing important information needed to increase sustainable biomass production.
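
    A minimal sketch of the yield-prediction step, assuming scikit-learn and synthetic peak vegetation-index descriptors in place of the UAV data: a random forest regressor scored with RMSE in the same units the study reports.

```python
# Sketch: random-forest yield prediction from vegetation-index (VI) peak
# descriptors, scored with RMSE. All data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Hypothetical per-plot descriptors: peak NDVI, peak NDRE, day-of-year of the peak.
X = np.column_stack([rng.uniform(0.4, 0.9, 300),
                     rng.uniform(0.2, 0.6, 300),
                     rng.integers(180, 240, 300)])
y = 25 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 1.5, 300)   # yield, Mg DM ha-1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
rf = RandomForestRegressor(n_estimators=500, random_state=1).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.2f} Mg DM ha-1")
```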

    The Blind Oracle, eXplainable Artificial Intelligence (XAI) and human agency

    An explainable machine learning model is a requirement for trust. Without it, the human operator cannot form a correct mental model and will distrust and reject the machine learning model. Nobody will ever trust a system which exhibits apparently erratic behaviour. eXplainable AI (XAI) techniques try to uncover how a model works internally and why it makes some predictions and not others. But the ultimate objective is to use these techniques to guide the training and deployment of fair automated decision systems that support human agency and are beneficial to humanity. In addition, automated decision systems based on machine learning models are being used for an increasing number of purposes. However, the use of black-box models and the massive quantities of data used to train them make the deployed models inscrutable. Consequently, predictions made by systems integrating these models might provoke rejection by their users when they make seemingly arbitrary predictions. Moreover, the risk is compounded by the use of models in high-risk environments or in situations where the predictions might have serious consequences.
    Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Máster en Ingeniería Informática

    Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability

    Explaining AI systems is fundamental both to the development of high performing models and to the trust placed in them by their users. The Shapley framework for explainability has strength in its general applicability combined with its precise, rigorous foundation: it provides a common, model-agnostic language for AI explainability and uniquely satisfies a set of intuitive mathematical axioms. However, Shapley values are too restrictive in one significant regard: they ignore all causal structure in the data. We introduce a less restrictive framework, Asymmetric Shapley values (ASVs), which are rigorously founded on a set of axioms, applicable to any AI system, and flexible enough to incorporate any causal structure known to be respected by the data. We demonstrate that ASVs can (i) improve model explanations by incorporating causal information, (ii) provide an unambiguous test for unfair discrimination in model predictions, (iii) enable sequentially incremental explanations in time-series models, and (iv) support feature-selection studies without the need for model retraining.
    Comment: To appear in NeurIPS 2020; 9 pages, 2 figures, 2 appendices
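
    The toy sketch below shows the core distinction for a hypothetical three-feature set function: classic Shapley values average each feature's marginal contribution over all feature orderings, while ASVs restrict the average to orderings consistent with known causal precedence. The value function and the causal ordering are illustrative assumptions, not taken from the paper.

```python
# Toy sketch: symmetric Shapley values average over all orderings; asymmetric
# Shapley values (ASVs) average only over orderings consistent with a known
# causal precedence. The value function v below is an arbitrary illustration.
from itertools import permutations

def shapley(v, features, allowed=None):
    """Average marginal contribution of each feature over feature orderings.
    `allowed(order)` keeps only orderings consistent with causal knowledge
    (ASV); with allowed=None all orderings are used (classic Shapley)."""
    orders = [p for p in permutations(features) if allowed is None or allowed(p)]
    phi = {f: 0.0 for f in features}
    for order in orders:
        seen = set()
        for f in order:
            phi[f] += (v(seen | {f}) - v(seen)) / len(orders)
            seen.add(f)
    return phi

# Illustrative value function over features {0, 1, 2}: feature 2 only adds
# value together with its (assumed) causal ancestor, feature 0.
v = lambda S: 1.0 * (0 in S) + 0.5 * (1 in S) + 0.25 * (0 in S and 2 in S)
features = (0, 1, 2)

print(shapley(v, features))                                   # symmetric Shapley
# ASV: count only orderings where the ancestor 0 precedes its descendant 2.
print(shapley(v, features, allowed=lambda p: p.index(0) < p.index(2)))
```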