791 research outputs found

    How is a data-driven approach better than random choice in label space division for multi-label classification?

    Full text link
    We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurence graph (both weighted an unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use gini-index based Decision Trees as the base classifier. We compare educated approaches to label space divisions against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated guess approaches are more likely to outperform RAkELd than otherwise in all measures, but Hamming Loss. We show that fastgreedy and walktrap community detection methods on weighted label co-occurence graphs are 85-92% more likely to yield better F1 scores than random partitioning. Infomap on the unweighted label co-occurence graphs is on average 90% of the times better than random paritioning in terms of Subset Accuracy and 89% when it comes to Jaccard similarity. Weighted fastgreedy is better on average than RAkELd when it comes to Hamming Loss

    Predicting supply chain risks using machine learning: The trade-off between performance and interpretability

    Get PDF
    Managing supply chain risks has received increased attention in recent years, aiming to shield supply chains from disruptions by predicting their occurrence and mitigating their adverse effects. At the same time, the resurgence of Artificial Intelligence (AI) has led to the investigation of machine learning techniques and their applicability in supply chain risk management. However, most works focus on prediction performance and neglect the importance of interpretability so that results can be understood by supply chain practitioners, helping them make decisions that can mitigate or prevent risks from occurring. In this work, we first propose a supply chain risk prediction framework using data-driven AI techniques and relying on the synergy between AI and supply chain experts. We then explore the trade-off between prediction performance and interpretability by implementing and applying the framework on the case of predicting delivery delays in a real-world multi-tier manufacturing supply chain. Experiment results show that prioritising interpretability over performance may require a level of compromise, especially with regard to average precision scores

    Military and Security Applications: Cybersecurity (Encyclopedia of Optimization, Third Edition)

    Get PDF
    The domain of cybersecurity is growing as part of broader military and security applications, and the capabilities and processes in this realm have qualities and characteristics that warrant using solution methods in mathematical optimization. Problems of interest may involve continuous or discrete variables, a convex or non-convex decision space, differing levels of uncertainty, and constrained or unconstrained frameworks. Cyberattacks, for example, can be modeled using hierarchical threat structures and may involve decision strategies from both an organization or individual and the adversary. Network traffic flow, intrusion detection and prevention systems, interconnected human-machine interfaces, and automated systems – these all require higher levels of complexity in mathematical optimization modeling and analysis. Attributes such as cyber resiliency, network adaptability, security capability, and information technology flexibility – these require the measurement of multiple characteristics, many of which may involve both quantitative and qualitative interpretations. And for nearly every organization that is invested in some cybersecurity practice, decisions must be made that involve the competing objectives of cost, risk, and performance. As such, mathematical optimization has been widely used and accepted to model important and complex decision problems, providing analytical evidence for helping drive decision outcomes in cybersecurity applications. In the paragraphs that follow, this chapter highlights some of the recent mathematical optimization research in the body of knowledge applied to the cybersecurity space. The subsequent literature discussed fits within a broader cybersecurity domain taxonomy considering the categories of analyze, collect and operate, investigate, operate and maintain, oversee and govern, protect and defend, and securely provision. Further, the paragraphs are structured around generalized mathematical optimization categories to provide a lens to summarize the existing literature, including uncertainty (stochastic programming, robust optimization, etc.), discrete (integer programming, multiobjective, etc.), continuous-unconstrained (nonlinear least squares, etc.), continuous-constrained (global optimization, etc.), and continuous-constrained (nonlinear programming, network optimization, linear programming, etc.). At the conclusion of this chapter, research implications and extensions are offered to the reader that desires to pursue further mathematical optimization research for cybersecurity within a broader military and security applications context

    User Modeling via Machine Learning and Rule-Based Reasoning to Understand and Predict Errors in Survey Systems

    Get PDF
    User modeling is traditionally applied to systems were users have a large degree of control over their goals, the content they view, and the manner in which they navigate through the system. These systems aim to both recommend useful goals to users and to assist them in achieving perceived goals. Systems such as online or telephone surveys are different in that users have only a singular goal of survey completion, extremely limited control over navigation, and content is restricted to prescribed set of survey tasks; changing the user modeling problem to one in which the best means of assisting users is to identify rare-actions hazardous to their singular goal, by observing their interactions with common contexts. With this goal in mind, predictive mechanisms based on a combination of Machine Learning classifiers and survey domain knowledge encapsulated in sets of rules are developed that utilize user behavioral, demographic, and survey state data in order to predict when user actions leading to irreparable harm to the user\u27s singular goal of successful survey completion will occur. We show that despite a large class imbalance problem associated with detecting these actions and their associated users, we are able to predict such actions at a rate better than random guessing and that the application of domain knowledge via rule-sets improves performance further. We also identify traits of surveys and users that are associated with rare-action incidence. For future work, it is recommended that existence of potential sub-concepts related to users who perform these rare-actions be explored, as well as exploring alternative means of identifying such users, and that system adaptations be developed that can prevent users from performing these rare and harmful actions. Advisor: LeenKiat So
    corecore