962 research outputs found

    A Data-Driven Decision Support System for Scoliosis Prognosis


    Principled Data-Driven Decision Support for Cyber-Forensic Investigations

    In the wake of a cybersecurity incident, it is crucial to promptly discover how the threat actors breached security in order to assess the impact of the incident and to develop and deploy countermeasures that can protect against further attacks. To this end, defenders can launch a cyber-forensic investigation, which discovers the techniques that the threat actors used in the incident. A fundamental challenge in such an investigation is prioritizing which techniques to investigate, since investigating each technique requires time and effort, but forensic analysts cannot know which ones were actually used before investigating them. To ensure prompt discovery, it is imperative to provide decision support that can help forensic analysts with this prioritization. A recent study demonstrated that data-driven decision support, based on a dataset of prior incidents, can provide state-of-the-art prioritization. However, this data-driven approach, called DISCLOSE, is based on a heuristic that utilizes only a subset of the available information and does not approximate optimal decisions. To improve upon this heuristic, we introduce a principled approach for data-driven decision support for cyber-forensic investigations. We formulate the decision-support problem as a Markov decision process whose states represent the states of a forensic investigation. To solve the decision problem, we propose a method based on Monte Carlo tree search, which relies on k-NN regression over prior incidents to estimate state-transition probabilities. We evaluate our proposed approach on multiple versions of the MITRE ATT&CK dataset, which is a knowledge base of adversarial techniques and tactics based on real-world cyber incidents, and demonstrate that our approach outperforms DISCLOSE in terms of techniques discovered per effort spent.
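
As a rough illustration of the k-NN idea named in the abstract above, the sketch below estimates, for each technique not yet investigated, the probability that it was used by averaging over the k prior incidents most similar to the current investigation state. The similarity measure, the toy incident sets, the technique identifiers used as labels, and the function name `knn_technique_probabilities` are all assumptions made for illustration. This is not the paper's implementation, and it leaves out the Monte Carlo tree search entirely.

```python
from typing import Dict, List, Set


def knn_technique_probabilities(
    prior_incidents: List[Set[str]],
    confirmed: Set[str],
    ruled_out: Set[str],
    k: int = 3,
) -> Dict[str, float]:
    """Estimate P(technique was used) from the k most similar prior incidents."""

    def similarity(incident: Set[str]) -> int:
        # Assumed similarity: +1 for each confirmed technique the incident
        # contains, +1 for each ruled-out technique the incident lacks.
        return len(confirmed & incident) + len(ruled_out - incident)

    # sorted() is stable, so equally similar incidents keep dataset order.
    neighbours = sorted(prior_incidents, key=similarity, reverse=True)[:k]

    # Only techniques whose status is still unknown need an estimate.
    candidates = set().union(*prior_incidents) - confirmed - ruled_out
    return {
        technique: sum(technique in inc for inc in neighbours) / len(neighbours)
        for technique in sorted(candidates)
    }


# Toy dataset of prior incidents (hypothetical, for illustration only).
incidents = [
    {"T1566", "T1059", "T1003"},
    {"T1566", "T1059", "T1486"},
    {"T1190", "T1059", "T1486"},
    {"T1190", "T1003"},
]

probs = knn_technique_probabilities(
    incidents, confirmed={"T1566"}, ruled_out={"T1190"}, k=2
)
for technique, p in probs.items():
    print(technique, p)
```

An analyst-facing tool could rank the remaining techniques by these estimates. A planner such as the one the abstract describes would instead use them as transition probabilities inside its search.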

    Fuzzy competence model drift detection for data-driven decision support systems

    © 2017 Elsevier B.V. This paper focuses on concept drift in business intelligence and data-driven decision support systems (DSSs). Conventional static DSSs assume a fixed data distribution, which renders them inaccurate and unable to make correct decisions when concept drift occurs. It is therefore important to know when, how, and where concept drift occurs so that a DSS can adjust its decision processing knowledge at the appropriate time to adapt to an ever-changing environment. This paper presents a data distribution-based concept drift detection method called fuzzy competence model drift detection (FCM-DD). By introducing fuzzy set theory and replacing crisp boundaries with fuzzy ones, we have improved the competence model to provide a better, more refined empirical distribution of the data stream. FCM-DD requires no prior knowledge of the underlying distribution and provides a statistical guarantee of the reliability of the detected drift, based on the theory of bootstrapping. A series of experiments shows that the proposed FCM-DD method detects drift more accurately, has good sensitivity, and is robust.
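
To make the idea of distribution-based drift detection with a resampling-based significance level more concrete, here is a minimal sketch. It is not FCM-DD: it swaps in a plain permutation test on the difference of window means, with no competence model and no fuzzy sets. The function name `drift_p_value`, the window contents, and the resampling count are assumptions chosen for illustration.

```python
import random
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def drift_p_value(
    reference: Sequence[float],
    recent: Sequence[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Permutation-test p-value for 'both windows share one distribution'.

    The test statistic is the absolute difference of the window means.
    A small p-value suggests that the recent window has drifted.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = abs(_mean(reference) - _mean(recent))
    pooled = list(reference) + list(recent)
    n_ref = len(reference)

    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the window labels are exchangeable,
        # so reshuffling the pooled data simulates "no drift".
        rng.shuffle(pooled)
        stat = abs(_mean(pooled[:n_ref]) - _mean(pooled[n_ref:]))
        if stat >= observed:
            hits += 1

    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical windows: same values in a different order vs. a shifted copy.
reference_window = [i % 10 for i in range(50)]
stable_window = [(i + 3) % 10 for i in range(50)]
drifted_window = [x + 5 for x in reference_window]

p_stable = drift_p_value(reference_window, stable_window)
p_drift = drift_p_value(reference_window, drifted_window)
print(p_stable, p_drift)
```

A detector built on this would flag drift when the p-value falls below a chosen significance level. A mean-based statistic only catches shifts in location, which is one reason richer distribution models such as the one in the abstract are used in practice.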

    Data-driven decision support for perishable goods

    Retailers offering perishable consumer goods such as baked goods have to make hundreds of ordering decisions every day because they typically operate numerous stores and offer a wide range of products. Daily or even intraday decisions are necessary as perishable goods deteriorate quickly and can usually only be sold on one day. Deciding on order quantities is a challenging but important task for each retailer as it affects operational performance. Ordering too little leads to unsatisfied customers, while ordering too much leads to discarded goods, which is a major cost factor. In practice, store managers are typically responsible for decisions related to perishable goods, which is not optimal for various reasons. Most importantly, the task is time consuming and some store managers may not have the necessary skills, which results in poor decisions. Hence, our goal is to develop and evaluate methods to support the decision-making process, which is made possible by advances in information technology and data analysis. In particular, we investigate how to exploit large datasets to make better decisions. For daily ordering decisions, we propose data-driven solution approaches for inventory management models that capture the trade-off between ordering too much and ordering too little such that profits are maximized. First, we optimize the order quantity for each product independently. Second, we consider demand substitution and jointly optimize the order quantities of substitutable products. For intraday decisions, we formulate a scheduling problem for the optimization of baking plans based on hourly forecasts. Demand forecasts are an essential input for operational decisions. However, retail forecasting research is mainly devoted to weekly data using statistical time series models or linear regression models, whereas large-scale forecasting on daily data is understudied. 
We phrase the forecasting problem as a supervised Machine Learning task and conduct a comprehensive empirical evaluation to illustrate the suitability of Machine Learning methods. We empirically evaluate our solution approaches on real-world datasets from the bakery domain that are enriched with explanatory feature data. We find that our approaches are competitive with state-of-the-art methods. Data-driven approaches substantially outperform traditional methods if the dataset is large enough. We also find that the benefit of improved forecasting dominates other potential benefits of data-driven solution methods for decision optimization. Overall, we conclude that data-driven decision support for perishable goods is feasible and superior to alternatives that are based on unreasonable assumptions or established time series models.
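
The trade-off between ordering too much and ordering too little is commonly formalized as a newsvendor problem. The abstract does not name its inventory models, so treating the single-product case this way is an assumption. The sketch below picks the order quantity directly from past demand: with unit price `price` and unit cost `cost` (and no salvage value, also an assumption), expected profit is maximized at the smallest quantity whose empirical demand CDF reaches the critical ratio (price - cost) / price. The demand history and both function names are hypothetical.

```python
import math
from typing import Sequence


def newsvendor_order_quantity(
    demands: Sequence[float], price: float, cost: float
) -> float:
    """Empirical-quantile order quantity for a single perishable product."""
    if not demands or not 0 < cost < price:
        raise ValueError("need demand history and 0 < cost < price")

    # Underage cost per unit is (price - cost); overage cost is cost.
    critical_ratio = (price - cost) / price

    ordered = sorted(demands)
    # Smallest 1-based rank i with i / n >= critical_ratio.
    rank = max(1, math.ceil(critical_ratio * len(ordered)))
    return ordered[rank - 1]


def average_profit(
    quantity: float, demands: Sequence[float], price: float, cost: float
) -> float:
    """Mean profit over the history; unsold units are discarded."""
    return sum(price * min(quantity, d) - cost * quantity for d in demands) / len(demands)


# Hypothetical daily demand history for one product in one store.
past_demand = [80, 95, 100, 105, 110, 120, 125, 130, 140, 160]
q = newsvendor_order_quantity(past_demand, price=2.0, cost=0.5)
print(q, average_profit(q, past_demand, price=2.0, cost=0.5))
```

Because the margin is high relative to the cost of waste in this toy setting, the chosen quantity sits well above the median demand. A forecasting model with explanatory features would replace the raw history with a conditional demand distribution.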

    Design of Data-Driven Decision Support Systems for Business Process Standardization

    Increasingly dynamic environments require organizations to engage in business process standardization (BPS) in response to environmental change. However, BPS depends on numerous contingency factors from different layers of the organization, such as strategy, business models (BMs), business processes (BPs) and application systems that need to be well-understood (“comprehended”) and taken into account by decision-makers for selecting appropriate standard BP designs that fit the organization. Besides, common approaches to BPS are non-data-driven and frequently do not exploit increasingly available data in organizations. Therefore, this thesis addresses the following research question: “How to design data-driven decision support systems to increase the comprehension of contingency factors on business process standardization?”. Theoretically grounded in organizational contingency theory (OCT), this thesis addresses the research question by conducting three design science research (DSR) projects to design data-driven decision support systems (DSSs) for SAP R/3 and S/4 HANA ERP systems that increase comprehension of BPS contingency factors. The thesis conducts the DSR projects at an industry partner within the context of a BPS and SAP S/4 HANA transformation program at a global manufacturing corporation. DSR project 1 designs a data-driven “Business Model Mining” system that automatically “mines” BMs from data in application systems and represents results in an interactive “Business Model Canvas” (BMC) BI dashboard to comprehend BM-related BPS contingency factors. The project derives generic design requirements and a blueprint conceptualization for BMM systems and suggests an open, standardized reference data model for BMM. 
The project implements the software artifact “Business Model Miner” in Microsoft Azure / PowerBI and demonstrates technical feasibility by using data from an educational SAP S/4 HANA system, an open reference dataset, and three real-life SAP R/3 ERP systems. A field evaluation with 21 managers at the industry partner finds differences between tool results and BMCs created by managers and thus the potential for a complementary role of BMM tools to enrich the comprehension of BMs. A further controlled laboratory experiment with 142 students finds significant beneficial impacts on subjective and objective comprehension in terms of effectiveness, efficiency, and relative efficiency. Second, DSR project 2 designs a data-driven process mining DSS “KeyPro” to semi-automatically discover and prioritize the set of BPs occurring in an organization from log data to concentrate BPS initiatives on important BPs given limited organizational resources. The project derives objective and quantifiable BP importance metrics from BM and BPM literature and implements KeyPro for SAP R/3 ERP and S/4 HANA systems in Microsoft SQL Server / Azure and interactive PowerBI dashboards. A field evaluation with 52 managers compares BPs detected manually by decision-makers against BPs discovered by KeyPro and reveals significant differences and a complementary role of the artifact to deliver additional insights into the set of BPs in the organization. Finally, a controlled laboratory experiment with 30 students identifies the dashboards with the lowest comprehension for further development. Third, OCT requires organizations to select a standard BP design that matches contingencies. Thus, DSR project 3 designs a process mining DSS to select a standard BP from a repository of different alternative designs based on the similarity of BPS contingency factors between the as-is process and the to-be standard processes. 
DSR project 3 thus derives four different process model variants for representing BPS contingency factors that vary according to determinant factors of process model comprehension (PMC) identified in PMC literature. A controlled laboratory evaluation with 150 students identifies significant differences in PMC. Based on laboratory findings, the DSS is implemented in the BPM platform “Apromore” to select standard BP reference models from the SAP Best Practices Explorer for SAP S/4 HANA and applied for the purchase-to-pay and order-to-cash process of a manufacturing company.

    Optimal management of bio-based energy supply chains under parametric uncertainty through a data-driven decision-support framework

    This paper addresses the optimal management of a multi-objective bio-based energy supply chain network subject to multiple sources of uncertainty. The difficulty of obtaining an optimal solution with traditional uncertainty management methods increases dramatically with the number of uncertain factors considered. As a result, the problem, if tractable at all, can be solved only with a large computational effort. Therefore, in this work a data-driven decision-making framework is proposed to address this issue. The framework exploits machine learning techniques to efficiently approximate the optimal management decisions, taking as input a set of uncertain parameters that continuously influence the process behavior. A design of computer experiments technique is used to combine these parameters and produce a matrix of representative information. These data are used to optimize the deterministic multi-objective bio-based energy network problem through conventional optimization methods, leading to a detailed (but elementary) map of the optimal management decisions based on the uncertain parameters. Afterwards, the detailed data-driven relations are identified using an Ordinary Kriging meta-model. The results show that the parametric meta-models predict the optimal decision variables with very high accuracy in comparison with the traditional stochastic approach. More importantly, the computational effort required to obtain these optimal values in response to changes in the uncertain parameters is dramatically reduced. Thus the proposed data-driven decision tool promotes time-effective optimal decision making, which represents a step toward using data-driven strategies in large-scale, complex industrial problems.

    Feature selection strategies for improving data-driven decision support in bank telemarketing

    Data mining techniques for unveiling previously undiscovered knowledge have been applied in recent years to a wide range of domains, including banking and marketing. Raw data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw data manipulation is feature engineering, which concerns the correct characterization or selection of relevant features (or variables) that are related to the target goal. This study focuses on feature engineering, aiming to uncover the features that best characterize the problem of selling long-term bank deposits through telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank was addressed, covering the period 2008-2013 and encompassing the recent global financial crisis. To assess the relevance of this problem, a novel literature analysis using text mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a research gap for bank telemarketing. Starting from a dataset containing typical telemarketing contacts and client information, the research followed three different and complementary strategies: first, enriching the dataset with social and economic context features; then, including features related to customer lifetime value; finally, applying a divide-and-conquer strategy that splits the problem into smaller parts, leading to optimized sub-problems. Each of the three approaches improved on previous results in terms of model metrics related to prediction performance. The relevance of the proposed features was evaluated, confirming the obtained models as credible and valuable for telemarketing campaign managers.

    Implementing data-driven decision support system based on independent educational data mart

    Decision makers in the educational field continually seek new technologies and tools that provide solid, fast answers to support the decision-making process. They need a platform that utilizes students’ academic data and turns it into knowledge for making the right strategic decisions. In this paper, a roadmap for implementing a data-driven decision support system (DSS) is presented based on an educational data mart. The independent data mart is built on the students’ grades in 8 subjects in a private school (Al-Iskandaria Primary School in Basrah province, Iraq). The DSS implementation roadmap starts with pre-processing a paper-based data source and ends with providing three categories of online analytical processing (OLAP) queries (multidimensional OLAP, desktop OLAP and web OLAP). A key performance indicator (KPI) is implemented as an essential part of the educational DSS to measure school performance. A static evaluation, which inspects the DSS knowledge base, finds no errors and indicates that the proposed DSS meets privacy, security and performance requirements. The evaluation shows that a data-driven DSS based on an independent data mart with KPI and OLAP is one of the best platforms to support short-to-long-term academic decisions.