
    When less is more: How increasing the complexity of machine learning strategies for geothermal energy assessments may not lead toward better estimates

    Previous moderate- and high-temperature geothermal resource assessments of the western United States utilized data-driven methods and expert decisions to estimate resource favorability. Although expert decisions can add confidence to the modeling process by ensuring reasonable models are employed, expert decisions also introduce human and, thereby, model bias. This bias can present a source of error that reduces the predictive performance of the models and confidence in the resulting resource estimates. Our study aims to develop robust data-driven methods with the goals of reducing bias and improving predictive ability. We present and compare nine favorability maps for geothermal resources in the western United States using data from the U.S. Geological Survey's 2008 geothermal resource assessment. Two favorability maps are created using the expert-decision-dependent methods from the 2008 assessment (i.e., weight-of-evidence and logistic regression). With the same data, we then create six different favorability maps using logistic regression (without underlying expert decisions), XGBoost, and support-vector machines, each paired with two training strategies. The training strategies are customized to address the inherent challenges of applying machine learning to the geothermal training data, which have no negative examples and severe class imbalance. We also create another favorability map using an artificial neural network. We demonstrate that modern machine learning approaches can improve upon systems built with expert decisions. We also find that XGBoost, a non-linear algorithm, produces greater agreement with the 2008 results than linear logistic regression without expert decisions, because, although the 2008 assessment used only linear methods, its expert decisions rendered the otherwise linear approaches non-linear. The F1 scores for all approaches are low (F1 score < 0.10) and do not improve with increasing model complexity, indicating fundamental limitations of the input features (i.e., training data). Until improved feature data are incorporated into the assessment process, simple non-linear algorithms (e.g., XGBoost) perform as well as or better than more complex methods (e.g., artificial neural networks) and remain easier to interpret.
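    The training challenges the abstract describes (no negative examples, severe class imbalance) are commonly handled by reweighting the rare class. The sketch below illustrates that idea with XGBoost's scale_pos_weight and an F1 score on synthetic stand-in data; it is not the assessment's actual pipeline or feature set.

```python
# Minimal sketch: an imbalance-aware XGBoost favorability classifier scored with F1.
# The features and labels are synthetic placeholders, not the 2008 assessment data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))               # stand-ins for geoscience input features
y = (rng.random(5000) < 0.02).astype(int)    # ~2% positives: severe class imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Upweight the rare positive class instead of discarding negatives.
pos_weight = (y_tr == 0).sum() / max((y_tr == 1).sum(), 1)
model = XGBClassifier(n_estimators=200, max_depth=3, scale_pos_weight=pos_weight)
model.fit(X_tr, y_tr)

print("F1:", f1_score(y_te, model.predict(X_te)))
```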

    Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms

    Many different machine learning algorithms exist; taking into account each algorithm's hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that addresses these issues in isolation. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection/hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
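    The combined algorithm selection and hyperparameter optimization (CASH) problem that Auto-WEKA solves can be illustrated on a much smaller scale. The sketch below uses plain random search with cross-validation over a few scikit-learn classifiers; Auto-WEKA itself searches WEKA's learners with SMAC-based Bayesian optimization, so this is only an illustration of the problem formulation.

```python
# Minimal sketch of the CASH idea: jointly search over (algorithm, hyperparameters)
# with random search and cross-validation. Not Auto-WEKA's actual optimizer.
import random
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

search_space = [
    (LogisticRegression, {"C": [0.01, 0.1, 1, 10], "max_iter": [5000]}),
    (SVC, {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}),
    (RandomForestClassifier, {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}),
]

random.seed(0)
best = (None, None, -1.0)
for _ in range(20):  # budget of 20 random configurations
    algo, grid = random.choice(search_space)
    params = {k: random.choice(v) for k, v in grid.items()}
    score = cross_val_score(algo(**params), X, y, cv=5).mean()
    if score > best[2]:
        best = (algo.__name__, params, score)

print("Best configuration:", best)
```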

    Anomaly-based insider threat detection with expert feedback and descriptions

    Insider threat is one of the most significant security risks for organizations, so insider threat detection is an important task. Anomaly detection is one approach to insider threat detection. Anomaly detection techniques can be categorized by how much labelled data they need: unsupervised, semi-supervised, and supervised. Obtaining accurate labels for all kinds of incidents for supervised learning is often expensive and impractical. Unsupervised methods do not require labelled data, but they have a high false positive rate because they operate on the assumption that anomalies are rarer than nominal instances. This can be mitigated by introducing feedback, known as expert feedback or active learning, which allows an analyst to label a subset of the data. Another problem is that models often are not interpretable, so it is unclear why a model decided that a data instance is an anomaly. This thesis presents a literature review of insider threat detection and of unsupervised and semi-supervised anomaly detection. The performance of several unsupervised anomaly detectors is evaluated. Knowledge is introduced into the system by using a state-of-the-art feedback technique for ensembles, known as active anomaly discovery, which is incorporated into the isolation forest anomaly detector. Additionally, to improve interpretability, techniques for creating rule-based descriptions for the isolation forest are evaluated. Experiments were performed on the CMU-CERT dataset, the only publicly available insider threat dataset with logon, removable-device, and HTTP log data. The models use usage-count and session-based features computed for each user on each day. The results show that active anomaly discovery helps rank true positives higher on the list, lowering the amount of data analysts have to analyse. The results also show that both compact descriptions and Bayesian rulesets have the potential to be used in generating decision rules that aid in analysing incidents; however, these rules are not correct in every instance.
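    As an illustration of the ranking step the thesis builds on, the sketch below scores synthetic user-day feature vectors with an isolation forest and has a simulated analyst review the top-ranked items under a fixed budget. The feedback-driven reweighting of active anomaly discovery is omitted; the data and features are placeholders.

```python
# Minimal sketch: rank user-days by IsolationForest anomaly score and simulate an
# analyst reviewing the top-ranked items. The active anomaly discovery feedback
# loop used in the thesis is not implemented here.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(2000, 5))    # usage-count / session-based features
insider = rng.normal(4, 1, size=(10, 5))     # a few anomalous user-days
X = np.vstack([normal, insider])
truth = np.array([0] * 2000 + [1] * 10)      # hidden ground truth for the demo only

forest = IsolationForest(n_estimators=200, random_state=0).fit(X)
scores = -forest.score_samples(X)            # higher = more anomalous

budget = 30
queried = np.argsort(scores)[::-1][:budget]  # analyst reviews the top-ranked items
found = truth[queried].sum()                 # labels would come from the analyst
print(f"True insiders found in top {budget}: {found} / {truth.sum()}")
```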

    Operationalizing fairness for responsible machine learning

    As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a growing disconnect between the ML fairness literature and the need to operationalize fairness in practice. This thesis addresses that need for responsible ML by developing new models and methods to address challenges in operationalizing fairness in practice. Specifically, it makes the following contributions. First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups without requiring access to the protected attributes in the dataset. Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited because they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low-dimensional latent representation of the data with two goals: to encode the data as well as possible while removing any information about protected attributes in the transformed representation. Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over the observed feature space can be brittle and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce the novel notion of fairness graphs, wherein pairs of individuals deemed similar with respect to the ML objective can be identified. We cast the problem of individual fairness as graph embedding and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data. Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, despite the best efforts in training a fair model, ML systems can be prone to failure risks due to a variety of unforeseen reasons. To ensure responsible model deployment, potential failure risks need to be predicted and mitigation actions need to be devised, for example deferring to a human expert when uncertain or collecting additional data to address the model's blind spots. We propose Risk Advisor, a model-agnostic meta-learner that predicts potential failure risks and gives guidance on the sources of uncertainty inducing the risks by leveraging information-theoretic notions of aleatoric and epistemic uncertainty. This dissertation brings ML fairness closer to real-world applications by developing methods that address key practical challenges. Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.
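    As a loose illustration of the adversarial reweighting idea behind ARL, the sketch below alternates between a learner and an auxiliary model that upweights regions where the learner errs, using only non-protected features. The data, models, and update rule are simplified placeholders, not the dissertation's implementation.

```python
# Loose sketch of adversarial reweighting: an auxiliary model learns where the main
# classifier errs (without seeing the protected attribute) and upweights those regions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 6))
# Hidden group with a different decision rule; the group label is never used in training.
group = X[:, 0] > 1.0
y = ((X[:, 1] + np.where(group, -X[:, 2], X[:, 2])) > 0).astype(int)

weights = np.ones(len(y))
for _ in range(5):  # alternate learner and adversary a few times
    learner = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
    per_example_error = (learner.predict(X) != y).astype(float)
    adversary = DecisionTreeRegressor(max_depth=3).fit(X, per_example_error)
    weights = 1.0 + adversary.predict(X)  # upweight computationally-identifiable errors

for g in (False, True):
    acc = (learner.predict(X[group == g]) == y[group == g]).mean()
    print(f"hidden group={g}: accuracy={acc:.3f}")
```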

    CHIRPS: Explaining random forest classification

    Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet demand for them has been increasing in human-in-the-loop processes, that is, processes that require a human agent to verify, approve, or reason about automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification and then uses frequent pattern mining to identify the most commonly occurring split conditions. A simple, conjunctive-form rule is then constructed, whose antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of the rule’s precision and coverage on the training data, along with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per-instance) explanation setting.
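    The sketch below illustrates the raw material CHIRPS mines: the split conditions each tree in a random forest applies to a single instance, tallied by frequency. The actual method restricts attention to trees voting with the majority class and adds frequent pattern mining, rule construction, and precision/coverage scoring, none of which is reproduced here.

```python
# Minimal sketch: walk each tree's decision path for one instance and count the
# (feature, direction) split conditions it passes through. Illustration only.
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

instance = X[0:1]
conditions = Counter()
for tree in forest.estimators_:
    t = tree.tree_
    node = 0
    while t.children_left[node] != -1:          # walk from root until a leaf
        f, thr = t.feature[node], t.threshold[node]
        if instance[0, f] <= thr:
            conditions[(names[f], "<=")] += 1
            node = t.children_left[node]
        else:
            conditions[(names[f], ">")] += 1
            node = t.children_right[node]

print("Most frequent split conditions:", conditions.most_common(5))
```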

    Uncertainty-aware and Explainable Artificial Intelligence for Identification of Human Errors in Nuclear Power Plants

    Nuclear Power Plants (NPPs) can face challenges in maintaining standard operations due to a range of issues, including human mistakes, mechanical breakdowns, electrical problems, measurement errors, and external influences. Swift and precise detection of these issues is crucial for stabilizing the NPPs. Identifying such operational anomalies is complex due to the numerous potential scenarios. Additionally, operators need to promptly discern the nature of an incident by tracking various indicators, a process that can be mentally taxing and increase the likelihood of human errors. Inaccurate identification of problems leads to inappropriate corrective actions, adversely affecting the safety and efficiency of NPPs. In this study, we leverage ensemble and uncertainty-aware models to identify such errors, and thereby increase the chances of mitigating them, using data collected from a physical testbed. Furthermore, the goal is to identify models that are both certain and reliable. To this end, the two main aspects of focus are eXplainable Artificial Intelligence (XAI) and Uncertainty Quantification (UQ). While XAI elucidates the decision pathway, UQ evaluates decision reliability. Their integration paints a comprehensive picture, signifying that understanding decisions and their confidence should be interlinked. Thus, in this study, we leverage measures like entropy and mutual information, along with SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), to gain insights into the features contributing to the identification. Our results show that uncertainty-aware models combined with XAI tools can explain the AI-prescribed decisions, with the potential to better explain errors to the operators.
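    The entropy and mutual-information measures mentioned above follow a standard decomposition of ensemble uncertainty into aleatoric and epistemic parts. The sketch below computes that decomposition for a synthetic ensemble; the plant-state classes and member models are placeholders, not the testbed's.

```python
# Minimal sketch: total predictive entropy of an ensemble splits into aleatoric
# (mean member entropy) and epistemic (mutual information) uncertainty.
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=axis)

rng = np.random.default_rng(0)
# probs[m, i, c]: member m's class probabilities for sample i (5 members, 3 samples, 4 classes)
logits = rng.normal(size=(5, 3, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

mean_probs = probs.mean(axis=0)
total = entropy(mean_probs)               # total uncertainty per sample
aleatoric = entropy(probs).mean(axis=0)   # expected entropy of individual members
epistemic = total - aleatoric             # mutual information (model disagreement)

for i in range(3):
    print(f"sample {i}: total={total[i]:.3f} aleatoric={aleatoric[i]:.3f} epistemic={epistemic[i]:.3f}")
```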

    Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies

    Through a review of epistemological frameworks in the social sciences, the history of frameworks in statistics, and the current state of research, we establish that there appears to be no consistent, quantitatively motivated model development framework in data science, and that the downstream analysis effects of various modeling choices are not uniformly documented. Examples are provided which illustrate that analytic choices, even if justifiable and statistically valid, have a downstream effect on model results. This study proposes a unified model development framework that allows researchers to make statistically motivated modeling choices within the development pipeline. Additionally, a simulation study is used to provide empirical justification for the proposed framework. The study tests the utility of the proposed framework by investigating the effects of normalization on downstream analysis results. Normalization methods are investigated by utilizing a decomposition of the empirical risk functions, measuring effects on model bias, variance, and irreducible error. Measurements of bias and variance are then applied as diagnostic procedures for model pre-processing and development within the unified framework. Findings from the simulation study are incorporated into the proposed framework and stress-tested on benchmark datasets as well as several applications.
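    The kind of simulation the framework relies on can be sketched as follows: repeatedly resample training data, fit the same model under different normalization strategies, and estimate bias^2 and variance of the predictions at fixed test points. The task, model, and scalers below are illustrative choices, not those of the study.

```python
# Minimal sketch: estimate bias^2 and variance of a model's predictions under
# different normalization strategies via repeated simulated training sets.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)

def truth(x):                       # noiseless target function
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1]

X_test = rng.uniform(-1, 1, size=(200, 2))
f_test = truth(X_test)

for name, scaler_cls in [("standard", StandardScaler), ("minmax", MinMaxScaler)]:
    preds = []
    for _ in range(100):            # 100 simulated training sets
        X_tr = rng.uniform(-1, 1, size=(150, 2))
        y_tr = truth(X_tr) + rng.normal(0, 0.3, size=150)   # irreducible noise
        scaler = scaler_cls().fit(X_tr)
        model = KNeighborsRegressor(n_neighbors=5).fit(scaler.transform(X_tr), y_tr)
        preds.append(model.predict(scaler.transform(X_test)))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)
    var = preds.var(axis=0).mean()
    print(f"{name}: bias^2={bias2:.4f} variance={var:.4f}")
```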

    Adoption of Big Data and AI methods to manage medication administration and intensive care environments

    Artificial Intelligence (AI) has proven to be very helpful in different areas, including the medical field. One important parameter for healthcare professionals' decision-making process is blood pressure, specifically mean arterial pressure (MAP). The application of AI in medicine, more specifically in Intensive Care Units (ICUs), has the potential to improve the efficiency of healthcare and boost telemedicine operations with access to real-time predictions from remote locations. Operations that once required the presence of a healthcare professional can be done at a distance, which, in the face of the recent COVID-19 pandemic, proved to be crucial. This dissertation presents a solution for developing an AI system capable of accurately predicting MAP values. Many ICU patients suffer from sepsis or septic shock, and they can be identified by the need for vasopressors, such as noradrenaline, to keep their MAP above 65 mm Hg. The presented solution facilitates early interventions, thereby minimising the risk to patients. The study reviews various machine learning (ML) models and trains them to predict MAP values. One of the challenges is to see how the different models behave during training and to choose the most promising one to test in a controlled environment. The dataset used to train the models contains data identical to that generated by bedside monitors, which ensures that the models' predictions align with real-world scenarios. The generated medical data are processed by a separate component that performs data cleaning, after which they are directed to the application responsible for loading and classifying the data and applying the ML model. To increase trust between healthcare professionals and the system, the solution is also intended to provide insights into how its results are achieved. For validation, the solution was integrated with one of the telemedicine hubs deployed by the European project ICU4Covid through its CPS4TIC component.
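    As an illustration of the prediction task described above, the sketch below trains a regressor to predict MAP from bedside-monitor-style vitals and flags predictions below the 65 mm Hg threshold. All data, features, and model choices are synthetic placeholders, not the dissertation's system.

```python
# Minimal sketch: predict mean arterial pressure (MAP) from synthetic vitals and
# flag predictions below the 65 mm Hg threshold mentioned in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
heart_rate = rng.normal(85, 15, n)
systolic = rng.normal(115, 20, n)
diastolic = rng.normal(70, 12, n)
noradrenaline = rng.exponential(0.05, n)       # hypothetical vasopressor dose

# Ground-truth MAP from the standard approximation (SBP/3 + 2*DBP/3), plus noise.
map_true = systolic / 3 + 2 * diastolic / 3 + rng.normal(0, 3, n)
X = np.column_stack([heart_rate, systolic, diastolic, noradrenaline])

X_tr, X_te, y_tr, y_te = train_test_split(X, map_true, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
alerts = pred < 65                              # flag likely hypotensive patients
print(f"MAE: {np.abs(pred - y_te).mean():.2f} mm Hg, alerts raised: {alerts.sum()}")
```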