
    Decision diagrams in machine learning: an empirical study on real-life credit-risk data.

    Decision trees are a widely used knowledge representation in machine learning. However, one of their main drawbacks is the inherent replication of isomorphic subtrees, as a result of which the produced classifiers may become too large to be comprehensible by the human experts who have to validate them. Alternatively, decision diagrams, a generalization of decision trees taking the form of a rooted, acyclic digraph instead of a tree, have occasionally been suggested as a potentially more compact representation. Their application in machine learning has nonetheless been criticized, because the theoretical size advantages of subgraph sharing did not always directly materialize in the relatively scarce reported experiments on real-world data. Therefore, in this paper, starting from a series of rule sets extracted from three real-life credit-scoring data sets, we empirically assess to what extent decision diagrams are able to provide a compact visual description. Furthermore, we investigate the practical impact of finding a good attribute ordering on the achieved size savings.
    Keywords: Advantages; Classifiers; Credit scoring; Data; Decision; Decision diagrams; Decision trees; Empirical study; Knowledge; Learning; Real life; Representation; Size; Studies
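    The subgraph sharing the abstract refers to can be pictured with a minimal hash-consing sketch (not from the paper; the attribute names and labels are invented): identical subtrees are stored once, so branches that would replicate a subtree in a decision tree instead point to the same node in a decision diagram, and a test whose two branches coincide disappears entirely.

    ```python
    # Minimal sketch of merging isomorphic subtrees into a decision diagram
    # via hash-consing; attributes and class labels are invented examples.

    def make_node(cache, attr, low, high):
        """Return a shared node; identical (attr, low, high) triples reuse one object."""
        if low is high:              # redundant test: both branches agree
            return low
        key = (attr, id(low), id(high))
        if key not in cache:
            cache[key] = (attr, low, high)
        return cache[key]

    cache = {}
    leaf_good, leaf_bad = "good", "bad"
    # Two tree branches that would each replicate the same test on income:
    shared = make_node(cache, "income>30k", leaf_bad, leaf_good)
    root = make_node(cache, "age>25", shared, shared)  # both branches identical
    print(root is shared)  # True: the redundant root test is removed entirely
    ```

    In a full reduction (as in reduced ordered binary decision diagrams), this memoization is applied bottom-up over the whole tree, which is where the attribute ordering mentioned in the abstract determines how much sharing is possible.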

    Interpretable Binary and Multiclass Prediction Models for Insolvencies and Credit Ratings

    Insolvency prediction and rating are important tasks in the financial industry and serve to assess the creditworthiness of companies. One way to approach this field is machine learning, in which prediction models are built from example data. Methods from this area are advantageous because they can be automated; this makes human expertise unnecessary in most cases and thereby offers a higher degree of objectivity. However, these approaches are not perfect either and therefore cannot fully replace human expertise. They do lend themselves as decision aids that experts can use, which is why interpretable models are desirable. Unfortunately, only a few learning algorithms yield interpretable models. Moreover, some tasks, such as rating, are often multiclass problems. Multiclass classification is frequently achieved via meta-algorithms that train several binary classifiers, yet most of the commonly used meta-algorithms destroy whatever interpretability the base models may have. In this dissertation, we examine the predictive accuracy of interpretable models compared to non-interpretable models for insolvency prediction and ratings. As interpretable models we use disjunctive normal forms and decision trees with thresholds on financial ratios; as non-interpretable models we use random forests, artificial neural networks, and support vector machines. In addition, we developed our own learning algorithm, Thresholder, which generates disjunctive normal forms and interpretable multiclass models. For the task of insolvency prediction, we show that interpretable models are not inferior to non-interpretable ones.
    To this end, a first case study uses a database employed in practice, containing the annual financial statements of 5,152 companies, to measure the predictive accuracy of all the models listed above. In a second case study on rating prediction, we demonstrate that interpretable models are even superior to non-interpretable ones. The predictive accuracy of all models is determined on three data sets used in practice, each with three rating classes. In the case studies, we compare the various interpretable approaches with respect to model size and form of interpretability, present exemplary models based on the respective data sets, and offer ways to interpret them. Our results show that interpretable, threshold-based models are well suited to classification problems in the financial industry; in this domain they are not inferior to more complex models such as support vector machines. Our algorithm Thresholder produces the smallest models, while its predictive accuracy remains comparable to that of the other interpretable models. In our rating case study, the interpretable models deliver markedly better results than in the insolvency study. One possible explanation is that ratings, unlike insolvencies, are man-made: they rest on decisions of humans who think in interpretable rules, e.g. logical combinations of thresholds. We therefore assume that interpretable models fit these problems and can recognize and reproduce such interpretable rules.
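    A disjunctive-normal-form model over financial-ratio thresholds, of the kind the abstract describes, can be sketched as follows. The ratios, cut-off values, and rule structure below are invented for illustration and are not taken from the dissertation:

    ```python
    # Hypothetical DNF classifier over financial-ratio thresholds.
    # Each inner conjunction is one disjunct; the model predicts
    # 'insolvent' if any disjunct holds. All cut-offs are invented.

    def predict_insolvent(ratios):
        return (
            (ratios["equity_ratio"] < 0.10 and ratios["return_on_assets"] < 0.0)
            or (ratios["current_ratio"] < 0.8)
        )

    print(predict_insolvent({"equity_ratio": 0.05,
                             "return_on_assets": -0.02,
                             "current_ratio": 1.2}))  # True: first disjunct fires
    ```

    The appeal of such models is that each disjunct reads as a plain business rule ("low equity and negative returns"), which is exactly the interpretability property the dissertation compares against black-box learners.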

    A Multidimensional Perceptual Map Approach to Project Prioritization and Selection

    When prioritizing projects, managers usually have to evaluate multiple attributes (dimensions) of project data. However, these dimensions are usually condensed into one or two indicators in many existing analysis processes. For example, projects are commonly prioritized using a scoring approach: they are evaluated according to predefined categories, which are then aggregated into one or two priority numbers. We argue that aggregated scores may offer only a limited view of project importance; this often leads decision makers to overlook the differences masked by the aggregation. Following the design science research paradigm, this paper presents a visual exploration approach based on multidimensional perceptual maps. It incorporates human intuition in the process and maintains the multidimensionality of project data as a decision basis for project prioritization and selection. A prototype system based on the approach was developed and qualitatively evaluated by a group of project managers. A qualitative analysis of the data collected shows its utility and usability.

    Data journalism, data literacy and data visualizations : a quantitative study

    Professional project report submitted in partial fulfillment of the requirements for the degree of Master of Arts in Journalism from the School of Journalism, University of Missouri--Columbia. As data becomes increasingly important in contemporary society, data journalism and data literacy also become more important. This project explores these concepts and examines the role each can play in writing about and understanding data-intensive information. To test the effects of data visualizations and data literacy on comprehension, this project uses a quantitative experimental design in which subjects read different versions of an article followed by a comprehension test. The article treatments include a text-only version, a version with a bar graph, and a version with a data table. In addition, subjects were classified as data literate or non-data literate based on a survey. As hypothesized, the results showed a significant comprehension benefit for both groups of subjects with access to a data visualization, with the text-only group scoring lowest in comprehension. The results also showed significant comprehension differences based on data literacy in the bar graph condition. These results can be used to inform future study, as well as best practices in data journalism and in data science education. Includes bibliographical references.

    Making and using large models of complex systems: The Poverty Reduction Model

    A system model is an abstract representation of a complex social system, which can be useful for facilitated sensemaking and decision support. This study presents a causal model format adapted from causal loop diagramming to integrate more knowledge of complexity with higher comprehension. As a case study, a Poverty Reduction Model was developed with over 1,100 cause-and-effect relationships among more than 550 factors. Staff of the Yonge Street Mission social services agency used this model to find interventions to reduce poverty in Toronto, which were prioritized using the system model in combination with rating, scoring, and discussion. A framework is provided to balance model scope and quality requirements against the time and resources available to an organization. Modelling and option-comparison methods are documented for potential reuse by other organizations.

    Use of (Q)SAR genotoxicity predictions and fuzzy multicriteria decision-making for priority ranking of ethoxyquin transformation products

    Ethoxyquin (EQ; 6-ethoxy-2,2,4-trimethyl-1,2-dihydroquinoline) has been used as an antioxidant in feed for pets and food-producing animals, including farmed fish such as Atlantic salmon. In Europe, the authorization for use of EQ as a feed additive was suspended due to knowledge gaps concerning the presence and toxicity of EQ transformation products (TPs). Recent analytical studies focusing on the detection of EQ TPs in farmed Atlantic salmon feed and fillets reported a total of 27 EQ TPs, comprising both known and previously undescribed EQ TPs. We devised and applied an in silico workflow to rank these EQ TPs according to their genotoxic potential and their occurrence data in Atlantic salmon feed and fillet. Ames genotoxicity predictions were obtained by applying a suite of five (quantitative) structure–activity relationship ((Q)SAR) tools, namely VEGA, TEST, LAZAR, Derek Nexus, and Sarah Nexus. The (Q)SAR Ames genotoxicity predictions were aggregated using fuzzy analytic hierarchy process (fAHP) multicriteria decision-making (MCDM). A priority ranking of EQ TPs was performed by combining the fAHP-ranked (Q)SAR predictions with analytical occurrence data. The applied workflow prioritized four newly identified EQ TPs for further investigation of genotoxicity. The fAHP-based prioritization strategy described here can easily be applied to other toxicity endpoints and groups of chemicals for priority ranking of the compounds of most concern for subsequent experimental and mechanistic toxicology analyses.
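    The aggregation step the abstract describes can be illustrated with a deliberately simplified sketch. A real fAHP derives criterion weights from pairwise comparisons with triangular fuzzy numbers; here fixed crisp weights, the weight split between hazard and occurrence, and all scores are invented for illustration:

    ```python
    # Simplified (non-fuzzy) sketch of aggregating several (Q)SAR genotoxicity
    # calls and occurrence data into one priority score per transformation
    # product. All weights and example values are invented.

    qsar_weights = {"VEGA": 0.25, "TEST": 0.15, "LAZAR": 0.15,
                    "Derek Nexus": 0.25, "Sarah Nexus": 0.20}

    def priority(qsar_calls, occurrence):
        """qsar_calls: tool -> 1 (positive) / 0 (negative); occurrence in [0, 1]."""
        genotox = sum(qsar_weights[t] * c for t, c in qsar_calls.items())
        return 0.7 * genotox + 0.3 * occurrence  # combine hazard and exposure

    tp_a = priority({"VEGA": 1, "TEST": 1, "LAZAR": 0,
                     "Derek Nexus": 1, "Sarah Nexus": 0}, occurrence=0.9)
    tp_b = priority({"VEGA": 0, "TEST": 0, "LAZAR": 1,
                     "Derek Nexus": 0, "Sarah Nexus": 0}, occurrence=0.2)
    print(tp_a > tp_b)  # True: TP A ranks higher for follow-up testing
    ```

    The fuzzy variant replaces the fixed weights with fuzzy-number weights derived from expert pairwise comparisons and defuzzifies the aggregate score before ranking, but the ranking logic follows the same weighted-sum shape.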