
    Decision diagrams in machine learning: an empirical study on real-life credit-risk data.

    Decision trees are a widely used knowledge representation in machine learning. However, one of their main drawbacks is the inherent replication of isomorphic subtrees, as a result of which the produced classifiers may become too large to be comprehensible by the human experts who have to validate them. Alternatively, decision diagrams, a generalization of decision trees taking the form of a rooted, acyclic digraph instead of a tree, have occasionally been suggested as a potentially more compact representation. Their application in machine learning has nonetheless been criticized, because the theoretical size advantages of subgraph sharing did not always directly materialize in the relatively scarce reported experiments on real-world data. Therefore, in this paper, starting from a series of rule sets extracted from three real-life credit-scoring data sets, we empirically assess to what extent decision diagrams are able to provide a compact visual description. Furthermore, we investigate the practical impact of finding a good attribute ordering on the achieved size savings.
    Keywords: Advantages; Classifiers; Credit scoring; Data; Decision; Decision diagrams; Decision trees; Empirical study; Knowledge; Learning; Real life; Representation; Size; Studies
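    The subgraph sharing the abstract refers to can be pictured with a minimal hash-consing sketch (not from the paper; the attribute names and labels are invented): identical subtrees are stored once, so branches that would replicate a subtree in a decision tree instead point to the same node in a decision diagram, and a test whose two branches coincide disappears entirely.

    ```python
    # Minimal sketch of merging isomorphic subtrees into a decision diagram
    # via hash-consing; attributes and class labels are invented examples.

    def make_node(cache, attr, low, high):
        """Return a shared node; identical (attr, low, high) triples reuse one object."""
        if low is high:              # redundant test: both branches agree
            return low
        key = (attr, id(low), id(high))
        if key not in cache:
            cache[key] = (attr, low, high)
        return cache[key]

    cache = {}
    leaf_good, leaf_bad = "good", "bad"
    # Two tree branches that would each replicate the same test on income:
    shared = make_node(cache, "income>30k", leaf_bad, leaf_good)
    root = make_node(cache, "age>25", shared, shared)  # both branches identical
    print(root is shared)  # True: the redundant root test is removed entirely
    ```

    In a full reduction (as in reduced ordered binary decision diagrams), this memoization is applied bottom-up over the whole tree, which is where the attribute ordering mentioned in the abstract determines how much sharing is possible.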

    Interpretable Binary and Multiclass Prediction Models for Insolvencies and Credit Ratings

    Insolvency prediction and rating are important tasks in the financial industry and serve to assess the creditworthiness of companies. One way to approach this field is machine learning, in which prediction models are built from example data. Methods from this area are advantageous because they can be automated; this makes human expertise unnecessary in most cases and thereby offers a higher degree of objectivity. However, these approaches are not perfect either and therefore cannot fully replace human expertise. They do lend themselves as decision aids that experts can use, which is why interpretable models are desirable. Unfortunately, only a few learning algorithms yield interpretable models. Moreover, some tasks, such as rating, are often multiclass problems. Multiclass classification is frequently achieved via meta-algorithms that train several binary classifiers, yet most of the commonly used meta-algorithms destroy whatever interpretability the base models may have. In this dissertation, we examine the predictive accuracy of interpretable models compared to non-interpretable models for insolvency prediction and ratings. As interpretable models we use disjunctive normal forms and decision trees with thresholds on financial ratios; as non-interpretable models we use random forests, artificial neural networks, and support vector machines. In addition, we developed our own learning algorithm, Thresholder, which generates disjunctive normal forms and interpretable multiclass models. For the task of insolvency prediction, we show that interpretable models are not inferior to non-interpretable ones.
    To this end, a first case study uses a database employed in practice, containing the annual financial statements of 5,152 companies, to measure the predictive accuracy of all the models listed above. In a second case study on rating prediction, we demonstrate that interpretable models are even superior to non-interpretable ones. The predictive accuracy of all models is determined on three data sets used in practice, each with three rating classes. In the case studies, we compare the various interpretable approaches with respect to model size and form of interpretability, present exemplary models based on the respective data sets, and offer ways to interpret them. Our results show that interpretable, threshold-based models are well suited to classification problems in the financial industry; in this domain they are not inferior to more complex models such as support vector machines. Our algorithm Thresholder produces the smallest models, while its predictive accuracy remains comparable to that of the other interpretable models. In our rating case study, the interpretable models deliver markedly better results than in the insolvency study. One possible explanation is that ratings, unlike insolvencies, are man-made: they rest on decisions of humans who think in interpretable rules, e.g. logical combinations of thresholds. We therefore assume that interpretable models fit these problems and can recognize and reproduce such interpretable rules.
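    A disjunctive-normal-form model over financial-ratio thresholds, of the kind the abstract describes, can be sketched as follows. The ratios, cut-off values, and rule structure below are invented for illustration and are not taken from the dissertation:

    ```python
    # Hypothetical DNF classifier over financial-ratio thresholds.
    # Each inner conjunction is one disjunct; the model predicts
    # 'insolvent' if any disjunct holds. All cut-offs are invented.

    def predict_insolvent(ratios):
        return (
            (ratios["equity_ratio"] < 0.10 and ratios["return_on_assets"] < 0.0)
            or (ratios["current_ratio"] < 0.8)
        )

    print(predict_insolvent({"equity_ratio": 0.05,
                             "return_on_assets": -0.02,
                             "current_ratio": 1.2}))  # True: first disjunct fires
    ```

    The appeal of such models is that each disjunct reads as a plain business rule ("low equity and negative returns"), which is exactly the interpretability property the dissertation compares against black-box learners.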

    A Multidimensional Perceptual Map Approach to Project Prioritization and Selection

    When prioritizing projects, managers usually have to evaluate multiple attributes (dimensions) of project data. However, these dimensions are usually condensed into one or two indicators in many existing analysis processes. For example, projects are commonly prioritized using a scoring approach: they are evaluated according to predefined categories, which are then aggregated into one or two priority numbers. We argue that aggregated scores may offer only a limited view of project importance; this often leads decision makers to overlook the differences masked by the aggregation. Following the design science research paradigm, this paper presents a visual exploration approach based on multidimensional perceptual maps. It incorporates human intuition in the process and maintains the multidimensionality of project data as a decision basis for project prioritization and selection. A prototype system based on the approach was developed and qualitatively evaluated by a group of project managers. A qualitative analysis of the data collected shows its utility and usability.

    Data journalism, data literacy and data visualizations : a quantitative study

    Professional project report submitted in partial fulfillment of the requirements for the degree of Master of Arts in Journalism from the School of Journalism, University of Missouri--Columbia. As data becomes increasingly important in contemporary society, data journalism and data literacy also become more important. This project explores these concepts and examines the role each can play in writing about and understanding data-intensive information. To test the effects of data visualizations and data literacy on comprehension, this project uses a quantitative experimental design in which subjects read different versions of an article followed by a comprehension test. The article treatments include a text-only version, a version with a bar graph, and a version with a data table. In addition, subjects were classified as data literate or non-data literate based on a survey. As hypothesized, the results showed a significant comprehension benefit for both groups of subjects with access to a data visualization, with the text-only group scoring lowest in comprehension. The results also showed significant comprehension differences based on data literacy in the bar graph condition. These results can be used to inform future study, as well as best practices in data journalism and in data science education. Includes bibliographical references.

    Making and using large models of complex systems: The Poverty Reduction Model

    A system model is an abstract representation of a complex social system, which can be useful for facilitated sensemaking and decision support. This study presents a causal model format adapted from causal loop diagramming to integrate more knowledge of complexity with higher comprehension. As a case study, a Poverty Reduction Model was developed with over 1,100 cause-and-effect relationships among more than 550 factors. Staff of the Yonge Street Mission social services agency used this model to find interventions to reduce poverty in Toronto, which were prioritized using the system model in combination with rating, scoring, and discussion. A framework is provided to balance model scope and quality requirements against the time and resources available to an organization. Modelling and option-comparison methods are documented for potential reuse by other organizations.

    Use of (Q)SAR genotoxicity predictions and fuzzy multicriteria decision-making for priority ranking of ethoxyquin transformation products

    Ethoxyquin (EQ; 6-ethoxy-2,2,4-trimethyl-1,2-dihydroquinoline) has been used as an antioxidant in feed for pets and food-producing animals, including farmed fish such as Atlantic salmon. In Europe, the authorization for use of EQ as a feed additive was suspended due to knowledge gaps concerning the presence and toxicity of EQ transformation products (TPs). Recent analytical studies focusing on the detection of EQ TPs in farmed Atlantic salmon feed and fillets reported a total of 27 EQ TPs, comprising both known and previously undescribed EQ TPs. We devised and applied an in silico workflow to rank these EQ TPs according to their genotoxic potential and their occurrence data in Atlantic salmon feed and fillet. Ames genotoxicity predictions were obtained by applying a suite of five (quantitative) structure–activity relationship ((Q)SAR) tools, namely VEGA, TEST, LAZAR, Derek Nexus, and Sarah Nexus. The (Q)SAR Ames genotoxicity predictions were aggregated using fuzzy analytic hierarchy process (fAHP) multicriteria decision-making (MCDM). A priority ranking of EQ TPs was performed by combining the fAHP-ranked (Q)SAR predictions with analytical occurrence data. The applied workflow prioritized four newly identified EQ TPs for further investigation of genotoxicity. The fAHP-based prioritization strategy described here can easily be applied to other toxicity endpoints and groups of chemicals for priority ranking of the compounds of most concern for subsequent experimental and mechanistic toxicology analyses.
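    The aggregation step the abstract describes can be illustrated with a deliberately simplified sketch. A real fAHP derives criterion weights from pairwise comparisons with triangular fuzzy numbers; here fixed crisp weights, the weight split between hazard and occurrence, and all scores are invented for illustration:

    ```python
    # Simplified (non-fuzzy) sketch of aggregating several (Q)SAR genotoxicity
    # calls and occurrence data into one priority score per transformation
    # product. All weights and example values are invented.

    qsar_weights = {"VEGA": 0.25, "TEST": 0.15, "LAZAR": 0.15,
                    "Derek Nexus": 0.25, "Sarah Nexus": 0.20}

    def priority(qsar_calls, occurrence):
        """qsar_calls: tool -> 1 (positive) / 0 (negative); occurrence in [0, 1]."""
        genotox = sum(qsar_weights[t] * c for t, c in qsar_calls.items())
        return 0.7 * genotox + 0.3 * occurrence  # combine hazard and exposure

    tp_a = priority({"VEGA": 1, "TEST": 1, "LAZAR": 0,
                     "Derek Nexus": 1, "Sarah Nexus": 0}, occurrence=0.9)
    tp_b = priority({"VEGA": 0, "TEST": 0, "LAZAR": 1,
                     "Derek Nexus": 0, "Sarah Nexus": 0}, occurrence=0.2)
    print(tp_a > tp_b)  # True: TP A ranks higher for follow-up testing
    ```

    The fuzzy variant replaces the fixed weights with fuzzy-number weights derived from expert pairwise comparisons and defuzzifies the aggregate score before ranking, but the ranking logic follows the same weighted-sum shape.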