
    An Interactive Visual Tool to Enhance Understanding of Random Forest Predictions

    Random forests are known to provide accurate predictions, but those predictions are not easy to understand. To support understanding of such predictions, an interactive visual tool has been developed. The tool can be used to manipulate selected features to explore “what-if” scenarios. It exploits the internal structure of the decision trees in a trained forest model and presents this information as interactive plots and charts. In addition, the tool presents a simple decision rule as an explanation for the prediction, and it can recommend reassignments of feature values for an example that would change the prediction to a preferred class. An evaluation of the tool was undertaken at a large truck manufacturing company, targeting fault prediction for a selected truck component. A set of domain experts were invited to use the tool and provide feedback in post-task interviews. The results of this investigation suggest that the tool may indeed aid in understanding the predictions of a random forest, and that it also allows new insights to be gained.
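The “what-if” idea described above can be sketched in a few lines: train a random forest, then sweep one feature of a single example across its observed range and watch how the predicted class probability responds. The dataset and feature index below are illustrative, not from the paper.

```python
# Minimal "what-if" sketch: vary one feature of one example and record the
# forest's predicted probability at each value (illustrative data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

example = X[0].copy()
feature_idx = 0  # the feature to manipulate in the what-if scenario
probs = []
for value in np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), 5):
    modified = example.copy()
    modified[feature_idx] = value
    probs.append(model.predict_proba(modified.reshape(1, -1))[0, 1])

# probs now traces how the predicted probability responds to the feature
print(probs)
```

Plotting such a sweep for each manipulable feature is essentially the interactive part of the tool.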

    Interpretable Graph Neural Networks for Tabular Data

    Data in tabular format occurs frequently in real-world applications. Graph Neural Networks (GNNs) have recently been extended to handle such data effectively, allowing feature interactions to be captured through representation learning. However, these approaches essentially produce black-box models in the form of deep neural networks, precluding users from following the logic behind the model predictions. We propose an approach, called IGNNet (Interpretable Graph Neural Network for tabular data), which constrains the learning algorithm to produce an interpretable model that shows exactly how the predictions are computed from the original input features. A large-scale empirical investigation is presented, showing that IGNNet performs on par with state-of-the-art machine-learning algorithms that target tabular data, including XGBoost, Random Forests, and TabNet. At the same time, the results show that the explanations obtained from IGNNet are aligned with the true Shapley values of the features, without incurring any additional computational overhead. Comment: 18 pages, 12 figures
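The interpretability property the abstract targets — predictions computed exactly from the original input features — can be illustrated with a deliberately simple additive model (this is not the IGNNet architecture, just a sketch of the property; weights and data are made up):

```python
# Sketch of an exactly-decomposable prediction: the output score is a sum of
# one contribution per original feature, so each prediction is traceable.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))         # 4 examples, 3 tabular features
w = np.array([0.5, -1.0, 2.0])      # illustrative per-feature weights
b = 0.1

contributions = X * w               # one additive term per input feature
logits = contributions.sum(axis=1) + b

# The decomposition is exact: summing contributions recovers the score.
assert np.allclose(contributions.sum(axis=1) + b, logits)
```

IGNNet's claim is that such exact per-feature attributions can be retained while matching the accuracy of black-box tabular learners.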

    Explaining Random Forest Predictions with Association Rules

    Random forests frequently achieve state-of-the-art predictive performance. However, the logic behind their predictions cannot be easily understood, since they are the result of averaging what are often hundreds or thousands of, possibly conflicting, individual predictions. Instead of presenting all the individual predictions, an alternative is proposed, by which the predictions are explained using association rules generated from itemsets representing paths in the trees of the forest. An empirical investigation is presented, in which alternative ways of generating the association rules are compared with respect to explainability, as measured by the fraction of predictions for which there is no applicable rule and by the fraction of predictions for which there is at least one applicable rule that conflicts with the forest prediction. For the considered datasets, most predictions can be explained by the discovered association rules, which have a high level of agreement with the underlying forest. The results do not single out a clear winner among the considered alternatives in terms of unexplained and disagreement rates, but show that they are associated with substantial differences in computational cost.
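The raw material for the approach above — itemsets representing tree paths — can be extracted directly from a trained forest. The sketch below collects, for one example, the split conditions it satisfies in each tree (the subsequent association-rule mining is omitted; data and encoding are illustrative):

```python
# Turn each tree's root-to-leaf path for one example into a condition
# itemset, the input to association-rule mining (illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

x = X[:1]
itemsets = []
for tree in forest.estimators_:
    t = tree.tree_
    node_ids = tree.decision_path(x).indices  # nodes on x's path
    items = set()
    for node in node_ids:
        if t.children_left[node] == -1:       # leaf: no split condition
            continue
        f, thr = t.feature[node], t.threshold[node]
        op = "<=" if x[0, f] <= thr else ">"
        items.add(f"x{f} {op} {thr:.2f}")
    itemsets.append(items)

print(itemsets)  # one condition-itemset per tree in the forest
```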

    Releasing a Swedish Clinical Corpus after Removing all Words - De-identification Experiments with Conditional Random Fields and Random Forests

    Patient records contain valuable information in the form of both structured data and free text; however, this information is sensitive since it can reveal the identity of patients. In order to allow new methods and techniques to be developed and evaluated on real-world clinical data without revealing such sensitive information, researchers could be given access to de-identified records without protected health information (PHI), such as names, telephone numbers, and so on. One approach to minimizing the risk of revealing PHI when releasing text corpora from such records is to include only features of the words instead of the words themselves. Such features may include parts of speech, word length, and so on, from which the sensitive information cannot be derived. In order to investigate what performance losses can be expected when replacing specific words with features, an experiment with two state-of-the-art machine learning methods, conditional random fields and random forests, is presented, comparing their ability to support de-identification, using the Stockholm EPR PHI corpus as a benchmark test. The results indicate severe performance losses when the actual words are removed, leading to the conclusion that the chosen features are not sufficient for the suggested approach to be viable.
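The feature-only release idea can be sketched as follows: each token is replaced by surface features from which the word itself cannot be recovered. The feature set here (length, capitalization, digit presence) is illustrative; the study also considers features such as part-of-speech tags.

```python
# Sketch: release word features instead of words, so PHI such as names and
# phone numbers cannot be read back from the corpus (illustrative features).
def word_features(token):
    return {
        "length": len(token),
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
    }

sentence = "Anna Svensson called 070-1234567 yesterday".split()
released = [word_features(tok) for tok in sentence]
print(released)  # the tokens themselves are never released
```

The study's negative result is precisely that sequence labellers trained on such features alone lose too much accuracy for the approach to be viable.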

    Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests

    Being able to accurately predict the impending failures of truck components is often associated with significant cost savings, customer satisfaction, and flexibility in maintenance service plans. However, because of the diversity in how trucks are typically configured and used under different conditions, creating accurate prediction models is not an easy task. This paper describes an effort to create such a prediction model for the NOx sensor, i.e., a component measuring the emitted level of nitrogen oxide in the exhaust of the engine. This component was chosen because it is vital for the truck to function properly, while at the same time being fragile and costly to repair. As input to the model, technical specifications of trucks and their operational data are used. The process of collecting the data and preparing it for training is described, along with various challenges encountered along the way. The operational data consists of features represented as histograms, posing an additional challenge for the data analysis task. In the study, a slightly modified version of the random forest algorithm is therefore employed, which exploits the fact that the individual bins in a histogram are related, in contrast to the standard approach that would treat the bins as independent features. Experiments clearly show that the modified version is beneficial compared to the standard random forest algorithm. The performance of the resulting prediction model for the NOx sensor is promising, and the model may be adopted for the benefit of operators of heavy trucks.
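Why histogram bins are "related" can be made concrete: a split condition over the cumulative mass of the first few ordered bins uses the bin ordering, which treating each bin as an independent feature would ignore. The data and threshold below are illustrative, not the paper's method in detail.

```python
# Sketch: a split over the first k+1 ordered histogram bins, exploiting the
# fact that bins are related (illustrative engine-load histograms).
import numpy as np

# each row: a truck's normalized engine-load histogram (5 ordered bins)
H = np.array([
    [0.6, 0.2, 0.1, 0.1, 0.0],   # mostly low load
    [0.1, 0.1, 0.2, 0.3, 0.3],   # mostly high load
])

def cumulative_split(hist, k, threshold):
    """Split condition over the combined mass of bins 0..k."""
    return bool(hist[: k + 1].sum() <= threshold)

flags = [cumulative_split(h, k=1, threshold=0.5) for h in H]
print(flags)  # [False, True]: the two usage profiles separate cleanly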

    Rear Seat Safety in Frontal to Side Impacts – Focusing on Occupants from 3 Years to Small Adults

    This study presents a broad, comprehensive research effort that combines expertise from industry and academia and uses various methodologies, with applied research directed towards countermeasures. The project includes real-world crash data analysis, real-world driving studies, and crash testing and simulations, aiming to enhance the safety of forward-facing child occupants (aged 3 years to small adults) in the rear seat during frontal to side impacts. The real-world crash data analyses of properly restrained children originate from European as well as US data. Frontal and side impact crash tests are analyzed using different sizes of crash test dummies in different sitting postures. Side impact parameter studies using FE models are run. The sitting posture and behavior of 12 children are monitored while riding in the rear seat. In addition, the body kinematics and belt position during actual braking and turning maneuvers are studied for 16 rear seat child occupants and for various child dummies. Real-world crash data indicates that several of the injured children in frontal impacts, despite being properly restrained, impacted the vehicle interior structure with their head or face, resulting in serious injury. This was attributed to oblique crashes, pre-crash vehicle maneuvers, or high crash severity. Crash tests confirm the importance of proper initial belt fit for best protection. The crash tests also highlight the difficulty of reproducing real-world kinematics and head impact locations using existing crash test dummies and test procedures. The side impact parameter studies indicate that the vehicle’s occupant protection systems, such as airbags and seat belt pretensioners, play an important role in protecting children as well.
The results from the on-road driving studies illustrate the variation of sitting postures while riding in the rear seat, giving valuable input on the effects of the restraint systems and on how representative the standardized dummy seating positioning procedures are. The results from the maneuver driving studies illustrate the importance of understanding the kinematics of a child relative to the seat belt in a real-world maneuver situation. Real-world safety of rear seat occupants, especially children, involves evaluating protection beyond standard crash testing scenarios in frontal and side impact conditions. This project explores the complete context of rear seat protection in impact situations ranging from frontal to side and directions in between, highlighting the importance of pre-crash posture and behavior. This research project at SAFER (Vehicle and Traffic Safety Centre at Chalmers), where researchers from industry and universities cooperate to further improve safety for children (from 3 years) to small adults in the rear seat, speeds up the path to safety implementation through the interaction between academic and industrial researchers.

    Observation of a new light-induced skyrmion phase in the Mott insulator Cu2OSeO3

    We report the discovery of a novel skyrmion phase in the multiferroic insulator Cu2OSeO3 for magnetic fields below the equilibrium skyrmion pocket. This phase can be accessed by exciting the sample out of equilibrium with near-infrared (NIR) femtosecond laser pulses, but cannot be reached by any conventional field-cooling protocol. From the strong wavelength dependence of the photocreation process, and via spin dynamics simulations, we identify the magnetoelastic effect as the most likely photocreation mechanism. This effect results in a transient modification of the magnetic interactions, extending the equilibrium skyrmion pocket to lower magnetic fields. Once created, the skyrmions rearrange and remain stable for long times, up to minutes. The presented results are relevant for designing high-efficiency non-volatile data storage based on magnetic skyrmions. Comment: 11 pages, 5 figures

    Variation in plasma calcium analysis in primary care in Sweden - a multilevel analysis

    Background: Primary hyperparathyroidism (pHPT) is a common disease that often remains undetected and causes severe disturbance, especially in postmenopausal women. Therefore, national recommendations promoting early pHPT detection by plasma calcium (P-Ca) analysis have been issued in Sweden. In this study we aimed to investigate the variation in P-Ca analysis between physicians and health care centres (HCCs) in primary care in the county of Skaraborg, Sweden. Methods: In this cross-sectional study of patients' records during 2005, we analysed records from 154,629 patients attending 457 physicians at 24 HCCs. We used multilevel logistic regression analysis (MLRA) and adjusted for patient, physician and HCC characteristics. Differences were expressed as the median odds ratio (MOR). Results: There was substantial variation in the number of P-Ca analyses between both HCCs (MOR_HCC 1.65 [1.44-2.07]) and physicians (MOR_physician 1.95 [1.85-2.08]). The odds of a P-Ca analysis were lower for male patients (OR 0.80 [0.77-0.83]) and increased with the number of diagnoses (OR 25.8 [23.5-28.5]). The sex of the physician had no influence on P-Ca test ordering (OR 0.93 [0.78-1.09]). Physicians under education ordered the most P-Ca analyses (OR 1.69 [1.35-2.24]) and locums the least (OR 0.73 [0.57-0.94]). More of the variance was attributed to the physician level than to the HCC level. A different mix of patients did not explain this variance between physicians. Theoretically, if a patient were to change both GP and HCC, the odds of a P-Ca analysis would in median increase 2.45-fold. Including characteristics of the patients, physicians and HCCs in the MLRA model did not explain the variance. Conclusions: The physician level was more important than the HCC level for the variation in P-Ca analysis, but further exploration of unidentified contextual factors is crucial for future monitoring of practice variation.
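The median odds ratio reported above has a closed form: for a cluster-level variance σ² from a multilevel logistic model, MOR = exp(√(2σ²) · Φ⁻¹(0.75)). A stdlib-only sketch (the variance value below is illustrative, not taken from the study):

```python
# Median odds ratio from the between-cluster variance of a multilevel
# logistic regression model: MOR = exp(sqrt(2*var) * Phi^{-1}(0.75)).
import math
from statistics import NormalDist

def median_odds_ratio(cluster_variance):
    return math.exp(math.sqrt(2 * cluster_variance)
                    * NormalDist().inv_cdf(0.75))

# no between-cluster variance -> MOR of exactly 1 (no contextual effect)
print(median_odds_ratio(0.0))              # 1.0
print(round(median_odds_ratio(0.25), 2))   # illustrative variance
```

A MOR of, say, 1.95 at the physician level then reads: for two randomly chosen physicians, the median odds of a P-Ca analysis differ by a factor of 1.95 for otherwise identical patients.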

    Validity of registration of ICD codes and prescriptions in a research database in Swedish primary care: a cross-sectional study in Skaraborg primary care database

    Background: In recent years, several primary care databases recording information from computerized medical records have been established and used for quality assessment of medical care and for research. However, to be useful for research purposes, the data generated routinely in everyday practice require registration of high quality. In this study we aimed to investigate (i) the frequency and validity of ICD code and drug prescription registration in the new Skaraborg primary care database (SPCD) and (ii) the sources of variation in this registration. Methods: The SPCD contains anonymous electronic medical records (ProfDoc III) automatically retrieved from all 24 public health care centres (HCCs) in Skaraborg, Sweden. The frequencies of ICD code registration for the selected diagnoses diabetes mellitus, hypertension and chronic cardiovascular disease, and of the relevant drug prescriptions, in the period between May 2002 and October 2003 were analysed. The validity of data registration in the SPCD was assessed in a random sample of 50 medical records from each HCC (n = 1,200 records), using the medical record text as the gold standard. The variance of ICD code registration was studied with multilevel logistic regression analysis and expressed as the median odds ratio (MOR). Results: For diabetes mellitus and hypertension, ICD codes were registered in 80-90% of cases, while for congestive heart failure and ischemic heart disease ICD codes were registered less often (60-70%). Drug prescription registration was overall high (88%). A correlation between the frequency of ICD-coded visits and the sensitivity of the ICD code registration was found for hypertension and congestive heart failure, but not for diabetes or ischemic heart disease. The frequency of ICD code registration varied from 42 to 90% between HCCs, and the greatest variation was found at the physician level (MOR_physician = 4.2 and MOR_HCC = 2.3). Conclusions: Since the frequency of ICD code registration varies between diagnoses, each diagnosis must be validated separately. Improved frequency and quality of ICD code registration might be achieved by interventions directed towards the physicians, where the greatest amount of variation was found.
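The validation step above amounts to computing the sensitivity of code registration against the free-text record as gold standard. A minimal sketch, with illustrative counts (not the study's figures):

```python
# Sensitivity of ICD code registration: of the sampled records whose text
# documents a diagnosis, the fraction that also carry the ICD code.
def sensitivity(true_positives, false_negatives):
    return true_positives / (true_positives + false_negatives)

# e.g. 45 of 50 sampled records with the diagnosis in the text were coded
print(sensitivity(45, 5))  # 0.9
```

Computing this per diagnosis is what reveals that each diagnosis must be validated separately.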