An Interactive Visual Tool to Enhance Understanding of Random Forest Predictions
Random forests are known to provide accurate predictions, but the predictions are not easy to understand. To support understanding of such predictions, an interactive visual tool has been developed. The tool can be used to manipulate selected features to explore "what-if" scenarios. It exploits the internal structure of the decision trees in a trained forest model and presents this information as interactive plots and charts. In addition, the tool presents a simple decision rule as an explanation for the prediction, and it recommends reassignments of feature values of the example that would change the prediction to a preferred class. An evaluation of the tool was undertaken in a large truck manufacturing company, targeting the fault prediction of a selected component in trucks. A set of domain experts were invited to use the tool and provide feedback in post-task interviews. The results of this investigation suggest that the tool may indeed aid in understanding the predictions of a random forest, while also allowing new insights to be gained.
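To make the path-based explanation idea concrete, here is a minimal sketch, in scikit-learn rather than the tool described above, of reading a single-tree decision rule off a trained forest for one example; the dataset and variable names are illustrative assumptions, not taken from the study.

```python
# Hedged sketch: extract the root-to-leaf conditions that one tree in a
# trained forest applies to a single example, as a simple decision rule.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

x = X[:1]                                  # the example to explain
est = forest.estimators_[0]                # first tree of the forest
tree = est.tree_
path = est.decision_path(x).indices        # node ids on the decision path

for node in path[:-1]:                     # last node is the leaf
    feat, thr = tree.feature[node], tree.threshold[node]
    op = "<=" if x[0, feat] <= thr else ">"
    print(f"{names[feat]} {op} {thr:.3f}")
print("forest prediction:", forest.predict(x)[0])
```

Changing a feature value in x and re-running the loop gives a crude what-if view of how the rule, and possibly the prediction, shifts.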
Interpretable Graph Neural Networks for Tabular Data
Data in tabular format occurs frequently in real-world applications. Graph Neural Networks (GNNs) have recently been extended to handle such data effectively, allowing feature interactions to be captured through representation learning. However, these approaches essentially produce black-box models, in the form of deep neural networks, precluding users from following the logic behind the model predictions. We propose an approach, called IGNNet (Interpretable Graph Neural Network for tabular data), which constrains the learning algorithm to produce an interpretable model, where the model shows exactly how the predictions are computed from the original input features. A large-scale empirical investigation is presented, showing that IGNNet performs on par with state-of-the-art machine-learning algorithms that target tabular data, including XGBoost, Random Forests, and TabNet. At the same time, the results show that the explanations obtained from IGNNet are aligned with the true Shapley values of the features without incurring any additional computational overhead.
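The claimed alignment with Shapley values is easiest to see for models whose output is an additive function of the inputs. Below is a hedged, self-contained illustration (a toy linear model, not IGNNet itself): exact Shapley values, computed by brute-force enumeration with absent features masked by a baseline, coincide with the per-feature additive contributions.

```python
# Brute-force Shapley values for a tiny linear model f(x) = w.x + b,
# masking absent features with a baseline vector. For additive models the
# Shapley value of feature j reduces to w[j] * (x[j] - baseline[j]).
from itertools import combinations
from math import factorial
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.5
baseline = rng.normal(size=4)      # reference point for masked features
x = rng.normal(size=4)             # instance to explain
n = len(w)

def f(mask):
    """Model value when only the features in `mask` take their true values."""
    z = baseline.copy()
    z[list(mask)] = x[list(mask)]
    return w @ z + b

for j in range(n):
    others = [i for i in range(n) if i != j]
    phi = 0.0
    for k in range(n):
        for S in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += weight * (f(S + (j,)) - f(S))
    print(f"feature {j}: shapley={phi:+.4f}  additive={w[j]*(x[j]-baseline[j]):+.4f}")
```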
Explaining Random Forest Predictions with Association Rules
Random forests frequently achieve state-of-the-art predictive performance. However, the logic behind their predictions cannot be easily understood, since they are the result of averaging often hundreds or thousands of, possibly conflicting, individual predictions. Instead of presenting all the individual predictions, an alternative is proposed, by which the predictions are explained using association rules generated from itemsets representing paths in the trees of the forest. An empirical investigation is presented, in which alternative ways of generating the association rules are compared with respect to explainability, as measured by the fraction of predictions for which there is no applicable rule and by the fraction of predictions for which at least one applicable rule conflicts with the forest prediction. For the considered datasets, most predictions can be explained by the discovered association rules, which have a high level of agreement with the underlying forest. The results do not single out a clear winner among the considered alternatives in terms of unexplained and disagreement rates, but show that the alternatives are associated with substantial differences in computational cost.
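A rough sketch of the idea, not the paper's exact procedure: encode each tree's root-to-leaf path for an example as an itemset of discretized split conditions plus that tree's vote, then mine rules over the forest. The calls follow mlxtend's documented apriori/association_rules API; the dataset, discretization, and thresholds are illustrative assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from mlxtend.frequent_patterns import apriori, association_rules

data = load_iris()
X, y, names = data.data, data.target, data.feature_names
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

x = X[:1]
transactions = []
for est in forest.estimators_:
    t = est.tree_
    path = est.decision_path(x).indices
    # Coarsely discretized split conditions, so items recur across trees.
    items = {f"{names[t.feature[n]]}"
             f"{'<=' if x[0, t.feature[n]] <= t.threshold[n] else '>'}"
             f"{t.threshold[n]:.1f}"
             for n in path[:-1]}
    items.add(f"class={int(est.predict(x)[0])}")   # the tree's vote
    transactions.append(items)

# One-hot encode the transactions, then mine rules that predict a class item.
all_items = sorted(set().union(*transactions))
onehot = pd.DataFrame([[i in tr for i in all_items] for tr in transactions],
                      columns=all_items)
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
is_class = rules["consequents"].apply(
    lambda c: any(str(i).startswith("class=") for i in c))
print(rules[is_class][["antecedents", "consequents", "support", "confidence"]])
```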
Releasing a Swedish Clinical Corpus after Removing all Words - De-identification Experiments with Conditional Random Fields and Random Forests
Patient records contain valuable information in the form of both structured data and free text; however, this information is sensitive since it can reveal the identity of patients. To allow new methods and techniques to be developed and evaluated on real-world clinical data without revealing such sensitive information, researchers could be given access to de-identified records without protected health information (PHI), such as names, telephone numbers, and so on. One approach to minimizing the risk of revealing PHI when releasing text corpora from such records is to include only features of the words instead of the words themselves. Such features may include parts of speech, word length, and so on, from which the sensitive information cannot be derived. To investigate what performance losses can be expected when replacing specific words with features, an experiment with two state-of-the-art machine learning methods, conditional random fields and random forests, is presented, comparing their ability to support de-identification, using the Stockholm EPR PHI corpus as a benchmark. The results indicate severe performance losses when the actual words are removed, leading to the conclusion that the chosen features are not sufficient for the suggested approach to be viable.
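To illustrate the word-free feature representation the abstract refers to, here is a small hedged sketch: each token is replaced by surface features from which the original word cannot be recovered, and a classifier is trained on those features alone. The toy tokens and labels are invented placeholders, not the Stockholm EPR PHI corpus, and the study itself also used conditional random fields with richer context features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def word_features(token: str) -> list:
    """Features of the word rather than the word itself."""
    return [
        len(token),                          # word length
        token.istitle(),                     # capitalized, name-like
        token.isdigit(),                     # all digits (dates, phone numbers)
        any(ch.isdigit() for ch in token),   # contains a digit
        token.isupper(),                     # all caps (abbreviations)
    ]

# Hypothetical toy data: label 1 marks a PHI token, 0 a non-PHI token.
tokens = ["Anna", "visited", "ward", "3", "on", "2005-03-14", "complaining"]
labels = [1, 0, 0, 0, 0, 1, 0]

X = np.array([word_features(t) for t in tokens], dtype=float)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))
```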
Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests
Being able to accurately predict impending failures of truck components is often associated with significant cost savings, customer satisfaction, and flexibility in maintenance service plans. However, because of the diversity in how trucks are configured and used under different conditions, creating accurate prediction models is not an easy task. This paper describes an effort to create such a prediction model for the NOx sensor, i.e., a component measuring the emitted level of nitrogen oxide in the exhaust of the engine. This component was chosen because it is vital for the truck to function properly, while at the same time being very fragile and costly to repair. As input to the model, technical specifications of trucks and their operational data are used. The process of collecting the data and making it ready for training the model via a slightly modified random forest learning algorithm is described, along with various challenges encountered during this process. The operational data consists of features represented as histograms, posing an additional challenge for the data analysis task. In the study, a modified version of the random forest algorithm is employed, which exploits the fact that the individual bins in the histograms are related, in contrast to the standard approach that would consider the bins as independent features. Experiments conducted with the modified algorithm clearly show that it is beneficial compared to the standard random forest algorithm. The performance of the resulting prediction model for the NOx sensor is promising, and the model may be adopted for the benefit of operators of heavy trucks.
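The modified algorithm is not publicly available in standard libraries, so the following is only a hedged sketch of its core idea as described above: when sampling candidate split variables, draw whole histograms (groups of related bins) rather than isolated bins. The group layout and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: two operational histograms flattened into 20 columns.
feature_groups = {
    "ambient_temp_hist": list(range(0, 10)),    # 10 temperature bins
    "engine_load_hist":  list(range(10, 20)),   # 10 load bins
}

def sample_candidate_features(n_groups: int = 1) -> list:
    """Group-aware analogue of the per-split feature sampling in a random
    forest: pick histogram groups, then expose all of their bins to the
    split search, so related bins are considered together rather than
    independently."""
    chosen = rng.choice(list(feature_groups), size=n_groups, replace=False)
    return [col for g in chosen for col in feature_groups[g]]

print(sample_candidate_features())   # e.g. all 10 bins of one histogram
```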
REAR SEAT SAFETY IN FRONTAL TO SIDE IMPACTS – FOCUSING ON OCCUPANTS FROM 3YRS TO SMALL ADULTS
This study presents a broad, comprehensive research effort that combines expertise from industry and academia and uses various methodologies with applied research directed towards countermeasures. The project includes real world crash data analysis, real world driving studies, and crash testing and simulations, aiming at enhancing the safety of forward facing child occupants (aged 3y to small adults) in the rear seat during frontal to side impacts.

The real world crash data analyses of properly restrained children originate from European as well as US data. Frontal and side impact crash tests are analyzed using different sizes of crash test dummies in different sitting postures. Side impact parameter studies using FE-models are run. The sitting posture and behavior of 12 children are monitored while riding in the rear seat. Also, the body kinematics and belt position during actual braking and turning maneuvers are studied for 16 rear seat child occupants and for various child dummies.

Real world crash data indicates that several of the injured children in frontal impacts, despite being properly restrained, impacted the vehicle interior structure with their head/face, resulting in serious injury. This was attributed to oblique crashes, pre-crash vehicle maneuvers or high crash severity. Crash tests confirm the importance of proper initial belt-fit for best protection. The crash tests also highlight the difficulty of reproducing the real world kinematics and head impact locations using existing crash test dummies and test procedures. The side impact parameter studies indicate that the vehicle's occupant protection systems, such as airbags and seat belt pretensioners, play an important role in protecting children as well.

The results from the on-road driving studies illustrate the variation of sitting postures while riding in the rear seat, giving valuable input on the effects of the restraint systems and on how representative the standardized dummy seating positioning procedures are. The results from the maneuver driving studies illustrate the importance of understanding the kinematics of a child relative to the seat belt in a real world maneuver situation.

Real world safety of rear seat occupants, especially children, involves evaluation of protection beyond standard crash testing scenarios in frontal and side impact conditions. This project explores the complete context of rear seat protection in impact situations ranging from front to side and directions in between, highlighting the importance of pre-crash posture and behavior.

This research project at SAFER (Vehicle and Traffic Safety Centre at Chalmers), where researchers from industry and universities cooperate with the aim to further improve safety for children (from 3y) to small adults in the rear seat, speeds up the process towards safety implementation due to the interaction between academic and industrial researchers.
Observation of a new light-induced skyrmion phase in the Mott insulator Cu2OSeO3
We report the discovery of a novel skyrmion phase in the multiferroic
insulator Cu2OSeO3 for magnetic fields below the equilibrium skyrmion pocket.
This phase can be accessed by exciting the sample out of equilibrium with
near-infrared (NIR) femtosecond laser pulses but can not be reached by any
conventional field cooling protocol. From the strong wavelength dependence of
the photocreation process and via spin dynamics simulations, we identify the
magnetoelastic effect as the most likely photocreation mechanism. This effect
results in a transient modification of the magnetic interaction extending the
equilibrium skyrmion pocket to lower magnetic fields. Once created, the
skyrmions rearrange and remain stable over a long time, reaching minutes. The
presented results are relevant for designing high-efficiency non-volatile data
storage based on magnetic skyrmions.Comment: 11 pages, 5 figure
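The abstract does not specify the spin model, but spin dynamics simulations of chiral magnets such as Cu2OSeO3 typically integrate the Landau-Lifshitz-Gilbert equation for each magnetic moment; the following form is standard background, not taken from the paper:

```latex
% Landau-Lifshitz-Gilbert equation: precession of moment m_i in its
% effective field plus Gilbert damping with coefficient \alpha.
\frac{d\mathbf{m}_i}{dt}
  = -\gamma\, \mathbf{m}_i \times \mathbf{H}^{\mathrm{eff}}_i
    + \alpha\, \mathbf{m}_i \times \frac{d\mathbf{m}_i}{dt}
```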
Variation in plasma calcium analysis in primary care in Sweden - a multilevel analysis
Background: Primary hyperparathyroidism (pHPT) is a common disease that often remains undetected and causes severe disturbance, especially in postmenopausal women. Therefore, national recommendations promoting early pHPT detection by plasma calcium (P-Ca) have been issued in Sweden. In this study we aimed to investigate the variation in P-Ca analysis between physicians and health care centres (HCCs) in primary care in the county of Skaraborg, Sweden.
Methods: In this cross-sectional study of patients' records during 2005, we analysed records from 154 629 patients attending 457 physicians at 24 HCCs. We used multilevel logistic regression analysis (MLRA) and adjusted for patient, physician and HCC characteristics. Differences were expressed as the median odds ratio (MOR).
Results: There was a substantial variation in the number of P-Ca analyses between both HCCs (MOR_HCC 1.65 [1.44-2.07]) and physicians (MOR_physician 1.95 [1.85-2.08]). The odds of a P-Ca analysis were lower for male patients (OR 0.80 [0.77-0.83]) and increased with the number of diagnoses (OR 25.8 [23.5-28.5]). The sex of the physician had no influence on P-Ca test ordering (OR 0.93 [0.78-1.09]). Physicians under education ordered the most P-Ca analyses (OR 1.69 [1.35-2.24]) and locums the least (OR 0.73 [0.57-0.94]). More of the variance was attributed to the physician level than to the HCC level. A different mix of patients did not explain this variance between physicians. Theoretically, if a patient were able to change both GP and HCC, the odds of a P-Ca analysis would in median increase by 2.45. Including characteristics of the patients, physicians and HCCs in the MLRA model did not explain the variance.
Conclusions: The physician level was more important than the HCC level for the variation in P-Ca analysis, but further exploration of unidentified contextual factors is crucial for future monitoring of practice variation.
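For reference, the median odds ratio reported above has a closed form under the standard multilevel logistic model (an assumption here, since the abstract does not restate it): with between-cluster variance sigma_u^2 on the log-odds scale,

```latex
% Median odds ratio (Larsen & Merlo): the median factor by which the odds
% change when moving from a lower- to a higher-propensity cluster.
\mathrm{MOR} = \exp\!\left(\sqrt{2\sigma_u^2}\,\Phi^{-1}(0.75)\right)
             \approx \exp\!\left(0.954\,\sigma_u\right)
```

where Phi^{-1} is the standard normal quantile function; MOR = 1 would indicate no between-cluster variation.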
Validity of registration of ICD codes and prescriptions in a research database in Swedish primary care: a cross-sectional study in Skaraborg primary care database
Background: In recent years, several primary care databases recording information from computerized medical records have been established and used for quality assessment of medical care and research. However, to be useful for research purposes, the data generated routinely from everyday practice require registration of high quality. In this study we aimed to investigate (i) the frequency and validity of ICD code and drug prescription registration in the new Skaraborg primary care database (SPCD) and (ii) the sources of variation in this registration.
Methods: The SPCD contains anonymous electronic medical records (ProfDoc III) automatically retrieved from all 24 public health care centres (HCCs) in Skaraborg, Sweden. The frequencies of ICD code registration for the selected diagnoses diabetes mellitus, hypertension and chronic cardiovascular disease, and of the relevant drug prescriptions, in the time period between May 2002 and October 2003 were analysed. The validity of data registration in the SPCD was assessed in a random sample of 50 medical records from each HCC (n = 1200 records), using the medical record text as the gold standard. The variance of ICD code registration was studied with multilevel logistic regression analysis and expressed as the median odds ratio (MOR).
Results: For diabetes mellitus and hypertension, ICD codes were registered in 80-90% of cases, while for congestive heart failure and ischemic heart disease, ICD codes were registered less often (60-70%). Drug prescription registration was overall high (88%). A correlation between the frequency of ICD-coded visits and the sensitivity of the ICD code registration was found for hypertension and congestive heart failure, but not for diabetes or ischemic heart disease. The frequency of ICD code registration varied from 42 to 90% between HCCs, and the greatest variation was found at the physician level (MOR_physician = 4.2 and MOR_HCC = 2.3).
Conclusions: Since the frequency of ICD code registration varies between different diagnoses, each diagnosis must be validated separately. Improved frequency and quality of ICD code registration might be achieved by interventions directed towards the physicians, where the greatest amount of variation was found.
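As a hedged sketch of the validation step: sensitivity here is the fraction of diagnoses present in the record text (the gold standard) that also received an ICD code. The arrays below are invented stand-ins for the 1200 sampled records.

```python
import numpy as np

gold = np.array([1, 1, 1, 0, 1, 0, 1, 1])        # diagnosis present in text
registered = np.array([1, 0, 1, 0, 1, 0, 0, 1])  # ICD code registered

sensitivity = (registered & gold).sum() / gold.sum()
print(f"sensitivity = {sensitivity:.2f}")         # true cases that were coded
```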