Prognostic methods in cardiac surgery and postoperative intensive care

Abstract

Cardiac surgery has become an important medical intervention in the treatment of end-stage cardiac diseases. Like many clinical domains, however, the field of cardiac surgery is today under pressure: more and more patients are expected to receive high-quality care within limited time and at limited cost. This has created an increasing need to evaluate and improve the efficiency and quality of the delivered care. Research on factors that predict clinical outcomes (e.g., death) and the amount and duration of treatment is indispensable in this respect. A common strategy to identify predictive factors is the development of prognostic models from data. The resulting models can be used for risk assessment and case load planning. Furthermore, the models are instruments that can assist in the evaluation of care quality by adjusting raw outcomes for case mix. The development of new prognostic methods for cardiac surgery and postoperative intensive care, using machine learning methodology, is the topic of this thesis.

Chapter 1 introduces the multidisciplinary care process of cardiac surgery and presents the objectives of the thesis. The care process is roughly composed of a preoperative stage of preassessment, a stage of the surgical intervention in the operating room, and a postoperative stage of recovery at the intensive care unit (ICU) and the nursing ward. With the introduction of modern clinical information systems, large amounts of patient data are routinely recorded during patient care, including data on the (cardiac) disease history of the patients, operative details, and monitoring data. Moreover, clinical outcomes such as length of stay and death are recorded in these systems. The information systems thus form a new data source for the development of prognostic models. The instruments currently in the prognostic toolbox of clinicians and managers involved in cardiac surgery are models that generally allow only preoperative risk assessment of a single outcome variable; standard statistical methods (e.g., logistic regression analysis) have been used for their development. The field of machine learning offers data modeling methods that are potentially suitable for the development of prognostic models because of their graphical model representation. Tree models and Bayesian networks are typical examples; their graphical representation may contribute to the interpretability of the models. The general objective of this thesis is to employ and investigate these machine learning methods for modeling data that are recorded during routine patient care, in order to extend the practitioner’s prognostic toolbox. The project aims to provide a ‘proof of concept’ of the prognostic methods rather than to deliver prognostic instruments as clinical end products.

Chapter 2 presents the prognostic Bayesian network (PBN) as a new type of prognostic model that builds on the Bayesian network methodology and implements a dynamic, process-oriented view on prognosis. In this model, the mutual relationships between variables that come into play during subsequent stages of the care process, including clinical outcomes, are modeled as a Bayesian network. A procedure for learning PBNs from data is introduced that optimizes performance of the network’s primary task, outcome prediction, and exploits the temporal structure of the health care process being modeled.
Furthermore, the procedure adequately handles the fact that patients may die during the intervention and ‘drop out’ of the process; this phenomenon is represented in the network by subsidiary outcome variables. In the procedure, the structure of the Bayesian network is induced from the data by selecting, for each network variable, the best predictive feature subset among the other variables. For that purpose, local supervised learning models are recursively learned in a top-down approach, starting at the outcome variable of the health care process. Each selected feature set is used as the set of parent nodes of the corresponding variable and is represented as such by incoming arcs in a graph. Application of the procedure yields a directed acyclic graph as the graphical part of the network and a collection of local predictive models as the numerical part; together they constitute the PBN (a schematic sketch of this structure-learning step is given below). In contrast to traditional prognostic models, PBNs explicate the scenarios that lead to disease outcomes and can be used to update predictions when new information becomes available. Moreover, they can be used for what-if scenario analysis to identify critical events to account for during patient care, and for risk factor analysis to examine which variables are important predictors of these events. To support their use in clinical practice, it is proposed that PBNs be embedded in a prognostic system with a three-tiered architecture. In this architecture, a PBN is supplemented with a task layer that translates the user’s prognostic information needs into probabilistic inference queries for the network, and a presentation layer that presents the aggregated results of the inference to the user.

An application of the proposed PBN, the learning procedure, and the three-tiered prognostic system in cardiac surgery is presented in Chapter 3. The learning procedure was applied to a data set of 6778 patients to develop a PBN that includes 22 preoperative, operative, and postoperative variables. Hospital mortality was used as the outcome variable in the network, with operative mortality and postoperative mortality as subsidiary outcome variables to represent patient dropout. The method of class probability trees served as the supervised learning method for feature subset selection and induction of the local predictive models. The predictive performance of the resulting PBN was evaluated for a number of complication and mortality variables on an independent set of 3336 patients at two prediction times: during the preoperative stage and at ICU admission. The results showed good calibration for the variables describing an ICU length of stay longer than 24 hours and the occurrence of cardiac complications, but poor calibration for the mortality variables; especially for these variables, the predicted probabilities of the PBN were found to be underdispersed. The mortality variables had the best discrimination, though. To verify the effectiveness of the dedicated PBN learning procedure, the performance of the PBN was compared to that of a network induced from the learning set using a standard network learning algorithm in which candidate networks are selected using the minimal description length (MDL) principle. The PBN outperformed the MDL network for all variables at both prediction times with respect to discriminative ability.
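To make the top-down structure-learning step of the Chapter 2 procedure concrete, the following sketch outlines the idea in Python. The function names (select_parents, learn_pbn_structure), the greedy forward selection, and the use of scikit-learn decision trees as local models are illustrative assumptions; the thesis itself uses class probability trees for feature subset selection.

    # Schematic sketch (not the thesis implementation) of top-down PBN
    # structure learning: starting at the outcome variable, a predictive
    # feature subset is selected for each variable and used as its parent set.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def select_parents(data, target, candidates, max_parents=3):
        """Greedy forward selection of a predictive feature subset for `target`."""
        selected, best_score = [], -1.0
        improved = True
        while improved and len(selected) < max_parents:
            improved = False
            for cand in candidates:
                if cand in selected:
                    continue
                X = pd.get_dummies(data[selected + [cand]])
                score = cross_val_score(
                    DecisionTreeClassifier(min_samples_leaf=20),
                    X, data[target], cv=5).mean()
                if score > best_score:
                    best_score, best_cand, improved = score, cand, True
            if improved:
                selected.append(best_cand)
        return selected

    def learn_pbn_structure(data, outcome, stage_of):
        """Recursively select parent sets, starting at the outcome variable.

        `stage_of` maps each variable to its care-process stage; candidate
        parents are restricted to variables of earlier stages, so the
        resulting graph is guaranteed to be acyclic.
        """
        parents, agenda = {}, [outcome]
        while agenda:
            var = agenda.pop()
            if var in parents:
                continue
            candidates = [v for v in data.columns
                          if stage_of[v] < stage_of[var]]
            parents[var] = select_parents(data, var, candidates)
            agenda.extend(parents[var])  # recurse into the selected parents
        return parents  # variable -> parent set; arcs point from parent to child

In a complete implementation, the fitted local models would be retained as the numerical part of the PBN; this sketch returns only the graph structure (the parent sets).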
Similar calibration results were observed for the MDL network as for the PBN, suggesting that the underdispersion of predicted probabilities is directly related to the Bayesian network methodology. The chapter concludes by presenting ProCarSur, a prototype implementation of a prognostic system that embeds the PBN.

Prediction of the postoperative ICU length of stay (LOS) fulfils an important role in the identification of patients at high risk of a slow and laborious recovery. Furthermore, it provides useful information for resource allocation and case load planning. When developing predictive models for this outcome, the prediction problem is frequently reduced to a two-class problem to estimate a patient’s risk of a prolonged ICU LOS. The dichotomization threshold is often chosen in an unsystematic manner prior to model development. In Chapter 4, methodology is presented that extends existing procedures for predictive modeling with optimization of the outcome definition for prognostic purposes. From the range of possible threshold values, the value is chosen for which the corresponding predictive model has maximal precision based on the data. The MALOR performance statistic is proposed to compare the precision of models for different dichotomizations of the outcome. Unlike other precision measures, this statistic is insensitive to the prevalence of positive cases in a two-class prediction problem, and is therefore a suitable performance statistic for optimizing the outcome definition in the modeling process. We applied this procedure to data from 2327 cardiac surgery patients who stayed at the ICU for at least one day to build a model for prediction of ICU LOS after one day of stay. The method of class probability trees was used for model development, and model precision was assessed in comparison to predictions from tree ensembles. Within the data set, the best model precision was found at a dichotomization threshold of seven days. The value of the MALOR statistic for this threshold was not statistically different from that for a threshold of four days, which was therefore also considered a good candidate for dichotomizing ICU LOS within this patient group.

During a patient’s postoperative ICU stay, many physiological variables are measured at high frequency by monitoring systems, and the resulting measurements are automatically recorded in information systems. The temporal structure of these data requires the application of dedicated machine learning methods. A common strategy in prediction from temporal data is the extraction of relevant meta features prior to the use of standard supervised learning methods. This strategy involves the fundamental dilemma of to what extent feature extraction should be guided by domain knowledge and to what extent it should be guided by the available data. Chapter 5 presents an empirical comparison of two temporal abstraction procedures that differ in this respect. The first procedure derives meta features that are predefined using existing concepts from the clinician’s language and form symbolic descriptions of the data. The second procedure searches among a large set of numerical meta features (summary statistics) to discover those that have predictive value; a sketch of this type of feature extraction is given below. The procedures were applied to ICU monitoring data of 664 patients who underwent cardiac surgery to estimate the risk of prolonged mechanical ventilation.
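To illustrate the second, data-driven strategy, the sketch below derives a set of numerical meta features (summary statistics) from per-patient monitoring series. The particular statistics, the function names, and the assumed data layout are illustrative assumptions rather than the feature set searched in the thesis.

    # Minimal sketch of numerical meta-feature (summary statistics) extraction
    # from ICU monitoring time series; statistics and data layout are assumptions.
    import numpy as np
    import pandas as pd

    def summary_meta_features(values, prefix):
        """Summary statistics of one monitoring variable for one patient."""
        values = np.asarray(values, dtype=float)
        values = values[~np.isnan(values)]
        if values.size == 0:
            return {}
        slope = (np.polyfit(np.arange(values.size), values, 1)[0]
                 if values.size > 1 else 0.0)
        return {
            f"{prefix}_mean": float(values.mean()),
            f"{prefix}_std": float(values.std()),
            f"{prefix}_min": float(values.min()),
            f"{prefix}_max": float(values.max()),
            f"{prefix}_median": float(np.median(values)),
            f"{prefix}_iqr": float(np.percentile(values, 75) - np.percentile(values, 25)),
            f"{prefix}_slope": float(slope),  # overall linear trend
        }

    def build_feature_table(series_per_patient):
        """series_per_patient: patient id -> DataFrame with one column per
        monitoring variable (e.g., 'ABPm', 'CVP', 'HR'); returns one row of
        candidate meta features per patient, ready for supervised learning."""
        rows = {}
        for patient_id, df in series_per_patient.items():
            feats = {}
            for var in df.columns:
                feats.update(summary_meta_features(df[var], var))
            rows[patient_id] = feats
        return pd.DataFrame.from_dict(rows, orient="index")

The resulting feature table can then be screened for predictive value with a supervised learner, such as the class probability trees used in the chapter.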
The predictive value of the features resulting from both procedures was systematically compared, and a class probability tree model was developed based on each type of abstraction. The numerical meta features extracted by the second procedure were found to be more informative than the symbolic meta features of the first procedure, and a superior predictive performance was observed for the associated tree model. The findings of this case study indicate that in prediction from monitoring data, it is preferable to give the available data a more important role in feature extraction than existing concepts from the medical language.

Automatically recorded monitoring data often contain inaccurate and erroneous measurements, or ‘artifacts’. Data artifacts hamper interpretation and analysis of the data, as they do not reflect the true state of the patient. In the literature, several methods have been described for filtering artifacts from ICU monitoring data. These methods require, however, that a reference standard be available in the form of a data sample in which artifacts have been marked by an experienced clinician. Chapter 6 presents a study on the reliability of such reference standards obtained from clinical experts and on their effect on the generalizability of the resulting artifact filters. Individual judgments of four physicians, a majority vote judgment, and a consensus judgment were obtained for 30 time series of three monitoring variables: mean arterial blood pressure (ABPm), central venous pressure (CVP), and heart rate (HR). The individual and joint judgments were used to tune three existing automated filtering methods and to evaluate the performance of the resulting filters. The results showed good agreement among the physicians for the CVP data; low interrater agreement was observed for the ABPm and HR data. Artifact filters for these two variables that were developed using judgments of individual experts were found to generalize only moderately to new time series and to other experts. Improved filter performance was found for all three variable types when joint judgments were used for tuning the filtering methods. These results indicate that reference standards obtained from individual experts are less suitable for the development and evaluation of artifact filters for monitoring data than joint judgments.

A basic, and frequently applied, method for automated artifact detection is moving median filtering; a minimal sketch of this method is given below. Furthermore, alternative methods such as ArtiDetect, described by C. Cao et al., and a tree induction method described by C.L. Tsien et al. have been proposed in the literature for artifact detection in ICU monitoring data. Chapter 7 presents an empirical comparison of the performance of filters developed using these three methods and a new method that combines them. The 30 ABPm, CVP, and HR time series were used for filter development and evaluation; the consensus judgment of the time series obtained from the four physicians was used as the reference standard in this study. No single method outperformed the others on all variables. For the ABPm series, the highest sensitivity was observed for ArtiDetect, while moving median filtering had superior positive predictive value. All methods obtained satisfactory results for the CVP data; high performance was observed for ArtiDetect and the combined method in terms of both sensitivity and positive predictive value. The combined method performed better than the other methods for the HR data.
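As an illustration of the simplest of these methods, the sketch below marks a measurement as a suspected artifact when it deviates too far from the median of a sliding window around it. The window size and deviation threshold are illustrative assumptions; in practice they would be tuned per variable against a reference standard, as in Chapters 6 and 7.

    # Minimal sketch of moving median filtering for artifact detection;
    # window size and threshold are assumptions to be tuned per variable.
    import pandas as pd

    def moving_median_filter(values, window=9, threshold=20.0):
        """Return a boolean Series marking suspected artifact measurements."""
        values = pd.Series(values, dtype=float)
        center_median = values.rolling(window, center=True, min_periods=1).median()
        return (values - center_median).abs() > threshold

    # Hypothetical usage on a mean arterial blood pressure (ABPm) series:
    # artifacts = moving_median_filter(abpm_series, window=9, threshold=20.0)
    # cleaned = abpm_series.mask(artifacts)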
Because of the large differences between variables, it is advisable to employ a well-chosen inductive bias when selecting an artifact detection method for a given variable, that is, a bias that fits the variable’s characteristics and the corresponding types of artifact.

The principal findings of this thesis are summarized and discussed in Chapter 8. The thesis primarily contributes to adapting machine learning methods to the induction of prognostic models from routinely recorded data in contemporary cardiac surgery and postoperative intensive care. Notwithstanding the graphical representation of Bayesian networks, the interpretation of the cardiac surgical PBN was found to be difficult (Chapter 3). In addition, tree models were observed to be somewhat misleading: they may not reveal all factors in the data that are important for the prediction problem at hand. A persistent problem turned out to be the incorporation of domain knowledge into machine learning methods: such knowledge appeared not to be readily available for prognostic problems in cardiac surgery. Moreover, the formats in which knowledge is represented in existing methods were found to be not always appropriate for prognosis. These findings are clearly illustrated by the study on feature extraction from ICU monitoring data (Chapter 5). Furthermore, generally agreed knowledge on artifact measurements in monitoring data appeared to be available only to a limited extent, and relying on the opinions of individual experts in modeling was found to strongly affect the generalizability of the resulting models (Chapter 6). Future steps to move from a ‘proof of concept’ of the presented methods to reliable prognostic instruments for clinical practice involve model development from multicenter data sets that include the relevant patient and process variables, and the implementation of these models in clinical practice. Finally, evaluation studies will be necessary to assess the actual benefit of the instruments in supporting clinical staff and management in evaluating and improving the efficiency and quality of patient care.