24 research outputs found

    USES OF CONTINUOUS-TIME MARKOV CHAIN TO DESCRIBE LONGITUDINAL PATIENT-REPORTED OUTCOMES FOR SURVIVAL PREDICTION AND DIMENSION REDUCTION

    Get PDF
    A patient-reported outcome (PRO) is a type of outcome reported directly by patients, and it has been widely used in medical research and clinical trials to measure a patient's symptoms, health-related quality of life, physical functioning, and health status. Previous studies have linked PROs to survival outcomes, but most of them used only the PRO information at baseline or at a specific clinical time point [1, 2]. Even though some of these studies collected longitudinal PROs, only a few evaluated the association between longitudinal PROs and a survival outcome. One of the major challenges in longitudinal PRO studies is addressing the individual heterogeneity in repeated PRO measurements. Because PROs are reported directly by patients, and different patients may have different experiences, longitudinal PROs often exhibit individual heterogeneity, yet current methods [3-5] are not able to account for it. Therefore, in this research, we developed three methods that use a two-state Continuous-Time Markov Chain (CTMC) to summarize longitudinal PROs. The primary summary is the set of estimated state transition rates, which serve as summary statistics describing longitudinal PRO patterns at the individual level. These transition rates can then be incorporated into survival models as predictors or into factor analysis as observed variables. Specifically, in the first two papers, we developed prognostic models that combine baseline covariates with a longitudinal process in two survival models, Weibull regression and Cox proportional hazards regression, using different estimation approaches. Simulation studies were conducted to validate the proposed methods, and the models were then applied separately to two PRO studies, both using repeated PRO measurements collected during the treatment period in cancer patients to predict survival outcomes occurring after treatment. In the third paper, we integrated the two-state CTMC with factor analysis to evaluate the use of CTMC in PRO symptom clustering. This study showed that the CTMC summarizes longitudinal PRO information during the treatment period of cancer patients well, and the underlying construct of patient-reported symptoms matched our expectations from clinical experience.
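
    As a minimal sketch of the per-patient summary step, the Python function below estimates the two transition rates of a two-state CTMC from one patient's assessment times and observed symptom states (coded 0/1). The occupied-time maximum-likelihood estimator used here, which treats the state as constant between assessments, is our simplifying assumption for illustration, not necessarily the estimation approach developed in the papers; the resulting pair of rates is the kind of individual-level summary that can then enter a Weibull or Cox model as covariates.

        import numpy as np

        def two_state_ctmc_rates(times, states):
            """Estimate transition rates of a two-state CTMC (states 0 and 1)
            from one patient's assessment times and observed states.

            Uses the fully-observed-path MLE: lambda_01 = (# of 0->1 moves) /
            (total time spent in state 0), and symmetrically for lambda_10.
            """
            times = np.asarray(times, dtype=float)
            states = np.asarray(states, dtype=int)
            n01 = n10 = 0
            t0 = t1 = 0.0
            for k in range(len(times) - 1):
                dt = times[k + 1] - times[k]
                if states[k] == 0:
                    t0 += dt
                    n01 += states[k + 1] == 1
                else:
                    t1 += dt
                    n10 += states[k + 1] == 0
            lam01 = n01 / t0 if t0 > 0 else np.nan
            lam10 = n10 / t1 if t1 > 0 else np.nan
            return lam01, lam10

        # Weekly assessments alternating between "no symptom" (0) and "symptom" (1):
        # returns (1.0, 0.5), i.e. this patient leaves state 0 twice as fast as state 1.
        print(two_state_ctmc_rates([0, 1, 2, 3, 4], [0, 1, 1, 0, 1]))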

    Modeling Approaches for Cost and Cost-Effectiveness Estimation Using Observational Data

    Get PDF
    The estimation of treatment effects on medical costs and cost-effectiveness measures is complicated by the need to account for non-independent censoring, skewness, and the effects of confounders. In this dissertation, we develop several cost and cost-effectiveness tools that account for these issues. Since medical costs are often collected from observational claims data, we investigate propensity score methods such as covariate adjustment, stratification, inverse probability weighting, and doubly robust weighting. We also propose several doubly robust estimators for common cost-effectiveness measures. Lastly, we explore the role of big data tools and machine learning algorithms in cost estimation, showing how these modern techniques can be applied to big data manipulation, cost prediction, and dimension reduction.
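
    As an illustration of one building block, here is a minimal Python sketch of an inverse-probability-weighted estimate of the treated-versus-control mean cost difference, with the propensity score fit by logistic regression. It is a deliberately simplified stand-in: it ignores censoring and is not doubly robust, both of which the dissertation addresses, and the synthetic data are ours.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def ipw_cost_difference(X, treat, cost):
            """IPW estimate of the mean cost difference between treatment groups,
            adjusting for the confounders in X (no censoring handled here)."""
            # Propensity score: estimated probability of treatment given X.
            ps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)[:, 1]
            w1 = treat / ps               # weights for treated subjects
            w0 = (1 - treat) / (1 - ps)   # weights for controls
            mu1 = np.sum(w1 * cost) / np.sum(w1)  # normalized (Hajek) weighted means
            mu0 = np.sum(w0 * cost) / np.sum(w0)
            return mu1 - mu0

        # Synthetic example: treatment assignment and cost both depend on X[:, 0],
        # so the naive group difference is confounded while IPW recovers roughly 500.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 3))
        treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
        cost = 1000 + 500 * treat + 200 * X[:, 0] + rng.gamma(2.0, 50.0, size=500)
        print(ipw_cost_difference(X, treat, cost))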

    Kolmogorov extraction and resource-bounded zero-one laws

    Get PDF
    Traditional extractors show how to efficiently extract randomness from weak random sources with the help of a small number of truly random bits. Recent breakthroughs on multi-source extractors gave an efficient way to extract randomness from independent sources. We apply these techniques to extract Kolmogorov complexity. More formally: 1. for any α > 0, given a string x with K(x) > |x|^α, we show how to use O(log |x|) bits of advice to efficiently compute another string y, |y| = |x|^Ω(1), with K(y) > |y| - O(log |y|); 2. for any α, ε > 0, given a string x with K(x) > α|x|, we show how to use a constant number of advice bits to efficiently compute another string y, |y| = Ω(|x|), with K(y) > (1 - ε)|y|. This result holds for both classical and space-bounded Kolmogorov complexity. We use the above extraction procedure for space-bounded complexity to establish zero-one laws for both polynomial-space strong dimension and strong scaled dimension. Our results include: (i) if Dim_pspace(E) > 0, then Dim_pspace(E/O(1)) = 1; (ii) Dim(E/O(1) | ESPACE) is either 0 or 1; (iii) Dim(E/poly | ESPACE) is either 0 or 1; (iv) either Dim^(1)_pspace(E/O(n)) = 0 or Dim^(-1)_pspace(E/O(n)) = 1. In other words, from a dimension standpoint and with respect to a small amount of advice, the exponential-time class E is either minimally complex or maximally complex within ESPACE.
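
    Set in LaTeX for readability, the two extraction statements read as follows (a typeset restatement of the abstract's reconstructed formulas, with |x| denoting the length of x, not a new result):

        \begin{enumerate}
          \item $\forall \alpha > 0$: if $K(x) > |x|^{\alpha}$, then with $O(\log|x|)$ advice bits
                one can efficiently compute $y$ with $|y| = |x|^{\Omega(1)}$ and
                $K(y) > |y| - O(\log|y|)$.
          \item $\forall \alpha, \varepsilon > 0$: if $K(x) > \alpha|x|$, then with $O(1)$ advice bits
                one can efficiently compute $y$ with $|y| = \Omega(|x|)$ and
                $K(y) > (1-\varepsilon)|y|$.
        \end{enumerate}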

    Martingale families and dimension in P

    Get PDF
    We introduce a new measure notion on small complexity classes (called F-measure), based on martingale families, that avoids some drawbacks of previous measure notions: it can be used to define dimension, because martingale families can make money on all strings, and it yields random sequences with an equal frequency of 0's and 1's. On larger complexity classes (E and above), F-measure is equivalent to Lutz resource-bounded measure. As an application of F-measure, we answer a question raised in [E. Allender, M. Strauss, Measure on small complexity classes, with application for BPP, in: Proc. of the 35th Ann. IEEE Symp. on Found. of Comp. Sci., 1994, pp. 807–818] by improving their result to: for almost every language A decidable in subexponential time, P^A = BPP^A. We show that almost all languages in PSPACE do not have small non-uniform complexity. We compare F-measure to previous notions and prove that martingale families are strictly stronger than Γ-measure [E. Allender, M. Strauss, Measure on small complexity classes, with application for BPP, in: Proc. of the 35th Ann. IEEE Symp. on Found. of Comp. Sci., 1994, pp. 807–818]; we also discuss the limitations of martingale families concerning finite unions. We observe that all classes closed under polynomial many-one reductions have measure zero in EXP iff they have measure zero in SUBEXP. We use martingale families to introduce a natural generalization of Lutz resource-bounded dimension [J.H. Lutz, Dimension in complexity classes, in: Proceedings of the 15th Annual IEEE Conference on Computational Complexity, 2000, pp. 158–169] on P, which meets the intuition behind Lutz's notion. We show that P-dimension lies between finite-state dimension and dimension on E. We prove an analogue of a theorem of Eggleston in P: the class of languages whose characteristic sequence contains 1's with frequency α has, in P, dimension equal to the Shannon entropy of α.
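
    As a small numerical companion to the Eggleston-style result, here is a Python sketch of the binary Shannon entropy H(α) that the P-dimension equals (the function name is ours):

        import math

        def shannon_entropy(alpha):
            """Binary Shannon entropy H(a) = -a*log2(a) - (1-a)*log2(1-a).

            By the Eggleston-style result above, H(alpha) is the dimension in P
            of the class of languages whose characteristic sequence has 1's with
            frequency alpha.
            """
            if alpha in (0.0, 1.0):
                return 0.0
            return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

        print(shannon_entropy(0.5))   # 1.0: frequency-1/2 languages have full dimension
        print(shannon_entropy(0.25))  # ~0.811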

    Benchmarking and Practical Evaluation of Machine and Statistical Learning Methods in Credit Scoring: A Method Selection Perspective

    Full text link
    Predictive models are important tools used in all scientific fields. Machine learning (ML) algorithms and statistical models are widely used for decision-making because of their ability to tackle intricate and unique problems. In domains where data are high-dimensional and contain irrelevant and redundant features, ML algorithms are known to outperform traditional (statistical) learning methods. However, researchers and analysts often face a myriad of techniques to choose from, with no clear consensus on which will perform best for their specific task. Given resource limitations, exhaustive exploration of all available methods is impractical, often fails to yield significant improvements, and is therefore ill-advised. In this study, we propose an efficient methodology for benchmarking feature-selection and machine-learning algorithms, with a practical evaluation in the context of credit scoring. A survey of the credit-scoring literature was conducted to identify prevalent and high-performing methods, and a subset was selected based on computational efficiency, interpretability, and predictive performance. The search led to chi-square screening, oblique principal component analysis, and genetic algorithms for feature selection, and to penalized logistic regression, support vector machines, extreme gradient-boosted decision trees, and random forests for classification. We then designed a simulation study to evaluate the performance of the selected methods using relevant metrics. These results guided the selection of the most practical and effective methods, which were subsequently tested in a real-world credit-scoring environment. The simulation results indicate that penalized logistic regression and extreme gradient boosting, each with genetic-algorithm feature selection, emerged as the best-performing methods for prediction and dimension reduction. Furthermore, the study examined the impact of data characteristics on prediction performance. This research contributes to method selection and optimization in credit scoring and highlights avenues for further investigation in related research areas.
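
    A minimal sketch, in Python with scikit-learn, of the kind of benchmarking loop the study describes: chi-square feature screening followed by penalized logistic regression, scored by cross-validated AUC on a synthetic stand-in dataset. The genetic-algorithm selection, gradient boosting, and the real credit data are omitted here, and pipeline settings such as k=15 are illustrative assumptions.

        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, chi2
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import MinMaxScaler

        # Synthetic stand-in for a credit-scoring dataset with many noise features.
        X, y = make_classification(n_samples=2000, n_features=50, n_informative=8,
                                   n_redundant=10, random_state=0)

        pipe = Pipeline([
            ("scale", MinMaxScaler()),            # chi2 requires non-negative inputs
            ("select", SelectKBest(chi2, k=15)),  # chi-square feature screening
            ("clf", LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
        ])

        # Benchmark with cross-validated AUC, the customary credit-scoring metric.
        auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
        print(f"mean AUC: {auc.mean():.3f} +/- {auc.std():.3f}")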