
    Analysis of overfitting in the regularized Cox model

    The Cox proportional hazards model is ubiquitous in the analysis of time-to-event data. However, when the data dimension p is comparable to the sample size N, maximum likelihood estimates for its regression parameters are known to be biased or to break down entirely due to overfitting. This prompted the introduction of the so-called regularized Cox model. In this paper we use the replica method from statistical physics to investigate the relationship between the true and inferred regression parameters in regularized multivariate Cox regression with L2 regularization, in the regime where both p and N are large but with p/N ~ O(1). We thereby generalize a recent study from maximum likelihood to maximum a posteriori inference. We also establish a relationship between the optimal regularization parameter and p/N, allowing for straightforward overfitting corrections in time-to-event analysis.
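As a rough illustration of the objective that such a regularized Cox model minimizes, the sketch below evaluates the negative log partial likelihood plus an L2 penalty. It is not taken from the paper: the function name `penalized_cox_loss`, the penalty scaling `eta / 2`, and the Breslow-style risk sets (all subjects with time at least the event time) are assumptions made for this example.

```python
import math


def penalized_cox_loss(beta, times, events, covariates, eta):
    """Negative log partial likelihood of a Cox model plus an L2 (ridge) penalty.

    beta       -- list of p regression coefficients
    times      -- list of N observed times
    events     -- list of N indicators (1 = event observed, 0 = censored)
    covariates -- list of N rows, each a list of p covariate values
    eta        -- regularization strength (hypothetical name; scaling eta/2 is assumed)
    """
    # Linear predictor beta . x_i for every subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    loss = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loss -= scores[i] - math.log(risk)

    # L2 penalty on the regression coefficients.
    return loss + 0.5 * eta * sum(b * b for b in beta)
```

With `beta` set to zero and three uncensored subjects at distinct times, the risk sets have sizes 3, 2 and 1, so the loss reduces to log 6 regardless of `eta`. Minimizing this function over `beta` corresponds to maximum a posteriori inference under a Gaussian prior, which is the setting the abstract describes.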

    Dynamical Probability Distribution Function of the SK Model at High Temperatures

    The microscopic probability distribution function of the Sherrington-Kirkpatrick (SK) model of spin glasses is calculated explicitly as a function of time by a high-temperature expansion. The resulting formula to the third order of the inverse temperature shows that an assumption made by Coolen, Laughton and Sherrington in their recent theory of dynamics is violated. Deviations of their theory from exact results are estimated quantitatively. Our formula also yields explicit expressions of the time dependence of various macroscopic physical quantities when the temperature is suddenly changed within the high-temperature region. Comment: LaTeX, 6 pages, Figures upon request (here revised), To be published in J. Phys. Soc. Jpn. 65 (1996) No.

    Nonparametric predictive inference for diagnostic test thresholds

    Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine, machine learning and credit scoring. The receiver operating characteristic (ROC) curve and surface are useful tools to assess the ability of diagnostic tests to discriminate between ordered classes or groups. Defining such a test requires selecting the thresholds that maximize its accuracy. A commonly used procedure for finding the optimal thresholds is to maximize Youden’s index. This article presents nonparametric predictive inference (NPI) for selecting the optimal thresholds of a diagnostic test. NPI is a frequentist statistical method that is explicitly aimed at using few modeling assumptions, enabled through the use of lower and upper probabilities to quantify uncertainty. Based on multiple future observations, the NPI approach is presented for selecting the optimal thresholds for two-group and three-group scenarios. In addition, a pairwise approach is presented for the three-group scenario. The article ends with an example to illustrate the proposed methods and a simulation study of their predictive performance alongside some classical methods such as Youden’s index. The NPI-based methods show some interesting results that overcome some of the issues concerning the predictive performance of Youden’s index.
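For readers unfamiliar with the classical baseline mentioned here, the following sketch selects a two-group threshold by maximizing the empirical Youden's index, J = sensitivity + specificity - 1. It illustrates only the classical method, not the NPI approach of the article. The function name `youden_threshold` and the convention that a value above the threshold is classified as diseased are assumptions for this example.

```python
def youden_threshold(healthy, diseased):
    """Return (threshold, J) maximizing the empirical Youden's index.

    A subject is classified as diseased when its test value exceeds the
    threshold (assumed convention). Candidate thresholds are the observed
    values; the smallest maximizer is returned when there are ties.
    """
    candidates = sorted(set(healthy) | set(diseased))
    best_c, best_j = None, -1.0
    for c in candidates:
        # Specificity: fraction of healthy subjects at or below the threshold.
        specificity = sum(x <= c for x in healthy) / len(healthy)
        # Sensitivity: fraction of diseased subjects above the threshold.
        sensitivity = sum(y > c for y in diseased) / len(diseased)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

For perfectly separated groups, such as healthy values 1 and 2 against diseased values 3 and 4, the function returns the threshold 2 with J = 1. Because the index is computed from the observed samples only, it says nothing about uncertainty for future observations, which is the gap the NPI-based methods address.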

    Dynamics of on-line Hebbian learning with structurally unrealizable restricted training sets

    We present an exact solution for the dynamics of on-line Hebbian learning in neural networks, with restricted and unrealizable training sets. In contrast to other studies on learning with restricted training sets, unrealizability is here caused by structural mismatch, rather than data noise: the teacher machine is a perceptron with a reversed wedge-type transfer function, while the student machine is a perceptron with a sigmoidal transfer function. We calculate the glassy dynamics of the macroscopic performance measures, training error and generalization error, and the (non-Gaussian) student field distribution. Our results, which find excellent confirmation in numerical simulations, provide a new benchmark test for general formalisms with which to study unrealizable learning processes with restricted training sets. Comment: 7 pages including 3 figures, using IOP latex2e preprint class fil

    Nonparametric predictive inference for comparison of two diagnostic tests

    An important aim in diagnostic medical research is comparison of the accuracy of two diagnostic tests. In this paper, comparison of two diagnostic tests is presented using nonparametric predictive inference (NPI) for future order statistics. The tests are assumed to be applied on the same individuals from two groups, e.g., healthy and diseased individuals, or from three groups with a known ordering, e.g., adding a group of severely diseased individuals to the two-group scenario. Our comparison is explicitly in terms of lower and upper probabilities for proportions of correctly diagnosed future individuals from each group, for a given total number of such individuals. We include in our comparison the possibility that it is more important to get a correct diagnosis for individuals from one group than from another group.

    Generating functional analysis of Minority Games with real market histories

    It is shown how the generating functional method of De Dominicis can be used to solve the dynamics of the original version of the minority game (MG), in which agents observe real as opposed to fake market histories. Here one again finds exact closed equations for correlation and response functions, but now these are defined in terms of two connected effective non-Markovian stochastic processes: a single effective agent equation similar to that of the 'fake' history models, and a second effective equation for the overall market bid itself (the latter is absent in 'fake' history models). The result is an exact theory, from which one can calculate from first principles both the persistent observables in the MG and the distribution of history frequencies. Comment: 39 pages, 5 postscript figures, iop styl