
    Exploiting the noise: improving biomarkers with ensembles of data analysis methodologies.

    Background: The advent of personalized medicine requires robust, reproducible biomarkers that indicate which treatment will maximize therapeutic benefit while minimizing side effects and costs. Numerous molecular signatures have been developed over the past decade to fill this need, but their validation and uptake into clinical settings has been poor. Here, we investigate the technical reasons underlying reported failures in biomarker validation for non-small cell lung cancer (NSCLC).
    Methods: We evaluated two published prognostic multi-gene biomarkers for NSCLC in an independent 442-patient dataset. We then systematically assessed how technical factors influenced validation success.
    Results: Both biomarkers validated successfully (biomarker #1: hazard ratio (HR) 1.63, 95% confidence interval (CI) 1.21 to 2.19, P = 0.001; biomarker #2: HR 1.42, 95% CI 1.03 to 1.96, P = 0.030). Further, despite being underpowered for stage-specific analyses, both biomarkers successfully stratified stage II patients, and biomarker #1 also stratified stage IB patients. We then systematically evaluated reasons for reported validation failures and found that they can be directly attributed to technical challenges in data analysis. By examining 24 separate pre-processing techniques, we show that minor alterations in pre-processing can change a successful prognostic biomarker (HR 1.85, 95% CI 1.37 to 2.50, P < 0.001) into one indistinguishable from random chance (HR 1.15, 95% CI 0.86 to 1.54, P = 0.348). Finally, we develop a new method, based on ensembles of analysis methodologies, to exploit this technical variability to improve biomarker robustness and to provide an independent confidence metric.
    Conclusions: Biomarkers comprise a fundamental component of personalized medicine. We first validated two NSCLC prognostic biomarkers in an independent patient cohort. Power analyses demonstrate that even this large, 442-patient cohort is under-powered for stage-specific analyses. We then use these results to discover an unexpected sensitivity of validation to subtle data analysis decisions. Finally, we develop a novel algorithmic approach to exploit this sensitivity to improve biomarker robustness.
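    The abstract does not spell out the ensemble algorithm, but the core idea of pooling risk calls made under many pre-processing variants can be sketched as follows. This is a minimal illustration, not the authors' implementation: the preprocessors list and risk_score function are hypothetical stand-ins for the 24 pre-processing techniques and the gene-signature scoring rule.

        # Minimal sketch of an ensemble-of-analysis-methodologies biomarker call.
        # Hypothetical helpers: each entry of `preprocessors` and the `risk_score`
        # function are placeholders, not the authors' actual pipeline.
        import numpy as np

        def ensemble_risk_call(raw_expression, signature_genes, preprocessors, risk_score):
            """Classify a patient as high/low risk under every pre-processing
            variant and report the majority call plus an agreement-based
            confidence metric."""
            calls = []
            for preprocess in preprocessors:
                expr = preprocess(raw_expression)           # one analysis methodology
                score = risk_score(expr, signature_genes)   # e.g. signed gene-score sum
                calls.append(score > 0)                     # True = high-risk call
            calls = np.asarray(calls)
            majority = calls.mean() > 0.5
            confidence = max(calls.mean(), 1 - calls.mean())  # fraction of methods agreeing
            return majority, confidence

    The agreement fraction plays the role of an independent confidence metric: patients on whom the pre-processing variants disagree receive low confidence, regardless of the majority call.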

    The Cure: Making a game of gene selection for breast cancer survival prediction

    Motivation: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility and biological interpretability. Methods that take advantage of structured prior knowledge (e.g. protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes previously unheard of. Here, we developed and evaluated a game called The Cure on the task of gene selection for breast cancer survival prediction. Our central hypothesis was that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from game players. We envisioned capturing knowledge both from the players' prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game.
    Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted more than 1,000 registered players who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data clearly demonstrated the accumulation of relevant expert knowledge. In terms of predictive accuracy, these gene sets provided performance comparable to gene sets generated using other methods, including those used in commercial tests. The Cure is available at http://genegames.org/cure
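    The gene sets above are built by aggregating data collected from game play. A minimal sketch of one plausible aggregation step is shown below; it uses simple vote counting over hypothetical game records and is not the authors' actual pipeline.

        # Hypothetical aggregation of crowdsourced gene selections into a ranked gene set.
        # `game_records` is assumed to be an iterable of (player_id, selected_genes) pairs;
        # the record layout and the top_k cutoff are illustrative, not from the paper.
        from collections import Counter

        def aggregate_gene_votes(game_records, top_k=25):
            votes = Counter()
            for _player_id, selected_genes in game_records:
                votes.update(set(selected_genes))   # one vote per gene per game
            return [gene for gene, _count in votes.most_common(top_k)]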

    The R Package JMbayes for Fitting Joint Models for Longitudinal and Time-to-Event Data using MCMC

    Joint models for longitudinal and time-to-event data constitute an attractive modeling framework that has received a lot of interest in recent years. This paper presents the capabilities of the R package JMbayes for fitting these models under a Bayesian approach using Markov chain Monte Carlo algorithms. JMbayes can fit a wide range of joint models, including, among others, joint models for continuous and categorical longitudinal responses, and provides several options for modeling the association structure between the two outcomes. In addition, the package can be used to derive dynamic predictions for both outcomes, and offers several tools to validate these predictions in terms of discrimination and calibration. All these features are illustrated using a real data example on patients with primary biliary cirrhosis. Comment: 42 pages, 6 figures
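    For orientation, the model class that packages such as JMbayes fit couples a linear mixed longitudinal submodel with a relative-risk survival submodel through shared random effects. A generic formulation (notation illustrative, not taken from the paper) is:

        % Longitudinal submodel (linear mixed-effects model)
        y_i(t) = m_i(t) + \varepsilon_i(t), \qquad
        m_i(t) = \mathbf{x}_i^\top(t)\,\boldsymbol{\beta} + \mathbf{z}_i^\top(t)\,\mathbf{b}_i, \qquad
        \mathbf{b}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{D}), \quad
        \varepsilon_i(t) \sim \mathcal{N}(0, \sigma^2)

        % Survival submodel: the hazard depends on the current value m_i(t) of the marker
        h_i\bigl(t \mid \mathcal{M}_i(t)\bigr) = h_0(t)\exp\bigl\{\boldsymbol{\gamma}^\top\mathbf{w}_i + \alpha\, m_i(t)\bigr\}

    Sampling the fixed effects, random effects, association parameter and baseline-hazard parameters jointly by MCMC is what makes the dynamic, subject-specific predictions mentioned in the abstract possible.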

    Top Physics in WHIZARD

    In this talk we summarize the top physics setup in the event generator WHIZARD, with a main focus on lepton colliders. This includes full six-, eight- and ten-fermion processes, factorized processes and spin correlations. For lepton colliders, QCD NLO processes for top-quark physics are available and will be discussed. A special focus is on the top-quark pair threshold, where a dedicated implementation combines a non-relativistic effective field theory calculation, augmented by a next-to-leading threshold logarithm resummation, with a continuum relativistic fixed-order QCD NLO simulation. Comment: 6 pages, 2 figures. Talk presented at the International Workshop on Future Linear Colliders (LCWS15), Whistler, Canada, 2-6 November 2015

    The asset-correlation parameter in Basel II for mortgages on single-family residences

    Bank capital; Risk management; Basel capital accord; Mortgages

    Advanced survival modelling for consumer credit risk assessment: addressing recurrent events, multiple outcomes and frailty

    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Statistics and Econometrics.
    This thesis applied advanced survival models to consumer credit risk assessment, particularly to address recurrent delinquency (or default) and recovery (cure) events as well as multiple risk events and frailty. Each chapter (2 to 5) addressed a separate problem, and several key conclusions were reached.
    Chapter 2 addressed the neglected area of modelling recovery from delinquency to normal performance on retail consumer loans, taking into account the recurrent nature of delinquency and including time-dependent macroeconomic variables. Using data from a lending company in Zimbabwe, we provided a comprehensive analysis of recovery patterns using the extended Cox model. The findings showed that behavioural variables were the most important in understanding the recovery patterns of obligors, which confirms and underscores the importance of behavioural models for preventing credit loss. The findings also revealed that falling real gross domestic product, representing a deteriorating economic situation, significantly explained the diminishing rate of recovery from delinquency to normal performance among consumers. The study pointed to the urgent need for policy measures aimed at promoting economic growth to stabilise consumer welfare and the financial system at large.
    Chapter 3 extends the work in chapter 2 and notes that, even though multiple failure-time data are ubiquitous in finance and economics, especially in the credit risk domain, naive statistical techniques that ignore subsequent events are commonly used to analyse such data. Applying standard statistical methods without addressing the recurrence of events produces biased and inefficient estimates, and hence erroneous predictions. We explore various ways of modelling and forecasting recurrent delinquency and recovery events on consumer loans. Using consumer loan data from a severely distressed economic environment, we illustrate and empirically compare extended Cox models for ordered recurrent recovery events. We highlight that accounting for multiple events provides more detailed information and thus a nuanced understanding of the recovery prognosis of delinquents. For ordered, indistinguishable recurrent recovery events, we recommend the Andersen and Gill (1982) model, since it fits these assumptions and performs well in predicting recovery (a minimal counting-process sketch follows this abstract).
    Chapter 4 extends chapters 2 and 3 and highlights that rigorous credit risk analysis is not only significant to lenders and banks but also of paramount importance for sound regulatory and economic policy making. Increasing loan impairment or delinquency, defaults and mortgage foreclosures signal a distressed economy and generate considerable financial stability concerns. For lenders and banks, accurate estimation of credit risk parameters remains essential for pricing, profit testing and capital provisioning, as well as for managing delinquents. Traditional credit scoring models such as logit regression only provide estimates of the lifetime probability of default for a loan and cannot identify cures or other movements. These methods lack the ability to characterise the progression of borrowers over time and cannot use all the available data to understand the recurrence of risk events and the possible occurrence of multiple loan outcomes. In this chapter, we propose a system-wide multi-state framework to jointly model state occupation and the transitions between normal performance (current), delinquency, prepayment, repurchase, short sale and foreclosure on mortgage loans. The probability of loans transitioning to and from the various states is estimated in a discrete-time multi-state Markov model with seven allowable states and sixteen possible transitions (see the transition-matrix sketch after this abstract). Additionally, we investigate the relationship between these transition probabilities and loan-level covariates. We empirically test the performance of the model using US single-family mortgage loans originated during the first quarter of 2009 and followed on their monthly repayment performance until the third quarter of 2016. Our results show that the main factors affecting the transition into the various loan outcomes are affordability as measured by the debt-to-income ratio, equity as marked by the loan-to-value ratio, interest rates and the property type.
    In chapter 5, we note that consumer credit has become increasingly available in Zimbabwe, yet credit information sharing systems are not as advanced. Using frailty survival models on credit bureau data from Zimbabwe, the study investigates the possible underestimation of credit losses under the assumption of independence of default event times. The study found that adding a frailty term significantly improved the models, indicating the presence of unobserved heterogeneity. The major policy recommendation is for the regulator to institute appropriate policy frameworks that allow robust and complete credit information sharing and reporting, as doing so will significantly improve the functioning of the credit market.
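    The chapter 3 recommendation can be made concrete with a counting-process data layout: each delinquency spell contributes a (start, stop] row, and the spells of one borrower are grouped together. A minimal sketch in the spirit of the Andersen-Gill formulation, using the Python lifelines package rather than the thesis's own software and with hypothetical column names, is:

        # Sketch: Cox model on recurrent recovery events in counting-process
        # (start, stop] format, in the spirit of the Andersen-Gill formulation.
        # Column names ("borrower_id", "start", "stop", "recovered") and the input
        # file are hypothetical; the thesis's actual data and tooling differ.
        import pandas as pd
        from lifelines import CoxTimeVaryingFitter

        df = pd.read_csv("recurrent_recovery_spells.csv")   # one row per delinquency spell

        ctv = CoxTimeVaryingFitter()
        ctv.fit(
            df,
            id_col="borrower_id",   # groups the multiple spells of each borrower
            start_col="start",
            stop_col="stop",
            event_col="recovered",  # 1 if the spell ended in recovery (cure)
        )
        ctv.print_summary()

    In practice, Andersen-Gill fits are usually reported with a robust (sandwich) variance clustered on the borrower to account for within-borrower correlation.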
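    Similarly, for the chapter 4 multi-state view, the empirical one-month transition matrix between loan states can be tabulated before any covariates are introduced. A minimal sketch over an illustrative loan-month panel (state labels and column names are hypothetical, not the thesis's data schema) is:

        # Sketch: empirical monthly transition matrix between loan states
        # (current, delinquent, prepaid, foreclosed, ...).
        import pandas as pd

        panel = pd.read_csv("loan_month_panel.csv")          # one row per loan-month
        panel = panel.sort_values(["loan_id", "month"])
        panel["next_state"] = panel.groupby("loan_id")["state"].shift(-1)
        panel = panel.dropna(subset=["next_state"])

        # Row-normalised counts estimate the one-month transition probabilities.
        transition_matrix = pd.crosstab(panel["state"], panel["next_state"], normalize="index")
        print(transition_matrix.round(3))

    A discrete-time multi-state Markov model then relates these transition probabilities to loan-level covariates such as the debt-to-income and loan-to-value ratios.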