    Genetic algorithms: a tool for optimization in econometrics - basic concept and an example for empirical applications

    This paper discusses a tool for optimization of econometric models based on genetic algorithms. First, we briefly describe the concept of this optimization technique. Then, we explain the design of a specifically developed algorithm and apply it to a difficult econometric problem, the semiparametric estimation of a censored regression model. We carry out some Monte Carlo simulations and compare the genetic algorithm with another technique, the iterative linear programming algorithm, to run the censored least absolute deviation estimator. It turns out that both algorithms lead to similar results in this case, but that the proposed method is computationally more stable than its competitor. Keywords: Genetic Algorithm, Semiparametrics, Monte Carlo Simulation
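
As a rough illustration of the idea in the abstract above, the following is a minimal sketch of a genetic algorithm minimizing a censored least absolute deviation (CLAD) objective. It is not the paper's algorithm: the left-censoring point of 0, the function names `clad_loss` and `genetic_minimize`, the population size, the selection rule, the mutation scale, and the simulated data are all assumptions made for this example.

```python
import random

def clad_loss(beta, xs, ys):
    """CLAD objective, assuming left-censoring at 0: sum of |y - max(0, x'beta)|."""
    total = 0.0
    for x, y in zip(xs, ys):
        fitted = sum(b * xj for b, xj in zip(beta, x))
        total += abs(y - max(0.0, fitted))
    return total

def genetic_minimize(loss, dim, pop_size=40, generations=100, sigma=0.5, seed=0):
    """Toy real-coded GA (hypothetical design): elitist selection,
    uniform crossover, Gaussian mutation with a slowly shrinking scale."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=loss)
        elite = pop[: pop_size // 4]  # selection: keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover: each coordinate comes from one of the two parents
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            # mutation: add Gaussian noise to every coordinate
            children.append([c + rng.gauss(0, sigma) for c in child])
        pop = elite + children
        sigma *= 0.98
    return min(pop, key=loss)

# Simulated left-censored data (an assumption): y = max(0, 1 + 2x + noise)
rng = random.Random(1)
xs = [[1.0, rng.uniform(-2, 2)] for _ in range(200)]
ys = [max(0.0, 1.0 + 2.0 * x[1] + rng.gauss(0, 1)) for x in xs]
estimate = genetic_minimize(lambda b: clad_loss(b, xs, ys), dim=2)
print(estimate)
```

Because the objective is piecewise linear and non-convex in `beta`, a derivative-free population search of this kind needs no gradient; keeping the elite unchanged ensures the best loss never increases from one generation to the next.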

    Statistical and Computational Tradeoff in Genetic Algorithm-Based Estimation

    When a Genetic Algorithm (GA), or a stochastic algorithm in general, is employed in a statistical problem, the obtained result is affected both by variability due to sampling, which refers to the fact that only a sample is observed, and by variability due to the stochastic elements of the algorithm. This topic can be framed as a statistical and computational tradeoff question, crucial in recent problems, in which statisticians must carefully balance the statistical and computational parts of the analysis while taking account of resource or time constraints. In the present work we analyze estimation problems tackled by GAs, for which the variability of estimates can be decomposed into the two sources of variability, considering some constraints in the form of cost functions related to both data acquisition and the runtime of the algorithm. Simulation studies will be presented to discuss the statistical and computational tradeoff question. Comment: 17 pages, 5 figures
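
One standard way to write a decomposition of this kind is the law of total variance. The notation is an assumption made here, not taken from the paper: $S$ denotes the observed sample and $\hat\theta$ the estimate returned by the stochastic algorithm.

```latex
\operatorname{Var}(\hat\theta)
  = \underbrace{\operatorname{Var}_{S}\!\big(\mathbb{E}[\hat\theta \mid S]\big)}_{\text{sampling variability}}
  + \underbrace{\mathbb{E}_{S}\!\big[\operatorname{Var}(\hat\theta \mid S)\big]}_{\text{algorithmic variability}}
```

Under this reading, collecting more data mainly shrinks the first term, while a longer algorithm run mainly shrinks the second, which is where a cost constraint would force a tradeoff.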

    Statistical Methods for Gene-Environment Interactions

    Although significant main effects of genetic and environmental risk factors have been found, the interactions between them can play critical roles and demonstrate important implications in medical genetics and epidemiology. Although many important gene-environment (G-E) interactions have been identified, the existing findings are still insufficient and there exists a strong need to develop statistical methods for analyzing G-E interactions. In this dissertation, we propose four statistical methodologies and computational algorithms for detecting G-E interactions and one application to imaging data. Extensive simulation studies are conducted in comparison with multiple advanced alternatives. In the analyses of The Cancer Genome Atlas datasets on multiple cancers, biologically meaningful findings are obtained. First, we develop two robust interaction analysis methods for prognostic outcomes. Compared to continuous and categorical outcomes, prognosis has been less investigated, with additional challenges brought by the unique characteristics of survival times. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In the first method, we adopt the censored quantile regression and partial correlation for survival outcomes. Under a marginal modeling framework, this proposed approach is robust to long-tailed prognosis and is computationally straightforward to apply. Furthermore, outliers and contaminations among predictors are observed in real data. In the second method, we propose a joint model using the penalized trimmed regression that is robust to leverage points and vertical outliers. The proposed method respects the hierarchical structure of main effects and interactions and has an effective computational algorithm based on coordinate descent optimization and stability selection.
Second, we propose a penalized approach to incorporate additional information for identifying important hierarchical interactions. Due to the high dimensionality and low signal levels, analyzing interactions is challenging, so incorporating additional information is desirable. We adopt the minimax concave penalty for regularized estimation and the Laplacian quadratic penalty for additional information. Under a unified formulation, multiple types of additional information and genetic measurements can be effectively utilized and improved identification accuracy can be achieved. Third, we develop a three-step procedure using multidimensional molecular data to identify G-E interactions. Recent studies have shown that collectively analyzing multiple types of molecular changes is not only biologically sensible but also leads to improved estimation and prediction. In this proposed method, we first estimate the relationship between gene expressions and their regulators by a multivariate penalized regression, and then identify regulatory modules via sparse biclustering. Next, we establish integrative covariates by principal components extracted from the identified regulatory modules. Last but not least, we construct a joint model for disease outcomes and employ Lasso-based penalization to select important main effects and hierarchical interactions. The proposed method expands the scope of interaction analysis to multidimensional molecular data. Last, we present an application using both marginal and joint models to analyze histopathological imaging-environment interactions. In cancer diagnosis, histopathological imaging has been routinely conducted and can be processed to generate high-dimensional features. To explore potential interactions, we conduct marginal and joint analyses, which have been extensively examined in the context of G-E interactions.
This application extends the practical applicability of interaction analysis to imaging data and provides an alternative avenue that combines histopathological imaging and environmental data in cancer modeling. Motivated by the important implications of G-E interactions and to overcome the limitations of the existing methods, the goal of this dissertation is to advance methodological development for G-E interaction analysis and to provide practically useful tools for identifying important interactions. The proposed methods emerge from practical issues observed in real data and have solid statistical properties. With a balance between theory, computation, and data analysis, this dissertation provides four novel approaches for analyzing interactions to achieve more robust and accurate identification of biologically meaningful interactions.


    CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles

    Value at Risk has become the standard measure of market risk employed by financial institutions for both internal and regulatory purposes. Despite its conceptual simplicity, its measurement is a very challenging statistical problem and none of the methodologies developed so far give satisfactory solutions. Interpreting Value at Risk as a quantile of future portfolio values conditional on current information, we propose a new approach to quantile estimation that does not require any of the extreme assumptions invoked by existing methodologies (such as normality or i.i.d. returns). The Conditional Value at Risk or CAViaR model moves the focus of attention from the distribution of returns directly to the behavior of the quantile. Utilizing the criterion from Regression Quantiles, and postulating a variety of dynamic updating processes we propose methods based on a Genetic Algorithm to estimate the unknown parameters of CAViaR models. We propose a Dynamic Quantile Test of model adequacy that tests the hypothesis that in each period the probability of exceeding the VaR must be independent of all the past information. Applications to simulated and real data provide empirical support to our methodology and illustrate the ability of these algorithms to adapt to new risk environments.
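
To make the two ingredients named in the abstract above concrete, here is a minimal sketch of a quantile updating recursion and the regression-quantile loss used to score it. The abstract does not say which updating processes are used; the symmetric absolute value form below is one specification known from the CAViaR literature and is assumed here for illustration. The function names, the starting value `q0`, and the sign convention (modelling the return quantile directly) are assumptions of this example.

```python
def sav_quantiles(returns, beta, q0):
    """Assumed symmetric absolute value recursion:
    q_t = b0 + b1 * q_{t-1} + b2 * |r_{t-1}|, started at q0."""
    b0, b1, b2 = beta
    quantiles = [q0]
    for r_prev in returns[:-1]:
        quantiles.append(b0 + b1 * quantiles[-1] + b2 * abs(r_prev))
    return quantiles

def pinball_loss(returns, quantiles, theta):
    """Regression-quantile (check) loss at level theta, averaged over time."""
    total = 0.0
    for r, q in zip(returns, quantiles):
        u = r - q
        # weight theta on observations above the quantile, (1 - theta) below it
        total += u * (theta - (1.0 if u < 0 else 0.0))
    return total / len(returns)

def caviar_objective(beta, returns, theta=0.05, q0=-1.0):
    """Objective a derivative-free optimizer such as a genetic algorithm could minimize."""
    return pinball_loss(returns, sav_quantiles(returns, beta, q0), theta)
```

The indicator in `pinball_loss` makes the objective non-differentiable in `beta`, which is one reason a population-based search is a natural fit for this kind of estimation.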

    CAViaR: Conditional Value at Risk by Quantile Regression

    Value at Risk has become the standard measure of market risk employed by financial institutions for both internal and regulatory purposes. Despite its conceptual simplicity, its measurement is a very challenging statistical problem and none of the methodologies developed so far give satisfactory solutions. Interpreting Value at Risk as a quantile of future portfolio values conditional on current information, we propose a new approach to quantile estimation which does not require any of the extreme assumptions invoked by existing methodologies (such as normality or i.i.d. returns). The Conditional Value at Risk or CAViaR model moves the focus of attention from the distribution of returns directly to the behavior of the quantile. We postulate a variety of dynamic processes for updating the quantile and use regression quantile estimation to determine the parameters of the updating process. Tests of model adequacy utilize the criterion that each period the probability of exceeding the VaR must be independent of all the past information. We use a differential evolutionary genetic algorithm to optimize an objective function which is non-differentiable and hence cannot be optimized using traditional algorithms. Applications to simulated and real data provide empirical support to our methodology and illustrate the ability of these algorithms to adapt to new risk environments.

    Empirical likelihood for median regression model with designed censoring variables

    We propose a new and simple estimating equation for the parameters in median regression models with designed censoring variables, and then apply the empirical log likelihood ratio statistic to construct a confidence region for the parameters. The empirical log likelihood ratio statistic is shown to have a standard chi-square distribution, which makes this method easy to implement. At the same time, another empirical log likelihood ratio statistic is proposed based on an existing estimating equation, and the limiting distribution of the empirical likelihood ratio statistic is shown to be a sum of weighted chi-square distributions. We compare the performance of the empirical likelihood confidence region based on the new estimating equation with that based on the existing estimating equation and a normal approximation method by simulation studies.

    Bayesian inference for high-dimensional discrete-time epidemic models: spatial dynamics of the UK COVID-19 outbreak

    In the event of a disease outbreak emergency, such as COVID-19, the ability to construct detailed stochastic models of infection spread is key to determining crucial policy-relevant metrics such as the reproduction number, true prevalence of infection, and the contribution of population characteristics to transmission. In particular, the interaction between space and human mobility is key to prioritising outbreak control resources to appropriate areas of the country. Model-based epidemiological intelligence must therefore be provided in a timely fashion so that resources can be adapted to a changing disease landscape quickly. The utility of these models is reliant on fast and accurate parameter inference, with the ability to account for large amounts of censored data to ensure estimation is unbiased. Yet methods to fit detailed spatial epidemic models to national-level population sizes currently do not exist due to the difficulty of marginalising over the censored data. In this paper we develop a Bayesian data-augmentation method which operates on a stochastic spatial metapopulation SEIR state-transition model, using model-constrained Metropolis-Hastings samplers to improve the efficiency of an MCMC algorithm. Coupling this method with state-of-the-art GPU acceleration enabled us to provide nightly analyses of the UK COVID-19 outbreak, with timely information made available for disease nowcasting and forecasting purposes.

    Socioeconomic Status and Sickness Absence - What do twins tell us about causality?

    The purpose of this study is to empirically investigate causal effects between socioeconomic status and absence from the workplace due to sickness. To be able to conclude that income causally affects health it is important to control for both reverse causality and unobserved heterogeneity. This study uses a Swedish sample of female twins and a semiparametric censored fixed-effects model. Spousal income is correlated in cross-section with the share of total income that comes from benefits due to sickness absence. Results from this twin study indicate that male spousal income, i.e. a non-shared environmental influence, does not have a causal effect. Keywords: Income; education; health; causality; twins