Genetic algorithms: a tool for optimization in econometrics - basic concept and an example for empirical applications
This paper discusses a tool for optimization of econometric models based on genetic algorithms. First, we briefly describe the concept of this optimization technique. Then, we explain the design of a specifically developed algorithm and apply it to a difficult econometric problem, the semiparametric estimation of a censored regression model. We carry out some Monte Carlo simulations and compare the genetic algorithm with another technique, the iterative linear programming algorithm, for computing the censored least absolute deviation estimator. It turns out that both algorithms lead to similar results in this case, but that the proposed method is computationally more stable than its competitor. Keywords: Genetic Algorithm, Semiparametrics, Monte Carlo Simulation
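The idea in this abstract can be illustrated with a minimal sketch: a toy real-coded genetic algorithm minimizing the censored least absolute deviation objective, sum |y_i - max(0, x_i'b)|. Everything here is an assumption made for illustration (the function names `clad_loss` and `ga_minimize`, the operators, the tuning constants, and the simulated data); it is not the specifically developed algorithm the paper describes.

```python
import numpy as np

def clad_loss(beta, X, y):
    """Censored LAD objective: sum |y_i - max(0, x_i' beta)|."""
    return np.abs(y - np.maximum(0.0, X @ beta)).sum()

def ga_minimize(loss, dim, rng, pop_size=60, generations=150,
                init_scale=3.0, mut_scale=0.3, mut_decay=0.98):
    """Toy GA: tournament selection, blend crossover, Gaussian mutation, elitism."""
    pop = rng.normal(0.0, init_scale, size=(pop_size, dim))
    fit = np.array([loss(p) for p in pop])
    history = [fit.min()]
    for _ in range(generations):
        # Tournament selection: each parent is the better of two random individuals.
        a, b = rng.integers(0, pop_size, size=(2, 2 * pop_size))
        parents = np.where((fit[a] < fit[b])[:, None], pop[a], pop[b])
        # Blend crossover: each child is a random convex combination of two parents.
        w = rng.random((pop_size, 1))
        children = w * parents[:pop_size] + (1.0 - w) * parents[pop_size:]
        # Gaussian mutation with a slowly shrinking step size.
        children += rng.normal(0.0, mut_scale, size=children.shape)
        mut_scale *= mut_decay
        child_fit = np.array([loss(c) for c in children])
        # Elitism: the best individual so far replaces the worst child if it is better.
        best, worst = fit.argmin(), child_fit.argmax()
        if fit[best] < child_fit[worst]:
            children[worst], child_fit[worst] = pop[best], fit[best]
        pop, fit = children, child_fit
        history.append(fit.min())
    return pop[fit.argmin()], history

# Simulated censored regression with heavy-tailed errors (hypothetical data).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
y = np.maximum(0.0, X @ beta_true + rng.standard_t(df=3, size=n))

beta_hat, history = ga_minimize(lambda b: clad_loss(b, X, y), dim=2, rng=rng)
```

Because the objective is piecewise linear and non-convex in the coefficients, a population-based search of this kind needs no derivatives; the elitism step guarantees that the best objective value in `history` never increases from one generation to the next.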
Statistical and Computational Tradeoff in Genetic Algorithm-Based Estimation
When a Genetic Algorithm (GA), or a stochastic algorithm in general, is
employed in a statistical problem, the obtained result is affected by both
variability due to sampling, which arises because only a sample is
observed, and variability due to the stochastic elements of the algorithm. This
topic can be naturally framed as a statistical and computational
tradeoff question, crucial in recent problems, in which statisticians must
carefully balance the statistical and computational parts of the analysis, taking
account of resource or time constraints. In the present work we analyze
estimation problems tackled by GAs, for which the variability of estimates can be
decomposed into the two sources of variability, considering constraints in
the form of cost functions related to both data acquisition and the runtime of the
algorithm. Simulation studies are presented to discuss the statistical and
computational tradeoff question. Comment: 17 pages, 5 figures
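One standard way to write a two-source decomposition of this kind is the law of total variance. In the sketch below, the symbols are assumptions introduced for illustration: \hat\theta is the GA-based estimate and D is the observed sample. The paper's own decomposition and notation may differ.

```latex
\operatorname{Var}(\hat\theta)
  = \underbrace{\mathbb{E}\!\left[\operatorname{Var}(\hat\theta \mid D)\right]}_{\text{algorithmic variability}}
  + \underbrace{\operatorname{Var}\!\left(\mathbb{E}[\hat\theta \mid D]\right)}_{\text{sampling variability}}
```

Under this reading, a longer run of the algorithm would mainly shrink the first term and a larger sample the second, which is where a cost constraint on runtime and data acquisition creates a tradeoff.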
Statistical Methods for Gene-Environment Interactions
Although significant main effects of genetic and environmental risk factors have been found, the interactions between them can play critical roles and have important implications in medical genetics and epidemiology. Although many important gene-environment (G-E) interactions have been identified, the existing findings are still insufficient and there exists a strong need to develop statistical methods for analyzing G-E interactions. In this dissertation, we propose four statistical methodologies and computational algorithms for detecting G-E interactions and one application to imaging data. Extensive simulation studies are conducted in comparison with multiple advanced alternatives. In the analyses of The Cancer Genome Atlas datasets on multiple cancers, biologically meaningful findings are obtained. First, we develop two robust interaction analysis methods for prognostic outcomes. Compared to continuous and categorical outcomes, prognosis has been less investigated, with additional challenges brought by the unique characteristics of survival times. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In the first method, we adopt censored quantile regression and partial correlation for survival outcomes. Under a marginal modeling framework, this proposed approach is robust to long-tailed prognosis and is computationally straightforward to apply. Furthermore, outliers and contamination among predictors are observed in real data. In the second method, we propose a joint model using the penalized trimmed regression that is robust to leverage points and vertical outliers. The proposed method respects the hierarchical structure of main effects and interactions and has an effective computational algorithm based on coordinate descent optimization and stability selection.
Second, we propose a penalized approach to incorporate additional information for identifying important hierarchical interactions. Because of the high dimensionality and low signal levels, analyzing interactions is challenging, so incorporating additional information is desirable. We adopt the minimax concave penalty for regularized estimation and the Laplacian quadratic penalty for additional information. Under a unified formulation, multiple types of additional information and genetic measurements can be effectively utilized and improved identification accuracy can be achieved. Third, we develop a three-step procedure using multidimensional molecular data to identify G-E interactions. Recent studies have shown that collectively analyzing multiple types of molecular changes is not only biologically sensible but also leads to improved estimation and prediction. In this proposed method, we first estimate the relationship between gene expressions and their regulators by a multivariate penalized regression, and then identify regulatory modules via sparse biclustering. Next, we establish integrative covariates by principal components extracted from the identified regulatory modules. Finally, we construct a joint model for disease outcomes and employ Lasso-based penalization to select important main effects and hierarchical interactions. The proposed method expands the scope of interaction analysis to multidimensional molecular data. Last, we present an application using both marginal and joint models to analyze histopathological imaging-environment interactions. In cancer diagnosis, histopathological imaging has been routinely conducted and can be processed to generate high-dimensional features. To explore potential interactions, we conduct marginal and joint analyses, which have been extensively examined in the context of G-E interactions.
This application extends the practical applicability of interaction analysis to imaging data and provides an alternative avenue that combines histopathological imaging and environmental data in cancer modeling. Motivated by the important implications of G-E interactions and the need to overcome the limitations of the existing methods, this dissertation aims to advance methodological development for G-E interaction analysis and to provide practically useful tools for identifying important interactions. The proposed methods emerge from practical issues observed in real data and have solid statistical properties. With a balance between theory, computation, and data analysis, this dissertation provides four novel approaches for analyzing interactions to achieve more robust and accurate identification of biologically meaningful interactions.
CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles
Value at Risk has become the standard measure of market risk employed by financial institutions for both internal and regulatory purposes. Despite its conceptual simplicity, its measurement is a very challenging statistical problem and none of the methodologies developed so far give satisfactory solutions. Interpreting Value at Risk as a quantile of future portfolio values conditional on current information, we propose a new approach to quantile estimation that does not require any of the extreme assumptions invoked by existing methodologies (such as normality or i.i.d. returns). The Conditional Value at Risk or CAViaR model moves the focus of attention from the distribution of returns directly to the behavior of the quantile. Utilizing the criterion from Regression Quantiles, and postulating a variety of dynamic updating processes we propose methods based on a Genetic Algorithm to estimate the unknown parameters of CAViaR models. We propose a Dynamic Quantile Test of model adequacy that tests the hypothesis that in each period the probability of exceeding the VaR must be independent of all the past information. Applications to simulated and real data provide empirical support to our methodology and illustrate the ability of these algorithms to adapt to new risk environments.
CAViaR: Conditional Value at Risk by Quantile Regression
Value at Risk has become the standard measure of market risk employed by financial institutions for both internal and regulatory purposes. Despite its conceptual simplicity, its measurement is a very challenging statistical problem and none of the methodologies developed so far give satisfactory solutions. Interpreting Value at Risk as a quantile of future portfolio values conditional on current information, we propose a new approach to quantile estimation which does not require any of the extreme assumptions invoked by existing methodologies (such as normality or i.i.d. returns). The Conditional Value at Risk or CAViaR model moves the focus of attention from the distribution of returns directly to the behavior of the quantile. We postulate a variety of dynamic processes for updating the quantile and use regression quantile estimation to determine the parameters of the updating process. Tests of model adequacy utilize the criterion that each period the probability of exceeding the VaR must be independent of all the past information. We use a differential evolutionary genetic algorithm to optimize an objective function which is non-differentiable and hence cannot be optimized using traditional algorithms. Applications to simulated and real data provide empirical support to our methodology and illustrate the ability of these algorithms to adapt to new risk environments.
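As a rough sketch of what "dynamic updating of the quantile" combined with the regression quantile criterion can look like, the code below evaluates one common specification, the symmetric absolute value recursion q_t = b0 + b1 q_{t-1} + b2 |r_{t-1}|, under the quantile (pinball) loss. The function names, the sign convention (q_t is taken to be the theta-quantile of returns itself), the parameter values, and the simulated returns are all assumptions for illustration, not the authors' code.

```python
import numpy as np

def caviar_sav(beta, returns, q0):
    """Quantile path from the recursion q_t = b0 + b1*q_{t-1} + b2*|r_{t-1}|."""
    q = np.empty_like(returns, dtype=float)
    q[0] = q0
    for t in range(1, len(returns)):
        q[t] = beta[0] + beta[1] * q[t - 1] + beta[2] * abs(returns[t - 1])
    return q

def pinball_loss(returns, q, theta):
    """Regression quantile criterion: mean of (theta - 1{r < q}) * (r - q)."""
    u = returns - q
    return np.mean((theta - (u < 0)) * u)

# Hypothetical inputs: simulated heavy-tailed returns and arbitrary parameters.
rng = np.random.default_rng(42)
r = rng.standard_t(df=5, size=500)
theta = 0.05
beta = np.array([-0.1, 0.8, -0.3])
q0 = np.quantile(r[:100], theta)  # start the recursion at an empirical quantile

q = caviar_sav(beta, r, q0)
loss = pinball_loss(r, q, theta)
```

The indicator in the loss makes the objective non-differentiable in the parameters, which is why the abstracts turn to genetic and differential evolution algorithms rather than gradient-based optimizers; any such derivative-free routine could be pointed at `pinball_loss` as a function of `beta`.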
Empirical likelihood for median regression model with designed censoring variables
We propose a new and simple estimating equation for the parameters in median regression models with designed censoring variables, and then apply the empirical log likelihood ratio statistic to construct a confidence region for the parameters. The empirical log likelihood ratio statistic is shown to have a standard chi-square distribution, which makes this method easy to implement. At the same time, another empirical log likelihood ratio statistic is proposed based on an existing estimating equation, and the limiting distribution of this statistic is shown to be a sum of weighted chi-square distributions. Using simulation studies, we compare the performance of the empirical likelihood confidence region based on the new estimating equation with that based on the existing estimating equation and with a normal approximation method.
Bayesian inference for high-dimensional discrete-time epidemic models: spatial dynamics of the UK COVID-19 outbreak
In the event of a disease outbreak emergency, such as COVID-19, the ability
to construct detailed stochastic models of infection spread is key to
determining crucial policy-relevant metrics such as the reproduction number,
true prevalence of infection, and the contribution of population
characteristics to transmission. In particular, the interaction between space
and human mobility is key to prioritising outbreak control resources to
appropriate areas of the country. Model-based epidemiological intelligence must
therefore be provided in a timely fashion so that resources can be adapted to a
changing disease landscape quickly. The utility of these models is reliant on
fast and accurate parameter inference, with the ability to account for large
amounts of censored data to ensure estimation is unbiased. Yet methods to fit
detailed spatial epidemic models to national-level population sizes currently
do not exist due to the difficulty of marginalising over the censored data. In
this paper we develop a Bayesian data-augmentation method which operates on a
stochastic spatial metapopulation SEIR state-transition model, using
model-constrained Metropolis-Hastings samplers to improve the efficiency of an
MCMC algorithm. Coupling this method with state-of-the-art GPU acceleration
enabled us to provide nightly analyses of the UK COVID-19 outbreak, with timely
information made available for disease nowcasting and forecasting purposes.
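For readers unfamiliar with discrete-time SEIR state-transition models, a minimal sketch of one chain-binomial time step follows. It covers a single, non-spatial population with made-up rates; the function name `seir_step` and all parameter values are assumptions for illustration, and the paper's metapopulation model and MCMC scheme are considerably more detailed.

```python
import numpy as np

def seir_step(state, beta, sigma, gamma, rng, dt=1.0):
    """One chain-binomial step of a discrete-time stochastic SEIR model."""
    S, E, I, R = state
    N = S + E + I + R
    # Per-individual transition probabilities over a step of length dt.
    p_se = 1.0 - np.exp(-beta * I / N * dt)  # infection pressure depends on I/N
    p_ei = 1.0 - np.exp(-sigma * dt)
    p_ir = 1.0 - np.exp(-gamma * dt)
    # Numbers of individuals moving S->E, E->I, I->R during this step.
    new_e = rng.binomial(S, p_se)
    new_i = rng.binomial(E, p_ei)
    new_r = rng.binomial(I, p_ir)
    return (S - new_e, E + new_e - new_i, I + new_i - new_r, R + new_r)

# Hypothetical starting state and rates.
rng = np.random.default_rng(1)
state = (9990, 0, 10, 0)
trajectory = [state]
for _ in range(60):
    state = seir_step(state, beta=0.4, sigma=0.25, gamma=0.15, rng=rng)
    trajectory.append(state)
```

In practice the event counts `new_e` and `new_i` are rarely observed directly; these are the kind of censored quantities that a data-augmentation approach treats as latent variables to be sampled alongside the parameters.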
Socioeconomic Status and Sickness Absence - What do twins tell us about causality?
The purpose of this study is to empirically investigate causal effects between socioeconomic status and absence from the workplace due to sickness. To be able to conclude that income causally affects health it is important to control for both reverse causality and unobserved heterogeneity. This study uses a Swedish sample of female twins and a semiparametric censored fixed-effects model. Spousal income is correlated in cross-section with the share of total income that comes from benefits due to sickness absence. Results from this twin study indicate that male spousal income, i.e. a non-shared environmental influence, does not have a causal effect. Keywords: Income; education; health; causality; twins