
    Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity

    We propose and study a maximum likelihood estimator of stochastic frontier models with endogeneity in cross-section data when the composite error term may be correlated with inputs and environmental variables. Our framework is a generalization of the normal half-normal stochastic frontier model with endogeneity. We derive the likelihood function in closed form using three fundamental assumptions: the existence of control functions that fully capture the dependence between regressors and unobservables; the conditional independence of the two error components given the control functions; and the conditional distribution of the stochastic inefficiency term given the control functions being a folded normal distribution. We also provide a Battese-Coelli estimator of technical efficiency. Our estimator is computationally fast and easy to implement. We study some of its asymptotic properties, and we showcase its finite sample behavior in Monte Carlo simulations and an empirical application to farmers in Nepal.
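As a point of reference for the model this abstract generalizes, the following is a minimal sketch of the textbook normal half-normal stochastic frontier log-likelihood, with no endogeneity and no control functions. It is not the estimator proposed in the paper. The function names and the parameterization by `sigma_v` and `sigma_u` are assumptions made for illustration.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def sf_loglik(residuals, sigma_v, sigma_u):
    """Log-likelihood of composite errors eps = v - u, where
    v ~ N(0, sigma_v^2) is noise and u ~ |N(0, sigma_u^2)| is inefficiency.

    Density of eps: (2 / sigma) * phi(eps / sigma) * Phi(-eps * lam / sigma),
    with sigma^2 = sigma_v^2 + sigma_u^2 and lam = sigma_u / sigma_v.
    """
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)
    lam = sigma_u / sigma_v
    total = 0.0
    for eps in residuals:
        total += (
            math.log(2.0)
            - math.log(sigma)
            + norm_logpdf(eps / sigma)
            + math.log(norm_cdf(-eps * lam / sigma))
        )
    return total
```

In practice `residuals` would be `y - X @ beta` for a candidate coefficient vector, and the function would be maximized jointly over the coefficients and the two scale parameters. When `sigma_u` shrinks toward zero the expression reduces to the ordinary normal log-likelihood.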

    Adaptive estimation with partially overlapping models

    In many problems, one has several models of interest that capture key parameters describing the distribution of the data. Partially overlapping models are taken as models in which at least one covariate effect is common to the models. A priori knowledge of such structure enables efficient estimation of all model parameters. However, in practice, this structure may be unknown. We propose adaptive composite M-estimation (ACME) for partially overlapping models using a composite loss function, which is a linear combination of loss functions defining the individual models. Penalization is applied to pairwise differences of parameters across models, resulting in data-driven identification of the overlap structure. Further penalization is imposed on the individual parameters, enabling sparse estimation in the regression setting. The recovery of the overlap structure enables more efficient parameter estimation. An oracle result is established. Simulation studies illustrate the advantages of ACME over existing methods that fit individual models separately or make strong a priori assumptions about the overlap structure.
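The structure of the objective described in this abstract (a weighted sum of per-model losses, plus a penalty on pairwise differences of parameters across models, plus a penalty on the individual parameters) can be sketched as below. This is only an illustration under stated assumptions: plain L1 penalties stand in for whatever adaptive weighting ACME actually uses, and every name (`composite_objective`, `lam_fuse`, `lam_sparse`) is hypothetical.

```python
def composite_objective(losses, weights, betas, lam_fuse, lam_sparse):
    """Evaluate a doubly penalized composite loss.

    losses  : list of callables, losses[k](betas[k]) -> float
    weights : list of nonnegative weights, one per model
    betas   : list of coefficient lists, all of the same length
    lam_fuse   : strength of the penalty on cross-model differences
    lam_sparse : strength of the penalty on individual coefficients
    """
    # Weighted linear combination of the individual model losses.
    fit = sum(w * loss(b) for w, loss, b in zip(weights, losses, betas))

    # L1 penalty on pairwise differences of matching coefficients
    # (assumption: unweighted; pushes overlapping effects to be equal).
    fuse = 0.0
    n_models = len(betas)
    for k in range(n_models):
        for m in range(k + 1, n_models):
            fuse += sum(abs(a - c) for a, c in zip(betas[k], betas[m]))

    # L1 penalty on the coefficients themselves (encourages sparsity).
    sparse = sum(abs(a) for b in betas for a in b)

    return fit + lam_fuse * fuse + lam_sparse * sparse
```

A coefficient whose cross-model difference is shrunk exactly to zero would be read as shared between the two models, which is how the overlap structure would be identified from the data in a scheme of this kind.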

    Inference of historical population-size changes with allele-frequency data

    With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.
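The site-frequency spectrum that this abstract takes as its input can be tabulated as in the sketch below. It assumes the data have already been reduced to one derived-allele count per site in a sample of `n` chromosomes; the function name and input layout are illustrative and are not taken from the paper.

```python
def site_frequency_spectrum(derived_counts, n):
    """Unfolded site-frequency spectrum for a sample of n chromosomes.

    derived_counts : iterable of derived-allele counts, one per site
    Returns a list sfs where sfs[i - 1] is the number of sites at which
    the derived allele appears exactly i times, for i = 1 .. n - 1.
    Monomorphic sites (count 0 or n) are skipped.
    """
    sfs = [0] * (n - 1)
    for count in derived_counts:
        if 0 < count < n:
            sfs[count - 1] += 1
    return sfs
```

For example, `site_frequency_spectrum([1, 1, 2, 3, 0, 4], 4)` returns `[2, 1, 1]`: two singletons, one doubleton, one tripleton, with the two monomorphic sites dropped.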

    Natural selection reduced diversity on human Y chromosomes

    The human Y chromosome exhibits surprisingly low levels of genetic diversity. This could result from neutral processes if the effective population size of males is reduced relative to females due to a higher variance in the number of offspring from males than from females. Alternatively, selection acting on new mutations, and affecting linked neutral sites, could reduce variability on the Y chromosome. Here, using genome-wide analyses of X, Y, autosomal and mitochondrial DNA, in combination with extensive population genetic simulations, we show that low observed Y chromosome variability is not consistent with a purely neutral model. Instead, we show that models of purifying selection are consistent with observed Y diversity. Further, the number of sites estimated to be under purifying selection greatly exceeds the number of Y-linked coding sites, suggesting the importance of the highly repetitive ampliconic regions. While we show that purifying selection removing deleterious mutations can explain the low diversity on the Y chromosome, we cannot exclude the possibility that positive selection acting on beneficial mutations could have also reduced diversity in linked neutral regions, and may have contributed to lowering human Y chromosome diversity. Because the functional significance of the ampliconic regions is poorly understood, our findings should motivate future research in this area. Comment: 43 pages, 11 figures

    Contributions to Penalized Estimation

    Penalized estimation is a useful statistical technique to prevent overfitting problems. In penalized methods, the common objective function is in the form of a loss function for goodness of fit plus a penalty function for complexity control. In this dissertation, we develop several new penalization approaches for various statistical models. These methods aim for effective model selection and accurate parameter estimation. The first part introduces the notion of partially overlapping models across multiple regression models on the same dataset. Such underlying models have at least one overlapping structure sharing the same parameter value. To recover the sparse and overlapping structure, we develop adaptive composite M-estimation (ACME) by doubly penalizing a composite loss function, as a weighted linear combination of the loss functions. ACME automatically circumvents the model misspecification issues inherent in other composite-loss-based estimators. The second part proposes a new refit method and its applications in the regression setting through model combination: ensemble variable selection (EVS) and ensemble variable selection and estimation (EVE). The refit method estimates the regression parameters restricted to the selected covariates by a penalization method. EVS combines model selection decisions from multiple penalization methods and selects the optimal model via the refit and a model selection criterion. EVE considers a factorizable likelihood-based model whose full likelihood is the multiplication of likelihood factors. EVE is shown to have asymptotic efficiency and computational efficiency. The third part studies a sparse undirected Gaussian graphical model (GGM) to explain conditional dependence patterns among variables. The edge set consists of conditionally dependent variable pairs and corresponds to nonzero elements of the inverse covariance matrix under the Gaussian assumption. We propose a consistent validation method for edge selection (CoVES) in the penalization framework. CoVES selects candidate edge sets along the solution path and finds the optimal set via repeated subsampling. CoVES requires simple computation and delivers excellent performance in our numerical studies. Doctor of Philosophy

    Detecting Ancient Balancing Selection: Methods And Application To Human

    Balancing selection can maintain genetic variation in a population over long evolutionary time periods. Identifying genomic loci under this type of selection not only elucidates selective pressures and adaptations but can also help interpret common genetic variation contributing to disease. Summary statistics which capture signatures in the site frequency spectrum are frequently used to scan the genome to detect loci showing evidence of balancing selection. However, these approaches have limited power because they rely on imprecise signatures such as a general excess of heterozygosity or number of genetic variants. A second class of statistics, based on likelihoods, has higher power but is often computationally prohibitive. In addition, a majority of methods in both classes require a high-quality sequenced outgroup, which is unavailable for many species of interest. Therefore, there is a need for a well-powered and widely applicable statistical approach to detect balancing selection. Theory suggests that long-term balancing selection will result in a genealogy with very long internal branches. In this thesis, I show that this leads to a precise signature: an excess of genetic variants at near-identical allele frequencies to one another. We have developed novel summary statistics to detect this signature of balancing selection, termed the β statistics. Using simulations, we show that these statistics are not only computationally light but also have high power even if an outgroup is unavailable. We have derived the variance of these statistics, allowing proper comparison of β values across sample sizes, mutation rates, and allele frequencies, variables not fully accounted for by many previous methods. We scanned the 1000 Genomes Project data with β to find balanced loci in humans. Here, I report multiple balanced haplotypes that are strongly linked to both association signals for complex traits and regulatory variants, indicating balancing selection may be affecting complex trait architecture. Due to their high power and wide applicability, the β statistics enable evolutionary biologists to detect targets of balancing selection in a range of species and with a degree of specificity previously unattainable.

    Distinguishing models of reionization using future radio observations of 21-cm 1-point statistics

    We explore the impact of reionization topology on 21-cm statistics. Four reionization models are presented which emulate large ionized bubbles around over-dense regions (21CMFAST/global-inside-out), small ionized bubbles in over-dense regions (local-inside-out), large ionized bubbles around under-dense regions (global-outside-in) and small ionized bubbles around under-dense regions (local-outside-in). We show that first-generation instruments might struggle to distinguish global models using the shape of the power spectrum alone. All instruments considered are capable of breaking this degeneracy with the variance, which is higher in outside-in models. Global models can also be distinguished at small scales from a boost in the power spectrum from a positive correlation between the density and neutral-fraction fields in outside-in models. Negative skewness is found to be unique to inside-out models and we find that pre-SKA instruments could detect this feature in maps smoothed to reduce noise errors. The early, mid and late phases of reionization imprint signatures in the brightness-temperature moments; we examine their model dependence and find pre-SKA instruments capable of exploiting these timing constraints in smoothed maps. The dimensional skewness is introduced and is shown to have stronger signatures of the early and mid-phase timing if the inside-out scenario is correct. Comment: 18 pages, 13 figures, updated to agree with published version
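The one-point statistics this abstract relies on (variance and skewness of the brightness-temperature field) can be computed from a flattened list of map pixels as in the sketch below. It uses the standard normalized skewness, the third central moment divided by the variance to the power 3/2. The paper's "dimensional skewness" is not reproduced here because its definition is not given in the abstract. The function name and input layout are assumptions.

```python
def one_point_moments(pixels):
    """Mean, variance, and normalized skewness of a list of pixel values.

    Uses population (divide-by-n) central moments. Skewness is returned
    as 0.0 for a constant map, where it is otherwise undefined.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n  # variance
    m3 = sum((p - mean) ** 3 for p in pixels) / n  # third central moment
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, m2, skew
```

In a real analysis the pixels would come from a smoothed, noise-reduced map at a given redshift, and the sign of `skew` would be the quantity compared across reionization models.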

    Effects of Destriping Errors on Estimates of the CMB Power Spectrum

    Destriping methods for constructing maps of the Cosmic Microwave Background (CMB) anisotropies have been investigated extensively in the literature. However, their error properties have been studied in less detail. Here we present an analysis of the effects of destriping errors on CMB power spectrum estimates for Planck-like scanning strategies. Analytic formulae are derived for certain simple scanning geometries that can be rescaled to account for different detector noise. Assuming Planck-like low-frequency noise, the noise power spectrum is accurately white at high multipoles (l ≳ 50). Destriping errors, though dominant at lower multipoles, are small in comparison to the cosmic variance. These results show that simple destriping map-making methods should be perfectly adequate for the analysis of Planck data and support the arguments given in an earlier paper in favour of applying a fast hybrid power spectrum estimator to CMB data with realistic `1/f' noise. Comment: 13 pages, 6 figures, submitted to MNRAS