1,208 research outputs found
Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity
We propose and study a maximum likelihood estimator of stochastic frontier
models with endogeneity in cross-section data when the composite error term may
be correlated with inputs and environmental variables. Our framework is a
generalization of the normal half-normal stochastic frontier model with
endogeneity. We derive the likelihood function in closed form using three
fundamental assumptions: the existence of control functions that fully capture
the dependence between regressors and unobservables; the conditional
independence of the two error components given the control functions; and the
conditional distribution of the stochastic inefficiency term given the control
functions being a folded normal distribution. We also provide a Battese-Coelli
estimator of technical efficiency. Our estimator is computationally fast and
easy to implement. We study some of its asymptotic properties, and we showcase
its finite sample behavior in Monte-Carlo simulations and an empirical
application to farmers in Nepal.
Adaptive estimation with partially overlapping models
In many problems, one has several models of interest that capture key parameters describing the distribution of the data. Partially overlapping models are taken as models in which at least one covariate effect is common to the models. A priori knowledge of such structure enables efficient estimation of all model parameters. However, in practice, this structure may be unknown. We propose adaptive composite M-estimation (ACME) for partially overlapping models using a composite loss function, which is a linear combination of loss functions defining the individual models. Penalization is applied to pairwise differences of parameters across models, resulting in data-driven identification of the overlap structure. Further penalization is imposed on the individual parameters, enabling sparse estimation in the regression setting. The recovery of the overlap structure enables more efficient parameter estimation. An oracle result is established. Simulation studies illustrate the advantages of ACME over existing methods that fit individual models separately or make strong a priori assumptions about the overlap structure.
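The structure of such an objective can be sketched as a weighted sum of per-model losses plus two penalties. The abstract does not give the exact penalty form or the adaptive weights, so the sketch below assumes plain absolute-value (L1) penalties with two tuning constants. The helper names, the choice of squared and absolute losses, and the parameters `lam_fuse` and `lam_sparse` are all hypothetical.

```python
def make_squared_loss(X, y):
    # Mean squared error of a linear fit; X is a list of covariate rows.
    def loss(beta):
        res = [yi - sum(b * x for b, x in zip(beta, xi)) for yi, xi in zip(y, X)]
        return sum(r * r for r in res) / len(res)
    return loss


def make_absolute_loss(X, y):
    # Mean absolute error of a linear fit (median-regression style loss).
    def loss(beta):
        res = [yi - sum(b * x for b, x in zip(beta, xi)) for yi, xi in zip(y, X)]
        return sum(abs(r) for r in res) / len(res)
    return loss


def composite_objective(betas, losses, weights, lam_fuse, lam_sparse):
    """Composite loss with a pairwise-difference penalty and a sparsity penalty.

    betas:   list of K coefficient lists, one per model, all of length p
    losses:  list of K callables mapping a coefficient list to a float
    weights: list of K non-negative weights for the linear combination
    """
    fit = sum(w * loss(b) for w, loss, b in zip(weights, losses, betas))
    k = len(betas)
    # Penalize differences of the same coefficient across every pair of models.
    fuse = sum(abs(bj - bl)
               for a in range(k) for c in range(a + 1, k)
               for bj, bl in zip(betas[a], betas[c]))
    # Penalize each individual coefficient to encourage sparsity.
    sparse = sum(abs(v) for b in betas for v in b)
    return fit + lam_fuse * fuse + lam_sparse * sparse
```

When the pairwise-difference penalty shrinks two coefficients to the same value, the corresponding covariate effect is treated as shared across the two models, which is how a penalty of this kind can identify overlap from the data.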
Inference of historical population-size changes with allele-frequency data
With up to millions of nearly neutral polymorphisms now being routinely sampled in population-genomic surveys, it is possible to estimate the site-frequency spectrum of such sites with high precision. Each frequency class reflects a mixture of potentially unique demographic histories, which can be revealed using theory for the probability distributions of the starting and ending points of branch segments over all possible coalescence trees. Such distributions are completely independent of past population history, which only influences the segment lengths, providing the basis for estimating average population sizes separating tree-wide coalescence events. The history of population-size change experienced by a sample of polymorphisms can then be dissected in a model-flexible fashion, and extension of this theory allows estimation of the mean and full distribution of long-term effective population sizes and ages of alleles of specific frequencies. Here, we outline the basic theory underlying the conceptual approach, develop and test an efficient statistical procedure for parameter estimation, and apply this to multiple population-genomic datasets for the microcrustacean Daphnia pulex.
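The site-frequency spectrum that serves as the input here is simply a tally of polymorphic sites by allele count. A minimal sketch follows, assuming each site is summarized by its derived-allele count in a sample of `n` chromosomes. The function name and input layout are hypothetical, and the sketch does not implement the coalescent-based inference the abstract describes.

```python
def site_frequency_spectrum(derived_counts, n, folded=False):
    """Tally polymorphic sites by allele count in a sample of n chromosomes.

    derived_counts: one derived-allele count (0..n) per site
    Returns a list whose entry i-1 is the number of sites in class i, where
    classes run 1..n-1 (unfolded) or 1..n//2 (folded, minor-allele count).
    Monomorphic sites (count 0 or n) are skipped.
    """
    size = n // 2 if folded else n - 1
    sfs = [0] * size
    for c in derived_counts:
        if c <= 0 or c >= n:
            continue  # not polymorphic in this sample
        k = min(c, n - c) if folded else c
        sfs[k - 1] += 1
    return sfs
```

The folded form is the usual fallback when the ancestral state is unknown, since it only requires distinguishing the minor allele from the major one.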
Natural selection reduced diversity on human Y chromosomes
The human Y chromosome exhibits surprisingly low levels of genetic diversity.
This could result from neutral processes if the effective population size of
males is reduced relative to females due to a higher variance in the number of
offspring from males than from females. Alternatively, selection acting on new
mutations, and affecting linked neutral sites, could reduce variability on the
Y chromosome. Here, using genome-wide analyses of X, Y, autosomal and
mitochondrial DNA, in combination with extensive population genetic
simulations, we show that low observed Y chromosome variability is not
consistent with a purely neutral model. Instead, we show that models of
purifying selection are consistent with observed Y diversity. Further, the
number of sites estimated to be under purifying selection greatly exceeds the
number of Y-linked coding sites, suggesting the importance of the highly
repetitive ampliconic regions. While we show that purifying selection removing
deleterious mutations can explain the low diversity on the Y chromosome, we
cannot exclude the possibility that positive selection acting on beneficial
mutations could have also reduced diversity in linked neutral regions, and may
have contributed to lowering human Y chromosome diversity. Because the
functional significance of the ampliconic regions is poorly understood, our
findings should motivate future research in this area. Comment: 43 pages, 11 figures
Contributions to Penalized Estimation
Penalized estimation is a useful statistical technique to prevent overfitting problems. In penalized methods, the common objective function is in the form of a loss function for goodness of fit plus a penalty function for complexity control. In this dissertation, we develop several new penalization approaches for various statistical models. These methods aim for effective model selection and accurate parameter estimation. The first part introduces the notion of partially overlapping models across multiple regression models on the same dataset. Such underlying models have at least one overlapping structure sharing the same parameter value. To recover the sparse and overlapping structure, we develop adaptive composite M-estimation (ACME) by doubly penalizing a composite loss function, as a weighted linear combination of the loss functions. ACME automatically circumvents the model misspecification issues inherent in other composite-loss-based estimators. The second part proposes a new refit method and its applications in the regression setting through model combination: ensemble variable selection (EVS) and ensemble variable selection and estimation (EVE). The refit method estimates the regression parameters restricted to the selected covariates by a penalization method. EVS combines model selection decisions from multiple penalization methods and selects the optimal model via the refit and a model selection criterion. EVE considers a factorizable likelihood-based model whose full likelihood is the multiplication of likelihood factors. EVE is shown to have asymptotic efficiency and computational efficiency. The third part studies a sparse undirected Gaussian graphical model (GGM) to explain conditional dependence patterns among variables. The edge set consists of conditionally dependent variable pairs and corresponds to nonzero elements of the inverse covariance matrix under the Gaussian assumption. 
We propose a consistent validation method for edge selection (CoVES) in the penalization framework. CoVES selects candidate edge sets along the solution path and finds the optimal set via repeated subsampling. CoVES requires simple computation and delivers excellent performance in our numerical studies. Doctor of Philosophy
Detecting Ancient Balancing Selection: Methods And Application To Human
Balancing selection can maintain genetic variation in a population over long evolutionary time periods. Identifying genomic loci under this type of selection not only elucidates selective pressures and adaptations but can also help interpret common genetic variation contributing to disease. Summary statistics which capture signatures in the site frequency spectrum are frequently used to scan the genome to detect loci showing evidence of balancing selection. However, these approaches have limited power because they rely on imprecise signatures such as a general excess of heterozygosity or number of genetic variants. A second class of statistics, based on likelihoods, have higher power but are often computationally prohibitive. In addition, a majority of methods in both classes require a high-quality sequenced outgroup, which is unavailable for many species of interest. Therefore, there is a need for a well-powered and widely-applicable statistical approach to detect balancing selection. Theory suggests that long-term balancing selection will result in a genealogy with very long internal branches. In this thesis, I show that this leads to a precise signature: an excess of genetic variants at near identical allele frequencies to one another. We have developed novel summary statistics to detect this signature of balancing selection, termed the β statistics. Using simulations, we show that these statistics are not only computationally light but also have high power even if an outgroup is unavailable. We have derived the variance of these statistics, allowing proper comparison of β values across sample sizes, mutation rates, and allele frequencies - variables not fully accounted for by many previous methods. We scanned the 1000 Genomes Project data with β to find balanced loci in humans. 
Here, I report multiple balanced haplotypes that are strongly linked to both association signals for complex traits and regulatory variants, indicating balancing selection may be affecting complex trait architecture. Due to their high power and wide applicability, the β statistics enable evolutionary biologists to detect targets of balancing selection in a range of species and with a degree of specificity previously unattainable.
Distinguishing models of reionization using future radio observations of 21-cm 1-point statistics
We explore the impact of reionization topology on 21-cm statistics. Four
reionization models are presented which emulate large ionized bubbles around
over-dense regions (21CMFAST/global-inside-out), small ionized bubbles in
over-dense regions (local-inside-out), large ionized bubbles around under-dense
regions (global-outside-in) and small ionized bubbles around under-dense
regions (local-outside-in). We show that first-generation instruments might
struggle to distinguish global models using the shape of the power spectrum
alone. All instruments considered are capable of breaking this degeneracy with
the variance, which is higher in outside-in models. Global models can also be
distinguished at small scales from a boost in the power spectrum from a
positive correlation between the density and neutral-fraction fields in
outside-in models. Negative skewness is found to be unique to inside-out models
and we find that pre-SKA instruments could detect this feature in maps smoothed
to reduce noise errors. The early, mid and late phases of reionization imprint
signatures in the brightness-temperature moments; we examine their model
dependence and find pre-SKA instruments capable of exploiting these timing
constraints in smoothed maps. The dimensional skewness is introduced and is
shown to have stronger signatures of the early and mid-phase timing if the
inside-out scenario is correct. Comment: 18 pages, 13 figures, updated to agree with published version
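The one-point statistics discussed above are moments of the pixel distribution of a brightness-temperature map. The sketch below computes the variance and the standard normalized skewness, assuming the map has been flattened into a list of pixel values. The abstract's "dimensional skewness" is not defined in this text, so it is deliberately not implemented here, and the function names are hypothetical.

```python
def central_moment(values, k):
    # k-th central moment of a flat list of pixel values.
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def variance_and_skewness(values):
    """Return (variance, normalized skewness) of the pixel distribution.

    Skewness is the third central moment divided by variance**1.5, so it is
    dimensionless; the variance must be non-zero.
    """
    var = central_moment(values, 2)
    skew = central_moment(values, 3) / var ** 1.5
    return var, skew
```

A negative skewness from `variance_and_skewness` indicates a pixel distribution with a longer tail toward low values, which is the sign of the feature the abstract associates with inside-out models.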
Effects of Destriping Errors on Estimates of the CMB Power Spectrum
Destriping methods for constructing maps of the Cosmic Microwave Background
(CMB) anisotropies have been investigated extensively in the literature.
However, their error properties have been studied in less detail. Here we
present an analysis of the effects of destriping errors on CMB power spectrum
estimates for Planck-like scanning strategies. Analytic formulae are derived
for certain simple scanning geometries that can be rescaled to account for
different detector noise. Assuming Planck-like low-frequency noise, the noise
power spectrum is accurately white at high multipoles (l ≳ 50). Destriping
errors, though dominant at lower multipoles, are small in comparison to the
cosmic variance. These results show that simple destriping map-making methods
should be perfectly adequate for the analysis of Planck data and support the
arguments given in an earlier paper in favour of applying a fast hybrid power
spectrum estimator to CMB data with realistic '1/f' noise. Comment: 13 pages, 6 figures, submitted to MNRAS