
    Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity

    A general framework for solving image inverse problems is introduced in this paper. The approach is based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual mathematical interpretation of the proposed framework with structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional sparse inverse problem techniques. This interpretation also suggests an effective dictionary-motivated initialization for the MAP-EM algorithm. We demonstrate that in a number of image inverse problems, including inpainting, zooming, and deblurring, the same algorithm produces results that are equal to, often significantly better than, or only marginally worse than the best published ones, at a lower computational cost. Comment: 30 pages

    On some limitations of probabilistic models for dimension-reduction: illustration in the case of one particular probabilistic formulation of PLS

    Partial Least Squares (PLS) refers to a class of dimension-reduction techniques aiming at the identification of two sets of components with maximal covariance, in order to model the relationship between two sets of observed variables $x\in\mathbb{R}^p$ and $y\in\mathbb{R}^q$, with $p\geq 1$, $q\geq 1$. El Bouhaddani et al. (2017) have recently proposed a probabilistic formulation of PLS. Under the constraints they consider for the parameters of their model, the model can be seen as a probabilistic formulation of one version of PLS, namely PLS-SVD. However, we establish that these constraints are too restrictive, as they define a very particular subset of distributions for $(x,y)$ under which, roughly speaking, components with maximal covariance (solutions of PLS-SVD) are also necessarily of respective maximal variances (solutions of the principal component analyses of $x$ and $y$, respectively). We then propose a simple extension of el Bouhaddani et al.'s model, which corresponds to a more general probabilistic formulation of PLS-SVD and is no longer restricted to these particular distributions. We present numerical examples to illustrate the limitations of the original model of el Bouhaddani et al. (2017).
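
For readers comparing the two criteria mentioned in this abstract, the standard optimization problems can be written side by side. This is a sketch in notation assumed here, not taken from the paper: $u$, $v$ and $a$ are unit weight vectors, $\Sigma_{xy}$ is the cross-covariance matrix of $x$ and $y$, and $\Sigma_{xx}$ is the covariance matrix of $x$.

```latex
% PLS-SVD: first pair of weight vectors (maximal covariance).
% Solved by the leading left and right singular vectors of \Sigma_{xy}.
(u_1, v_1) = \arg\max_{\|u\| = \|v\| = 1} \operatorname{Cov}(u^\top x,\; v^\top y)
           = \arg\max_{\|u\| = \|v\| = 1} u^\top \Sigma_{xy}\, v

% PCA of x: first direction (maximal variance).
% Solved by the leading eigenvector of \Sigma_{xx}.
a_1 = \arg\max_{\|a\| = 1} \operatorname{Var}(a^\top x)
    = \arg\max_{\|a\| = 1} a^\top \Sigma_{xx}\, a
```

In general the two problems have different solutions; the abstract's point is that the constrained model forces them to coincide.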

    Parametric and Semi-parametric Estimations of the Return to Schooling in South Africa

    This paper estimates the return to schooling for African and Coloured women in South Africa. It compares parametric and semiparametric estimates of the sample selection model for the case of the return to schooling. The parametric estimator is the one proposed by Heckman (1979), and the semiparametric estimators are those proposed by Newey (1991) and Klein and Spady (1993). It also attempts to correct for endogeneity and measurement error by using instruments for schooling. Following recent literature, the paper uses community variables (primary and secondary school proximity and availability) as instruments. Using instrumental variables increases the return to schooling substantially. Parametric corrections do not change the results, but semiparametric corrections increase the return even more. Keywords: return to schooling, sample selection bias, semiparametric regression, instrumental variables, South Africa

    A geometric relationship of F2, F3 and F4-statistics with principal component analysis

    Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simple geometrical interpretation in the context of PCA, and orthogonal projections are a key concept to establish this link. I show that for any pair of populations, any population that is admixed as determined by an F3-statistic will lie inside a circle on a PCA plot. Furthermore, the F4-statistic is closely related to an angle measurement, and will be zero if the differences between pairs of populations intersect at a right angle in PCA space. I illustrate my results on two examples, one of Western Eurasian, and one of global human diversity. In both examples, I find that the first few PCs are sufficient to approximate most F-statistics, and that PCA plots are effective at predicting F-statistics. Thus, while F-statistics are commonly understood in terms of discrete populations, the geometric perspective illustrates that they can be viewed in a framework of populations that vary in a more continuous manner. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
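
The geometric reading in this abstract can be illustrated with a minimal sketch. It assumes each population is represented by a coordinate vector (allele frequencies, or equivalently positions in full PCA space) and uses the usual inner-product forms of F2, F3 and F4, summed over coordinates with no per-SNP averaging or finite-sample bias correction. The function names and toy coordinates are hypothetical and not taken from the paper.

```python
def sub(u, v):
    """Coordinate-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def f2(a, b):
    """F2(A, B): squared Euclidean distance between A and B."""
    d = sub(a, b)
    return dot(d, d)


def f3(x, a, b):
    """F3(X; A, B): inner product of (X - A) and (X - B).

    Negative exactly when X lies inside the circle (sphere) whose
    diameter is the segment AB.
    """
    return dot(sub(x, a), sub(x, b))


def f4(a, b, c, d):
    """F4(A, B; C, D): inner product of (A - B) and (C - D).

    Zero when the two difference vectors are orthogonal.
    """
    return dot(sub(a, b), sub(c, d))


# Toy 2-D coordinates (hypothetical, for illustration only).
pop_a, pop_b = [0.0, 0.0], [2.0, 0.0]
inside = [1.0, 0.5]    # inside the circle with diameter AB
outside = [1.0, 2.0]   # outside that circle
pop_c, pop_d = [0.0, 1.0], [0.0, 3.0]  # C - D is perpendicular to A - B
```

With these coordinates, `f3(inside, pop_a, pop_b)` is negative, `f3(outside, pop_a, pop_b)` is positive, and `f4(pop_a, pop_b, pop_c, pop_d)` is zero, matching the circle and right-angle statements above.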

    Improving polygenic prediction with genetically inferred ancestry.

    Genome-wide association studies (GWASs) have demonstrated that most common diseases have a strong genetic component from many genetic variants, each with a small effect size. GWAS summary statistics have allowed the construction of polygenic scores (PGSs) estimating part of the individual risk for common diseases. Here, we propose to improve PGS-based risk estimation by incorporating genetic ancestry derived from genome-wide genotyping data. Our method involves three cohorts: a base (or discovery) cohort for association studies, a target cohort for phenotype/risk prediction, and a map cohort for ancestry mapping. Successively, (1) it generates for each individual in the base and target cohorts a set of principal components based on the map cohort, called mapped PCs, (2) it associates in the base cohort the phenotype with the mapped PCs, and (3) it uses the mapped PCs in the target cohort to generate a phenotypic predictor called the ancestry score. We evaluated the ancestry score by comparing a predictive model using a PGS with one combining a PGS and an ancestry score. First, we performed simulations and found that the ancestry score has a greater impact on traits that correlate with ancestry-specific variants. Second, we showed, using UK Biobank data, that the ancestry score improves genetic prediction for our nine phenotypes to very different degrees. Third, we performed simulations and found that the more heterogeneous the base and target cohorts, the more beneficial the ancestry score is. Finally, we validated our approach under realistic conditions with UK Biobank as the base cohort and Swiss individuals from the CoLaus|PsyCoLaus study as the target cohort.
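
The three steps described in this abstract can be sketched on toy data. This is a simplified illustration under stated assumptions, not the authors' implementation: it keeps a single mapped PC (found by power iteration), uses ordinary least squares for the phenotype association, and all function names and genotype values are hypothetical.

```python
import math


def column_means(rows):
    """Per-SNP mean genotype in a cohort (rows = individuals)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def first_pc_loading(rows, means, iterations=200):
    """Leading eigenvector of X^T X (X = centered genotypes), by power iteration."""
    centered = [[g - m for g, m in zip(row, means)] for row in rows]
    n_snps = len(means)
    w = [1.0 / (j + 1) for j in range(n_snps)]  # arbitrary deterministic start
    for _ in range(iterations):
        scores = [dot(row, w) for row in centered]              # X w
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(n_snps)]                            # X^T (X w)
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
    return w


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(rows, means, loading):
    """Step 1: mapped PC of each individual, using the map cohort's means and loading."""
    return [dot([g - m for g, m in zip(row, means)], loading) for row in rows]


def fit_line(xs, ys):
    """Step 2 (assumed to be simple OLS here): regress phenotype on the mapped PC."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def ancestry_scores(map_rows, base_rows, base_phenotypes, target_rows):
    """Steps 1 to 3: map, associate in the base cohort, predict in the target cohort."""
    means = column_means(map_rows)
    loading = first_pc_loading(map_rows, means)
    slope, intercept = fit_line(project(base_rows, means, loading), base_phenotypes)
    return [intercept + slope * pc for pc in project(target_rows, means, loading)]


# Toy genotypes (0/1/2 allele counts over 3 SNPs); values are made up.
map_cohort = [[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]]
base_cohort = [[0, 0, 2], [2, 2, 0], [0, 1, 2], [2, 1, 0]]
base_phenotypes = [1.0, 3.0, 1.2, 2.8]
target_cohort = [[0, 0, 2], [2, 2, 0]]
scores = ancestry_scores(map_cohort, base_cohort, base_phenotypes, target_cohort)
```

Because the regression absorbs the arbitrary sign of the PC, the predicted scores do not depend on which direction the power iteration converges to. A fuller version would retain several mapped PCs and combine the resulting ancestry score with a PGS, as the abstract describes.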

    Complex population structure and haplotype patterns in the Western European honey bee from sequencing a large panel of haploid drones

    Honey bee subspecies originate from specific geographical areas in Africa, Europe and the Middle East, and beekeepers interested in specific phenotypes have imported genetic material to regions outside of the bees' original range for use either in pure lines or controlled crosses. Moreover, imported drones are present in the environment and mate naturally with queens from the local subspecies. The resulting admixture complicates population genetics analyses, and population stratification can be a major problem for association studies. To better understand Western European honey bee populations, we produced a whole genome sequence and single nucleotide polymorphism (SNP) genotype data set from 870 haploid drones and demonstrate its utility for the identification of nine genetic backgrounds and various degrees of admixture in a subset of 629 samples. Five backgrounds identified correspond to subspecies, two to isolated populations on islands and two to managed populations. We also highlight several large haplotype blocks, some of which coincide with the position of centromeres. The largest is 3.6 Mb long and represents 21% of chromosome 11, with two major haplotypes corresponding to the two dominant genetic backgrounds identified. This large naturally phased data set is available as a single vcf file that can now serve as a reference for subsequent population genomics studies in the honey bee, such as (i) selecting individuals of verified homogeneous genetic backgrounds as references, (ii) imputing genotypes from a lower-density data set generated by an SNP-chip or by low-pass sequencing, or (iii) selecting SNPs compatible with the requirements of genotyping chips.

    This work was performed in collaboration with the GeT platform, Toulouse (France), a partner of the National Infrastructure France Génomique, thanks to support by the Commissariat aux Grands Investissements (ANR-10-INBS-0009). Bioinformatics analyses were performed on the GenoToul Bioinfo computer cluster. This work was funded by a grant from the INRA Département de Génétique Animale (INRA Animal Genetics division) and by the SeqApiPop programme, funded by the FranceAgriMer grant 14-21-AT. We thank John Kefuss for helpful discussions. We thank Andrew Abrahams for providing honey bee samples from Colonsay (Scotland), the Association Conservatoire de l'Abeille Noire Bretonne (ACANB) for samples from Ouessant (France), CETA de Savoie for samples from Savoie, ADAPI for samples from Porquerolles and all beekeepers and bee breeders who kindly participated in this study by providing samples from their colonies.