
    Tracing Sub-Structure in the European American Population with PCA-Informative Markers

    Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals, 307,315 autosomal SNPs). Individual variation lies across a continuum, with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans represent only a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure-informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied to sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, significantly decreasing genotyping costs.
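    The marker-selection idea in the abstract above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' algorithm: SNPs are scored by their squared loadings on the top `k` principal components (a leverage-score-style heuristic), and redundancy is removed with a simple greedy correlation filter swapped in for the paper's own method. The function names, the simulated allele frequencies, and the `max_abs_corr` threshold are all hypothetical.

```python
import numpy as np


def simulate_genotypes(n_per_pop=100, seed=0):
    """Toy data (assumed, not real): 2 populations, 10 informative + 50 neutral SNPs."""
    rng = np.random.default_rng(seed)
    freq_a = np.r_[np.full(10, 0.8), np.full(50, 0.5)]
    freq_b = np.r_[np.full(10, 0.2), np.full(50, 0.5)]
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, freq_a.size))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, freq_b.size))
    G = np.vstack([pop_a, pop_b]).astype(float)
    # Column 60 duplicates SNP 0 to mimic a fully redundant marker.
    return np.hstack([G, G[:, [0]]])


def select_pca_informative(G, k=1, n_select=5, max_abs_corr=0.9):
    """Score SNPs by squared loadings on the top-k PCs, then greedily
    skip any SNP too correlated with one that was already chosen."""
    X = G - G.mean(axis=0)                      # center each SNP column
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = (Vt[:k] ** 2).sum(axis=0)          # one score per SNP

    # Pearson correlations between SNPs; monomorphic SNPs are excluded.
    std = X.std(axis=0)
    polymorphic = std > 0
    Z = np.zeros_like(X)
    Z[:, polymorphic] = X[:, polymorphic] / std[polymorphic]
    corr = (Z.T @ Z) / X.shape[0]

    chosen = []
    for j in np.argsort(scores)[::-1]:          # highest score first
        if not polymorphic[j]:
            continue
        if all(abs(corr[j, c]) <= max_abs_corr for c in chosen):
            chosen.append(int(j))
        if len(chosen) == n_select:
            break
    return chosen, scores


G = simulate_genotypes()
chosen, scores = select_pca_informative(G)
print("selected SNP columns:", chosen)
```

    In this toy setting the selected columns come from the ten ancestry-informative SNPs, and the duplicated column is never selected together with its original.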

    Inferring Geographic Coordinates of Origin for Europeans Using Small Panels of Ancestry Informative Markers

    Recent large-scale studies of European populations have demonstrated the existence of population genetic structure within Europe and the potential to accurately infer individual ancestry when information from hundreds of thousands of genetic markers is used. In fact, when genomewide genetic variation of European populations is projected down to a two-dimensional Principal Components Analysis plot, a surprising correlation with actual geographic coordinates of self-reported ancestry has been reported. This substructure can hamper the search for susceptibility genes for common complex disorders, leading to spurious correlations. The identification of genetic markers that can correct for population stratification therefore becomes of paramount importance. Analyzing 1,200 individuals from 11 populations genotyped for more than 500,000 SNPs (Population Reference Sample), we present a systematic exploration of the extent to which geographic coordinates of origin within Europe can be predicted with small panels of SNPs. Markers are selected to correlate with the top principal components of the dataset, as we have previously demonstrated. Performing thorough cross-validation experiments, we show that it is indeed possible to predict individual ancestry within Europe down to a few hundred kilometers from actual individual origin, using information from carefully selected panels of 500 or 1,000 SNPs. Furthermore, we show that these panels can be used to correctly assign the HapMap Phase 3 European populations to their geographic origin. The SNPs that we propose can prove extremely useful in a variety of different settings, such as stratification correction or genetic ancestry testing, and the study of the history of European populations.
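    The prediction of geographic coordinates from principal components can be illustrated with a small simulation. This is a hedged sketch, not the study's pipeline: genotypes are simulated on a unit square with allele frequencies that vary linearly in both coordinates, the top two principal components are computed on a training split, and an ordinary least-squares map from those components to coordinates is evaluated on held-out individuals. All names (`simulate_cline`, `fit_coordinate_model`, `predict_coordinates`) and every simulation parameter are assumptions made for illustration.

```python
import numpy as np


def simulate_cline(n=300, m=400, seed=0):
    """Toy data (assumed): allele frequencies vary linearly over a unit square."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(n, 2))       # (x, y) per individual
    slopes = rng.uniform(-0.4, 0.4, size=(2, m))      # per-SNP gradients
    freqs = 0.5 + (coords - 0.5) @ slopes             # stays within [0.1, 0.9]
    G = rng.binomial(2, freqs).astype(float)          # genotypes coded 0/1/2
    return G, coords


def fit_coordinate_model(G_train, coords_train, k=2):
    """PCA on the training genotypes, then least squares from PCs to coordinates."""
    mean = G_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(G_train - mean, full_matrices=False)
    loadings = Vt[:k].T                               # SNPs x k
    pcs = (G_train - mean) @ loadings
    A = np.c_[np.ones(len(pcs)), pcs]                 # add an intercept column
    beta, *_ = np.linalg.lstsq(A, coords_train, rcond=None)
    return mean, loadings, beta


def predict_coordinates(model, G_new):
    """Project new individuals with the training loadings, then apply the fit."""
    mean, loadings, beta = model
    pcs = (G_new - mean) @ loadings
    return np.c_[np.ones(len(pcs)), pcs] @ beta


G, coords = simulate_cline()
model = fit_coordinate_model(G[:200], coords[:200])
pred = predict_coordinates(model, G[200:])
rmse = float(np.sqrt(np.mean((pred - coords[200:]) ** 2)))
print(f"held-out RMSE in unit-square coordinates: {rmse:.3f}")
```

    Held-out individuals are projected with the loadings learned on the training split, so the error estimate is not inflated by fitting the components on the test data.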

    Tracing Cattle Breeds with Principal Components Analysis Ancestry Informative SNPs

    The recent release of the Bovine HapMap dataset represents the most detailed survey of bovine genetic diversity to date, providing an important resource for the design and development of livestock production. We studied this dataset, comprising more than 30,000 Single Nucleotide Polymorphisms (SNPs) for 19 breeds (13 taurine, three zebu, and three hybrid breeds), seeking to identify small panels of genetic markers that can be used to trace the breed of unknown cattle samples. Taking advantage of the power of Principal Components Analysis and algorithms that we have recently described for the selection of Ancestry Informative Markers from genomewide datasets, we present a decision tree that can be used to accurately infer the origin of individual cattle. In doing so, we present a thorough examination of population genetic structure in modern bovine breeds. Performing extensive cross-validation experiments, we demonstrate that 250–500 carefully selected SNPs suffice to achieve close to 100% prediction accuracy of individual ancestry when this particular set of 19 breeds is considered. Our methods, coupled with the dense genotypic data that is becoming increasingly available, have the potential to become a valuable tool and to have considerable impact on worldwide livestock production. They can be used to inform the design of studies of the genetic basis of economically important traits in cattle, as well as breeding programs and efforts to conserve biodiversity. Furthermore, the SNPs that we have identified can provide a reliable solution for the traceability of breed-specific branded products.

    Altered glycosidase Activities at Physiological pH in the Pathogenesis of Sepsis

    Aziz P, Haslund-Gourley B, Heithoff D, et al. Altered glycosidase Activities at Physiological pH in the Pathogenesis of Sepsis. In: FASEB JOURNAL. Vol 34. Hoboken: Wiley; 2020.
    Glycosidases are hydrolytic enzymes that are primarily studied in the context of intracellular catabolic pathways within the lysosome. Reductions in circulating glycosidase activities have been linked to lysosomal storage diseases and are typically detected in blood acidified to mimic lysosomal pH. There are also instances of lysosomal storage diseases linked to increased glycosidase activities in blood circulation, wherein the mannose-6-phosphate-dependent trafficking is rendered dysfunctional. In addition, changes in circulating glycosidase activities have been associated with other syndromes including cancer, arthritis, alcohol abuse, sepsis, and colitis. We recently discovered that glycosidases present in multiple cell types and in sera and plasma are involved in the aging of secreted and cell surface glycoproteins. The exo-glycosidase activities of endogenous circulating glycosidases generate the stepwise loss of glycan linkages over time as glycoproteins age in circulation, sequentially exposing underlying glycan linkages, starting with the removal of the terminally positioned sialic acids. More is known of the neuraminidases in this first step and of the role of asialoglycoprotein lectin receptors that can bind and endocytose the previously underlying and cryptic galactose ligands. We have optimized the detection of glycosidases, including those with galactosidase, glucosaminidase, mannosidase, and fucosidase activities, using fluorimetric substrates in blood serum and plasma at physiological pH 7.4. We found all four glycosidase activities significantly above background in the blood and plasma at normal basal levels in healthy mice and humans. We also identify the source of these glycosidases using glycosidase-deficient mouse strains. We further present these measurements of normality and origin in comparison with measurements made during the onset and progression of sepsis caused by different pathogens in mice and humans. The findings to be presented include the specific activities of glycosidases in response to experimental sepsis in the mouse caused by each of five different clinically derived bacterial pathogens, and the discovery of a specific change in glycosidase activity statistically linked to a poor outcome (death) in human sepsis patients.