65 research outputs found
Fast and efficient statistical methods for detecting genetic admixture events and its applications in large-scale data cohorts
Present-day cohorts of genome-wide DNA provide a powerful means of elucidating admixture events where different human groups intermixed, providing new insights into human history and population movements. The method GLOBETROTTER (Hellenthal et al., 2014) shows increased precision over other available techniques for characterising admixture due to modelling haplotype information, i.e. associations among tightly linked Single Nucleotide Polymorphisms (SNPs). However, because of its computational demands, GLOBETROTTER can only handle relatively small sample sizes of tens to hundreds of admixed individuals. In this thesis, I present a new statistical method, fastGLOBETROTTER, that both reduces computational time and increases accuracy relative to GLOBETROTTER. In particular, fastGLOBETROTTER more efficiently models admixture linkage disequilibrium by sampling sets of genomic regions within individuals that are the most informative for admixture events. Additionally, I have developed an algorithm for allocating memory more efficiently, enabling up to a 20-fold improvement in computation time relative to GLOBETROTTER. Therefore, this technique can cope with the rapidly emerging large-scale cohorts of genetically homogeneous populations sampled from small geographic regions, e.g. within a country (China Kadoorie Biobank, UK Biobank), to provide more precise estimates of admixture dates. Via simulations, I use fastGLOBETROTTER to demonstrate the sample sizes required to characterize admixture between groups with high levels of genetic similarity, and the time depths for which these approaches can reliably detect such past intermixing. I also apply fastGLOBETROTTER to over 6000 European individuals, using over 2500 individuals as ancestry surrogates, revealing new insights into admixture across Western Europe. 
These include admixture events dated to ∼500-600 CE from sources carrying DNA related to present-day West Asian and North African populations found in individuals within France, Belgium and parts of Germany. I also report admixture from East-Asian/Siberian-like sources in individuals within Finland, Norway and Sweden at different times starting ∼1900 years ago
An efficient method to identify, date, and describe admixture events using haplotype information
We present fastGLOBETROTTER, an efficient new haplotype-based technique to identify, date, and describe admixture events using genome-wide autosomal data. With simulations, we demonstrate how fastGLOBETROTTER reduces computation time by an order of magnitude relative to the related technique GLOBETROTTER without suffering loss of accuracy. We apply fastGLOBETROTTER to a cohort of >6000 Europeans from ten countries, revealing previously unreported admixture signals. In particular we infer multiple periods of admixture related to East Asian or Siberian-like sources, starting >2000 years ago, in people living in countries north of the Baltic Sea. In contrast, we infer admixture related to West Asian, North African and/or Southern European sources in populations south of the Baltic Sea, including admixture dated to ≈300-700CE, overlapping the fall of the Roman Empire, in people from Belgium, France and parts of Germany. Our new approach scales to analyzing hundreds to thousands of individuals from a putatively admixed population and hence is applicable to emerging large-scale cohorts of genetically homogeneous populations
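As a rough illustration of the general idea behind haplotype-based dating (not fastGLOBETROTTER's actual code): admixture linkage disequilibrium is commonly modelled as decaying approximately exponentially with genetic distance, at a rate given by the number of generations since admixture. The sketch below fits a single exponential to noise-free synthetic values by log-linear least squares. The function names, the synthetic curve, the 28 years per generation and the sampling year of 2000 are all assumptions made for this example. Real curves are noisy and may include a baseline or several admixture pulses, which this sketch ignores.

```python
import math


def fit_decay_rate(distances_cm, signal):
    """Estimate generations since admixture from a decay curve.

    Assumes signal = A * exp(-g * d), with d in Morgans, and fits g by
    ordinary least squares on log(signal). All signal values must be > 0.
    """
    xs = [d / 100.0 for d in distances_cm]  # centimorgans -> Morgans
    ys = [math.log(s) for s in signal]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # slope of the log curve is -g


def generations_to_year(generations, years_per_generation=28.0,
                        sampling_year=2000.0):
    """Convert generations before sampling to a calendar year (assumed values)."""
    return sampling_year - years_per_generation * generations


# Synthetic, noise-free curve generated with 50 generations (illustrative only).
distances = [float(k) for k in range(1, 21)]
curve = [0.3 * math.exp(-50.0 * d / 100.0) for d in distances]

estimated = fit_decay_rate(distances, curve)
print(estimated, generations_to_year(estimated))
```

The log-linear fit keeps the example short. A nonlinear fit with an intercept term would be a more realistic choice for real data.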
Iterative pruning PCA improves resolution of highly structured populations
BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, the accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by the dataset complexity. 
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogenous population
WASP: a Web-based Allele-Specific PCR assay designing tool for detecting SNPs and mutations
BACKGROUND: Allele-specific (AS) Polymerase Chain Reaction is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It has been applied in many recent studies, including population genetics, molecular genetics and pharmacogenomics. Existing AS primer design tools are cumbersome for inexperienced users, since information about the SNP/mutation must be acquired from public databases prior to the design. Furthermore, most of these tools do not offer mismatch enhancement for designed primers. The available web applications do not provide a user-friendly graphical input interface or intuitive visualization of their primer results. RESULTS: This work presents a web-based AS primer design application called WASP. This tool can efficiently design AS primers for human SNPs as well as mutations. To assist scientists with collecting necessary information about target polymorphisms, this tool provides a local SNP database containing over 10 million SNPs of various populations from public domain databases, namely NCBI dbSNP, HapMap and JSNP. This database is tightly integrated with the tool so that users can perform the design for existing SNPs without leaving the site. To guarantee specificity of AS primers, the proposed system incorporates a primer specificity enhancement technique widely used in experimental protocols. In particular, WASP makes use of different destabilizing effects by introducing one deliberate 'mismatch' at the penultimate (second to last of the 3'-end) base of AS primers to improve the resulting AS primers. Furthermore, WASP offers a graphical user interface, drawn with Scalable Vector Graphics (SVG), that allows users to select SNPs and graphically visualize designed primers and their conditions. CONCLUSION: WASP offers a tool for designing AS primers for both SNPs and mutations. 
By integrating the database for known SNPs (using gene ID or rs number), this tool simplifies the awkward process of getting flanking sequences and other related information from public SNP databases. It takes into account the underlying destabilizing effect to ensure the effectiveness of designed primers. With its user-friendly SVG interface, WASP intuitively presents the designed primers, helping users to export them or to make further adjustments to the design. This software can be freely accessed at http://bioinfo.biotec.or.th/WASP
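The penultimate-base mismatch described above can be sketched as follows. This is a minimal illustration, not WASP's implementation. The function name, the default primer length and especially the substitution table are hypothetical placeholders, since the abstract does not state which replacement base WASP chooses for each destabilizing effect.

```python
# Hypothetical placeholder table: NOT the rule set used by WASP.
# It only guarantees that the substituted base differs from the original.
PLACEHOLDER_MISMATCH = {"A": "C", "C": "A", "G": "T", "T": "G"}


def allele_specific_primer(upstream, allele, length=20):
    """Build a forward AS primer whose 3' end sits on the SNP allele.

    upstream: sequence immediately 5' of the SNP, written 5'->3'.
    allele:   the single base the primer should be specific for.
    The base just before the 3' end (the penultimate base) is replaced
    with a deliberate mismatch.
    """
    upstream = upstream.upper()
    allele = allele.upper()
    if len(allele) != 1 or set(upstream + allele) - set("ACGT"):
        raise ValueError("expected a single-base allele and A/C/G/T only")
    if length < 3 or len(upstream) < length - 1:
        raise ValueError("upstream sequence too short for requested length")
    perfect = upstream[-(length - 1):] + allele  # perfect-match primer
    mismatch = PLACEHOLDER_MISMATCH[perfect[-2]]
    return perfect[:-2] + mismatch + perfect[-1]


# Made-up flanking sequence, one primer per allele of a G/T SNP.
for base in "GT":
    print(base, allele_specific_primer("GATTACAGATTACA", base, length=8))
```

Only the last base differs between the two primers. Both carry the same deliberate mismatch one position upstream, which is what is meant to make extension from the wrong allele less likely.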
Tissue dyslipidemia in salmonella-infected rats treated with amoxillin and pefloxacin
Background: This study investigated the effects of salmonella infection and its chemotherapy on lipid metabolism in tissues of rats infected orally with Salmonella typhimurium and treated intraperitoneally with pefloxacin and amoxillin.
Methods: Animals were infected with Salmonella enterica serovar Typhimurium strain TA 98. After salmonellosis was confirmed, they were divided into 7 groups of 5 animals each. While one group served as the infected control group, three groups were treated with amoxillin (7.14 mg/kg body weight, 8 hourly) and the remaining three groups with pefloxacin (5.71 mg/kg body weight, 12 hourly) for 5 and 10 days respectively. Uninfected control animals received 0.1 ml of vehicle. Rats were sacrificed 24 h after 5 and 10 days of antibiotic treatment and 5 days after discontinuation of antibiotic treatment. Their corresponding controls were also sacrificed at the same time points. Blood and tissue lipids were then evaluated.
Results: Salmonella infection resulted in dyslipidemia characterised by increased concentrations of free fatty acids (FFA) in plasma and erythrocytes, as well as enhanced cholesterogenesis, hypertriglyceridemia and phospholipidosis in plasma, low density lipoprotein-very low density lipoprotein (LDL-VLDL), erythrocytes, erythrocyte ghosts and the organs. The antibiotics reversed the dyslipidemia, but not completely. A significant correlation was observed between fecal bacterial load and plasma cholesterol (r=0.456, p<0.01), plasma triacylglycerols (r=0.485, p<0.01), plasma phospholipid (r=0.414, p<0.05), plasma free fatty acids (r=0.485, p<0.01), liver phospholipid (r=0.459, p<0.01) and brain phospholipid (r=0.343, p<0.05).
Conclusion: The findings of this study suggest that salmonella infection in rats and its therapy with pefloxacin and amoxillin perturb lipid metabolism and this perturbation is characterised by cholesterogenesis
Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure
Background: The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis. Results: A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA. Conclusions: The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. 
The application of EigenDev to the ipPCA algorithm improves the estimation of the number of subpopulations and the individual assignment accuracy, especially for very large and complex datasets. Furthermore, we have demonstrated that the structure resolved by this approach complements parametric analysis, allowing a much more comprehensive account of population structure. The new version of the ipPCA software with EigenDev incorporated can be downloaded from http://www4a.biotec.or.th/GI/tools/ippca
Genetic analysis of Thai cattle reveals a Southeast Asian indicine ancestry
Cattle commonly raised in Thailand have characteristics of Bos indicus (zebu). We do not know when or how cattle domestication in Thailand occurred, and so questions remain regarding their origins and relationships to other breeds. We obtained genome-wide SNP genotypic data of 28 bovine individuals sampled from four regions: North (Kho-Khaolampoon), Northeast (Kho-Isaan), Central (Kho-Lan) and South (Kho-Chon) Thailand. These regional varieties have distinctive traits suggestive of breed-like genetic variations. From these data, we confirmed that all four Thai varieties are Bos indicus and that they are distinct from other indicine breeds. Among these Thai cattle, a distinctive ancestry pattern is apparent, which is the purest within Kho-Chon individuals. This ancestral component is only present outside of Thailand among other indicine breeds in Southeast Asia. From this pattern, we conclude that a unique Bos indicus ancestor originated in Southeast Asia, and native Kho-Chon Thai cattle retain the signal of this ancestry with limited admixture of other bovine ancestors
Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re-sequencing study
Background: Originally believed to be a rare phenomenon, heteroplasmy - the presence of more than one mitochondrial DNA (mtDNA) variant within a cell, tissue, or individual - is emerging as an important component of eukaryotic genetic diversity. Heteroplasmies can be used as genetic markers in applications ranging from forensics to cancer diagnostics. Yet the frequency of heteroplasmic alleles may vary from generation to generation due to the bottleneck occurring during oogenesis. Therefore, to understand the alterations in allele frequencies at heteroplasmic sites, it is of critical importance to investigate the dynamics of maternal mtDNA transmission. Results: Here we sequenced, at high coverage, mtDNA from blood and buccal tissues of nine individuals from three families with a total of six maternal transmission events. Using simulations and re-sequencing of clonal DNA, we devised a set of criteria for detecting polymorphic sites in heterogeneous genetic samples that is resistant to the noise originating from massively parallel sequencing technologies. Application of these criteria to nine human mtDNA samples revealed four heteroplasmic sites. Conclusions: Our results suggest that the incidence of heteroplasmy may be lower than estimated in some other recent re-sequencing studies, and that mtDNA allelic frequencies differ significantly both between tissues of the same individual and between a mother and her offspring. We designed our study in such a way that the complete analysis described here can be repeated by anyone either at our site or directly on the Amazon Cloud. Our computational pipeline can be easily modified to accommodate other applications, such as viral re-sequencing
The Cumulative Effects of Polymorphisms in the DNA Mismatch Repair Genes and Tobacco Smoking in Oesophageal Cancer Risk
The DNA mismatch repair (MMR) enzymes repair errors in DNA that occur during normal DNA metabolism or are induced by certain cancer-contributing exposures. We assessed the association between 10 single-nucleotide polymorphisms (SNPs) in 5 MMR genes and oesophageal cancer risk in South Africans. Prior to genotyping, SNPs were selected from the HapMap database, based on their significantly different genotypic distributions between European ancestry populations and four HapMap populations of African origin. In the Mixed Ancestry group, the MSH3 rs26279 G/G versus A/A or A/G genotype was positively associated with cancer (OR = 2.71; 95% CI: 1.34–5.50). Similar associations were observed for PMS1 rs5742938 (GG versus AA or AG: OR = 1.73; 95% CI: 1.07–2.79) and MLH3 rs28756991 (AA or GA versus GG: OR = 2.07; 95% CI: 1.04–4.12). In Black individuals, however, no association between MMR polymorphisms and cancer risk was observed in individual SNP analysis. The interactions between MMR genes were evaluated using the model-based multifactor-dimensionality reduction approach, which showed a significant genetic interaction between SNPs in MSH2, MSH3 and PMS1 genes in Black and Mixed Ancestry subjects, respectively. The data also imply that pathogenesis of common polymorphisms in MMR genes is influenced by exposure to tobacco smoke. In conclusion, our findings suggest that common polymorphisms in MMR genes and/or their combined effects might be involved in the aetiology of oesophageal cancer
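For readers unfamiliar with how figures such as "OR = 2.71; 95% CI: 1.34–5.50" are read, the sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2x2 table. The abstract does not report genotype counts or say how its estimates were obtained (they may come from adjusted regression models), so the function name and the counts here are purely hypothetical and are not intended to reproduce the published values.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases with the risk genotype      b: controls with the risk genotype
    c: cases without the risk genotype   d: controls without the risk genotype
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
estimate, low, high = odds_ratio_ci(a=20, b=10, c=80, d=90)
print(f"OR = {estimate:.2f}; 95% CI: {low:.2f}-{high:.2f}")
```

An interval that excludes 1, as in the three associations reported for the Mixed Ancestry group, is what supports calling an association statistically significant at the 5% level.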
- …