Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies
BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, a set of basic but error-prone tasks must be performed with re-sequencing data.
RESULTS: In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp.
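Steps (3) and (5) above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout `(individual, position, genotype)`, the function names `build_genotype_matrix` and `expand_haplotype`, and the `??` missing-data code are all assumptions made for illustration, and real Polyphred and PHASE files would need their own parsers. For brevity the sketch keeps every called site as a column rather than filtering for sites that actually segregate.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (individual, 1-based site position, genotype).
# This stands in for parsed genotype calls; it is not Polyphred's file format.
Call = Tuple[str, int, str]


def build_genotype_matrix(calls: List[Call], missing: str = "??"):
    """Return (sorted site positions, {individual: [genotype per site]}).

    Individuals are rows and sites are columns; cells without a call
    are filled with the (assumed) missing-data code.
    """
    sites = sorted({pos for _, pos, _ in calls})
    column = {pos: i for i, pos in enumerate(sites)}
    matrix: Dict[str, List[str]] = {}
    for individual, pos, genotype in calls:
        row = matrix.setdefault(individual, [missing] * len(sites))
        row[column[pos]] = genotype
    return sites, matrix


def expand_haplotype(reference: str, sites: List[int], alleles: str) -> str:
    """Rebuild a full-length haplotype from its alleles at polymorphic sites.

    Monomorphic positions are copied from the reference sequence; each
    polymorphic position (1-based) is overwritten with the inferred allele.
    """
    if len(sites) != len(alleles):
        raise ValueError("one allele is required per polymorphic site")
    sequence = list(reference)
    for pos, allele in zip(sites, alleles):
        sequence[pos - 1] = allele
    return "".join(sequence)


calls = [("ind1", 3, "AG"), ("ind1", 7, "CC"), ("ind2", 3, "AA"), ("ind2", 7, "CT")]
sites, matrix = build_genotype_matrix(calls)
full = expand_haplotype("ACAGTCCGA", sites, "GT")
```

Here `sites` holds the column positions of the matrix, and `full` is the reference sequence with the two inferred alleles written back at positions 3 and 7, which is the form of input that whole-sequence population genetics software expects.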
CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it substantially decreased the time required for sequencing analyses and provided a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
Genomic Ancestry, Self-Rated Health and Its Association with Mortality in an Admixed Population: 10 Year Follow-Up of the Bambui-Epigen (Brazil) Cohort Study of Ageing.
BACKGROUND: Self-rated health (SRH) has strong predictive value for mortality in different contexts and cultures, but there is inconsistent evidence on ethnoracial disparities in SRH in Latin America, possibly due to the complexity surrounding ethnoracial self-classification.
MATERIALS/METHODS: We used 370,539 Single Nucleotide Polymorphisms (SNPs) to examine the association between individual genomic proportions of African, European and Native American ancestry, and ethnoracial self-classification, with baseline and 10-year SRH trajectories in 1,311 community dwelling older Brazilians. We also examined whether genomic ancestry and ethnoracial self-classification affect the predictive value of SRH for subsequent mortality.
RESULTS: European ancestry predominated among participants, followed by African and Native American (median = 84.0%, 9.6% and 5.3%, respectively); the prevalence of Non-White (Mixed and Black) was 39.8%. Persons with higher levels of African and Native American genomic ancestry, and those self-identified as Non-White, were more likely to report poor health than other groups, even after controlling for socioeconomic conditions and an array of self-reported and objective physical health measures. Increased risks for mortality associated with worse SRH trajectories were strong and remarkably similar (hazard ratio ~3) across all genomic ancestry and ethnoracial groups.
CONCLUSIONS: Our results demonstrated for the first time that higher levels of African and Native American genomic ancestry, and the inverse for European ancestry, were strongly correlated with worse SRH in a Latin American admixed population. Neither genomic ancestry nor ethnoracial self-classification modified the strong association between baseline SRH or SRH trajectory and subsequent mortality.
Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations.
While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies on admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive genomic analysis to date of any Latin American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region with respect to the Northeast, where ancestry seems restricted to Iberia. By developing an approximate Bayesian computation framework, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the observed low Native American ancestry (6-8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, the major destination of which was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, the whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture rather than local post-Columbian history is the main and complex determinant of the individual amount of deleterious genotypes.