
    Mitochondrial DNA Diversity in South America and the Genetic History of Andean Highlanders

    We analyzed mtDNA sequence variation in 590 individuals from 18 South Amerindian populations. The spatial pattern of mtDNA diversity in these populations agrees well with the model proposed on the basis of Y-chromosome data. We found evidence that genetic drift and gene flow acted differently in western and eastern populations, leading to genetic divergence in the eastern populations but not in the western ones. Although no pattern of genetic variation common to all of South America can be identified, when western and eastern populations are analyzed separately, mtDNA diversity in each region fits the isolation-by-distance model, suggesting independent evolutionary dynamics. Maximum-likelihood estimates of divergence times between Central and South Amerindian populations fall between 13,000 and 19,000 years, consistent with a Pleistocene peopling of South America. Moreover, a comparison of among-population variability in mtDNA and Y-chromosome DNA seems to indicate that South America is the only continent where levels of differentiation are similar for maternal and paternal lineages.
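Isolation by distance is commonly assessed with a Mantel test, which correlates pairwise genetic distances with pairwise geographic distances and derives a p-value by permuting population labels. The abstract does not say which statistic or software the authors used, so the following Python sketch is only an illustration under stated assumptions: both distances are supplied as square symmetric matrices (the choice of genetic distance, for example F_ST/(1 - F_ST), is left to the caller), the statistic is a Pearson correlation over the upper triangle, and the test is one-sided for a positive association.

```python
import random
from math import sqrt

def upper_triangle(m):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test (assumed setup, see lead-in).

    Returns (r, p): the observed correlation and the permutation p-value.
    """
    rng = random.Random(seed)
    n = len(genetic)
    geo = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo)
    hits = 1  # the observed arrangement counts as one permutation
    idx = list(range(n))
    for _ in range(permutations):
        rng.shuffle(idx)
        # Permute rows and columns of the genetic matrix together.
        permuted = [[genetic[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

For six equally spaced populations whose genetic distances are exactly proportional to geographic distance, `mantel` returns r = 1 (up to floating-point rounding) and a small permutation p-value.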

    A graph-based approach for designing extensible pipelines

    Background: In bioinformatics, it is important to build extensible, low-maintenance systems that can deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines hardcodes the execution steps into programs or scripts. This approach causes problems as a pipeline grows, because incorporating new tools is often error-prone and time-consuming. Current approaches to pipeline development, such as workflow management systems, focus on analysis tasks that are repeated systematically without significant changes in their course of execution, such as genome annotation. However, pipeline composition must be more dynamic when each execution requires a different combination of steps. Results: We propose a graph-based approach to implementing extensible, low-maintenance pipelines, suited to applications with multiple functionalities that require a different combination of steps in each execution. Pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components as a directed graph in which components are the edges, their inputs and outputs are the nodes, and paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of the approach by implementing a format conversion pipeline for population genetics and genetic epidemiology, but the approach is also helpful in other fields where multiple software tools are needed for comprehensive analyses, such as gene expression and proteomics.
The project code, documentation and Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. Conclusions: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible, low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited to applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of conversions available to the user and the extensibility of the system, allowing future updates with new file formats.
    Document type: Article
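The central idea, with tools as directed edges between data formats and pipelines as paths, can be illustrated with a breadth-first search that returns the shortest chain of converters from an input format to a requested output format. This is a minimal Python sketch, not the authors' Java implementation or their pipeline system algorithm; the converter names (`ped2vcf`, `fasta2vcf`, `vcf2phase`) are hypothetical stand-ins for real tools.

```python
from collections import deque

# Hypothetical converters: (tool name, input format, output format).
# Each tool is a directed edge; each data format is a node.
TOOLS = [
    ("ped2vcf", "PED", "VCF"),
    ("fasta2vcf", "FASTA", "VCF"),
    ("vcf2phase", "VCF", "PHASE"),
]

def compose_pipeline(tools, source, target):
    """Return the shortest list of tool names that turns `source` into `target`,
    or None if the format graph has no such path."""
    graph = {}
    for name, src, dst in tools:
        graph.setdefault(src, []).append((name, dst))

    # Breadth-first search; `previous` maps a reached format to (tool, prior format).
    previous = {source: None}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            # Walk back from the target to recover the chain of tools.
            steps = []
            while previous[fmt] is not None:
                tool, fmt = previous[fmt]
                steps.append(tool)
            return steps[::-1]
        for tool, nxt in graph.get(fmt, []):
            if nxt not in previous:
                previous[nxt] = (tool, fmt)
                queue.append(nxt)
    return None

print(compose_pipeline(TOOLS, "PED", "PHASE"))  # ['ped2vcf', 'vcf2phase']
```

Registering a new converter only means appending one edge to `TOOLS`; every pipeline that passes through the new format then becomes available without rewriting existing steps, which mirrors the extensibility the abstract describes.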

    Genomic African and Native American Ancestry and Chagas Disease: The Bambui (Brazil) Epigen Cohort Study of Aging.

    Background: The influence of genetic ancestry on Trypanosoma cruzi infection and Chagas disease outcomes is unknown. Methodology/principal findings: We used 370,539 single nucleotide polymorphisms (SNPs) to examine the association between individual proportions of African, European and Native American genomic ancestry and T. cruzi infection and related outcomes in 1,341 participants (aged ≥ 60 years) of the Bambui (Brazil) population-based cohort study of aging. Potential confounding variables included sociodemographic characteristics and an array of health measures. The prevalence of T. cruzi infection was 37.5%, and 56.3% of those infected had a major ECG abnormality. Baseline T. cruzi infection was correlated with higher levels of African and Native American ancestry, which in turn were strongly associated with poor socioeconomic circumstances. Cardiomyopathy in infected persons was not significantly associated with African or Native American ancestry levels. Infected persons with a major ECG abnormality were at increased risk of 15-year mortality relative to their counterparts with no such abnormalities (adjusted hazard ratio = 1.80; 95% CI: 1.41, 2.32). African and Native American ancestry levels did not significantly modify this association. Conclusions/significance: Our findings indicate that African and Native American ancestry have no influence on the presence of major ECG abnormalities or on the ability of an ECG abnormality to predict mortality in older people infected with T. cruzi. In contrast, our results revealed a strong and independent association between prevalent T. cruzi infection and higher levels of African and Native American ancestry. Whether this association is a consequence of genetic background or of differential exposure to infection remains to be determined.

    Diversity in the Glucose Transporter-4 Gene (SLC2A4) in Humans Reflects the Action of Natural Selection along the Old-World Primates Evolution

    BACKGROUND: Glucose is an important source of energy for living organisms. In vertebrates it is ingested with the diet and transported into cells by conserved mechanisms and molecules, such as the trans-membrane Glucose Transporters (GLUTs). Members of this family have tissue-specific expression, biochemical properties and physiological functions that together regulate glucose levels and distribution. GLUT4, coded by SLC2A4 (17p13), is an insulin-sensitive transporter with a critical role in glucose homeostasis and diabetes pathogenesis, preferentially expressed in adipose tissue, heart muscle and skeletal muscle. We tested the hypothesis that natural selection acted on SLC2A4. METHODOLOGY/PRINCIPAL FINDINGS: We re-sequenced SLC2A4 and genotyped 104 SNPs along an approximately 1 Mb region flanking this gene in 102 ethnically diverse individuals. Across the studied populations (African, European, Asian and Latin American), all eight common SNPs are concentrated in the N-terminal region upstream of exon 7 (~3,700 bp), while the C-terminal region downstream of intron 6 (~2,600 bp) harbors only 6 singletons, a pattern that is not compatible with neutrality for this part of the gene. Tests of neutrality based on comparative genomics suggest that: (1) episodes of natural selection (likely a selective sweep) predating the coalescence of human lineages, within the last 25 million years, account for the reduced diversity observed downstream of intron 6; and (2) the target of natural selection may not be in the SLC2A4 coding sequence. CONCLUSIONS: We propose that the contrast in the pattern of genetic variation between the N-terminal and C-terminal regions is a signature of the action of natural selection, and thus follow-up studies should investigate the functional importance of different regions of the SLC2A4 gene.

    Limited diversity of Anopheles darlingi in the Peruvian Amazon region of Iquitos.

    Anopheles darlingi is the most important malaria vector in the Amazon basin of South America and is capable of transmitting both Plasmodium falciparum and P. vivax. To understand the genetic structure of this vector in the Amazonian region of Peru, we used a simple polymerase chain reaction (PCR)-based test to identify this mosquito species. Random amplified polymorphic DNA PCR (RAPD-PCR) was used to study genetic variation at the micro-geographic level in nine geographically separate populations of An. darlingi collected in areas with different degrees of deforestation surrounding the city of Iquitos. Within-population genetic diversity in the nine populations, quantified as expected heterozygosity (H_E), ranged from 0.27 to 0.32. Average genetic distance (F_ST) among these populations was 0.017. These results show that the nine studied populations are highly homogeneous, suggesting that strategies can be developed to combat this malaria vector as a single epidemiologic unit.
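For reference, Nei's expected heterozygosity for a locus is H_E = 1 - Σ p_i², where p_i are the allele frequencies, and a Nei-style differentiation index is F_ST = (H_T - H_S) / H_T, where H_S is the mean within-population H_E and H_T is H_E computed from the pooled allele frequencies. The abstract does not state the exact estimators used. RAPD bands are dominant markers, so allele frequencies would first have to be estimated from band presence and absence; this Python sketch assumes that step has already been done and handles a single locus with equal population weights.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity for one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def fst(pop_freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for one locus, weighting populations equally.

    pop_freqs: one allele-frequency list per population, alleles in the same order.
    """
    k = len(pop_freqs)
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    # H_T: expected heterozygosity of the pooled (averaged) allele frequencies.
    pooled = [sum(f[a] for f in pop_freqs) / k for a in range(len(pop_freqs[0]))]
    h_t = expected_heterozygosity(pooled)
    # A locus that is monomorphic overall carries no differentiation signal.
    return (h_t - h_s) / h_t if h_t > 0 else 0.0

print(round(fst([[0.6, 0.4], [0.4, 0.6]]), 4))  # 0.04
```

With two populations at allele frequencies (0.6, 0.4) and (0.4, 0.6), H_S = 0.48 and H_T = 0.5, giving F_ST = 0.04. If the reported value of 0.017 was computed in this way, it would mean that roughly 1.7% of total gene diversity lies between populations.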

    Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

    BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows unbiased screening for variation in a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screens for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies, because automatic sequencers based on capillary electrophoresis are widely available and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between them lies a set of basic but error-prone tasks that must be performed on re-sequencing data. RESULTS: To assist with these intermediate tasks, we developed a pipeline that facilitates the data handling typical of re-sequencing studies.
Our pipeline: (1) consolidates the outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inference with the popular software PHASE; and (5) processes PHASE output files, which contain only polymorphic sites, to reconstruct the inferred haplotypes with both polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it substantially reduced the time required for sequencing analyses and made the process more controlled, eliminating several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
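Step (5) is error-prone by hand because PHASE reports alleles only at segregating sites. Its core can be sketched in Python as writing each inferred allele back into a full-length reference sequence. This is not the published pipeline's code: the function name `rebuild_haplotype` is hypothetical, parsing of the PHASE output file is omitted, positions are assumed to be 0-based, and every monomorphic site is assumed to carry the reference base shared by all sampled sequences.

```python
def rebuild_haplotype(reference, snp_positions, alleles):
    """Return a full-length haplotype: the reference sequence with the phased
    alleles written in at the segregating sites.

    reference:     sequence covering all sites, polymorphic and monomorphic
    snp_positions: 0-based indices of the segregating sites, in PHASE column order
    alleles:       one inferred haplotype, one base per segregating site
    """
    if len(snp_positions) != len(alleles):
        raise ValueError("expected exactly one allele per segregating site")
    seq = list(reference)
    for pos, base in zip(snp_positions, alleles):
        seq[pos] = base  # monomorphic sites keep the reference base
    return "".join(seq)

print(rebuild_haplotype("ACGTACGT", [1, 5], "TA"))  # ATGTAAGT
```

Calling it once per inferred haplotype yields complete sequences, monomorphic sites included, in the form the abstract says DNAsp requires.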

    Biogeographical ancestry is associated with socioenvironmental conditions and infections in a Latin American urban population.

    Racial inequalities are observed for different diseases and are mainly caused by differences in socioeconomic status between ethnoracial groups. Genetic factors have also been implicated, and several recent studies have investigated the association between biogeographical ancestry (BGA) and complex diseases. However, the role of BGA as a proxy for non-genetic health determinants has been little investigated. Similarly, studies comparing the association of BGA and self-reported skin colour with these determinants are scarce. Here, we report the association of BGA and self-reported skin colour with socioenvironmental conditions and infections. We studied 1246 children living in a poor urban area in Brazil. BGA was estimated using 370,539 genome-wide autosomal markers. Standardised questionnaires were administered to the children's guardians to evaluate socioenvironmental conditions. Infection (or pathogen exposure) was defined by positive serologic test results for IgG to seven pathogens (Toxocara spp., Toxoplasma gondii, Helicobacter pylori, and hepatitis A, herpes simplex, herpes zoster and Epstein-Barr viruses) and by the presence of intestinal helminth eggs in stool samples (Ascaris lumbricoides and Trichuris trichiura). African ancestry was negatively associated with maternal education and household income and positively associated with infections and with variables indicating poorer housing and living conditions. Self-reported skin colour was associated only with infections. In stratified analyses, the proportion of African ancestry was associated with most of the outcomes investigated, particularly among admixed individuals. In conclusion, BGA was associated with socioenvironmental conditions and infections even in a low-income and highly admixed population, capturing differences that self-reported skin colour misses.
Importantly, our findings call for caution when interpreting significant associations between BGA and diseases as evidence of the genetic factors involved.

    Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil

    Although whole-genome sequencing (WGS) is becoming the gold-standard tool for population genomics and medical applications, data on diverse non-European and admixed individuals remain scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables the identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human reference genome, and over 140 HLA alleles absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculate the incidence of selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared with available datasets, since rare variation represents the largest proportion of the variants contributed by WGS. These results demonstrate that even relatively small samples from underrepresented populations provide relevant data for genomic studies, especially for analyses that only WGS allows.