46 research outputs found

    A graph-based approach for designing extensible pipelines

    BACKGROUND: In bioinformatics, it is important to build extensible and low-maintenance systems that can deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism in pipeline composition is necessary when each execution requires a different combination of steps. RESULTS: We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software tools is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. CONCLUSIONS: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats. Document type: Article
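The graph representation described in this abstract (data formats as nodes, tools as edges, pipelines as paths) can be illustrated with a minimal sketch. This is not the authors' Java implementation or their pipeline algorithm: the format names, the tool names in `CONVERTERS`, and the use of breadth-first search to pick the shortest chain are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical converters: each tuple is (input format, output format, tool name).
# Formats are graph nodes; each tool is a directed edge between two formats.
CONVERTERS = [
    ("ped", "vcf", "ped2vcf"),
    ("vcf", "phase", "vcf2phase"),
    ("ped", "arlequin", "ped2arlequin"),
    ("phase", "dnasp", "phase2dnasp"),
]


def build_graph(converters):
    """Map each input format to its outgoing (output format, tool) edges."""
    graph = {}
    for source, target, tool in converters:
        graph.setdefault(source, []).append((target, tool))
    return graph


def compose_pipeline(graph, source, target):
    """Breadth-first search for the shortest chain of tools from source to target.

    Returns the list of tool names to run in order, or None if no path exists.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, tools = queue.popleft()
        if fmt == target:
            return tools
        for next_fmt, tool in graph.get(fmt, []):
            if next_fmt not in visited:
                visited.add(next_fmt)
                queue.append((next_fmt, tools + [tool]))
    return None


graph = build_graph(CONVERTERS)
pipeline = compose_pipeline(graph, "ped", "dnasp")
print(pipeline)
```

Under these assumptions, adding a new tool only means appending one tuple to `CONVERTERS`; every conversion that becomes reachable through it is then available without writing a new hardcoded sequence.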

    Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

    BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data. RESULTS: In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it allowed a substantial decrease in the time required for sequencing analyses, as well as being a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
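Step (5) of this pipeline, rebuilding full-length haplotypes from output that lists only polymorphic sites, can be sketched as follows. The sketch is illustrative rather than the authors' code: it assumes that every monomorphic site carries the reference base, that positions are 0-based, and that the sequences and sample names shown are hypothetical. Real PHASE output files are not parsed here.

```python
def expand_haplotype(reference, positions, alleles):
    """Write inferred alleles into a copy of the reference sequence.

    positions are 0-based indices of the polymorphic sites; all other
    (monomorphic) sites keep the reference base.
    """
    if len(positions) != len(alleles):
        raise ValueError("one allele is required per polymorphic position")
    sequence = list(reference)
    for pos, allele in zip(positions, alleles):
        sequence[pos] = allele
    return "".join(sequence)


# Hypothetical data: a 10 bp reference with two segregating sites.
reference = "ACGTACGTAC"
positions = [2, 7]
phased = {"ind1_a": "GT", "ind1_b": "AT", "ind2_a": "GC"}

full = {name: expand_haplotype(reference, positions, hap)
        for name, hap in phased.items()}
for name, sequence in full.items():
    print(name, sequence)
```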

    Population Genetics of GYPB and Association Study between GYPB*S/s Polymorphism and Susceptibility to P. falciparum Infection in the Brazilian Amazon

    Merozoites of Plasmodium falciparum invade through several pathways using different RBC receptors. Field isolates appear to use a greater variability of these receptors than laboratory isolates. Brazilian field isolates were shown to mostly utilize glycophorin A-independent invasion pathways via glycophorin B (GPB) and/or other receptors. The Brazilian population exhibits extensive polymorphism in blood group antigens; however, no studies have related the prevalence of the antigens that function as receptors for P. falciparum to the ability of the parasite to invade. Our study aimed to establish whether variation in the GYPB*S/s alleles influences susceptibility to infection with P. falciparum in the admixed population of Brazil. Two groups of Brazilian Amazonians from Porto Velho were studied: P. falciparum infected individuals (cases); and uninfected individuals who were born and/or have lived in the same endemic region for over ten years, were exposed to infection but have not had malaria over the study period (controls). The GPB Ss phenotype and GYPB*S/s alleles were determined by standard methods. Sixty-two Ancestry Informative Markers were genotyped on each individual to estimate admixture and control its potential effect on the association between frequency of GYPB*S and malaria infection. GYPB*S is associated with host susceptibility to infection with P. falciparum; GYPB*S/GYPB*S and GYPB*S/GYPB*s were significantly more prevalent in the P. falciparum infected individuals than in the controls (69.87% vs. 49.75%; P<0.02). Moreover, population genetics tests applied on the GYPB exon sequencing data suggest that natural selection shaped the observed pattern of nucleotide diversity. Epidemiological and evolutionary approaches suggest an important role for the GPB receptor in RBC invasion by P. falciparum in Brazilian Amazonians. Moreover, an increased susceptibility to infection by this parasite is associated with the GPB S+ variant in this population.

    XAF1 as a modifier of p53 function and cancer susceptibility

    Cancer risk is highly variable in carriers of the common TP53-R337H founder allele, possibly due to the influence of modifier genes. Whole-genome sequencing identified a variant in the tumor suppressor XAF1 (E134*/Glu134Ter/rs146752602) in a subset of R337H carriers. Haplotype-defining variants were verified in 203 patients with cancer, 582 relatives, and 42,438 newborns. The compound mutant haplotype was enriched in patients with cancer, conferring risk for sarcoma (P = 0.003) and subsequent malignancies (P = 0.006). Functional analyses demonstrated that wild-type XAF1 enhances transactivation of wild-type and hypomorphic TP53 variants, whereas XAF1-E134* is markedly attenuated in this activity. We propose that cosegregation of XAF1-E134* and TP53-R337H mutations leads to a more aggressive cancer phenotype than TP53-R337H alone, with implications for genetic counseling and clinical management of hypomorphic TP53 mutant carriers.
    Author affiliations: Pinto, Emilia M. (St. Jude Children's Research Hospital, United States); Figueiredo, Bonald C. (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Chen, Wenan (St. Jude Children's Research Hospital, United States); Galvao, Henrique C.R. (Hospital de Câncer de Barretos, Brazil); Formiga, Maria Nirvana (A.c.camargo Cancer Center, Brazil); Fragoso, Maria Candida B.V. (Universidade de Sao Paulo, Brazil); Ashton Prolla, Patricia (Universidade Federal do Rio Grande do Sul, Brazil); Ribeiro, Enilze M.S.F. (Universidade Federal do Paraná, Brazil); Felix, Gabriela (Universidade Federal da Bahia, Brazil); Costa, Tatiana E.B. (Hospital Infantil Joana de Gusmao, Brazil); Savage, Sharon A. (National Cancer Institute, United States); Yeager, Meredith (National Cancer Institute, United States); Palmero, Edenir I. (Hospital de Câncer de Barretos, Brazil); Volc, Sahlua (Hospital de Câncer de Barretos, Brazil); Salvador, Hector (Hospital Sant Joan de Deu Barcelona, Spain); Fuster Soler, Jose Luis (Hospital Clínico Universitario Virgen de la Arrixaca, Spain); Lavarino, Cinzia (Hospital Sant Joan de Deu Barcelona, Spain); Chantada, Guillermo Luis (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; St. Jude Children's Research Hospital, United States); Vaur, Dominique (Comprehensive Cancer Center François Baclesse, France); Odone Filho, Vicente (Universidade de Sao Paulo, Brazil); Brugières, Laurence (Institut de Cancerologie Gustave Roussy, France); Else, Tobias (University of Michigan, United States); Stoffel, Elena M. (University of Michigan, United States); Maxwell, Kara N. (University of Pennsylvania, United States); Achatz, Maria Isabel (Hospital Sirio-libanês, Brazil); Kowalski, Luis (A.c.camargo Cancer Center, Brazil); De Andrade, Kelvin C. (National Cancer Institute, United States); Pappo, Alberto (St. Jude Children's Research Hospital, United States); Letouze, Eric (Centre de Recherche Des Cordeliers, France); Latronico, Ana Claudia (Universidade de Sao Paulo, Brazil); Mendonca, Berenice B. (Universidade de Sao Paulo, Brazil); Almeida, Madson Q. (Universidade de Sao Paulo, Brazil); Brondani, Vania B. (Universidade de Sao Paulo, Brazil); Bittar, Camila M. (Universidade Federal do Rio Grande do Sul, Brazil); Soares, Emerson W.S. (Hospital Do Câncer de Cascavel, Brazil); Mathias, Carolina (Universidade Federal do Paraná, Brazil); Ramos, Cintia R.N. (Hospital de Câncer de Barretos, Brazil); Machado, Moara (National Cancer Institute, United States); Zhou, Weiyin (National Cancer Institute, United States); Jones, Kristine (National Cancer Institute, United States); Vogt, Aurelie (National Cancer Institute, United States); Klincha, Payal P. (National Cancer Institute, United States); Santiago, Karina M. (A.c.camargo Cancer Center, Brazil); Komechen, Heloisa (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Paraizo, Mariana M. (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Parise, Ivy Z.S. (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Hamilton, Kayla V. (St. Jude Children's Research Hospital, United States); Wang, Jinling (St. Jude Children's Research Hospital, United States); Rampersaud, Evadnie (St. Jude Children's Research Hospital, United States); Clay, Michael R. (St. Jude Children's Research Hospital, United States); Murphy, Andrew J. (St. Jude Children's Research Hospital, United States); Lalli, Enzo (Institut de Pharmacologie Moléculaire et Cellulaire, France); Nichols, Kim E. (St. Jude Children's Research Hospital, United States); Ribeiro, Raul C. (St. Jude Children's Research Hospital, United States); Rodriguez-Galindo, Carlos (St. Jude Children's Research Hospital, United States); Korbonits, Marta (Queen Mary University of London, United Kingdom); Zhang, Jinghui (St. Jude Children's Research Hospital, United States); Thomas, Mark G. (University College London, United Kingdom); Connelly, Jon P. (St. Jude Children's Research Hospital, United States); Pruett-Miller, Shondra (St. Jude Children's Research Hospital, United States); Diekmann, Yoan (University College London, United Kingdom); Neale, Geoffrey (St. Jude Children's Research Hospital, United States); Wu, Gang (St. Jude Children's Research Hospital, United States); Zambetti, Gerard P. (St. Jude Children's Research Hospital, United States)

    A saturated map of common genetic variants associated with human height

    Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40–50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes [1]. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel [2]) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10–20% (14–24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries. Published version; peer reviewed.
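The coverage figure quoted in this abstract can be checked with simple arithmetic. The sketch below assumes a total genome length of roughly 3,100 Mb, which is not stated in the abstract; the segment count and mean segment size are taken from the text.

```python
# Values from the abstract.
segments = 7209          # non-overlapping genomic segments
mean_segment_kb = 90     # approximate mean segment size in kb

# Assumption: total human genome length of about 3,100 Mb (not given in the abstract).
genome_length_mb = 3100

covered_mb = segments * mean_segment_kb / 1000
fraction = covered_mb / genome_length_mb
print(f"{covered_mb:.0f} Mb covered, about {fraction:.0%} of the genome")
```

With that assumed genome length, the segments span roughly 650 Mb, which comes out near the 21% reported.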

    Genome-wide homozygosity and risk of four non-Hodgkin lymphoma subtypes

    AIM: Recessive genetic variation is thought to play a role in non-Hodgkin lymphoma (NHL) etiology. Runs of homozygosity (ROH), defined based on long, continuous segments of homozygous SNPs, can be used to estimate both measured and unmeasured recessive genetic variation. We sought to examine genome-wide homozygosity and NHL risk. METHODS: We used data from eight genome-wide association studies of four common NHL subtypes: 3061 chronic lymphocytic leukemia (CLL), 3814 diffuse large B-cell lymphoma (DLBCL), 2784 follicular lymphoma (FL), and 808 marginal zone lymphoma (MZL) cases, as well as 9374 controls. We examined the effect of homozygous variation on risk by: (1) estimating the fraction of the autosome containing runs of homozygosity (FROH); (2) calculating an inbreeding coefficient derived from the correlation among uniting gametes (F3); and (3) examining specific autosomal regions containing ROH. For each, we calculated beta coefficients and standard errors using logistic regression and combined estimates across studies using random-effects meta-analysis. RESULTS: We discovered positive associations between FROH and CLL (β = 21.1, SE = 4.41, P = 1.6 × 10⁻⁶) and FL (β = 11.4, SE = 5.82, P = 0.02) but not DLBCL (P = 1.0) or MZL (P = 0.91). For F3, we observed an association with CLL (β = 27.5, SE = 6.51, P = 2.4 × 10⁻⁵). We did not find evidence of associations with specific ROH, suggesting that the associations observed with FROH and F3 for CLL and FL risk were not driven by a single region of homozygosity. CONCLUSION: Our findings support the role of recessive genetic variation in the etiology of CLL and FL; additional research is needed to identify the specific loci associated with NHL risk.
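The FROH measure defined in the methods (the fraction of the autosome contained in runs of homozygosity) reduces to a ratio of lengths. The following is a minimal sketch, not the study's software: it assumes ROH segments have already been called, are non-overlapping, and are given as (start, end) base-pair intervals. The segment coordinates and the autosome length used in the example are hypothetical.

```python
def froh(roh_segments, autosome_length_bp):
    """Fraction of the autosome covered by runs of homozygosity.

    roh_segments: iterable of (start, end) base-pair intervals, assumed
    non-overlapping with start < end.
    """
    total = 0
    for start, end in roh_segments:
        if end <= start:
            raise ValueError("each segment must satisfy start < end")
        total += end - start
    return total / autosome_length_bp


# Hypothetical individual with two ROH segments (2.5 Mb and 2.0 Mb).
segments = [(1_000_000, 3_500_000), (10_000_000, 12_000_000)]
autosome_length_bp = 2_880_000_000  # assumed autosome length for illustration

value = froh(segments, autosome_length_bp)
print(f"FROH = {value:.6f}")
```

In an analysis like the one described, a per-individual value of this kind would be the predictor in the logistic regression; that modelling step is not shown here.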

    Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia

    Chronic lymphocytic leukemia (CLL) is a common lymphoid malignancy with strong heritability. To further understand the genetic susceptibility for CLL and identify common loci associated with risk, we conducted a meta-analysis of four genome-wide association studies (GWAS) composed of 3,100 cases and 7,667 controls with follow-up replication in 1,958 cases and 5,530 controls. Here we report three new loci at 3p24.1 (rs9880772, EOMES, P=2.55 × 10⁻¹¹), 6p25.2 (rs73718779, SERPINB6, P=1.97 × 10⁻⁸) and 3q28 (rs9815073, LPP, P=3.62 × 10⁻⁸), as well as a new independent SNP at the known 2q13 locus (rs9308731, BCL2L11, P=1.00 × 10⁻¹¹) in the combined analysis. We find suggestive evidence (P<5 × 10⁻⁷) for two additional new loci at 4q24 (rs10028805, BANK1, P=7.19 × 10⁻⁸) and 3p22.2 (rs1274963, CSRNP1, P=2.12 × 10⁻⁷). Pathway analyses of new and known CLL loci consistently show a strong role for apoptosis, providing further evidence for the importance of this biological pathway in CLL susceptibility.
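The split between reported and suggestive loci in this abstract can be reproduced by comparing each P value against two thresholds. In the sketch below, the suggestive cutoff (5 × 10⁻⁷) is the one quoted in the abstract, while the genome-wide cutoff (5 × 10⁻⁸) is the conventional GWAS threshold and is an assumption here, since the abstract does not state it. The P values are copied from the text.

```python
GENOME_WIDE = 5e-8  # assumption: conventional GWAS threshold, not stated in the abstract
SUGGESTIVE = 5e-7   # threshold quoted in the abstract

# Gene label -> combined-analysis P value, as reported in the abstract.
loci = {
    "EOMES": 2.55e-11,
    "SERPINB6": 1.97e-8,
    "LPP": 3.62e-8,
    "BCL2L11": 1.00e-11,
    "BANK1": 7.19e-8,
    "CSRNP1": 2.12e-7,
}


def classify(p_value):
    """Label a P value by the strictest threshold it passes."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


labels = {gene: classify(p) for gene, p in loci.items()}
for gene, label in labels.items():
    print(f"{gene}: {label}")
```

Under these thresholds, EOMES, SERPINB6, LPP and BCL2L11 fall in the first group and BANK1 and CSRNP1 in the suggestive group, matching how the abstract presents them.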