39 research outputs found

    A graph-based approach for designing extensible pipelines

    Background: In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism on the pipeline composition is necessary when each execution requires a different combination of steps. Results: We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. Conclusions: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats. Document type: Article
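The graph representation described in this abstract (data formats as nodes, tools as directed edges, pipelines as paths) can be illustrated with a minimal sketch. This is not the project's Java implementation: the converter names, the format labels and the choice of breadth-first search are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical converters: (tool name, input format, output format).
# These names are illustrative only and are not taken from the project.
CONVERTERS = [
    ("ped2fasta", "PED", "FASTA"),
    ("fasta2nexus", "FASTA", "NEXUS"),
    ("ped2phase", "PED", "PHASE"),
    ("nexus2arlequin", "NEXUS", "ARLEQUIN"),
]


def build_graph(converters):
    """Map each format (node) to its outgoing tools (edges)."""
    graph = {}
    for tool, src, dst in converters:
        graph.setdefault(src, []).append((tool, dst))
        graph.setdefault(dst, [])  # make sure sink formats are nodes too
    return graph


def find_pipeline(graph, source, target):
    """Return the shortest list of tools leading from source to target.

    Uses breadth-first search (an assumption; the abstract does not name
    the search strategy). Returns None when no path exists.
    """
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, tools = queue.popleft()
        if fmt == target:
            return tools
        for tool, next_fmt in graph[fmt]:
            if next_fmt not in visited:
                visited.add(next_fmt)
                queue.append((next_fmt, tools + [tool]))
    return None


if __name__ == "__main__":
    graph = build_graph(CONVERTERS)
    print(find_pipeline(graph, "PED", "ARLEQUIN"))
```

Under this sketch, registering a new tool only means adding one tuple to the converter list; any new conversions it enables are then found by the search without further changes to the code.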

    Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

    BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data. RESULTS: In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it substantially decreased the time required for sequencing analyses and provided a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
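Step (5) above, rebuilding full-length haplotypes from output that lists only polymorphic sites, can be sketched as follows. The abstract does not describe how the pipeline does this; the sketch assumes that monomorphic sites match a shared reference sequence and that site positions are 1-based, and the function name is hypothetical.

```python
def reconstruct_haplotype(reference, positions, alleles):
    """Rebuild a full-length haplotype from its polymorphic-site alleles.

    reference: reference sequence covering the whole region (assumed to
               carry the allele shared by all samples at monomorphic sites)
    positions: 1-based positions of the segregating sites (assumption)
    alleles:   inferred alleles at those positions, in the same order
    """
    if len(positions) != len(alleles):
        raise ValueError("positions and alleles must have the same length")
    sequence = list(reference)
    for pos, allele in zip(positions, alleles):
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} lies outside the reference")
        sequence[pos - 1] = allele  # overwrite only the segregating site
    return "".join(sequence)


if __name__ == "__main__":
    # Two segregating sites (positions 2 and 5) in an 8-bp toy region.
    print(reconstruct_haplotype("ACGTACGT", [2, 5], "TG"))
```

Real data would also need handling for indels and missing calls, which this toy version leaves out.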

    XAF1 as a modifier of p53 function and cancer susceptibility

    Cancer risk is highly variable in carriers of the common TP53-R337H founder allele, possibly due to the influence of modifier genes. Whole-genome sequencing identified a variant in the tumor suppressor XAF1 (E134*/Glu134Ter/rs146752602) in a subset of R337H carriers. Haplotype-defining variants were verified in 203 patients with cancer, 582 relatives, and 42,438 newborns. The compound mutant haplotype was enriched in patients with cancer, conferring risk for sarcoma (P = 0.003) and subsequent malignancies (P = 0.006). Functional analyses demonstrated that wild-type XAF1 enhances transactivation of wild-type and hypomorphic TP53 variants, whereas XAF1-E134* is markedly attenuated in this activity. We propose that cosegregation of XAF1-E134* and TP53-R337H mutations leads to a more aggressive cancer phenotype than TP53-R337H alone, with implications for genetic counseling and clinical management of hypomorphic TP53 mutant carriers.
    Author affiliations: Pinto, Emilia M. (St. Jude Children's Research Hospital, United States); Figueiredo, Bonald C. (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Chen, Wenan (St. Jude Children's Research Hospital, United States); Galvao, Henrique C.R. (Hospital de Câncer de Barretos, Brazil); Formiga, Maria Nirvana (A.C.Camargo Cancer Center, Brazil); Fragoso, Maria Candida B.V. (Universidade de Sao Paulo, Brazil); Ashton Prolla, Patricia (Universidade Federal do Rio Grande do Sul, Brazil); Ribeiro, Enilze M.S.F. (Universidade Federal do Paraná, Brazil); Felix, Gabriela (Universidade Federal da Bahia, Brazil); Costa, Tatiana E.B. (Hospital Infantil Joana de Gusmao, Brazil); Savage, Sharon A. (National Cancer Institute, United States); Yeager, Meredith (National Cancer Institute, United States); Palmero, Edenir I. (Hospital de Câncer de Barretos, Brazil); Volc, Sahlua (Hospital de Câncer de Barretos, Brazil); Salvador, Hector (Hospital Sant Joan de Deu Barcelona, Spain); Fuster Soler, Jose Luis (Hospital Clínico Universitario Virgen de la Arrixaca, Spain); Lavarino, Cinzia (Hospital Sant Joan de Deu Barcelona, Spain); Chantada, Guillermo Luis (Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina; St. Jude Children's Research Hospital, United States); Vaur, Dominique (Comprehensive Cancer Center François Baclesse, France); Odone Filho, Vicente (Universidade de Sao Paulo, Brazil); Brugières, Laurence (Institut de Cancerologie Gustave Roussy, France); Else, Tobias (University of Michigan, United States); Stoffel, Elena M. (University of Michigan, United States); Maxwell, Kara N. (University of Pennsylvania, United States); Achatz, Maria Isabel (Hospital Sirio-libanês, Brazil); Kowalski, Luis (A.C.Camargo Cancer Center, Brazil); De Andrade, Kelvin C. (National Cancer Institute, United States); Pappo, Alberto (St. Jude Children's Research Hospital, United States); Letouze, Eric (Centre de Recherche Des Cordeliers, France); Latronico, Ana Claudia (Universidade de Sao Paulo, Brazil); Mendonca, Berenice B. (Universidade de Sao Paulo, Brazil); Almeida, Madson Q. (Universidade de Sao Paulo, Brazil); Brondani, Vania B. (Universidade de Sao Paulo, Brazil); Bittar, Camila M. (Universidade Federal do Rio Grande do Sul, Brazil); Soares, Emerson W.S. (Hospital Do Câncer de Cascavel, Brazil); Mathias, Carolina (Universidade Federal do Paraná, Brazil); Ramos, Cintia R.N. (Hospital de Câncer de Barretos, Brazil); Machado, Moara (National Cancer Institute, United States); Zhou, Weiyin (National Cancer Institute, United States); Jones, Kristine (National Cancer Institute, United States); Vogt, Aurelie (National Cancer Institute, United States); Klincha, Payal P. (National Cancer Institute, United States); Santiago, Karina M. (A.C.Camargo Cancer Center, Brazil); Komechen, Heloisa (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Paraizo, Mariana M. (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Parise, Ivy Z.S. (Instituto de Pesquisa Pelé Pequeno Principe, Brazil); Hamilton, Kayla V. (St. Jude Children's Research Hospital, United States); Wang, Jinling (St. Jude Children's Research Hospital, United States); Rampersaud, Evadnie (St. Jude Children's Research Hospital, United States); Clay, Michael R. (St. Jude Children's Research Hospital, United States); Murphy, Andrew J. (St. Jude Children's Research Hospital, United States); Lalli, Enzo (Institut de Pharmacologie Moléculaire et Cellulaire, France); Nichols, Kim E. (St. Jude Children's Research Hospital, United States); Ribeiro, Raul C. (St. Jude Children's Research Hospital, United States); Rodriguez-Galindo, Carlos (St. Jude Children's Research Hospital, United States); Korbonits, Marta (Queen Mary University of London, United Kingdom); Zhang, Jinghui (St. Jude Children's Research Hospital, United States); Thomas, Mark G. (University College London, United Kingdom); Connelly, Jon P. (St. Jude Children's Research Hospital, United States); Pruett-Miller, Shondra (St. Jude Children's Research Hospital, United States); Diekmann, Yoan (University College London, United Kingdom); Neale, Geoffrey (St. Jude Children's Research Hospital, United States); Wu, Gang (St. Jude Children's Research Hospital, United States); Zambetti, Gerard P. (St. Jude Children's Research Hospital, United States).

    A saturated map of common genetic variants associated with human height

    Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40–50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes [1]. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel [2]) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10–20% (14–24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries. Published version; peer reviewed.

    Genome-wide homozygosity and risk of four non-Hodgkin lymphoma subtypes

    AIM: Recessive genetic variation is thought to play a role in non-Hodgkin lymphoma (NHL) etiology. Runs of homozygosity (ROH), defined based on long, continuous segments of homozygous SNPs, can be used to estimate both measured and unmeasured recessive genetic variation. We sought to examine genome-wide homozygosity and NHL risk. METHODS: We used data from eight genome-wide association studies of four common NHL subtypes: 3061 chronic lymphocytic leukemia (CLL), 3814 diffuse large B-cell lymphoma (DLBCL), 2784 follicular lymphoma (FL), and 808 marginal zone lymphoma (MZL) cases, as well as 9374 controls. We examined the effect of homozygous variation on risk by: (1) estimating the fraction of the autosome containing runs of homozygosity (FROH); (2) calculating an inbreeding coefficient derived from the correlation among uniting gametes (F3); and (3) examining specific autosomal regions containing ROH. For each, we calculated beta coefficients and standard errors using logistic regression and combined estimates across studies using random-effects meta-analysis. RESULTS: We discovered positive associations between FROH and CLL (β = 21.1, SE = 4.41, P = 1.6 × 10⁻⁶) and FL (β = 11.4, SE = 5.82, P = 0.02) but not DLBCL (P = 1.0) or MZL (P = 0.91). For F3, we observed an association with CLL (β = 27.5, SE = 6.51, P = 2.4 × 10⁻⁵). We did not find evidence of associations with specific ROH, suggesting that the associations observed with FROH and F3 for CLL and FL risk were not driven by a single region of homozygosity. CONCLUSION: Our findings support the role of recessive genetic variation in the etiology of CLL and FL; additional research is needed to identify the specific loci associated with NHL risk.
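The FROH measure in this abstract, the fraction of the autosome covered by runs of homozygosity, reduces to simple arithmetic once ROH segments have been called. The sketch below is only an illustration: the segment representation, the function name and the autosome length passed in are assumptions, and ROH calling itself is not shown.

```python
def froh(roh_segments, autosome_length_bp):
    """Fraction of the autosome covered by runs of homozygosity.

    roh_segments:       iterable of (start_bp, end_bp) pairs, assumed to be
                        non-overlapping autosomal ROH calls for one person
    autosome_length_bp: total autosomal length considered, in base pairs
                        (an input the caller must supply)
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome length must be positive")
    covered = 0
    for start, end in roh_segments:
        if end < start:
            raise ValueError("segment end precedes its start")
        covered += end - start
    return covered / autosome_length_bp


if __name__ == "__main__":
    # Toy example: 3 Mb of ROH in a 100 Mb "autosome".
    segments = [(0, 1_000_000), (5_000_000, 7_000_000)]
    print(froh(segments, 100_000_000))
```

The per-individual value computed this way would then serve as the predictor in the logistic regression the abstract mentions.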

    Evolutionary dynamics of the human NADPH oxidase genes CYBB, CYBA, NCF2, and NCF4: Functional implications

    The phagocyte NADPH oxidase catalyzes the reduction of O2 to reactive oxygen species with microbicidal activity. It is composed of two membrane-spanning subunits, gp91-phox and p22-phox (encoded by CYBB and CYBA, respectively), and three cytoplasmic subunits, p40-phox, p47-phox, and p67-phox (encoded by NCF4, NCF1, and NCF2, respectively). Mutations in any of these genes can result in chronic granulomatous disease, a primary immunodeficiency characterized by recurrent infections. Using evolutionary mapping, we determined that episodes of adaptive natural selection have shaped the extracellular portion of gp91-phox during the evolution of mammals, which suggests that this region may have a function in host-pathogen interactions. On the basis of a resequencing analysis of approximately 35 kb of CYBB, CYBA, NCF2, and NCF4 in 102 ethnically diverse individuals (24 of African ancestry, 31 of European ancestry, 24 of Asian/Oceanian ancestry, and 23 US Hispanics), we show that the pattern of CYBA diversity is compatible with balancing natural selection, perhaps mediated by catalase-positive pathogens. NCF2 in Asian populations shows a pattern of diversity characterized by a differentiated haplotype structure. Our study provides insight into the role of pathogen-driven natural selection in an innate immune pathway and sheds light on the role of CYBA in endothelial, nonphagocytic NADPH oxidases, which are relevant in the pathogenesis of cardiovascular and other complex diseases.