
    Optimal neighborhood indexing for protein similarity search

    Background: Similarity inference, one of the main bioinformatics tasks, has to cope with the exponential growth of biological data. A classical approach used to handle this data flow involves heuristics with large seed indexes. To speed up this technique, the index can be enhanced by storing additional information that limits the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. Results: The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood reduces the memory involved in the process by 35%, without sacrificing either the quality of the results or the computational time. Second, our approach led us to develop a new kind of substitution score matrix and the associated e-value parameters. In contrast to usual matrices, these matrices are rectangular, since they compare amino acid groups from different alphabets. We describe the method used to compute these matrices and provide some typical examples that can be used in such comparisons. Supplementary data are available at http://bioinfo.lifl.fr/reblosum. Conclusions: We propose a practical reduction of the neighborhood index data size that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can be applied in any study involving an alphabet reduction.
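The neighborhood-indexing idea above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the seven amino acid groups (`GROUPS`), the seed length `k`, the neighborhood length `n` and the toy scores in `rect_score` are hypothetical, and are not the groupings or the rectangular matrices derived in the paper. Each seed occurrence stores its flanking residues recoded into the reduced alphabet, so a first ungapped comparison with the query needs no further random access to the sequence bank, and each stored residue costs fewer bits.

```python
import math
from collections import defaultdict

# Hypothetical partition of the 20 amino acids into 7 groups (illustrative only,
# not the groupings derived in the paper).
GROUPS = ["AGST", "C", "DENQ", "FWY", "HKR", "ILMV", "P"]
REDUCE = {aa: gid for gid, group in enumerate(GROUPS) for aa in group}


def bits_per_residue(alphabet_size: int) -> int:
    """Bits needed to encode one residue from an alphabet of this size."""
    return math.ceil(math.log2(alphabet_size))


def build_index(seq: str, k: int, n: int) -> dict:
    """Map each seed (exact k-mer over the full alphabet) to its occurrences.

    Each occurrence stores its position plus the n residues to its left and
    right, recoded as reduced-alphabet group identifiers.
    """
    red = [REDUCE[aa] for aa in seq]
    index = defaultdict(list)
    for i in range(n, len(seq) - k - n + 1):
        seed = seq[i:i + k]
        left = tuple(red[i - n:i])
        right = tuple(red[i + k:i + k + n])
        index[seed].append((i, left, right))
    return dict(index)


def rect_score(query_aa: str, group_id: int) -> int:
    """Toy 'rectangular' score: a full-alphabet residue vs. a reduced group.

    Hypothetical values; the paper computes real substitution scores instead.
    """
    return 2 if query_aa in GROUPS[group_id] else -1


def neighborhood_score(query: str, qpos: int, k: int, left, right) -> int:
    """Score the query flanks around a seed hit at qpos against stored flanks."""
    n = len(left)
    if qpos < n or qpos + k + n > len(query):
        raise ValueError("seed hit too close to the query ends")
    q_left = query[qpos - n:qpos]
    q_right = query[qpos + k:qpos + k + n]
    return (sum(rect_score(a, g) for a, g in zip(q_left, left))
            + sum(rect_score(a, g) for a, g in zip(q_right, right)))


if __name__ == "__main__":
    bank = "ACDEFGHIKLMNPQRSTVWY"
    idx = build_index(bank, k=3, n=2)
    pos, left, right = idx["DEF"][0]
    print(pos, left, right, neighborhood_score(bank, pos, 3, left, right))
    # Bits for two flanks of length n = 2: full vs. reduced alphabet.
    print(2 * 2 * bits_per_residue(20), 2 * 2 * bits_per_residue(len(GROUPS)))
```

Run as a script, it prints `2 (0, 1) (0, 4) 8` and then `20 12`: in this toy setting the two flanks take 12 bits with 7 groups instead of 20 bits with 20 letters, and those saved bits are what a longer neighborhood can spend.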

    Fine-Scale Mapping of the 4q24 Locus Identifies Two Independent Loci Associated with Breast Cancer Risk

    Background: A recent association study identified a common variant (rs9790517) at 4q24 associated with breast cancer risk. Independent association signals and potential functional variants in this locus had not been explored. Methods: We conducted a fine-mapping analysis in 55,540 breast cancer cases and 51,168 controls from the Breast Cancer Association Consortium. Results: Conditional analyses identified two independent association signals among women of European ancestry, represented by rs9790517 [conditional P = 2.51 × 10⁻⁴; OR, 1.04; 95% confidence interval (CI), 1.02–1.07] and rs77928427 (P = 1.86 × 10⁻⁴; OR, 1.04; 95% CI, 1.02–1.07). Functional annotation using data from the Encyclopedia of DNA Elements (ENCODE) project revealed two putative functional variants, rs62331150 and rs73838678, in linkage disequilibrium (LD) with rs9790517 (r² ≥ 0.90), residing in the active promoter and enhancer, respectively, of the nearest gene, TET2. Both variants are located in DNase I hypersensitivity and transcription factor–binding sites. Using data from both The Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), we showed that rs62331150 was associated with the expression level of TET2 in normal and tumor breast tissue. Conclusion: Our study identified two independent association signals at 4q24 in relation to breast cancer risk and suggested that the observed association at this locus may be mediated through the regulation of TET2. Impact: Fine-mapping studies with large sample sizes are warranted to identify independent loci for breast cancer risk.

    Critical Assessment of Metagenome Interpretation: A benchmark of metagenomics software

    In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones, and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performance, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.

    A case-only study to identify genetic modifiers of breast cancer risk for BRCA1/BRCA2 mutation carriers

    Breast cancer (BC) risk for BRCA1 and BRCA2 mutation carriers varies by genetic and familial factors. About 50 common variants have been shown to modify BC risk for mutation carriers. All but three were identified in general population studies. Other mutation carrier-specific susceptibility variants may exist, but studies of mutation carriers have so far been underpowered. We conduct a novel case-only genome-wide association study comparing genotype frequencies between 60,212 general population BC cases and 13,007 cases with BRCA1 or BRCA2 mutations. We identify robust novel associations (P < 10⁻⁸) with BC for 2 variants in BRCA1 and 3 in BRCA2 mutation carriers, at 5 loci that are not associated with risk in the general population. They include rs60882887 at 11p11.2, where MADD, SP11 and EIF1, genes previously implicated in BC biology, are predicted as potential targets. These findings will contribute towards customising BC polygenic risk scores for BRCA1 and BRCA2 mutation carriers.
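The case-only comparison described above contrasts genotype frequencies between carrier cases and general-population cases. A simplified sketch, assuming a plain allelic 2×2 table with a Woolf (log-scale) confidence interval rather than the study's actual model, which the abstract does not detail; the function names and the genotype counts in the example are hypothetical.

```python
import math


def allele_counts(n0: int, n1: int, n2: int) -> tuple[int, int]:
    """Convert genotype counts (0, 1 or 2 copies of the tested allele)
    into (tested-allele count, other-allele count)."""
    return n1 + 2 * n2, 2 * n0 + n1


def allelic_odds_ratio(carrier_cases, population_cases, z: float = 1.96):
    """Allelic odds ratio of carrier cases vs. population cases, with a Woolf CI.

    Each argument is a genotype-count triple (n0, n1, n2). Assuming genotype and
    mutation status are independent in the source population, an OR away from 1
    suggests the variant's effect differs between carriers and non-carriers.
    """
    a, b = allele_counts(*carrier_cases)
    c, d = allele_counts(*population_cases)
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: the Woolf interval is undefined")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical genotype counts, not data from the study.
    print(allelic_odds_ratio((10, 20, 5), (40, 40, 10)))
```

A regression model would be needed in practice to adjust for covariates such as ancestry; the 2×2 table only shows what "comparing genotype frequencies between cases" measures.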

    Exploring the link between MORF4L1 and risk of breast cancer.

    INTRODUCTION: Proteins encoded by Fanconi anemia (FA) and/or breast cancer (BrCa) susceptibility genes cooperate in a common DNA damage repair signaling pathway. To gain deeper insight into this pathway and its influence on cancer risk, we searched for novel components through protein physical interaction screens. METHODS: Protein physical interactions were screened using the yeast two-hybrid system. Co-affinity purifications and endogenous co-immunoprecipitation assays were performed to corroborate interactions. Biochemical and functional assays in human, mouse and Caenorhabditis elegans models were carried out to characterize pathway components. Thirteen FANCD2-monoubiquitinylation-positive FA cell lines, in which genetic defects in the downstream pathway components had been excluded, and 300 familial BrCa patients negative for BRCA1/2 mutations were analyzed for genetic mutations. Common genetic variants were genotyped in 9,573 BRCA1/2 mutation carriers to test for associations with BrCa risk. RESULTS: MRG15 (encoded by MORF4L1), a protein previously identified as co-purifying with PALB2, was identified. Results in human, mouse and C. elegans models delineate molecular and functional relationships with BRCA2, PALB2, RAD51 and RPA1 that suggest a role for MRG15 in the repair of DNA double-strand breaks. Mrg15-deficient murine embryonic fibroblasts showed moderate sensitivity to γ-irradiation relative to controls and reduced formation of Rad51 nuclear foci. Examination of mutants of the C. elegans orthologs of MRG15 and BRCA2 revealed a phenocopy, with accumulation of RPA-1 (human RPA1) nuclear foci and aberrant chromosomal compactions in meiotic cells. However, no alterations or mutations were identified for MRG15/MORF4L1 in unclassified FA patients or familial BrCa cases.
Finally, no significant associations between common MORF4L1 variants and BrCa risk for BRCA1 or BRCA2 mutation carriers were identified: rs7164529, Ptrend = 0.45 and 0.05, P2df = 0.51 and 0.14, respectively; and rs10519219, Ptrend = 0.92 and 0.72, P2df = 0.76 and 0.07, respectively. CONCLUSIONS: While the present study expands on the role of MRG15 in the control of genomic stability, weak associations cannot be ruled out for potential low-penetrance variants at MORF4L1 and BrCa risk among BRCA2 mutation carriers.

    Functional mechanisms underlying pleiotropic risk alleles at the 19p13.1 breast-ovarian cancer susceptibility locus

    A locus at 19p13 is associated with breast cancer (BC) and ovarian cancer (OC) risk. Here we analyse 438 SNPs in this region in 46,451 BC and 15,438 OC cases, 15,252 BRCA1 mutation carriers and 73,444 controls and identify 13 candidate causal SNPs associated with serous OC (P = 9.2 × 10⁻²⁰), ER-negative BC (P = 1.1 × 10⁻¹³), BRCA1-associated BC (P = 7.7 × 10⁻¹⁶) and triple-negative BC (P-diff = 2 × 10⁻⁵). Genotype-gene expression associations are identified for candidate target genes ANKLE1 (P = 2 × 10⁻³) and ABHD8 (P < 2 × 10⁻³). Chromosome conformation capture identifies interactions between four candidate SNPs and ABHD8, and luciferase assays indicate that six risk alleles increase transactivation of the ABHD8 promoter. Targeted deletion of a region containing risk SNP rs56069439 in a putative enhancer induces ANKLE1 downregulation, and mRNA stability assays indicate functional effects for an ANKLE1 3′-UTR SNP. Altogether, these data suggest that multiple SNPs at 19p13 regulate ABHD8 and perhaps ANKLE1 expression, and indicate common mechanisms underlying breast and ovarian cancer risk.

    Identification of a BRCA2-Specific modifier locus at 6p24 related to breast cancer risk

    Common genetic variants contribute to the observed variation in breast cancer risk for BRCA2 mutation carriers; those known to date have all been found through population-based genome-wide association studies (GWAS). To comprehensively identify breast cancer risk-modifying loci for BRCA2 mutation carriers, we conducted a deep replication of an ongoing GWAS discovery study. Using the ranked P-values of the breast cancer associations with the imputed genotypes of 1.4 M SNPs, 19,029 SNPs were selected and designed for inclusion on a custom Illumina array that included a total of 211,155 SNPs as part of a multi-consortial project. DNA samples from 3,881 breast cancer affected and 4,330 unaffected BRCA2 mutation carriers from 47 studies belonging to the Consortium of Investigators of Modifiers of BRCA1/2 were genotyped and available for analysis. We replicated previously reported breast cancer susceptibility alleles in these BRCA2 mutation carriers and, for several regions (including FGFR2, MAP3K1, CDKN2A/B, and PTHLH), identified SNPs with stronger evidence of association than those previously published. We also identified a novel susceptibility allele at 6p24 that was inversely associated with risk in BRCA2 mutation carriers (rs9348512; per-allele HR = 0.85, 95% CI 0.80-0.90, P = 3.9 × 10⁻⁸). This SNP was not associated with breast cancer risk either in the general population or in BRCA1 mutation carriers. The locus lies within a region containing TFAP2A, which encodes a transcriptional activation protein that interacts with several tumor suppressor genes. This report identifies the first breast cancer risk locus specific to a BRCA2 mutation background. This comprehensive update of novel and previously reported breast cancer susceptibility loci contributes to the establishment of a panel of SNPs that modify breast cancer risk in BRCA2 mutation carriers. This panel may have clinical utility for women with BRCA2 mutations weighing options for medical prevention of breast cancer.

    Genome-Wide Association Study in BRCA1 Mutation Carriers Identifies Novel Loci Associated with Breast and Ovarian Cancer Risk

    BRCA1-associated breast and ovarian cancer risks can be modified by common genetic variants. To identify further cancer risk-modifying loci, we performed a multi-stage GWAS of 11,705 BRCA1 carriers (of whom 5,920 were diagnosed with breast cancer and 1,839 with ovarian cancer), with further replication in an additional sample of 2,646 BRCA1 carriers. We identified a novel breast cancer risk modifier locus at 1q32 for BRCA1 carriers (rs2290854, P = 2.7 × 10⁻⁸, HR = 1.14, 95% CI: 1.09-1.20). In addition, we identified two novel ovarian cancer risk modifier loci: 17q21.31 (rs17631303, P = 1.4 × 10⁻⁸, HR = 1.27, 95% CI: 1.17-1.38) and 4q32.3 (rs4691139, P = 3.4 × 10⁻⁸, HR = 1.20, 95% CI: 1.17-1.38). The 4q32.3 locus was not associated with ovarian cancer risk in the general population or in BRCA2 carriers, suggesting a BRCA1-specific association.

    An original phylogenetic approach identified mitochondrial haplogroup T1a1 as inversely associated with breast cancer risk in BRCA2 mutation carriers

    Introduction: Individuals carrying pathogenic mutations in the BRCA1 and BRCA2 genes have a high lifetime risk of breast cancer. BRCA1 and BRCA2 are involved in DNA double-strand break repair; such DNA alterations can be caused by exposure to reactive oxygen species, a main source of which is the mitochondria. Mitochondrial genome variations affect electron transport chain efficiency and reactive oxygen species production. Individuals with different mitochondrial haplogroups differ in their metabolism and sensitivity to oxidative stress. Variability in mitochondrial genetic background can alter reactive oxygen species production and may thereby influence cancer risk. In the present study, we tested the hypothesis that mitochondrial haplogroups modify breast cancer risk in BRCA1/2 mutation carriers. Methods: We genotyped 22,214 mutation carriers (11,421 affected, 10,793 unaffected) belonging to the Consortium of Investigators of Modifiers of BRCA1/2 for 129 mitochondrial polymorphisms using the iCOGS array. Haplogroup inference and association detection were performed using a phylogenetic approach. ALTree was applied to explore the reference mitochondrial evolutionary tree and detect subclades enriched in affected or unaffected individuals. Results: We discovered that subclade T1a1 was depleted in affected BRCA2 mutation carriers compared with the rest of clade T (hazard ratio (HR) = 0.55; 95% confidence interval (CI), 0.34 to 0.88; P = 0.01). Compared with the most frequent haplogroups in the general population (that is, the H and T clades), the T1a1 haplogroup has an HR of 0.62 (95% CI, 0.40 to 0.95; P = 0.03). We also identified three potential susceptibility loci, including G13708A/rs28359178, which has been inversely associated with familial breast cancer risk.
Conclusions: This study illustrates how original approaches such as the phylogeny-based method we used can empower classical molecular epidemiological studies aimed at identifying association or risk modification effects.

    Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses.

    Breast cancer susceptibility variants frequently show heterogeneity in associations by tumor subtype¹⁻³. To identify novel loci, we performed a genome-wide association study including 133,384 breast cancer cases and 113,789 controls, plus 18,908 BRCA1 mutation carriers (9,414 with breast cancer) of European ancestry, using both standard and novel methodologies that account for underlying tumor heterogeneity by estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2 status and tumor grade. We identified 32 novel susceptibility loci (P < 5.0 × 10⁻⁸), 15 of which showed evidence for associations with at least one tumor feature (false discovery rate < 0.05). Five loci showed associations (P < 0.05) in opposite directions between luminal and non-luminal subtypes. In silico analyses showed that these five loci contained cell-specific enhancers that differed between normal luminal and basal mammary cells. The genetic correlations between five intrinsic-like subtypes ranged from 0.35 to 0.80. The proportion of genome-wide chip heritability explained by all known susceptibility loci was 54.2% for luminal A-like disease and 37.6% for triple-negative disease. The odds ratios for polygenic risk scores including 330 variants, comparing the highest 1% of the score distribution with the middle quantiles, were 5.63 and 3.02 for luminal A-like and triple-negative disease, respectively. These findings provide an improved understanding of genetic predisposition to breast cancer subtypes and will inform the development of subtype-specific polygenic risk scores.
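A polygenic risk score of the kind mentioned above is commonly computed as a weighted sum of risk-allele dosages, with per-allele log odds ratios as weights. The sketch below assumes that standard additive form and treats the "middle quantiles" as the 40th to 60th percentiles; both are illustrative assumptions rather than details given in the abstract, and the function names and numbers in the example are hypothetical.

```python
import math
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         odds_ratios: Sequence[float]) -> float:
    """Additive PRS: sum over variants j of dosage_j * ln(OR_j).

    Dosages are risk-allele counts (0, 1 or 2, or imputed values in between).
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("one dosage per variant is required")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


def split_by_percentile(scores: Sequence[float], top: float = 0.01,
                        mid: tuple[float, float] = (0.4, 0.6)):
    """Return (indices in the highest `top` fraction, indices in the middle band)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    n = len(order)
    n_top = max(1, round(n * top))
    top_idx = set(order[n - n_top:])
    mid_idx = set(order[int(n * mid[0]):int(n * mid[1])])
    return top_idx, mid_idx


if __name__ == "__main__":
    ors = [1.10, 1.20, 0.90]  # hypothetical per-allele odds ratios
    print(polygenic_risk_score([0, 1, 2], ors))
    top_idx, mid_idx = split_by_percentile(list(range(200)))
    print(sorted(top_idx), len(mid_idx))
```

The sketch only forms the two groups; estimating an odds ratio between them would additionally require case-control status, for example through logistic regression.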