184 research outputs found

    SNPFile – A software library and file format for large scale association mapping and population genetics studies

    Background: High-throughput genotyping technology has enabled cost-effective typing of thousands of individuals at hundreds of thousands of markers for use in genome-wide studies. This vast improvement in data acquisition technology makes it an informatics challenge to store and manipulate the data efficiently. While spreadsheets and flat text files were adequate solutions earlier, the increased data size mandates more efficient solutions. Results: We describe a new binary file format for SNP data, together with a software library for file manipulation. The file format stores genotype data together with any kind of additional data, using a flexible serialisation mechanism. The format is designed to be IO efficient for the access patterns of most multi-locus analysis methods. Conclusion: The new file format has been very useful for our own studies, where it has significantly reduced the informatics burden of keeping track of various secondary data, and where the memory and IO efficiency has greatly simplified analysis runs. A main limitation of the file format is that it is only supported by the very limited set of analysis tools developed in our own lab. This is somewhat alleviated by a scripting interface that makes it easy to write converters to and from the format.

    Estimates of heritable and environmental components of familial breast cancer using family history information

    Using the Swedish Family-Cancer Database, the increased risk of breast cancer in women with relatives with the disease did not vary with paternal/maternal lineage. The heritable component of familial breast cancer was 73% and the environmental proportion 27%. Familial aggregation of breast cancer in women below age 51 years is mainly due to heritable causes.

    A fast algorithm for genome-wide haplotype pattern mining

    Background: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. Results: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed in 491208 markers using default parameters, and by more if the pattern length is increased. Conclusion: The new algorithm speeds up the HPM method, and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers.
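
The observation this abstract relies on, that a short window of markers shows far fewer distinct haplotypes than the number theoretically possible, can be illustrated with a minimal sketch. This is not the HPM algorithm itself; the haplotype strings, the window length, and the helper name `distinct_windows` are hypothetical and chosen only for illustration.

```python
def distinct_windows(haplotypes, start, length):
    """Return the set of distinct allele patterns seen in one marker window."""
    return {h[start:start + length] for h in haplotypes}


# Hypothetical phased haplotypes over 8 biallelic markers (alleles coded 0/1).
haplotypes = [
    "00110101",
    "00110101",
    "00110011",
    "11010101",
    "11010011",
    "00110011",
]

window = 4  # assumed pattern length, for illustration only
n_markers = len(haplotypes[0])

for start in range(n_markers - window + 1):
    observed = distinct_windows(haplotypes, start, window)
    # With biallelic markers there are 2**window possible patterns per window.
    print(f"window at {start}: {len(observed)} observed of {2 ** window} possible")
```

A pattern-mining method that only enumerates the patterns actually observed in each window has far less work to do than one that considers every possible pattern, which is presumably the kind of saving the abstract describes.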

    The ‘Common Disease-Common Variant’ Hypothesis and Familial Risks

    The recent large genotyping studies have identified a new repertoire of disease susceptibility loci of unknown function, characterized by high allele frequencies and low relative risks, lending support to the common disease-common variant (CDCV) hypothesis. The variants explain a much larger proportion of the disease etiology, measured by the population attributable fraction, than of the familial risk. We show here that if the identified polymorphisms were markers of rarer functional alleles they would explain a much larger proportion of the familial risk. For example, in a plausible scenario where the marker is 10 times more common than the causative allele, the excess familial risk of the causative allele is over 10 times higher than that of the marker allele. However, the population attributable fractions of the two alleles are equal. The penetrance mode of the causative locus may be very difficult to deduce from the apparent penetrance mode of the marker locus
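
The marker-versus-causative-allele comparison can be made concrete with a small numerical sketch. It assumes Levin's formula for the population attributable fraction and a standard single-locus multiplicative-model approximation for the sibling relative risk; both are assumptions made here, not necessarily the model used in the paper, and the allele frequencies and relative risks are hypothetical.

```python
def paf(p, r):
    """Levin's population attributable fraction for frequency p and relative risk r."""
    excess = p * (r - 1.0)
    return excess / (1.0 + excess)


def sibling_rr(p, r):
    """Sibling relative risk for one biallelic locus, multiplicative model (assumed)."""
    q = 1.0 - p
    w = p * q * (r - 1.0) ** 2 / (p * r + q) ** 2
    return (1.0 + w / 2.0) ** 2


# Hypothetical values: the marker allele is 10 times more common than the
# causative allele, and the relative risks are chosen so that the PAFs match.
marker = (0.30, 1.2)   # (allele frequency, per-allele relative risk)
causal = (0.03, 3.0)

print("PAF marker:", round(paf(*marker), 4))
print("PAF causal:", round(paf(*causal), 4))

excess_marker = sibling_rr(*marker) - 1.0
excess_causal = sibling_rr(*causal) - 1.0
print("excess sibling risk ratio (causal / marker):",
      round(excess_causal / excess_marker, 1))
```

Under these assumed numbers the two alleles have the same attributable fraction, while the excess sibling risk of the rarer allele is more than 10 times that of the marker, in line with the scenario the abstract describes.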

    Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers

    Recently, genome-wide association studies have identified loci across a segment of chromosome 8q24 (128,100,000–128,700,000) associated with the risk of breast, colon and prostate cancers. At least three regions of 8q24 have been independently associated with prostate cancer risk; the most centromeric of which appears to be population specific. Haplotypes in two contiguous but independent loci, marked by rs6983267 and rs1447295, have been identified in the Cancer Genetic Markers of Susceptibility project (http://cgems.cancer.gov), which genotyped more than 5,000 prostate cancer cases and 5,000 controls of European origin. The rs6983267 locus is also strongly associated with colorectal cancer. To ascertain a comprehensive catalog of common single-nucleotide polymorphisms (SNPs) across the two regions, we conducted a resequence analysis of 136 kb (chr8: 128,473,000–128,609,802) using the Roche/454 next-generation sequencing technology in 39 prostate cancer cases and 40 controls of European origin. We have characterized a comprehensive catalog of common (MAF > 1%) SNPs within this region, including 442 novel SNPs and have determined the pattern of linkage disequilibrium across the region. Our study has generated a detailed map of genetic variation across the region, which should be useful for choosing SNPs for fine mapping of association signals in 8q24 and investigations of the functional consequences of select common variants

    Nucleotide Discrimination with DNA Immobilized in the MspA Nanopore

    Nanopore sequencing has the potential to become a fast and low-cost DNA sequencing platform. An ionic current passing through a small pore would directly map the sequence of single stranded DNA (ssDNA) driven through the constriction. The pore protein, MspA, derived from Mycobacterium smegmatis, has a short and narrow channel constriction ideally suited for nanopore sequencing. To study MspA's ability to resolve nucleotides, we held ssDNA within the pore using a biotin-NeutrAvidin complex. We show that homopolymers of adenine, cytosine, thymine, and guanine in MspA exhibit much larger current differences than in α-hemolysin. Additionally, methylated cytosine is distinguishable from unmethylated cytosine. We establish that single nucleotide substitutions within homopolymer ssDNA can be detected when held in MspA's constriction. Using genomic single nucleotide polymorphisms, we demonstrate that single nucleotides within random DNA can be identified. Our results indicate that MspA has high signal-to-noise ratio and the single nucleotide sensitivity desired for nanopore sequencing devices

    Familial risks in nervous system tumours: joint Nordic study

    Background: Familial nervous system cancers are rare and limited data on familial aspects are available particularly on site-specific tumours. Methods: Data from five Nordic countries were used to analyse familial risks of nervous system tumours. Standardised incidence ratios (SIRs) were calculated for offspring of affected relatives compared with offspring of non-affected relatives. Results: The total number of patients with nervous system tumour was 63 307, of whom 32 347 belonged to the offspring generation. Of 851 familial patients (2.6%) in the offspring generation, 42 (4.7%) belonged to the families of a parent and at least two siblings affected. The SIR of brain tumours was 1.7 in offspring of affected parents; it was 2.0 in siblings and 9.4 in families with a parent and sibling affected. For spinal tumours, the SIRs were much higher for offspring of early onset tumours, 14.0 for offspring of affected parents and 22.7 for siblings. The SIRs for peripheral nerve tumours were 16.3 in offspring of affected parents, 27.7 in siblings and 943.9 in multiplex families. Conclusion: The results of this population-based study on medically diagnosed tumours show site-, proband- and age-specific risks for familial tumours, with implications for clinical genetic counselling and identification of the underlying genes. British Journal of Cancer advance online publication, 25 May 2010; doi:10.1038/sj.bjc.6605708 www.bjcancer.com
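
For readers unfamiliar with the measure, a standardised incidence ratio is the number of observed cases divided by the number expected if reference rates applied to the study group. The sketch below shows that calculation in its simplest form; the strata, rates, person-years, and case count are hypothetical and are not taken from the study, and the function names are illustrative.

```python
def expected_cases(strata):
    """Sum of reference rate x person-years over strata (e.g. age and period bands)."""
    return sum(rate * person_years for rate, person_years in strata)


def sir(observed, strata):
    """Standardised incidence ratio: observed cases over expected cases."""
    return observed / expected_cases(strata)


# Hypothetical strata for offspring of affected parents:
# (reference incidence rate per person-year, person-years at risk in the stratum).
strata = [
    (0.00002, 500_000),
    (0.00005, 300_000),
    (0.00012, 100_000),
]
observed = 63  # hypothetical number of cases seen in the group

print("expected cases:", round(expected_cases(strata), 2))
print("SIR:", round(sir(observed, strata), 2))
```

An SIR above 1 means more cases were observed than the reference rates predict; published analyses would normally also report a confidence interval, which this sketch omits.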

    Identification of Lck-derived peptides applicable to anti-cancer vaccine for patients with human leukocyte antigen-A3 supertype alleles

    The identification of peptide vaccine candidates to date has been focused on human leukocyte antigen (HLA)-A2 and -A24 alleles. In this study, we attempted to identify cytotoxic T lymphocyte (CTL)-directed Lck-derived peptides applicable to HLA-A11+, -A31+, or -A33+ cancer patients, because these HLA-A alleles share binding motifs, designated HLA-A3 supertype alleles, and because the Lck is preferentially expressed in metastatic cancer. Twenty-one Lck-derived peptides were prepared based on the binding motif to the HLA-A3 supertype alleles. They were first screened for their recognisability by immunoglobulin G (IgG) in the plasma of prostate cancer patients, and the selected candidates were subsequently tested for their potential to induce peptide-specific CTLs from peripheral blood mononuclear cells of HLA-A3 supertype+ cancer patients. As a result, four Lck peptides were frequently recognised by IgGs, and three of them – Lck90−99, Lck449−458, and Lck450−458 – efficiently induced peptide-specific and cancer-reactive CTLs. Their cytotoxicity towards cancer cells was mainly ascribed to HLA class I-restricted and peptide-specific CD8+ T cells. These results indicate that these three Lck peptides are applicable to HLA-A3 supertype+ cancer patients, especially those with metastasis. This information could facilitate the development of peptide-based anti-cancer vaccine for patients with alleles other than HLA-A2 and -A24

    Familial risk for gastric carcinoma: an updated study from Sweden

    Reliable data on familial risks are important for clinical counselling and cancer genetics. However, the estimates of familial risk of gastric cancer vary widely. We examined the risk of familial gastric cancer using the updated Swedish Family-Cancer Database with 5358 patients among offspring and 36 486 patients among parents. There were 133 families with one parent and one offspring diagnosed with gastric cancer, and 20 families with two affected offspring. Familial standardised incidence ratios (SIRs) were 1.63 and 2.93 when parents and siblings presented with gastric cancer, respectively. The high sibling risk was due to cancer in the corpus (SIR 7.28). The SIR for cardia cancer was 1.54 when parents were diagnosed with any gastric cancer. Cardia cancer was associated with oesophageal cancer, particularly with oesophageal adenocarcinoma. Among specific histologies, signet ring cancer showed an increase. A few associations were noted for discordant sites, including the urinary bladder and the endometrium. H. pylori infection, although not measured in the present study, is probably an important risk factor for the high sibling risk of corpus cancer. Familial clustering of cardia cancer is independent of H. pylori infection, and may have a genetic basis. The familial association of cardia cancer with oesophageal adenocarcinoma may provide aetiological clues