
    Simultaneous SNP identification in association studies with missing data

    Association testing aims to discover the underlying relationship between genotypes (usually single nucleotide polymorphisms, or SNPs) and phenotypes (attributes, or traits). The large data sets typically used in association testing often contain missing values. Standard statistical methods impute the missing values using relatively simple assumptions, delete them, or both, which can produce biased results. Here we describe the Bayesian hierarchical model BAMD (Bayesian Association with Missing Data). BAMD is a Gibbs sampler in which missing values are multiply imputed based on all of the available information in the data set. We estimate the parameters and prove that updating one SNP at each iteration preserves the ergodic property of the Markov chain while improving computational speed. We also implement a model selection option in BAMD, which enables potential detection of SNP interactions. Simulations show that unbiased estimates of SNP effects are recovered with missing genotype data. In addition, we validate associations between SNPs and a carbon isotope discrimination phenotype that were previously reported using a family-based method, and discover an additional SNP associated with the trait. BAMD is available as an R package from http://cran.r-project.org/package=BAMD.
    Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/11-AOAS516.
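    The abstract describes BAMD as a Gibbs sampler that multiply imputes missing genotypes and updates one SNP per iteration. The Python sketch below illustrates that idea on a deliberately simplified model; it is not the BAMD model or the BAMD R package. All modelling choices are our assumptions: a linear model y = Xβ + ε without intercept, a N(0, τ²I) prior on β, an inverse-gamma prior on the noise variance, genotypes coded 0/1/2 with missing entries as NaN, and a Hardy-Weinberg prior on missing genotypes using allele frequencies estimated from the observed data. The function names and parameters are hypothetical.

```python
import numpy as np


def _hwe_probs(f):
    """Hardy-Weinberg genotype probabilities for genotypes 0, 1, 2 (allele frequency f)."""
    return np.array([(1 - f) ** 2, 2 * f * (1 - f), f ** 2])


def gibbs_missing_snps(y, X, n_iter=2000, burn_in=500, tau2=10.0, a=1.0, b=1.0, seed=0):
    """Toy Gibbs sampler for y = X @ beta + noise with missing genotypes (NaN) in X.

    Illustrative sketch only, NOT the BAMD model: N(0, tau2*I) prior on beta,
    Inv-Gamma(a, b) prior on the noise variance, Hardy-Weinberg prior on genotypes.
    Returns the posterior mean of beta and the last imputed genotype matrix.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)          # copy so the caller's array is untouched
    n, p = X.shape
    missing = np.isnan(X)
    freq = np.nanmean(X, axis=0) / 2.0    # allele frequencies from observed genotypes
    for j in range(p):                    # initial imputation from the HWE prior
        idx = np.flatnonzero(missing[:, j])
        X[idx, j] = rng.choice(3, size=idx.size, p=_hwe_probs(freq[j]))
    sigma2, draws, g = 1.0, [], np.arange(3.0)
    for it in range(n_iter):
        # 1. beta | y, X, sigma2 ~ N(m, V)
        V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
        beta = rng.multivariate_normal(V @ X.T @ y / sigma2, V)
        # 2. sigma2 | y, X, beta ~ Inv-Gamma(a + n/2, b + RSS/2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + resid @ resid / 2))
        # 3. Re-impute the missing genotypes of ONE randomly chosen SNP.
        j = rng.integers(p)
        idx = np.flatnonzero(missing[:, j])
        if idx.size:
            partial = X[idx] @ beta - X[idx, j] * beta[j]   # predictor without SNP j
            mean = partial[:, None] + g[None, :] * beta[j]  # shape (len(idx), 3)
            logp = np.log(_hwe_probs(freq[j]))[None, :] - (y[idx, None] - mean) ** 2 / (2 * sigma2)
            probs = np.exp(logp - logp.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)
            u = rng.random((idx.size, 1))
            # Inverse-CDF draw of a genotype in {0, 1, 2} for each missing entry.
            X[idx, j] = np.minimum((u > probs.cumsum(axis=1)).sum(axis=1), 2)
        if it >= burn_in:
            draws.append(beta)
    return np.mean(draws, axis=0), X
```

    Re-imputing only one randomly chosen SNP per iteration makes each sweep cheaper than re-imputing every SNP; the abstract states that the authors prove ergodicity for their version of this scheme, which this toy sketch does not attempt to reproduce.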

    SNP Miniplexes for Individual Identification of Random-Bred Domestic Cats.

    Phenotypic and genotypic characteristics of the cat can be obtained from single nucleotide polymorphism (SNP) analyses of fur. This study developed miniplexes using SNPs with high discriminating power for random-bred domestic cats, focusing on individual and phenotypic identification. Seventy-eight SNPs were investigated using a multiplex PCR followed by a fluorescently labeled single base extension (SBE) technique (SNaPshot®). The SNP miniplexes were evaluated for reliability, reproducibility, sensitivity, species specificity, detection limits, and assignment accuracy. Six SNPplexes were developed containing 39 intergenic SNPs and 26 phenotypic SNPs, including a sex identification marker, ZFXY. The combined random match probability (cRMP) was 6.58 × 10⁻¹⁹ across all Western cat populations, and the likelihood ratio was 1.52 × 10¹⁸. These SNPplexes can distinguish individual cats and their phenotypic traits, which could provide insight into crime reconstructions. A SNP database of 237 cats from 13 worldwide populations is now available for forensic applications.
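    The two summary statistics above are consistent with the usual relation LR = 1/cRMP, since 1 / (6.58 × 10⁻¹⁹) ≈ 1.52 × 10¹⁸. The Python sketch below shows one common way to combine per-locus match probabilities for biallelic SNPs, assuming Hardy-Weinberg proportions and independent (unlinked) loci. The study's exact population-genetic calculation (for example, any subpopulation correction) is not given in the abstract, and the function names are hypothetical.

```python
import math


def locus_match_probability(p):
    """Probability that two unrelated individuals share a genotype at a biallelic SNP,
    assuming Hardy-Weinberg proportions with allele frequency p."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def combined_match_probability(freqs):
    """Product rule across loci; assumes the SNPs are independent (unlinked)."""
    return math.prod(locus_match_probability(p) for p in freqs)


def likelihood_ratio(combined_rmp):
    """Likelihood ratio for a matching profile, taken here as 1 / cRMP."""
    return 1.0 / combined_rmp
```

    For example, a SNP with allele frequency 0.5 has a match probability of 0.375, and `likelihood_ratio(6.58e-19)` returns roughly 1.52 × 10¹⁸.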

    A new and versatile method for the successful conversion of AFLP™ markers into simple single locus markers

    Genetic markers can be obtained efficiently by amplified fragment length polymorphism (AFLP) fingerprinting because no prior DNA sequence information is required. However, converting AFLP markers from complex fingerprints into simple single locus assays is perceived as problematic because DNA sequence information is required to design new locus-specific PCR primers. In addition, single nucleotide polymorphism (SNP) information is required to design an allele-specific assay. This paper describes a new and versatile method for converting AFLP markers into simple assays. The protocol presented here offers solutions for frequently occurring pitfalls and describes a procedure for identifying the SNP responsible for the AFLP. Following this approach, a high success rate was obtained in converting AFLP markers into locus-specific markers.
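    One possible cause of an AFLP presence/absence polymorphism is a SNP that destroys a restriction site. The hypothetical Python helper below compares two pre-aligned allele sequences, lists the positions where they differ, and flags differences that fall inside an EcoRI (GAATTC) or MseI (TTAA) site in either allele. EcoRI/MseI is the standard AFLP enzyme pair, but the abstract does not name the enzymes used, so treat this as an assumption. The sketch only illustrates locating a candidate causal SNP; it is not the protocol described in the paper.

```python
# Assumed enzyme set; the paper's enzymes are not stated in the abstract.
SITES = {"EcoRI": "GAATTC", "MseI": "TTAA"}


def find_snps(allele_a, allele_b):
    """Positions where two equal-length, pre-aligned allele sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(allele_a, allele_b)) if x != y]


def site_hits(seq, motif):
    """Start positions of every occurrence of motif in seq (overlaps allowed)."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


def snps_in_restriction_sites(allele_a, allele_b, sites=SITES):
    """Report (position, enzyme) for SNPs lying inside a site present in either allele."""
    report = []
    for pos in find_snps(allele_a, allele_b):
        for enzyme, motif in sites.items():
            for seq in (allele_a, allele_b):
                if any(s <= pos < s + len(motif) for s in site_hits(seq, motif)):
                    report.append((pos, enzyme))
                    break  # report each (SNP, enzyme) pair once
    return report
```

    For instance, comparing "ACGAATTCGT" with "ACGAGTTCGT" reports `[(4, "EcoRI")]`: the A→G change at position 4 destroys the EcoRI site in the second allele.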

    Identification of SNP interactions using logic regression

    Interactions of single nucleotide polymorphisms (SNPs) are assumed to be responsible for complex diseases such as sporadic breast cancer. Important goals of studies concerned with such genetic data are thus to identify combinations of SNPs that lead to a higher risk of developing a disease and to measure the importance of these interactions. Many approaches based on classification methods such as CART and Random Forests allow the importance of single variables to be measured. However, none of these methods can directly quantify the importance of combinations of variables. In this paper, we show how logic regression can be employed to identify SNP interactions that explain disease status in a case-control study, and we propose two measures for quantifying the importance of these interactions for classification. These approaches are then applied both to simulated data sets and to the SNP data of the GENICA study, a study dedicated to identifying genetic and gene-environment interactions associated with sporadic breast cancer.
    Keywords: Single Nucleotide Polymorphism, Feature Selection, Variable Importance Measure, GENICA
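    Logic regression searches for Boolean combinations of binary predictors, for example SNPs coded as "at least one variant allele". The hypothetical Python sketch below only evaluates a fixed rule in disjunctive normal form, such as (SNP0 AND NOT SNP1) OR SNP2, and scores each conjunction by how much the misclassification rate rises when that conjunction is dropped. It does not perform logic regression's search over logic trees, and the drop-one score is an illustrative stand-in, not either of the two importance measures proposed in the paper.

```python
import numpy as np


def dominant_coding(genotypes):
    """Binary indicator: 1 if at least one variant allele (genotype 1 or 2)."""
    return (np.asarray(genotypes) >= 1).astype(int)


def evaluate_rule(terms, X):
    """Evaluate a DNF rule on a binary matrix X (subjects x SNPs).

    terms: list of conjunctions, each a list of (snp_index, negated) literals.
    """
    out = np.zeros(X.shape[0], dtype=bool)
    for term in terms:
        conj = np.ones(X.shape[0], dtype=bool)
        for j, negated in term:
            lit = X[:, j].astype(bool)
            conj &= ~lit if negated else lit
        out |= conj
    return out


def misclassification(terms, X, y):
    """Fraction of subjects whose predicted case status differs from y."""
    return float(np.mean(evaluate_rule(terms, X) != np.asarray(y).astype(bool)))


def term_importance(terms, X, y):
    """Increase in misclassification when each conjunction is removed from the rule."""
    base = misclassification(terms, X, y)
    return [misclassification(terms[:k] + terms[k + 1:], X, y) - base
            for k in range(len(terms))]
```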

    Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao

    To increase the efficiency of selecting cacao trees resistant to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis of expressed sequence tag (EST) libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease. (Author's abstract)
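    In silico SNP discovery from EST libraries commonly relies on read redundancy: a variant is trusted only when each allele is seen in more than one read, which filters out single-read sequencing errors. The Python sketch below applies that rule to the columns of an already computed, gap-padded EST alignment. The thresholds (`min_allele_count`, `min_depth`) and the function name are hypothetical, and the sketch is not the bioinformatics pipeline used by the authors.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_allele_count=2, min_depth=4):
    """Redundancy-based SNP calling on a gap-padded EST alignment.

    All reads must have the same length; '-' marks a gap or no coverage.
    Returns (column, allele_counts) for each column where at least two
    different bases are each supported by >= min_allele_count reads.
    """
    length = len(aligned_reads[0])
    hits = []
    for col in range(length):
        bases = [r[col] for r in aligned_reads if r[col] in "ACGT"]
        if len(bases) < min_depth:
            continue  # too few reads cover this column
        counts = Counter(bases)
        supported = [base for base, c in counts.items() if c >= min_allele_count]
        if len(supported) >= 2:
            hits.append((col, dict(counts)))
    return hits
```

    A column with counts A = 3, C = 1 is not reported, because the single C read could be a sequencing error.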

    Comparison of TCGA and GENIE genomic datasets for the detection of clinically actionable alterations in breast cancer.

    Whole exome sequencing (WES), targeted gene panel sequencing and single nucleotide polymorphism (SNP) arrays are increasingly used to identify actionable alterations that are critical to cancer care. Here, we compared The Cancer Genome Atlas (TCGA) and the Genomics Evidence Neoplasia Information Exchange (GENIE) breast cancer genomic datasets (array and next-generation sequencing (NGS) data) for detecting genomic alterations in clinically relevant genes. We performed an in silico analysis to determine the concordance in the frequencies of actionable mutations and copy number alterations/aberrations (CNAs) in the two most common breast cancer histologies, invasive lobular and invasive ductal carcinoma. We found that targeted sequencing identified more mutational hotspots and clinically significant amplifications in many actionable genes, such as PIK3CA, EGFR, AKT3, FGFR1, ERBB2, ERBB3 and ESR1, that would have been missed by WES and SNP arrays. The striking differences in the numbers of mutational hotspots and CNAs detected by these platforms highlight several factors that should be considered when interpreting array and NGS-based genomic data for precision medicine. Targeted panel sequencing was preferable to WES for defining the full spectrum of somatic mutations present in a tumor.
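    The concordance analysis compares how often each actionable gene is altered in the two datasets. As an illustration only, the Python sketch below computes per-gene alteration frequencies from per-sample sets of altered genes and compares two cohorts with a pooled two-proportion z-test. The input format, the function names and the choice of test are our assumptions; the abstract does not state which statistic the authors used.

```python
import math


def alteration_frequencies(samples, genes):
    """samples: list of sets of altered gene symbols, one set per tumour."""
    n = len(samples)
    return {g: sum(g in s for s in samples) / n for g in genes}


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test.

    x1/n1 and x2/n2 are the altered/total counts for one gene in each cohort.
    """
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # both cohorts all-altered or all-unaltered: no difference to test
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

    With many genes tested, a multiple-testing correction would normally be applied to the resulting p-values.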

    Similarity Measures for Clustering SNP Data

    The issue of suitable similarity measures for a particular kind of genetic data (so-called SNP data) arises from the GENICA (Interdisciplinary Study Group on Gene Environment Interaction and Breast Cancer in Germany) case-control study of sporadic breast cancer. The GENICA study aims to investigate the influence and interaction of single nucleotide polymorphic (SNP) loci and exogenous risk factors. A single nucleotide polymorphism is a point mutation that is present in at least 1% of a population. SNPs are the most common form of human genetic variation. In particular, we consider 65 SNP loci and 2 insertions of longer sequences in genes involved in the metabolism of hormones, xenobiotics and drugs, as well as in DNA repair and signal transduction. Assuming that these single nucleotide changes may lead, for instance, to altered enzymes or to a reduced or enhanced amount of the original enzymes, with each alteration alone having minor effects, we aim to detect combinations of SNPs that increase the risk of sporadic breast cancer under certain environmental conditions. The search for patterns in the present data set may be performed by a variety of clustering and classification approaches. Here we consider the problem of suitable measures of proximity between two variables or subjects as an indispensable basis for further cluster analysis. In general, clustering approaches are a useful tool for detecting structure and generating hypotheses about potential relationships in complex data. When searching for patterns in the data, there are two possible objectives: identifying groups of similar objects or subjects, or identifying groups of similar variables within the whole population or within subpopulations. Comparing individual genetic profiles, as well as genetic information across subpopulations, we discuss possible choices of similarity measures, in particular similarity measures based on counts of matches and mismatches.
    New matching coefficients with a more flexible weighting scheme are introduced to address a general problem in comparing SNP data: the large proportion of homozygous reference sequences relative to homozygous and heterozygous SNPs masks the agreements and differences of interest.
    Keywords: GENICA, single nucleotide polymorphism (SNP), sporadic breast cancer, similarity, Matching Coefficient, Flexible Matching Coefficient
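    Matching coefficients count agreements between two genotype profiles. The Python sketch below contrasts the simple matching coefficient with a hypothetical weighted variant in which a shared homozygous-reference genotype (coded 0) counts only `ref_weight`, illustrating why down-weighting the abundant reference state can expose the differences of interest. The weighting scheme and names are our assumptions; this is not the flexible matching coefficient introduced in the paper.

```python
import numpy as np


def simple_matching(a, b):
    """Fraction of loci at which two genotype vectors agree (0/1/2 coding)."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a == b))


def weighted_matching(a, b, ref_weight=0.2):
    """Hypothetical down-weighted matching coefficient.

    A locus where both subjects are homozygous reference (0) contributes
    ref_weight instead of 1 to both numerator and denominator, so the common
    'both reference' state does not dominate the similarity.
    """
    a, b = np.asarray(a), np.asarray(b)
    match = a == b
    weights = np.where(match & (a == 0), ref_weight, 1.0)
    total = np.sum(weights)
    return float(np.sum(weights * match) / total) if total > 0 else float("nan")
```

    With a = [0, 0, 0, 0, 1] and b = [0, 0, 0, 0, 2], simple matching gives 0.8, whereas `ref_weight=0.2` gives 0.8 / 1.8 ≈ 0.44, because the only informative locus disagrees.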