Simultaneous SNP identification in association studies with missing data
Association testing aims to discover the underlying relationship between
genotypes (usually Single Nucleotide Polymorphisms, or SNPs) and phenotypes
(attributes, or traits). The typically large data sets used in association
testing often contain missing values. Standard statistical methods either
impute the missing values using relatively simple assumptions, or delete them,
or both, which can generate biased results. Here we describe the Bayesian
hierarchical model BAMD (Bayesian Association with Missing Data). BAMD is a
Gibbs sampler, in which missing values are multiply imputed based upon all of
the available information in the data set. We estimate the parameters and prove
that updating one SNP at each iteration preserves the ergodic property of the
Markov chain, and at the same time improves computational speed. We also
implement a model selection option in BAMD, which enables potential detection
of SNP interactions. Simulations show that unbiased estimates of SNP effects
are recovered with missing genotype data. Also, we validate associations
between SNPs and a carbon isotope discrimination phenotype that were previously
reported using a family based method, and discover an additional SNP associated
with the trait. BAMD is available as an R-package from
http://cran.r-project.org/package=BAMD. Comment: Published at
http://dx.doi.org/10.1214/11-AOAS516 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
SNP Miniplexes for Individual Identification of Random-Bred Domestic Cats.
Phenotypic and genotypic characteristics of the cat can be obtained from single nucleotide polymorphism (SNP) analyses of fur. This study developed miniplexes using SNPs with high discriminating power for random-bred domestic cats, focusing on individual and phenotypic identification. Seventy-eight SNPs were investigated using a multiplex PCR followed by a fluorescently labeled single base extension (SBE) technique (SNaPshot®). The SNP miniplexes were evaluated for reliability, reproducibility, sensitivity, species specificity, detection limitations, and assignment accuracy. Six SNPplexes were developed containing 39 intergenic SNPs and 26 phenotypic SNPs, including a sex identification marker, ZFXY. The combined random match probability (cRMP) was 6.58 × 10^-19 across all Western cat populations and the likelihood ratio was 1.52 × 10^18. These SNPplexes can distinguish individual cats and their phenotypic traits, which could provide insight into crime reconstructions. A SNP database of 237 cats from 13 worldwide populations is now available for forensic applications.
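The relationship between the two figures reported above can be illustrated with a minimal sketch. The abstract does not say how the cRMP was computed; the code below assumes biallelic SNPs in Hardy-Weinberg equilibrium, independence across loci, and a likelihood ratio taken as the reciprocal of the cRMP (which is consistent with the reported values, since 1 / 6.58 × 10^-19 is about 1.52 × 10^18). The function names and the allele frequencies are hypothetical.

```python
from math import prod


def genotype_match_probability(p):
    """Chance that two random individuals share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg equilibrium with allele frequency p (an assumption,
    not something stated in the abstract).
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Two individuals match if both carry the same genotype.
    return sum(f * f for f in genotype_freqs)


def combined_rmp(allele_freqs):
    """Combined random match probability, assuming independent loci."""
    return prod(genotype_match_probability(p) for p in allele_freqs)


# Hypothetical allele frequencies for three SNPs (not taken from the study).
example_freqs = [0.5, 0.4, 0.3]
crmp = combined_rmp(example_freqs)
likelihood_ratio = 1.0 / crmp  # reciprocal of the match probability

print(f"cRMP = {crmp:.4f}, LR = {likelihood_ratio:.1f}")
```

Each additional independent SNP multiplies the cRMP by a factor below one, which is why a panel of several dozen markers can reach very small match probabilities.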
A new and versatile method for the successful conversion of AFLP™ markers into simple single locus markers
Genetic markers can be obtained efficiently by amplified fragment length polymorphism (AFLP) fingerprinting because no prior information on DNA sequence is required. However, the conversion of AFLP markers from complex fingerprints into simple single locus assays is perceived as problematic because DNA sequence information is required for the design of new locus-specific PCR primers. In addition, single nucleotide polymorphism (SNP) information is required to design an allele-specific assay. This paper describes a new and versatile method for the conversion of AFLP markers into simple assays. The protocol presented in this paper offers solutions for frequently occurring pitfalls and describes a procedure for the identification of the SNP responsible for the AFLP. By following this approach, a high success rate for the conversion of AFLP markers into locus-specific markers was obtained.
Identification of SNP interactions using logic regression
Interactions of single nucleotide polymorphisms (SNPs) are assumed to be responsible for complex diseases such as sporadic breast cancer. Important goals of studies concerned with such genetic data are thus to identify combinations of SNPs that lead to a higher risk of developing a disease and to measure the importance of these interactions. Many approaches based on classification methods such as CART and Random Forests allow the importance of single variables to be measured, but none of them can directly quantify the importance of combinations of variables. In this paper, we show how logic regression can be employed to identify SNP interactions that explain the disease status in a case-control study, and we propose two measures for quantifying the importance of these interactions for classification. These approaches are then applied, on the one hand, to simulated data sets and, on the other hand, to the SNP data of the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer.
Keywords: Single Nucleotide Polymorphism, Feature Selection, Variable Importance Measure, GENICA
Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao
In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease. (Author's abstract)
Comparison of TCGA and GENIE genomic datasets for the detection of clinically actionable alterations in breast cancer.
Whole exome sequencing (WES), targeted gene panel sequencing and single nucleotide polymorphism (SNP) arrays are increasingly used for the identification of actionable alterations that are critical to cancer care. Here, we compared The Cancer Genome Atlas (TCGA) and the Genomics Evidence Neoplasia Information Exchange (GENIE) breast cancer genomic datasets (array and next generation sequencing (NGS) data) in detecting genomic alterations in clinically relevant genes. We performed an in silico analysis to determine the concordance in the frequencies of actionable mutations and copy number alterations/aberrations (CNAs) in the two most common breast cancer histologies, invasive lobular and invasive ductal carcinoma. We found that targeted sequencing identified a larger number of mutational hotspots and clinically significant amplifications that would have been missed by WES and SNP arrays in many actionable genes such as PIK3CA, EGFR, AKT3, FGFR1, ERBB2, ERBB3 and ESR1. The striking differences between the number of mutational hotspots and CNAs generated from these platforms highlight a number of factors that should be considered in the interpretation of array and NGS-based genomic data for precision medicine. Targeted panel sequencing was preferable to WES to define the full spectrum of somatic mutations present in a tumor.
Evidence of uneven selective pressure on different subsets of the conserved human genome: implications for the significance of intronic and intergenic DNA
Peer reviewed
Similarity Measures for Clustering SNP Data
The issue of suitable similarity measures for a particular kind of genetic data, so-called SNP data, arises from the GENICA (Interdisciplinary Study Group on Gene Environment Interaction and Breast Cancer in Germany) case-control study of sporadic breast cancer. The GENICA study aims to investigate the influence and interaction of single nucleotide polymorphic (SNP) loci and exogenous risk factors. A single nucleotide polymorphism is a point mutation that is present in at least 1% of a population. SNPs are the most common form of human genetic variation. In particular, we consider 65 SNP loci and 2 insertions of longer sequences in genes involved in the metabolism of hormones, xenobiotics and drugs as well as in the repair of DNA and signal transduction. Assuming that these single nucleotide changes may lead, for instance, to altered enzymes or to a reduced or enhanced amount of the original enzymes (with each alteration alone having minor effects), we aim to detect combinations of SNPs that under certain environmental conditions increase the risk of sporadic breast cancer. The search for patterns in the present data set may be performed by a variety of clustering and classification approaches. We consider here the problem of suitable measures of proximity of two variables or subjects as an indispensable basis for a further cluster analysis. Generally, clustering approaches are a useful tool to detect structures and to generate hypotheses about potential relationships in complex data situations. When searching for patterns in the data, there are two possible objectives: the identification of groups of similar objects or subjects, or the identification of groups of similar variables within the whole population or within subpopulations. Comparing the individual genetic profiles as well as the genetic information across subpopulations, we discuss possible choices of similarity measures, in particular similarity measures based on the counts of matches and mismatches.
New matching coefficients are introduced with a more flexible weighting scheme to account for the general problem of comparing SNP data: the large proportion of homozygous reference sequences relative to the homo- and heterozygous SNPs masks the agreements and differences of interest.
Keywords: GENICA, single nucleotide polymorphism (SNP), sporadic breast cancer, similarity, Matching Coefficient, Flexible Matching Coefficient
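The masking problem described above can be illustrated with a small sketch of match-count similarity. This is not the flexible matching coefficient defined in the paper, whose exact form is not given in the abstract; it is a generic weighted coefficient under stated assumptions. Genotypes are assumed to be coded 0, 1, 2 (number of variant alleles, with 0 meaning homozygous reference), and the function names and the weights `w_ref` and `w_var` are hypothetical.

```python
def match_counts(a, b):
    """Count shared-reference matches, shared-variant matches, and mismatches."""
    if len(a) != len(b):
        raise ValueError("genotype profiles must have the same length")
    m_ref = sum(1 for x, y in zip(a, b) if x == y == 0)
    m_var = sum(1 for x, y in zip(a, b) if x == y and x != 0)
    mismatches = len(a) - m_ref - m_var
    return m_ref, m_var, mismatches


def simple_matching(a, b):
    """Simple matching coefficient: share of loci with identical genotypes."""
    m_ref, m_var, mismatches = match_counts(a, b)
    total = m_ref + m_var + mismatches
    return (m_ref + m_var) / total if total else 0.0


def weighted_matching(a, b, w_ref=0.25, w_var=1.0):
    """Weighted variant that down-weights matches on the homozygous reference.

    The weights are illustrative assumptions, not values from the paper.
    """
    m_ref, m_var, mismatches = match_counts(a, b)
    weighted = w_ref * m_ref + w_var * m_var
    denom = weighted + mismatches
    return weighted / denom if denom else 0.0


# Two hypothetical subjects: mostly homozygous reference, one real disagreement.
subject_a = [0, 0, 0, 0, 1, 2]
subject_b = [0, 0, 0, 0, 1, 0]

print(simple_matching(subject_a, subject_b))    # dominated by reference matches
print(weighted_matching(subject_a, subject_b))  # disagreement counts for more
```

With equal weights the two subjects look very similar because four of six loci are shared reference genotypes; reducing the weight on those matches lets the single disagreement have a larger influence on the score.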
