48 research outputs found

    Linkage analysis of HLA and candidate genes for celiac disease in a North American family-based study

    BACKGROUND: Celiac disease has a strong genetic association with HLA. However, this association only explains approximately half of the sibling risk for celiac disease. Therefore, other genes must be involved in susceptibility to celiac disease. We tested for linkage to genes or loci that could play a role in pathogenesis of celiac disease. METHODS: DNA samples, from members of 62 families with a minimum of two cases of celiac disease, were genotyped at HLA and at 13 candidate gene regions, including CD4, CTLA4, four T-cell receptor regions, and 7 insulin-dependent diabetes regions. Two-point and multipoint heterogeneity LOD (HLOD) scores were examined. RESULTS: The highest two-point and multipoint HLOD scores were obtained in the HLA region, with a two-point HLOD of 3.1 and a multipoint HLOD of 5.0. For the candidate genes, we found no evidence for linkage. CONCLUSIONS: Our significant evidence of linkage to HLA replicates the known linkage and association of HLA with CD. In our families, likely candidate genes did not explain the susceptibility to celiac disease
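
The heterogeneity LOD mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used admixture formulation, in which a proportion alpha of families is linked and the HLOD is the maximum over alpha of the summed per-family log-likelihood ratios. The abstract does not say which software or exact model the authors used, so the function name `hlod`, the grid search, and the per-family LOD values in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hlod(family_lods: Sequence[float], steps: int = 100) -> Tuple[float, float]:
    """Return (HLOD, alpha) maximised over a grid of alpha values in [0, 1].

    family_lods holds one LOD score per family, all evaluated at the same
    map position. Assumed admixture model (not taken from the abstract):
        HLOD(alpha) = sum_i log10(alpha * 10**LOD_i + 1 - alpha)
    """
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for k in range(1, steps + 1):
        alpha = k / steps
        score = sum(
            math.log10(alpha * 10.0 ** lod + 1.0 - alpha) for lod in family_lods
        )
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha
```

With hypothetical inputs such as `hlod([2.0, -2.0])`, the plain summed LOD is zero, while the HLOD is positive at an intermediate alpha, which is how the statistic allows for only a subset of families being linked.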

    Supplementing High-Density SNP Microarrays for Additional Coverage of Disease-Related Genes: Addiction as a Paradigm

    Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions
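
The LD tagging step described in this abstract can be sketched as follows. The sketch assumes phased haplotypes coded 0/1 and treats a SNP as tagged when its r² with at least one array SNP reaches a threshold. The threshold of 0.8 is a common convention rather than a value stated in the abstract, and the function names are hypothetical.

```python
from typing import Iterable, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(x)
    p_a = sum(x) / n  # frequency of allele 1 at the first SNP
    p_b = sum(y) / n  # frequency of allele 1 at the second SNP
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def is_tagged(
    target: Sequence[int],
    array_snps: Iterable[Sequence[int]],
    threshold: float = 0.8,  # assumed cutoff, not taken from the abstract
) -> bool:
    """True if any array SNP is in LD with the target at r^2 >= threshold."""
    return any(r_squared(target, snp) >= threshold for snp in array_snps)
```

A SNP for which `is_tagged` returns False against every SNP on a given platform would be a candidate for supplementary genotyping.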

    Database resources of the National Center for Biotechnology Information

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov

    MS

    Celiac disease (CD) is an immunologically mediated inflammatory disorder of the small intestine. CD is induced by dietary exposure to the gluten proteins found in wheat, rye and barley. This disorder affects as many as 1 in 300 people of European descent, in whom it is associated with the HLA-DQA1*0501 - DQB1*0201 alleles. Although the biochemical mechanism of the HLA contribution to the pathogenesis of CD is not fully understood, the linkage of this locus to CD is established. This thesis describes my research on two hypotheses: 1) HLA typing on an automated DNA sequencing machine can provide typing that is as reliable as typing performed by traditional methods, and 2) genes other than HLA exist that contribute to predisposition to CD. To test hypothesis 1, I developed a computerized HLA typing strategy that is a modification of the sequence-specific primer method. Primers were designed to preferentially amplify DNA fragments of the generic allelic groups of the DQA1 and DQB1 loci. Only three PCR reactions are required for low-resolution typing of DQA1 and DQB1. Use of differently labeled primers enabled genotyping for both loci in a single gel lane or capillary tube. Automated allele assignments were determined by a standard genotype allele-calling program, based on DNA migration distance through a polyacrylamide gel. Accuracy of this method was greater than 96% for both loci. To test hypothesis 2, I performed a comprehensive four-stage linkage study on 113 celiac families. The analyses comprised a candidate gene search, a genome-wide search, follow-up genotyping, and stratification of families by HLA, using the computer analysis software LINKAGE and GENEHUNTER. Multipoint evidence for linkage was minimal in this study. However, several genetic markers showed above-nominal evidence for linkage to CD in 2-point analysis. When the families were stratified by HLA, HLOD scores increased for several of these markers, two of which provide suggestive evidence for linkage. The work presented here represents the integration of molecular genetics and medical informatics to HLA-type CD patients. Furthermore, we have preliminary evidence that non-HLA genes may be involved in the pathogenesis of celiac disease.

    Quickly identifying identical and closely related subjects in large databases using genotype data

    Genome-wide association studies (GWAS) usually rely on the assumption that different samples are not from closely related individuals. Detection of duplicates and close relatives becomes more difficult both statistically and computationally when one wants to combine datasets that may have been genotyped on different platforms. The dbGaP repository at the National Center of Biotechnology Information (NCBI) contains datasets from hundreds of studies with over one million samples. There are many duplicates and closely related individuals both within and across studies from different submitters. Relationships between studies cannot always be identified by the submitters of individual datasets. To aid in curation of dbGaP, we developed a rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to detect duplicates and closely related samples, even when the sets of genotyped markers differ and the DNA strand orientations are unknown. GRAF extracts genotypes of 10,000 informative and independent SNPs from genotype datasets obtained using different methods, and implements quick algorithms that enable it to find all of the duplicate pairs from more than 880,000 samples within and across dbGaP studies in less than two hours. In addition, GRAF uses two statistical metrics called All Genotype Mismatch Rate (AGMR) and Homozygous Genotype Mismatch Rate (HGMR) to determine subject relationships directly from the observed genotypes, without estimating probabilities of identity by descent (IBD), or kinship coefficients, and compares the predicted relationships with those reported in the pedigree files. We implemented GRAF in a freely available C++ program of the same name. In this paper, we describe the methods in GRAF and validate the usage of GRAF on samples from the dbGaP repository. Other scientists can use GRAF on their own samples and in combination with samples downloaded from dbGaP.
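
The two mismatch metrics named in this abstract can be illustrated with a short sketch. The abstract does not define them, so the definitions below are assumptions: AGMR is taken as the fraction of mismatched genotypes among SNPs typed in both samples, and HGMR as the fraction of mismatches among SNPs where both samples are homozygous. The genotype coding and the function name `mismatch_rates` are hypothetical, and this is not the GRAF C++ implementation.

```python
from typing import Optional, Sequence, Tuple

# Assumed coding: 0 or 2 = homozygous, 1 = heterozygous, None = missing.
Genotype = Optional[int]


def mismatch_rates(
    a: Sequence[Genotype], b: Sequence[Genotype]
) -> Tuple[float, float]:
    """Return (AGMR, HGMR) for two samples typed at the same ordered SNPs."""
    all_n = all_mis = hom_n = hom_mis = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either sample
        all_n += 1
        all_mis += ga != gb
        if ga != 1 and gb != 1:  # both samples homozygous at this SNP
            hom_n += 1
            hom_mis += ga != gb
    agmr = all_mis / all_n if all_n else float("nan")
    hgmr = hom_mis / hom_n if hom_n else float("nan")
    return agmr, hgmr
```

Under these assumed definitions, duplicate samples give values near zero for both metrics. A parent-offspring pair shares at least one allele at every SNP, so apart from genotyping error it should show no opposite-homozygous mismatches (HGMR near zero) while AGMR stays well above zero, which is the kind of separation that lets relationships be read directly from observed genotypes.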

    GRAF-pop: A Fast Distance-Based Method To Infer Subject Ancestry from Multiple Genotype Datasets Without Principal Components Analysis

    Inferring subject ancestry using genetic data is an important step in genetic association studies, required for dealing with population stratification. It has become more challenging to infer subject ancestry quickly and accurately since large amounts of genotype data, collected from millions of subjects by thousands of studies using different methods, are accessible to researchers from repositories such as the database of Genotypes and Phenotypes (dbGaP) at the National Center for Biotechnology Information (NCBI). Study-reported populations submitted to dbGaP are often not harmonized across studies or may be missing. Widely-used methods for ancestry prediction assume that most markers are genotyped in all subjects, but this assumption is unrealistic if one wants to combine studies that used different genotyping platforms. To provide ancestry inference and visualization across studies, we developed a new method, GRAF-pop, of ancestry prediction that is robust to missing genotypes and allows researchers to visualize predicted population structure in color and in three dimensions. When genotypes are dense, GRAF-pop is comparable in quality and running time to existing ancestry inference methods EIGENSTRAT, FastPCA, and FlashPCA2, all of which rely on principal components analysis (PCA). When genotypes are not dense, GRAF-pop gives much better ancestry predictions than the PCA-based methods. GRAF-pop employs basic geometric and probabilistic methods; the visualized ancestry predictions have a natural geometric interpretation, which is lacking in PCA-based methods. Since February 2018, GRAF-pop has been successfully incorporated into the dbGaP quality control process to identify inconsistencies between study-reported and computationally predicted populations and to provide harmonized population values in all new dbGaP submissions amenable to population prediction, based on marker genotypes. 
Plots of summary population predictions produced by GRAF-pop are available on dbGaP study pages, and the software is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/Software.cgi

    Probabilities of different IBD and IBS states for different relationships.

    Probabilities of different IBD and IBS states for different relationships.