82 research outputs found

    Merging microsatellite data: enhanced methodology and software to combine genotype data for linkage and association analysis

    Background: Correctly merging data sets that have been independently genotyped can increase statistical power in linkage and association studies. However, alleles from microsatellite data sets genotyped with different experimental protocols or platforms cannot be accurately matched using base-pair size information alone. In a previous publication we introduced a statistical model for merging microsatellite data by matching allele frequencies between data sets. These methods are implemented in our software MicroMerge version 1 (v1). While MicroMerge v1 output can be analyzed by some genetic analysis programs, many programs cannot analyze alignments that do not match alleles one-to-one between data sets. A consequence of such alignments is that codominant genotypes must often be analyzed as phenotypes. In this paper we describe several extensions that are implemented in MicroMerge version 2 (v2). Results: Notably, MicroMerge v2 includes a new one-to-one alignment option that creates merged pedigree and locus files that can be handled by most genetic analysis software. Other features in MicroMerge v2 enhance the following aspects of control: 1) optimizing the algorithm for different merging scenarios, such as data sets with very different sample sizes or multiple data sets, 2) merging small data sets when a reliable set of allele frequencies is available, and improving 3) the quantity and 4) the quality of merged data. We present results from simulated and real microsatellite genotype data sets, and conclude with an association analysis of three familial dyslipidemia (FD) study samples genotyped at different laboratories. Independent analysis of each FD data set did not yield consistent results, but analysis of the merged data sets identified strong association at locus D11S2002. Conclusion: The MicroMerge v2 features will enable merging for a variety of genotype data sets, which in turn will facilitate meta-analyses for powering association analysis.

    Integrated Weighted Gene Co-expression Network Analysis with an Application to Chronic Fatigue Syndrome

    Background: Systems biologic approaches such as Weighted Gene Co-expression Network Analysis (WGCNA) can effectively integrate gene expression and trait data to identify pathways and candidate biomarkers. Here we show that the additional inclusion of genetic marker data allows one to characterize network relationships as causal or reactive in a chronic fatigue syndrome (CFS) data set. Results: We combine WGCNA with genetic marker data to identify a disease-related pathway and its causal drivers, an analysis which we refer to as "Integrated WGCNA" or IWGCNA. Specifically, we present the following IWGCNA approach: 1) construct a co-expression network, 2) identify trait-related modules within the network, 3) use a trait-related genetic marker to prioritize genes within the module, 4) apply an integrated gene screening strategy to identify candidate genes and 5) carry out causality testing to verify and/or prioritize results. By applying this strategy to a CFS data set consisting of microarray, SNP and clinical trait data, we identify a module of 299 highly correlated genes that is associated with CFS severity. Our integrated gene screening strategy results in 20 candidate genes. We show that our approach yields biologically interesting genes that function in the same pathway and are causal drivers for their parent module. We use a separate data set to replicate findings and use Ingenuity Pathways Analysis software to functionally annotate the candidate gene pathways. Conclusion: We show how WGCNA can be combined with genetic marker data to identify disease-related pathways and the causal drivers within them. The systems genetics approach described here can easily be used to generate testable genetic hypotheses in other complex disease studies.
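As an illustration of step 1 above (constructing a co-expression network), the following is a minimal sketch of an unsigned, soft-thresholded adjacency, where each pair of genes is connected with strength |cor|^beta. It is not the authors' code: the function names, the toy expression values, and the choice beta = 6 are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (zero variance would divide by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    `expr` maps a gene name to its expression values across samples.
    `beta` is the soft-thresholding power (6 is an assumed example value).
    Returns a dict keyed by (gene_i, gene_j); the diagonal is set to 1.
    """
    genes = list(expr)
    adjacency = {}
    for g in genes:
        for h in genes:
            if g == h:
                adjacency[(g, h)] = 1.0
            else:
                adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency


if __name__ == "__main__":
    # Hypothetical toy data: three genes measured in four samples.
    toy = {
        "g1": [1, 2, 3, 4],
        "g2": [2, 4, 6, 8],
        "g3": [1, -1, 1, -1],
    }
    for pair, weight in soft_threshold_adjacency(toy).items():
        print(pair, round(weight, 4))
```

Raising the absolute correlation to a power keeps strong correlations close to 1 while pushing weak ones toward 0, so the network stays weighted rather than being cut at a hard threshold. Module detection and the later IWGCNA steps are not covered by this sketch.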

    Towards a genetic linkage map of the California condor, an endangered New World vulture species

    Simple Summary: The California condor is a critically endangered representative of New World vultures maintained under restoration and reintroduction programs. Within a California condor genome research project, we made a preliminary step toward a genetic linkage map for this iconic bird species. The respective linkage data were generated using a panel of 121 condors. The condors were genotyped for 123 polymorphic microsatellite markers. The condor genotyping and mapping results are a useful addition to the previously obtained physical and cytogenetic maps and can be further utilized in condor genome sequence assembly. Abstract: The development of a linkage map is an important component for promoting genetic and genomic studies in California condors, an endangered New World vulture species. Using a set of designed anonymous microsatellite markers, we genotyped a reference condor population involving 121 individuals. After marker validation and genotype filtering, the genetic linkage analysis was performed using 123 microsatellite loci. This resulted in the identification of 15 linkage groups/subgroups that formed a first-generation condor genetic map, while no markers linked to a lethal chondrodystrophy mutation were found. A panel of polymorphic markers that is instrumental in molecular parentage diagnostics and other genetic studies in the California condor was selected. Further condor conservation genomics research will be focused on updating the linkage map and integrating it with cytogenetic and BAC-based physical maps and ultimately with the genome sequence assembly. (This article belongs to the Special Issue Vulture Ecology and Conservation)

    OPENMENDEL: A Cooperative Programming Project for Statistical Genetics

    Statistical methods for genomewide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project. Comment: 16 pages, 2 figures, 2 tables

    Facultative parthenogenesis in California condors

    Parthenogenesis is a relatively rare event in birds, documented in unfertilized eggs from columbid, galliform, and passerine females with no access to males. In the critically endangered California condor, parentage analysis using polymorphic microsatellite loci has identified two instances of parthenogenetic development from the eggs of two females in the captive breeding program, each continuously housed with a reproductively capable male with whom she had produced offspring. Paternal genetic contribution to the two chicks was excluded. Both parthenotes possessed the expected male ZZ sex chromosomes and were homozygous for all evaluated markers inherited from their dams. These findings represent the first molecular marker-based identification of facultative parthenogenesis in an avian species, notably of females in regular contact with fertile males, and add to the phylogenetic breadth of vertebrate taxa documented to have reproduced asexually.

    Single Nucleotide Polymorphisms of 8 Inflammation-related Genes and their Associations with Smoking-related Cancers

    Tobacco smoke and its metabolites are carcinogens that increase tissue oxidative stress and induce target tissue inflammation. We hypothesized that genetic variation of inflammatory pathway genes plays a role in tobacco-related carcinogenesis and is modified by tobacco smoking. We evaluated the association of 12 single nucleotide polymorphisms of 8 inflammation-related genes with tobacco-related cancers (lung, oropharynx, larynx, esophagus, stomach, liver, bladder, and kidney) using 3 case-control studies from: Los Angeles (population-based; 611 lung and 553 upper aero-digestive tract cancer cases and 1,040 controls), Taixing, China (population-based; 218 esophagus, 206 stomach, 204 liver cancer cases, and 415 controls), and Memorial Sloan-Kettering Cancer Center (hospital-based; 227 bladder cancer cases and 211 controls). After adjusting for age, education, ethnicity, gender, and tobacco smoking, IL10 rs1800871 was inversely associated with oropharyngeal cancer (CT+TT vs. CC adjusted odds ratio [aOR]: 0.69, 95% confidence interval [CI]: 0.50-0.95), and was positively associated with lung cancer among never smokers (TT vs. CT+CC aOR: 2.5, 95% CI: 1.3-5.1) and inversely with oropharyngeal cancer among ever smokers (CT+TT vs. CC aOR: 0.63, 95% CI: 0.41-0.95). Among all pooled never smokers (588 cases and 816 controls), TNF rs1799964 was inversely associated with smoking-related cancer (CC vs. CT+TT aOR: 0.36, 95% CI: 0.17-0.77). Bayesian correction for multiple comparisons suggests that chance is unlikely to explain our findings (although epigenetic mechanisms may be in effect), which support our hypotheses, suggesting that IL10 rs1800871 is a susceptibility marker for oropharyngeal and lung cancers, and that TNF rs1799964 is associated with smoking-related cancers among never smokers. © 2010 UICC

    Histone H3.3 beyond cancer: Germline mutations in Histone 3 Family 3A and 3B cause a previously unidentified neurodegenerative disorder in 46 patients

    Although somatic mutations in Histone 3.3 (H3.3) are well-studied drivers of oncogenesis, the role of germline mutations remains unreported. We analyze 46 patients bearing de novo germline mutations in histone 3 family 3A (H3F3A) or H3F3B with progressive neurologic dysfunction and congenital anomalies without malignancies. Molecular modeling of all 37 variants demonstrated clear disruptions in interactions with DNA, other histones, and histone chaperone proteins. Analysis of patient histone posttranslational modifications (PTMs) revealed notably aberrant local PTM patterns distinct from the somatic lysine mutations that cause global PTM dysregulation. RNA sequencing on patient cells demonstrated up-regulated gene expression related to mitosis and cell division, and cellular assays confirmed an increased proliferative capacity. A zebrafish model showed craniofacial anomalies and a defect in Foxd3-derived glia. These data suggest that the mechanisms of germline mutations are distinct from those of cancer-associated somatic histone mutations but may converge on control of cell proliferation.

    Detection and Integration of Genotyping Errors in Statistical Genetics

    Detection of genotyping errors and integration of such errors in statistical analysis are relatively neglected topics, given their importance in gene mapping. A few inopportunely placed errors, if ignored, can tremendously affect evidence for linkage. The present study takes a fresh look at the calculation of pedigree likelihoods in the presence of genotyping error. To accommodate genotyping error, we present extensions to the Lander-Green-Kruglyak deterministic algorithm for small pedigrees and to the Markov-chain Monte Carlo stochastic algorithm for large pedigrees. These extensions can accommodate a variety of error models and refrain from simplifying assumptions, such as allowing, at most, one error per pedigree. In principle, almost any statistical genetic analysis can be performed taking errors into account, without actually correcting or deleting suspect genotypes. Three examples illustrate the possibilities. These examples make use of the full pedigree data, multiple linked markers, and a prior error model. The first example is the estimation of genotyping error rates from pedigree data. The second, and currently most useful, example is the computation of posterior mistyping probabilities. These probabilities cover both Mendelian-consistent and Mendelian-inconsistent errors. The third example is the selection of the true pedigree structure connecting a group of people from among several competing pedigree structures. Paternity testing and twin zygosity testing are typical applications.
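To make the distinction between Mendelian-consistent and Mendelian-inconsistent errors concrete, the sketch below checks whether a child's genotype at one autosomal marker could have been inherited from the two stated parents. It is only an illustration under simple assumptions (a parent-child trio, unordered pairs of allele labels, no missing data); the function name is hypothetical, and it does not implement the likelihood-based posterior mistyping probabilities described in the abstract.

```python
def mendelian_consistent(child, father, mother):
    """Return True if the child's genotype is compatible with the parents.

    Each genotype is a pair of allele labels, e.g. (1, 3). The child must
    receive one allele from the father and the other from the mother.
    """
    first, second = child
    return (first in father and second in mother) or (
        second in father and first in mother
    )


if __name__ == "__main__":
    # Hypothetical genotypes at a single microsatellite marker.
    father = (1, 3)
    mother = (2, 2)
    print(mendelian_consistent((1, 2), father, mother))  # compatible trio
    print(mendelian_consistent((1, 3), father, mother))  # no maternal allele
```

A check like this can only flag Mendelian-inconsistent genotypes. An error that still leaves the trio compatible passes unnoticed, which is why the abstract stresses methods that use the full pedigree, linked markers, and an error model.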

    Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data.

    Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.