210 research outputs found

    A Strategy analysis for genetic association studies with known inbreeding

    Background: Association studies aim to identify the genetic variants related to a specific disease through statistical multiple hypothesis testing or segregation analysis in pedigrees. Such studies have been very successful for Mendelian monogenic disorders but less successful in identifying genetic variants related to complex diseases, whose onset depends on interactions between different genes and the environment. Current technology makes it possible to genotype more than a million markers, and this number has been increasing rapidly in recent years with imputation based on template sets and whole genome sequencing. This type of data introduces a great amount of noise into the statistical analysis and usually requires a large number of samples. Current methods seldom take into account gene-gene and gene-environment interactions, which are fundamental, especially in complex diseases. In this paper we propose to use a non-parametric additive model, which accounts for interactions of unknown order, to detect the genetic variants related to diseases. Although this is not new to the current literature, we show that in an isolated population, where the most closely related subjects also share most of their genetic code, the use of additive models may be improved if the available genealogical tree is taken into account. Specifically, we form a sample of cases and controls with the highest inbreeding by means of the Hungarian method, and estimate the set of genes/environmental variables associated with the disease by means of Random Forest. Results: We have evidence, from statistical theory, simulations and two applications, that we have built a suitable procedure to eliminate stratification between cases and controls and that it also has enough precision in identifying genetic variants responsible for a disease. This procedure has been successfully applied to beta-thalassemia, which is a well-known Mendelian disease, and also to common asthma, where we have identified candidate genes that underlie susceptibility to asthma. Some of these candidate genes have also been found to be related to common asthma in the current literature. Conclusions: The data analysis approach, based on selecting the most closely related cases and controls along with the Random Forest model, is a powerful tool for detecting genetic variants associated with a disease in isolated populations. Moreover, this method also provides a prediction model that is accurate in estimating the unknown disease status and that can be generally used to build kit tests for a wide class of Mendelian diseases
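
    The case-control matching step described above can be illustrated with a small sketch. It assumes that relatedness is summarized as a matrix of pairwise kinship scores (the abstract does not specify the score), and it finds the one-to-one pairing with the highest total score by exhaustive search. The exhaustive search is a stand-in for the Hungarian method, which solves the same assignment problem in polynomial time; the function name `best_pairing` and the example values are hypothetical.

```python
from itertools import permutations


def best_pairing(kinship):
    """Pair each case with a distinct control so that total kinship is maximal.

    kinship[i][j] is an assumed relatedness score between case i and control j.
    Exhaustive search is used here only for illustration; it is feasible for
    tiny inputs, whereas the Hungarian method scales to realistic sample sizes.
    """
    n_cases = len(kinship)
    n_controls = len(kinship[0])
    if n_cases > n_controls:
        raise ValueError("need at least as many controls as cases")

    best_score = float("-inf")
    best_perm = None
    # Each permutation assigns control perm[i] to case i.
    for perm in permutations(range(n_controls), n_cases):
        score = sum(kinship[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(enumerate(best_perm)), best_score


# Hypothetical scores for 2 cases (rows) and 3 candidate controls (columns).
example = [
    [0.25, 0.05, 0.01],
    [0.20, 0.125, 0.02],
]
pairs, total = best_pairing(example)
print(pairs, total)
```

    For real data, an assignment solver such as `scipy.optimize.linear_sum_assignment` (called with `maximize=True`) would replace the exhaustive loop.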

    MATCHCLIP: locate precise breakpoints for copy number variation using CIGAR string by matching soft clipped reads

    Copy number variations (CNVs) are associated with many complex diseases. Next generation sequencing data enable one to identify precise CNV breakpoints to better understand the underlying molecular mechanisms and to design more efficient assays. Using the CIGAR strings of the reads, we develop a method that can identify the exact CNV breakpoints, and in cases when the breakpoints are in a repeated region, the method reports a range where the breakpoints can slide. Our method identifies the breakpoints of a CNV using both the positions and CIGAR strings of the reads that cover breakpoints of a CNV. A read with a long soft clipped part (denoted as S in CIGAR) at its 3′ (right) end can be used to identify the 5′ (left) side of the breakpoints, and a read with a long S part at the 5′ end can be used to identify the breakpoint at the 3′ side. To ensure both types of reads cover the same CNV, we require the overlapped common string to include both of the soft clipped parts. When a CNV starts and ends in the same repeated regions, its breakpoints are not unique, in which case our method reports the leftmost positions for the breakpoints and a range within which the breakpoints can be incremented without changing the variant sequence. We have implemented the methods in a C++ package intended for the current Illumina MiSeq and HiSeq platforms for both whole genome and exon-sequencing. Our simulation studies have shown that our method compares favorably with other similar methods in terms of true discovery rate, false positive rate and breakpoint accuracy. Our results from a real application have shown that the detected CNVs are consistent with zygosity and read depth information. The software package is available at http://statgene.med.upenn.edu/softprog.html
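
    The idea of reading a candidate breakpoint off a soft clipped read can be sketched as follows. This is not the MATCHCLIP implementation (which is a C++ package); it is a minimal Python illustration that assumes SAM conventions (1-based leftmost aligned position, standard CIGAR operators). The function names and the `min_clip` threshold of 10 bases are hypothetical choices, and the step that matches the two types of reads against each other is not shown.

```python
import re

_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # operators that advance along the reference


def parse_cigar(cigar):
    """Split a CIGAR string such as '70M30S' into (length, operator) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def soft_clip_breakpoint(pos, cigar, min_clip=10):
    """Return a candidate breakpoint implied by a long soft clip, or None.

    pos is the 1-based leftmost aligned reference position of the read.
    A long clip at the right end marks the left side of a breakpoint (the
    last aligned base); a long clip at the left end marks the right side
    (the first aligned base). If both ends are clipped, the right-end clip
    is reported first, which is a simplification made for this sketch.
    """
    # Hard clips are not part of the read sequence, so ignore them here.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return None
    ref_len = sum(n for n, op in ops if op in _REF_CONSUMING)
    first_len, first_op = ops[0]
    last_len, last_op = ops[-1]
    if last_op == "S" and last_len >= min_clip:
        return ("left", pos + ref_len - 1)
    if first_op == "S" and first_len >= min_clip:
        return ("right", pos)
    return None


print(soft_clip_breakpoint(1000, "70M30S"))  # ('left', 1069)
print(soft_clip_breakpoint(5000, "30S70M"))  # ('right', 5000)
print(soft_clip_breakpoint(2000, "100M"))    # None
```

    Short clips are ignored because a few mismatched bases at a read end are more likely sequencing noise than evidence of a breakpoint.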

    Haplotype affinities resolve a major component of goat (Capra hircus) MtDNA D-loop diversity and reveal specific features of the Sardinian stock

    Goat mtDNA haplogroup A is a poorly resolved lineage absorbing most of the overall diversity and is found in locations as distant as Eastern Asia and Southern Africa. Its phylogenetic dissection would cast light on an important portion of the spread of goat breeding. The aims of this work were 1) to provide an operational definition of meaningful mtDNA units within haplogroup A, 2) to investigate the mechanisms underlying the maintenance of diversity by considering the modes of selection operated by breeders and 3) to identify the peculiarities of Sardinian mtDNA types. We sequenced the mtDNA D-loop in a large sample of animals (1,591) which represents a non-trivial quota of the entire goat population of Sardinia. We found that Sardinia mirrors a large quota of mtDNA diversity of Western Eurasia in the number of variable sites, their mutational pattern and allele frequency. By using Bayesian analysis, a distance-based tree and a network analysis, we recognized demographically coherent groups of sequences identified by particular subsets of the variable positions. The results showed that this assignment system could be reproduced in other studies, capturing the greatest part of haplotype diversity. We identified haplotype groups overrepresented in Sardinian goats as a result of founder effects. We found that breeders maintain diversity of matrilines most likely through equalization of the reproductive potential. Moreover, the relevant amount of inter-farm mtDNA diversity found does not increase proportionally with distance. Our results illustrate the effects of breeding practices on the composition of maternal gene pool and identify mtDNA types that may be considered in projects aimed at retrieving the maternal component of the oldest breeds of Sardinia.

    Microsatellites and SNPs linkage analysis in a Sardinian genetic isolate confirms several essential hypertension loci previously identified in different populations

    Background. A multiplicity of study designs such as gene candidate analysis, genome wide search (GWS) and, recently, whole genome association studies have been employed for the identification of the genetic components of essential hypertension (EH). Several genome-wide linkage studies of EH and blood pressure-related phenotypes demonstrate that there is no single locus with a major effect, while several genomic regions likely to contain EH-susceptibility loci were validated by multiple studies. Methods. We carried out the clinical assessment of the entire adult population in a Sardinian village (Talana) and we analyzed 16 selected families with 62 hypertensive subjects out of 267 individuals. We carried out a double GWS using a set of 902 uniformly spaced microsatellites and a high-density SNP map on the same group of families. Results. Three loci were identified by both microsatellite and SNP scans, and the linkage results obtained showed a remarkable degree of similarity. These loci were identified on chromosome 2q24, 11q23.1–25 and 13q14.11–21.33. These findings are further supported by the broad description of these loci in the literature in association with EH or related phenotypes. Bioinformatic investigation of these loci shows several potential EH candidate genes, several of which are already associated with blood pressure regulation pathways. Conclusion. Our search for major EH genetic susceptibility factors shows that EH in the genetic isolate of Talana is due to the contribution of several genes contained in loci identified and replicated by earlier findings in different human populations

    Meta-analysis of gene–environment-wide association scans accounting for education level identifies additional loci for refractive error

    This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). Myopia is the most common human eye disorder and it results from complex genetic and environmental causes. The rapidly increasing prevalence of myopia poses a major public health challenge. Here, the CREAM consortium performs a joint meta-analysis to test single-nucleotide polymorphism (SNP) main effects and SNP × education interaction effects on refractive error in 40,036 adults from 25 studies of European ancestry and 10,315 adults from 9 studies of Asian ancestry. In European ancestry individuals, we identify six novel loci (FAM150B-ACP1, LINC00340, FBN1, DIS3L-MAP2K1, ARID2-SNAT1 and SLC14A2) associated with refractive error. In Asian populations, three genome-wide significant loci AREG, GABRR1 and PDE10A also exhibit strong interactions with education (P < 8.5 × 10^-5), whereas the interactions are less evident in Europeans. The discovery of these loci represents an important advance in understanding how gene and environment interactions contribute to the heterogeneity of myopia

    Browsing Isolated Population Data

    BACKGROUND: In our studies of genetically isolated populations in a remote mountain area in the center of Sardinia (Italy), we found that 80–85% of the inhabitants of each village belong to a single huge pedigree with families strictly connected to each other through hundreds of loops. Moreover, intermarriages between villages join pedigrees of different villages through links that make family trees even more complicated. Unfortunately, none of the commonly used pedigree drawing tools is able to draw the complete pedigree, although it is commonly accepted that the visual representation of families is very important, as it helps researchers identify clusters of inherited traits and genotypes. This representation issue compels researchers to work with subsets extracted from the overall genealogy, causing a serious loss of information on familial relationships. To visually explore such complex pedigrees, we developed PedNavigator, a browser for genealogical databases well suited to genetic studies. RESULTS: PedNavigator is useful for genealogical research because it can represent family relations between persons and allows visual verification of the links during family history reconstruction. In genetic studies, it helps to follow the propagation of a specific set of genetic markers (haplotype), or to select people for linkage analysis, by showing relations between the various branches of a family tree of affected subjects. AVAILABILITY: PedNavigator is an application integrated into a Framework designed to handle data for human genetic studies based on the Oracle platform. To make PedNavigator available to users who do not have the required informatics infrastructure or systems, we developed PedNavigator Lite, which has mainly the same features as the integrated version and is based on the MySQL database server. This version is free for academic users, and it is available for download from our sit

    Genome-wide meta-analysis associates HLA-DQA1/DRB1 and LPA and lifestyle factors with human longevity

    Genomic analysis of longevity offers the potential to illuminate the biology of human aging. Here, using genome-wide association meta-analysis of 606,059 parents' survival, we discover two regions associated with longevity (HLA-DQA1/DRB1 and LPA). We also validate previous suggestions that APOE, CHRNA3/5, CDKN2A/B, SH2B3 and FOXO3A influence longevity. Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated. We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD. Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan

    Integration of genome-wide association studies with biological knowledge identifies six novel genes related to kidney function

    In conducting genome-wide association studies (GWAS), analytical approaches leveraging biological information may further understanding of the pathophysiology of clinical traits. To discover novel associations with estimated glomerular filtration rate (eGFR), a measure of kidney function, we developed a strategy for integrating prior biological knowledge into the existing GWAS data for eGFR from the CKDGen Consortium. Our strategy focuses on single nucleotide polymorphisms (SNPs) in genes that are connected by functional evidence, determined by literature mining and gene ontology (GO) hierarchies, to genes near previously validated eGFR associations. It then requires association thresholds consistent with multiple testing, and finally evaluates novel candidates by independent replication. Among the samples of European ancestry, we identified a genome-wide significant SNP in FBXL20 (P = 5.6 × 10^-9) in meta-analysis of all available data, and additional SNPs at the INHBC, LRP2, PLEKHA1, SLC3A2 and SLC7A6 genes meeting multiple-testing corrected significance for replication and overall P-values of 4.5 × 10^-4 to 2.2 × 10^-7. Neither the novel PLEKHA1 nor FBXL20 associations, both further supported by association with eGFR among African Americans and with transcript abundance, would have been implicated by eGFR candidate gene approaches. LRP2, encoding the megalin receptor, was identified through connection with the previously known eGFR gene DAB2 and extends understanding of the megalin system in kidney function. These findings highlight integration of existing genome-wide association data with independent biological knowledge to uncover novel candidate eGFR associations, including candidates lacking known connections to kidney-specific pathways. The strategy may also be applicable to other clinical phenotypes, although more testing will be needed to assess its potential for discovery in genera

    Genome-wide meta-analysis of myopia and hyperopia provides evidence for replication of 11 loci

    Refractive error (RE) is a complex, multifactorial disorder characterized by a mismatch between the optical power of the eye and its axial length that causes object images to be focused off the retina. The two major subtypes of RE are myopia (nearsightedness) and hyperopia (farsightedness), which represent opposite ends of the distribution of the quantitative measure of spherical refraction. We performed a fixed effects meta-analysis of genome-wide association results of myopia and hyperopia from 9 studies of European-derived populations: AREDS, KORA, FES, OGP-Talana, MESA, RSI, RSII, RSIII and ERF. One genome-wide significant region was observed for myopia, corresponding to a previously identified myopia locus on 8q12 (p = 1.25 × 10^-8), which has been reported by Kiefer et al. as significantly associated with myopia age at onset and by Verhoeven et al. as significantly associated with mean spherical-equivalent (MSE) refractive error. We observed two genome-wide significant association

    Genome-wide association and functional follow-up reveals new loci for kidney function

    Chronic kidney disease (CKD) is an important public health problem with a genetic component. We performed genome-wide association studies in up to 130,600 European ancestry participants overall, and stratified for key CKD risk factors. We uncovered 6 new loci in association with estimated glomerular filtration rate (eGFR), the primary clinical measure of CKD, in or near MPPED2, DDX1, SLC47A1, CDK12, CASP9, and INO80. Morpholino knockdown of mpped2 and casp9 in zebrafish embryos revealed podocyte and tubular abnormalities with altered dextran clearance, suggesting a role for these genes in renal function. By providing new insights into genes that regulate renal function, these results could further our understanding of the pathogenesis of CKD