111 research outputs found
VarySysDB: a human genetic polymorphism database based on all H-InvDB transcripts
Creation of a vast variety of proteins is accomplished by genetic variation and a variety of alternative splicing transcripts. Currently, however, the abundant available data on genetic variation and the transcriptome are stored independently and in a dispersed fashion. In order to provide a research resource regarding the effects of human genetic polymorphism on various transcripts, we developed VarySysDB, a genetic polymorphism database based on 187 156 extensively annotated matured mRNA transcripts from 36 073 loci provided by H-InvDB. VarySysDB offers information encompassing published human genetic polymorphisms for each of these transcripts separately. This allows comparisons of effects derived from a polymorphism on different transcripts. The published information we analyzed includes single nucleotide polymorphisms and deletion–insertion polymorphisms from dbSNP, copy number variations from Database of Genomic Variants, short tandem repeats and single amino acid repeats from H-InvDB and linkage disequilibrium regions from D-HaploDB. The information can be searched and retrieved by features, functions and effects of polymorphisms, as well as by keywords. VarySysDB combines two kinds of viewers, GBrowse and Sequence View, to facilitate understanding of the positional relationship among polymorphisms, genome, transcripts, loci and functional domains. We expect that VarySysDB will yield useful information on polymorphisms affecting gene expression and phenotypes. VarySysDB is available at http://h-invitational.jp/varygene/
Integrative annotation of 21,037 human genes validated by full-length cDNA clones
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology
ParaHaplo: A program package for haplotype-based whole-genome association study using parallel computing
Background: Since more than a million single-nucleotide polymorphisms (SNPs) are analyzed in any given genome-wide association study (GWAS), performing multiple comparisons can be problematic. To cope with multiple-comparison problems in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control problems inherent in multiple testing; however, both the calculation of exact probability and the execution of permutation tests are time-consuming. Faster methods for calculating exact probabilities and executing permutation tests are required. Methods: We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on JPT and CHB of HapMap. Results: ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program. Conclusion: ParaHaplo is a useful tool in conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, the use of fast computations with parallel computing, such as that used in ParaHaplo, will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address: http://sourceforge.jp/projects/parallelgwas/?_sl=1
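As a rough illustration of how a permutation test can control for multiple testing, the sketch below computes max-statistic adjusted P-values for a small genotype matrix. It is a serial, SNP-based toy and not ParaHaplo's haplotype-based or MPI-parallel algorithm; the function names, the test statistic (absolute difference in mean allele count) and the default of 1000 permutations are all assumptions made for this example.

```python
import random

def allele_count_diff(genotypes, labels):
    """Per-SNP absolute difference in mean allele count between groups 0 and 1."""
    group0 = [row for row, lab in zip(genotypes, labels) if lab == 0]
    group1 = [row for row, lab in zip(genotypes, labels) if lab == 1]
    n_snps = len(genotypes[0])
    return [
        abs(sum(r[j] for r in group0) / len(group0)
            - sum(r[j] for r in group1) / len(group1))
        for j in range(n_snps)
    ]

def max_stat_permutation_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Adjusted P-values: compare each SNP with the permutation maximum over all SNPs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = allele_count_diff(genotypes, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-label association
        max_null = max(allele_count_diff(genotypes, shuffled))
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    # Add-one correction keeps the estimated P-values strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]

# Toy data: SNP 0 separates the two groups, SNP 1 is identical in everyone.
genotypes = [[0, 1]] * 10 + [[2, 1]] * 10
labels = [0] * 10 + [1] * 10
pvals = max_stat_permutation_pvalues(genotypes, labels, n_perm=200)
```

Each permutation is independent of the others, which is why this kind of computation lends itself to being distributed across cluster nodes.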
Theoretical Formulation of Principal Components Analysis to Detect and Correct for Population Stratification
The Eigenstrat method, based on principal components analysis (PCA), is commonly used both to quantify population relationships in population genetics and to correct for population stratification in genome-wide association studies. However, it can be difficult to make appropriate inference about population relationships from the principal component (PC) scatter plot. Here, to better understand the working mechanism of the Eigenstrat method, we consider its theoretical or "population" formulation. The eigen-equation for samples from an arbitrary number () of populations is reduced to that of a matrix of dimension , the elements of which are determined by the variance-covariance matrix for the random vector of the allele frequencies. Solving the reduced eigen-equation is numerically trivial and yields eigenvectors that are the axes of variation required for differentiating the populations. Using the reduced eigen-equation, we investigate the within-population fluctuations around the axes of variation on the PC scatter plot for simulated datasets. Specifically, we show that there exists an asymptotically stable pattern of the PC plot for large sample size. Our results provide theoretical guidance for interpreting the pattern of PC plot in terms of population relationships. For applications in genetic association tests, we demonstrate that, as a method of correcting for population stratification, regressing out the theoretical PCs corresponding to the axes of variation is equivalent to simply removing the population mean of allele counts and works as well as or better than the Eigenstrat method
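The correction described at the end of this abstract, removing the population mean of allele counts, can be sketched in a few lines. This is only a minimal illustration that assumes population labels are known in advance; the function name and the toy data are hypothetical and do not come from the paper.

```python
def remove_population_means(allele_counts, populations):
    """Center each SNP's allele counts within each labelled population."""
    sums, sizes = {}, {}
    for row, pop in zip(allele_counts, populations):
        sizes[pop] = sizes.get(pop, 0) + 1
        acc = sums.setdefault(pop, [0.0] * len(row))
        for j, count in enumerate(row):
            acc[j] += count
    # Per-population mean allele count at every SNP.
    means = {pop: [s / sizes[pop] for s in acc] for pop, acc in sums.items()}
    return [
        [count - means[pop][j] for j, count in enumerate(row)]
        for row, pop in zip(allele_counts, populations)
    ]

# Toy data: four individuals, two SNPs, two populations (labels are assumed known).
counts = [[0, 2], [2, 2], [1, 0], [1, 2]]
pops = ["A", "A", "B", "B"]
centered = remove_population_means(counts, pops)
```

After centering, every SNP has mean zero within each population, so differences in allele frequency between populations can no longer drive an association signal.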
Change of Positive Selection Pressure on HIV-1 Envelope Gene Inferred by Early and Recent Samples
HIV-1 infection has been on the rise in Japan recently, and the main transmission route has changed from blood transmission in the 1980s to homo- and/or hetero-sexual transmission in the 2000s. The lack of early viral samples with clinical information made it difficult to investigate the possible virological changes over time. In this study, we sequenced 142 full-length env genes collected from 16 Japanese subjects infected with HIV-1 in the 1980s and in the 2000s. We examined the diversity change in sequences and potential adaptive evolution of the virus to the host population. We used a codon-based likelihood method under the branch-site and clade models to detect positive selection operating on the virus. The clade model was extended to account for different positive selection pressures in different viral populations. The result showed that the selection pressure was weaker in the 2000s than in the 1980s, indicating that it might have become easier for the HIV to infect a new host and to develop into AIDS now than 20 years ago and that the HIV may be becoming more virulent in the Japanese population. The study provides useful information on the surveillance of HIV infection and highlights the utility of the extended clade models in analysis of virus populations which may be under different selection pressures
A comparative study of craniofacial measurements between Ryukyuan and mainland Japanese females using lateral cephalometric images
Using lateral cephalometric images, we compared the skeletal and soft tissue configurations of Ryukyuan and mainland Japanese females. We collected lateral cephalometric images of 30 females each from Okinawa Island and mainland Japan. Sixty landmarks were plotted on each image. Then, based on the coordinates of the landmarks, 68 distances and 34 angles were calculated according to orthodontic and anthropometric methods. We confirmed that the Ryukyuans have a smaller height in the upper and midfacial region than the mainland Japanese. Moreover, our findings indicate that, compared with the mainland Japanese females, the Ryukyuan females clearly have the following features: (1) a shallower mandibular notch, (2) an anterior-inclined symphysis of mandible, and (3) a smaller depth from upper lip to incisors. We also found that an anterior-inclined mandibular corpus and incisors are associated with a smaller distance between the surfaces of the upper lip and teeth and with a more protruded lip shape
Identification of Nine Novel Loci Associated with White Blood Cell Subtypes in a Japanese Population
White blood cells (WBCs) mediate immune systems and consist of various subtypes with distinct roles. Elucidation of the mechanism that regulates the counts of the WBC subtypes would provide useful insights into both the etiology of the immune system and disease pathogenesis. In this study, we report results of genome-wide association studies (GWAS) and a replication study for the counts of the 5 main WBC subtypes (neutrophils, lymphocytes, monocytes, basophils, and eosinophils) using 14,792 Japanese subjects enrolled in the BioBank Japan Project. We identified 12 significantly associated loci that satisfied the genome-wide significance threshold of P < 5.0×10⁻⁸, of which 9 loci were novel (the CDK6 locus for the neutrophil count; the ITGA4, MLZE, STXBP6 loci, and the MHC region for the monocyte count; the SLC45A3-NUCKS1, GATA2, NAALAD2, ERG loci for the basophil count). We further evaluated associations in the identified loci using 15,600 subjects from Caucasian populations. These WBC subtype-related loci demonstrated a variety of patterns of pleiotropic associations within the WBC subtypes, or with total WBC count, platelet count, or red blood cell-related traits (n = 30,454), which suggests unique and common functional roles of these loci in the processes of hematopoiesis. This study should contribute to the understanding of the genetic backgrounds of the WBC subtypes and hematological traits
Detailed Analysis of Japanese Population Substructure with a Focus on the Southwest Islands of Japan
Uncovering population structure is important for properly conducting association studies and for examining the demographic history of a population. Here, we examined the Japanese population substructure using data from the Japan Multi-Institutional Collaborative Cohort (J-MICC), which covers all but the northern region of Japan. Using 222 autosomal loci from 4502 subjects, we investigated population substructure by estimating FST among populations, testing population differentiation, and performing principal component analysis (PCA) and correspondence analysis (CA). All analyses revealed a low but significant differentiation between the Amami Islanders and the mainland Japanese population. Furthermore, we examined the genetic differentiation between the mainland population, Amami Islanders and Okinawa Islanders using six loci included in both the Pan-Asian SNP (PASNP) consortium data and the J-MICC data. This analysis revealed that the Amami and Okinawa Islanders were differentiated from the mainland population. In conclusion, we revealed a low but significant level of genetic differentiation between the mainland population and populations in or to the south of the Amami Islands, although genetic variation between both populations might be clinal. Therefore, the possibility of population stratification must be considered when enrolling the islander population of this area, such as in the J-MICC study
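To give a sense of what an FST estimate measures, the sketch below computes a simple heterozygosity-based FST at one biallelic locus, FST = (H_T - H_S) / H_T, with equal weight per population. The abstract does not state which estimator the study used, so this formula, the function name and the example frequencies are assumptions for illustration only.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST at one biallelic locus.

    freqs: allele frequency of the same allele in each population,
    with every population weighted equally (an assumption of this sketch).
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere, so there is no differentiation to measure
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_total - h_within) / h_total

# Hypothetical frequencies: identical, mildly different, and fixed for opposite alleles.
no_difference = fst_biallelic([0.5, 0.5])
mild_difference = fst_biallelic([0.4, 0.6])
fixed_difference = fst_biallelic([0.0, 1.0])
```

Values near zero correspond to the "low but significant" differentiation the abstract reports, while a value of one would mean the populations share no alleles at the locus.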
A Genome-Wide Association Study of Nephrolithiasis in the Japanese Population Identifies Novel Susceptible Loci at 5q35.3, 7p14.3, and 13q14.1
Nephrolithiasis is a common nephrologic disorder with complex etiology. To identify the genetic factor(s) for nephrolithiasis, we conducted a three-stage genome-wide association study (GWAS) using a total of 5,892 nephrolithiasis cases and 17,809 controls of Japanese origin. Here we found three novel loci for nephrolithiasis: RGS14-SLC34A1-PFN3-F12 on 5q35.3 (rs11746443; P = 8.51×10⁻¹², odds ratio (OR) = 1.19), INMT-FAM188B-AQP1 on 7p14.3 (rs1000597; P = 2.16×10⁻¹⁴, OR = 1.22), and DGKH on 13q14.1 (rs4142110; P = 4.62×10⁻⁹, OR = 1.14). Subsequent analyses in 21,842 Japanese subjects revealed the association of SNP rs11746443 with the reduction of estimated glomerular filtration rate (eGFR) (P = 6.54×10⁻⁸), suggesting a crucial role for this variation in renal function. Our findings elucidated the significance of genetic variations for the pathogenesis of nephrolithiasis