128 research outputs found

    HEAT: a New Tool for Gene Set Enrichment Analysis Using Comprehensive Annotation of Human Genes in H-InvDB

    H-InvDB Enrichment Analysis Tool (HEAT) is a new data-mining tool for gene set enrichment analysis based on comprehensive annotations of human genes in H-InvDB. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire set of H-InvDB representative transcripts. The advantage of HEAT is the wide variety of annotation items used for its analysis: chromosomal bands, InterPro functional domains, Gene Ontology terms, KEGG pathways, H-InvDB gene families/groups, SCOP structural domains, subcellular localization predicted by the Wolf-PSORT program, tissue-specific gene expression as defined in the H-ANGEL database, and transcription factor binding sites in promoter regions based on JASPAR. HEAT accepts lists of human gene identifiers (IDs) including HUGO gene symbols, accession numbers of INSD (DDBJ/EMBL/GenBank), UniProt accession numbers, Gene IDs, Ensembl Gene IDs, H-InvDB Transcript IDs (HIT) and Locus IDs (HIX), etc. HEAT then converts the accepted IDs into HIX using the ID Converter System (http://biodb.jp/), collects various annotations of H-InvDB representative transcripts, and conducts statistical tests using Fisher's exact probability. The output of HEAT is a simple report of annotations commonly found among the query genes, which is useful for grasping the properties of a particular gene set. HEAT is freely available at http://hinv.jp/HEAT/
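    The enrichment test described in the HEAT abstract can be illustrated with a minimal Python sketch. It is not HEAT's implementation: the function names, the dictionary-based annotation table, and the choice of a one-sided test (over-representation only) are assumptions made for illustration, and no multiple-testing correction is applied because the abstract does not describe one.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value, P(X >= k).

    X is hypergeometric: N background genes, K of them carry the
    annotation, n genes are in the query set, k query genes carry it.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Sum the probabilities of seeing k, k+1, ... annotated genes in the query.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def enriched_annotations(query, annotations, background):
    """Rank annotation terms by enrichment p-value (hypothetical helper).

    `annotations` maps a term name to the set of gene IDs carrying it;
    genes outside `background` are ignored.
    """
    background = set(background)
    query = set(query) & background
    results = []
    for term, genes in annotations.items():
        genes = set(genes) & background
        k = len(query & genes)
        p = fisher_enrichment_p(k, len(query), len(genes), len(background))
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])
```

    In this sketch the background plays the role of the H-InvDB representative transcripts, and the query set is the user-supplied list after ID conversion.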

    Abundance of ultramicro inversions within local alignments between human and chimpanzee genomes

    BACKGROUND: Chromosomal inversion is one of the most important mechanisms of evolution. Recent studies of comparative genomics have revealed that chromosomal inversions are abundant in the human genome. While such previously characterized inversions are large enough to be identified as a single alignment or a string of local alignments, the impact on evolution of ultramicro inversions, which are so short that the local alignments completely cover them, is still uncertain. RESULTS: In this study, we developed a method for identifying ultramicro inversions by scanning local alignments. This technique achieved a high sensitivity and a very low rate of false positives. We identified 2,377 ultramicro inversions ranging from five to 125 bp within the orthologous alignments between the human and chimpanzee genomes. The false positive rate was estimated to be around 4%. Based on phylogenetic profiles using the primate outgroups, 479 ultramicro inversions were inferred to have specifically inverted in the human lineage. Ultramicro inversions exclusively involving adenine and thymine were the most frequent: 461 inversions (19.4% of the total). Furthermore, the density of ultramicro inversions in chromosome Y and the neighborhoods of transposable elements was higher than average. Sixty-five ultramicro inversions were identified within the exons of human protein-coding genes. CONCLUSIONS: We defined ultramicro inversions as the inverted regions equal to or smaller than 125 bp buried within local alignments. Our observations suggest that ultramicro inversions are abundant among the human and chimpanzee genomes, and that the location of the inversions correlated with genome structural instability. Some of the ultramicro inversions may contribute to gene evolution. Our inversion-identification method is also applicable in the fine-tuning of genome alignments by distinguishing ultramicro inversions from nucleotide substitutions and indels.
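    The idea of scanning a local alignment for short inverted segments can be sketched as follows. This is a deliberately simplified illustration, not the authors' method: it assumes an ungapped pairwise alignment of uppercase A/C/G/T strings, requires an exact reverse-complement match, and does no false-positive estimation. The function names are hypothetical; only the 5 to 125 bp length range comes from the abstract.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inversions(a, b, min_len=5, max_len=125):
    """Return (start, end) spans where a[start:end] == revcomp(b[start:end]).

    `a` and `b` are the two rows of an ungapped alignment. A candidate must
    start at a mismatching column, which trims identical flanks and skips
    segments that merely happen to be reverse-complement palindromes.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    hits = []
    n = len(a)
    i = 0
    while i < n:
        found = False
        if a[i] != b[i]:
            # Prefer the longest candidate that starts at this column.
            for length in range(min(max_len, n - i), min_len - 1, -1):
                if a[i:i + length] == revcomp(b[i:i + length]):
                    hits.append((i, i + length))
                    i += length
                    found = True
                    break
        if not found:
            i += 1
    return hits
```

    The minimum length matters in this sketch: with `min_len=1`, any single complementary substitution (for example G aligned to C) would be reported as an inversion, which is one reason a scan like this needs a length threshold and a false-positive estimate.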

    A genome-wide survey of changes in protein evolutionary rates across four closely related species of Saccharomyces sensu stricto group

    BACKGROUND: Changes in protein evolutionary rates among lineages have been frequently observed during periods of notable phenotypic evolution. It is also known that, following gene duplication and loss, the protein evolutionary rates of genes involved in such events changed because of changes in functional constraints acting on the genes. However, in the evolution of closely related species, excluding the aforementioned situations, the frequency of changes in protein evolutionary rates is still not clear at the genome-wide level. Here we examine the constancy of protein evolutionary rates in the evolution of four closely related species of the Saccharomyces sensu stricto group (S. cerevisiae, S. paradoxus, S. mikatae and S. bayanus). RESULTS: For 2,610 unambiguously defined orthologous genes among the four species, we carried out likelihood ratio tests between constant-rate and variable-rate models and found 344 (13.2%) genes showing significant changes in the protein evolutionary rates in at least one lineage. Of all those genes which experienced rate changes, 139 and 49 genes showed accelerated and decelerated evolution, respectively. Most of the evolutionary rate changes could be attributed to changes in selective constraints acting on nonsynonymous sites, independently of species-specific gene duplication and loss. We estimated that the changes in protein evolutionary rates have appeared with a probability of 2.0 × 10⁻³ per gene per million years in the evolution of the Saccharomyces species. Furthermore, we found that the genes which experienced rate acceleration have lower expression levels and weaker codon usage bias than those which experienced rate deceleration. CONCLUSION: Changes in protein evolutionary rates possibly occur frequently in the evolution of closely related Saccharomyces species. Selection for translational accuracy and efficiency may dominantly affect the variability of protein evolutionary rates.
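    A likelihood ratio test between a constant-rate (null) and a variable-rate (alternative) model can be sketched in Python as below. The abstract does not give the models' parameter counts or how significance was assessed, so the chi-square reference distribution, the degrees of freedom, and the function names here are assumptions for illustration only.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution, integer df >= 1.

    Uses the regularized upper incomplete gamma recurrence
    Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    started from Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = exp(-h).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, erfc(sqrt(half))
    else:
        a, q = 1.0, exp(-half)
    while a < df / 2.0:
        q += half ** a * exp(-half) / gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for nested models.

    `lnl_null` and `lnl_alt` are maximized log-likelihoods; `df` is the
    number of extra free parameters in the alternative model (assumed).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)
```

    For example, `likelihood_ratio_test(-1000.0, -995.0, 1)` gives a statistic of 10.0; with thousands of genes tested, as in the study, a per-gene threshold would normally need to account for multiple comparisons.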

    Two different classes of co-occurring motif pairs found by a novel visualization method in human promoter regions

    BACKGROUND: It is essential in modern biology to understand how transcriptional regulatory regions are composed of cis-elements, yet we have limited knowledge of, for example, the combinational uses of these elements and their positional distribution. RESULTS: We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions within -2000 and +1000 bp of transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. In the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 Class 2 pairs appear at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor the Class 1 pairs are more likely to be CpG-rich and to be expressed ubiquitously than those that harbor Class 2 pairs. Third, the 'hub' motifs, which are used in many different motif pairs, are different between the two classes. In addition, many of the transcription factors that correspond to the Class 2 hub motifs contain domains rich in specific amino acids; these domains may form disordered regions important for protein-protein interaction. CONCLUSION: There exist at least two classes of motif pairs with respect to TSSs in human promoters, possibly reflecting compositional differences between promoters and enhancers. We anticipate that our visualization method may be useful for the further characterisation of promoters.

    H-DBAS: Alternative splicing database of completely sequenced and manually annotated full-length cDNAs based on H-Invitational

    The Human-transcriptome DataBase for Alternative Splicing (H-DBAS) is a specialized database of alternatively spliced human transcripts. In this database, each of the alternative splicing (AS) variants corresponds to a completely sequenced and carefully annotated human full-length cDNA, one of those collected for the H-Invitational human-transcriptome annotation meeting. H-DBAS contains 38 664 representative alternative splicing variants (RASVs) in 11 744 loci, in total. The data is retrievable by various features of AS, which were annotated according to manual annotations, such as by patterns of ASs, consequently invoked alternations in the encoded amino acids and affected protein motifs, GO terms, predicted subcellular localization signals and transmembrane domains. The database also records recently identified very complex patterns of AS, in which two distinct genes seemed to be bridged, nested or degenerated (multiple CDS): in all three cases, completely unrelated proteins are encoded by a single locus. By using AS Viewer, each AS event can be analyzed in the context of full-length cDNAs, enabling the user's empirical understanding of the relation between AS event and the consequent alternations in the encoded amino acid sequences together with various kinds of affected protein motifs. H-DBAS is accessible at

    TACT: Transcriptome Auto-annotation Conducting Tool of H-InvDB

    Transcriptome Auto-annotation Conducting Tool (TACT) is a newly developed web-based automated tool for conducting functional annotation of transcripts by the integration of sequence similarity searches and functional motif predictions. We developed the TACT system by integrating two kinds of similarity searches, FASTY and BLASTX, against protein sequence databases, UniProtKB (Swiss-Prot/TrEMBL) and RefSeq, and a unified motif prediction program, InterProScan, into the ORF-prediction pipeline originally designed for the ‘H-Invitational’ human transcriptome annotation project. This system successively applies these constituent programs to an mRNA sequence in order to predict the most plausible ORF and the function of the protein encoded. In this study, we applied the TACT system to 19 574 non-redundant human transcripts registered in H-InvDB and evaluated its predictive power by the degree of agreement with human-curated functional annotation in H-InvDB. As a result, the TACT system could assign functional description to 12 559 transcripts (64.2%), the remainder being hypothetical proteins. Furthermore, the overall agreement of functional annotation with H-InvDB, including those transcripts annotated as hypothetical proteins, was 83.9% (16 432/19 574). These results show that the TACT system is useful for functional annotation and that the prediction of ORFs and protein functions is highly accurate and close to the results of human curation. TACT is freely available at

    G-compass: a web-based comparative genome browser between human and other vertebrate genomes

    Summary: G-compass is designed for efficient comparative genome analysis between human and other vertebrate genomes. The current version of G-compass allows us to browse two corresponding genomic regions between human and another species in parallel. One-to-one evolutionarily conserved regions (i.e. orthologous regions) between species are highlighted along the genomes. Information such as locations of duplicated regions, copy number variations and mammalian ultra-conserved elements is also provided. These features of G-compass enable us to easily determine patterns of genomic rearrangements and changes in gene orders through evolutionary time. Since G-compass is a satellite database of H-InvDB, which is a comprehensive annotation resource for human genes and transcripts, users can easily refer to manually curated functional annotations and other abundant biological information for each human transcript. G-compass is expected to be a valuable tool for comparing human and model organisms and promoting the exchange of functional information

    H-DBAS: human-transcriptome database for alternative splicing: update 2010

    H-DBAS (http://h-invitational.jp/h-dbas/) is a specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. In this update, for better annotations of AS events, we correlated RNA-Seq tag information to the AS exons and splice junctions. We generated a total of 148 376 598 RNA-Seq tags from RNAs extracted from cytoplasmic, nuclear and polysome fractions. Analysis of the RNA-Seq tags allowed us to identify 90 900 exons that are very likely to be used for protein synthesis. On the other hand, 254 AS junctions of human RefSeq transcripts are unique to nuclear RNA and may not have any translational consequences. We also present a new comparative genomics viewer so that users can empirically understand the evolutionary turnover of AS. With the unique experimental data closely connected with intensively curated cDNA information, H-DBAS provides a unique platform for the analysis of complex AS

    The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information

    With the completion of the rice genome sequencing, a standardized annotation is necessary so that the information from the genome sequence can be fully utilized in understanding the biology of rice and other cereal crops. An annotation jamboree was held in Japan with the aim of annotating and manually curating all the genes in the rice genome. Here we present the Rice Annotation Project Database (RAP-DB), which has been developed to provide access to the annotation data. The RAP-DB has two different types of annotation viewers, BLAST and BLAT search, and other useful features. By connecting the annotations to other rice genomics data, such as full-length cDNAs and Tos17 mutant lines, the RAP-DB serves as a hub for rice genomics. All of the resources can be accessed through