7 research outputs found

    GeneTide—Terra Incognita Discovery Endeavor: a new transcriptome focused member of the GeneCards/GeneNote suite of databases

    GeneCards® is an automatically mined database of human genes that strives to create, along with its auxiliary databases—GeneLoc, GeneNote and GeneAnnot—the most inclusive resource of gene-centered information of the human genome. GeneTide, the Gene Terra Incognita Discovery Endeavor (http://genecards.weizmann.ac.il/genetide/), the newest addition to this family, is a transcriptome-focused database which aims to enhance GeneCards with additional expressed sequence tag (EST)-based genes. This is achieved by comprehensively mapping >85% of the ∼5.6 million human ESTs currently available at dbEST to known genes by means of data mining and integration of genomic resources including UniGene, DoTS, AceView and in-house resources. GeneTide thus creates comprehensive links between ESTs and GeneCards genes. Furthermore, groups of unassociated transcripts serve as a basis for defining novel EST-based GeneCards Candidates (EGCs). These EGCs, nearly 25 000 of which were defined in version 0.3 of GeneTide, are further annotated with various parameters, including splicing evidence and expression data extracted from the GeneNote database, to determine their validity as possible de novo genes

    Expoldb: expression linked polymorphism database with inbuilt tools for analysis of expression and simple repeats

    BACKGROUND: Quantitative variation in gene expression has been proposed to underlie phenotypic variation among human individuals. A facilitating step towards understanding the basis for gene expression variability is associating genome-wide transcription patterns with potential cis modifiers of gene expression. DESCRIPTION: EXPOLDB, a novel database, addresses this need by providing information on the variability of gene expression levels across individuals, as well as the presence and features of potentially polymorphic (TG/CA)n repeats. EXPOLDB thus enables associating transcription levels with the presence and length of (TG/CA)n repeats. One of the unique features of this database is the display of expression data for 5 pairs of monozygotic twins, which allows identification of genes whose variability in expression is influenced by non-genetic factors, including environment. In addition to queries by gene name, EXPOLDB allows queries by pathway name. Users can also upload their own list of HGNC (HUGO (The Human Genome Organisation) Gene Nomenclature Committee) symbols to interrogate expression patterns. The online application 'SimRep' can be used to find simple repeats in a given nucleotide sequence. To help illustrate primary applications, case examples of housekeeping genes and the RUNX gene family, as well as one example of glycolytic pathway genes, are provided. CONCLUSION: The uniqueness of EXPOLDB lies in facilitating the association of genome-wide transcription variations with the presence and type of polymorphic repeats, while also allowing identification of genes whose expression variability is influenced by non-genetic factors, including environment. In addition, the database allows comprehensive querying, including functional information on biochemical pathways of human genes. EXPOLDB can be accessed a

    GIFtS: annotation landscape analysis with GeneCards

    BACKGROUND: Gene annotation is a pivotal component in computational genomics, encompassing prediction of gene function, expression analysis, and sequence scrutiny. Hence, quantitative measures of the annotation landscape constitute a pertinent bioinformatics tool. GeneCards® is a gene-centric compendium of rich annotative information for over 50,000 human gene entries, building upon 68 data sources, including Gene Ontology (GO), pathways, interactions, phenotypes, publications and many more. RESULTS: We present the GeneCards Inferred Functionality Score (GIFtS), which allows a quantitative assessment of a gene's annotation status by exploiting the unique wealth and diversity of GeneCards information. The GIFtS tool, linked from the GeneCards home page, facilitates browsing the human genome by searching for the annotation level of a specified gene, retrieving a list of genes within a specified range of GIFtS values, obtaining random genes with a specific GIFtS value, and experimenting with the GIFtS weighting algorithm for a variety of annotation categories. The bimodal shape of the GIFtS distribution suggests a division of the human gene repertoire into two main groups: the high-GIFtS peak consists almost entirely of protein-coding genes; the low-GIFtS peak consists of genes from all of the categories. Cluster analysis of GIFtS annotation vectors provides the classification of gene groups by detailed positioning in the annotation arena. GIFtS also provides measures which enable the evaluation of the databases that serve as GeneCards sources. An inverse correlation is found (for GIFtS>25) between the number of genes annotated by each source and the average GIFtS value of genes associated with that source. Three typical source prototypes are revealed by their GIFtS distribution: genome-wide sources, sources comprising mainly highly annotated genes, and sources comprising mainly poorly annotated genes. The degree of accumulated knowledge for a given gene measured by GIFtS was correlated (for GIFtS>30) with the number of publications for a gene, and with the seniority of this entry in the HGNC database. CONCLUSION: GIFtS can be a valuable tool for computational procedures which analyze large sets of genes resulting from wet-lab or computational research. GIFtS may also assist the scientific community with identification of groups of uncharacterized genes for diverse applications, such as delineation of novel functions and charting unexplored areas of the human genome.

    Retroposed Copies of the HMG Genes: A Window to Genome Dynamics

    Retroposed copies (RPCs) of genes are functional (intronless paralogs) or nonfunctional (processed pseudogenes) copies derived from mRNA through a process of retrotransposition. Previous studies found that gene families involved in mRNA translation or nuclear function were more likely to have large numbers of RPCs. Here we characterize RPCs of the few families coding for the abundant high-mobility-group (HMG) proteins in humans. Using an algorithm we developed, we identified and studied 219 HMG RPCs. For slightly more than 10% of these RPCs, we found evidence indicating expression. Furthermore, eight of these are potentially new members of the HMG families of proteins. For three RPCs, the evidence indicated expression as part of other transcripts; in all of these, we found the presence of alternative splicing or multiple polyadenylation signals. RPC distribution among the HMGs was not even, with 33–65 each for HMGB1, HMGB3, HMGN1, and HMGN2, and 0–6 each for HMGA1, HMGA2, HMGB2, and HMGN3. Analysis of the sequences flanking the RPCs revealed that the junction between the target site duplications and the 5′-flanking sequences exhibited the same TT/AAAA consensus found for the L1 endonuclease, supporting an L1-mediated retrotransposition mechanism. Finally, because our algorithm included aligning RPC flanking sequences with the corresponding HMG genomic sequence, we were able to identify transcribed regions of HMG genes that were not part of the published mRNA sequences