
    Super Paramagnetic Clustering of DNA Sequences

    An unsupervised clustering of 4541 DNA sequences containing active promoter regions from vertebrate and arthropod classes (including their viral genes) was performed. All necessary information was gathered a priori from the DNA sequences alone, by measuring the frequencies of tri-nucleotides and tetra-nucleotides. We employed Super Paramagnetic Clustering, a novel clustering algorithm based on the physical properties of an inhomogeneous granular ferromagnet. This method uses Swendsen-Wang cluster Monte Carlo simulations to distinguish clusters by measuring pairs of correlation functions at different resolutions. We identified two strongly separated clusters of human viral genes corresponding to the Epstein-Barr virus and the Herpes Simplex virus type 1. In addition, vertebrate and arthropod sequences were successfully separated into two different classes, with only 9.25% of arthropod sequences misclassified. From a functional perspective, these misclassified sequences show high gene-function correlations with sequences from the vertebrate cluster. By tuning a clustering parameter, Super Paramagnetic Clustering further divided the vertebrate class into two major clusters, in which large numbers of housekeeping genes and tissue-specific genes were found, respectively. These indications came from observations of gene expression function and of consensus transcription factors that were found grouped together at specific positions in the DNA sequences.
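The feature extraction described in this abstract (tri- and tetra-nucleotide frequencies measured directly from each sequence) can be illustrated with a minimal sketch. The function names `kmer_frequencies` and `feature_vector`, the choice to skip windows containing non-ACGT characters, and the normalization by the number of valid windows are all assumptions made for illustration; the abstract does not specify how the authors computed or normalized their frequencies.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers over ACGT, in lexicographic order.

    Assumption: windows containing characters other than A, C, G, T
    (for example N) are ignored rather than counted.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize by the number of valid (pure ACGT) windows only.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq):
    """Concatenate tri-nucleotide (64) and tetra-nucleotide (256) frequencies."""
    return kmer_frequencies(seq, 3) + kmer_frequencies(seq, 4)
```

Each sequence then maps to a 320-dimensional vector (64 + 256 entries), which could serve as input to any distance-based clustering method; the Super Paramagnetic Clustering step itself is not shown here.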

    The structure of the mouse parvalbumin gene

    Schleef M, Zühlke C, Jockusch H, Schöffl F. The structure of the mouse parvalbumin gene. Mammalian Genome. 1992;3(4):217-225.

    Parvalbumin (PV) is a calcium-binding protein of the EF-hand family, expressed mainly in fast contracting/relaxing muscles of vertebrates. We have isolated five overlapping genomic PV clones which together span 28 kilobase pairs (kb) around the Pva locus on mouse Chromosome (Chr) 15. The positions of four introns were determined by DNA sequencing. They interrupt the coding sequences at positions corresponding to those in the rat and human PV genes. The transcription start site, 25 bp downstream from the TATA-box, was mapped by oligonucleotide primer extension on poly(A)+-RNA. Analysis of 0.4 kb of promoter sequence of the mouse PV gene revealed CCAAT- and TATA-box sequences and a 59 bp GC-rich stretch between positions -59 and -118. Similar motifs have been found in the parvalbumin genes of rat and human. A perfect 11-bp repeat, located upstream of positions -149 and -163, respectively, is homologous only to the rat promoter. These results will be related to tissue and species differences in PV expression.