13 research outputs found

    Computational Characterization of 3′ Splice Variants in the GFAP Isoform Family

    Glial fibrillary acidic protein (GFAP) is an intermediate filament (IF) protein specific to central nervous system (CNS) astrocytes. It has been the subject of intense interest due to its association with neurodegenerative diseases, and because of growing evidence that IF proteins not only modulate cellular structure, but also cellular function. Moreover, GFAP has a family of splicing isoforms apparently more complex than that of other CNS IF proteins, consistent with it possessing a range of functional and structural roles. The gene consists of 9 exons, and to date all isoforms associated with 3′ end splicing have been identified from modifications within intron 7, resulting in the generation of exon 7a (GFAPδ/ε) and 7b (GFAPκ). To better understand the nature and functional significance of variation in this region, we used a Bayesian multiple change-point approach to identify conserved regions. This is the first successful application of this method to a single gene – it has previously only been used in whole-genome analyses. We identified several highly or moderately conserved regions throughout the intron 7/7a/7b regions, including untranslated regions and regulatory features, consistent with the biology of GFAP. Several putative unconfirmed features were also identified, including a possible new isoform. We then integrated multiple computational analyses on both the DNA and protein sequences from the mouse, rat and human, showing that the major isoform, GFAPα, has highly conserved structure and features across the three species, whereas the minor isoforms GFAPδ/ε and GFAPκ have low conservation of structure and features at the distal 3′ end, both relative to each other and relative to GFAPα. The overall picture suggests distinct and tightly regulated functions for the 3′ end isoforms, consistent with complex astrocyte biology. The results illustrate a computational approach for characterising splicing isoform families, using both DNA and protein sequences.

    Kyte-Doolittle hydropathy plots for the tail regions of GFAP isoforms across the human, rat and mouse.

    <p>The human, rat and mouse sequences are indicated by red, blue and green trendlines respectively. The peaks below zero represent hydrophilicity, whereas those above zero represent hydrophobicity.</p>
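A hydropathy profile of the kind described in this caption can be sketched in a few lines. The residue values below are the standard Kyte-Doolittle scale; the function name `hydropathy_profile`, the window length of 9 and the toy peptide are assumptions for illustration only, since the listing does not state the settings or sequences used for the published plots.

```python
# Standard Kyte-Doolittle hydropathy values for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the sliding-window mean hydropathy of a protein sequence.

    Positive values indicate hydrophobic stretches and negative values
    hydrophilic ones. The window length is an assumed default.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Toy peptide, NOT a GFAP tail sequence.
    toy_peptide = "MKRSEELLIVVAGDEKR"
    for position, value in enumerate(hydropathy_profile(toy_peptide, window=5)):
        print(position, round(value, 2))
```

Plotting one such profile per species on shared axes would give trendlines comparable in form to those in the caption.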

    Comparison of secondary structures for the GFAP tail regions for the GFAPα, GFAPδ/ε and GFAPκ isoforms across species.

    <p>The secondary structures of GFAP isoforms across human, mouse and rat were predicted using the PSIPRED server. Only the tail sequences were used, as the sequences are similar up to exon 6. For each amino acid, the program predicts a helix (pink box), a strand (yellow arrow) or a coil.</p>

    Comparison of the head, rod and tail domain sequences of GFAP isoforms across species.

    <p>For each species, the exon usage for isoforms with 3′ end splice variation is shown, relative to the major isoform GFAPα. For each isoform, the total length of the polypeptide is shown in terms of number of aa, followed by the length of the combined head+rod domains in brackets (i.e., encoded by exons 1–6, see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0033565#pone-0033565-g001" target="_blank">Figure 1B</a>). The tail domain consists of exons 7–9 in GFAPα, exons 7 and 7a in GFAPδ/ε and exon 7b (which includes exon 7, intron 7a and exon 7a, see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0033565#pone-0033565-g001" target="_blank">Figure 1B</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0033565#pone.0033565-Blechingberg1" target="_blank">[36]</a>) in GFAPκ. The length of the complete tail domain is shown for each isoform, with the length of the variable regions for GFAPδ/ε and GFAPκ in brackets. The above data were generated from the UniProt Knowledgebase database.</p>

    The GFAP family of isoforms.

    <p><b>A.</b> The secondary structure of the GFAP major isoform has the typical organization of IF proteins. It consists of an α-helical central rod domain flanked by the amino terminal head and carboxy terminal tail domain. The central rod domain consists of four subdomains, namely 1A, 1B, 2A and 2B interrupted by linker regions L1, L1–2 and L2. The head and tail regions are also divided into the highly charged E1 and E2 domains, variable domains V1 and V2 and the hypervariable stretches H1 and H2. <b>B.</b> At least 4 additional isoforms have been described, resulting from alternative splicing at the 5′ or 3′ end. Exons are shown as black boxes and introns as red lines. The relationship between exons and subregions is shown by dotted lines. <i>GFAPα</i> is the major isoform and contains all 9 exons. <i>GFAPβ</i> has an upstream transcriptional start sequence relative to GFAPα, resulting in additional 5′ mRNA sequences (red box), but the same protein sequence. <i>GFAPγ</i> lacks exon 1 and part of intron 1, but has a transcriptional start site within exon 1 resulting in an altered 5′ mRNA sequence (purple box); however, its translational start site is still undetermined. <i>GFAPδ</i> (rat) and <i>GFAPε</i> (mouse) appear to be the same product and lack exons 8 and 9 but contain exon 7a (blue box) derived from alternative splicing of intron 7. <i>GFAPκ</i> also lacks exons 8 and 9, but contains a novel exon 7b, comprising exon 7, a proximal region of intron 7 (green box) and exon 7a.</p>

    Conserved features across exon 7/7a/7b of GFAP.

    <p>The profile shows detail of the Group 2 profile from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0033565#pone-0033565-g002" target="_blank">Figure 2</a> in the region surrounding exons 7 (right of screen) and 7a (right of centre). Exons (wide bars), UTRs (narrow bars) and introns (arrowed lines) are shown for two genes in the UCSC collection and one in RefSeq. At the bottom is the UCSC conservation profile relative to mouse and rat. Conserved regions are labelled A–F (in white). Labelling is from right to left to match the order in which exons are displayed. Conserved motifs that were identified are labelled as follows: GGG, the 3G triplets (feature A); EBF1, the motif recognized by the transcription factor EBF1 (feature B); HSF1 and HSF2, the actual and possible acceptor sites identified by Human Splice Finder (scores 93.19 and 76.63 respectively, feature C); PTB1 (feature D) and PTB2 (feature C), conserved PTB-binding motifs embedded in polypyrimidine-rich sequences; PolyA, the polyadenylation signal AAUAAA (feature E). All other symbols are as per <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0033565#pone-0033565-g002" target="_blank">Figure 2</a>.</p>
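Short fixed motifs such as the GGG triplets and the AAUAAA polyadenylation signal named in this caption can be located with a plain string scan. The sketch below is illustrative only and is not the procedure used in the paper (the splice sites, for example, were scored with Human Splice Finder). The helper names and the toy sequence are assumptions. Because the caption notes that the gene is displayed right to left, a reverse-complement helper is included for scanning the opposite strand.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(motif):
    """Convert an RNA motif (e.g. AAUAAA) to its DNA spelling (AATAAA)."""
    return motif.upper().replace("U", "T")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string of A/C/G/T."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_motif(dna, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [match.start() for match in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    # Toy sequence, NOT taken from the GFAP locus.
    toy_dna = "CCGGGTACCAATAAAGGGTT"
    print("GGG triplets:", find_motif(toy_dna, "GGG"))
    print("polyA signal:", find_motif(toy_dna, rna_to_dna("AAUAAA")))
    print("minus strand:", find_motif(reverse_complement(toy_dna), "AATAAA"))
```

Degenerate motifs such as transcription-factor or PTB binding sites would need a position weight matrix or a dedicated tool in place of exact matching.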

    The four segment classes identified in the GFAP gene using changept.

    <p>The top four profiles show, for each sequence position in the human GFAP DNA sequence (chr17: 42982993–42992914 in UCSC genomic coordinates), the probability that the base at that position belongs to conservation groups 1 to 4 respectively, as identified by the program changept applied to a 3-way alignment of rat, mouse and human sequences. At any position, the sum of the four profiles is 1. The two rows below the Group 4 profile display the exons (wide bars), the UTRs (narrow bars) and the introns (thin lines) of GFAP genes recorded in the UCSC and RefSeq collections respectively. Below these are the UCSC conservation tracks relative to mouse and rat, in which darker regions correspond to higher conservation, and parallel lines indicate deletions. At the bottom of the figure are the exon numbers. Note that the gene is transcribed from right to left. Exon boundaries are indicated with red vertical lines. <i>Group 1</i> identifies regions of insertions specific to the human version of the gene; <i>group 2</i> corresponds mainly to the mapped exons of the GFAP gene, appearing to cover regions of high conservation between the three species; <i>group 3</i> is comprised of segments in which deletions occur in either the rat or the mouse genes, but not both; <i>group 4</i> represents the least conserved parts of the gene.</p>
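One way to read profiles like these is to assign each position to its most probable group and merge runs of identical assignments into segments. The sketch below does only that post-processing step on a hypothetical table of per-position probabilities. It does not reproduce changept itself, and the function name, the input layout and the toy numbers are assumptions.

```python
def call_segments(profiles, tolerance=1e-6):
    """Collapse per-position group probabilities into labelled segments.

    `profiles` is a sequence of rows, one per sequence position, each row
    holding the probabilities for groups 1..k (k = 4 in the caption).
    Returns (start, end, group) tuples with 0-based starts, exclusive ends
    and 1-based group numbers.
    """
    segments = []
    for position, probabilities in enumerate(profiles):
        # The caption states that the profiles sum to 1 at every position.
        if abs(sum(probabilities) - 1.0) > tolerance:
            raise ValueError(f"probabilities at position {position} do not sum to 1")
        group = max(range(len(probabilities)), key=probabilities.__getitem__) + 1
        if segments and segments[-1][2] == group:
            start, _, _ = segments[-1]
            segments[-1] = (start, position + 1, group)  # extend the current run
        else:
            segments.append((position, position + 1, group))
    return segments


if __name__ == "__main__":
    # Toy probabilities, NOT output from changept.
    toy_profiles = [
        (0.1, 0.7, 0.1, 0.1),
        (0.2, 0.6, 0.1, 0.1),
        (0.1, 0.1, 0.1, 0.7),
    ]
    print(call_segments(toy_profiles))
```

Taking the single most probable group discards the uncertainty that the plotted profiles convey, so this is best treated as a rough summary of them and not as a substitute.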