
    Cloning of a gene (SR-A1), encoding for a new member of the human Ser/Arg-rich family of pre-mRNA splicing factors: overexpression in aggressive ovarian cancer

    Using a positional cloning approach, we identified a novel gene encoding a serine/arginine-rich protein, which appears to be the human homologue of the rat A1 gene. We named this new gene SR-A1. Members of the SR family of proteins have been shown to interact with the C-terminal domain (CTD) of the large subunit of RNA polymerase II and to participate in pre-mRNA splicing. We have localized the SR-A1 gene between the known genes IRF3 and RRAS on chromosome 19q13.3. The novel gene spans 16.7 kb of genomic sequence and consists of 11 exons and 10 intervening introns. The SR-A1 protein is composed of 1312 amino acids, with a molecular mass of 139.3 kDa and a theoretical isoelectric point of 9.31. It contains an SR-rich domain as well as a CTD-binding domain present only in a subset of SR proteins. Through interactions with the pre-mRNA and the CTD of RNA polymerase II, SR proteins have been shown to regulate alternative splicing. The SR-A1 gene is expressed in all tissues tested, with the highest levels found in fetal brain and fetal liver. Our data suggest that this gene is overexpressed in a subset of ovarian cancers that are clinically more aggressive. Studies with the steroid hormone receptor-positive breast and prostate carcinoma cell lines ZR-75-1, BT-474 and LNCaP, respectively, suggest that SR-A1 is constitutively expressed. Furthermore, SR-A1 mRNA levels in these cell lines appear to be increased by estrogens, androgens and glucocorticoids, and to a lesser extent by progestins. © 2001 Cancer Research Campaign http://www.bjcancer.co

    EGPred: prediction of eukaryotic genes using Ab initio methods after combining with sequence similarity approaches

    EGPred is a Web-based server that combines ab initio methods and similarity searches to predict genes, particularly exon regions, with high accuracy. The EGPred program proceeds in the following steps: (1) an initial BLASTX search of the genomic sequence against the RefSeq database is used to identify protein hits with an E-value <1; (2) a second BLASTX search of the genomic sequence against the hits from the previous run with relaxed parameters (E-values <10) helps to retrieve all probable coding exon regions; (3) a BLASTN search of the genomic sequence against the intron database is then used to detect probable intron regions; (4) the probable intron and exon regions are compared to filter out wrong exons; (5) the NNSPLICE program is then used to reassign splicing signal site positions in the remaining probable coding exons; and (6) finally, ab initio predictions are combined with exons derived from the fifth step based on the relative strength of start/stop and splice signal sites as obtained from ab initio and similarity searches. The combination method increases the exon-level performance of five different ab initio programs by 4%-10% when evaluated on the HMR195 data set. A similar improvement is observed when ab initio programs are evaluated on the Burset/Guigo data set. Finally, EGPred is demonstrated on an ~95-Mbp fragment of human chromosome 13. The list of predicted genes from this analysis is available in the supplementary material. The EGPred program is computationally intensive due to multiple BLAST runs during each analysis. The EGPred server is available at http://www.imtech.res.in/raghava/egpred/
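
    Steps (1), (2) and (4) of the pipeline described above can be illustrated with a minimal Python sketch. It is not EGPred's code: the abstract does not say how intron and exon regions are compared, so the overlap rule and the `max_overlap` value are assumptions made for illustration, and the names `Region`, `filter_hits`, `overlap_fraction` and `filter_exons` are hypothetical. Only the two E-value cutoffs (<1 and <10) come from the abstract.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """A region on the genomic sequence, 1-based, both ends inclusive."""
    start: int
    end: int


def filter_hits(hits: List[Tuple[str, float]], max_evalue: float) -> List[str]:
    """Keep the IDs of hits whose E-value is strictly below the cutoff.

    The abstract uses a cutoff of 1 for the first BLASTX run and 10 for
    the relaxed second run.
    """
    return [hit_id for hit_id, evalue in hits if evalue < max_evalue]


def overlap_fraction(exon: Region, intron: Region) -> float:
    """Fraction of the exon's length that is covered by the intron."""
    overlap = min(exon.end, intron.end) - max(exon.start, intron.start) + 1
    return max(overlap, 0) / (exon.end - exon.start + 1)


def filter_exons(exons: List[Region], introns: List[Region],
                 max_overlap: float = 0.5) -> List[Region]:
    """Drop candidate exons that lie mostly inside a probable intron.

    ASSUMPTION: the 0.5 threshold and the overlap rule are illustrative;
    the source does not state the comparison that EGPred applies.
    """
    return [
        exon for exon in exons
        if all(overlap_fraction(exon, intron) <= max_overlap
               for intron in introns)
    ]


if __name__ == "__main__":
    hits = [("p1", 1e-20), ("p2", 0.5), ("p3", 5.0)]
    print(filter_hits(hits, 1))    # strict first pass
    print(filter_hits(hits, 10))   # relaxed second pass

    exons = [Region(100, 199), Region(300, 399)]
    introns = [Region(120, 260)]
    print(filter_exons(exons, introns))
```

    In this toy input the first candidate exon is discarded because 80% of it falls inside the probable intron, while the second does not overlap the intron and is kept.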

    Spliced alignment and its application in Arabidopsis thaliana

    This thesis describes the development and biological applications of GeneSeqer, a homology-based gene prediction program that uses spliced alignment. Additionally, a program named MyGV was written in Java as a browser to visualize the output of GeneSeqer. To test and demonstrate its performance, GeneSeqer was used to map 176,915 Arabidopsis EST sequences onto the whole genome of Arabidopsis thaliana, which consists of five chromosomes with about 117 million base pairs in total. All results were parsed and imported into a MySQL database. Information inferred from the Arabidopsis spliced alignment results may serve as a valuable resource for a number of projects of special scientific interest, such as alternative splicing, non-canonical splice sites, mini-exons, etc. We also built AtGDB (Arabidopsis thaliana Genome DataBase, http://www.plantgdb.org/AtGDB/) to interactively browse EST spliced alignments and GenBank annotations for the Arabidopsis genome. Moreover, as one application of the Arabidopsis EST mapping data, U12-type introns were identified from the transcript-confirmed introns in the Arabidopsis genome, and the characteristics of these minor class introns were further explored.
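
    As a rough illustration of how transcript-confirmed introns might be screened for non-canonical splice sites, the sketch below groups intron sequences by their terminal dinucleotides. It is not GeneSeqer's method, and the function names are hypothetical. GT-AG is the canonical class, and GC-AG and AT-AC are well-known non-canonical classes. Terminal dinucleotides alone do not identify U12-type introns, so this is only a first-pass tally.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_intron(intron_seq: str) -> str:
    """Label an intron by its first and last two bases.

    The sequence is expected to be the intron only, on the transcribed
    strand, without flanking exon bases.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence too short to have two splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "GT-AG"   # canonical
    if (donor, acceptor) == ("GC", "AG"):
        return "GC-AG"   # common non-canonical class
    if (donor, acceptor) == ("AT", "AC"):
        return "AT-AC"   # rare class
    return "other"


def tally_splice_site_classes(introns: Iterable[str]) -> Dict[str, int]:
    """Count how many introns fall into each terminal-dinucleotide class."""
    return dict(Counter(classify_intron(seq) for seq in introns))


if __name__ == "__main__":
    toy_introns = ["GTAAGTTTTCAG", "gcaagttttcag", "ATATCCTTTTAC", "CTAAAAGG"]
    print(tally_splice_site_classes(toy_introns))
```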

    Molecular genetics of Cohen syndrome


    Analysis of the genome of Dictyostelium discoideum

    Dictyostelium discoideum is a social amoeba whose properties have made it a model for various cellular processes. This work describes the analysis of the genome of this mostly unicellular protist and evaluates the most prominent findings from that analysis. Some properties of the genome are unusual, such as the high proportion of A and T nucleotides and the organization of the chromosome ends, the telomeres. On the other hand, this organism contains genes that until now had been attributed only to multicellular animals. The genome analysis of this organism is the basis for further work with it as a model.

    Analysis of Genomic and Proteomic Sequences using DSP Techniques

    The analysis of biological sequences by detecting hidden periodicities and symbolic patterns has been an active area of research for a couple of decades. The hidden periodic components and patterns help locate biologically relevant motifs such as protein coding regions (exons), CpG islands (CGIs) and hot-spots that characterize various biological functions. The discrete nature of biological sequences has prompted many researchers to use digital signal processing (DSP) techniques for their analysis. After mapping the biological sequences to numerical sequences, various DSP techniques using digital filters, wavelets, neural networks, filter banks, etc. have been developed to detect the hidden periodicities and recurring patterns in these sequences. This thesis attempts to develop effective DSP-based techniques to solve some of the important problems in biological sequence analysis. Specifically, DSP techniques such as statistically optimal null filters (SONFs), matched filters and neural-network-based algorithms are developed for the analysis of deoxyribonucleic acid (DNA), ribonucleic acid (RNA) and protein sequences. In the first part of this study, DNA sequences are investigated in order to identify the locations of CGIs and protein coding regions, i.e., exons. SONFs, which are known for their ability to efficiently estimate short-duration signals embedded in noise by combining the maximum signal-to-noise ratio and the least squares optimization criteria, are utilized to solve these problems. Basis sequences characterizing CGIs and exons are formulated for use in the SONF technique. In the second part of this study, RNA sequences are analyzed to predict their secondary structures. For this purpose, matched filters based on 2-dimensional convolution are developed to identify the locations of stem and loop patterns in the RNA secondary structure. The knowledge of the stem and loop patterns thus obtained is then used to predict the presence of pseudoknots, leading to the determination of the entire RNA secondary structure. Finally, in the third part of this thesis, protein sequences are analyzed to solve the problems of predicting protein secondary structure and identifying the locations of hot-spots. For predicting the protein secondary structure, a two-stage neural network scheme is developed, whereas for predicting the locations of hot-spots, an SONF-based approach is proposed. Hot-spots in proteins exhibit a characteristic frequency corresponding to their biological function. A basis function is formulated based on this characteristic frequency for use in SONFs to detect the locations of hot-spots belonging to the corresponding functional group. Extensive experiments are performed throughout the thesis to demonstrate the effectiveness and validity of the various schemes and techniques developed in this investigation. The performance of the proposed techniques is compared with that of previously reported techniques for the analysis of biological sequences. For this purpose, the results obtained are validated using databases with known annotations. It is shown that the proposed schemes result in performance superior to that of some of the existing techniques.
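
    The idea of mapping a DNA sequence to numerical sequences and looking for a hidden periodicity can be sketched as follows. This is not the SONF technique developed in the thesis. It substitutes a simpler, widely used baseline: binary indicator sequences for A, C, G and T, and the discrete Fourier transform evaluated at frequency 1/3, where coding regions tend to show a peak. The function names, window size and step are assumptions chosen for illustration.

```python
import cmath
from typing import List, Tuple


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four bases.

    Each base defines a 0/1 indicator sequence; its Fourier coefficient
    at frequency 1/3 is the sum of exp(-2*pi*j*n/3) over the positions n
    where that base occurs. The total is normalized by sequence length.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total / len(seq)


def sliding_period3(seq: str, window: int = 60,
                    step: int = 3) -> List[Tuple[int, float]]:
    """Period-3 power in each window, as (window start, power) pairs.

    ASSUMPTION: window and step are illustrative defaults, not values
    taken from the thesis.
    """
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy input: a period-4 stretch followed by a strictly period-3 stretch.
    toy = "ACGT" * 6 + "ATG" * 10
    for start, power in sliding_period3(toy, window=24, step=6):
        print(start, round(power, 3))
```

    On the toy input the power is close to zero in the period-4 stretch and rises as the window moves into the period-3 stretch. Real coding regions give a much weaker, noisier peak, which is the kind of limitation that motivates filter-based approaches such as the one the thesis proposes.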