70 research outputs found

    Localizing triplet periodicity in DNA and cDNA sequences

    Background: The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) because coding exons consist of a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in non-coding regions. This property has been used to identify the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size limits the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans.
    Results: Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that (1) the Modified Wavelet Transform (MWT) can better define the boundary of a TP region than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: bigger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false positive signals, which can be removed with a 6 bp MWT.
    Conclusions: The MWT can provide more precise TP boundaries than the STFT, and the boundaries can be further refined by a bigger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that may have been lost through ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect splice junctions in fully spliced mRNA sequences.
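    The windowed Fourier approach described in the Background can be illustrated with a minimal sketch. It is not the authors' MWT or their STFT implementation. It assumes binary indicator sequences for the four bases, and the function name `period3_power` and the `window` and `step` defaults are hypothetical choices made for illustration only.

```python
import numpy as np

def period3_power(seq, window=351, step=3):
    """Sliding-window DFT power at frequency 1/3, summed over the four bases.

    Assumption: each base is mapped to a 0/1 indicator sequence; this is one
    common numerical representation, not necessarily the one used in the paper.
    """
    seq = seq.upper()
    n = len(seq)
    indicators = {
        base: np.array([1.0 if c == base else 0.0 for c in seq])
        for base in "ACGT"
    }
    # Complex exponential at frequency 1/3 cycles per nucleotide, one window long
    k = np.arange(window)
    kernel = np.exp(-2j * np.pi * k / 3.0)

    starts = np.arange(0, n - window + 1, step)
    power = np.zeros(len(starts))
    for i, s in enumerate(starts):
        for u in indicators.values():
            # Squared magnitude of the single DFT coefficient at frequency 1/3
            power[i] += abs(np.dot(u[s:s + window], kernel)) ** 2
    # Normalise by window length so values are comparable across window sizes
    return starts, power / window


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    random_part = "".join(rng.choice(list("ACGT"), size=600))
    periodic_part = "GCA" * 200  # toy stand-in for a strongly periodic region
    starts, power = period3_power(random_part + periodic_part)
    print(power[starts + 351 <= 600].mean(), power[starts >= 600].mean())
```

    Because every value is computed over a full window, a boundary between periodic and non-periodic sequence is smeared over roughly one window length, which is the localization limit the abstract refers to.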

    Multiple Functions Of The Striated Rootlet Proteins Of The Paramecium Basal Body

    Paramecium ciliary basal bodies align in straight rows from posterior to anterior. Each basal body is connected to three rootlets: the Post Ciliary Rootlet (PCR), the Transverse Rootlet (TR), and the Striated Rootlet (SR). The SR, the longest, projects from the basal body toward the anterior, past several more anterior basal bodies. Depletion of Meckelin (MKS3) misaligns SRs, disorganizes basal body rows, and makes the SRs appear ragged and serpentine. In this study we clarify the composition of the Paramecium ciliary basal body's SR and demonstrate that the SR plays a critical role in creating the orderly array of basal bodies in rows that run from pole to pole of the cell, likely through interactions with centrins and other cytoskeletal elements underlying the cell surface. We report for the first time a reciprocal relationship between the SR and centrin-related infraciliary lattice (ICL) proteins that can dictate cell surface morphology. The SR of Chlamydomonas is the best studied. Using the single Chlamydomonas SR gene, SF-assemblin, to search ParameciumDB, we found thirty Paramecium genes in thirteen Paralog Groups. Proteins from the 13 Paralog Groups were confirmed to be in the SR structure using immunofluorescence. LC-MS/MS analyses of density fractions from SR isolation show that all thirty SR members are within the same density fraction. We further categorized all 30 SR genes into five Structural Groups based on their ability to form coiled-coil domains and evaluated the function of all five Structural Groups using RNA interference (RNAi). Silencing the transcripts of any Structural Group produced misaligned basal body rows and disordered SRs of abnormal appearance all over the cell surface. Silencing a Paralog Group produced a normal phenotype, except for the two Paralog Groups (Paralog Group 1 and Paralog Group 7) that each constitute a Structural Group on their own. Isolated SRs from control or Paralog Group-depleted cells show a characteristic striation pattern that includes major and minor striations. Isolated SRs from cells depleted of any Structural Group show abnormal shapes and striation periodicity. The surface misalignment phenotype of Structural Group RNAi correlates with the shape and periodicity phenotype of the isolated SRs. Strikingly, our study demonstrates the role of SRs in shaping the other cytoskeletal structures of the cell cortex, e.g., the ICL, the epiplasm territory, and the cortical unit territory. In a follow-up to the study of MKS3 (Picariello et al., 2014), we depleted the transcripts of the MKS5 gene in Paramecium tetraurelia. Depletion of MKS5 transcripts causes cilia loss all over the cell surface. Unlike MKS3 depletion, MKS5 depletion does not affect the straight basal body rows or the ordered organization of SRs. Moreover, the data presented in this study demonstrate that depletion of MKS5 transcripts somehow affects the localization of another transition zone protein, B9D2. It appears that when any SR Structural Group is lacking, the rest fail to interact properly with each other to maintain SR structure and directionality toward the anterior. As a result, abnormal SRs appear to lose their interaction with other cytoskeletal structures such as the ICL network complex, which eventually results in misaligned basal body rows and altered swimming behavior. From the data presented in this study, it is reasonable to postulate that the ICL1e subfamily and the SRs are in a reciprocal relationship that maintains the straight basal body rows and the highly ordered organization of the SRs all over the cell surface.

    Analysis of Genomic and Proteomic Signals Using Signal Processing and Soft Computing Techniques

    Bioinformatics is a data-rich field that provides unique opportunities to use computational techniques to understand and organize information associated with biomolecules such as DNA, RNA, and proteins. It involves in-depth study in the areas of genomics and proteomics and requires techniques from computer science, statistics, and engineering to identify, model, and extract features and to process data for analysis and interpretation of results in a biologically meaningful manner. Among engineering methods, signal processing techniques such as transformation, filtering, and pattern analysis, and soft computing techniques such as the multilayer perceptron (MLP) and the radial basis function neural network (RBFNN), play a vital role in resolving many challenging issues associated with genomics and proteomics. In this dissertation, an attempt has been made to investigate some challenging problems of bioinformatics by employing efficient signal processing and soft computing methods. The specific issues addressed are protein-coding region identification in DNA sequences, hot spot identification in proteins, prediction of protein structural class, and classification of microarray gene expression data. The dissertation presents some novel methods to measure and extract features from genomic sequences using time-frequency analysis and machine intelligence techniques. The problems investigated and the contributions made in the thesis are presented here in a concise manner. The S-transform, a powerful time-frequency representation technique, has an advantage over the wavelet transform and the short-time Fourier transform in that the exponential function is fixed with respect to the time axis while the localizing scalable Gaussian window dilates and translates. The S-transform uses an analysis window whose width decreases with frequency, providing a frequency-dependent resolution. The invertibility of the S-transform makes it suitable for time-band filtering applications. Gene prediction and protein-coding region identification have always been challenging tasks in computational biology, especially in eukaryotic genomes, owing to their complex structure. This issue is addressed using an S-transform-based time-band filtering approach that localizes the period-3 property present in the DNA sequence, which forms the basis for the identification. Similarly, hot spot identification in proteins is an important problem in protein science because of its role in binding and interaction between proteins. A novel S-transform-based time-frequency filtering approach is proposed for efficient identification of hot spots. Prediction of the structural class of a protein has been a challenging problem in bioinformatics. A novel feature representation scheme is proposed to represent the protein efficiently, thereby improving prediction accuracy. The high dimension and low sample size of microarray data lead to the curse of dimensionality, which degrades classification performance. In this dissertation, an efficient hybrid feature extraction method is proposed to overcome the dimensionality issue, and an RBFNN is introduced to classify the microarray samples efficiently.
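    The S-transform property described above (a fixed complex exponential combined with a translating Gaussian window whose width shrinks as frequency grows) can be sketched for a single frequency as follows. This is a direct, unoptimized evaluation of the standard discrete S-transform sum and is not the dissertation's filtering method. The function name `s_transform_voice` and the `width_factor` parameter are hypothetical; `width_factor=1.0` corresponds to the standard window with standard deviation 1/|f|, and other values are an assumed generalization.

```python
import numpy as np

def s_transform_voice(x, freq, width_factor=1.0):
    """One frequency row ("voice") of a discrete S-transform.

    freq is in cycles per sample. The Gaussian window has standard deviation
    width_factor / |freq|, so it narrows as the frequency increases.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    sigma = width_factor / abs(freq)
    # The complex exponential is fixed on the time axis (it does not move with tau)
    carrier = np.exp(-2j * np.pi * freq * t)
    voice = np.empty(n, dtype=complex)
    for tau in range(n):
        # Unit-area Gaussian window centred on tau; only the window translates
        window = np.exp(-((tau - t) ** 2) / (2.0 * sigma ** 2))
        window /= sigma * np.sqrt(2.0 * np.pi)
        voice[tau] = np.sum(x * window * carrier)
    return voice


if __name__ == "__main__":
    # Toy indicator signal: silent first half, period-3 pulses in the second half
    x = np.zeros(600)
    x[300::3] = 1.0
    voice = s_transform_voice(x, freq=1.0 / 3.0, width_factor=10.0)
    print(abs(voice[100]), abs(voice[450]))
```

    The magnitude of the voice at frequency 1/3 is near zero where the signal has no period-3 component and rises inside the periodic stretch, which is the kind of localization a time-band filter could then threshold.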

    Identification of coding regions using DNA spectrogram analysis

    This bachelor's thesis deals with the identification of coding regions using DNA spectrogram analysis. The theoretical part describes numerical representations of genomic data, methods for editing DNA sequences, and the characteristics of methods for finding coding regions. The most widely used method for DNA processing is the discrete Fourier transform, which makes it possible to search a sequence for the required regions. The theoretical procedure for creating a spectrogram and a list of the patterns that can be detected from it are also given. This theoretical knowledge is then used to implement specific methods in MATLAB. We created a program that detects coding regions from the spectrogram and finds their exact positions in the sequence. The results are compared with the NCBI database at the end of the work.

    Promoter-bound METTL3 maintains myeloid leukaemia by m6A-dependent translation control.

    N6-methyladenosine (m6A) is an abundant internal RNA modification in both coding and non-coding RNAs that is catalysed by the METTL3-METTL14 methyltransferase complex. However, the specific role of these enzymes in cancer is still largely unknown. Here we define a pathway that is specific for METTL3 and is implicated in the maintenance of a leukaemic state. We identify METTL3 as an essential gene for growth of acute myeloid leukaemia cells in two distinct genetic screens. Downregulation of METTL3 results in cell cycle arrest, differentiation of leukaemic cells and failure to establish leukaemia in immunodeficient mice. We show that METTL3, independently of METTL14, associates with chromatin and localizes to the transcriptional start sites of active genes. The vast majority of these genes have the CAATT-box binding protein CEBPZ present at the transcriptional start site, and this is required for recruitment of METTL3 to chromatin. Promoter-bound METTL3 induces m6A modification within the coding region of the associated mRNA transcript, and enhances its translation by relieving ribosome stalling. We show that genes regulated by METTL3 in this way are necessary for acute myeloid leukaemia. Together, these data define METTL3 as a regulator of a chromatin-based pathway that is necessary for maintenance of the leukaemic state and identify this enzyme as a potential therapeutic target for acute myeloid leukaemia

    A study of mRNA translation with computational analysis of ribosome profiling datasets

    Ribosome profiling is based on capturing and sequencing the mRNA fragments enclosed within the translating ribosome, and it thereby provides a "snapshot" of ribosome positions at the transcriptome-wide level. The approach was developed in 2009 and was a significant advance toward a better understanding of the regulation of protein synthesis. In this thesis I describe my analysis of ribosome profiling data. In Chapter 1, I review recent developments in understanding obtained with ribosome profiling and discuss the implications of artifacts for the interpretation of its data. Chapters 2 and 3 detail the use of ribosome profiling to examine the translational response to eIF2 repression and to the deprivation of oxygen and glucose, respectively. Chapter 4 details how the interaction of the Shine-Dalgarno sequence with the ribosomal RNA alters the length of the mRNA protected fragments. In Chapter 5, I present an analysis aimed at identifying the relative impact of mRNA features on local ribosome profiling read density.

    Two Splice Variants of Nopp140 in Drosophila Melanogaster.

    The activities of non-ribosomal nucleolar proteins are now understood to be important for the normal functions of both nucleoli and Cajal bodies (CBs). Although these proteins have been studied extensively in other eukaryotes, knowledge of non-ribosomal nucleolar proteins in Drosophila melanogaster lags far behind. The nucleolar phosphoprotein of 140 kDa (Nopp140) may function to shuttle box C/D and box H/ACA small nucleolar (sno)RNAs from the nucleus to the nucleolus, where they function in the 2'-O-methylation and pseudouridylation of rRNA, respectively. Nopp140 homologues have been described in rat, human, Xenopus laevis, and yeast. This dissertation describes the cloning of cDNAs that encode two splice variants of Nopp140 in D. melanogaster. In addition, this dissertation addresses the localization patterns of the D. melanogaster Nopp140 splice variants in various cell types with respect to endogenous nucleolar proteins and CBs. The D. melanogaster Nopp140 gene maps within 79A5 of chromosome 3. Alternative mRNA splicing yields two variants. DmNopp140 (654 residues) is the true D. melanogaster homologue of vertebrate Nopp140 in that its carboxy terminus is 58% identical to the carboxy terminus of rat Nopp140. DmNopp140-RGG (688 residues) is identical to DmNopp140 throughout its first 551 residues, but its carboxy terminus contains an extensive arginine-glycine-glycine (RGG) domain that is found in many RNA-binding proteins such as vertebrate nucleolin. Both D. melanogaster Nopp140 variants localize to the dense fibrillar component (DFC) of D. melanogaster Schneider II cells and X. laevis oocytes. In HeLa cells, DmNopp140-RGG localizes to intact nucleoli, while DmNopp140 segregates nucleoli into phase-light and phase-dark regions. The phase-light regions contain DmNopp140 and endogenous fibrillarin, while the phase-dark regions contain endogenous nucleolin. Both D. melanogaster variants co-localize to nucleoli when co-expressed in HeLa cells. Both proteins also co-localize with exogenously expressed X. laevis coilin to enlarged Cajal bodies (CBs) within HeLa cell nucleoli, but only DmNopp140 localizes to CBs in Schneider II cells. Both variants fail to localize to CBs in X. laevis oocyte nuclei. A carboxy-terminal truncation, DmNopp140-DeltaRGG, fails to localize to nucleoli in HeLa cells, but like DmNopp140, it localizes with exogenously expressed coilin in HeLa cell CBs.