3,645 research outputs found

    Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats

    Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats. Results: We used a method based on the DFT, with mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of a HOR. Our case study was the 16-mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16-mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an n-mer) and higher harmonics. In general, an n-mer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern as a signature for HOR detection is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the periodicity-three pattern are missing from power spectra in alphoid regions, in accordance with expectations. Conclusion: The DFT provides a robust detection method for higher order periodicity.
An easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the n-mer HOR, i.e., the number n of monomers contained in the consensus HOR.
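
The two-level peak pattern described in this abstract can be illustrated on a toy sequence. The sketch below is not the authors' pipeline: it assumes a binary indicator mapping (one 0/1 sequence per base), which is one common choice and is not specified in the abstract, and it uses a made-up 3-mer "HOR" built from three slightly diverged copies of a hypothetical 6-bp monomer. The function name `indicator_power_spectrum` and the sequences are illustrative only.

```python
import cmath

def indicator_power_spectrum(seq):
    """Total DFT power at bins 0..N/2, summed over the four base indicators."""
    n = len(seq)
    spectrum = [0.0] * (n // 2 + 1)
    for base in "ACGT":
        # Binary indicator sequence: 1 where this base occurs, else 0.
        x = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(len(spectrum)):
            # Direct O(N^2) DFT; fine for a short toy sequence.
            coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))
            spectrum[k] += abs(coeff) ** 2
    return spectrum

# Hypothetical toy "3-mer HOR": three diverged copies of a 6-bp monomer.
MONOMERS = ["ACGTTG", "ACGTAG", "CCGTTG"]
HOR_UNIT = "".join(MONOMERS)       # 18 bp higher order repeat unit
ARRAY = HOR_UNIT * 10              # 180 bp tandem array

spectrum = indicator_power_spectrum(ARRAY)
hor_bin = len(ARRAY) // len(HOR_UNIT)          # fundamental HOR frequency bin
monomer_bin = len(ARRAY) // len(MONOMERS[0])   # first primary periodicity bin

# Print the bins that carry non-negligible power.
for k, power in enumerate(spectrum):
    if k > 0 and power > 1e-6:
        label = "primary" if k % monomer_bin == 0 else "secondary"
        print(f"bin {k:3d}  power {power:10.2f}  ({label})")
```

In this toy array, power appears only at multiples of the HOR bin, and the peak at the monomer bin is much stronger than the secondary peaks below it, which mirrors the hierarchy the abstract describes. Real alphoid arrays are only approximately periodic, so their peaks would be broadened rather than confined to exact bins.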

    Low complexity frequency monitoring filter for fast exon prediction sequence analysis

    Over the last few years, the application of Digital Signal Processing (DSP) techniques to genomic sequence analysis has received great interest. In particular, it has been demonstrated that DSP can be used to detect protein-coding regions (exons) among non-coding regions in a DNA sequence. The period-3 behavior exhibited by exons is one feature that has been exploited in several exon prediction algorithms. This periodicity can be identified with different methods, such as the well-known Fast Fourier Transform (FFT) and, for reduced complexity, the Goertzel algorithm; reducing computation time remains a major challenge in genomic analysis. This paper therefore presents a novel single-frequency analysis that uses half the arithmetic complexity of the Goertzel algorithm for gene prediction. Compared with Intel®'s optimized FFT function (MKL), the IPP Goertzel function, and a dedicated Goertzel function compiled with ICC on a 24-core Xeon CPU, the proposed method preserves the accuracy of the reference methods while achieving speedups of 3000, 10, and 2 over the MKL FFT, the IPP Goertzel, and the dedicated ICC-compiled Goertzel, respectively
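
For orientation, the baseline that this abstract compares against can be sketched in a few lines. The code below is the standard second-order Goertzel recurrence evaluated at the period-3 bin (k = N/3), not the paper's reduced-complexity filter, whose details are not given here. The binary indicator mapping and the helper names `goertzel_power` and `period3_power` are assumptions made for illustration.

```python
import math

def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k of `samples` via the Goertzel recurrence."""
    n = len(samples)
    if n == 0:
        return 0.0
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # s[t] = x[t] + 2*cos(w)*s[t-1] - s[t-2]
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # |X(k)|^2 recovered from the last two state values.
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def period3_power(seq):
    """Period-3 spectral power summed over the four base indicator sequences."""
    n = len(seq) - len(seq) % 3    # trim so that N/3 is an integer bin
    seq = seq[:n]
    total = 0.0
    for base in "ACGT":
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        total += goertzel_power(indicator, n // 3)
    return total

# A perfectly codon-periodic toy sequence versus a homopolymer.
print(period3_power("ATG" * 20))
print(period3_power("A" * 60))
```

Because only one bin is needed, the recurrence costs one multiplication and two additions per sample, which is why single-frequency filters of this kind are attractive when a sliding window is scanned along a long sequence.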

    Analysis of Genomic and Proteomic Signals Using Signal Processing and Soft Computing Techniques

    Bioinformatics is a data-rich field that provides unique opportunities to use computational techniques to understand and organize information associated with biomolecules such as DNA, RNA, and proteins. It involves in-depth study in the areas of genomics and proteomics and requires techniques from computer science, statistics, and engineering to identify, model, and extract features and to process data for analysis and interpretation of results in a biologically meaningful manner. Among engineering methods, signal processing techniques such as transformation, filtering, and pattern analysis, and soft-computing techniques such as the multilayer perceptron (MLP) and the radial basis function neural network (RBFNN), play a vital role in resolving many challenging issues in genomics and proteomics. This dissertation investigates several challenging problems in bioinformatics using efficient signal processing and soft-computing methods. The specific problems addressed are protein-coding region identification in DNA sequences, hot spot identification in proteins, prediction of protein structural class, and classification of microarray gene expression data. The dissertation presents novel methods to measure and extract features from genomic sequences using time-frequency analysis and machine intelligence techniques. The problems investigated and the contributions made in the thesis are presented here in a concise manner. The S-transform, a powerful time-frequency representation technique, has an advantage over the wavelet transform and the short-time Fourier transform in that the exponential function is fixed with respect to the time axis while the localizing scalable Gaussian window dilates and translates. The S-transform uses an analysis window whose width decreases with frequency, providing frequency-dependent resolution.
The invertibility of the S-transform makes it suitable for time-band filtering applications. Gene prediction and protein-coding region identification have always been challenging tasks in computational biology, especially in eukaryotic genomes because of their complex structure. This issue is resolved using an S-transform-based time-band filtering approach that localizes the period-3 property present in the DNA sequence, which forms the basis for the identification. Similarly, hot spot identification in proteins is an important problem in protein science because of the role of hot spots in binding and interaction between proteins. A novel S-transform-based time-frequency filtering approach is proposed for efficient identification of the hot spots. Prediction of the structural class of a protein has been a challenging problem in bioinformatics. A novel feature representation scheme is proposed to represent the protein efficiently, thereby improving the prediction accuracy. The high dimensionality and low sample size of microarray data lead to the curse of dimensionality, which degrades classification performance. In this dissertation, an efficient hybrid feature extraction method is proposed to overcome the dimensionality issue, and an RBFNN is introduced to classify the microarray samples efficiently

    The JM-Filter to detect specific frequency in monitored signal

    The Discrete Fourier Transform (DFT) is a mathematical procedure that stands at the center of the processing inside a digital signal processor. It has been widely argued in the relevant literature that the Fast Fourier Transform (FFT) is useless for detecting specific frequencies in a monitored signal of length N because most of the computed results are ignored. In this paper, we present an efficient FFT-based method to detect specific frequencies in a monitored signal and compare it with the most frequently used method, the recursive Goertzel algorithm, which detects and analyses one selectable frequency component of a discrete signal. The proposed JM-Filter algorithm reduces the number of iterations relative to the first- and second-order Goertzel algorithms by a factor of r, where r is the radix of the JM-Filter. The obtained results are significant in terms of computational reduction and accuracy in fixed-point implementation. Gains of 15 dB and 19 dB in signal-to-quantization-noise ratio (SQNR) were observed for the proposed first- and second-order radix-8 JM-Filter, respectively, in comparison with the Goertzel algorithm

    Development and Application of Next-Generation Sequencing Methods to Profile Cellular Translational Dynamics

    The transmission of genetic information from the transcription of DNA to RNA and the subsequent translation of RNA into protein is often abstracted into a linear process. However, as methods and technologies to measure the genomic, transcriptomic, and proteomic content of cells have advanced, so too has our understanding that the transmission of genetic information does not always flow in a lossless manner. For instance, changes observed in messenger RNA (mRNA) abundance are not always retained at the proteomic level. Indeed, a diverse array of mechanisms have been identified that exert regulatory control over this transmission of information. Next-generation short read sequencing has driven many of these insights and provided increasingly nuanced understanding of these regulatory mechanisms. However, the continued development and application of sequencing methodologies and analytics are required to properly contextualize many of these insights on a more global scale. Ribosome profiling is one such recent advancement which enriches for ribosome-protected fragments of mRNA; sequencing and analysis of these ribosome-protected mRNA fragments enables profiling of the translational content of a sample. The aim of this dissertation is to address the need for the development and application of statistical and analytical algorithms to profile the regulatory factors that contribute to the translational dynamics in cells. In the first chapter, I survey the development and application of next-generation sequencing methods for the profiling and computational analysis of translation and translational dynamics. In the second chapter of this thesis, I present SPECtre, a software package that identifies regions of active translation through measurement of the translational engagement of ribosomes over a transcript. 
SPECtre achieves high sensitivity and specificity in its classification of regions undergoing translation by leveraging the codon-dependent elongation of peptides; this tri-nucleotide periodicity is evident in the alignment of ribosome profiling sequence reads to a reference transcriptome. SPECtre classifies actively translated transcripts according to their coherence in read coverage over a region to an optimal tri-nucleotide signal. In the third chapter, I describe the application of SPECtre to identify the translation of upstream-initiated open-reading frames that may regulate differentiation in a neuron-like cell model. uORFs are transcripts that result from the initiation of translation from AUG, and under certain biological constraints, from non-AUG sequences localized in the 5’ untranslated regions of annotated protein-coding genes. Subsets of these uORFs have been implicated in the regulation of their downstream protein-coding genes in yeast, mice and humans. In this chapter, I provide further evidence for this regulation as well as the spatial context for the functional consequences of uORF translation on downstream protein-coding genes in a neuron-like cell line model of differentiation. Finally, in the fourth chapter, I outline a strategy using our coherence-based translational scoring algorithm to profile ribosomal engagement over chimeric gene fusion breakpoints in prostate cancer. Here, known breakpoints from current annotation databases are integrated with novel junctions nominated by existing whole genome and transcriptomic gene fusion detection algorithms, and the translational profile over these chimeric junctions using SPECtre is measured. This provides an additional layer of translational evidence to known and novel gene fusion breakpoints in prostate cancer. 
Ongoing development of a database and visualization platform based on these results will enable integrative insights into the transcriptional and translational topology of these breakpoints. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144106/1/stonyc_1.pd

    Statistical and network-based methods for the analysis of chromatin accessibility maps in single cells

    In this work, methods from physics, statistics, and graph theory were used to characterize and analyze chromatin opening and accessibility profiles obtained with the ATAC-seq technique in single cells, specifically B lymphocytes from three patients with chronic lymphocytic leukemia. A bioinformatic pipeline was developed to process the sequencing data and obtain the accessible genomic positions for each cell. The number of open regions and their spatial distribution along the DNA were characterized. Finally, the simultaneous opening of regulatory regions in the same single cells was used as a metric to assess functional relationships; in this way, graphs between enhancers and promoters were constructed and their properties analyzed. The spatial distribution along the genome of consecutive open regions recapitulates structural properties such as nucleosome arrays and chromatin loop structures. Moreover, the accessibility profiles of regulatory regions are significantly conserved across single cells. The enhancer-promoter networks provide a way to characterize the relevance of each regulatory region in terms of centrality. Statistics on enhancer-promoter connectivity confirm the one-to-one relationship model as the most frequent, in which a promoter is regulated by its nearest enhancer. Finally, the functioning of super-enhancers was also investigated. In conclusion, ATAC-seq proves to be an effective technique for investigating chromatin opening in single cells, whose accessibility profiles recapitulate structural and functional features of chromatin. To investigate disease mechanisms, the accessibility landscape of tumor lymphocytes can be compared with that of healthy cells and of cells treated with epigenetic drugs
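
The idea of linking enhancers and promoters by their simultaneous opening in the same cells can be sketched as a small bipartite graph. The abstract does not say which co-opening metric or threshold was used, so the Jaccard index, the `min_jaccard` cutoff, the function names, and the toy cell sets below are all assumptions chosen for illustration.

```python
def coaccessibility_edges(enhancers, promoters, min_jaccard=0.5):
    """Link each enhancer-promoter pair whose open-cell sets overlap enough.

    `enhancers` and `promoters` map a region name to the set of cell ids
    in which that region is accessible.
    """
    edges = {}
    for enh, enh_cells in sorted(enhancers.items()):
        for prom, prom_cells in sorted(promoters.items()):
            union = enh_cells | prom_cells
            if not union:
                continue
            # Jaccard index: shared cells over all cells where either is open.
            score = len(enh_cells & prom_cells) / len(union)
            if score >= min_jaccard:
                edges[(enh, prom)] = score
    return edges

def degree_centrality(edges):
    """Number of links per region, a simple stand-in for network centrality."""
    degree = {}
    for enh, prom in edges:
        degree[enh] = degree.get(enh, 0) + 1
        degree[prom] = degree.get(prom, 0) + 1
    return degree

# Hypothetical toy data: cell ids in which each region is open.
enhancers = {"enh1": {1, 2, 3, 4}, "enh2": {5, 6}}
promoters = {"prom1": {1, 2, 3}, "prom2": {5, 6, 7, 8, 9}}

edges = coaccessibility_edges(enhancers, promoters)
print(edges)
print(degree_centrality(edges))
```

With this toy data only the pair that is open together in most of its cells is linked. A real analysis would also need to account for the sparsity of single-cell ATAC-seq signal, which a plain overlap score does not address.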

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interdisciplinary meeting of effective visual feature technologies and the study of the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition

    Deriving structural information from non-coding ribonucleic acids by mass spectrometry (Rakenteellisen tiedon johtaminen koodaamattomista ribonukleiinihapoista massaspektrometrialla)

    The purpose of this study is to describe the development of the application of mass spectrometry to the structural analysis of non-coding ribonucleic acids during the past decade. Mass spectrometric methods are compared with traditional gel electrophoretic methods, the performance characteristics of mass spectrometric analyses are studied, and future trends in the mass spectrometry of ribonucleic acids are discussed. Non-coding ribonucleic acids are short polymeric biomolecules which are not translated to proteins, but which may affect gene expression in all organisms. Regulatory ribonucleic acids act through transient interactions with key molecules in signal transduction pathways. Interactions are mediated through specific secondary and tertiary structures. Posttranscriptional modifications in the structures of molecules may introduce new properties to the organism, such as adaptation to environmental changes or development of resistance to antibiotics. In the scope of this study, the structural studies include i) determination of the sequence of nucleobases in the polymer chain, ii) characterisation and localisation of posttranscriptional modifications in nucleobases and in the backbone structure, iii) identification of ribonucleic acid-binding molecules and iv) probing of higher order structures in the ribonucleic acid molecule. Bacteria, archaea, viruses and HeLa cancer cells have been used as target organisms. Synthesised ribonucleic acids consisting of structural regions of interest have been frequently used. Electrospray ionisation (ESI) and matrix-assisted laser desorption ionisation (MALDI) have been used for ionisation of ribonucleic analytes. Ammonium acetate and 2-propanol are common solvents for ESI. Trihydroxyacetophenone is the optimal MALDI matrix for ionisation of ribonucleic acids and peptides. Ammonium salts are used in ESI buffers and MALDI matrices as additives to remove cation adducts.
Reverse phase high performance liquid chromatography has been used for desalting and fractionation of analytes either off-line or on-line, coupled with an ESI source. Triethylamine and triethylammonium bicarbonate are used as ion pair reagents almost exclusively. A Fourier transform ion cyclotron resonance analyser using ESI coupled with liquid chromatography is the platform of choice for all forms of structural analyses. A time-of-flight (TOF) analyser using MALDI may offer a sensitive, easy-to-use and economical solution for simple sequencing of longer oligonucleotides and analyses of analyte mixtures without prior fractionation. Special analysis software is used for computer-aided interpretation of mass spectra. With mass spectrometry, sequences 20-30 nucleotides in length may be determined unambiguously. Sequencing may be applied to quality control of short synthetic oligomers for analytical purposes. Sequencing in conjunction with other structural studies enables accurate localisation and characterisation of posttranscriptional modifications and identification of nucleobases and amino acids at the sites of interaction. High throughput screening methods for RNA-binding ligands have been developed. Probing of the higher order structures has provided supportive data for computer-generated three dimensional models of viral pseudoknots. In conclusion, mass spectrometric methods are well suited for structural analyses of small species of ribonucleic acids, such as short non-coding ribonucleic acids in the molecular size region of 20-30 nucleotides. Structural information not attainable with other methods of analysis, such as nuclear magnetic resonance and X-ray crystallography, may be obtained with the use of mass spectrometry. Sequencing may be applied to quality control of short synthetic oligomers for analytical purposes. Ligand screening may be used in the search for possible new therapeutic agents.
Demanding assay design and challenging interpretation of data require multidisciplinary knowledge. The implementation of mass spectrometry in structural studies of ribonucleic acids is probably most efficiently conducted in specialist groups consisting of researchers from various fields of science