    Genomic Signal Processing

    Genomics and proteomics: a signal processor's tour

    The theory and methods of signal processing are becoming increasingly important in molecular biology. Digital filtering techniques, transform domain methods, and Markov models have played important roles in gene identification, biological sequence analysis, and alignment. This paper contains a brief review of molecular biology, followed by a review of the applications of signal processing theory. This includes the problem of gene finding using digital filtering, and the use of transform domain methods in the study of protein binding spots. The relatively new topic of noncoding genes, and the associated problem of identifying ncRNA buried in DNA sequences, are also described. This includes a discussion of hidden Markov models and context-free grammars. Several new directions in genomic signal processing are briefly outlined at the end.
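As a rough illustration of the transform-domain idea behind gene finding, the sketch below measures period-3 spectral content, a property often associated with protein-coding regions. It is a minimal sketch, not the paper's method: the function names `period3_power` and `period3_profile`, the window length, and the step size are all assumptions chosen for illustration.

```python
import cmath
import math

BASES = "ACGT"


def period3_power(seq: str) -> float:
    """Sum over the four bases of |DFT at frequency 1/3|^2.

    Each base is turned into a binary indicator sequence (1 where the
    base occurs, 0 elsewhere) and evaluated at the single frequency
    that corresponds to a period of three nucleotides.
    """
    seq = seq.upper()
    total = 0.0
    for base in BASES:
        coeff = sum(
            cmath.exp(-2j * math.pi * n / 3)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


def period3_profile(seq: str, window: int = 351, step: int = 3):
    """Slide a window along seq and return (start, power) pairs.

    The window and step defaults are illustrative assumptions; a window
    length that is a multiple of 3 keeps the analysis frequency aligned.
    """
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For example, `period3_power("ATG" * 10)` is approximately 300 because every base sits at a fixed codon position, while `period3_power("ACGT" * 3)` is approximately 0 because that sequence has period 4, not 3. Peaks in `period3_profile` would then be candidate regions to inspect, not confirmed genes.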

    Genomic Signal Processing Techniques for Taxonomy Prediction

    To analyze complex biodiversity in microbial communities, 16S rRNA marker gene sequences are often assigned to operational taxonomic units (OTUs). The abundance of methods used to assign 16S rRNA marker gene sequences to OTUs has prompted debate about which one is better. Some researchers suggest that clustering methods should be stable, meaning that the generated OTU assignments do not change as additional sequences are added to the dataset; others contend that it is more important for the methods to properly represent the distances between sequences. We add one more de novo clustering algorithm, Rolling Snowball, to the existing ones, which include single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm, as well as the open- and closed-reference methods. We use the GreenGenes, RDP, and SILVA 16S rRNA gene databases to show the success of the method. The highest accuracy is obtained with the SILVA library.

    Evaluation of Organisms Relationship by Genomic Signal Processing

    This dissertation deals with alternative approaches to analyzing the genetic information of organisms. The theoretical part presents two different approaches to evaluating the relationship between organisms based on the similarity of the genetic information contained in their DNA sequences. The first is the now-standard phylogenetic analysis of character-based records of DNA sequences. Although this approach is computationally expensive because it requires multiple sequence alignment, it can both evaluate the global similarity of whole DNA sequences and localize specific homologies within them. The second approach comprises alternative techniques that classify DNA sequences in the form of numerical vectors representing characteristic features of their genetic information. These methods, known as "alignment-free", allow very fast evaluation of global similarity, but the numerical conversion loses the ability to evaluate local changes in the sequences. The practical part presents a new method for classifying numerical representations of DNA that combines the advantages of both approaches. Of the numerical representations of DNA, it uses only those that resemble a 1D signal, i.e. those that contain a specific trend developing along the x-axis. The main assumption is that these genomic signals are taxonomically specific. The practical part deals with the design of suitable tools for digital processing of genomic signals that allow the mutual similarity of taxonomically specific trends to be evaluated. Based on the evaluated mutual similarity of the genomic signals, classification is carried out in the form of a dendrogram, the counterpart of the phylogenetic trees used in standard phylogenetics.
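To make the idea of a 1D genomic signal with a trend along the x-axis concrete, the sketch below uses a cumulated-phase representation. This is an assumption for illustration: the abstract does not say which numerical representation the dissertation uses. The phase mapping, the truncation to the shorter signal, and the names `cumulated_phase`, `signal_distance`, and `distance_matrix` are all hypothetical choices.

```python
import math
from itertools import accumulate

# Assumed mapping of nucleotides to phase angles (one common convention
# places the four bases in the four quadrants of the complex plane).
PHASE = {
    "A": math.pi / 4,
    "G": 3 * math.pi / 4,
    "C": -3 * math.pi / 4,
    "T": -math.pi / 4,
}


def cumulated_phase(seq: str) -> list[float]:
    """Running sum of per-nucleotide phases; unknown symbols are skipped."""
    return list(accumulate(PHASE[b] for b in seq.upper() if b in PHASE))


def signal_distance(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference over the shared prefix.

    Truncating to the shorter signal is a simplification; real signals
    of different lengths would need alignment or resampling first.
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both signals must be non-empty")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n], b[:n])) / n)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise distances between the genomic signals of named sequences."""
    signals = {name: cumulated_phase(s) for name, s in seqs.items()}
    names = sorted(signals)
    return {
        (p, q): signal_distance(signals[p], signals[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }
```

A distance matrix of this kind could be passed to any hierarchical clustering routine to draw a dendrogram. Which linkage and distance measure suit taxonomic classification is not specified in the abstract.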

    Genomic applications of statistical signal processing

    Biological phenomena in cells can be explained in terms of the interactions among biological macromolecules, e.g., DNA, RNA, and proteins. These interactions can be modeled by genetic regulatory networks (GRNs). This dissertation proposes to reverse engineer GRNs from heterogeneous biological data sets, including time-series and time-independent gene expression data, chromatin immunoprecipitation (ChIP) data, gene sequences and motifs, and other possible sources of knowledge. The objective of this research is to propose novel computational methods that keep pace with rapidly evolving biological databases. Signal processing techniques are exploited to develop computationally efficient, accurate, and robust algorithms that deal individually or collectively with various data sets. Methods of power spectral density estimation are discussed to identify genes participating in various biological processes. Information-theoretic methods are applied for non-parametric inference. Bayesian methods are adopted to incorporate several sources with prior knowledge. This work aims to construct an inference system that takes into account different sources of information such that the absence of some components will not interfere with the rest of the system. It has been verified that the proposed algorithms achieve better inference accuracy and higher computational efficiency than other state-of-the-art schemes, e.g. REVEAL, ARACNE, Bayesian networks, and relevance networks, in the presence of artificial time series and steady-state microarray measurements. The proposed algorithms are especially appealing when the sample size is small. In addition, they are able to integrate multiple heterogeneous data sources, e.g. ChIP and sequence data, so that a unified GRN can be inferred. The analysis of biological literature and in silico experiments on real data sets for fruit fly, yeast, and human have corroborated part of the inferred GRN. The research has also produced a set of potential control targets for designing gene therapy strategies.
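The power spectral density step mentioned above can be illustrated with a plain periodogram applied to one gene's expression time series. This is a minimal sketch under assumptions: the abstract does not name the estimator the dissertation uses, evenly spaced samples are assumed, and `periodogram` and `dominant_period` are hypothetical helper names.

```python
import cmath
import math


def periodogram(x: list[float]) -> list[float]:
    """Periodogram of a mean-removed series at frequencies k/N, k = 0..N//2."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


def dominant_period(x: list[float]) -> float:
    """Period (in samples) of the strongest non-zero frequency component."""
    power = periodogram(x)
    k = max(range(1, len(power)), key=power.__getitem__)
    return len(x) / k
```

A gene whose expression follows a cycle of 8 samples over 24 time points would show its strongest peak at `k = 3`, so `dominant_period` would return `8.0`. Real expression series are short and noisy, so a peak like this is a candidate signal to test statistically, not proof of periodicity.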