23 research outputs found

    A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches

    This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. Approximate string-matching algorithms have both pleasing theoretical features and direct applications, especially in computational biology. We consider a generalisation of this problem, the fixed-length approximate string-matching with k-mismatches problem: given a text t, a pattern x and an integer ℓ, search for all the occurrences in t of all factors of x of length ℓ with k or fewer mismatches with a factor of t. We present a practical parallel algorithm of comparable simplicity that requires only O(nm⌈ℓ/w⌉/p) time, where w is the word size of the machine (e.g. 32 or 64 in practice) and p the number of processors. Thus the algorithm's performance is independent of k and the alphabet size |Σ|. The proposed parallel algorithm makes use of the message-passing parallelism model and word-level parallelism for efficient approximate string-matching.
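
To make the k-mismatches problem in the abstract above concrete, here is a minimal sequential sketch in Python. It is a naive O(nm) Hamming-distance scan, not the parallel word-level algorithm the paper proposes; the function name `k_mismatch_occurrences` is hypothetical and chosen only for illustration.

```python
def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return every start index i such that pattern matches
    text[i:i+m] with at most k mismatches (Hamming distance <= k)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window already exceeds the budget
        if mismatches <= k:
            hits.append(i)
    return hits


print(k_mismatch_occurrences("ACGTACGA", "ACGT", 1))
```

The fixed-length variant described in the abstract would apply the same comparison to every length-ℓ factor of the pattern, which is where packing comparisons into machine words and splitting the work across processors becomes attractive.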

    New Methods for Detecting Lineage-Specific Selection

    So far, most methods for identifying sequences under selection based on comparative sequence data have either assumed selectional pressures are the same across all branches of a phylogeny, or have focused on changes in specific lineages of interest. Here, we introduce a more general method that detects sequences that have either come under selection, or begun to drift, on any lineage. The method is based on a phylogenetic hidden Markov model (phylo-HMM), and does not require element boundaries to be determined a priori, making it particularly useful for identifying noncoding sequences. Insertions and deletions (indels) are incorporated into the phylo-HMM by a simple strategy that uses a separately reconstructed “indel history.” To evaluate the statistical significance of predictions, we introduce a novel method for computing P-values based on prior and posterior distributions of the number of substitutions that have occurred in the evolution of predicted elements. We derive efficient dynamic-programming algorithms for obtaining these distributions, given a model of neutral evolution. Our methods have been implemented as computer programs called DLESS (Detection of LinEage-Specific Selection) and phyloP (phylogenetic P-values). We discuss results obtained with these programs on both real and simulated data sets.

    Fruit Distribution in the Canadian West

    Transcription factors, by binding to particular DNA sequences termed transcription factor-binding sites, play an important role in regulating gene expression in both prokaryotic and eukaryotic organisms. These binding sites lie within promoters (which are located just upstream of a gene and promote transcription of that gene) and enhancers (short DNA elements enhancing transcription levels of genes in a gene cluster, and which need not be particularly close to the genes they act on, or even located on the same chromosome). Binding of transcription factors in these genomic regulatory regions can influence gene transcription rates either positively or negatively. The binding may also be dependent on the interaction with co-activators and co-repressors, in addition to context (e.g. particular histone modifications in the vicinity of the regulatory element). Identifying all transcription factors and their respective binding sites would be an important step towards a more thorough understanding of gene regulation. Regular-expression-type patterns, as well as nucleotide distribution matrices, have both been used for describing transcription factor-binding sites, e.g. (Bucher 1990; Ghosh 1990; Chen et al. 1995; Wingender et al. 1996). Here we will discuss some of the computational approaches that are used in binding site identification.
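
The two site descriptions mentioned in the abstract above, regular-expression-type patterns and nucleotide distribution matrices, can be sketched in Python as follows. Everything here is an assumption made for illustration: the TATA-like pattern, the count matrix `COUNTS`, the uniform background frequency and the function names do not come from the chapter or from any curated motif database.

```python
import math
import re

# Illustrative TATA-like consensus written as a regular expression.
# The lookahead makes overlapping occurrences visible to finditer.
MOTIF_REGEX = re.compile(r"(?=TATA[AT]A[AT])")

# Hypothetical nucleotide counts, one dict per motif position.
# All counts are >= 1, so no frequency is zero when taking logarithms.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def regex_sites(seq):
    """Start positions of every (possibly overlapping) regex match."""
    return [m.start() for m in MOTIF_REGEX.finditer(seq)]


def log_odds_score(window, counts=COUNTS, background=0.25):
    """Sum of per-position log2(frequency / background) for one window.
    Assumes the window contains only A, C, G and T."""
    score = 0.0
    for base, column in zip(window, counts):
        freq = column[base] / sum(column.values())
        score += math.log2(freq / background)
    return score


def best_matrix_site(seq, counts=COUNTS):
    """Start position of the highest-scoring window, or None if seq is too short."""
    width = len(counts)
    starts = range(len(seq) - width + 1)
    if not starts:
        return None
    return max(starts, key=lambda i: log_odds_score(seq[i:i + width], counts))


seq = "GGCTATAAAAGGC"
print(regex_sites(seq), best_matrix_site(seq))
```

A regular expression gives a yes/no answer for each position, whereas the matrix assigns a graded score, so weak sites can be ranked instead of being discarded outright.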