32 research outputs found

    An Algorithm for Identifying Novel Targets of Transcription Factor Families: Application to Hypoxia-inducible Factor 1 Targets

    Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we used suffix trees to extract 105 common patterns that occur in all 15 training genes. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potentially novel HIF-1 targets, including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 of these 258 putative genes were confirmed to be upregulated by HIF-1 or hypoxia. We further studied one of the potential targets, COX-2, in the laboratory and showed that it is a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
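The pattern-extraction step described in this abstract can be illustrated with a small sketch. It is not the authors' suffix-tree algorithm: it swaps in a plain set intersection over fixed-length substrings (k-mers), which is simpler but not linear time. The function name `common_kmers`, the fixed length `k`, and the toy sequences are assumptions made only for illustration.

```python
from typing import Iterable, Set


def common_kmers(sequences: Iterable[str], k: int) -> Set[str]:
    """Return every length-k substring that occurs in all sequences.

    Hypothetical helper: a naive stand-in for suffix-tree based
    common-pattern extraction, not the method from the abstract.
    """
    common = None
    for seq in sequences:
        # All length-k windows of this sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Keep only patterns also seen in every earlier sequence.
        common = kmers if common is None else common & kmers
        if not common:
            break  # nothing can survive further intersections
    return common or set()


# Toy training set (made-up sequences, not real HIF-1 target genes).
training = ["ACGTAC", "TTACGT", "GACGTT"]
patterns = common_kmers(training, 3)
```

A generalized suffix tree would find the shared substrings of all lengths in one pass; the sketch only shows what "patterns common to every training sequence" means.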

    Pattern matching in compressed texts and images

    TR-COSC 07/01. This paper provides a survey of techniques for pattern matching in compressed text and images. Normally compressed data needs to be decompressed before it is processed, but if the compression has been done in the right way, it is often possible to search the data without having to decompress it, or at least only partially decompress it. The problem can be divided into lossless and lossy compression methods, and then in each of these cases the pattern matching can be either exact or inexact. Much work has been reported in the literature on techniques for all of these cases, including algorithms that are suitable for pattern matching for various compression methods, and compression methods designed specifically for pattern matching. This work is surveyed in this paper. The paper also exposes the important relationship between pattern matching and compression, and proposes some performance measures for compressed pattern matching algorithms. Ideas and directions for future work are also described.

    DNA Sequence Compression Using the Burrows-Wheeler Transform

    We investigate off-line dictionary oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree and suffix array. We discuss how the proposed approach can be incorporated in the BWT compression pipeline

    Pattern Matching in BWT-Transformed Text

    Summary form only given. The compressed pattern matching problem is to locate the occurrence(s) of a pattern P in a text string T using a compressed representation of T, with minimal (or no) decompression. The BWT performs a permutation of the characters in the text, such that characters in lexically similar contexts will be near to each other. The motivation for our approach is the observation that the BWT provides a lexicographic ordering of the input text as part of its inverse transformation process
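To make the idea of searching BWT output concrete, here is a minimal sketch of the standard backward search (the FM-index style lookup) over a BWT string. This is a well-known textbook technique, not necessarily the algorithm of the paper summarized above. It assumes a `"\0"` sentinel appended to the text, builds the suffix array naively, and keeps the BWT uncompressed; the function names are illustrative.

```python
from collections import Counter
from typing import List, Tuple


def bwt_with_suffix_array(text: str) -> Tuple[str, List[int]]:
    """Return (BWT, suffix array) of text plus a "\\0" sentinel (naive build)."""
    text += "\0"  # sentinel sorts before every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character is the one preceding each sorted suffix.
    return "".join(text[i - 1] for i in sa), sa


def backward_search(bwt: str, sa: List[int], pattern: str) -> List[int]:
    """Return sorted start positions of pattern, using only bwt and sa."""
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):  # first[c] = number of characters smaller than c
        first[c] = total
        total += counts[c]
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (c == ch)
    lo, hi = 0, len(bwt)  # current range of matching sorted suffixes
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


bwt, sa = bwt_with_suffix_array("banana")
positions = backward_search(bwt, sa, "ana")  # [1, 3]
```

The search extends the match one character at a time from the end of the pattern, narrowing a range of lexicographically sorted suffixes; that sorted order is what the BWT exposes.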

    A Flexible Architecture for the Integration of Media Servers and Databases

    Information systems in general manage formatted data, and most of them use databases to store them adequately. While these systems work well, there are new requirements now to improve them towards the inclusion of multimedia data, i.e. images, graphics, video, and audio. Usually, multimedia data are stored in specialized servers which can cope with the requirements of real-time storage and delivery. Applications being developed today, however, need the services of both databases and these media servers. This paper presents a flexible architecture for the integration of media servers and databases into a single system

    Techniques for Fast Partitioning of Compressed and Uncompressed Video

    Video partitioning is the segmentation of a video sequence into visually independent partitions, which represent various identifiable scenes in the video. It is an important first step in considering other issues in video database management, such as indexing and retrieval. As video partitioning is a computationally intensive process, effective management of digital video requires highly efficient techniques for the process. In general, for compressed and uncompressed video, the basic mechanism used to reduce computation is selective processing of a subpart of the video frames. However, so far the choice of this proportion has been made randomly, without any formal basis. An ad hoc selection of this subpart cannot always guarantee a reduction in computation while ensuring effective partitioning. This paper presents formal methods for determining the optimal window size and the minimum thresholds which ensure that decisions on scene similarity are made on a reliable, effective and..

    Burrows-Wheeler Transform and Run-Length Encoding

    In this paper we study the clustering effect of the Burrows-Wheeler Transform (BWT) from a combinatorial viewpoint. In particular, given a word w we define the BWT-clustering ratio of w as the ratio between the number of clusters produced by BWT and the number of the clusters of w. The number of clusters of a word is measured by its Run-Length Encoding. We show that the BWT-clustering ratio ranges in ]0, 2]. Moreover, given a rational number r ∈ ]0, 2], it is possible to find infinitely many words having BWT-clustering ratio equal to r. Finally, we show how the words can be classified according to their BWT-clustering ratio. The behavior of such a parameter is studied for very well-known families of binary words.
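The ratio defined in this abstract can be computed directly for small words. The sketch below assumes the BWT is taken over the sorted cyclic rotations of w with no end-of-string marker, and that the number of clusters is the number of maximal runs of equal letters; both are readings of the abstract rather than confirmed details of the paper, and the function names are illustrative.

```python
from fractions import Fraction


def bwt(word: str) -> str:
    """BWT as the last column of the sorted cyclic rotations (no end marker)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations)


def run_count(word: str) -> int:
    """Number of maximal runs of equal letters (length of the run-length encoding)."""
    if not word:
        return 0
    return 1 + sum(1 for a, b in zip(word, word[1:]) if a != b)


def bwt_clustering_ratio(word: str) -> Fraction:
    """Runs in BWT(word) divided by runs in word, for a non-empty word."""
    return Fraction(run_count(bwt(word)), run_count(word))


print(bwt_clustering_ratio("banana"))  # 1/2: "banana" has 6 runs, "nnbaaa" has 3
print(bwt_clustering_ratio("aabb"))    # 2: "aabb" has 2 runs, "baba" has 4
```

The second word reaches the upper end of the stated interval, which shows that the BWT can also increase the number of runs.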

    Achieving lossless compression of audio by encoding its constituted components (LCAEC)
