165 research outputs found

    Multi-Step Processing of Spatial Joins

    Get PDF
    Spatial joins are one of the most important operations for combining spatial objects of several relations. In this paper, spatial join processing is studied in detail for extended spatial objects in twodimensional data space. We present an approach for spatial join processing that is based on three steps. First, a spatial join is performed on the minimum bounding rectangles of the objects returning a set of candidates. Various approaches for accelerating this step of join processing have been examined at the last year’s conference [BKS 93a]. In this paper, we focus on the problem how to compute the answers from the set of candidates which is handled by the following two steps. First of all, sophisticated approximations are used to identify answers as well as to filter out false hits from the set of candidates. For this purpose, we investigate various types of conservative and progressive approximations. In the last step, the exact geometry of the remaining candidates has to be tested against the join predicate. The time required for computing spatial join predicates can essentially be reduced when objects are adequately organized in main memory. In our approach, objects are first decomposed into simple components which are exclusively organized by a main-memory resident spatial data structure. Overall, we present a complete approach of spatial join processing on complex spatial objects. The performance of the individual steps of our approach is evaluated with data sets from real cartographic applications. The results show that our approach reduces the total execution time of the spatial join by factors

    Detecting Redundancy in Data Warehouse Evolution

    Full text link

    Selected heterozygosity at cis-regulatory sequences increases the expression homogeneity of a cell population in humans

    Get PDF
    Background: Examples of heterozygote advantage in humans are scarce and limited to protein-coding sequences. Here, we attempt a genome-wide functional inference of advantageous heterozygosity at cis-regulatory regions. Results: The single-nucleotide polymorphisms bearing the signatures of balancing selection are enriched in active cis-regulatory regions of immune cells and epithelial cells, the latter of which provide barrier function and innate immunity. Examples associated with ancient trans-specific balancing selection are also discovered. Allelic imbalance in chromatin accessibility and divergence in transcription factor motif sequences indicate that these balanced polymorphisms cause distinct regulatory variation. However, a majority of these variants show no association with the expression level of the target gene. Instead, single-cell experimental data for gene expression and chromatin accessibility demonstrate that heterozygous sequences can lower cell-to-cell variability in proportion to selection strengths. This negative correlation is more pronounced for highly expressed genes and consistently observed when using different data and methods. Based on mathematical modeling, we hypothesize that extrinsic noise from fluctuations in transcription factor activity may be amplified in homozygotes, whereas it is buffered in heterozygotes. While high expression levels are coupled with intrinsic noise reduction, regulatory heterozygosity can contribute to the suppression of extrinsic noise. Conclusions: This mechanism may confer a selective advantage by increasing cell population homogeneity and thereby enhancing the collective action of the cells, especially of those involved in the defense systems in humansope

    Towards a better solution to the shortest common supersequence problem: the deposition and reduction algorithm

    Get PDF
    BACKGROUND: The problem of finding a Shortest Common Supersequence (SCS) of a set of sequences is an important problem with applications in many areas. It is a key problem in biological sequences analysis. The SCS problem is well-known to be NP-complete. Many heuristic algorithms have been proposed. Some heuristics work well on a few long sequences (as in sequence comparison applications); others work well on many short sequences (as in oligo-array synthesis). Unfortunately, most do not work well on large SCS instances where there are many, long sequences. RESULTS: In this paper, we present a Deposition and Reduction (DR) algorithm for solving large SCS instances of biological sequences. There are two processes in our DR algorithm: deposition process, and reduction process. The deposition process is responsible for generating a small set of common supersequences; and the reduction process shortens these common supersequences by removing some characters while preserving the common supersequence property. Our evaluation on simulated data and real DNA and protein sequences show that our algorithm consistently produces the best results compared to many well-known heuristic algorithms, and especially on large instances. CONCLUSION: Our DR algorithm provides a partial answer to the open problem of designing efficient heuristic algorithm for SCS problem on many long sequences. Our algorithm has a bounded approximation ratio. The algorithm is efficient, both in running time and space complexity and our evaluation shows that it is practical even for SCS problems on many long sequences

    Accurate microRNA target prediction correlates with protein repression levels

    Get PDF
    MicroRNAs are small endogenously expressed non-coding RNA molecules that regulate target gene expression through translation repression or messenger RNA degradation. MicroRNA regulation is performed through pairing of the microRNA to sites in the messenger RNA of protein coding genes. Since experimental identification of miRNA target genes poses difficulties, computational microRNA target prediction is one of the key means in deciphering the role of microRNAs in development and diseas

    A Genome-Wide Analysis of FRT-Like Sequences in the Human Genome

    Get PDF
    Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal bases pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering

    Alu distribution and mutation types of cancer genes

    Get PDF
    Background: Alu elements are the most abundant retrotransposable elements comprising ~11% of the human genome. Many studies have highlighted the role that Alu elements have in genetic instability and how their contribution to the assortment of mutagenic events can lead to cancer. As of yet, little has been done to quantitatively assess the association between Alu distribution and genes that are causally implicated in oncogenesis.Results: We have investigated the effect of various Alu densities on the mutation type based classifications of cancer genes. In order to establish the direct relationship between Alus and the cancer genes of interest, genome wide Alu-related densities were measured using genes rather than the sliding windows of fixed length as the units. Several novel genomic features, such as the density of the adjacent Alu pairs and the number of Alu-Exon-Alu triplets, were developed in order to extend the investigation via the multivariate statistical analysis toward more advanced biological insight. In addition, we characterized the genome-wide intron Alu distribution with a mixture model that distinguished genes containing Alu elements from those with no Alus, and evaluated the gene-level effect of the 5\u27-TTAAAA motif associated with Alu insertion sites using a two-step regression analysis method.Conclusions: The study resulted in several novel findings worthy of further investigation. They include: (1) Recessive cancer genes (tumor suppressor genes) are enriched with Alu elements (p \u3c 0.01) compared to dominant cancer genes (oncogenes) and the entire set of genes in the human genome; (2) Alu-related genomic features can be used to cluster cancer genes into biological meaningful groups; (3) The retention of exon Alus has been restricted in the human genome development, and an upper limit to the chromosome-level exon Alu densities is suggested by the distribution profile; (4) For the genes with at least one intron Alu repeat in individual chromosomes, the intron Alu densities can be well fitted by a Gamma distribution; (5) The effect of the 5\u27-TTAAAA motif on Alu densities varies across different chromosomes
    corecore