    Compressing DNA sequence databases with coil

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
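
The general idea of storing one sequence as a set of edits against a similar sequence can be illustrated with a small sketch. This is not coil's algorithm or file format, which the abstract does not specify: the function names `encode_edits` and `apply_edits` are hypothetical, and Python's standard `difflib` stands in for whatever alignment method coil actually uses.

```python
from difflib import SequenceMatcher


def encode_edits(parent: str, child: str) -> list:
    """Describe `child` as a list of (start, end, replacement) edits on `parent`.

    Hypothetical helper for illustration only; not coil's encoding.
    """
    edits = []
    matcher = SequenceMatcher(None, parent, child, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # regions shared with the parent need no storage
        # Replace parent[i1:i2] with child[j1:j2]; covers insert, delete, replace.
        edits.append((i1, i2, child[j1:j2]))
    return edits


def apply_edits(parent: str, edits: list) -> str:
    """Rebuild the child sequence from the parent and its edit list."""
    pieces = []
    pos = 0
    for start, end, replacement in edits:
        pieces.append(parent[pos:start])  # copy the unchanged stretch
        pieces.append(replacement)        # then the edited stretch
        pos = end
    pieces.append(parent[pos:])           # copy whatever follows the last edit
    return "".join(pieces)
```

When two sequences are nearly identical, the edit list is much shorter than the child sequence itself, which is the intuition behind encoding a database entry relative to a similar one.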

    ADLib: An Arduino Communication Framework for Ambient Displays

    As computers become more and more a part of our everyday lives, the way in which people interact with them also needs to change. Ambient displays provide an effective way to move computers away from our main focus and into the periphery. ADLib is a small communication framework that aims to simplify the construction of ambient displays built using the Arduino prototyping platform. The ADLib framework provides an easy-to-use library for communicating with an Arduino, allowing the user to focus on the construction and development of the display. The framework consists of three main components: a protocol for encoding information to be sent from a host computer to the Arduino; an Arduino library for receiving and parsing incoming data; and a desktop application for sending data to the Arduino.
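
The split between an encoding protocol on the host and a parser on the receiving side can be sketched as follows. The abstract does not describe ADLib's wire format, so the newline-terminated `name:value` message used here is purely an assumption, as are the function names `encode_message` and `parse_messages`; the parser is written in Python only to keep the example in one language.

```python
def encode_message(name: str, value: int) -> bytes:
    """Host side: encode one reading as an ASCII line (hypothetical format)."""
    if ":" in name or "\n" in name:
        raise ValueError("name must not contain ':' or a newline")
    return f"{name}:{value}\n".encode("ascii")


def parse_messages(buffer: bytes):
    """Receiver side: split a byte buffer into complete messages.

    Returns (messages, remainder), where remainder holds any trailing
    bytes that do not yet form a complete newline-terminated message.
    """
    *lines, remainder = buffer.split(b"\n")
    messages = []
    for line in lines:
        if not line:
            continue  # ignore empty lines between messages
        name, _, raw_value = line.decode("ascii").partition(":")
        messages.append((name, int(raw_value)))
    return messages, remainder
```

Returning the unparsed remainder lets the receiver keep partial data until the rest of the message arrives, which matters on a serial link where bytes may be delivered in arbitrary chunks.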

    Compressing Binary Decision Diagrams

    The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD and compression will in many cases reduce the size of the BDD to 1-2 bits per node. Empirical results for our compression technique are presented, including comparisons with previously introduced techniques, showing that the new technique dominates on all tested instances. Comment: Full (tech-report) version of ECAI 2008 short paper.

    Processing Posting Lists Using OpenCL

    One of the main requirements of internet search engines is the ability to retrieve relevant results with fast response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C-based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphics Processing Unit (GPU) of a computer system for performance improvements. Some of the critical functions in search engines are resource-intensive in terms of processing power, memory, and I/O usage. The processing times vary based on the complexity and magnitude of data involved. This project involves different phases such as identifying critical resource-intensive functions, initially replacing such methods with PHP extensions, and eventually experimenting with OpenCL code. We also ran performance tests to measure the reduction in processing times. From our results, we concluded that PHP extensions and OpenCL processing resulted in performance improvements.

    Genomic analysis of the role of transcription factor C/EBPδ in the regulation of cell behaviour on nanometric grooves

    C/EBPδ is a tumour suppressor transcription factor that induces gene expression involved in suppressing cell migration. Here we investigate whether C/EBPδ-dependent gene expression also affects cell responses to nanometric topology. We found that ablation of the C/EBPδ gene in mouse embryonal fibroblasts (MEFs) decreased cell size, adhesion and cytoskeleton spreading on 240 nm and 540 nm nanometric grooves. ChIP-SEQ and cDNA microarray analyses demonstrated that many binding sites for C/EBPδ, and the closely related C/EBPβ, exist throughout the mouse genome and control the upregulation or downregulation of many adjacent genes. We also identified a group of C/EBPδ-dependent, trans-regulated genes, whose promoters contained no C/EBPδ binding sites and yet their activity was regulated in a C/EBPδ-dependent manner. These genes include signalling molecules (e.g. SOCS3), cytoskeletal components (Tubb2, Krt16 and Krt20) and cytoskeletal regulators (ArhGEF33 and Rnd3) and are possibly regulated by cis-regulated diffusible mediators, such as IL6. Of particular note, SOCS3 was shown to be absolutely required for efficient cell spreading and contact guidance on 240 nm and 540 nm nanometric grooves. C/EBPδ is therefore involved in the complex regulation of multiple genes, including cytoskeletal components and signalling mediators, which influence the nature of cell interactions with nanometric topology.