Compressing DNA sequence databases with coil
Background: Publicly available DNA sequence databases such as GenBank are large, and are
growing at an exponential rate. The sheer volume of data being dealt with presents serious storage
and data communications problems. Currently, sequence data is usually kept in large "flat files,"
which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which
rarely achieves good compression ratios. While much research has been done on compressing
individual DNA sequences, surprisingly little has focused on the compression of entire databases
of such sequences. In this study we introduce the sequence database compression software coil.
Results: We have designed and implemented a portable software package, coil, for compressing
and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared
towards achieving high compression ratios at the expense of execution time and memory usage
during compression – the compression time represents a "one-off investment" whose cost is
quickly amortised if the resulting compressed file is transmitted many times. Decompression
requires little memory and is extremely fast. We demonstrate a 5% improvement in compression
ratio over state-of-the-art general-purpose compression tools for a large GenBank database file
containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental
additions to a sequence database.
Conclusion: coil presents a compelling alternative to conventional compression of flat files for the
storage and distribution of DNA sequence databases having a narrow distribution of sequence
lengths, such as EST data. Increasing compression levels for databases having a wide distribution of
sequence lengths is a direction for future work.
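As a rough illustration of the flat-file baseline the abstract compares against, the sketch below measures the DEFLATE (gzip-style) compression ratio of a toy flat file. It is not coil's edit-tree coding, which the abstract does not describe in detail. The helper names `compression_ratio` and `synthetic_flat_file` are hypothetical, and the random sequences are only a stand-in for real EST data, which contains redundancy that random data lacks.

```python
import random
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return uncompressed size divided by DEFLATE-compressed size."""
    compressed = zlib.compress(data, level)
    # Round-trip check: decompression must reproduce the input exactly.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


def synthetic_flat_file(n_sequences: int, length: int, seed: int = 0) -> bytes:
    """Build a toy flat file: one random A/C/G/T sequence per line (assumed layout)."""
    rng = random.Random(seed)
    lines = (
        "".join(rng.choice("ACGT") for _ in range(length))
        for _ in range(n_sequences)
    )
    return "\n".join(lines).encode("ascii")


flat_file = synthetic_flat_file(n_sequences=200, length=400)
ratio = compression_ratio(flat_file)
print(f"flat file: {len(flat_file)} bytes, ratio: {ratio:.2f}")
```

Because each base carries at most 2 bits of information but is stored as an 8-bit character, a general-purpose compressor cannot exceed a ratio of roughly 4 on sequences with no repeated structure. Any larger gain has to come from exploiting similarity between sequences in the database.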
ADLib: An Arduino Communication Framework for Ambient Displays
As computers become more and more a part of our everyday lives, the way in which people interact with them also needs to change. Ambient displays provide an effective way to move computers away from our main focus and into the periphery.
ADLib is a small communication framework that aims to simplify the construction of ambient displays built using the Arduino prototyping platform. The ADLib framework provides an easy-to-use library for communicating with an Arduino, allowing the user to focus on the construction and development of the display.
The framework consists of three main components: a protocol for encoding information to be sent from a host computer to the Arduino; an Arduino library for receiving and parsing incoming data; and a desktop application for sending data to the Arduino.
Compressing Binary Decision Diagrams
The paper introduces a new technique for compressing Binary Decision Diagrams
in those cases where random access is not required. Using this technique,
compression and decompression can be done in linear time in the size of the BDD
and compression will in many cases reduce the size of the BDD to 1-2 bits per
node. Empirical results for our compression technique are presented, including
comparisons with previously introduced techniques, showing that the new
technique dominates on all tested instances. Comment: Full (tech-report) version of ECAI 2008 short paper
Processing Posting Lists Using OpenCL
One of the main requirements of internet search engines is the ability to retrieve relevant results with fast response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C-based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphics Processing Unit (GPU) of a computer system for performance improvements.
Some of the critical functions in search engines are resource-intensive in terms of processing power, memory, and I/O usage. The processing times vary based on the complexity and magnitude of data involved. This project involves different phases such as identifying critical resource-intensive functions, initially replacing such methods with PHP Extensions, and eventually experimenting with OpenCL code. We also ran performance tests to measure the reduction in processing times. From our results, we concluded that PHP Extensions and OpenCL processing resulted in performance improvements.
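The abstract does not say which functions were replaced. As an assumed example of the kind of posting-list work the title refers to, the sketch below intersects two sorted lists of document IDs with a two-pointer merge. The function name `intersect_postings` is hypothetical and is not taken from Yioop; a native or OpenCL version would aim to speed up loops of this sort.

```python
from typing import List


def intersect_postings(a: List[int], b: List[int]) -> List[int]:
    """Return the document IDs present in both sorted posting lists."""
    result: List[int] = []
    i = j = 0
    # Advance whichever pointer holds the smaller ID; emit on a match.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


# Documents containing term X and documents containing term Y.
docs_x = [1, 3, 5, 8, 13]
docs_y = [2, 3, 8, 9, 13, 21]
both = intersect_postings(docs_x, docs_y)
print(both)  # [3, 8, 13]
```

The merge visits each element at most once, so it runs in time linear in the combined length of the two lists.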
Genomic analysis of the role of transcription factor C/EBPδ in the regulation of cell behaviour on nanometric grooves
C/EBPδ is a tumour suppressor transcription factor that induces gene expression involved in suppressing cell migration. Here we investigate whether C/EBPδ-dependent gene expression also affects cell responses to nanometric topology. We found that ablation of the C/EBPδ gene in mouse embryonal fibroblasts (MEFs) decreased cell size, adhesion and cytoskeleton spreading on 240 nm and 540 nm nanometric grooves. ChIP-SEQ and cDNA microarray analyses demonstrated that many binding sites for C/EBPδ, and the closely related C/EBPβ, exist throughout the mouse genome and control the upregulation or downregulation of many adjacent genes. We also identified a group of C/EBPδ-dependent, trans-regulated genes, whose promoters contained no C/EBPδ binding sites and yet their activity was regulated in a C/EBPδ-dependent manner. These genes include signalling molecules (e.g. SOCS3), cytoskeletal components (Tubb2, Krt16 and Krt20) and cytoskeletal regulators (ArhGEF33 and Rnd3) and are possibly regulated by cis-regulated diffusible mediators, such as IL6. Of particular note, SOCS3 was shown to be absolutely required for efficient cell spreading and contact guidance on 240 nm and 540 nm nanometric grooves. C/EBPδ is therefore involved in the complex regulation of multiple genes, including cytoskeletal components and signalling mediators, which influence the nature of cell interactions with nanometric topology.