344 research outputs found

    When a Dollar Makes a BWT

    The Burrows-Wheeler Transform (BWT) is a reversible string transformation which plays a central role in text compression and is fundamental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general more compressible and allows several different query types to be answered more efficiently than on the original string. It is easy to see that not every string is a BWT image, and exact characterizations of BWT images are known. We investigate a related combinatorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $ character. We ask, given a string w, in which positions, if any, can the $-character be inserted to turn w into the BWT image of a word ending with the sentinel character. We show that this depends only on the standard permutation of w and give a combinatorial characterization of such positions via this permutation. We then develop an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic-time algorithm.
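
The question posed in this abstract can be illustrated with a brute-force check. The sketch below is not the paper's O(n log n) algorithm, and it is slower than the naive quadratic method the abstract mentions: it tries every insertion position, attempts to invert the result with the usual LF-mapping, and accepts the position only if re-running the BWT reproduces the candidate. The function names are hypothetical, and the code assumes that `$` does not occur in `w` and compares smaller than every other character (true for ASCII letters and digits).

```python
def bwt(s):
    """BWT of s via sorted rotations (s is expected to end with '$')."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt_candidate(v):
    """Follow the LF-mapping from the row starting with '$'.

    Returns a candidate word; it is only meaningful if v really is a BWT image.
    """
    n = len(v)
    # Stable sort of positions by character gives the first column of the matrix.
    order = sorted(range(n), key=lambda i: v[i])
    lf = [0] * n
    for rank, pos in enumerate(order):
        lf[pos] = rank
    out = []
    row = 0  # row 0 is the rotation that starts with '$' if v is valid
    for _ in range(n):
        out.append(v[row])
        row = lf[row]
    t = "".join(reversed(out))  # reads '$' + word when v is valid
    return t[1:] + t[0]


def sentinel_positions(w):
    """All i such that w[:i] + '$' + w[i:] is the BWT of a word ending in '$'."""
    positions = []
    for i in range(len(w) + 1):
        v = w[:i] + "$" + w[i:]
        u = inverse_bwt_candidate(v)
        # Round-trip check: accept only if u ends with '$' and its BWT is v.
        if u.endswith("$") and bwt(u) == v:
            positions.append(i)
    return positions
```

For example, `bwt("banana$")` is `"annb$aa"`, so position 4 is among the valid positions for `w = "annbaa"`.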

    Investigation of compression algorithms for the purposes of medical archiving and remote diagnostics

    The objective of this research is to investigate compression algorithms for the purposes of medical archiving and remote diagnostics. A single X-ray of size 1024 by 1024 pixels at 8-bit resolution occupies approximately 1 Mbyte of storage space and takes approximately 8 minutes to transmit over a standard telephone line. There is a need to reduce both the required storage space and the transmission time. This is achieved by compressing the image. For archiving, a lossless compression algorithm is used, and for remote diagnostics a lossy compression algorithm is used. The development of an application for the transmission of images from a server application to a client application over the Internet is also investigated. The performance of three lossless compression algorithms is investigated: Huffman Coding, Arithmetic Coding and Huffman Coding using Splay Trees. These algorithms are written in C and built into Dynamic Link Libraries (DLLs) using Visual C++ for use in a Visual Basic application. The algorithms are tested on five images, three X-rays and two standard images, and compression ratio, compression time and decompression time are recorded for each. The lossy algorithm investigated is transform image coding using the Discrete Cosine Transform (DCT). This algorithm is written in C and built into a DLL using Visual C++ for use in a Visual Basic application. The algorithm is tested on the same five images, and compression ratio, compression time, decompression time and mean square error for different quality factors are recorded for each image. The application is developed with a user-friendly Graphical User Interface (GUI) using Visual Basic. The client can choose an image from the server and then zoom in on any section of it. This can be used for remote diagnostics or as a reference tool. The application can also determine which DCT method to use to optimise the bandwidth, depending on the speed of the medium used.
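
To make the "compression ratio" measurement for the lossless case concrete, here is a minimal sketch of static Huffman coding in Python. It is not the thesis's C implementation; the function names are hypothetical, and the ratio counts only the encoded payload bits, ignoring the cost of storing the code table (an assumption of this sketch).

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Map each symbol in data to the length of its Huffman code word."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freq}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def encoded_bits(data):
    """Total number of payload bits after Huffman coding data."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[sym] for sym in data)


def compression_ratio(data, bits_per_symbol=8):
    """Uncompressed size divided by encoded payload size (data must be non-empty)."""
    return (len(data) * bits_per_symbol) / encoded_bits(data)
```

For the byte string `b"aaaabbc"`, the code lengths are 1, 2 and 2 bits for `a`, `b` and `c`, giving 10 payload bits against 56 uncompressed bits.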

    Approximation algorithms for the shortest common superstring problem

    The object of the shortest common superstring problem (SCS) is to find the shortest possible string that contains every string in a given set as a substring. As the problem is NP-complete, approximation algorithms are of interest. The value of an approximate solution to SCS is normally taken to be its length, and we seek algorithms that make the length as small as possible. A different measure is given by the sum of the overlaps between consecutive strings in a candidate solution. When considering this measure, the object is to find solutions that make it as large as possible. These two measures offer different ways of viewing the problem. While the two viewpoints are equivalent with respect to optimal solutions, they differ with respect to approximate solutions. We describe several approximation algorithms that produce solutions that are always within a factor of two of optimum with respect to the overlap measure. We also describe an efficient implementation of one of these, using McCreight's compact suffix tree construction algorithm. The worst-case running time is O(m log n) for small alphabets, where m is the sum of the lengths of all the strings in the set and n is the number of strings. For large alphabets, the algorithm can be implemented in O(m log m) time by using Sleator and Tarjan's lexicographic splay tree data structure.
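
The overlap measure can be illustrated with the classic greedy merge heuristic: repeatedly merge the two strings with the largest suffix-prefix overlap. The sketch below is a naive Python version written only for illustration. Whether it coincides with any of the algorithms the abstract describes is an assumption, and it does not use suffix trees or splay trees, so it does not achieve the stated running times. The function names are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Greedily merge the pair with maximum overlap until one string remains."""
    # Drop duplicates and strings already contained in another string.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]
    while len(items) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left string, index of right string)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = items[i] + items[j][k:]
        items = [s for idx, s in enumerate(items) if idx not in (i, j)] + [merged]
    return items[0] if items else ""
```

The total overlap achieved equals the sum of the input lengths (after removing contained strings) minus the length of the result, which is why maximizing overlap and minimizing length agree for optimal solutions.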

    Cryptanalysis of splay tree based encryption

    We present a chosen-plaintext attack on KIST, a recently proposed encryption scheme based on splay trees. Our attack recovers a 128-bit key with approximately 2^28 bit operations and fewer than 2^19 chosen-plaintext queries.