7 research outputs found

    Enhancing Text Compression Method Using Information Source Indexing

    Text compression methods that map the original text directly into the binary domain are attractive for compressing English text files. This paper proposes an intermediate mapping scheme in which the original English text is first transformed to the decimal domain and then to the binary domain. Each two-decimal-digit value in the resulting intermediate decimal file represents the index of the location of a character found in the original text. If an already indexed character is seen again, it is replaced by the previously assigned decimal index. The decimal file is converted into the binary domain by assigning each decimal digit a 4-bit weighted code, akin to a BCD code, according to its frequency of occurrence. The assigned codes aim at generating an equivalent binary file with entropy as close as possible to that of the original one. Thereafter, any conventional compression algorithm, such as the Lempel-Ziv algorithms, can be applied to the generated binary file. The obtained compression ratios outperform those obtained when applying the same compression algorithm to binary files generated either via direct mapping of the original text or via mapping the decimal file using Binary Coded Decimal (BCD) codes. Keywords: Lossless data compression; Source encoding; LZW coding; Hamming weights; Compression ratio
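
The two-stage mapping described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, and the rule used to pick the 4-bit codes (more frequent digits receive codes with lower Hamming weight) is an assumption, since the abstract only says the codes are weighted by frequency of occurrence.

```python
from collections import Counter


def index_text(text):
    """Stage 1: replace each character by a two-digit decimal index.

    A character gets the next free index the first time it is seen and
    the same index on every later occurrence.
    """
    table = {}
    pairs = []
    for ch in text:
        if ch not in table:
            if len(table) >= 100:
                raise ValueError("two decimal digits allow at most 100 symbols")
            table[ch] = len(table)
        pairs.append(f"{table[ch]:02d}")
    return "".join(pairs), table


def assign_codes(decimal_str):
    """Stage 2 (assumed rule): rank digits by frequency and give the most
    frequent digit the 4-bit code with the lowest Hamming weight."""
    codes = sorted(
        (format(v, "04b") for v in range(16)),
        key=lambda c: (c.count("1"), c),
    )
    ranked = [digit for digit, _ in Counter(decimal_str).most_common()]
    return {digit: codes[i] for i, digit in enumerate(ranked)}


def to_binary(decimal_str, code_map):
    """Concatenate the 4-bit code of every decimal digit."""
    return "".join(code_map[d] for d in decimal_str)


decimal_str, table = index_text("abca")
code_map = assign_codes(decimal_str)
bits = to_binary(decimal_str, code_map)
```

The resulting bit string would then be handed to a conventional compressor such as an LZW implementation, which is outside the scope of this sketch.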

    Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit

    The problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we will be primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of various solutions whose time complexity depends both on the chosen parsing method as well as on the string-comparison metric used for the alignment.
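
For orientation, the sketch below shows the plain quadratic-time dynamic program for one common string comparison metric, unit-cost edit distance. It is only the baseline that repetition-aware methods of the kind surveyed aim to accelerate, not one of the surveyed techniques. The choice of metric and the function name are assumptions made for illustration.

```python
def edit_distance(s, t):
    """Unit-cost edit distance between s and t in O(len(s) * len(t)) time.

    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, 1):
            curr.append(
                min(
                    prev[j] + 1,             # delete a from s
                    curr[j - 1] + 1,         # insert b into s
                    prev[j - 1] + (a != b),  # match or substitute
                )
            )
        prev = curr
    return prev[-1]
```

Every cell of the table is computed independently here, even when S or T contains long repeated substrings. That redundancy is what the parsing-based approaches described in the abstract try to exploit.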

    Sublinear Computation Paradigm

    This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020 in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I is a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire researchers who work in the field of modern algorithms.