10 research outputs found

A new word-based compression model allowing compressed pattern matching

Author: Buluş Halil Nusret
Carus Aydın
Mesut Altan
Publication venue: 'The Scientific and Technological Research Council of Turkey'
Publication date: 01/01/2017

This study introduces a new semistatic data compression model that has a fast coding process and allows compressed pattern matching (CPM). The proposed model is named the tagged word-based compression algorithm (TWBCA) because both its coding and its compressed matching are word based. The model has two phases. In the first phase, a dictionary is constructed by adding phrases while paying attention to word boundaries; in the second phase, compression is done by using the codewords of the phrases in this dictionary. The first byte of a codeword determines whether the word is compressed or not. Because of this rule, the CPM process can be conducted on a word basis. In addition, the proposed method makes it possible to search for a group of consecutively compressed words. Any existing pattern matching algorithm can be used as a black box in compressed pattern matching. The duration of the CPM process is always less than the duration of the same process on texts coded by the Gzip tool. While matching longer patterns, compressed pattern matching takes more time on the texts coded by compress and end-tagged dense code (ETDC). However, searching for shorter patterns takes less time on texts coded by our approach than on texts compressed with compress. Besides this, the compression ratio of our algorithm is better than that of ETDC only on a file written in Turkish. The compression performance of TWBCA is stable and does not vary by more than 6% across different text files.
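The two-phase idea in this abstract (build a word dictionary first, then emit codewords whose first byte says whether a word is compressed) can be illustrated with a minimal sketch. This is not the TWBCA format from the paper: the tag values `COMPRESSED` and `LITERAL`, the one-byte dictionary index, the one-byte literal length, and the function names are all assumptions made for illustration, and the sketch handles single words only, not phrases.

```python
from collections import Counter

# Hypothetical tag bytes; the paper's actual codeword layout is not given here.
COMPRESSED, LITERAL = 0x01, 0x00

def build_dictionary(text, size=256):
    """Phase 1: map the `size` most frequent words to one-byte indexes."""
    counts = Counter(text.split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(size))}

def compress(text, dictionary):
    """Phase 2: emit (tag, index) for dictionary words, (tag, length, bytes) otherwise.

    Assumes at most 256 dictionary entries and words shorter than 256 bytes.
    """
    out = bytearray()
    for word in text.split():
        if word in dictionary:
            out += bytes([COMPRESSED, dictionary[word]])
        else:
            raw = word.encode("utf-8")
            out += bytes([LITERAL, len(raw)]) + raw
    return bytes(out)

def find_word(data, word, dictionary):
    """Scan the compressed bytes token by token; return matching token positions."""
    hits, pos, token = [], 0, 0
    target = dictionary.get(word)      # None if the word was stored as a literal
    raw = word.encode("utf-8")
    while pos < len(data):
        tag, arg = data[pos], data[pos + 1]
        if tag == COMPRESSED:
            if arg == target:
                hits.append(token)
            pos += 2                   # tag byte + index byte
        else:
            if target is None and data[pos + 2:pos + 2 + arg] == raw:
                hits.append(token)
            pos += 2 + arg             # tag byte + length byte + literal bytes
        token += 1
    return hits

text = "to be or not to be that is the question"
dictionary = build_dictionary(text, size=2)
data = compress(text, dictionary)
print(find_word(data, "be", dictionary))
```

Because the first byte of every token marks it as compressed or literal, the search never has to decompress the text; it compares a dictionary index for compressed words and raw bytes for literals.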

Crossref

Namik Kemal University Institutional Repository

Index structures for distributed text databases

Author: Marin Cahiuan Juan Mauricio
Publication date: 01/04/2004

The Web has become a ubiquitous resource for distributed computing, making it relevant to investigate new ways of providing efficient access to services available at dedicated sites. Efficiency is an ever-increasing demand that can only be satisfied by developing parallel algorithms that are efficient in practice. This tutorial paper focuses on the design, analysis and implementation of parallel algorithms and data structures for widely used text database applications on the Web. In particular, we describe parallel algorithms for inverted files and suffix array structures that are suitable for implementing search engines. Algorithmic design is carried out on top of the BSP model of parallel computing. This model ensures portability across diverse parallel architectures ranging from clusters to supercomputers. Facultad de Informátic
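As a rough illustration of the inverted-file structure mentioned in this abstract, the sketch below builds one inverted index per document partition and answers a query by merging the per-partition results. It runs sequentially and does not implement the BSP model or the paper's algorithms; the round-robin partitioning by document id and all function names are assumptions.

```python
from collections import defaultdict

def partition(docs, num_shards):
    """Assign documents to shards round-robin by integer doc id (an assumption)."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards][doc_id] = text
    return shards

def build_shard(docs):
    """Build a local inverted file: term -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(shard_indexes, term):
    """Query every shard and merge the posting lists.

    In a parallel setting each shard would answer concurrently; here the
    shards are visited one after another.
    """
    hits = []
    for index in shard_indexes:
        hits.extend(index.get(term.lower(), []))
    return sorted(hits)

docs = {
    0: "suffix arrays index text",
    1: "inverted files index text",
    2: "parallel search engines",
    3: "Inverted index on clusters",
}
shard_indexes = [build_shard(shard) for shard in partition(docs, 2)]
print(search(shard_indexes, "index"))
```

Partitioning by document keeps each shard's index independent, so shards can be built and queried without coordination until the final merge.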

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Servicio de Difusión de la Creación Intelectual

Using an information compression method in semantic information retrieval

Author: Кудинов В. А.
Нэй Лин
Publication date: 01/01/2019

A new model of semantic information retrieval is proposed that uses a compression method based on End Tagged Dense Code-ETD
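For readers unfamiliar with End Tagged Dense Code, the sketch below shows the commonly described byte-oriented scheme: words are ranked by frequency, and each rank is written as a sequence of bytes in which only the last byte has its high bit set. It is a minimal illustration under that assumption, not the retrieval model of this entry; the function names are hypothetical.

```python
from collections import Counter

def etdc_codeword(rank):
    """Codeword for the word with 0-based frequency rank `rank`."""
    out = [128 + rank % 128]       # last byte: high bit set marks the codeword end
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)     # earlier bytes stay below 128
        rank = rank // 128 - 1
    return bytes(reversed(out))

def etdc_rank(codeword):
    """Inverse of etdc_codeword: recover the rank from a codeword."""
    rank = 0
    for b in codeword[:-1]:
        rank = rank * 128 + b + 1
    return rank * 128 + (codeword[-1] - 128)

def build_codebook(words):
    """Assign shorter codewords to more frequent words."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: etdc_codeword(i) for i, w in enumerate(ranked)}

codebook = build_codebook("a b a c a b".split())
print(codebook)
```

The first 128 ranks get one-byte codewords and the next 128 × 128 ranks get two-byte codewords. Since the end of every codeword is marked by the high bit, a byte-level search can check that a match starts on a codeword boundary by inspecting the preceding byte.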

DSpace at Belgorod State University

A general compression algorithm that supports fast searching

Author: Aho
Baeza-Yates
Baeza-Yates
Brisaboa
Brisaboa
Fredriksson
Fredriksson
Kida
Kimmo Fredriksson
Klein
Moura
Navarro
Navarro
Navarro
Rautio
Szymon Grabowski
Takeda
Takeda
Wu
Publication venue: 'Elsevier BV'

Crossref

Index structures for distributed text databases

Author: Marin Cahiuan Juan Mauricio
Publication date: 10/08/2004

Servicio de Difusión de la Creación Intelectual

Sublinear Computation Paradigm

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/11/2021

This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020, in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms.

Directory of Open Access Books (DOAB)

An Efficient Compression Code for Text Databases

Author: A. Moffat
D. A. Huffman
D. Manstetten
E. Silva de Moura
G. Navarro
G. Navarro
J. Ziv
J. Ziv
N. Ziviani
R. Prisco De
Publication venue: 'Springer Science and Business Media LLC'

Crossref