121 research outputs found

    35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France

    Communication in Membrane Systems with Symbol Objects

    This thesis deals with membrane systems with symbol objects as a theoretical framework for distributed parallel multiset processing systems. A halting computation can accept, generate, or process a number, a vector, or a word, so the system globally defines (through the results of all its computations) a set of numbers, a set of vectors, a set of words (i.e., a language), or a function. The ability of these systems to solve particular problems is investigated, as well as their computational power. For example, the language families defined by different classes of these systems are compared to the classical ones: regular languages, context-free languages, languages generated by extended tabled 0L systems, languages generated by matrix grammars without appearance checking, recursively enumerable languages, etc. Special attention is paid to the communication of objects between regions and to the ways in which objects cooperate.
    An attempt to formalize membrane systems is made (Section 3.4), and a software tool is constructed for the non-distributed cooperative variant: the configuration browser, i.e., a simulator in which the user chooses the next configuration among the possible ones and can step back.
    Different distributed models are considered. In the evolution-communication model (Chapter 4), rewriting-like rules are separated from transport rules (called symport and antiport). Proton pumping systems (Sections 4.8, 4.9) are a variant of evolution-communication systems with a restricted form of cooperation. A special membrane computing model is the purely communicative one, in which objects are moved together through a membrane. We study the computational power of membrane systems with symport/antiport of 2 or 3 objects (Chapter 5) and the computational power of membrane systems with a limited alphabet (Chapter 6).
    Determinism (Sections 4.7, 5.5, etc.) is a special (restrictive) property of computational systems; the question of whether this restriction reduces the computational power is addressed. The results on proton pumping systems are carried over (Section 7.3) to systems with bistable catalysts. Particular applications of membrane systems (Sections 7.1, 7.2) include solving NP-complete problems in polynomial time and solving the sorting problem.
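    The transport rules mentioned in the abstract can be illustrated with a minimal sketch. It assumes two regions modelled as multisets (`Counter` objects) and applies a single antiport rule once; the function names `contains` and `apply_antiport` are hypothetical, and the sketch deliberately ignores the maximally parallel, nondeterministic rule application used in the full membrane-system model. A symport rule corresponds to leaving one of the two multisets empty.

```python
from collections import Counter


def contains(region: Counter, multiset: Counter) -> bool:
    """Return True if `region` holds at least the objects in `multiset`."""
    return all(region[obj] >= count for obj, count in multiset.items())


def apply_antiport(inner: Counter, outer: Counter,
                   send_out: Counter, bring_in: Counter) -> bool:
    """Apply the rule (send_out, out; bring_in, in) once, if applicable.

    Objects in `send_out` leave the inner region while objects in
    `bring_in` enter it from the outer region, in one step.
    Returns True if the rule was applied, False otherwise.
    """
    if not (contains(inner, send_out) and contains(outer, bring_in)):
        return False
    inner.subtract(send_out)
    outer.update(send_out)
    outer.subtract(bring_in)
    inner.update(bring_in)
    return True


if __name__ == "__main__":
    inner = Counter("aab")   # multiset {a, a, b}
    outer = Counter("c")     # multiset {c}
    applied = apply_antiport(inner, outer, Counter("ab"), Counter("c"))
    print(applied, +inner, +outer)  # unary + drops zero counts
```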

    String Searching with Ranking Constraints and Uncertainty

    Strings play an important role in many areas of computer science. Searching for a pattern in a string or string collection is one of the most classic problems. Variations of this problem, such as document retrieval, ranked document retrieval, and dictionary matching, have been well studied. The enormous growth of the internet, large genomic projects, sensor networks, and digital libraries calls not only for efficient algorithms and data structures for general string indexing, but also for indexes over texts with fuzzy information and support for queries with different constraints. This dissertation addresses some of these problems and proposes indexing solutions.
    One such variation is the document retrieval query with included and excluded (forbidden) patterns, where the objective is to retrieve all documents that contain the included patterns and do not contain the excluded patterns. We continue the previous work on this problem and propose a more efficient solution. We conjecture that any significant improvement over these results is highly unlikely. We also consider the scenario in which the query consists of more than two patterns. The forbidden pattern problem suffers from the drawback that linear-space (in words) solutions are unlikely to achieve better than O(√(n/occ)) reporting time per document, where n is the total length of the documents and occ is the number of output documents. Continuing this path, we introduce a new variation, namely document retrieval with a forbidden extension query, where the forbidden pattern is an extension of the included pattern. We also address the more general top-k version of the problem, which retrieves the top k documents, where the ranking is based on the PageRank relevance metric. This problem is motivated by search applications. It also holds theoretical interest, as we show that the hardness of the forbidden pattern problem is alleviated in this setting. We achieve linear space and optimal query time for this variation. We also propose succinct indexes for both of these problems.
    Position-restricted pattern matching considers the scenario in which only part of the text is searched. We propose a succinct index for this problem with efficient query time. An important application of this problem stems from searching in genomic sequences, where only part of the gene sequence is searched for interesting patterns. The problem of computing discriminating (resp. generic) words is to report all minimal (resp. maximal) extensions of a query pattern that are contained in at most (resp. at least) a given number of documents. These problems are motivated by applications in computational biology, text mining, and automated text classification. We propose succinct indexes for these problems.
    Strings with uncertainty and fuzzy information play an important role in an increasing number of applications. We propose a general framework for indexing uncertain strings such that a deterministic query string can be searched efficiently. String matching becomes a probabilistic event when a string contains uncertainty, i.e., when each position of the string can hold different probable characters, each with an associated probability of occurrence. Such uncertain strings are prevalent in applications such as biological sequence data, event monitoring, and automatic ECG annotation. We consider two basic problems of string searching, namely substring searching and string listing. We formulate these well-known problems for the uncertain strings paradigm and propose exact and approximate solutions for them.
    We also discuss a constrained variation of orthogonal range searching. Given a set of points, the task of orthogonal range searching is to build a data structure such that all the points inside an orthogonal query region can be reported. We introduce a new variation, namely shared constraint range searching, which arises naturally in constrained pattern matching applications. Shared constraint range searching is a special four-sided range reporting problem in which two of the constraints are shared, effectively reducing the number of independent constraints. For this problem, we propose a linear-space index that matches the best known bound for the three-dimensional dominance reporting problem. We also extend our data structure to the external memory model.
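    The notion of matching as a probabilistic event in an uncertain string can be made concrete with a naive sketch. It is not the indexing framework proposed in the dissertation: it scans every position directly, assumes that character probabilities at different positions are independent, and uses a hypothetical threshold parameter `tau` to decide which occurrences to report. The names `match_probability` and `uncertain_substring_search` are illustrative.

```python
from typing import Dict, List

# An uncertain string: one {character: probability} map per position.
UncertainString = List[Dict[str, float]]


def match_probability(text: UncertainString, pattern: str, start: int) -> float:
    """Probability that `pattern` occurs at `start`, assuming independent positions."""
    prob = 1.0
    for offset, ch in enumerate(pattern):
        prob *= text[start + offset].get(ch, 0.0)
        if prob == 0.0:
            break  # no character choice can complete the match
    return prob


def uncertain_substring_search(text: UncertainString, pattern: str,
                               tau: float) -> List[int]:
    """Report every start position whose match probability is at least `tau`."""
    hits = []
    for start in range(len(text) - len(pattern) + 1):
        if match_probability(text, pattern, start) >= tau:
            hits.append(start)
    return hits


if __name__ == "__main__":
    text = [
        {"a": 0.5, "b": 0.5},
        {"a": 1.0},
        {"a": 0.75, "b": 0.25},
        {"a": 1.0},
    ]
    print(uncertain_substring_search(text, "aa", tau=0.6))
```

    The scan costs time proportional to the text length times the pattern length per query, which is the cost an index is meant to avoid.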

    Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)

    The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg (1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008). ...

    Transform Based And Search Aware Text Compression Schemes And Compressed Domain Text Retrieval

    In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries, and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize, and transport this data from one point to another over data communication links with limited bandwidth. We must also have means to speedily find the information we need in this huge mass of data. Sometimes a single site may itself contain a large collection of data, such as a library database, and thus requires an efficient search mechanism even for searching the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which augments the text with additional user-defined metadata, such as DTDs or hyperlinks, to make searching more efficient and effective. This increases the file size considerably, underscoring the importance of applying text compression. For efficiency (in terms of both space and time), the data should be kept in compressed form for as long as possible.
    Text compression is concerned with techniques for representing digital text data in alternate representations that take less space. Not only does it help conserve storage space for archival and online data, it also helps system performance by requiring fewer secondary storage (disk or CD-ROM) accesses, and it improves network bandwidth utilization by reducing transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats such as .zip, .gz, and .Z files are increasingly used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. Although used primarily for text data, lossless compression algorithms are also useful for special classes of images, such as medical images, fingerprint data, and astronomical images, and for databases containing mostly vital numerical data, tables, and text. Many lossy algorithms use lossless methods in the final stage of encoding, underscoring the importance of lossless methods for both lossy and lossless compression applications. To exploit the full potential of compression techniques in future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression, or with only partial decompression, regardless of whether the search is done on the text itself or on an inversion table corresponding to a set of keywords for the text.
    In this dissertation, we make the following contributions:
    (1) Star family compression algorithms: We propose a reversible transformation that can be applied to a source text to improve the ability of existing algorithms to compress it. We use a static dictionary to convert English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to that of the original text. Thus we achieve some compression at the preprocessing stage. We present a series of transforms that improve performance. The star transform requires a static dictionary of a certain size. To avoid the considerable complexity of conversion, we employ a ternary tree data structure that converts the words in the text to the words in the star dictionary in linear time.
    (2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: We propose a method to extract the useful context information from BWT-transformed text in linear time. The auxiliary arrays obtained from the inverse BWT yield logarithmic search time. Approximate pattern matching can then be performed on top of exact pattern matching, whose results are used to extract candidates for approximate matches. A fast verification algorithm can then be applied to those candidates, which may be just small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT-compressed text. A typical compression system based on the BWT has Move-to-Front and Huffman coding stages after the transformation. We propose a novel approach that replaces the Move-to-Front stage in order to extend compressed-domain search capability all the way to the entropy coding stage. A modification to Move-to-Front makes it possible to randomly access any part of the compressed text without referring to the part before the access point.
    (3) A modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: Although many compression algorithms provide a good compression ratio and/or time complexity, LZW was the first to be studied for compressed pattern matching because of its simplicity and efficiency. Our modifications to the LZW algorithm provide the additional advantages of fast random access and partial decoding, which are especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that text search can be performed at the desired level of granularity. For example, the user can choose to retrieve a single line, a paragraph, a file, etc., that contains the keywords. More importantly, we show that parallel encoding and decoding are trivial with the modified LZW. Both encoding and decoding can easily be performed with multiple processors, and the encoding and decoding processes are independent of the number of processors.
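    The first two stages of the typical BWT-based pipeline named in the abstract (the Burrows-Wheeler transform followed by Move-to-Front) can be sketched as follows. This is a textbook illustration only, not the dissertation's modified Move-to-Front stage or its compressed-domain search method. The function names, the NUL sentinel character, and the naive construction by sorting all rotations are assumptions made for brevity; practical implementations use suffix arrays instead.

```python
from typing import List


def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive construction)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\0") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


def move_to_front(data: str) -> List[int]:
    """Move-to-Front coding over the sorted set of symbols that occur in `data`."""
    alphabet = sorted(set(data))
    output = []
    for ch in data:
        index = alphabet.index(ch)
        output.append(index)
        alphabet.insert(0, alphabet.pop(index))  # move the symbol to the front
    return output


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(move_to_front(transformed))
    print(inverse_bwt(transformed))
```

    Runs of equal symbols in the transformed text become runs of zeros after Move-to-Front, which is what makes the subsequent entropy coding stage effective.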