    Algorithms and Data Structures for Coding, Indexing, and Mining of Sequential Data

    In recent years, the production of sequential data has been increasing rapidly. This requires solving challenging problems about how to represent information, how to retrieve information, and how to extract knowledge from sequential data. These questions belong to the areas of coding, indexing, and mining, respectively. In this thesis, we investigate problems from all three areas. Coding refers to the way in which information is represented. Coding aims at generating optimal codes, that is, codes with minimum expected length. Codes can be generated for different purposes, from data compression to error detection and correction. The Lempel-Ziv 77 parsing produces an asymptotically optimal code in terms of compression. We study algorithms to efficiently decompress strings from the Lempel-Ziv 77 parsing, using memory proportional to the size of the parsing itself. We provide the first implementation of an algorithm by Bille et al., the only work we are aware of on this problem. We present a practical evaluation of this approach and several optimizations that improve performance on all datasets we tested. Through the Ulam-Rényi game, it is possible to provide optimal adaptive error-correcting codes. The game consists of discovering an unknown m-bit number by asking membership questions whose answers may be erroneous. Each question is formulated knowing the answers to all previous ones. We want to find an optimal strategy, i.e., a strategy that can identify any m-bit number using the theoretical minimum number of questions. We study the case where questions are unions of up to a fixed number of intervals and up to three answers can be erroneous. We first show that, for any sufficiently large m, there exists a strategy to identify an initially unknown m-bit number that uses at most four intervals per question.
    We further refine our main tool to turn the above asymptotic result into a complete characterization of those instances of the Ulam-Rényi game that admit optimal strategies. Indexing refers to the way in which information is retrieved. An index for texts permits finding all occurrences of any substring without traversing the whole text. Many applications require searching for approximate substrings. One of these is the problem of jumbled pattern matching, where two strings match if one is a permutation of the other. We study combinatorial aspects of prefix normal words, a class of binary words introduced in this context. These words can be used as indices for the Indexed Binary Jumbled Pattern Matching problem. We present a new recursive generation algorithm for prefix normal words that is competitive with the previous one but allows listing all prefix normal words sharing the same prefix. This yields novel insights that may help in solving the problem of counting the prefix normal words of a given length. We then introduce infinite prefix normal words, and we show that one of the operations used by the algorithm, when repeatedly applied to extend a word, produces an infinite prefix normal word. This motivates the search for other operations that produce infinite prefix normal words. We found that one of these operations establishes a connection between prefix normal words and Sturmian words. We also explored the relationship between prefix normal words and Abelian complexity, as well as between prefix normal words and lexicographic order. Mining refers to the way in which information is converted into knowledge. The process of knowledge discovery covers several processing steps, including knowledge extraction. We analyze the problem of mining assertions for an embedded system from its simulation traces. This problem can be modeled as a pattern discovery problem on colored strings.
    We present two problems of pattern discovery on colored strings: patterns for one color only, or for all colors at the same time. We present two suffix tree-based algorithms. The first algorithm solves both the one-color problem and the all-colors problem. We then introduce modifications that improve the performance of the algorithm on both synthetic and real data. We implemented and evaluated the proposed approaches, highlighting the time trade-offs that can be obtained. A different approach to knowledge extraction is based on the information-theoretic perspective of Pearl's model of causality. It has been postulated that the true causal direction between two phenomena A and B is related to the problem of finding the minimum entropy joint distribution between A and B. This problem is known to be NP-hard, and greedy algorithms have recently been proposed. We provide a novel analysis of one of the proposed heuristics, showing that this algorithm guarantees an additive approximation of 1 bit. We then provide a general criterion for guaranteeing an additive approximation factor of 1. This criterion may be of independent interest in other contexts where couplings are used.
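    As a baseline for the decompression problem described above, the following Python sketch expands an LZ77 parsing naively. It assumes a hypothetical phrase format in which each phrase is either a literal character or a (source, length) pair that copies previously decoded text. It uses memory proportional to the decompressed text, so it is only a reference point, not the space-efficient algorithm of Bille et al. studied in the thesis.

```python
from typing import List, Tuple, Union

# Hypothetical phrase format (an assumption for illustration): either a single
# literal character, or a (source, length) pair meaning "copy `length`
# characters starting at position `source` of the text decoded so far".
Phrase = Union[str, Tuple[int, int]]


def lz77_decode(phrases: List[Phrase]) -> str:
    """Naively expand an LZ77 parsing into the full text.

    Memory use is proportional to the output length, not to the parsing size.
    """
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.append(phrase)  # literal phrase
        else:
            source, length = phrase
            # Copy one character at a time so that a phrase may overlap its
            # own output (source + length greater than the current length).
            for i in range(length):
                out.append(out[source + i])
    return "".join(out)


if __name__ == "__main__":
    # "a", "b", then copy 4 characters starting at position 0.
    print(lz77_decode(["a", "b", (0, 4)]))
```

    The character-by-character copy is what allows the overlapping phrase (0, 4) to expand "ab" into "ababab".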

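    For reference, a binary word w is prefix normal if no factor of w contains more 1s than the prefix of w of the same length; for example, 1101 is prefix normal, while 1011 is not, because its factor 11 has more 1s than its prefix 10. The Python sketch below checks this definition directly and enumerates the prefix normal words of a small length. It is a naive quadratic check plus exhaustive enumeration, not the recursive generation algorithm presented in the thesis; the function names are illustrative.

```python
from itertools import product
from typing import List


def is_prefix_normal(w: str) -> bool:
    """Check whether the binary word w is prefix normal.

    w is prefix normal if, for every length k, no factor (contiguous
    substring) of length k contains more 1s than the prefix of length k.
    """
    n = len(w)
    # ones[i] = number of 1s in the prefix w[:i]
    ones = [0] * (n + 1)
    for i, c in enumerate(w):
        ones[i + 1] = ones[i] + (c == "1")
    for k in range(1, n + 1):
        # Most 1s in any length-k factor (naive scan over all start positions).
        best = max(ones[j + k] - ones[j] for j in range(n - k + 1))
        if best > ones[k]:
            return False
    return True


def prefix_normal_words(n: int) -> List[str]:
    """Enumerate all prefix normal words of length n by brute force (2^n candidates)."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in words if is_prefix_normal(w)]


if __name__ == "__main__":
    print(prefix_normal_words(3))
```

    For length 3 this prints the five words 000, 100, 101, 110, and 111; the exponential enumeration is only practical for small lengths.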
    Microscopy as a statistical, Rényi-Ulam, half-lie game: a new heuristic search strategy to accelerate imaging

    Finding a fluorescent target in a biological environment is a common and pressing microscopy problem. This task is formally analogous to the canonical search problem. In ideal (noise-free, truthful) search problems, the well-known binary search is optimal. The case of half-lies, where one of the two responses to a search query may be deceptive, introduces a richer Rényi-Ulam problem and is particularly relevant to practical microscopy. We analyse microscopy in the contexts of Rényi-Ulam games and half-lies, developing a new family of heuristics. We show the cost of insisting on verification by a positive result in search algorithms; for the zero-half-lie case, bisectioning with verification incurs a 50% penalty in the average number of queries required. We determine the optimal partitioning of search spaces directly following verification in the presence of random half-lies. Trisectioning with verification is shown to be the most efficient heuristic of the family in a majority of cases.
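    In the ideal noise-free case mentioned above, binary search locates the target with at most ceil(log2 n) membership queries. The Python sketch below illustrates only that baseline. It assumes a hypothetical truthful oracle `query(S)` that reports whether the target lies in the set S, and it does not model half-lies, verification, or the trisectioning heuristics analysed in the paper.

```python
from typing import Callable, Tuple


def bisection_search(n: int, query: Callable[[range], bool]) -> Tuple[int, int]:
    """Locate the single target among positions 0..n-1 by bisection.

    Assumes query(S) answers truthfully whether the target lies in S
    (noise-free case, no half-lies). Returns (position, queries used).
    """
    lo, hi = 0, n  # invariant: the target lies in the half-open range [lo, hi)
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if query(range(lo, mid)):  # is the target in the left half?
            hi = mid
        else:
            lo = mid
    return lo, queries


if __name__ == "__main__":
    target = 5
    print(bisection_search(8, lambda s: target in s))  # (5, 3)
```

    Each query halves the candidate range, which is why no more than ceil(log2 n) truthful queries are needed; lies or mandatory verification would change this count.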

    Error-Tolerant Exact Query Learning of Finite Set Partitions with Same-Cluster Oracle

    This paper initiates the study of active learning for exact recovery of partitions exclusively through access to a same-cluster oracle in the presence of bounded adversarial error. We first highlight a novel connection between learning partitions and correlation clustering. Then we use this connection to build a Rényi-Ulam style analytical framework for this problem, and prove upper and lower bounds on its worst-case query complexity. Further, we bound the expected performance of a relevant randomized algorithm. Finally, we study the relationship between adaptivity and query complexity for this problem and related variants. Comment: 28 pages, 2 figures
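    To make the query model concrete, the Python sketch below recovers a partition with a same-cluster oracle under the simplifying assumption that the oracle never errs: each element is compared against one representative of every cluster found so far, which uses at most n * k queries for k clusters. The error-tolerant setting studied in the paper needs more queries and different techniques; the oracle and function names here are hypothetical.

```python
from typing import Callable, List


def learn_partition(n: int, same_cluster: Callable[[int, int], bool]) -> List[List[int]]:
    """Recover a partition of {0, ..., n-1} from a same-cluster oracle.

    Assumes the oracle is always truthful (no adversarial errors).
    Each element is compared with one representative per known cluster.
    """
    clusters: List[List[int]] = []
    for x in range(n):
        for cluster in clusters:
            if same_cluster(x, cluster[0]):  # compare with the representative
                cluster.append(x)
                break
        else:
            clusters.append([x])  # x matched no existing cluster: open a new one
    return clusters


if __name__ == "__main__":
    labels = [0, 1, 0, 2, 1]  # hidden ground-truth clustering
    print(learn_partition(5, lambda a, b: labels[a] == labels[b]))
```

    With the hidden labels above, the sketch returns [[0, 2], [1, 4], [3]]. A single wrong answer could split or merge clusters here, which is why error tolerance requires repeated or cross-checked queries.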

    Modelling the structure of complex networks

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Analysis and Control of Socio-Cultural Opinion Evolution in Complex Social Systems

    The overarching goal of this thesis is to further our understanding of opinion evolution in networked societies. Such insights can be used in a variety of fields such as economics, marketing, transportation, and egress. Three main subjects make up this interdisciplinary research: sociology, statistical mechanics, and network science. In this thesis, for macro-level (society-level) analyses, techniques from statistical mechanics are borrowed to mathematically model opinion dynamics on different network topologies under different interaction models. For micro-level (individual-level) analyses, Individual Decision Making Algorithms (IDMA) are designed. To account for both macro-level and micro-level dynamics, the two regimes are combined, resulting in a more accurate model of opinion propagation. Assessing the controllability of such dynamics through experiments with actual humans is also part of this thesis.

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany
