15 research outputs found

    Data Structure Lower Bounds for Document Indexing Problems

    Get PDF
    We study data structure problems related to document indexing and pattern matching queries and our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with the current techniques. Often our lower bounds match the known space-query time trade-off curve and in fact for all the problems considered, there is a very good and reasonable match between the our lower bounds and the known upper bounds, at least for some choice of input parameters. The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, or forbidden- pattern queries, or queries with wild-cards, and indexing an input set of gapped-patterns (or two-patterns) to find those matching a document given at the query time.Comment: Full version of the conference version that appeared at ICALP 2016, 25 page

    Makespan Scheduling of Unit Jobs with Precedence Constraints in O(1.995n)O(1.995^n) time

    Get PDF
    In a classical scheduling problem, we are given a set of nn jobs of unitlength along with precedence constraints and the goal is to find a schedule ofthese jobs on mm identical machines that minimizes the makespan. This problemis well-known to be NP-hard for an unbounded number of machines. Using standard3-field notation, it is known as Pprec,pj=1CmaxP|\text{prec}, p_j=1|C_{\max}. We present an algorithm for this problem that runs in O(1.995n)O(1.995^n) time.Before our work, even for m=3m=3 machines the best known algorithms ran inO(2n)O^\ast(2^n) time. In contrast, our algorithm works when the number ofmachines mm is unbounded. A crucial ingredient of our approach is an algorithmwith a runtime that is only single-exponential in the vertex cover of thecomparability graph of the precedence constraint graph. This heavily relies oninsights from a classical result by Dolev and Warmuth (Journal of Algorithms1984) for precedence graphs without long chains.<br

    Makespan Scheduling of Unit Jobs with Precedence Constraints in O(1.995n)O(1.995^n) time

    Full text link
    In a classical scheduling problem, we are given a set of nn jobs of unit length along with precedence constraints and the goal is to find a schedule of these jobs on mm identical machines that minimizes the makespan. This problem is well-known to be NP-hard for an unbounded number of machines. Using standard 3-field notation, it is known as Pprec,pj=1CmaxP|\text{prec}, p_j=1|C_{\max}. We present an algorithm for this problem that runs in O(1.995n)O(1.995^n) time. Before our work, even for m=3m=3 machines the best known algorithms ran in O(2n)O^\ast(2^n) time. In contrast, our algorithm works when the number of machines mm is unbounded. A crucial ingredient of our approach is an algorithm with a runtime that is only single-exponential in the vertex cover of the comparability graph of the precedence constraint graph. This heavily relies on insights from a classical result by Dolev and Warmuth (Journal of Algorithms 1984) for precedence graphs without long chains.Comment: 26 pages, 7 figure

    Formalization of block pruning: reducing the number of cells computed in exact biological sequence comparison algorithms

    Get PDF
    This is a pre-copyedited, author-produced version of an article accepted for publication in The Computer Journal following peer review. The version of record Edans F O Sandes, George L M Teodoro, Maria Emilia M T Walter, Xavier Martorell, Eduard Ayguade, Alba C M A Melo; Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms, The Computer Journal, Volume 61, Issue 5, 1 May 2018, Pages 687–713 is available online at: The Computer Journal https://academic.oup.com/comjnl/article-abstract/61/5/687/4539903 and https://doi.org/10.1093/comjnl/bxx090.Biological sequence comparison algorithms that compute the optimal local and global alignments calculate a dynamic programming (DP) matrix with quadratic time complexity. The DP matrix H is calculated with a recurrence relation in which the value of each cell Hi,j is the result of a maximum operation on the cells’ values Hi-1,j-1, Hi-1,j and Hi,j-1 added or subtracted by a constant value. Therefore, it can be noticed that the difference between the value of cell Hi,j being calculated and the values of direct neighbor cells previously computed respect well-defined upper and lower bounds. Using these bounds, we can show that it is possible to determine the maximum and the minimum value of every cell in H, for a given reference cell. We use this result to define a generic pruning method which determines the cells that can pruned (i.e. no need to be computed since they will not contribute to the final solution), accelerating the computation but keeping the guarantee that the optimal result will be produced. The goal of this paper is thus to investigate and formalize properties of the DP matrix in order to estimate and increase the pruning method efficiency. We also show that the pruning efficiency depends mainly on three characteristics: (a) the order in which the cells of H are calculated, (b) the values of the parameters used in the recurrence relation and (c) the contents of the sequences compared.Peer ReviewedPostprint (author's final draft

    Almost Every Simply Typed Lambda-Term Has a Long Beta-Reduction Sequence

    Full text link
    It is well known that the length of a beta-reduction sequence of a simply typed lambda-term of order k can be huge; it is as large as k-fold exponential in the size of the lambda-term in the worst case. We consider the following relevant question about quantitative properties, instead of the worst case: how many simply typed lambda-terms have very long reduction sequences? We provide a partial answer to this question, by showing that asymptotically almost every simply typed lambda-term of order k has a reduction sequence as long as (k-1)-fold exponential in the term size, under the assumption that the arity of functions and the number of variables that may occur in every subterm are bounded above by a constant. To prove it, we have extended the infinite monkey theorem for strings to a parametrized one for regular tree languages, which may be of independent interest. The work has been motivated by quantitative analysis of the complexity of higher-order model checking

    Statistical properties of lambda terms

    Get PDF
    We present a quantitative, statistical analysis of random lambda terms in the de Bruijn notation. Following an analytic approach using multivariate generating functions, we investigate the distribution of various combinatorial parameters of random open and closed lambda terms, including the number of redexes, head abstractions, free variables or the de Bruijn index value profile. Moreover, we conduct an average-case complexity analysis of finding the leftmost-outermost redex in random lambda terms showing that it is on average constant. The main technical ingredient of our analysis is a novel method of dealing with combinatorial parameters inside certain infinite, algebraic systems of multivariate generating functions. Finally, we briefly discuss the random generation of lambda terms following a given skewed parameter distribution and provide empirical results regarding a series of more involved combinatorial parameters such as the number of open subterms and binding abstractions in closed lambda terms.Comment: Major revision of section 5. In particular, proofs of Lemma 5.7 and Theorem 5.

    Statistical properties of lambda terms

    Get PDF
    We present a quantitative, statistical analysis of random lambda terms in the De Bruijn notation. Following an analytic approach using multivariate generat-ing functions, we investigate the distribution of various combinatorial parameters of random open and closed lambda terms, including the number of redexes, head abstractions, free variables or the De Bruijn index value profile. Moreover, we con-duct an average-case complexity analysis of finding the leftmost-outermost redex in random lambda terms showing that it is on average constant. The main technical ingredient of our analysis is a novel method of dealing with combinatorial paramet-ers inside certain infinite, algebraic systems of multivariate generating functions. Finally, we briefly discuss the random generation of lambda terms following a given skewed parameter distribution and provide empirical results regarding a series of more involved combinatorial parameters such as the number of open subterms and binding abstractions in closed lambda terms

    Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

    Full text link
    Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications

    Distributed Systems and Mobile Computing

    Get PDF
    The book is about Distributed Systems and Mobile Computing. This is a branch of Computer Science devoted to the study of systems whose components are in different physical locations and have limited communication capabilities. Such components may be static, often organized in a network, or may be able to move in a discrete or continuous environment. The theoretical study of such systems has applications ranging from swarms of mobile robots (e.g., drones) to sensor networks, autonomous intelligent vehicles, the Internet of Things, and crawlers on the Web. The book includes five articles. Two of them are about networks: the first one studies the formation of networks by agents that interact randomly and have the ability to form connections; the second one is a study of clustering models and algorithms. The three remaining articles are concerned with autonomous mobile robots operating in continuous space. One article studies the classical gathering problem, where all robots have to reach a common location, and proposes a fast algorithm for robots that are endowed with a compass but have limited visibility. The last two articles deal with the evacuations problem, where two robots have to locate an exit point and evacuate a region in the shortest possible time
    corecore