Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries and our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve and in fact for all the problems considered, there is a
very good and reasonable match between our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, or forbidden-pattern queries, or queries with wild-cards, and
indexing an input set of gapped-patterns (or two-patterns) to find those
matching a document given at the query time.
Comment: Full version of the conference version that appeared at ICALP 2016,
25 pages
Makespan Scheduling of Unit Jobs with Precedence Constraints in time
In a classical scheduling problem, we are given a set of jobs of unit
length along with precedence constraints and the goal is to find a schedule of
these jobs on identical machines that minimizes the makespan. This problem
is well-known to be NP-hard for an unbounded number of machines. Using standard
3-field notation, it is known as P | prec, p_j = 1 | C_max.
We present an algorithm for this problem that runs in time.
Before our work, even for machines the best known algorithms ran in
time. In contrast, our algorithm works when the number of
machines is unbounded. A crucial ingredient of our approach is an algorithm
with a runtime that is only single-exponential in the vertex cover of the
comparability graph of the precedence constraint graph. This heavily relies on
insights from a classical result by Dolev and Warmuth (Journal of Algorithms
1984) for precedence graphs without long chains.
Comment: 26 pages, 7 figures
Formalization of block pruning: reducing the number of cells computed in exact biological sequence comparison algorithms
This is a pre-copyedited, author-produced version of an article accepted for publication in The Computer Journal following peer review. The version of record, Edans F O Sandes, George L M Teodoro, Maria Emilia M T Walter, Xavier Martorell, Eduard Ayguade, Alba C M A Melo; Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms, The Computer Journal, Volume 61, Issue 5, 1 May 2018, Pages 687–713, is available online at: The Computer Journal https://academic.oup.com/comjnl/article-abstract/61/5/687/4539903 and https://doi.org/10.1093/comjnl/bxx090.
Biological sequence comparison algorithms that compute the optimal local and global alignments calculate a dynamic programming (DP) matrix with quadratic time complexity. The DP matrix H is calculated with a recurrence relation in which the value of each cell H_{i,j} is the maximum over the values of the cells H_{i-1,j-1}, H_{i-1,j} and H_{i,j-1}, each increased or decreased by a constant. Therefore, the difference between the value of the cell H_{i,j} being calculated and the values of its previously computed direct neighbors respects well-defined upper and lower bounds. Using these bounds, we show that it is possible to determine the maximum and the minimum value of every cell in H, for a given reference cell. We use this result to define a generic pruning method which determines the cells that can be pruned (i.e. need not be computed since they will not contribute to the final solution), accelerating the computation while keeping the guarantee that the optimal result will be produced. The goal of this paper is thus to investigate and formalize properties of the DP matrix in order to estimate and increase the efficiency of the pruning method.
We also show that the pruning efficiency depends mainly on three characteristics: (a) the order in which the cells of H are calculated, (b) the values of the parameters used in the recurrence relation and (c) the contents of the sequences compared.
Peer Reviewed. Postprint (author's final draft)
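As a rough illustration of the recurrence and the kind of bound the abstract describes, the following minimal Python sketch fills a local-alignment DP matrix and computes an upper bound on any score reachable through a given cell. The scoring values (`match`, `mismatch`, `gap`), the linear gap model, and the helper name `max_reachable` are assumptions made for this sketch; the paper's actual formalization may differ.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Fill the local-alignment DP matrix H and return (best score, H).

    Assumed scoring: linear gap penalty, match > 0, mismatch and gap <= 0.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell is a maximum over its three neighbors plus a constant.
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + sub,
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            best = max(best, H[i][j])
    return best, H


def max_reachable(H, i, j, len_a, len_b, match=1):
    """Upper bound on the score of any alignment passing through cell (i, j).

    At most min(len_a - i, len_b - j) further diagonal steps remain, and each
    step can add at most `match` to the score.
    """
    return H[i][j] + match * min(len_a - i, len_b - j)


best, H = smith_waterman("GATTACA", "TTAC")
# A cell whose bound does not exceed the best score found so far cannot
# improve the result, so it is a candidate for pruning.
prunable = max_reachable(H, 7, 1, 7, 4) <= best
```

In this sketch a cell near the end of one sequence has few diagonal steps left, so its bound is low and it is a likely pruning candidate once a good score has been found.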
Almost Every Simply Typed Lambda-Term Has a Long Beta-Reduction Sequence
It is well known that the length of a beta-reduction sequence of a simply
typed lambda-term of order k can be huge; it is as large as k-fold exponential
in the size of the lambda-term in the worst case. We consider the following
relevant question about quantitative properties, instead of the worst case: how
many simply typed lambda-terms have very long reduction sequences? We provide a
partial answer to this question, by showing that asymptotically almost every
simply typed lambda-term of order k has a reduction sequence as long as
(k-1)-fold exponential in the term size, under the assumption that the arity of
functions and the number of variables that may occur in every subterm are
bounded above by a constant. To prove it, we have extended the infinite monkey
theorem for strings to a parametrized one for regular tree languages, which may
be of independent interest. The work has been motivated by quantitative
analysis of the complexity of higher-order model checking.
Statistical properties of lambda terms
We present a quantitative, statistical analysis of random lambda terms in the
de Bruijn notation. Following an analytic approach using multivariate
generating functions, we investigate the distribution of various combinatorial
parameters of random open and closed lambda terms, including the number of
redexes, head abstractions, free variables or the de Bruijn index value
profile. Moreover, we conduct an average-case complexity analysis of finding
the leftmost-outermost redex in random lambda terms showing that it is on
average constant. The main technical ingredient of our analysis is a novel
method of dealing with combinatorial parameters inside certain infinite,
algebraic systems of multivariate generating functions. Finally, we briefly
discuss the random generation of lambda terms following a given skewed
parameter distribution and provide empirical results regarding a series of more
involved combinatorial parameters such as the number of open subterms and
binding abstractions in closed lambda terms.
Comment: Major revision of section 5. In particular, proofs of Lemma 5.7 and
Theorem 5.
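To make the notion of the leftmost-outermost redex concrete, here is a minimal Python sketch of the search on terms in de Bruijn notation. The tuple encoding (`("var", n)`, `("lam", body)`, `("app", fun, arg)`) and the function name are assumptions chosen for illustration and are not taken from the paper.

```python
def leftmost_outermost_redex(term, path=()):
    """Return the path to the leftmost-outermost redex, or None if there is none.

    A path is a tuple of child positions inside the nested tuples; the empty
    tuple means the root itself is a redex.
    """
    kind = term[0]
    if kind == "var":
        return None
    if kind == "lam":
        return leftmost_outermost_redex(term[1], path + (1,))
    # Application: it is a redex when the function part is an abstraction.
    _, fun, arg = term
    if fun[0] == "lam":
        return path
    found = leftmost_outermost_redex(fun, path + (1,))
    if found is not None:  # the empty path () is a valid result, so test for None
        return found
    return leftmost_outermost_redex(arg, path + (2,))


identity = ("lam", ("var", 0))
# Free variable applied to a redex: the redex sits in the argument position.
term = ("app", ("var", 0), ("app", identity, ("var", 1)))
position = leftmost_outermost_redex(term)
```

The traversal visits nodes in preorder and stops at the first redex, so its cost is the number of nodes visited before that redex; this is the quantity the abstract reports as constant on average.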
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Ensuring alignment, which refers to making models behave in accordance with
human intentions [1,2], has become a critical task before deploying large
language models (LLMs) in real-world applications. For instance, OpenAI devoted
six months to iteratively aligning GPT-4 before its release [3]. However, a
major challenge faced by practitioners is the lack of clear guidance on
evaluating whether LLM outputs align with social norms, values, and
regulations. This obstacle hinders systematic iteration and deployment of LLMs.
To address this issue, this paper presents a comprehensive survey of key
dimensions that are crucial to consider when assessing LLM trustworthiness. The
survey covers seven major categories of LLM trustworthiness: reliability,
safety, fairness, resistance to misuse, explainability and reasoning, adherence
to social norms, and robustness. Each major category is further divided into
several sub-categories, resulting in a total of 29 sub-categories.
Additionally, a subset of 8 sub-categories is selected for further
investigation, where corresponding measurement studies are designed and
conducted on several widely-used LLMs. The measurement results indicate that,
in general, more aligned models tend to perform better in terms of overall
trustworthiness. However, the effectiveness of alignment varies across the
different trustworthiness categories considered. This highlights the importance
of conducting more fine-grained analyses, testing, and making continuous
improvements on LLM alignment. By shedding light on these key dimensions of LLM
trustworthiness, this paper aims to provide valuable insights and guidance to
practitioners in the field. Understanding and addressing these concerns will be
crucial in achieving reliable and ethically sound deployment of LLMs in various
applications.
Distributed Systems and Mobile Computing
The book is about Distributed Systems and Mobile Computing. This is a branch of Computer Science devoted to the study of systems whose components are in different physical locations and have limited communication capabilities. Such components may be static, often organized in a network, or may be able to move in a discrete or continuous environment. The theoretical study of such systems has applications ranging from swarms of mobile robots (e.g., drones) to sensor networks, autonomous intelligent vehicles, the Internet of Things, and crawlers on the Web. The book includes five articles. Two of them are about networks: the first one studies the formation of networks by agents that interact randomly and have the ability to form connections; the second one is a study of clustering models and algorithms. The three remaining articles are concerned with autonomous mobile robots operating in continuous space. One article studies the classical gathering problem, where all robots have to reach a common location, and proposes a fast algorithm for robots that are endowed with a compass but have limited visibility. The last two articles deal with the evacuation problem, where two robots have to locate an exit point and evacuate a region in the shortest possible time.