Fair Rank Aggregation
Ranking algorithms find extensive usage in diverse areas such as web search,
employment, college admission, voting, etc. The related rank aggregation
problem deals with combining multiple rankings into a single aggregate ranking.
However, algorithms for both these problems might be biased against some
individuals or groups due to implicit prejudice or marginalization in the
historical data. We study ranking and rank aggregation problems from a fairness
or diversity perspective, where the candidates (to be ranked) may belong to
different groups and each group should have a fair representation in the final
ranking. We allow the designer to set the parameters that define fair
representation. These parameters specify the allowed range of the number of
candidates from a particular group in the top-$k$ positions of the ranking.
Given any ranking, we provide a fast and exact algorithm for finding the
closest fair ranking for the Kendall tau metric under block-fairness. We also
provide an exact algorithm for finding the closest fair ranking for the Ulam
metric under strict-fairness, when there are only $O(1)$ number of groups. Our
algorithms are simple, fast, and might be extendable to other relevant metrics.
We also give a novel meta-algorithm for the general rank aggregation problem
under the fairness framework. Surprisingly, this meta-algorithm works for any
generalized mean objective (including center and median problems) and any
fairness criteria. As a byproduct, we obtain 3-approximation algorithms for
both center and median problems, under both Kendall tau and Ulam metrics.
Furthermore, using sophisticated techniques we obtain a
$(3-\varepsilon)$-approximation algorithm, for a constant $\varepsilon > 0$,
for the Ulam metric under strong fairness.
Comment: A preliminary version of this paper appeared in NeurIPS 2022
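The fair-representation constraints are easy to state concretely. As a minimal illustrative sketch (not the paper's algorithm), assume each group is given lower and upper bounds on how many of the top-$k$ positions it may occupy; checking fairness of a ranking and measuring the Kendall tau distance then look as follows (all function and variable names here are hypothetical):

```python
from itertools import combinations

def is_fair(ranking, group_of, bounds, k):
    """Check fair-representation constraints: in the top-k prefix, each
    group g must contribute between bounds[g][0] and bounds[g][1]
    candidates. `ranking` is a list of candidate ids."""
    counts = {}
    for c in ranking[:k]:
        g = group_of[c]
        counts[g] = counts.get(g, 0) + 1
    return all(lo <= counts.get(g, 0) <= hi for g, (lo, hi) in bounds.items())

def kendall_tau(r1, r2):
    """Number of candidate pairs that r1 and r2 order differently."""
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)   # a precedes b in r1...
        if pos2[a] > pos2[b]              # ...but b precedes a in r2
    )

# Example: two groups, each required to hold 1..2 of the top-3 positions.
ranking = ["a1", "b1", "a2", "b2"]
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(is_fair(ranking, groups, {"A": (1, 2), "B": (1, 2)}, k=3))  # True
print(kendall_tau(ranking, ["b1", "a1", "b2", "a2"]))             # 2
```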
On the role of metaheuristic optimization in bioinformatics
Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail molecular docking, protein structure prediction, phylogenetic inference, and several string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From this analysis, the paper derives insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.
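For a flavor of how such metaheuristics are applied to string problems in bioinformatics, below is a minimal simulated-annealing sketch for the closest string problem (find a string minimizing the maximum Hamming distance to a set of equal-length input strings); the neighborhood move, cooling schedule, and parameter values are illustrative assumptions, not taken from the paper:

```python
import math
import random

def max_hamming(candidate, strings):
    """Objective: maximum Hamming distance from candidate to any input string."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)

def closest_string_sa(strings, alphabet="ACGT", iters=20_000, t0=2.0, seed=0):
    """Simulated annealing for the closest string problem (illustrative only).
    All input strings are assumed to have equal length."""
    rng = random.Random(seed)
    cur = list(strings[0])                    # start from an input string
    cur_cost = max_hamming(cur, strings)
    best, best_cost = cur[:], cur_cost
    for i in range(iters):
        t = t0 * (1 - i / iters) + 1e-9       # linear cooling schedule
        cand = cur[:]
        cand[rng.randrange(len(cand))] = rng.choice(alphabet)  # random mutation
        cost = max_hamming(cand, strings)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand[:], cost
    return "".join(best), best_cost

print(closest_string_sa(["ACGTAC", "AGGTTC", "ACGTTA"]))
```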
Multivariate Fine-Grained Complexity of Longest Common Subsequence
We revisit the classic combinatorial pattern matching problem of finding a
longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a
textbook algorithm solves LCS in time $O(n^2)$, but although much effort has
been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work
indeed shows that such an algorithm would refute the Strong Exponential Time
Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann,
K\"unnemann FOCS'15].
Despite the quadratic-time barrier, for over 40 years an enduring scientific
interest continued to produce fast algorithms for LCS and its variations.
Particular attention was put into identifying and exploiting input parameters
that yield strongly subquadratic time algorithms for special cases of interest,
e.g., differential file comparison. This line of research was successfully
pursued until 1990, at which time significant improvements came to a halt. In
this paper, using the lens of fine-grained complexity, our goal is to (1)
justify the lack of further improvements and (2) determine whether some special
cases of LCS admit faster algorithms than currently known.
To this end, we provide a systematic study of the multivariate complexity of
LCS, taking into account all parameters previously discussed in the literature:
the input size $n$, the length of the shorter string $m$, the length $L$ of an
LCS of $x$ and $y$, the numbers of deletions $\delta := m - L$ and
$\Delta := n - L$, the alphabet size $|\Sigma|$, as well as the numbers of
matching pairs $M$ and dominant pairs $d$. For any class of instances defined
by fixing each parameter individually to a polynomial in terms of the input
size, we prove a SETH-based lower bound matching one of three known algorithms.
Specifically, we determine the optimal running time for LCS under SETH as
$(n + \min\{d, \delta\Delta, \delta m\})^{1 \pm o(1)}$.
[...]
Comment: Presented at SODA'18. Full Version. 66 pages
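To make the parameter set concrete, the derived quantities are straightforward to compute for given strings; a small sketch reusing lcs_length from above (the dominant-pair count $d$ is omitted for brevity, and $n$ is taken here as the longer string's length, consistent with $\Delta := n - L$):

```python
def lcs_parameters(x: str, y: str):
    """Compute the LCS parameters discussed above for concrete strings."""
    L = lcs_length(x, y)                    # from the sketch above
    n, m = max(len(x), len(y)), min(len(x), len(y))
    sigma = len(set(x) | set(y))            # alphabet size |Sigma|
    M = sum(x.count(c) * y.count(c) for c in set(x))  # matching pairs
    return {"n": n, "m": m, "L": L, "delta": m - L, "Delta": n - L,
            "sigma": sigma, "M": M}

print(lcs_parameters("ABCBDAB", "BDCABA"))
# {'n': 7, 'm': 6, 'L': 4, 'delta': 2, 'Delta': 3, 'sigma': 4, 'M': 12}
```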
Supernode Transformation On Parallel Systems With Distributed Memory – An Analytical Approach
Supernode transformation, or tiling, is a technique that partitions algorithms to improve data locality and parallelism by balancing computation and inter-processor communication costs, with the goal of achieving the shortest execution time. It groups multiple iterations of nested loops into supernodes to be assigned to processors for processing in parallel. A supernode transformation can be described by a supernode size and shape. This research focuses on supernode transformation on multi-processor architectures with distributed memory, including computer cluster systems and General Purpose Graphics Processing Units (GPGPUs). The research involves supernode scheduling, supernode mapping to processors, and finding the optimal supernode size that achieves the shortest total running time. The algorithms considered are doubly nested loops with regular data dependencies, with the Longest Common Subsequence problem used as an illustration. A novel mathematical model for the total running time is established as a function of the supernode size, algorithm parameters such as the problem size and the data dependence, the computation time of each loop iteration, architecture parameters such as the number of processors, and the communication cost. The optimal supernode size is derived from this closed-form model. The model and the optimal supernode size provide better results than previous studies and are verified by simulations on multi-processor systems including computer cluster systems and GPGPUs.
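As an illustration of the transformation itself (not the paper's model or implementation), the sketch below tiles the LCS dynamic program into $s \times s$ supernodes and processes them along anti-diagonal wavefronts; tiles on the same wavefront have no mutual dependencies and could be mapped to different processors, and $s$ plays the role of the supernode size whose optimum the paper derives. Here the schedule runs serially to show the structure:

```python
def lcs_tiled(x: str, y: str, s: int = 64) -> int:
    """LCS dynamic program after a supernode (tiling) transformation:
    iterations are grouped into s-by-s tiles. Tiles on the same
    anti-diagonal are independent; a distributed implementation would
    assign each wavefront's tiles to different processors."""
    nx, ny = len(x), len(y)
    dp = [[0] * (ny + 1) for _ in range(nx + 1)]
    tiles_x = (nx + s - 1) // s
    tiles_y = (ny + s - 1) // s
    for wave in range(tiles_x + tiles_y - 1):        # anti-diagonal wavefronts
        for ti in range(max(0, wave - tiles_y + 1), min(tiles_x, wave + 1)):
            tj = wave - ti                           # tile coordinates
            for i in range(ti * s + 1, min((ti + 1) * s, nx) + 1):
                for j in range(tj * s + 1, min((tj + 1) * s, ny) + 1):
                    dp[i][j] = (dp[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1]
                                else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[nx][ny]

print(lcs_tiled("ABCBDAB", "BDCABA", s=2))  # 4, same as the untiled DP
```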
Complementarity methods in the analysis of piecewise linear dynamical systems
The main object of study in this thesis is a class of piecewise linear dynamical systems that are related both to system theory and to mathematical programming. The dynamical systems in this class are known as complementarity systems. With regard to these nonlinear and nonsmooth dynamical systems, the research in the thesis concentrates on two themes: well-posedness and approximations. The well-posedness issue, in the sense of existence and uniqueness of solutions, is of considerable importance from a model validation point of view. In the thesis, sufficient conditions are established for the well-posedness of complementarity systems. Furthermore, the convergence of approximations of these systems is investigated with an eye towards simulation.
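To fix ideas, a linear complementarity system couples a linear ODE with complementarity conditions, e.g. $\dot x = Ax + Bu + f$, $y = Cx + Du$, $0 \le u \perp y \ge 0$. A standard generic approximation scheme (shown here as a textbook illustration, not necessarily the thesis's scheme) is time-stepping: discretize the dynamics and solve a linear complementarity problem (LCP) at each step. A minimal sketch using projected Gauss-Seidel for the LCP:

```python
import numpy as np

def solve_lcp_pgs(M, q, iters=200):
    """Solve the LCP  0 <= u ⊥ M u + q >= 0  by projected Gauss-Seidel.
    Converges for suitable M (e.g., symmetric positive definite)."""
    u = np.zeros_like(q)
    for _ in range(iters):
        for i in range(len(q)):
            r = q[i] + M[i] @ u - M[i, i] * u[i]   # residual excluding u[i]
            u[i] = max(0.0, -r / M[i, i])
    return u

def simulate(A, B, C, D, f, x0, h=1e-3, steps=2000):
    """Backward-Euler time-stepping for the linear complementarity system
        x' = A x + B u + f,   y = C x + D u,   0 <= u ⊥ y >= 0.
    Each step reduces to one LCP in the complementarity variable u."""
    n = len(x0)
    E = np.linalg.inv(np.eye(n) - h * A)           # implicit-Euler propagator
    x, traj = np.asarray(x0, float), []
    M = h * C @ E @ B + D                          # LCP matrix (constant here)
    for _ in range(steps):
        q = C @ E @ (x + h * f)
        u = solve_lcp_pgs(M, q)
        x = E @ (x + h * (B @ u + f))
        traj.append(x.copy())
    return np.array(traj)

# Illustrative example: a unit mass resting on a rigid floor, with u the
# contact force. States: [position, velocity]; the force balances gravity.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])                         # y = position >= 0
D = np.zeros((1, 1))
f = np.array([0.0, -9.81])                         # gravity
traj = simulate(A, B, C, D, f, x0=[0.0, 0.0])
print(traj[-1])                                    # stays on the floor: ~[0, 0]
```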
Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations
Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes.
In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereochemical properties of sequence residues and their substitution probabilities into a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity.
In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of a dynamic weighted tree allows errors made in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge-bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
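The pairwise building block that any progressive MSA repeats while walking its guidance tree is global alignment under a substitution score; a compact Needleman-Wunsch sketch (the scores here are illustrative constants, not the dissertation's tree-structured scheme):

```python
def needleman_wunsch(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Global pairwise alignment, the core step of progressive MSA.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # substitute / match
                          S[i - 1][j] + gap,       # gap in b
                          S[i][j - 1] + gap)       # gap in a
    # Traceback to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("GATTACA", "GCATGCU"))
```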