199 research outputs found

    Fair Rank Aggregation

    Full text link
    Ranking algorithms find extensive usage in diverse areas such as web search, employment, college admission, voting, etc. The related rank aggregation problem deals with combining multiple rankings into a single aggregate ranking. However, algorithms for both these problems might be biased against some individuals or groups due to implicit prejudice or marginalization in the historical data. We study ranking and rank aggregation problems from a fairness or diversity perspective, where the candidates (to be ranked) may belong to different groups and each group should have a fair representation in the final ranking. We allow the designer to set the parameters that define fair representation. These parameters specify the allowed range of the number of candidates from a particular group in the top-$k$ positions of the ranking. Given any ranking, we provide a fast and exact algorithm for finding the closest fair ranking for the Kendall tau metric under block-fairness. We also provide an exact algorithm for finding the closest fair ranking for the Ulam metric under strict-fairness, when there are only $O(1)$ groups. Our algorithms are simple, fast, and might be extendable to other relevant metrics. We also give a novel meta-algorithm for the general rank aggregation problem under the fairness framework. Surprisingly, this meta-algorithm works for any generalized mean objective (including center and median problems) and any fairness criteria. As a byproduct, we obtain 3-approximation algorithms for both center and median problems, under both Kendall tau and Ulam metrics. Furthermore, using sophisticated techniques we obtain a $(3-\varepsilon)$-approximation algorithm, for a constant $\varepsilon>0$, for the Ulam metric under strong fairness. Comment: A preliminary version of this paper appeared in NeurIPS 202
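
    A minimal sketch of the two ingredients the abstract builds on: the Kendall tau distance between two rankings, and the fairness constraint bounding how many members of each group may appear in the top-$k$ positions. The function names, toy groups, and bounds below are illustrative assumptions, not the paper's notation or algorithm.

```python
from itertools import combinations

def kendall_tau_distance(r1, r2):
    """Number of candidate pairs that the two rankings order differently."""
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2) if pos2[a] > pos2[b])

def is_fair(ranking, group_of, k, lower, upper):
    """Check a block-fairness-style constraint: for every group g, the number of
    its members among the top-k positions must lie in [lower[g], upper[g]]."""
    counts = {}
    for c in ranking[:k]:
        counts[group_of[c]] = counts.get(group_of[c], 0) + 1
    return all(lower[g] <= counts.get(g, 0) <= upper[g] for g in lower)

# Toy example: four candidates in two groups; require at least one candidate
# from each group among the top-2 positions.
ranking = ["a", "b", "c", "d"]
group_of = {"a": 0, "b": 0, "c": 1, "d": 1}
print(is_fair(ranking, group_of, k=2, lower={0: 1, 1: 1}, upper={0: 2, 1: 2}))  # False
print(kendall_tau_distance(ranking, ["a", "c", "b", "d"]))                      # 1
```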

    On the role of metaheuristic optimization in bioinformatics

    Get PDF
    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference, and various string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. Based on this analysis, the paper identifies research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.

    Multivariate Fine-Grained Complexity of Longest Common Subsequence

    Full text link
    We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a textbook algorithm solves LCS in time $O(n^2)$, but although much effort has been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann, Künnemann FOCS'15]. Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known. To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size $n := \max\{|x|,|y|\}$, the length of the shorter string $m := \min\{|x|,|y|\}$, the length $L$ of an LCS of $x$ and $y$, the numbers of deletions $\delta := m-L$ and $\Delta := n-L$, the alphabet size, as well as the numbers of matching pairs $M$ and dominant pairs $d$. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as $(n+\min\{d, \delta \Delta, \delta m\})^{1\pm o(1)}$. [...] Comment: Presented at SODA'18. Full Version. 66 pages
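
    For reference, a sketch of the textbook $O(n^2)$ dynamic program mentioned above, together with the parameters $n$, $m$, $L$, $\delta$, $\Delta$, and $M$ as defined in the abstract. The example strings and variable names are illustrative, and the dominant-pair count $d$ is omitted.

```python
def lcs_length(x, y):
    """Textbook O(n^2) dynamic program: dp[i][j] = length of an LCS of x[:i] and y[:j]."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

x, y = "differential", "difference"
L = lcs_length(x, y)
n, m = max(len(x), len(y)), min(len(x), len(y))
delta, Delta = m - L, n - L              # numbers of deletions, as in the abstract
M = sum(a == b for a in x for b in y)    # number of matching pairs
print(L, n, m, delta, Delta, M)
```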

    A Problem-based Curriculum for Algorithmic Programming

    Get PDF

    Linear complementarity systems : a study in hybrid dynamics

    Get PDF

    Supernode Transformation On Parallel Systems With Distributed Memory – An Analytical Approach

    Get PDF
    Supernode transformation, or tiling, is a technique that partitions algorithms to improve data locality and parallelism by balancing computation and inter-processor communication costs to achieve the shortest execution or running time. It groups multiple iterations of nested loops into supernodes to be assigned to processors for processing in parallel. A supernode transformation can be described by supernode size and shape. This research focuses on supernode transformation on multi-processor architectures with distributed memory, including computer cluster systems and General Purpose Graphics Processing Units (GPGPUs). The research involves supernode scheduling, supernode mapping to processors, and finding the optimal supernode size, for achieving the shortest total running time. The algorithms considered are two nested loops with regular data dependencies. The Longest Common Subsequence problem is used as an illustration. A novel mathematical model for the total running time is established as a function of the supernode size, algorithm parameters such as the problem size and the data dependence, the computation time of each loop iteration, architecture parameters such as the number of processors, and the communication cost. The optimal supernode size is derived from this closed-form model. The model and the optimal supernode size provide better results than previous research and are verified by simulations on multi-processor systems including computer cluster systems and GPGPUs.
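
    As an illustration of the idea (not the thesis's actual model or scheduler), the sketch below groups a 2D LCS-style iteration space into b-by-b supernodes and lists them in anti-diagonal waves; under the usual (i-1, j), (i, j-1), (i-1, j-1) dependence pattern, tiles in the same wave carry no mutual dependencies and can be mapped to different processors. Function and parameter names are assumptions for illustration.

```python
def tiled_wavefront_schedule(n, m, b):
    """Group an n x m iteration space into b x b supernodes (tiles) and schedule
    them in anti-diagonal waves: tiles within one wave are mutually independent,
    so they can be assigned to different processors and run in parallel."""
    tiles_i = (n + b - 1) // b
    tiles_j = (m + b - 1) // b
    waves = []
    for w in range(tiles_i + tiles_j - 1):      # wave index = tile_row + tile_col
        wave = [(ti, w - ti) for ti in range(tiles_i) if 0 <= w - ti < tiles_j]
        waves.append(wave)
    return waves

# With a 6x6 iteration space and 2x2 tiles, each wave lists tiles runnable in parallel.
for w, tiles in enumerate(tiled_wavefront_schedule(6, 6, 2)):
    print(f"wave {w}: {tiles}")
```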

    Complementarity methods in the analysis of piecewise linear dynamical systems

    Get PDF
    The main object of this thesis is a class of piecewise linear dynamical systems that are related both to system theory and to mathematical programming. The dynamical systems in this class are known as complementarity systems. With regard to these nonlinear and nonsmooth dynamical systems, the research in the thesis concentrates on two themes: well-posedness and approximations. The well-posedness issue, in the sense of existence and uniqueness of solutions, is of considerable importance from a model validation point of view. In the thesis, sufficient conditions are established for the well-posedness of complementarity systems. Furthermore, an investigation is made of the convergence of approximations of these systems with an eye towards simulation
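
    For readers unfamiliar with the term, a linear complementarity system is commonly written in the following standard form; this is a generic sketch, and the matrices $A, B, C, D$ and variables $x, u, y$ are generic symbols rather than notation taken from this thesis.

```latex
\begin{align*}
\dot{x}(t) &= A\,x(t) + B\,u(t),\\
y(t)       &= C\,x(t) + D\,u(t),\\
0 \le u(t) &\;\perp\; y(t) \ge 0.
\end{align*}
```

    Here $u(t) \perp y(t)$ means $u(t)^{\top} y(t) = 0$: at each time, every pair $(u_i, y_i)$ has at least one component equal to zero, which is what makes the dynamics piecewise linear and nonsmooth.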

    Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Get PDF
    Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm to use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of the dynamic weighted tree allows errors in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge-bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
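
    The dissertation's tree-structured scoring scheme is not reproduced here; as a stand-in, the sketch below shows the standard sum-of-pairs way of scoring an MSA column by column against a substitution matrix, which is the kind of objective such scoring methods refine. All names and the toy match/mismatch matrix are illustrative assumptions.

```python
from itertools import combinations

def sum_of_pairs_score(alignment, substitution, gap_penalty=-4):
    """Sum-of-pairs score: for every column, add the substitution score of each
    residue pair, charging gap_penalty for residue-gap pairs and ignoring gap-gap."""
    total = 0
    for column in zip(*alignment):              # iterate over alignment columns
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue                        # gap-gap pairs contribute nothing
            elif a == "-" or b == "-":
                total += gap_penalty
            else:
                total += substitution.get((a, b), substitution.get((b, a), 0))
    return total

# Toy match/mismatch "matrix" over DNA; a real scorer would use BLOSUM/PAM-style tables.
toy = {(a, b): (2 if a == b else -1) for a in "ACGT" for b in "ACGT"}
msa = ["AC-GT",
       "ACAGT",
       "A--GT"]
print(sum_of_pairs_score(msa, toy))
```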