
30 research outputs found

An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

Author: Ailon Nir
Publication date: 17/05/2011

We study the problem of learning to rank from pairwise preferences, and solve a long-standing open problem that has led to the development of many heuristics but no provable results for our particular problem. Given a set $V$ of $n$ elements, we wish to linearly order them given pairwise preference labels. A pairwise preference label is obtained as a response, typically from a human, to the question "which is preferred, $u$ or $v$?" for two elements $u,v\in V$. We assume possible non-transitivity paradoxes, which may arise naturally due to human mistakes or irrationality. The goal is to linearly order the elements from the most preferred to the least preferred, while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: the loss and the query complexity (the number of pairwise preference labels we obtain). This is a typical learning problem, with the exception that the space from which the pairwise preferences are drawn is finite, consisting of only $\binom{n}{2}$ possibilities. We present an active learning algorithm for this problem, with query bounds significantly beating general (non-active) bounds for the same error guarantee, while almost achieving the information-theoretic lower bound. Our main construct is a decomposition of the input such that (i) each block incurs high loss at optimum, and (ii) the optimal solution respecting the decomposition is not much worse than the true optimum. The decomposition is done by adapting a recent result by Kenyon and Schudy for a related combinatorial optimization problem to the query-efficient setting. We thus settle an open problem posed by learning-to-rank theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we show how to use it in concert with an SVM relaxation.

Comment: Fixed a tiny error in the statement of Theorem 3.1.
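To make the two performance measures concrete, here is a minimal Python sketch, assuming a complete table of labels stored as (winner, loser) pairs behind a hypothetical oracle `prefers(u, v)`. It is not the paper's decomposition-based algorithm: it simply sorts with the oracle as the comparator, counts how many labels the sort requests (the query complexity), and then counts how many labels the resulting order disagrees with (the loss). The illustrative table contains the cycle a > b > c > a, so every linear order has loss at least 1.

```python
from functools import cmp_to_key
from itertools import combinations

# Hypothetical label table: (winner, loser) means the winner was preferred.
# It contains the non-transitive cycle a > b > c > a.
WINS = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("b", "d"), ("c", "d")}


def prefers(u, v):
    """Hypothetical oracle: True if u was preferred to v."""
    return (u, v) in WINS


def rank_with_oracle(items, oracle):
    """Sort items most-preferred first, counting oracle queries."""
    queries = 0

    def compare(u, v):
        nonlocal queries
        queries += 1
        return -1 if oracle(u, v) else 1  # preferred items go first

    return sorted(items, key=cmp_to_key(compare)), queries


def loss(order, oracle):
    """Count pairwise labels that disagree with the order."""
    # combinations() yields (u, v) with u placed before v in `order`,
    # so the label disagrees when v was preferred to u.
    return sum(1 for u, v in combinations(order, 2) if oracle(v, u))


order, queries = rank_with_oracle(["a", "b", "c", "d"], prefers)
print(order, "queries:", queries, "loss:", loss(order, prefers))
```

A comparison sort requests $O(n \log n)$ labels rather than all $\binom{n}{2}$, but this baseline makes no claim about how close its loss is to the optimum when the labels are non-transitive.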

arXiv.org e-Print Archive

CiteSeerX

An Efficient Semi-Streaming PTAS for Tournament Feedback Arc Set with Few Passes

Author: Baweja Anubhav
Jia Justin
Woodruff David P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)
Publication date: 15/07/2021

We present the first semi-streaming polynomial-time approximation scheme (PTAS) for the minimum feedback arc set problem on directed tournaments in a small number of passes. Namely, we obtain a $(1+\varepsilon)$-approximation in time $O(\mathrm{poly}(n)\cdot 2^{\mathrm{poly}(1/\varepsilon)})$, with $p$ passes, in $n^{1+1/p}\cdot\mathrm{poly}((\log n)/\varepsilon)$ space. The only previous algorithm with this pass/space trade-off gave a 3-approximation (SODA 2020), and other polynomial-time algorithms which achieved a $(1+\varepsilon)$-approximation did so with quadratic memory or with a linear number of passes. We also present a new time/space trade-off for 1-pass algorithms that solve the tournament feedback arc set problem. This problem has several applications in machine learning such as creating linear classifiers and doing Bayesian inference. We also provide several additional algorithms and lower bounds for related streaming problems on directed graphs, which is a largely unexplored territory.
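The quantity being approximated is simple to state: given an ordering of the vertices, count the arcs that point backwards. The sketch below assumes the tournament arrives as a replayable list of arcs `(u, v)` meaning u → v. It spends one pass counting each vertex's wins and ordering by them, and a second pass measuring the back-arc cost, storing only O(n) counters and positions. The wins-based ordering is a simple baseline heuristic chosen for illustration, not the PTAS from the paper, and the function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Arc = Tuple[Hashable, Hashable]  # (u, v) means the arc u -> v ("u beats v")


def order_by_wins(arcs: Sequence[Arc]) -> List[Hashable]:
    """Pass 1: count wins (out-degree) per vertex; order by decreasing wins."""
    wins: Dict[Hashable, int] = {}
    for u, v in arcs:
        wins[u] = wins.get(u, 0) + 1
        wins.setdefault(v, 0)  # make sure vertices with no wins appear
    # sorted() is stable, so ties keep their first-seen order
    return sorted(wins, key=lambda x: -wins[x])


def back_arcs(arcs: Sequence[Arc], order: List[Hashable]) -> int:
    """Pass 2: count arcs pointing backwards in `order` (feedback arc set cost)."""
    pos = {x: i for i, x in enumerate(order)}
    return sum(1 for u, v in arcs if pos[u] > pos[v])


arcs = [(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3)]
order = order_by_wins(arcs)
print(order, back_arcs(arcs, order))
```

For these arcs (the cycle 0 → 1 → 2 → 0 plus arcs from 0, 1 and 2 into vertex 3) the wins ordering is [0, 1, 2, 3] with cost 1, which is optimal because the cycle alone forces at least one back arc.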

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

LIPIcs, Volume 274, ESA 2023, Complete Volume

Author: Farach-Colton Martin
Herman Grzegorz
Puglisi Simon J.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023


Dagstuhl Research Online Publication Server

Space-Efficient Algorithms and Verification Schemes for Graph Streams

Author: Ghosh Prantar
Publication venue: Dartmouth Digital Commons
Publication date: 11/06/2022

Structured data-sets are often easy to represent using graphs. The prevalence of massive data-sets in the modern world gives rise to big graphs such as web graphs, social networks, biological networks, and citation graphs. Most of these graphs keep growing continuously and pose two major challenges in their processing: (a) it is infeasible to store them entirely in the memory of a regular server, and (b) even if stored entirely, it is incredibly inefficient to reread the whole graph every time a new query appears. Thus, a natural approach for efficiently processing and analyzing such graphs is reading them as a stream of edge insertions and deletions and maintaining a summary that can be (a) stored in affordable memory (significantly smaller than the input size) and (b) used to detect properties of the original graph. In this thesis, we explore the strengths and limitations of such graph streaming algorithms under three main paradigms: classical or standard streaming, adversarially robust streaming, and streaming verification. In the classical streaming model, an algorithm needs to process an adversarially chosen input stream using space sublinear in the input size and return a desired output at the end of the stream. Here, we study a collection of fundamental directed graph problems like reachability, acyclicity testing, and topological sorting. Our investigation reveals that while most problems are provably hard for general digraphs, they admit efficient algorithms for the special and widely-studied subclass of tournament graphs. Further, we exhibit certain problems that become drastically easier when the stream elements arrive in random order rather than adversarial order, as well as problems that do not get much easier even under this relaxation. Furthermore, we study the graph coloring problem in this model and design color-efficient algorithms using novel parameterizations and establish complexity separations between different versions of the problem. 
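For tournaments, acyclicity testing and topological sorting become easy in one pass because of a classical fact: a tournament on n vertices is acyclic exactly when its out-degrees are 0, 1, ..., n - 1, and in that case ordering by decreasing out-degree gives its unique topological order. The sketch below uses that fact with n counters. It assumes vertices are labeled 0..n-1 and that the stream is a valid tournament (exactly one arc per pair); it illustrates why tournaments are easier and is not necessarily the algorithm from the thesis.

```python
from typing import Iterable, List, Optional, Tuple


def tournament_topo_sort(n: int, arcs: Iterable[Tuple[int, int]]) -> Optional[List[int]]:
    """One pass over the arcs (u, v), meaning u -> v, of a tournament on 0..n-1.

    Returns the topological order if the tournament is acyclic, else None.
    """
    outdeg = [0] * n  # n counters, i.e. O(n log n) bits
    for u, _v in arcs:
        outdeg[u] += 1
    # Acyclic iff the out-degrees are exactly 0, 1, ..., n - 1.
    if sorted(outdeg) != list(range(n)):
        return None  # a repeated out-degree implies a directed cycle
    # The vertex with n - 1 wins comes first, the one with 0 wins last.
    return sorted(range(n), key=lambda x: -outdeg[x])


print(tournament_topo_sort(3, [(2, 1), (2, 0), (1, 0)]))
print(tournament_topo_sort(3, [(0, 1), (1, 2), (2, 0)]))
```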
The classical streaming setting assumes that the entire input stream is fixed by an adversary before the algorithm reads it. Many randomized algorithms in this setting, however, fail when the stream is extended by an adaptive adversary based on past outputs received. This is the so-called adversarially robust streaming model. We show that graph coloring is significantly harder in the robust setting than in the classical setting, thus establishing the first such separation for a "natural" problem. We also design a class of efficient robust coloring algorithms using novel techniques. In classical streaming, many important problems turn out to be "intractable", i.e., provably impossible to solve in sublinear space. It is then natural to consider an enhanced streaming setting where a space-bounded client outsources the computation to a space-unbounded but untrusted cloud service, which replies with the solution and a supporting "proof" that the client needs to verify. This is called streaming verification or the annotated streaming model. It allows algorithms or verification schemes for the otherwise intractable problems using both space and proof length sublinear in the input size. We devise efficient schemes that improve upon the state of the art for a variety of fundamental graph problems including triangle counting, maximum matching, topological sorting, maximal independent set, graph connectivity, and shortest paths, as well as for computing frequency-based functions such as distinct items and maximum frequency, which have broad applications in graph streaming. Some of our schemes were conjectured to be impossible, while some others attain smooth and optimal trade-offs between space and communication costs.
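A basic building block for streaming verification is polynomial fingerprinting. In the sketch below, assuming items come from a universe {0, ..., m - 1} and the prover sends its claimed frequency vector in index order, the client keeps only the fingerprint $\sum_i f_i r^i \bmod P$ for a secret random $r$ while reading the stream, and accepts the claim only if the claimed vector has the same fingerprint; distinct items and maximum frequency are then read off the accepted vector. Two different vectors collide with probability at most $(m-1)/P$. This is a simple baseline whose proof length equals the universe size m, so it is only a starting point compared with the schemes summarized above; the class name and interface are hypothetical.

```python
import random
from typing import Iterable, Optional, Tuple

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1, used as the fingerprint modulus


class FrequencyVerifier:
    """Client side of a simple annotated-stream scheme (hypothetical interface).

    Keeps O(1) words while reading the stream, then checks a prover's
    claimed frequency vector against a polynomial fingerprint.
    """

    def __init__(self, universe_size: int, rng: random.Random):
        self.m = universe_size
        self.r = rng.randrange(P)  # secret evaluation point
        self.fp = 0                # sum of r^item mod P over the stream

    def process(self, item: int) -> None:
        """Read one stream item (assumed to lie in 0..m-1)."""
        self.fp = (self.fp + pow(self.r, item, P)) % P

    def verify(self, claimed: Iterable[int]) -> Optional[Tuple[int, int]]:
        """Read claimed f_0..f_{m-1} once; return (distinct, max_freq) or None."""
        acc, power, distinct, max_freq, count = 0, 1, 0, 0, 0
        for f in claimed:
            acc = (acc + f * power) % P   # evaluate the claimed polynomial at r
            power = (power * self.r) % P
            distinct += f > 0
            max_freq = max(max_freq, f)
            count += 1
        if count != self.m or acc != self.fp:
            return None                   # reject: fingerprint mismatch
        return distinct, max_freq


verifier = FrequencyVerifier(universe_size=5, rng=random.Random(0))
for item in [1, 3, 3, 4, 3]:
    verifier.process(item)
print(verifier.verify([0, 1, 0, 3, 1]))  # honest claim
print(verifier.verify([0, 1, 0, 2, 2]))  # dishonest claim
```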

Dartmouth Digital Commons (Dartmouth College)


Massive Graph Analysis in the Data Stream Model

Author: Vorotnikova Sofya
Publication venue: ScholarWorks@UMass Amherst
Publication date: 02/07/2019

Graphs have become an abstraction of choice in modeling highly structured data. The need to compute graph-theoretic properties of datasets arises in many applications that involve entities and pairwise relations between them. However, in practice the datasets in question can be too large to be stored in main memory, distributed across many machines, or changing over time. Moreover, in an increasing number of applications the algorithm has to make real-time decisions as the data arrives, which puts further limitations on the time and space that can realistically be used. These characteristics render classical algorithmic approaches obsolete and necessitate the development of new techniques. The streaming model of computation takes these challenges into account, providing a trade-off between the resources used by the algorithm and its accuracy. A graph stream is defined by a sequence of edge insertions (and sometimes deletions) into an initially empty graph. The objective is to compute a certain property of the graph at the end of the stream while minimizing the amount of space the algorithm uses. In this model, we explore fundamental graph-theoretic problems that also serve as important primitives in massive graph analysis. Our results can be divided into three main categories.

Finding large matchings and related problems. We describe two optimal algorithms for finding large matchings in dynamic (insert-delete) graph streams: an approximation of an arbitrary maximum matching, and an exact algorithm under the assumption that the matching is of a certain size. We also show how the techniques developed in these algorithms can be used to solve a variety of related problems such as vertex cover and hitting set in hypergraphs. We then concentrate on estimating just the size of the matching and present a series of sublinear results for the class of low-arboricity graphs.

Counting the number of cycles. We fully resolve in which settings there exist algorithms approximating the number of fixed-length cycles that do not store the entire graph. For cycles of length five or greater, we show that no such algorithms exist. For triangles and four-cycles, we describe several counting results and a few lower bounds for the insert-only model, considering such parameters as the number of passes taken over the stream and its ordering.

Vertex ordering problems in directed graphs. We consider such fundamental problems as topologically sorting a directed acyclic graph (DAG), checking whether the input is in fact a DAG, and finding a minimum feedback arc set. It can be shown that when the input graph is arbitrary, these problems have high space complexity in the streaming model. Thus, we concentrate on designing algorithms for tournaments and a certain family of random graphs. Together, these results complement the much more mature body of work on algorithms for undirected graph streams.
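As a point of comparison for the matching results, the classical one-pass baseline for insertion-only streams is greedy: keep an edge whenever both endpoints are still unmatched. The result is a maximal matching, and any maximal matching has at least half as many edges as a maximum matching, so this is a 2-approximation using O(n) space. The sketch below assumes an insertion-only stream of undirected edges; unlike the dynamic-stream algorithms described above, it does not support deletions, and the function name is hypothetical.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_matching(edges: Iterable[Edge]) -> List[Edge]:
    """One pass over an insertion-only edge stream; returns a maximal matching."""
    matched: Set[Hashable] = set()  # endpoints already covered by the matching
    matching: List[Edge] = []
    for u, v in edges:
        # Skip self-loops and edges touching an already matched vertex.
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching


print(greedy_matching([("b", "c"), ("a", "b"), ("c", "d")]))
print(greedy_matching([("a", "b"), ("b", "c"), ("c", "d")]))
```

Streaming the path a-b-c-d as (b, c), (a, b), (c, d) yields only [(b, c)], while the maximum matching {(a, b), (c, d)} has two edges, so the factor of 2 can actually be reached.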

ScholarWorks@UMass Amherst

Comparing and Aggregating Partial Orders (Vergleichen und Aggregieren von partiellen Ordnungen)

Author: Hofmeier Andreas
Publication date: 05/11/2012

Comparing and aggregating information is a central topic in the analysis of voting systems. In such systems, the differing opinions of voters about a set of candidates must be aggregated into an election result that is as fair as possible. In most political elections, each voter chooses a single candidate by marking a cross. Alongside this, however, rank aggregation problems are also studied as a variant of voting systems. In these, each voter expresses their opinion in the form of a total order over the set of candidates, which represents their often complex opinion more precisely than the choice of a single, favored candidate. The result of a rank aggregation problem is then likewise a total order of the candidates, namely one with the smallest distance to the voters' opinions. Among other measures, Kendall's tau distance and Spearman's footrule distance have become established as distance measures between two total orders. Modern applications of rank aggregation problems in machine learning, artificial intelligence, bioinformatics, and above all in various areas of the World Wide Web are bringing aspects that were already known but have so far received little study into the focus of research. On the one hand, the algorithmic complexity of rank aggregation problems is gaining importance. On the other hand, many of these applications involve incomplete "voter opinions" with tied or incomparable candidates, so that total orders are no longer suitable for representing them. This thesis takes up both aspects and studies the algorithmic complexity of rank aggregation problems in which voter opinions are represented by weak or partial orders instead of total orders. To this end, Kendall's tau distance and Spearman's footrule distance are generalized in several natural ways.
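For total orders, both distance measures can be computed directly. The sketch below assumes each ranking is a list of the same candidates from most to least preferred; Kendall's tau distance counts candidate pairs that the two rankings order differently, and Spearman's footrule sums the absolute differences of each candidate's positions. The quadratic pair loop is chosen for clarity; counting inversions with merge sort is the usual O(n log n) alternative.

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence


def positions(ranking: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each candidate to its position (0 = most preferred)."""
    return {x: i for i, x in enumerate(ranking)}


def kendall_tau(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Number of candidate pairs that rankings a and b order differently."""
    pa, pb = positions(a), positions(b)
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pa[x] - pa[y]) * (pb[x] - pb[y]) < 0  # opposite relative order
    )


def footrule(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Sum over candidates of |position in a - position in b|."""
    pa, pb = positions(a), positions(b)
    return sum(abs(pa[x] - pb[x]) for x in a)


a = ["A", "B", "C", "D"]
b = ["B", "D", "A", "C"]
print(kendall_tau(a, b), footrule(a, b))
```

For these two rankings the Kendall distance is 3 and the footrule is 6, consistent with the inequality K ≤ F ≤ 2K of Diaconis and Graham.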
It turns out that even computing the distance between two orders now becomes an algorithmically hard problem. Computing the generalized versions of Kendall's tau distance or Spearman's footrule distance is still efficiently possible for weak orders. As soon as partial orders are considered, however, the problems are NP-complete, i.e., presumably no longer efficiently solvable. For this case, results on the approximability and the parameterized complexity of the problems are presented. The complexity of the rank aggregation problems themselves also increases. Variants that are efficiently solvable for total orders become NP-complete for weak orders, whereas variants that are NP-complete for total orders partly lie outside the complexity class NP for partial orders. The thesis concludes with an outlook on open problems.
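One common way to extend Kendall's tau to weak orders (rankings with ties) charges 1 for each pair that the two orders rank strictly in opposite directions and a penalty p for each pair that is tied in exactly one of them; it is often called the Kendall distance with penalty parameter p. Whether it coincides with any of the generalizations studied in the thesis is an assumption. The sketch assumes each weak order is a dict from candidate to bucket index (lower is more preferred) over the same candidate set, and runs in O(n²) time, in line with the statement that such distances remain efficiently computable for weak orders. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable


def kendall_with_ties(a: Dict[Hashable, int], b: Dict[Hashable, int], p: float = 0.5) -> float:
    """Kendall distance with penalty p between two weak orders.

    a and b map each candidate to a bucket index (lower = more preferred);
    candidates that share a bucket are tied.
    """
    total = 0.0
    for x, y in combinations(list(a), 2):
        sa = (a[x] > a[y]) - (a[x] < a[y])  # -1, 0 (tie) or +1
        sb = (b[x] > b[y]) - (b[x] < b[y])
        if sa * sb < 0:
            total += 1          # strictly ordered in opposite directions
        elif (sa == 0) != (sb == 0):
            total += p          # tied in exactly one of the two orders
    return total


weak = {"A": 0, "B": 0, "C": 1}   # A and B tied, both ahead of C
total = {"A": 0, "B": 1, "C": 2}  # A ahead of B ahead of C
print(kendall_with_ties(weak, total), kendall_with_ties(weak, total, p=1.0))
```

On total orders (no ties) the penalty never applies and the function reduces to the ordinary Kendall tau distance.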

LIPIcs, Volume 244, ESA 2022, Complete Volume

Author: Chechik Shiri
Herman Grzegorz
Navarro Gonzalo
Rotenberg Eva
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022


Dagstuhl Research Online Publication Server

43rd International Symposium on Mathematical Foundations of Computer Science: MFCS 2018, August 27-31, 2018, Liverpool, United Kingdom

Author: International Symposium on Mathematical Foundations of Computer Science <43. 2018, Liverpool>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/08/2018

Digitale Bibliothek Thüringen