Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
We study ranked enumeration of join-query results according to very general
orders defined by selective dioids. Our main contribution is a framework for
ranked enumeration over a class of dynamic programming problems that
generalizes seemingly different problems that had been studied in isolation. To
this end, we extend classic algorithms that find the k-shortest paths in a
weighted graph. For full conjunctive queries, including cyclic ones, our
approach is optimal in terms of the time to return the top result and the delay
between results. These optimality properties are derived for the widely used
notion of data complexity, which treats query size as a constant. By performing
a careful cost analysis, we are able to uncover a previously unknown tradeoff
between two incomparable enumeration approaches: one has lower complexity when
the number of returned results is small, the other when the number is very
large. We theoretically and empirically demonstrate the superiority of our
techniques over batch algorithms, which produce the full result and then sort
it. Our technique is not only faster for returning the first few results, but
on some inputs beats the batch algorithm even when all results are produced. Comment: 50 pages, 19 figures
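The following minimal Python sketch makes this setting concrete. It ranks the results of a path-shaped full conjunctive query R1(A,B) ⋈ R2(B,C) ⋈ ... whose tuples carry weights. Weights are combined by addition and ranked by minimum, which is the (min, +) semiring, one example of a selective dioid. The sketch first runs a bottom-up dynamic program that assigns each tuple the cost of its cheapest completion. It then runs a best-first search over partial results with a priority queue, so answers come out in nondecreasing cost without computing and sorting the full join. This simple best-first variant was chosen for brevity and is not the paper's optimal algorithm. The function name `ranked_path_join` and the (left, right, weight) tuple encoding are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count


def ranked_path_join(relations):
    """Yield (cost, path) for R1 join R2 join ... join Rn in nondecreasing cost.

    Hypothetical encoding: each relation is a list of (left, right, weight)
    triples, and consecutive relations join on right == next left.
    Weights are combined with + and ranked by min (the (min, +) dioid).
    """
    n = len(relations)
    # Index every relation by its left (join) attribute.
    by_left = [defaultdict(list) for _ in range(n)]
    for i, rel in enumerate(relations):
        for t in rel:
            by_left[i][t[0]].append(t)

    # Bottom-up DP: best[i][t] is the cheapest cost of finishing a result
    # that uses tuple t at position i (t's own weight included).
    # Tuples with no possible completion are left out (dangling tuples).
    best = [{} for _ in range(n)]
    for t in relations[-1]:
        best[-1][t] = t[2]
    for i in range(n - 2, -1, -1):
        for t in relations[i]:
            options = [best[i + 1][u] for u in by_left[i + 1][t[1]] if u in best[i + 1]]
            if options:
                best[i][t] = t[2] + min(options)

    tie = count()  # tie-breaker so heapq never compares paths
    # Heap entries: (priority, tie, cost of path without its last tuple, path).
    heap = [(c, next(tie), 0, [t]) for t, c in best[0].items()]
    heapq.heapify(heap)
    while heap:
        priority, _, g, path = heapq.heappop(heap)
        i = len(path) - 1
        last = path[-1]
        if i == n - 1:
            yield priority, path  # complete result: priority is its exact cost
            continue
        g_next = g + last[2]
        for u in by_left[i + 1][last[1]]:
            if u in best[i + 1]:
                heapq.heappush(heap, (g_next + best[i + 1][u], next(tie), g_next, path + [u]))


R1 = [("a1", "b1", 1), ("a2", "b1", 3), ("a3", "b2", 0)]
R2 = [("b1", "c1", 2), ("b1", "c2", 5), ("b2", "c1", 4)]
R3 = [("c1", "d1", 1), ("c2", "d1", 0)]

if __name__ == "__main__":
    for cost, path in ranked_path_join([R1, R2, R3]):
        print(cost, path)
```

Each tuple's DP value is the exact cost of its best completion. A partial result's priority therefore never decreases as the result is extended, so complete results leave the heap in sorted order (costs 4, 5, 6, 6, 8 for the sample data). The generator also produces results lazily, so the first answer arrives without enumerating the rest.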
Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS'09)
The Symposium on Theoretical Aspects of Computer Science (STACS) is held alternately in France and in Germany. The conference of February 26-28, 2009, held in Freiburg, is the 26th in this series. Previous meetings took place in Paris (1984), Saarbrücken (1985), Orsay (1986), Passau (1987), Bordeaux (1988), Paderborn (1989), Rouen (1990), Hamburg (1991), Cachan (1992), Würzburg (1993), Caen (1994), München (1995), Grenoble (1996), Lübeck (1997), Paris (1998), Trier (1999), Lille (2000), Dresden (2001), Antibes (2002), Berlin (2003), Montpellier (2004), Stuttgart (2005), Marseille (2006), Aachen (2007), and Bordeaux (2008). ...
The Effect of Representations on Constraint Satisfaction Problems
Constraint Satisfaction is used in the solution of a wide variety of important problems such as frequency assignment, code analysis, and scheduling. It is apparent that the modelling process is key to the success of any constraint-based technique, and much work has been done on the identification of good models [FJHM05]. One of the key choices made during the modelling process is the selection of a constraint representation with which to express the constraints [HS02]. Whilst practitioners will commonly use an implicit representation, most existing structural tractability results are defined for explicit representations. We address a well-known anomaly in structural tractability theory, that acyclic instances are tractable when expressed explicitly but may not be when expressed implicitly, and show that there is a link between representation and tractability. We introduce the notion of interaction width in order to address this disconnect between theory and practice, and use it to define new tractable classes by applying existing structural tractability results to different constraint representations. We show that, for a given succinct representation, a non-trivial class of instances with bounded interaction width can be transformed into an explicit representation in polynomial time so that existing structural tractability results may be applied. We compare our work to existing results for alternative succinct representations and show that the tractable classes we have defined are incomparable and novel, and can be used to deduce new tractable classes for SAT.
EThOS - Electronic Theses Online Service, United Kingdom
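The tractability of acyclic instances in the explicit representation can be illustrated with the classic semijoin-based check in the style of Yannakakis' algorithm. Each constraint lists its allowed tuples, and the constraint scopes form a join tree. One bottom-up pass of semijoins then decides satisfiability in time polynomial in the total size of the tables. The minimal sketch below assumes a single-rooted join tree, given as a hypothetical `parent` list. It does not implement interaction width or the thesis's transformation from succinct to explicit representations.

```python
def semijoin(r_scope, r_rows, s_scope, s_rows):
    """Keep the rows of R that agree with at least one row of S on shared variables."""
    shared = [v for v in r_scope if v in s_scope]
    r_pos = [r_scope.index(v) for v in shared]
    s_pos = [s_scope.index(v) for v in shared]
    keys = {tuple(row[j] for j in s_pos) for row in s_rows}
    return {row for row in r_rows if tuple(row[j] for j in r_pos) in keys}


def acyclic_csp_satisfiable(constraints, parent):
    """Decide an acyclic CSP given in explicit (table) form.

    constraints: list of (scope, allowed_rows), scope being a tuple of variables.
    parent: hypothetical join-tree encoding; parent[i] is the index of
    constraint i's parent, or -1 for the single root.
    """
    scopes = [scope for scope, _ in constraints]
    rows = [set(allowed) for _, allowed in constraints]
    n = len(constraints)

    # Depth of each node in the join tree.
    depth = [0] * n
    for i in range(n):
        j = i
        while parent[j] != -1:
            depth[i] += 1
            j = parent[j]

    # Bottom-up pass: deeper nodes first, so each child is already reduced
    # by its own subtree before it filters its parent.
    for i in sorted(range(n), key=lambda v: -depth[v]):
        p = parent[i]
        if p != -1:
            rows[p] = semijoin(scopes[p], rows[p], scopes[i], rows[i])

    # Satisfiable iff the reduced root table is nonempty.
    return bool(rows[parent.index(-1)])


if __name__ == "__main__":
    csp = [
        (("x", "y"), {(0, 1), (1, 0)}),  # root
        (("y", "z"), {(1, 1), (0, 2)}),  # child of the root
        (("y", "w"), {(0, 5)}),          # child of the root
    ]
    print(acyclic_csp_satisfiable(csp, [-1, 0, 0]))
```

With an implicit representation, for example a predicate over each scope, these tables would first have to be enumerated. A table over domain D with arity k can have up to |D|^k rows, which hints at why the choice of representation affects tractability.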
Relational Machine Learning Algorithms
The majority of learning tasks faced by data scientists involve relational data, yet most standard algorithms for standard learning problems are not designed to accept relational data as input. The standard practice to address this issue is to join the relational data to create the type of geometric input that standard learning algorithms expect. Unfortunately, this standard practice has exponential worst-case time and space complexity. This leads us to consider what we call the Relational Learning Question: "Which standard learning algorithms can be efficiently implemented on relational data, and for those that cannot, is there an alternative algorithm that can be efficiently implemented on relational data and that has similar performance guarantees to the standard algorithm?"
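Two relations R(a, b) and S(b, c) that share a single join value show both the blow-up from joining before learning and the alternative of pushing aggregation through the join. With m tuples on each side, the join has m² rows, and with more relations it can grow as the product of all input sizes. The sketch computes the row count and the sum of a*c over the join in two ways. The factorized version groups by the join key and multiplies per-key sums, so it never builds the join. This is a hypothetical illustration of a simple sum-product aggregate only. It is not the dissertation's SVM or logistic regression algorithms, which the abstract says rely on sampling.

```python
from collections import defaultdict


def materialized(R, S):
    """Count and sum of a*c over R(a, b) join S(b, c), by building the join."""
    joined = [(a, b, c) for a, b in R for b2, c in S if b == b2]
    return len(joined), sum(a * c for a, _, c in joined)


def factorized(R, S):
    """Same count and sum, computed per join key without building the join."""
    r_cnt, r_sum = defaultdict(int), defaultdict(int)
    for a, b in R:
        r_cnt[b] += 1
        r_sum[b] += a
    s_cnt, s_sum = defaultdict(int), defaultdict(int)
    for b, c in S:
        s_cnt[b] += 1
        s_sum[b] += c
    n_rows = total = 0
    for b in r_cnt.keys() & s_cnt.keys():
        # Rows sharing key b form an |R_b| x |S_b| grid, and the sum of a*c
        # over that grid factors into (sum of a) * (sum of c).
        n_rows += r_cnt[b] * s_cnt[b]
        total += r_sum[b] * s_sum[b]
    return n_rows, total


if __name__ == "__main__":
    m = 300
    R = [(i, 0) for i in range(m)]  # every tuple has join value b = 0
    S = [(0, j) for j in range(m)]
    print(len(R) + len(S), "input rows")
    print(materialized(R, S))  # builds 90000 joined rows
    print(factorized(R, S))    # touches only the 600 input rows
```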
In this dissertation, we address the relational learning question for the well-known problems of support vector machine (SVM), logistic regression, and k-means clustering. First, we design an efficient relational algorithm for regularized linear SVM and logistic regression using sampling methods. We show how to implement a variation of gradient descent that provides a nearly optimal approximation guarantee for stable instances. For the k-means problem, we show that the k-means++ algorithm can be efficiently implemented on relational data, and that a slight variation of the adaptive k-means algorithm can be efficiently implemented on relational data while maintaining a constant approximation guarantee. On the way to developing these algorithms, we give an efficient approximation algorithm for certain sum-product queries with additive inequalities that commonly arise ...