    Shortest Unique Substring Query Revisited

    We revisit the problem of finding a shortest unique substring (SUS), proposed recently by [6]. We propose an optimal O(n) time and space algorithm that can find an SUS for every location of a string of size n. Our algorithm significantly improves on the O(n^2) time complexity of [6]. We also support finding all the SUSes covering every location, whereas the solution in [6] can find only one SUS for every location. Further, our solution is simpler and easier to implement and can also be more space-efficient in practice, since we use only the inverse suffix array and the longest common prefix array of the string, while the algorithm in [6] uses the suffix tree of the string and other auxiliary data structures. Our theoretical results are validated by an empirical study showing that our algorithm is much faster and more space-saving than the one in [6].
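    To illustrate how the inverse suffix array and the LCP array are enough to locate SUSes, here is a Python sketch. It is our own reconstruction under stated assumptions, not the authors' code, and all function names are hypothetical. `minimal_unique_lengths` takes, for each start i, the larger LCP between suffix i and its two neighbours in suffix-array order; one more character gives the shortest substring starting at i that occurs exactly once. `sus_every_location` then sweeps positions left to right with a monotonic deque. The suffix array is built by naive sorting (a simplification), so only the post-processing is linear.

```python
from collections import deque


def suffix_array(s):
    # Naive construction by sorting suffixes (simplification, not linear time).
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa, rank):
    # Kasai et al.: lcp[r] = LCP of the suffixes at ranks r-1 and r; lcp[0] = 0.
    n = len(s)
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


def minimal_unique_lengths(s):
    # mu[i] = length of the shortest substring starting at i that occurs once
    # in s, or None if every prefix of suffix i also occurs elsewhere.
    n = len(s)
    sa = suffix_array(s)
    rank = [0] * n  # inverse suffix array
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = lcp_array(s, sa, rank)
    mu = [None] * n
    for i in range(n):
        r = rank[i]
        longest_shared = max(lcp[r], lcp[r + 1] if r + 1 < n else 0)
        if i + longest_shared + 1 <= n:
            mu[i] = longest_shared + 1
    return mu


def sus_every_location(s):
    # Returns one (start, length) SUS covering each position k of s.
    # If s[i+1..j] is unique then so is s[i..j], hence mu[i] <= mu[i+1] + 1:
    # valid starts form a prefix 0..valid-1 and end[i] is non-decreasing.
    n = len(s)
    mu = minimal_unique_lengths(s)
    valid = sum(x is not None for x in mu)
    end = [i + mu[i] - 1 for i in range(valid)]
    result = []
    window = deque()  # starts in [q, k], mu values increasing front to back
    q = 0             # first start whose minimal unique substring reaches k
    nxt = 0           # next start to enter the window
    for k in range(n):
        while q < valid and end[q] < k:
            q += 1
        while nxt < valid and nxt <= k:
            while window and mu[window[-1]] >= mu[nxt]:
                window.pop()
            window.append(nxt)
            nxt += 1
        while window and window[0] < q:
            window.popleft()
        candidates = []
        if window:  # s[i..end[i]] already covers k; front has the smallest mu
            i = window[0]
            candidates.append((mu[i], i))
        if q > 0:   # otherwise extend the latest start q-1 up to position k
            candidates.append((k - q + 2, q - 1))
        length, start = min(candidates)
        result.append((start, length))
    return result
```

    For example, `sus_every_location("abab")` returns `[(0, 3), (1, 2), (1, 2), (1, 3)]`: "aba" covers position 0, "ba" covers positions 1 and 2, and "bab" covers position 3.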

    Optimizing Query Predicates with Disjunctions for Column Stores

    Since its inception, database research has given limited attention to optimizing predicates with disjunctions, and what little past work exists has focused on traditional row-oriented databases. A key difference in predicate evaluation between row stores and column stores is that row stores apply predicates to one record at a time, while column stores apply predicates to sets of records. Not only must the execution engine decide the order in which to apply the predicates, but it must also decide how many times each predicate should be applied and to which sets of records. In our work, we tackle exactly this problem: we formulate, analyze, and solve the predicate evaluation problem for column stores. Our results include proofs of various properties of the problem, and these properties allow us to derive ShallowFish, the first polynomial-time (O(n log n)) algorithm that evaluates predicates optimally for all predicate expressions of depth 2 or less. We identify the exact property that makes the problem harder for predicate expressions of depth 3 or greater and propose an approximate algorithm, DeepFish, which outperforms ShallowFish in these situations. Finally, we show that both ShallowFish and DeepFish outperform the corresponding state-of-the-art approaches by two orders of magnitude.
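    To make the set-at-a-time setting concrete, the Python sketch below evaluates a predicate tree over columnar data, handing each predicate only the set of row ids whose outcome is still open: an AND child sees only rows that passed the earlier children, and an OR child sees only rows not yet satisfied. This is a simple ordered baseline, not ShallowFish or DeepFish; it keeps the children in the given order, whereas choosing the order and the record sets optimally is the problem the abstract addresses. The `Pred`/`And`/`Or` classes and the per-predicate `stats` counter are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pred:
    name: str                       # label used in the cost counter
    column: str                     # column the predicate reads
    test: Callable[[object], bool]  # per-value check


@dataclass
class And:
    children: list


@dataclass
class Or:
    children: list


def evaluate(expr, table, candidates, stats):
    """Return the subset of `candidates` (row ids) that satisfy `expr`.

    `stats` counts how many rows each predicate is applied to, a crude
    proxy for evaluation cost in a column store."""
    if isinstance(expr, Pred):
        column = table[expr.column]
        stats[expr.name] = stats.get(expr.name, 0) + len(candidates)
        return {r for r in candidates if expr.test(column[r])}
    if isinstance(expr, And):
        current = set(candidates)
        for child in expr.children:
            if not current:  # no row can still pass the conjunction
                break
            current = evaluate(child, table, current, stats)
        return current
    if isinstance(expr, Or):
        remaining = set(candidates)  # rows not yet known to satisfy the OR
        satisfied = set()
        for child in expr.children:
            if not remaining:
                break
            hits = evaluate(child, table, remaining, stats)
            satisfied |= hits
            remaining -= hits
        return satisfied
    raise TypeError(f"unknown expression node: {expr!r}")
```

    On a five-row table with columns `price` = [50, 150, 200, 80, 120], `qty` = [1, 5, 2, 10, 7] and `region` = ["EU", "US", "EU", "US", "EU"], evaluating `(price > 100 AND qty > 3) OR region == "EU"` this way returns rows {0, 1, 2, 4} and applies the three predicates to 5, 3 and 3 rows instead of 5 each.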

    Mechanisms with costly knowledge

    Thesis (S.M.) by Atalay M. Ileri, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 18-21).
    We propose investigating the design and analysis of game-theoretic mechanisms when the players have very unstructured initial knowledge about themselves but can refine their own knowledge at a cost. We consider several set-theoretic models of "costly knowledge". Specifically, we consider auctions of a single good in which a player i's only knowledge about his own valuation, θ_i, is that it lies in a given interval [a, b]. However, the player can pay a cost, depending on a and b (in several ways), and learn a possibly arbitrary but shorter (in several metrics) sub-interval, which is guaranteed to contain θ_i. In light of the set-theoretic uncertainty they face, it is natural for the players to act so as to minimize their regret. As a first step, we analyze the performance of the second-price mechanism in regret-minimizing strategies and show that, in all our models, it always returns an outcome of very high social welfare.
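    The regret notion can be made concrete in the simplest setting the abstract mentions: a single-good second-price auction in which a bidder knows only that θ_i lies in [lo, hi], pays no refinement cost, and faces an arbitrary highest competing bid p >= 0. These simplifications are ours, not the thesis's full models. Under them, a bid x has worst-case regret max(x - lo, hi - x): winning at a price just below x can cost up to x - lo, and losing to p = x can forgo up to hi - x, so the midpoint bid minimizes worst-case regret. The tie-breaking rules and all function names below are assumptions for illustration.

```python
def second_price_auction(bids):
    """Single good: the highest bid wins and pays the highest competing bid.
    Ties go to the lowest index (an assumption of this sketch)."""
    if not bids:
        raise ValueError("at least one bid is required")
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    competing = [b for i, b in enumerate(bids) if i != winner]
    price = max(competing) if competing else 0.0
    return winner, price


def regret(bid, theta, p):
    """Ex-post regret for true value theta against highest competing bid p.
    The bidder wins iff bid > p (ties lose, an assumption)."""
    best = max(theta - p, 0.0)              # utility with hindsight
    actual = theta - p if bid > p else 0.0  # utility actually obtained
    return best - actual


def max_regret(bid, lo, hi):
    """Worst case of regret(bid, theta, p) over theta in [lo, hi] and p >= 0.
    Winning at p just below bid costs up to bid - lo (a supremum);
    losing to p = bid forgoes up to hi - bid."""
    return max(bid - lo, hi - bid)


def minimax_regret_bid(lo, hi):
    """The midpoint equalizes both worst cases, giving regret (hi - lo) / 2."""
    return (lo + hi) / 2
```

    For instance, with intervals (2, 6), (5, 7) and (1, 3), midpoint bidding gives bids [4.0, 6.0, 2.0], and `second_price_auction` returns `(1, 4.0)`: the second bidder wins and pays 4.0.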

    A simple yet time-optimal and linear-space algorithm for shortest unique substring queries

    WOS: 000347602000043
    We revisit the problem of finding a shortest unique substring (SUS), proposed recently by Pei et al. (2013) [12]. We propose an optimal O(n) time and space algorithm that can find an SUS for every location of a string of size n, significantly improving on their O(n^2) time complexity. Our method also supports finding all the SUSes covering every location, whereas theirs can find only one SUS for every location. Further, our solution is simpler and easier to implement and is more space-efficient in practice, since we use only the inverse suffix array and the longest common prefix array of the string, while their algorithm uses the suffix tree of the string and other auxiliary data structures. Our theoretical results are validated by an empirical study with real-world data showing that our method is at least 8 times faster and uses at least 20 times less memory. The speedup of our method over Pei et al.'s can become even more significant as the string size increases, due to their quadratic time complexity. We have also compared our method with the recent proposal of Tsuruta et al. (2014) [14], another independent O(n) time and space algorithm for SUS finding. The empirical study shows that both methods have nearly the same processing speed; however, ours uses at least 4 times less memory for finding one SUS and at least 2 times less memory for finding all SUSes, both covering every string location.
    Supported in part by EWU's Faculty Grants for Research and Creative Works.
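    Finding all SUSes rather than one rests on a simple characterization: if mu[i] is the length of the shortest unique substring starting at i, then for a start i <= k the shortest unique substring starting at i and covering position k has length max(mu[i], k - i + 1). Every SUS for k is therefore determined by a start that attains the minimum of that expression. The quadratic Python sketch below is our illustration of this idea, not the authors' linear-time method; it computes mu by brute force with `str.find`/`str.rfind` (a substring is unique exactly when its leftmost and rightmost occurrences coincide), and the function names are hypothetical.

```python
def min_unique_lengths_naive(s):
    # mu[i] = shortest length L such that s[i:i+L] occurs exactly once, else None.
    n = len(s)
    mu = []
    for i in range(n):
        for length in range(1, n - i + 1):
            sub = s[i:i + length]
            if s.find(sub) == s.rfind(sub):  # leftmost == rightmost occurrence
                mu.append(length)
                break
        else:
            mu.append(None)
    return mu


def all_sus_every_location(s):
    # For each position k, list every (start, length) SUS covering k.
    mu = min_unique_lengths_naive(s)
    out = []
    for k in range(len(s)):
        # The whole string is always unique, so start 0 is always a candidate.
        cands = [(i, max(mu[i], k - i + 1))
                 for i in range(k + 1) if mu[i] is not None]
        best = min(length for _, length in cands)
        out.append([(i, length) for i, length in cands if length == best])
    return out
```

    For example, `all_sus_every_location("aabb")` returns `[[(0, 2)], [(0, 2), (1, 2)], [(1, 2), (2, 2)], [(2, 2)]]`: position 1 is covered by two SUSes, "aa" and "ab".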