Shortest Unique Substring Query Revisited
We revisit the problem of finding shortest unique substring (SUS) proposed
recently by [6]. We propose an optimal O(n) time and space algorithm that can
find an SUS for every location of a string of size n. Our algorithm
significantly improves the O(n^2) time complexity needed by [6]. We also
support finding all the SUSes covering every location, whereas the solution in
[6] can find only one SUS for every location. Further, our solution is simpler
and easier to implement and can also be more space efficient in practice, since
we only use the inverse suffix array and longest common prefix array of the
string, while the algorithm in [6] uses the suffix tree of the string and other
auxiliary data structures. Our theoretical results are validated by an
empirical study that shows our algorithm is much faster and more space-saving
than the one in [6].
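To make the problem statement concrete, the following is a minimal brute-force sketch of an SUS query. It is not the algorithm from the abstract (which relies on the inverse suffix array and the longest common prefix array); it only illustrates what "a shortest unique substring covering a location" means. The function names `count_occurrences` and `naive_sus` are hypothetical, positions are assumed to be 0-indexed, and ties are broken by taking the leftmost candidate.

```python
def count_occurrences(s, sub):
    """Count (possibly overlapping) occurrences of sub in s."""
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))


def naive_sus(s, k):
    """Return (start, end), inclusive, of a leftmost shortest substring of s
    that covers position k and occurs exactly once in s.

    Brute force for illustration only; far slower than linear time.
    """
    n = len(s)
    for length in range(1, n + 1):
        # A window of this length covers k iff its start lies in [lo, hi].
        lo = max(0, k - length + 1)
        hi = min(k, n - length)
        for start in range(lo, hi + 1):
            if count_occurrences(s, s[start:start + length]) == 1:
                return (start, start + length - 1)
    return None  # only reachable for an empty string


if __name__ == "__main__":
    text = "banana"
    for pos in range(len(text)):
        start, end = naive_sus(text, pos)
        print(pos, text[start:end + 1])
```

The whole string is always unique, so a non-empty input always yields an answer; the sketch simply tries window lengths in increasing order until a unique window covering the position is found.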
Optimizing Query Predicates with Disjunctions for Column Stores
Since its inception, database research has given limited attention to
optimizing predicates with disjunctions. What little past work there is has
focused on optimizations for traditional row-oriented databases. A key
difference in predicate evaluation for row stores and column stores is that
while row stores apply predicates to one record at a time, column stores apply
predicates to sets of records. Not only must the execution engine decide the
order in which to apply the predicates, but it must also decide how many times
each predicate should be applied and on which sets of records it should be
applied. In our work, we tackle exactly this problem. We formulate, analyze,
and solve the predicate evaluation problem for column stores. Our results
include proofs about various properties of the problem, and in turn, these
properties have allowed us to derive the first polynomial-time (i.e., O(n log
n)) algorithm ShallowFish which evaluates predicates optimally for all
predicate expressions with a depth of 2 or less. We capture the exact property
which makes the problem more difficult for predicate expressions of depth 3 or
greater and propose an approximate algorithm DeepFish which outperforms
ShallowFish in these situations. Finally, we show that both ShallowFish and
DeepFish outperform the corresponding state of the art by two orders of
magnitude.
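The set-at-a-time evaluation described above can be sketched as follows. This is an illustrative toy, not ShallowFish or DeepFish: it evaluates a flat disjunction over in-memory columns and applies each later predicate only to the records that no earlier predicate has matched, which is the kind of "which set of records" decision the abstract refers to. The names `apply_predicate` and `evaluate_disjunction`, and the dictionary-of-lists column layout, are assumptions made for the example.

```python
def apply_predicate(column, predicate, candidates):
    """Apply a predicate to a set of record ids; return the ids that satisfy it."""
    return {rid for rid in candidates if predicate(column[rid])}


def evaluate_disjunction(columns, predicates, num_records):
    """Evaluate p1 OR p2 OR ... over column-oriented data.

    predicates is a list of (column_name, function) pairs, applied in order.
    Returns the matching record ids and the number of per-record predicate
    evaluations performed.
    """
    remaining = set(range(num_records))  # records not yet known to match
    matched = set()
    evaluations = 0
    for column_name, fn in predicates:
        evaluations += len(remaining)
        hits = apply_predicate(columns[column_name], fn, remaining)
        matched |= hits
        remaining -= hits  # already-matched records need no further checks
    return matched, evaluations


if __name__ == "__main__":
    columns = {
        "age": [25, 40, 31, 18],
        "city": ["NY", "LA", "NY", "SF"],
    }
    predicates = [
        ("age", lambda v: v > 30),
        ("city", lambda v: v == "NY"),
    ]
    print(evaluate_disjunction(columns, predicates, 4))
```

In this toy run the second predicate is applied to only two of the four records, so the order of the predicates changes the total work; choosing that order and those record sets optimally is the problem the paper formalizes.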
Mechanisms with costly knowledge
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. By Atalay M. Ileri. Cataloged from PDF version of thesis. Includes bibliographical references (pages 18-21).
We propose investigating the design and analysis of game theoretic mechanisms when the players have very unstructured initial knowledge about themselves, but can refine their own knowledge at a cost. We consider several set-theoretic models of "costly knowledge". Specifically, we consider auctions of a single good in which a player i's only knowledge about his own valuation, θ_i, is that it lies in a given interval [a, b]. However, the player can pay a cost, depending on a and b (in several ways), and learn a possibly arbitrary but shorter (in several metrics) sub-interval, which is guaranteed to contain θ_i. In light of the set-theoretic uncertainty they face, it is natural for the players to act so as to minimize their regret. As a first step, we analyze the performance of the second-price mechanism in regret-minimizing strategies, and show that, in all our models, it always returns an outcome of very high social welfare.
A simple yet time-optimal and linear-space algorithm for shortest unique substring queries
WOS: 000347602000043
We revisit the problem of finding a shortest unique substring (SUS), proposed recently by Pei et al. (2013) [12]. We propose an optimal O(n) time and space algorithm that can find an SUS for every location of a string of size n, and thus significantly improve on their O(n^2) time complexity. Our method also supports finding all the SUSes covering every location, whereas theirs can find only one SUS for every location. Further, our solution is simpler and easier to implement and is more space efficient in practice, since we only use the inverse suffix array and the longest common prefix array of the string, while their algorithm uses the suffix tree of the string and other auxiliary data structures. Our theoretical results are validated by an empirical study with real-world data that shows our method is at least 8 times faster and uses at least 20 times less memory. The speedup gained by our method over that of Pei et al. can become even more significant as the string size increases, due to their quadratic time complexity. We have also compared our method with the recent proposal of Tsuruta et al. (2014) [14], another independent O(n) time and space algorithm for SUS finding. The empirical study shows that both methods have nearly the same processing speed. However, ours uses at least 4 times less memory for finding one SUS and at least 2 times less memory for finding all SUSes, both covering every string location.
Supported in part by EWU's Faculty Grants for Research and Creative Works.
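The two structures named in the abstract, the inverse suffix array and the longest common prefix (LCP) array, can be sketched as below. This is not the paper's implementation: the suffix array here is built by plain sorting (simple, but not linear time), the LCP array uses the well-known Kasai et al. construction, and `shortest_unique_prefix_length` is a hypothetical helper showing one standard way these arrays expose uniqueness, namely that the shortest unique substring starting at position i is one character longer than the larger of the two LCP values adjacent to suffix i in sorted order.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in sorted order (naive construction)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def inverse_suffix_array(sa):
    """rank[i] is the position of suffix i within the suffix array."""
    rank = [0] * len(sa)
    for r, i in enumerate(sa):
        rank[i] = r
    return rank


def lcp_array(s, sa, rank):
    """lcp[r] = length of the common prefix of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def shortest_unique_prefix_length(s, rank, lcp, i):
    """Length of the shortest unique substring starting at i, or None if none exists."""
    r = rank[i]
    right = lcp[r + 1] if r + 1 < len(s) else 0
    length = max(lcp[r], right) + 1
    return length if i + length <= len(s) else None


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    rank = inverse_suffix_array(sa)
    lcp = lcp_array(text, sa, rank)
    print(sa, rank, lcp)
```

An SUS covering a given location can then be assembled from these per-start lengths; how to do that in overall linear time is the contribution of the paper and is not reproduced here.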