Which brands gain share from which brands? Inference from store-level scanner data
Market share models for weekly store-level data are useful to understand competitive structures
by delivering own and cross price elasticities. However, these models cannot be used to
examine which brands lose share to which brands during a specific period of time. For this
purpose we propose a new model that does allow for such an examination. We illustrate
the model for two product categories in two markets, and we show that our model has validity in
terms of both in-sample fit and out-of-sample forecasting. We also demonstrate how our model
can be used to decompose own and cross price elasticities to get additional insights into the
competitive structure.
Indexing Proximity-based Dependencies for Information Retrieval
Research into term dependencies for information retrieval has demonstrated that dependency retrieval models are able to consistently improve retrieval effectiveness over bag-of-words models. However, the computation of term dependency statistics is a major efficiency bottleneck in the execution of these retrieval models. This thesis investigates the problem of improving the efficiency of dependency retrieval models without compromising the effectiveness benefits of the term dependency features.
Despite the large number of published comparisons between dependency models and bag-of-words approaches, there has been a lack of direct comparisons between alternate dependency models. We provide this comparison and investigate different types of proximity features. Several bi-term and many-term dependency models over a range of TREC collections, for both short (title) and long (description) queries, are compared to determine the strongest benchmark models. We observe that the weighted sequential dependence model is the most effective model studied. Additionally, we observe that there is some potential in many-term dependencies, but more selective methods are required to exploit these features.
We then investigate two novel index structures to directly index the proximity-based dependencies used in the sequential dependence model and weighted sequential dependence model. The frequent index and the sketch index data structures can both provide efficient access to collection and document level statistics for all indexed term dependencies, while minimizing space costs, relative to a full inverted index of term dependencies. We test whether these structures can improve retrieval efficiency without incurring large space requirements, or degrading retrieval effectiveness significantly. A secondary requirement is that each data structure must be able to be constructed for an input text collection in a scalable and distributed manner.
Based on the observation that the vast majority of term dependencies extracted from queries are relatively frequent in the collection, the “frequent” index of term dependencies omits data for infrequent term dependencies. The sketch index of term dependencies uses techniques from sketch data structures to store probabilistically bounded estimates of the required statistics. We present analyses of these data structures that include construction and space costs, retrieval efficiency and investigation of any degradation of retrieval effectiveness.
Finally, we investigate the application of these data structures to the execution of the strongest performing dependency models identified. We compare the retrieval efficiency of each of these structures across two query processing algorithms, and across both short and long queries, using two large web collections. We observe that these newly proposed data structures allow the execution of queries considerably faster than when using positional indexes, and as fast as a full index of term dependencies, but with lowered storage overhead.
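To make the idea of "probabilistically bounded estimates" concrete, here is a minimal sketch using a count-min sketch over ordered term pairs. This is an illustration only: the abstract does not say which sketch technique the thesis uses, and the class name, the `ordered_bigrams` helper, the table dimensions and the sample text are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never fall below the true count."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One salted hash per row stands in for `depth` independent hash functions.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def ordered_bigrams(tokens):
    """Adjacent ordered term pairs (hypothetical stand-in for indexed dependencies)."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


sketch = CountMinSketch()
tokens = "new york city is north of new york harbor".split()
for pair in ordered_bigrams(tokens):
    sketch.add(pair)

print(sketch.estimate("new york"))  # at least 2, the true count
```

The space used depends only on `width` and `depth`, not on the number of distinct term pairs, which is the trade-off the abstract alludes to: bounded overestimation in exchange for a much smaller structure than a full index of term dependencies.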
AiiDA: Automated Interactive Infrastructure and Database for Computational Science
Computational science has seen in the last decades a spectacular rise in the
scope, breadth, and depth of its efforts. Notwithstanding this prevalence and
impact, it is often still performed using the renaissance model of individual
artisans gathered in a workshop, under the guidance of an established
practitioner. Great benefits could follow instead from adopting concepts and
tools coming from computer science to manage, preserve, and share these
computational efforts. We illustrate here our paradigm sustaining such vision,
based around the four pillars of Automation, Data, Environment, and Sharing. We
then discuss its implementation in the open-source AiiDA platform
(http://www.aiida.net), which has been tuned first to the demands of
computational materials science. AiiDA's design is based on directed acyclic
graphs to track the provenance of data and calculations, and ensure
preservation and searchability. Remote computational resources are managed
transparently, and automation is coupled with data storage to ensure
reproducibility. Last, complex sequences of calculations can be encoded into
scientific workflows. We believe that AiiDA's design and its sharing
capabilities will encourage the creation of social ecosystems to disseminate
codes, data, and scientific workflows.
Comment: 30 pages, 7 figures
Succinct Representations of Permutations and Functions
We investigate the problem of succinctly representing an arbitrary
permutation, \pi, on {0,...,n-1} so that \pi^k(i) can be computed quickly for
any i and any (positive or negative) integer power k. A representation taking
(1+\epsilon) n lg n + O(1) bits suffices to compute arbitrary powers in
constant time, for any positive constant \epsilon <= 1. A representation taking
the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers
in O(lg n / lg lg n) time.
We then consider the more general problem of succinctly representing an
arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed
quickly for any i and any integer power k. We give a representation that takes
(1+\epsilon) n lg n + O(1) bits, for any positive constant \epsilon <= 1, and
computes arbitrary positive powers in constant time. It can also be used to
compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time.
We place emphasis on the redundancy, or the space beyond the
information-theoretic lower bound that the data structure uses in order to
support operations efficiently. A number of lower bounds have recently been
shown on the redundancy of data structures. These lower bounds confirm the
space-time optimality of some of our solutions. Furthermore, the redundancy of
one of our structures "surpasses" a recent lower bound by Golynski [Golynski,
SODA 2009], thus demonstrating the limitations of this lower bound.
Comment: Preliminary versions of these results have appeared in the
Proceedings of ICALP 2003 and 2004. However, all results in this version are
improved over the earlier conference version
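As an illustration of the operation described in this abstract, the following sketch computes \pi^k(i) for any integer k by precomputing the cycle decomposition of the permutation. It is a plain, non-succinct baseline that stores three auxiliary arrays, not the space-efficient representation of the paper; the function names `build_cycle_index` and `power` are hypothetical.

```python
def build_cycle_index(perm):
    """Decompose a permutation on {0,...,n-1} into cycles.

    Returns (cycles, cycle_of, pos_in), where cycle_of[x] is the index of the
    cycle containing x and pos_in[x] is the offset of x within that cycle.
    """
    n = len(perm)
    cycle_of = [-1] * n
    pos_in = [0] * n
    cycles = []
    for start in range(n):
        if cycle_of[start] != -1:
            continue  # already placed in an earlier cycle
        cycle = []
        x = start
        while cycle_of[x] == -1:
            cycle_of[x] = len(cycles)
            pos_in[x] = len(cycle)
            cycle.append(x)
            x = perm[x]
        cycles.append(cycle)
    return cycles, cycle_of, pos_in


def power(index, i, k):
    """Return pi^k(i) in constant time; k may be negative."""
    cycles, cycle_of, pos_in = index
    cycle = cycles[cycle_of[i]]
    # Python's % always yields a non-negative result, so negative k walks backwards.
    return cycle[(pos_in[i] + k) % len(cycle)]


perm = [2, 0, 1, 4, 3]  # pi(0)=2, pi(1)=0, pi(2)=1, pi(3)=4, pi(4)=3
index = build_cycle_index(perm)
print(power(index, 0, 2))   # pi(pi(0)) = pi(2) = 1
print(power(index, 0, -1))  # the j with pi(j) = 0, which is 1
```

Moving k steps along a cycle reduces to modular arithmetic on the stored offset, which is why arbitrary powers cost constant time once the cycles are known. The difficulty the paper addresses is achieving this without the extra arrays this baseline uses.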
Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space
Indexing highly repetitive texts - such as genomic databases, software
repositories and versioned text collections - has become an important problem
since the turn of the millennium. A relevant compressibility measure for
repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms
(BWTs). One of the earliest indexes for repetitive collections, the Run-Length
FM-index, used O(r) space and was able to efficiently count the number of
occurrences of a pattern of length m in the text (in loglogarithmic time per
pattern symbol, with current techniques). However, it was unable to locate the
positions of those occurrences efficiently within a space bounded in terms of
r. In this paper we close this long-standing problem, showing how to extend the
Run-Length FM-index so that it can locate the occ occurrences efficiently
within O(r) space (in loglogarithmic time each), and reaching optimal time, O(m
+ occ), within O(r log log_w({\sigma} + n/r)) space, for a text of length n
over an alphabet of size {\sigma} on a RAM machine with words of w =
{\Omega}(log n) bits. Within that space, our index can also count in optimal
time, O(m). Multiplying the space by O(w/ log {\sigma}), we support count and
locate in O(\lceil m log({\sigma})/w \rceil) and O(\lceil m log({\sigma})/w \rceil + occ) time, which
is optimal in the packed setting and had not been obtained before in compressed
space. We also describe a structure using O(r log(n/r)) space that replaces the
text and extracts any text substring of length \ell in almost-optimal time
O(log(n/r) + \ell log({\sigma})/w). Within that space, we similarly provide direct
access to suffix array, inverse suffix array, and longest common prefix array
cells, and extend these capabilities to full suffix tree functionality,
typically in O(log(n/r)) time per operation.
Comment: submitted version; optimal count and locate in smaller space: O(r log
log_w(n/r + sigma))
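The measure r used throughout this abstract can be illustrated with a small sketch that builds the Burrows-Wheeler Transform by sorting all rotations and then counts its runs of equal symbols. This is a naive construction for tiny inputs only, not the index described in the paper; the function names and the `$` sentinel are assumptions made for the example.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # '$' sorts before letters in ASCII, so it acts as the smallest symbol.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s):
    """Number of maximal runs of equal consecutive symbols (the measure r)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


transformed = bwt("banana")
print(transformed)              # annb$aa
print(count_runs(transformed))  # 5

# A highly repetitive text yields far fewer runs than its length.
repetitive = bwt("ab" * 50)
print(len(repetitive), count_runs(repetitive))
```

Sorting rotations groups symbols that precede similar contexts, so repeated substrings collapse into long runs in the transform. That is why r stays small for repetitive collections and can serve as a space bound.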