Existence of an infinite particle limit of stochastic ranking process
We study a stochastic particle system which models the time evolution of the
ranking of books by online bookstores (e.g., Amazon). In this system, particles
are lined up in a queue. Each particle jumps to the top of the queue at random
jump times and otherwise stays in the queue, being pushed one position toward
the tail every time another particle jumps to the top. In the infinite particle
limit, the random motion of each particle between its jumps converges to a
deterministic trajectory. (This trajectory is actually observed in the ranking
data on web sites.) We prove that the (random) empirical distribution of this
particle system converges to a deterministic space-time dependent distribution.
A core of the proof is a law of large numbers for {\it dependent} random
variables.
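The move-to-front dynamics described above can be sketched in a short simulation; the function name, uniform jump rates, and time horizon below are our own illustration, not the paper's notation:

```python
import random

def simulate_ranking(n=1000, rates=None, t_max=5.0, seed=0):
    """Simulate the stochastic ranking (move-to-front) process.

    Each of n particles jumps to the top of the queue at the events of its
    own Poisson clock; between its jumps it is pushed one slot toward the
    tail every time another particle jumps to the top.
    Illustrative sketch only; parameters are our own, not the paper's.
    """
    rng = random.Random(seed)
    rates = rates or [1.0] * n
    queue = list(range(n))           # queue[0] is the top of the ranking
    total = sum(rates)
    t = 0.0
    while True:
        t += rng.expovariate(total)  # time of the next jump in the system
        if t > t_max:
            break
        # the jumping particle is chosen proportionally to its jump rate
        i = rng.choices(range(n), weights=rates)[0]
        queue.remove(i)
        queue.insert(0, i)           # move to front; the rest shift back
    return queue
```

Tracking the normalized position `queue.index(i) / n` of a tagged particle between its jumps is what converges, as n grows, to the deterministic trajectory the abstract describes.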
Seeing the sky through Hubble's eye: The COSMOS SkyWalker
Large, high-resolution space-based imaging surveys produce a volume of data
that is difficult to present to the public in a comprehensible way. While
megapixel-sized images can still be printed out or downloaded via the World
Wide Web, this is no longer feasible for images with 10^9 pixels (e.g., the
Hubble Space Telescope Advanced Camera for Surveys [ACS] images of the Galaxy
Evolution from Morphology and SEDs [GEMS] project) or even 10^10 pixels (for
the ACS Cosmic Evolution Survey [COSMOS]). We present a Web-based utility
called the COSMOS SkyWalker that allows viewing of the huge ACS image data set,
even through slow Internet connections. Using standard HTML and JavaScript, the
application loads, at any one time, only those portions of the image that are
currently visible on the screen. The user can move within the image by
using the mouse or interacting with an overview image. Using an astrometrically
registered image for the COSMOS SkyWalker allows the display of calibrated
world coordinates for use in science. The SkyWalker "technique" can be applied
to other data sets. This requires some customization, notably the slicing up of
a data set into small (e.g., 256^2 pixel) subimages. An advantage of the
SkyWalker is the use of standard Web browser components; thus, it requires no
software installation and can therefore be viewed by anyone across many
operating systems.
Comment: 4 pages, 2 figures, accepted for publication in PAS
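The slicing scheme can be illustrated with a few lines of tile arithmetic; the 256-pixel tile size comes from the abstract, while the function and the tile-naming convention below are our own hypothetical example:

```python
TILE = 256  # subimage edge length in pixels, as in the abstract

def visible_tiles(x, y, width, height):
    """Return (col, row) indices of the 256^2-pixel subimages that
    intersect a viewport whose top-left corner is at pixel (x, y).

    A viewer then requests only these tiles, e.g. as 'tile_{col}_{row}.jpg'
    (the naming scheme here is our own illustration, not the SkyWalker's).
    """
    c0, r0 = x // TILE, y // TILE
    c1 = (x + width - 1) // TILE
    r1 = (y + height - 1) // TILE
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

Because the number of fetched tiles depends only on the viewport size, not on the 10^9-10^10 pixels of the full mosaic, the scheme stays usable over slow connections.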
Multi-Paradigm Reasoning for Access to Heterogeneous GIS
Accessing and querying geographical data in a uniform way has become easier in recent years. Emerging standards like WFS turn
the Web into a place enabled for geospatial web services. Mediation
architectures like VirGIS overcome syntactic and semantic heterogeneity
between several distributed sources. On mobile devices,
however, this kind of solution is not suitable, due to limitations,
mostly regarding bandwidth, computation power, and available storage
space. The aim of this paper is to present a solution for providing
powerful reasoning mechanisms accessible from mobile applications
and involving data from several heterogeneous sources.
By adapting contents to time and location, mobile web information
systems can not only increase the value and suitability of the
service itself, but can substantially reduce the amount of data delivered
to users. Because many problems pertain to infrastructures
and transportation in general and to wayfinding in particular, one
cornerstone of the architecture is higher-level reasoning on graph
networks with the Multi-Paradigm Location Language MPLL. A
mediation architecture is used as a "graph provider" in order to
transfer the load of computation to the best-suited component,
since graph construction and transformation, for example, are heavy on
resources. Reasoning in general can be conducted either near the
"source" or near the end user, depending on the specific use case.
The concepts underlying the proposal described in this paper are
illustrated by a typical and concrete scenario for web applications.
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Optimization is commonly employed to determine the content of web pages, such
as to maximize conversions on landing pages or click-through rates on search
engine result pages. Often the layout of these pages can be decoupled into
several separate decisions. For example, the composition of a landing page may
involve deciding which image to show, which wording to use, what color
background to display, etc. Such optimization is a combinatorial problem over
an exponentially large decision space. Randomized experiments do not scale well
to this setting, and therefore, in practice, one is typically limited to
optimizing a single aspect of a web page at a time. This represents a missed
opportunity in both the speed of experimentation and the exploitation of
possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We
formulate an approach where the possible interactions between different
components of the page are modeled explicitly. We apply bandit methodology to
explore the layout space efficiently and use hill-climbing to select optimal
content in realtime. Our algorithm also extends to contextualization and
personalization of layout selection. Simulation results show the suitability of
our approach to large decision spaces with strong interactions between content.
We further apply our algorithm to optimize a message that promotes adoption of
an Amazon service. After only a single week of online optimization, we saw a
21% conversion increase compared to the median layout. Our technique is
currently being deployed to optimize content across several locations at
Amazon.com.
Comment: KDD'17 Audience Appreciation Award
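The combination of an explicit interaction model with hill-climbing can be sketched as greedy coordinate ascent over layout components; the function below is a toy illustration under our own assumptions, not the deployed algorithm:

```python
import random

def hill_climb_layout(options, score, sweeps=3, seed=0):
    """Greedy coordinate ascent over a combinatorial layout space.

    options: dict mapping a component to its possible choices,
             e.g. {'image': [...], 'wording': [...], 'color': [...]}.
    score:   callable layout-dict -> float; in the paper's setting this
             would be the bandit model's estimated reward, including
             explicit pairwise-interaction terms.
    Toy sketch: the actual model, schedule, and exploration are omitted.
    """
    rng = random.Random(seed)
    layout = {k: rng.choice(v) for k, v in options.items()}
    for _ in range(sweeps):
        for comp in options:
            # improve one component at a time, holding the others fixed
            layout[comp] = max(options[comp],
                               key=lambda c: score({**layout, comp: c}))
    return layout
```

The point of the coordinate-wise search is that it touches only a linear number of layouts per sweep, rather than the exponentially many joint combinations.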
Search in the Universe of Big Networks and Data
Searching in the Internet for some object characterised by its attributes in
the form of data, such as a hotel in a certain city whose price is less than
something, is one of our most common activities when we access the Web. We
discuss this problem in a general setting, and compute the average amount of
time and the energy it takes to find an object in an infinitely large search
space. We consider the use of N search agents acting concurrently, covering
both the case where a search agent knows which way it needs to go to find the
object and the case where it is perfectly ignorant and may even head
away from the object being sought. We show that under mild conditions regarding
the randomness of the search and the use of a time-out, the search agent will
always find the object despite the fact that the search space is infinite. We
obtain a formula for the average search time and the average energy expended by
N search agents acting concurrently and independently of each other. We see
that the time-out itself can be used to minimise the search time and the amount
of energy that is consumed to find an object. An approximate formula is derived
for the number of search agents that can help us guarantee that an object is
found in a given time, and we discuss how the competition between search agents
and other agents that try to hide the data object, can be used by opposing
parties to guarantee their own success.
Comment: IEEE Network Magazine - Special Issue on Networking for Big Data, July-August 201
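The role of the time-out can be illustrated with a toy one-dimensional model of the "ignorant" searcher; the model and all parameters below are our own, not the paper's:

```python
import random

def search_time(d=5, timeout=50, trials=2000, seed=0):
    """Average number of steps for an 'ignorant' searcher (a symmetric
    random walk from the origin) to reach an object at distance d,
    restarting at the origin whenever `timeout` steps elapse without
    success. Toy 1-D illustration of the paper's setting.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        steps, pos, clock = 0, 0, 0
        while pos != d:
            pos += rng.choice((-1, 1))
            steps += 1
            clock += 1
            if clock >= timeout and pos != d:
                pos, clock = 0, 0   # time-out fired: restart the search
        total += steps
    return total / trials
```

Without the time-out, the symmetric walk reaches the object with probability 1 but with infinite expected time; the restart makes the mean finite, and tuning `timeout` trades restart overhead against long unproductive excursions, which is the minimisation the abstract refers to.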
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
searchers. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times on the order of 1
microsecond were reported for one mismatch for a dictionary size of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting of q-gram substitution can significantly reduce the
index size (up to 50% of the input text size for DNA), while still keeping
the query time relatively low.
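The Dirichlet (pigeonhole) principle behind the split index can be sketched as follows: a word matching the query with at most one mismatch must agree with it exactly on at least one of its two halves, so indexing every word under both halves reduces approximate matching to exact bucket lookups. A minimal Python sketch of the idea (the paper's actual index is a compact, cache-friendly C++ structure):

```python
from collections import defaultdict

def build_split_index(words):
    """Index each word under both of its halves: by the pigeonhole
    principle, a Hamming-distance-1 match must agree exactly on at
    least one half."""
    index = defaultdict(list)
    for w in words:
        h = len(w) // 2
        index[('L', h, w[:h])].append(w)   # bucket by the left half
        index[('R', h, w[h:])].append(w)   # bucket by the right half
    return index

def hamming1(a, b):
    """True iff a and b have equal length and at most one mismatch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1

def query(index, q):
    """Return dictionary words within Hamming distance 1 of q."""
    h = len(q) // 2
    cands = index.get(('L', h, q[:h]), []) + index.get(('R', h, q[h:]), [])
    return sorted({w for w in cands if hamming1(q, w)})
```

Each query inspects only two buckets instead of the whole dictionary, which is where the favorable space-time tradeoff comes from.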
Efficient and Effective Query Auto-Completion
Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search
systems, suggesting possible ways of completing the query being typed by the
user. Efficiency is crucial to give the system real-time responsiveness
when operating in the million-scale search space. Prior work has extensively
advocated the use of a trie data structure for fast prefix-search operations in
compact space. However, searching by prefix has little discovery power in that
only completions that are prefixed by the query are returned. This may
negatively impact the effectiveness of the QAC system, with a consequent
monetary loss for real applications like Web Search Engines and eCommerce. In
this work we describe the implementation that empowers a new QAC system at
eBay, and discuss its efficiency and effectiveness in relation to other
state-of-the-art approaches. The solution is based on the combination of an inverted index
with succinct data structures, a much less explored direction in the
literature. This system is replacing the previous implementation based on
Apache SOLR that was not always able to meet the required service-level
agreement.
Comment: Published in SIGIR 202
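The inverted-index direction can be contrasted with prefix-only trie search in a toy sketch: indexing each completion under every term it contains retrieves completions that merely *contain* a word starting with the typed prefix. This shows only the general idea; the system described above additionally uses succinct data structures and scoring that the sketch omits:

```python
from collections import defaultdict

def build_qac_index(completions):
    """Inverted index from each word to the completions containing it.

    Unlike a prefix trie, this can retrieve completions where *any* term
    starts with the typed prefix, not only whole-string prefix matches.
    Toy illustration of the direction, not the production system.
    """
    index = defaultdict(set)
    for i, c in enumerate(completions):
        for word in c.split():
            index[word].add(i)
    return index

def suggest(index, completions, prefix, k=5):
    """Return up to k completions having a term prefixed by `prefix`."""
    hits = set()
    for word, ids in index.items():
        if word.startswith(prefix):
            hits |= ids
    return sorted(completions[i] for i in hits)[:k]
```

For example, the prefix "ipad" would recover "apple ipad case", which a prefix trie could never suggest; this is the extra discovery power the abstract contrasts with trie-based QAC.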