
    Existence of an infinite particle limit of stochastic ranking process

    We study a stochastic particle system that models the time evolution of the ranking of books by online bookstores (e.g., Amazon). In this system, particles are lined up in a queue. Each particle jumps at random times to the top of the queue and otherwise stays in the queue, being pushed toward the tail every time another particle jumps to the top. In the infinite particle limit, the random motion of each particle between its jumps converges to a deterministic trajectory (this trajectory is actually observed in the ranking data on web sites). We prove that the (random) empirical distribution of this particle system converges to a deterministic space-time dependent distribution. A core of the proof is the law of large numbers for dependent random variables.
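
To make the finite-particle dynamics concrete, here is a minimal simulation sketch. It assumes each particle jumps at the times of an independent Poisson process with its own rate; that modelling choice, the function name `simulate_ranking`, and its parameters are hypothetical illustrations, not the paper's construction, and the code says nothing about the infinite particle limit itself.

```python
import random

def simulate_ranking(rates, t_max, seed=0):
    """Simulate a finite move-to-front ranking process up to time t_max.

    Hypothetical helper for illustration. Particle i jumps at the times of
    an independent Poisson process with rate rates[i] (assumed model). A
    jumping particle moves to the top (index 0), and every particle that was
    ahead of it is pushed one place toward the tail.
    Returns the final queue as a list of particle ids, top first.
    """
    rng = random.Random(seed)
    queue = list(range(len(rates)))  # initial order: particle 0 at the top
    total = sum(rates)
    if total <= 0:
        return queue                 # nobody ever jumps
    ids = range(len(rates))
    t = 0.0
    while True:
        # Superposed Poisson processes: the next jump of any particle
        # arrives after an Exp(total) waiting time.
        t += rng.expovariate(total)
        if t > t_max:
            return queue
        # The jumper is particle i with probability rates[i] / total.
        i = rng.choices(ids, weights=rates)[0]
        queue.remove(i)
        queue.insert(0, i)
```

Particles with high rates tend to stay near the top, while a particle that has not jumped for a while is pushed steadily toward the tail; this between-jump motion is what the abstract says becomes deterministic in the limit.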

    Seeing the sky through Hubble's eye: The COSMOS SkyWalker

    Large, high-resolution space-based imaging surveys produce a volume of data that is difficult to present to the public in a comprehensible way. While megapixel-sized images can still be printed out or downloaded via the World Wide Web, this is no longer feasible for images with 10^9 pixels (e.g., the Hubble Space Telescope Advanced Camera for Surveys [ACS] images of the Galaxy Evolution from Morphology and SEDs [GEMS] project) or even 10^10 pixels (for the ACS Cosmic Evolution Survey [COSMOS]). We present a Web-based utility called the COSMOS SkyWalker that allows viewing of the huge ACS image data set, even through slow Internet connections. Using standard HTML and JavaScript, the application successively loads only those portions of the image that are currently being viewed on the screen. The user can move within the image by using the mouse or by interacting with an overview image. Because the COSMOS SkyWalker uses an astrometrically registered image, it can display calibrated world coordinates for use in science. The SkyWalker "technique" can be applied to other data sets. This requires some customization, notably slicing a data set into small (e.g., 256 × 256 pixel) subimages. An advantage of the SkyWalker is its use of standard Web browser components; thus, it requires no software installation and can be viewed by anyone across many operating systems.
    Comment: 4 pages, 2 figures, accepted for publication in PAS
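
The tiling idea can be sketched as a function that decides which pre-sliced subimages a viewer must fetch for the current viewport. This is an illustrative assumption written in Python rather than the SkyWalker's JavaScript; the function name, the tile-numbering scheme, and the clipping behaviour are hypothetical.

```python
def visible_tiles(view_x, view_y, view_w, view_h, image_w, image_h, tile=256):
    """Return (col, row) indices of the tiles that overlap a viewport.

    Hypothetical helper. The image is assumed to be pre-sliced into
    tile x tile subimages, where tile (col, row) covers pixels
    [col*tile, (col+1)*tile) in x and [row*tile, (row+1)*tile) in y.
    Only these tiles need to be downloaded for the current view.
    """
    # Clip the viewport to the image bounds (upper bounds are exclusive).
    x0 = max(0, view_x)
    y0 = max(0, view_y)
    x1 = min(image_w, view_x + view_w)
    y1 = min(image_h, view_y + view_h)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the image
    first_col, last_col = x0 // tile, (x1 - 1) // tile
    first_row, last_row = y0 // tile, (y1 - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

With 256 × 256 tiles, a 1024 × 1024 viewport touches at most 5 × 5 = 25 tiles no matter how large the full image is, which is why the approach stays usable over slow connections.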

    Multi-Paradigm Reasoning for Access to Heterogeneous GIS

    Accessing and querying geographical data in a uniform way has become easier in recent years. Emerging standards such as WFS are turning the web into a place where geospatial web services are readily available. Mediation architectures such as VirGIS overcome syntactic and semantic heterogeneity between several distributed sources. On mobile devices, however, this kind of solution is not suitable because of limitations, mostly in bandwidth, computation power, and available storage space. The aim of this paper is to present a solution that provides powerful reasoning mechanisms accessible from mobile applications and involving data from several heterogeneous sources. By adapting content to time and location, mobile web information systems can not only increase the value and suitability of the service itself but also substantially reduce the amount of data delivered to users. Because many problems pertain to infrastructure and transportation in general and to way finding in particular, one cornerstone of the architecture is higher-level reasoning on graph networks with the Multi-Paradigm Location Language MPLL. A mediation architecture is used as a “graph provider” in order to transfer the computational load to the best-suited component; graph construction and transformation, for example, are resource-intensive. Reasoning in general can be conducted either near the “source” or near the end user, depending on the specific use case. The concepts underlying the proposal described in this paper are illustrated by a typical, concrete scenario for web applications.

    An Efficient Bandit Algorithm for Realtime Multivariate Optimization

    Optimization is commonly employed to determine the content of web pages, for example to maximize conversions on landing pages or click-through rates on search engine result pages. Often the layout of these pages can be decoupled into several separate decisions. For example, the composition of a landing page may involve deciding which image to show, which wording to use, what background color to display, etc. Such optimization is a combinatorial problem over an exponentially large decision space. Randomized experiments do not scale well to this setting, and therefore, in practice, one is typically limited to optimizing a single aspect of a web page at a time. This represents a missed opportunity in both the speed of experimentation and the exploitation of possible interactions between layout decisions. Here we focus on multivariate optimization of interactive web pages. We formulate an approach in which the possible interactions between different components of the page are modeled explicitly. We apply bandit methodology to explore the layout space efficiently and use hill-climbing to select optimal content in realtime. Our algorithm also extends to contextualization and personalization of layout selection. Simulation results show the suitability of our approach to large decision spaces with strong interactions between content. We further apply our algorithm to optimize a message that promotes adoption of an Amazon service. After only a single week of online optimization, we saw a 21% conversion increase compared to the median layout. Our technique is currently being deployed to optimize content across several locations at Amazon.com.
    Comment: KDD'17 Audience Appreciation Awar
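
The hill-climbing step can be illustrated with a short sketch: a score built from per-slot main effects plus explicit pairwise interaction terms, and a coordinate-wise hill climber that changes one slot at a time. Both functions and their parameters are hypothetical; in a bandit setting the weights would come from the current model (for instance, parameters sampled from a posterior), which is an assumption here and not necessarily the authors' exact algorithm.

```python
import random

def linear_interaction_score(main, pairwise):
    """Hypothetical score with main effects and pairwise interactions.

    score(layout) = sum_k main[k][v_k] + sum_{(k,l)} pairwise[(k,l)][v_k][v_l]
    """
    def score(layout):
        s = sum(main[k][v] for k, v in enumerate(layout))
        s += sum(w[layout[k]][layout[l]] for (k, l), w in pairwise.items())
        return s
    return score

def hill_climb(num_options, score, restarts=5, seed=0):
    """Approximately maximize score(layout) by coordinate-wise hill climbing.

    num_options[k] is the number of choices for slot k (e.g. image, wording,
    background color); a layout is a tuple with one choice per slot. Each
    pass sets one slot at a time to its best value given the others and
    stops when no single-slot change improves the score. Random restarts
    reduce the risk of ending in a poor local optimum.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        layout = [rng.randrange(n) for n in num_options]
        current = score(tuple(layout))
        improved = True
        while improved:
            improved = False
            for k, n in enumerate(num_options):
                for v in range(n):
                    candidate = layout.copy()
                    candidate[k] = v
                    s = score(tuple(candidate))
                    if s > current:  # strict improvement guarantees termination
                        layout, current, improved = candidate, s, True
        if current > best_score:
            best, best_score = tuple(layout), current
    return best, best_score
```

Each evaluation touches only one slot change, so a pass costs the sum of the option counts rather than their product, which is what makes the search feasible over an exponentially large layout space.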

    Search in the Universe of Big Networks and Data

    Searching the Internet for an object characterised by its attributes in the form of data, such as a hotel in a certain city whose price is below some limit, is one of our most common activities when we access the Web. We discuss this problem in a general setting and compute the average time and energy it takes to find an object in an infinitely large search space. We consider the use of N search agents that act concurrently, covering both the case where the search agent knows which way it needs to go to find the object and the case where the search agent is perfectly ignorant and may even head away from the object being sought. We show that under mild conditions regarding the randomness of the search and the use of a time-out, the search agent will always find the object despite the fact that the search space is infinite. We obtain a formula for the average search time and the average energy expended by N search agents acting concurrently and independently of each other. We see that the time-out itself can be used to minimise the search time and the amount of energy consumed to find an object. An approximate formula is derived for the number of search agents needed to guarantee that an object is found in a given time, and we discuss how the competition between search agents and other agents that try to hide the data object can be used by opposing parties to guarantee their own success.
    Comment: IEEE Network Magazine - Special Issue on Networking for Big Data, July-August 201
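
The role of the time-out can be explored with a Monte Carlo sketch. It assumes a concrete toy model that the abstract does not specify: each agent does a biased random walk on the integers from 0 toward an object at a fixed distance, is restarted at 0 after `timeout` steps without success, and all agents move in parallel. The function names and parameters are hypothetical, and the sketch estimates averages by simulation rather than reproducing the paper's formulas.

```python
import random

def search_time(n_agents, distance, p_toward, timeout, rng):
    """Time until the first of n_agents reaches an object `distance` steps away.

    Hypothetical toy model: each agent starts at 0 and steps toward the
    object with probability p_toward, otherwise away from it. An agent that
    has not found the object within `timeout` steps restarts at 0.
    """
    if distance < 1 or timeout < distance or p_toward <= 0 or n_agents < 1:
        raise ValueError("success must be possible within one time-out period")
    positions = [0] * n_agents
    elapsed = [0] * n_agents  # steps since each agent's last restart
    t = 0
    while True:
        t += 1
        for a in range(n_agents):
            positions[a] += 1 if rng.random() < p_toward else -1
            elapsed[a] += 1
            if positions[a] == distance:
                return t
            if elapsed[a] == timeout:  # time-out: restart from the origin
                positions[a], elapsed[a] = 0, 0

def mean_search_time(n_agents, distance, p_toward, timeout, trials=2000, seed=0):
    """Monte Carlo estimate of the average search time."""
    rng = random.Random(seed)
    return sum(search_time(n_agents, distance, p_toward, timeout, rng)
               for _ in range(trials)) / trials
```

Without a time-out, a walker with p_toward < 1/2 reaches a target at distance D with probability only (p/(1-p))^D < 1 (the gambler's-ruin result), so it may never find the object; with restarts, every period has a positive chance of success, so the search ends with probability 1.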

    A practical index for approximate dictionary matching with few mismatches

    Approximate dictionary matching is a classic string matching problem (checking whether a query string occurs in a collection of strings) with applications in, e.g., spellchecking, online catalogs, geolocation, and web search. We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle (also known as the pigeonhole principle), for matching a keyword with few mismatches, and experimentally show that it offers competitive space-time tradeoffs. Our C++ implementation focuses mostly on data compaction, which is beneficial for search speed (e.g., by being cache friendly). We compare our solution with other algorithms and show that it performs better for the Hamming distance. Query times on the order of 1 microsecond were reported for one mismatch for a dictionary size of a few megabytes on a mid-range PC. We also demonstrate that a basic compression technique consisting of q-gram substitution can significantly reduce the index size (up to 50% of the input text size for DNA), while still keeping the query time relatively low.
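
The pigeonhole idea behind a split index can be shown in a few lines: if a word and a query of equal length differ in at most k positions, then splitting both into k + 1 parts leaves at least one part that matches exactly. The sketch below indexes every part of every word, collects candidates through exact part lookups, and verifies them by Hamming distance. The class and helper names are hypothetical, and it ignores the data compaction and q-gram compression that the abstract describes for the authors' C++ implementation.

```python
from collections import defaultdict

def split_points(m, k):
    """Boundaries that cut a length-m string into k + 1 nearly equal parts."""
    return [m * i // (k + 1) for i in range(k + 2)]

class SplitIndex:
    """Hypothetical dictionary index for matching with at most k mismatches."""

    def __init__(self, words, k):
        self.k = k
        self.words = list(words)
        # (word length, part number, part text) -> ids of words containing it
        self.table = defaultdict(set)
        for wid, w in enumerate(self.words):
            b = split_points(len(w), k)
            for j in range(k + 1):
                self.table[(len(w), j, w[b[j]:b[j + 1]])].add(wid)

    def query(self, q):
        """Return all dictionary words within Hamming distance k of q, sorted."""
        b = split_points(len(q), self.k)
        candidates = set()
        for j in range(self.k + 1):
            # Pigeonhole: a true match agrees with q exactly on some part j.
            candidates |= self.table.get((len(q), j, q[b[j]:b[j + 1]]), set())
        # Verify candidates; an exact part match alone can be a false positive.
        return sorted(self.words[i] for i in candidates
                      if sum(x != y for x, y in zip(self.words[i], q)) <= self.k)
```

For example, `SplitIndex(["cart", "card", "care", "dart", "word"], k=1).query("cart")` finds "card" and "care" through the first half "ca" and "dart" through the second half "rt", while "word" is never even considered.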

    Efficient and Effective Query Auto-Completion

    Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for the system to respond in real time when operating over a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions that are prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications like Web Search Engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency/effectiveness in relation to other state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation based on Apache Solr, which was not always able to meet the required service-level agreement.
    Comment: Published in SIGIR 202
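
The discovery-power argument can be illustrated with a toy inverted-index completer: fully typed terms must appear somewhere in a completion, and the last, partially typed term only needs to prefix some term of it, so completions that do not start with the query can still be returned. This is a hypothetical sketch using plain Python sets and a sorted vocabulary; it does not model the succinct data structures or the ranking used in the eBay system, and it assumes completions are supplied in decreasing order of popularity so that a smaller id means a better completion.

```python
import bisect
from collections import defaultdict

class InvertedQAC:
    """Hypothetical toy query auto-completion over an inverted index."""

    def __init__(self, completions):
        # Assumption: completions are listed from most to least popular.
        self.completions = list(completions)
        self.postings = defaultdict(set)  # term -> ids of completions containing it
        for cid, text in enumerate(self.completions):
            for term in text.split():
                self.postings[term].add(cid)
        self.vocab = sorted(self.postings)  # sorted terms for prefix search

    def _prefix_terms(self, prefix):
        # Terms starting with `prefix` form a contiguous run in sorted order.
        lo = bisect.bisect_left(self.vocab, prefix)
        hi = lo
        while hi < len(self.vocab) and self.vocab[hi].startswith(prefix):
            hi += 1
        return self.vocab[lo:hi]

    def complete(self, query, k=3):
        """Return up to k completions, best (smallest id) first."""
        terms = query.split()
        if not terms:
            return []
        *full_terms, last = terms
        # Completions having some term that starts with the partial last term...
        matches = set()
        for term in self._prefix_terms(last):
            matches |= self.postings[term]
        # ...and containing every fully typed term, in any position.
        for term in full_terms:
            matches &= self.postings.get(term, set())
        return [self.completions[c] for c in sorted(matches)[:k]]
```

With the completions "iphone case", "iphone charger", "samsung phone case", and "case for iphone", the query "ph" returns "samsung phone case", which a pure prefix search over whole completions would miss because that completion starts with "samsung".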