35,725 research outputs found

    gMark: Schema-Driven Generation of Graphs and Queries

    Full text link
    Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.Comment: Accepted in November 2016. URL: http://ieeexplore.ieee.org/document/7762945/. in IEEE Transactions on Knowledge and Data Engineering 201

    Basis Token Consistency: A Practical Mechanism for Strong Web Cache Consistency

    Full text link
    With web caching and cache-related services like CDNs and edge services playing an increasingly significant role in the modern internet, the problem of the weak consistency and coherence provisions in current web protocols is becoming increasingly significant and drawing the attention of the standards community [LCD01]. Toward this end, we present definitions of consistency and coherence for web-like environments, that is, distributed client-server information systems where the semantics of interactions with resource are more general than the read/write operations found in memory hierarchies and distributed file systems. We then present a brief review of proposed mechanisms which strengthen the consistency of caches in the web, focusing upon their conceptual contributions and their weaknesses in real-world practice. These insights motivate a new mechanism, which we call "Basis Token Consistency" or BTC; when implemented at the server, this mechanism allows any client (independent of the presence and conformity of any intermediaries) to maintain a self-consistent view of the server's state. This is accomplished by annotating responses with additional per-resource application information which allows client caches to recognize the obsolescence of currently cached entities and identify responses from other caches which are already stale in light of what has already been seen. The mechanism requires no deviation from the existing client-server communication model, and does not require servers to maintain any additional per-client state. We discuss how our mechanism could be integrated into a fragment-assembling Content Management System (CMS), and present a simulation-driven performance comparison between the BTC algorithm and the use of the Time-To-Live (TTL) heuristic.National Science Foundation (ANI-9986397, ANI-0095988

    The Pan-STARRS Moving Object Processing System

    Full text link
    We describe the Pan-STARRS Moving Object Processing System (MOPS), a modern software package that produces automatic asteroid discoveries and identifications from catalogs of transient detections from next-generation astronomical survey telescopes. MOPS achieves > 99.5% efficiency in producing orbits from a synthetic but realistic population of asteroids whose measurements were simulated for a Pan-STARRS4-class telescope. Additionally, using a non-physical grid population, we demonstrate that MOPS can detect populations of currently unknown objects such as interstellar asteroids. MOPS has been adapted successfully to the prototype Pan-STARRS1 telescope despite differences in expected false detection rates, fill-factor loss and relatively sparse observing cadence compared to a hypothetical Pan-STARRS4 telescope and survey. MOPS remains >99.5% efficient at detecting objects on a single night but drops to 80% efficiency at producing orbits for objects detected on multiple nights. This loss is primarily due to configurable MOPS processing limits that are not yet tuned for the Pan-STARRS1 mission. The core MOPS software package is the product of more than 15 person-years of software development and incorporates countless additional years of effort in third-party software to perform lower-level functions such as spatial searching or orbit determination. We describe the high-level design of MOPS and essential subcomponents, the suitability of MOPS for other survey programs, and suggest a road map for future MOPS development.Comment: 57 Pages, 26 Figures, 13 Table

    Performance Testing of Distributed Component Architectures

    Get PDF
    Performance characteristics, such as response time, throughput andscalability, are key quality attributes of distributed applications. Current practice,however, rarely applies systematic techniques to evaluate performance characteristics.We argue that evaluation of performance is particularly crucial in early developmentstages, when important architectural choices are made. At first glance, thiscontradicts the use of testing techniques, which are usually applied towards the endof a project. In this chapter, we assume that many distributed systems are builtwith middleware technologies, such as the Java 2 Enterprise Edition (J2EE) or theCommon Object Request Broker Architecture (CORBA). These provide servicesand facilities whose implementations are available when architectures are defined.We also note that it is the middleware functionality, such as transaction and persistenceservices, remote communication primitives and threading policy primitives,that dominates distributed system performance. Drawing on these observations, thischapter presents a novel approach to performance testing of distributed applications.We propose to derive application-specific test cases from architecture designs so thatthe performance of a distributed application can be tested based on the middlewaresoftware at early stages of a development process. We report empirical results thatsupport the viability of the approach

    QueRIE: Collaborative Database Exploration

    Get PDF
    Interactive database exploration is a key task in information mining. However, users who lack SQL expertise or familiarity with the database schema face great difficulties in performing this task. To aid these users, we developed the QueRIE system for personalized query recommendations. QueRIE continuously monitors the user’s querying behavior and finds matching patterns in the system’s query log, in an attempt to identify previous users with similar information needs. Subsequently, QueRIE uses these “similar” users and their queries to recommend queries that the current user may find interesting. In this work we describe an instantiation of the QueRIE framework, where the active user’s session is represented by a set of query fragments. The recorded fragments are used to identify similar query fragments in the previously recorded sessions, which are in turn assembled in potentially interesting queries for the active user. We show through experimentation that the proposed method generates meaningful recommendations on real-life traces from the SkyServer database and propose a scalable design that enables the incremental update of similarities, making real-time computations on large amounts of data feasible. Finally, we compare this fragment-based instantiation with our previously proposed tuple-based instantiation discussing the advantages and disadvantages of each approach

    LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning

    Full text link
    We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to design accurate algorithms or training models for crowded scene understanding. Our overall approach is composed of two components: a procedural simulation framework for generating crowd movements and behaviors, and a procedural rendering framework to generate different videos or images. Each video or image is automatically labeled based on the environment, number of pedestrians, density, behavior, flow, lighting conditions, viewpoint, noise, etc. Furthermore, we can increase the realism by combining synthetically-generated behaviors with real-world background videos. We demonstrate the benefits of LCrowdV over prior lableled crowd datasets by improving the accuracy of pedestrian detection and crowd behavior classification algorithms. LCrowdV would be released on the WWW
    corecore