8 research outputs found

    Gunrock: GPU Graph Analytics

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.
    Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"
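The frontier-centric, bulk-synchronous idea in this abstract can be illustrated with a small CPU-side sketch of breadth-first search. This is not Gunrock's API or implementation: the function name `bfs_frontier`, the dict-based graph, and the "expand" and "filter" step labels are assumptions chosen for illustration, and a real GPU version would run each step as a parallel kernel.

```python
def bfs_frontier(adj, source):
    """Level-synchronous BFS driven by a vertex frontier.

    adj: dict mapping each vertex to a list of its out-neighbors
         (a hypothetical stand-in for a GPU graph structure).
    Returns a dict mapping every reachable vertex to its BFS depth.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # Expand step: visit all out-neighbors of the current frontier.
        candidates = [v for u in frontier for v in adj.get(u, [])]
        # Filter step: keep only vertices that were not visited before.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        # The whole frontier is swapped at once, one "superstep" per level.
        frontier = next_frontier
    return depth


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
depths = bfs_frontier(graph, 0)
```

Each loop iteration corresponds to one bulk-synchronous step: all work for the current frontier finishes before the next frontier is used.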

    Shared Arrangements: practical inter-query sharing for streaming dataflows

    Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis

    Exploring Superpage Promotion Policies for Efficient Address Translation

    Address translation performance for modern applications depends heavily upon the number of translation entries cached in the hardware TLB (translation look-aside buffer). Therefore, the efficiency of address translation relies directly on the TLB hit rate. The number of TLB entries continues to fall further behind the growth of memory consumption for modern applications. Superpages, which are pages with larger sizes, can increase the efficiency of the TLB by enabling each translation entry to cover a larger memory region. Without requiring more TLB entries, using superpages can increase the TLB hit rate and benefit address translation. However, using superpages can bring overhead. The TLB uses a single dirty bit to mark a page as dirty during address translation before modifying the page, so the granularity of the dirty bit corresponds to the coverage of the translation entry. As a result, the OS (operating system) will pay extra I/O effort when it allocates or writes an underutilized superpage back to disk. Such extra overhead can easily surpass the address translation benefits of superpages. This thesis discusses the performance trade-offs of superpages by exploring the design space of superpage promotion policies in the OS. A data collection infrastructure is built based on QEMU with kernel instrumentation on FreeBSD to collaboratively collect both memory accesses and kernel events. Then, the TLB behavior of Intel Skylake x86 family processors is simulated. The simulation has been validated to be faithful and consistent with the real-world performance. Last, this thesis evaluates and compares both TLB performance benefits and I/O overheads among the superpage promotion policies to discuss the trade-offs in the design space
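The trade-off described in this abstract can be made concrete with a little arithmetic. The sketch below assumes 4 KiB base pages and 2 MiB superpages, which are common x86-64 sizes; the 64-entry TLB and the helper names `tlb_reach` and `writeback_bytes` are hypothetical and are not taken from the thesis.

```python
BASE_PAGE = 4 * 1024          # 4 KiB base page (assumed)
SUPERPAGE = 2 * 1024 * 1024   # 2 MiB superpage (assumed)


def tlb_reach(entries, page_size):
    """Bytes of memory covered by a TLB with the given number of entries."""
    return entries * page_size


def writeback_bytes(dirty_addresses, page_size):
    """Bytes written back when dirtiness is tracked per mapping.

    One dirty bit covers a whole page, so touching any byte marks the
    entire page (base page or superpage) as needing write-back.
    """
    dirty_pages = {addr // page_size for addr in dirty_addresses}
    return len(dirty_pages) * page_size


entries = 64  # hypothetical TLB size, for illustration only
reach_base = tlb_reach(entries, BASE_PAGE)     # 256 KiB
reach_super = tlb_reach(entries, SUPERPAGE)    # 128 MiB

# Two writes, about 5 KB apart, inside the same 2 MiB region.
writes = [0, 5000]
io_base = writeback_bytes(writes, BASE_PAGE)   # two 4 KiB pages
io_super = writeback_bytes(writes, SUPERPAGE)  # one whole 2 MiB superpage
```

Under these assumptions the superpage multiplies TLB reach by 512 but also inflates the write-back for two small writes from 8 KiB to 2 MiB, which is the tension a promotion policy has to balance.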

    Model-Based Design, Analysis, and Implementations for Power and Energy-Efficient Computing Systems

    Modern computing systems are becoming increasingly complex. On one end of the spectrum, personal computers now commonly support multiple processing cores, and, on the other end, Internet services routinely employ thousands of servers in distributed locations to provide the desired service to their users. In such complex systems, concerns about energy usage and power consumption are increasingly important. Moreover, growing awareness of environmental issues has added to the overall complexity by introducing new variables to the problem. In this regard, the ability to abstractly focus on the relevant details allows model-based design to help significantly in the analysis and solution of such problems. In this dissertation, we explore and analyze model-based design for energy and power considerations in computing systems. Although the presented techniques are more generally applicable, we focus their application on large-scale Internet services operating in U.S. electricity markets. Internet services are becoming increasingly popular in the ICT ecosystem of today. The physical infrastructure to support such services is commonly based on a group of cooperative data centers (DCs) operating in tandem. These DCs are geographically distributed to provide security and timing guarantees for their customers. To provide services to millions of customers, DCs employ hundreds of thousands of servers. These servers consume a large amount of energy that is traditionally produced by burning coal and employing other environmentally hazardous methods, such as nuclear and gas power generation plants. This large energy consumption results in significant and fast-growing financial and environmental costs.
    Consequently, for protection of local and global environments, governing bodies around the globe have begun to introduce legislation to encourage energy consumers, especially corporate entities, to increase the share of renewable energy (green energy) in their total energy consumption. However, in U.S. electricity markets, green energy is usually more expensive than energy generated from traditional sources like coal or petroleum. We model the overall problem in three sub-areas and explore different approaches aimed at reducing the environmental footprint and operating costs of multi-site Internet services, while honoring the Quality of Service (QoS) constraints as contracted in service level agreements (SLAs). Firstly, we model the load distribution among member DCs of a multi-site Internet service. The use of green energy is optimized considering different factors such as (a) geographically and temporally variable electricity prices, (b) the multitude of available energy sources to choose from at each DC, (c) the necessity to support more than one SLA, and (d) the requirement to offer more than one service at each DC. Various approaches are presented for solving this problem, and extensive simulations using Google’s setup in North America are used to evaluate the presented approaches. Secondly, we explore the area of shaving the peaks in the energy demand of large electricity consumers, such as DCs, by using a battery-based energy storage system. The electrical demand of DCs is typically peaky, following the usage cycle of their customers. The resultant peaks in electrical demand require development and maintenance of a costlier energy delivery mechanism, and are often met using expensive gas or diesel generators, which often have a higher environmental impact. To shave the peak power demand, a battery can be used that is charged during low load and discharged during peak loads.
    Since batteries are costly, we present a scheme to estimate the size of battery required for any variable electrical load. The electrical load is modeled using the concept of arrival curves from Network Calculus. Our analysis mechanism can help determine the appropriate battery size for a given load arrival curve to reduce the peak. Thirdly, we present techniques to employ intra-DC scheduling to regulate the peak power usage of each DC. The model we develop is equally applicable to an individual server with multi-/many-core chips as well as a complete DC with an intermix of homogeneous and heterogeneous servers. We evaluate these approaches on single-core and multi-core chip processors and present the results. Overall, our work demonstrates the value of model-based design for intelligent load distribution across DCs, storage integration, and per-DC optimizations for efficient energy management to reduce operating costs and environmental footprint for multi-site Internet services.
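The battery-sizing question raised in this abstract can be illustrated with a simple time-stepped sketch. It is not the dissertation's arrival-curve analysis from Network Calculus; it swaps in a plain simulation over one concrete load trace. The function name `required_battery_capacity`, the lossless battery that starts full, and the sample numbers are all assumptions made for illustration.

```python
def required_battery_capacity(load, cap, dt=1.0):
    """Smallest battery (in energy units) that keeps grid draw at or below cap.

    load: sequence of power demand samples, one per time step
    cap:  maximum power allowed from the grid
    dt:   duration of one time step

    Assumes an ideal, lossless battery that starts full, discharges when
    load exceeds cap, and recharges with spare grid headroom otherwise.
    """
    deficit = 0.0   # energy currently missing from a full battery
    worst = 0.0     # largest deficit seen so far
    for power in load:
        # Above the cap the battery drains; below it, the battery refills
        # (the deficit can never drop below zero, i.e. past "full").
        deficit = max(0.0, deficit + (power - cap) * dt)
        worst = max(worst, deficit)
    return worst


demand = [2, 6, 7, 3, 2, 8]   # hypothetical power samples
size = required_battery_capacity(demand, cap=5)
```

The running deficit behaves like a queue backlog, which is why curve-based bounds of the kind used in Network Calculus are a natural fit for the general, trace-independent version of this problem.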

    The Free Press : January 3, 2019


    Soundtrack recommendation for images

    The drastic increase in production of multimedia content has emphasized the research concerning its organization and retrieval. In this thesis, we address the problem of music retrieval when a set of images is given as input query, i.e., the problem of soundtrack recommendation for images. The task at hand is to recommend appropriate music to be played during the presentation of a given set of query images. To tackle this problem, we formulate a hypothesis that the knowledge appropriate for the task is contained in publicly available contemporary movies. Our approach, Picasso, employs similarity search techniques inside the image and music domains, harvesting movies to form a link between the domains. To achieve a fair and unbiased comparison between different soundtrack recommendation approaches, we propose an evaluation benchmark. The evaluation results are reported for Picasso and the baseline approach, using the proposed benchmark. We further address two efficiency aspects that arise from the Picasso approach. First, we investigate the problem of processing top-K queries with set-defined selections and propose an index structure that aims at minimizing the query answering latency. Second, we address the problem of similarity search in high-dimensional spaces and propose two enhancements to the Locality Sensitive Hashing (LSH) scheme. We also investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Finally, we give an overview of PicasSound, a smartphone application based on the Picasso approach.
    The drastic increase in available multimedia content has highlighted the importance of research on its organization and on search within the data. In this doctoral thesis, we consider the problem of finding suitable pieces of music as background music for slideshows. We formulate the hypothesis that the knowledge required for the problem is contained in publicly available contemporary films. Our approach, Picasso, uses similarity search techniques within the image and music domains to learn, based on movie scenes, a link between arbitrary images and pieces of music. To achieve a fair and unbiased comparison between different music recommendation approaches, we propose an evaluation benchmark. The evaluation results are presented, using the proposed benchmark, for Picasso and a further, emotion-based approach. In addition, we address two efficiency aspects that arise from the Picasso approach. (i) We investigate the problem of executing top-K queries in which the result set is restricted ad hoc to a small subset of the entire index. (ii) We address the problem of similarity search in high-dimensional spaces and propose two extensions to the Locality Sensitive Hashing (LSH) scheme. Additionally, we investigate the prospects of a distributed similarity search algorithm based on LSH using the MapReduce framework. Besides the aforementioned scientific results, we also describe the design and implementation of PicassSound, a smartphone application based on Picasso.
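For readers unfamiliar with Locality Sensitive Hashing, the basic scheme that the thesis builds on can be sketched as follows. This shows only a textbook random-hyperplane variant for cosine similarity, not the two enhancements the thesis proposes; the function names, the seed, and the toy feature vectors are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane vec falls on."""
    return tuple(
        1 if sum(h_i * x_i for h_i, x_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item keys into buckets by their signature."""
    buckets = {}
    for key, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(key)
    return buckets


planes = make_hyperplanes(num_bits=8, dim=3)
items = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],   # same direction as "a"
    "b": [-3.0, 0.5, 1.0],
}
index = build_index(items, planes)
# Candidates for a query are the items sharing its bucket.
candidates = index.get(lsh_signature([1.0, 2.0, 3.0], planes), [])
```

Vectors pointing in the same direction always share a signature, while vectors at a wider angle are increasingly likely to differ in at least one bit, so a query only needs to be compared against the items in its own bucket.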