293 research outputs found

    Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff

    The relative ease of collaborative data science and analysis has led to a proliferation of many thousands or millions of versions of the same datasets in many scientific and commercial domains, acquired or constructed at various stages of data analysis across many users, and often over long periods of time. Managing, storing, and recreating these dataset versions is a non-trivial task. The fundamental challenge here is the storage-recreation trade-off: the more storage we use, the faster it is to recreate or retrieve versions, while the less storage we use, the slower it is to recreate or retrieve versions. Despite the fundamental nature of this problem, there has been surprisingly little work on it. In this paper, we study this trade-off in a principled manner: we formulate six problems under various settings, trading off these quantities in various ways, demonstrate that most of the problems are intractable, and propose a suite of inexpensive heuristics, drawing on techniques from the delay-constrained scheduling and spanning tree literature, to solve these problems. We have built a prototype version management system that aims to serve as a foundation for our DATAHUB system for facilitating collaborative data science. We demonstrate, via extensive experiments, that our proposed heuristics provide efficient solutions in practical dataset versioning scenarios.
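
    The trade-off described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm or cost model: the version names, sizes, and delta sizes are made up, deltas are assumed symmetric, and the cost of recreating a version is assumed to equal the number of megabytes read. Under those assumptions, the storage-minimal plan is a minimum spanning tree rooted at a dummy node (computed here with Prim's algorithm), and it is compared against storing every version in full.

```python
import heapq

# Hypothetical sizes (in MB) for three versions of one dataset.
full_size = {"v1": 100, "v2": 105, "v3": 110}
# Hypothetical delta sizes (in MB); assumed symmetric for simplicity.
delta_size = {("v1", "v2"): 10, ("v2", "v3"): 12, ("v1", "v3"): 20}

ROOT = "ROOT"  # dummy node: an edge ROOT -> v means "store v in full"


def min_storage_plan(full_size, delta_size):
    """Run Prim's algorithm from the dummy root.

    Returns {version: (parent, cost)}, where parent is ROOT for a fully
    stored version or another version for a version stored as a delta.
    """
    graph = {v: [] for v in full_size}
    for (u, v), size in delta_size.items():
        graph[u].append((size, v))
        graph[v].append((size, u))
    heap = [(size, ROOT, v) for v, size in full_size.items()]
    heapq.heapify(heap)
    plan = {}
    while heap:
        cost, parent, v = heapq.heappop(heap)
        if v in plan:
            continue  # already attached through a cheaper edge
        plan[v] = (parent, cost)
        for size, w in graph[v]:
            if w not in plan:
                heapq.heappush(heap, (size, v, w))
    return plan


def storage_cost(plan):
    """Total megabytes kept on disk under a plan."""
    return sum(cost for _, cost in plan.values())


def recreation_cost(plan, version):
    """Megabytes read to rebuild a version by walking its chain back to ROOT."""
    total = 0
    while version != ROOT:
        version, cost = plan[version]
        total += cost
    return total


delta_plan = min_storage_plan(full_size, delta_size)
full_plan = {v: (ROOT, size) for v, size in full_size.items()}

print("delta chain:", storage_cost(delta_plan), recreation_cost(delta_plan, "v3"))
print("store all:  ", storage_cost(full_plan), recreation_cost(full_plan, "v3"))
```

    With these made-up numbers, the delta chain stores 122 MB instead of 315 MB, but recreating v3 reads 122 MB instead of 110 MB, which is the tension the abstract describes.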

    The Traveling Salesman Problem Under Squared Euclidean Distances

    Let $P$ be a set of points in $\mathbb{R}^d$, and let $\alpha \ge 1$ be a real number. We define the distance between two points $p,q\in P$ as $|pq|^{\alpha}$, where $|pq|$ denotes the standard Euclidean distance between $p$ and $q$. We denote the traveling salesman problem under this distance function by TSP($d,\alpha$). We design a 5-approximation algorithm for TSP(2,2) and generalize this result to obtain an approximation factor of $3^{\alpha-1}+\sqrt{6}^{\alpha}/3$ for $d=2$ and all $\alpha\ge2$. We also study the variant Rev-TSP of the problem where the traveling salesman is allowed to revisit points. We present a polynomial-time approximation scheme for Rev-TSP$(2,\alpha)$ with $\alpha\ge2$, and we show that Rev-TSP$(d,\alpha)$ is APX-hard if $d\ge3$ and $\alpha>1$. The APX-hardness proof carries over to TSP$(d,\alpha)$ for the same parameter ranges. Comment: 12 pages, 4 figures. (v2) Minor linguistic change
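
    A short sketch may help make the distance function concrete. It only evaluates $|pq|^{\alpha}$ and the cost of a given tour; it does not implement the paper's approximation algorithms. The function names and sample points are illustrative assumptions. The example also shows why the problem differs from metric TSP: for $\alpha=2$ the triangle inequality fails, so passing through an intermediate point can be cheaper than going directly, while the relaxed bound $|pq|^{\alpha} \le 2^{\alpha-1}(|pm|^{\alpha}+|mq|^{\alpha})$ still holds.

```python
import math


def dist(p, q, alpha):
    """Euclidean distance between p and q, raised to the power alpha."""
    return math.dist(p, q) ** alpha


def tour_cost(points, order, alpha):
    """Cost of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        dist(points[order[i]], points[order[(i + 1) % n]], alpha)
        for i in range(n)
    )


# Three collinear points: going p -> m -> q is cheaper than p -> q when alpha = 2.
p, m, q = (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)
alpha = 2
direct = dist(p, q, alpha)
via_m = dist(p, m, alpha) + dist(m, q, alpha)
print("direct:", direct, "via m:", via_m)

# The relaxed triangle inequality with factor 2**(alpha - 1) still holds.
print("relaxed bound holds:", direct <= 2 ** (alpha - 1) * via_m)

# Tour around the unit square under squared Euclidean distances.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_cost = tour_cost(square, [0, 1, 2, 3], alpha)
print("square tour cost:", square_cost)
```

    The collinear example is also the intuition behind the Rev-TSP variant: when revisiting is allowed, detours through intermediate points can lower the total cost.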

    Clustering and Hybrid Routing in Mobile Ad Hoc Networks

    This dissertation focuses on clustering and hybrid routing in Mobile Ad Hoc Networks (MANET). Specifically, we study two different network-layer virtual infrastructures proposed for MANET: the explicit cluster infrastructure and the implicit zone infrastructure. In the first part of the dissertation, we propose a novel clustering scheme based on a number of properties of diameter-2 graphs to provide a general-purpose virtual infrastructure for MANET. Compared to virtual infrastructures with central nodes, our virtual infrastructure is more symmetric and stable, but still lightweight. In our clustering scheme, cluster initialization naturally blends into cluster maintenance, showing the unity between these two operations. We call our algorithm tree-based since cluster merge and split operations are performed based on a spanning tree maintained at some specific nodes. Extensive simulation results have shown the effectiveness of our clustering scheme when compared to other schemes proposed in the literature. In the second part of the dissertation, we propose TZRP (Two-Zone Routing Protocol) as a hybrid routing framework that can balance the tradeoffs between pure proactive, fuzzy proactive, and reactive routing approaches more effectively in a wide range of network conditions. In TZRP, each node maintains two zones: a Crisp Zone for proactive routing and efficient bordercasting, and a Fuzzy Zone for heuristic routing using imprecise locality information. The perimeter of the Crisp Zone is the boundary between pure proactive routing and fuzzy proactive routing, and the perimeter of the Fuzzy Zone is the boundary between proactive routing and reactive routing. By adjusting the sizes of these two zones, a reduced total routing control overhead can be achieved.
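
    The two-zone idea can be sketched as a classification of destinations by hop distance. This is only an illustration and not the TZRP protocol itself: it assumes each zone is defined by a hop-count radius, and the names `crisp_radius`, `fuzzy_radius`, and the sample topology are hypothetical.

```python
from collections import deque


def hop_counts(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def classify_destinations(adj, source, crisp_radius, fuzzy_radius):
    """Label each other node by the routing mode the source would use for it.

    Assumption: zones are hop-count balls around the source, with
    crisp_radius <= fuzzy_radius. Unreachable nodes fall back to reactive.
    """
    hops = hop_counts(adj, source)
    modes = {}
    for node in adj:
        if node == source:
            continue
        h = hops.get(node)
        if h is not None and h <= crisp_radius:
            modes[node] = "pure proactive"    # inside the Crisp Zone
        elif h is not None and h <= fuzzy_radius:
            modes[node] = "fuzzy proactive"   # inside the Fuzzy Zone only
        else:
            modes[node] = "reactive"          # beyond both zones
    return modes


# Hypothetical five-node chain: a - b - c - d - e.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
modes = classify_destinations(adj, "a", crisp_radius=1, fuzzy_radius=3)
print(modes)
```

    Growing `crisp_radius` moves more destinations into the pure proactive class, and growing `fuzzy_radius` shrinks the reactive class, which mirrors the zone-size adjustment the abstract mentions.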

    A distributed topology control technique for low interference and energy efficiency in wireless sensor networks

    Wireless sensor networks are used in several multi-disciplinary areas covering a wide variety of applications. They provide distributed computing, sensing and communication in a powerful integration of capabilities. They have great long-term economic potential and have the ability to transform our lives. At the same time, however, they pose several challenges, mostly as a result of their random deployment and non-renewable energy sources. Among the most important issues in wireless sensor networks are energy efficiency and radio interference. Topology control plays an important role in the design of wireless ad hoc and sensor networks; it is capable of constructing networks that have desirable characteristics such as sparser connectivity, lower transmission power and a smaller node degree. In this research a distributed topology control technique is presented that enhances energy efficiency and reduces radio interference in wireless sensor networks. Each node in the network makes local decisions about its transmission power, and the culmination of these local decisions produces a network topology that preserves global connectivity. The topology that is produced is a planar graph that is a power spanner; it has lower node degrees and can be constructed using local information. The network lifetime is increased by reducing transmission power, and the use of low node degrees reduces traffic interference. The approach to topology control that is presented in this document has an advantage over previously developed approaches in that it focuses on reducing both energy consumption and radio interference rather than only one of them. Results are presented of simulations that demonstrate improvements in performance. Dissertation (MEng)--University of Pretoria, 2010. Electrical, Electronic and Computer Engineering.

    On the design of architecture-aware algorithms for emerging applications

    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures and directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to these efforts by providing benchmarks and suggestions to improve system software and architectures. Ph.D. Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot