
    A note on the data-driven capacity of P2P networks

    We consider two capacity problems in P2P networks. In the first, the nodes have an infinite amount of data to send and the goal is to allocate their uplink bandwidths optimally so that every peer's demand in terms of receiving data rate is met. We solve this problem through a mapping from a node-weighted graph featuring two labels per node to a max-flow problem on an edge-weighted bipartite graph. In the second problem, the resource allocation is driven by the availability of the data that the peers are interested in sharing; that is, a node cannot allocate its uplink resources unless it first has data to transmit. The problem of uplink bandwidth allocation is then equivalent to constructing a set of directed trees in the overlay such that the number of nodes receiving the data is maximized while the uplink capacities of the peers are not exceeded. We show that this problem is NP-complete, and provide a linear programming decomposition that decouples it into a master problem and multiple slave subproblems, each solvable in polynomial time. We also design a heuristic algorithm to compute a suboptimal solution in reasonable time. This algorithm requires only local knowledge at each node, so it lends itself to distributed implementations. We analyze both problems through a series of simulation experiments featuring different network sizes and densities. On large networks, we compare our heuristic and its variants with a genetic algorithm and show that our heuristic computes better resource allocations. On smaller networks, we contrast these results with those of the exact algorithm and show that resource allocations fulfilling a large part of the peers' demands can be found, even for hard configurations where no resources are in excess. Comment: 10 pages, technical report assisting a submission
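    The bipartite max-flow mapping behind the first problem can be illustrated with a small feasibility check. The sketch below is our own reading of that reduction, not the paper's exact construction: the peer names, uplink capacities, and demands are invented, and networkx's maximum_flow stands in for whichever solver the authors used.

```python
# Hypothetical sketch: test whether uplink capacities can satisfy all
# receive-rate demands via max flow on a bipartite graph. Peer names,
# capacities, and demands are illustrative, not from the paper.
import networkx as nx

uplink = {"a": 5, "b": 3, "c": 4}   # upload capacity per peer
demand = {"a": 4, "b": 4, "c": 3}   # required receive rate per peer

G = nx.DiGraph()
for v, cap in uplink.items():
    G.add_edge("s", f"up_{v}", capacity=cap)     # source -> uploader side
for v, dem in demand.items():
    G.add_edge(f"down_{v}", "t", capacity=dem)   # downloader side -> sink
for u in uplink:
    for v in demand:
        if u != v:                               # a peer does not serve itself
            G.add_edge(f"up_{u}", f"down_{v}", capacity=float("inf"))

flow_value, flow = nx.maximum_flow(G, "s", "t")
feasible = flow_value == sum(demand.values())    # all demands met iff sinks saturate
print(feasible, flow_value)
```

    If the maximum flow saturates every demand edge, an allocation meeting all receive-rate demands exists, and the flow on edge up_u -> down_v reads off how much of u's uplink serves v.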

    Compressing DNA sequence databases with coil

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
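    coil's gain comes from coding each sequence against a similar one rather than compressing it in isolation. As a rough, hypothetical illustration of that edit-based idea (not coil's actual edit-tree codec; the helper names and the difflib-based diff are our own choices):

```python
# Illustrative sketch of edit-based coding for similar sequences, NOT
# coil's codec: a new sequence is stored as edit operations against a
# previously stored reference sequence.
import difflib

def encode(seq, reference):
    """Encode seq as ("copy", i1, i2) / ("insert", literal) ops vs reference."""
    ops = []
    sm = difflib.SequenceMatcher(a=reference, b=seq, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse reference[i1:i2]
        else:
            ops.append(("insert", seq[j1:j2]))  # store literal bases
    return ops

def decode(ops, reference):
    out = []
    for op in ops:
        if op[0] == "copy":
            _, i1, i2 = op
            out.append(reference[i1:i2])
        else:
            out.append(op[1])
    return "".join(out)

ref = "ACGTACGTACGTTTGACC"
seq = "ACGTACCTACGTTTGACCA"
assert decode(encode(seq, ref), ref) == seq
```

    A production codec would additionally entropy-code the opcode stream and pick references from previously encoded sequences; the sketch only shows the structural idea.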

    Algorithms for cartographic visualization

    Maps are effective tools for communicating information to the general public and help people to make decisions in, for example, navigation, spatial planning and politics. The mapmaker chooses the details to put on a map and the symbols to represent them. Not all details need to be geographic: thematic maps, which depict a single theme or attribute, such as population, income, crime rate, or migration, can very effectively communicate the spatial distribution of the visualized attribute. The vast amount of data currently available makes it infeasible to design all maps manually, and calls for automated cartography. In this thesis we presented efficient algorithms for the automated construction of various types of thematic maps. In Chapter 2 we studied the problem of drawing schematic maps. Schematic maps are a well-known cartographic tool; they visualize a set of nodes and edges (for example, highway or metro networks) in simplified form to communicate connectivity information as effectively as possible. Many schematic maps deviate substantially from the underlying geography since edges and vertices of the original network are moved in the simplification process. This can be a problem if we want to integrate the schematized network with a geographic map. In this scenario the schematized network has to be drawn with few orientations and links, while critical features (cities, lakes, etc.) of the base map are not obscured and retain their correct topological position with respect to the network. We developed an efficient algorithm to compute a collection of non-crossing paths with fixed orientations using as few links as possible. This algorithm approximates the optimal solution to within a factor that depends only on the number of allowed orientations. We can also draw the roads with different thicknesses, allowing us to visualize additional data related to the roads such as traffic volume. In Chapter 3 we studied methods to visualize quantitative data related to geographic regions. We first considered rectangular cartograms. Rectangular cartograms represent regions by rectangles; the positioning and adjacencies of these rectangles are chosen to suggest their geographic locations to the viewer, while their areas are chosen to represent the numeric values being communicated by the cartogram. One drawback of rectangular cartograms is that not every rectangular layout can be used to visualize all possible area assignments. Rectangular layouts that do have this property are called area-universal. We show that area-universal layouts are always one-sided, and we present algorithms to find one-sided layouts given a set of adjacencies. Rectangular cartograms often provide a nice visualization of quantitative data, but cartograms deform the underlying regions according to the data, which can make the map virtually unrecognizable if the data value differs greatly from the original area of a region or if data is not available at all for a particular region. A more direct method to visualize the data is to place circular symbols on the corresponding region, where the areas of the symbols correspond to the data. However, these maps, so-called symbol maps, can appear very cluttered with many overlapping symbols if large data values are associated with small regions. In Chapter 4 we proposed a novel type of quantitative thematic map, called necklace map, which overcomes these limitations.
Instead of placing the symbols directly on a region, we place the symbols on a closed curve, the necklace, which surrounds the map. The location of a symbol on the necklace should be chosen in such a way that the relation between symbol and region is as clear as possible. Necklace maps appear clear and uncluttered and allow for comparatively large symbol sizes. We developed algorithms to compute necklace maps and demonstrated our method with experiments using various data sets and maps. In Chapters 5 and 6 we studied the automated creation of flow maps. Flow maps are thematic maps that visualize the movement of objects, such as people or goods, between geographic regions. One or more sources are connected to several targets by lines whose thickness corresponds to the amount of flow between a source and a target. Good flow maps reduce visual clutter by merging (bundling) lines smoothly and by avoiding self-intersections. We developed a new algorithm for drawing flow trees, flow maps with a single source. Unlike existing methods, our method merges lines smoothly and avoids self-intersections. Our method is based on spiral trees, a new type of Steiner tree that we introduced. Spiral trees have an angle restriction which makes them appear smooth and hence suitable for drawing flow maps. We study the properties of spiral trees and give an approximation algorithm to compute them. We also show how to compute flow trees from spiral trees and we demonstrate our approach with extensive experiments.
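    As a brief aside on the angle restriction: a curve that meets every ray from the root at a constant angle α is a logarithmic spiral, which in polar coordinates centered at the root can be written as below. This is standard spiral notation of our own choosing, not necessarily the thesis's exact parameterization.

```latex
% Logarithmic spiral meeting every ray from the pole (the root) at the
% constant angle \alpha; (r_0, \theta_0) is any point on the curve.
% Standard notation, not necessarily the thesis's parameterization.
r(\theta) = r_0 \, e^{(\theta - \theta_0)\cot\alpha}
```

    Restricting tree edges to directions within angle α of the line toward the root keeps every edge monotone toward the root while bounding how sharply it can turn, which is what makes the merged bundles look smooth.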

    Mathematical programming algorithms for network optimization problems

    In the thesis we consider combinatorial optimization problems that are defined by means of networks. These problems arise when we need to make effective decisions to build or manage network structures, both satisfying the design constraints and minimizing the costs. We focus our attention on the following four problems: - The Multicast Routing and Wavelength Assignment with Delay Constraint in WDM networks with heterogeneous capabilities (MRWADC) problem: this problem arises in the telecommunications industry and requires defining an efficient way to make multicast transmissions on a WDM optical network. In more formal terms, to solve the MRWADC problem we need to identify, in a given directed graph that models the WDM optical network, a set of arborescences that connect the source of the transmission to all its destinations. These arborescences must satisfy several quality-of-service constraints and take into account the heterogeneity of the electronic devices belonging to the WDM network. - The Homogeneous Area Problem (HAP): this problem arises from a particular requirement of an intermediate level of the Italian government, the province. Each province needs to coordinate the common activities of the towns that belong to its territory. To perform its coordination role in practice, the province of Milan created a customer care layer composed of a number of employees whose task is to support the towns of the province in their administrative work. For the sake of efficiency, the employees of this customer care layer have been partitioned into small groups, and each group is assigned to a particular subset of towns that have a large number of activities in common. The HAP requires identifying the set of towns assigned to each group so as to minimize the redundancies generated by towns that, despite having some activities in common, have been assigned to different groups. Since, for both historical and practical reasons, the towns in a particular subset need to be adjacent, the HAP can be effectively modeled as a graph partitioning problem that requires the connectivity of the obtained subgraphs and the satisfaction of nonlinear knapsack constraints. - The Knapsack Prize Collecting Steiner Tree Problem (KPCSTP): to implement a Column Generation algorithm for the MRWADC problem and for the HAP, we also need to solve the two corresponding pricing problems. These two problems are very similar: both require finding an arborescence, contained in a given directed weighted graph, that minimizes the difference between its cost and the prizes associated with the spanned nodes. The two problems differ in the side constraints that their feasible solutions need to satisfy and in the way the cost of an arborescence is defined. The ILP formulations and the resolution methods that we developed to tackle these two problems have many characteristics in common with those used to solve other similar problems. To exemplify these similarities and to summarize and extend the techniques that we developed for the MRWADC problem and the HAP, we also considered the KPCSTP. This problem requires finding a tree that minimizes the difference between the cost of the used arcs and the profits of the spanned nodes. However, not all trees are feasible: the sum of the weights of the nodes spanned by a feasible tree cannot exceed a given weight threshold.
In the thesis we propose a computational comparison among several optimization methods for the KPCSTP that have either already been proposed in the literature or been obtained by modifying our ILP formulations for the two previous pricing problems. - The Train Design Optimization (TDO) problem: this problem was the topic of the second problem-solving competition sponsored in 2011 by the Railway Application Section (RAS) of the Institute for Operations Research and the Management Sciences (INFORMS). We participated in the contest and won the second prize. After the competition, we continued to work on the TDO problem, and in the thesis we describe the improved method obtained at the end of this work. The TDO problem arises in the freight railroad industry. Typically, a freight railroad company receives requests from customers to transport a set of railcars from an origin rail yard to a destination rail yard. To satisfy these requests, the company first aggregates the railcars having the same origin and the same destination into larger blocks, and then defines a trip plan to transport the obtained blocks to their correct destinations. The TDO problem requires identifying a trip plan that efficiently uses the limited resources of the considered rail company. More formally, given a railway network, a set of blocks, and the segments of the network on which a crew can legally drive a train, the TDO problem requires defining a set of trains and the way in which the given blocks are transported to their destinations by these trains, both satisfying operational constraints and minimizing the transportation costs.
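    Of the four problems, the KPCSTP mentioned above has the most compact statement. A generic version under our own notation (arc costs c_a, node prizes p_v, node weights w_v, weight budget W), not necessarily the thesis's exact ILP, reads:

```latex
% Generic KPCSTP: choose a tree T = (V_T, A_T) in the given directed
% weighted graph, minimizing arc costs minus collected prizes, subject
% to a knapsack constraint on the spanned nodes. Notation is ours.
\min_{T} \; \sum_{a \in A_T} c_a \;-\; \sum_{v \in V_T} p_v
\qquad \text{subject to} \quad \sum_{v \in V_T} w_v \le W
```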

    When Stuck, Flip a Coin: New Algorithms for Large-Scale Tasks

    Many modern services need to routinely perform tasks on a large scale. This prompts us to consider the following question: how can we design efficient algorithms for large-scale computation? In this thesis, we focus on devising a general strategy to address the above question. Our approaches use tools from graph theory and convex optimization, and prove to be very effective on a number of problems that exhibit locality. A recurring theme in our work is to use randomization to obtain simple and practical algorithms. The techniques we developed enabled us to make progress on the following questions: - Parallel Computation of Approximately Maximum Matchings. We put forth a new approach to computing O(1)-approximate maximum matchings in the Massively Parallel Computation (MPC) model. In the regime in which the memory per machine is Θ(n), i.e., linear in the size of the vertex set, our algorithm requires only O((log log n)^2) rounds of computation. This is an almost exponential improvement over the barrier of Ω(log n) rounds that all previous results required in this regime. - Parallel Computation of Maximal Independent Sets. We propose a simple randomized algorithm that constructs maximal independent sets in the MPC model. If the memory per machine is Θ(n), our algorithm runs in O(log log n) MPC rounds. In the same regime, all previously known algorithms required O(log n) rounds of computation. - Network Routing under Link Failures. We design a new protocol for stateless message routing in k-connected graphs. Our routing scheme has two important features: (1) each router makes its routing decisions based only on the local information available to it; and (2) a message is delivered successfully even if arbitrary k-1 links have failed. This significantly improves upon previous work, whose routing schemes tolerate only up to k/2 - 1 failed links in k-connected graphs. - Streaming Submodular Maximization under Element Removals. We study the problem of maximizing submodular functions subject to a cardinality constraint k, in the context of streaming algorithms. In a regime in which up to m elements can be removed from the stream, we design an algorithm that provides a constant-factor approximation for this problem while storing only O(k log^2 k + m log^3 k) elements. Our algorithm improves quadratically upon the prior work, which requires storing O(k·m) elements to solve the same problem. - Fast Recovery for the Separated Sparsity Model. In the context of compressed sensing, we put forth two nearly linear-time recovery algorithms for separated sparsity signals (which naturally model neural spikes), improving upon the previous algorithm, which had a quadratic running time. We also derive a refined version of the natural dynamic programming (DP) approach to the recovery of separated sparsity signals. This DP approach leads to a recovery algorithm that runs in linear time for an important class of separated sparsity signals. Finally, we consider a generalization of these signals to two dimensions, and we show that computing an exact projection onto the two-dimensional model is NP-hard.
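    The randomized MIS result builds on the classic random-priority rule: a vertex joins the independent set when its random priority beats those of all surviving neighbors. The sketch below simulates that rule sequentially on one machine; it is our own illustration under an adjacency-list input of our choosing, not the thesis's MPC implementation.

```python
# Sequential sketch of the random-priority rule underlying many randomized
# MIS algorithms: a vertex joins the independent set if its priority beats
# all of its surviving neighbors'; the neighbors are then discarded.
# This is NOT the thesis's MPC implementation.
import random

def random_priority_mis(adj):
    """adj: dict mapping each vertex to a set of neighbors."""
    mis = set()
    alive = set(adj)
    while alive:
        prio = {v: random.random() for v in alive}
        winners = {v for v in alive
                   if all(prio[v] < prio[u]
                          for u in adj[v] if u in alive)}
        mis |= winners
        # remove the chosen vertices and their neighbors from the graph
        dead = set(winners)
        for v in winners:
            dead |= adj[v] & alive
        alive -= dead
    return mis

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(random_priority_mis(adj))   # e.g. {1, 4}, {2, 4}, or {3}
```

    Each round, every local priority minimum enters the set and kills its neighborhood, which is why such schemes finish in few rounds with high probability.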

    Cooperative task assignment for multiple vehicles


    Stochastic Combinatorial Optimization via Poisson Approximation

    We study several stochastic combinatorial problems, including the expected utility maximization problem, the stochastic knapsack problem and the stochastic bin packing problem. A common technical challenge in these problems is to optimize some function of the sum of a set of random variables. The difficulty is mainly due to the fact that the probability distribution of the sum is the convolution of a set of distributions, which is not an easy objective function to work with. To tackle this difficulty, we introduce the Poisson approximation technique. The technique is based on the Poisson approximation theorem discovered by Le Cam, which enables us to approximate the distribution of the sum of a set of random variables using a compound Poisson distribution. We first study the expected utility maximization problem introduced recently [Li and Deshpande, FOCS11]. For monotone and Lipschitz utility functions, we obtain an additive PTAS if there is a multidimensional PTAS for the multi-objective version of the problem, strictly generalizing the previous result. For the stochastic bin packing problem (introduced in [Kleinberg, Rabani and Tardos, STOC97]), we show there is a polynomial-time algorithm which uses at most the optimal number of bins, if we relax the size of each bin and the overflow probability by eps. For stochastic knapsack, we show a (1+eps)-approximation using eps extra capacity, even when the size and reward of each item may be correlated and cancellations of items are allowed. This generalizes the previous work [Bhalgat, Goel and Khanna, SODA11] for the case without correlation and cancellation. Our algorithm is also simpler. We also present a factor (2+eps) approximation algorithm for stochastic knapsack with cancellations, improving upon the currently known approximation factor of 8 [Gupta, Krishnaswamy, Molinaro and Ravi, FOCS11]. Comment: 42 pages, 1 figure, Preliminary version appears in the Proceedings of the 45th ACM Symposium on the Theory of Computing (STOC13)
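    Le Cam's theorem, in its basic form, bounds the total variation distance between a sum of independent Bernoulli(p_i) variables and a Poisson law with mean Σ p_i by Σ p_i^2. The snippet below checks that bound numerically; it is our own illustration with made-up probabilities, not code from the paper.

```python
# Numerical check of Le Cam's bound: for independent Bernoulli(p_i), the
# total variation distance between their sum and Poisson(sum p_i) is at
# most sum p_i^2. The probabilities below are illustrative.
from scipy.stats import poisson

p = [0.05, 0.1, 0.02, 0.08, 0.04]
lam = sum(p)

# exact PMF of the Bernoulli sum (Poisson binomial) via iterated convolution
pmf = [1.0]
for pi in p:
    nxt = [0.0] * (len(pmf) + 1)
    for k, q in enumerate(pmf):
        nxt[k] += q * (1 - pi)      # trial i fails
        nxt[k + 1] += q * pi        # trial i succeeds
    pmf = nxt

tv = 0.5 * sum(abs(q - poisson.pmf(k, lam)) for k, q in enumerate(pmf))
tv += 0.5 * poisson.sf(len(p), lam)   # Poisson mass beyond len(p) successes
print(tv, "<=", sum(pi**2 for pi in p))
```

    When every p_i is small, the bound (and the actual distance) is tiny, which is exactly the regime where replacing an unwieldy convolution with a (compound) Poisson distribution pays off.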