13 research outputs found

    GPU accelerated maximum cardinality matching algorithms for bipartite graphs

    Get PDF
    We design, implement, and evaluate GPU-based algorithms for the maximum cardinality matching problem in bipartite graphs. Such algorithms have a variety of applications in computer science, scientific computing, bioinformatics, and other areas. To the best of our knowledge, ours is the first study which focuses on GPU implementation of the maximum cardinality matching algorithms. We compare the proposed algorithms with serial and multicore implementations from the literature on a large set of real-life problems where in majority of the cases one of our GPU-accelerated algorithms is demonstrated to be faster than both the sequential and multicore implementations.Comment: 14 pages, 5 figure

    GLive: The Gradient overlay as a market maker for mesh-based P2P live streaming

    Get PDF
    Peer-to-Peer (P2P) live video streaming over the Internet is becoming increasingly popular, but it is still plagued by problems of high playback latency and intermittent playback streams. This paper presents GLive, a distributed market-based solution that builds a mesh overlay for P2P live streaming. The mesh overlay is constructed such that (i) nodes with increasing upload bandwidth are located closer to the media source, and (ii) nodes with similar upload bandwidth become neighbours. We introduce a market-based approach that matches nodes willing and able to share the stream with one another. However, market-based approaches converge slowly on random overlay networks, and we improve the rate of convergence by adapting our market-based algorithm to exploit the clustering of nodes with similar upload bandwidths in our mesh overlay. We address the problem of free-riding through nodes preferentially uploading more of the stream to the best uploaders. We compare GLive with our previous tree-based streaming protocol, Sepidar, and NewCoolstreaming in simulation, and our results show significantly improved playback continuity and playback latency

    Activity recognition from videos with parallel hypergraph matching on GPUs

    Full text link
    In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph matching algorithm for GPUs. Traditionally, graphs and hypergraphs are frequently used to recognize complex and often non-rigid patterns in computer vision, either through graph matching or point-set matching with graphs. Most formulations resort to the minimization of a difficult discrete energy function mixing geometric or structural terms with data attached terms involving appearance features. Traditional methods solve this minimization problem approximately, for instance with spectral techniques. In this work, instead of solving the problem approximatively, the exact solution for the optimal assignment is calculated in parallel on GPUs. The graphical structure is simplified and regularized, which allows to derive an efficient recursive minimization algorithm. The algorithm distributes subproblems over the calculation units of a GPU, which solves them in parallel, allowing the system to run faster than real-time on medium-end GPUs

    TrackSort: Predictive Tracking for Sorting Uncooperative Bulk Materials

    Get PDF
    Optical belt sorters are a versatile, state-of-the-art technology to sort bulk materials that are hard to sort based on only nonvisual properties. In this paper, we propose an extension to current optical belt sorters that involves replacing the line camera with an area camera to observe a wider field of view, allowing us to observe each particle over multiple time steps. By performing multitarget tracking, we are able to improve the prediction of each particle‘s movement and thus enhance the performance of the utilized separation mechanism. We show that our approach will allow belt sorters to handle new classes of bulk materials while improving cost efficiency. Furthermore, we lay out additional extensions that are made possible by our new paradig

    Constraints Propagation on GPU: A Case Study for AllDifferent

    Get PDF
    The AllDifferent constraint is a fundamental tool in Constraint Programming. It naturally arises in many problems, from puzzles to scheduling and routing applications. Such popularity has prompted an extensive literature on filtering and propagation for this constraint. Motivated by the benefits that GPUs offer to other branches of AI, this paper investigates the use of GPUs to accelerate filtering and propagation. In particular, we present an efficient parallelization of the AllDifferent constraint on GPU; we analyze different design and implementation choices and evaluates the performance of the resulting system on medium to large instances of the Travelling Salesman Problem with encouraging results

    Real-time multitarget tracking for sensor-based sorting – A new implementation of the auction algorithm for graphics processing units

    Get PDF
    Utilizing parallel algorithms is an established way of increasing performance in systems that are bound to real-time restrictions. Sensor-based sorting is a machine vision application for which firm real-time requirements need to be respected in order to reliably remove potentially harmful entities from a material feed. Recently, employing a predictive tracking approach using multitarget tracking in order to decrease the error in the physical separation in optical sorting has been proposed. For implementations that use hard associations between measurements and tracks, a linear assignment problem has to be solved for each frame recorded by a camera. The auction algorithm can be utilized for this purpose, which also has the advantage of being well suited for parallel architectures. In this paper, an improved implementation of this algorithm for a graphics processing unit (GPU) is presented. The resulting algorithm is implemented in both an OpenCL and a CUDA based environment. By using an optimized data structure, the presented algorithm outperforms recently proposed implementations in terms of speed while retaining the quality of output of the algorithm. Furthermore, memory requirements are significantly decreased, which is important for embedded systems. Experimental results are provided for two different GPUs and six datasets. It is shown that the proposed approach is of particular interest for applications dealing with comparatively large problem sizes

    A Sparse Multi-Scale Algorithm for Dense Optimal Transport

    Full text link
    Discrete optimal transport solvers do not scale well on dense large problems since they do not explicitly exploit the geometric structure of the cost function. In analogy to continuous optimal transport we provide a framework to verify global optimality of a discrete transport plan locally. This allows construction of an algorithm to solve large dense problems by considering a sequence of sparse problems instead. The algorithm lends itself to being combined with a hierarchical multi-scale scheme. Any existing discrete solver can be used as internal black-box.Several cost functions, including the noisy squared Euclidean distance, are explicitly detailed. We observe a significant reduction of run-time and memory requirements.Comment: Published "online first" in Journal of Mathematical Imaging and Vision, see DO

    Distributed Optimization of P2P Media Delivery Overlays

    Get PDF
    Media streaming over the Internet is becoming increasingly popular. Currently, most media is delivered using global content-delivery networks, providing a scalable and robust client-server model. However, content delivery infrastructures are expensive. One approach to reduce the cost of media delivery is to use peer-to-peer (P2P) overlay networks, where nodes share responsibility for delivering the media to one another. The main challenges in P2P media streaming using overlay networks include: (i) nodes should receive the stream with respect to certain timing constraints, (ii) the overlay should adapt to the changes in the network, e.g., varying bandwidth capacity and join/failure of nodes, (iii) nodes should be intentivized to contribute and share their resources, and (iv) nodes should be able to establish connectivity to the other nodes behind NATs. In this work, we meet these requirements by presenting P2P solutions for live media streaming, as well as proposing a distributed NAT traversal solution. First of all, we introduce a distributed market model to construct an approximately minimal height multiple-tree streaming overlay for content delivery, in gradienTv. In this system, we assume all the nodes are cooperative and execute the protocol. However, in reality, there may exist some opportunistic nodes, free-riders, that take advantage of the system, without contributing to content distribution. To overcome this problem, we extend our market model in Sepidar to be effective in deterring free-riders. However, gradienTv and Sepidar are tree-based solutions, which are fragile in high churn and failure scenarios. We present a solution to this problem in GLive that provides a more robust overlay by replacing the tree structure with a mesh. We show in simulation, that the mesh-based overlay outperforms the multiple-tree overlay. Moreover, we compare the performance of all our systems with the state-of-the-art NewCoolstreaming, and observe that they provide better playback continuity and lower playback latency than that of NewCoolstreaming under a variety of experimental scenarios. Although our distributed market model can be run against a random sample of nodes, we improve its convergence time by executing it against a sample of nodes taken from the Gradient overlay. The Gradient overlay organizes nodes in a topology using a local utility value at each node, such that nodes are ordered in descending utility values away from a core of the highest utility nodes. The evaluations show that the streaming overlays converge faster when our market model works on top of the Gradient overlay. We use a gossip-based peer sampling service in our streaming systems to provide each node with a small list of live nodes. However, in the Internet, where a high percentage of nodes are behind NATs, existing gossiping protocols break down. To solve this problem, we present Gozar, a NAT-friendly gossip-based peer sampling service that: (i) provides uniform random samples in the presence of NATs, and (ii) enables direct connectivity to sampled nodes using a fully distributed NAT traversal service. We compare Gozar with the state-of-the-art NAT-friendly gossip-based peer sampling service, Nylon, and show that only Gozar supports one-hop NAT traversal, and its overhead is roughly half of Nylon’s

    Live Streaming in P2P and Hybrid P2P-Cloud Environments for the Open Internet

    Get PDF
    Peer-to-Peer (P2P) live media streaming is an emerging technology that reduces the barrier to stream live events over the Internet. However, providing a high quality media stream using P2P overlay networks is challenging and gives raise to a number of issues: (i) how to guarantee quality of the service (QoS) in the presence of dynamism, (ii) how to incentivize nodes to participate in media distribution, (iii) how to avoid bottlenecks in the overlay, and (iv) how to deal with nodes that reside behind Network Address Translators gateways (NATs). In this thesis, we answer the above research questions in form of new algorithms and systems. First of all, we address problems (i) and (ii) by presenting our P2P live media streaming solutions: Sepidar, which is a multiple-tree overlay, and GLive, which is a mesh overlay. In both models, nodes with higher upload bandwidth are positioned closer to the media source. This structure reduces the playback latency and increases the playback continuity at nodes, and also incentivizes the nodes to provide more upload bandwidth. We use a reputation model to improve participating nodes in media distribution in Sepidar and GLive. In both systems, nodes audit the behaviour of their directly connected nodes by getting feedback from other nodes. Nodes who upload more of the stream get a relatively higher reputation, and proportionally higher quality streams. To construct our streaming overlay, we present a distributed market model inspired by Bertsekas auction algorithm, although our model does not rely on a central server with global knowledge. In our model, each node has only partial information about the system. Nodes acquire knowledge of the system by sampling nodes using the Gradient overlay, where it facilitates the discovery of nodes with similar upload bandwidth. We address the bottlenecks problem, problem (iii), by presenting CLive that satisfies real-time constraints on delay between the generation of the stream and its actual delivery to users. We resolve this problem by borrowing some resources (helpers) from the cloud, upon need. In our approach, helpers are added on demand to the overlay, to increase the amount of total available bandwidth, thus increasing the probability of receiving the video on time. As the use of cloud resources costs money, we model the problem as the minimization of the economical cost, provided that a set of constraints on QoS is satisfied. Finally, we solve the NAT problem, problem (iv), by presenting two NAT-aware peer sampling services (PSS): Gozar and Croupier. Traditional gossip-based PSS breaks down, where a high percentage of nodes are behind NATs. We overcome this problem in Gozar using one-hop relaying to communicate with the nodes behind NATs. Croupier similarly implements a gossip-based PSS, but without the use of relaying
    corecore