    When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

    Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold τ. In contrast to previous work where τ is assumed to be quite close to 1, we focus on recommendation applications where τ is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small τ. To the best of our knowledge, there is no practical solution for computing all user pairs with, say, τ = 0.2 on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm, WHIMP, that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.
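
    The SimHash component mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not WHIMP itself (there is no wedge sampling and no MapReduce here); it only shows how random-hyperplane signatures approximate cosine similarity. The function names, the toy vectors, and the signature length of 4096 bits are assumptions chosen for illustration.

```python
import math
import random


def cosine(u, v):
    """Exact cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simhash(vec, planes):
    """One bit per random hyperplane: which side of the plane vec falls on."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]


def estimate_cosine(sig_u, sig_v):
    """The fraction of disagreeing bits estimates angle/pi between the vectors."""
    disagree = sum(a != b for a, b in zip(sig_u, sig_v))
    theta = math.pi * disagree / len(sig_u)
    return math.cos(theta)


# Hypothetical toy data: two users' adjacency vectors over six accounts.
rng = random.Random(0)
dim, bits = 6, 4096
planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

u = [1, 1, 0, 1, 0, 0]
v = [1, 0, 1, 1, 0, 0]

exact = cosine(u, v)  # 2/3 for these vectors
approx = estimate_cosine(simhash(u, planes), simhash(v, planes))
print(f"exact={exact:.3f} approx={approx:.3f}")
```

    The estimate is random, so it only approaches the exact value as the number of bits grows; a pair would be reported as similar when the estimate exceeds the threshold τ.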

    Spatially and Temporally Directed Noise Cancellation Using Federated Learning

    Machine learning models can be trained to cancel noise of diverse types or spectral characteristics, e.g., traffic noise or background chatter. Such models are trained on data that includes labeled noise waveforms, which are expensive and time-consuming to collect. Further, such models are of limited effectiveness in canceling types of noise absent from the training data. Trained models also occupy significant amounts of memory, which limits their use in consumer devices. This disclosure describes the use of federated learning techniques to train noise canceling models locally at diverse device locations and times. With user permission, the trained models are tagged with timestamp and location, such that when a user device has a time or location matching a particular noise cancellation model, that model is provided to the user device. Noise cancellation on the user device is then performed with a compact machine learning model that is suited to the time and location of the user device.

    Decremental All-Pairs ALL Shortest Paths and Betweenness Centrality

    We consider the all pairs all shortest paths (APASP) problem, which maintains the shortest path dag rooted at every vertex in a directed graph G=(V,E) with positive edge weights. For this problem we present a decremental algorithm (that supports the deletion of a vertex, or weight increases on edges incident to a vertex). Our algorithm runs in amortized O(ν*^2 · log n) time per update, where n=|V|, and ν* bounds the number of edges that lie on shortest paths through any given vertex. Our APASP algorithm can be used for the decremental computation of betweenness centrality (BC), a graph parameter that is widely used in the analysis of large complex networks. No nontrivial decremental algorithm for either problem was known prior to our work. Our method is a generalization of the decremental algorithm of Demetrescu and Italiano [DI04] for unique shortest paths, and for graphs with ν* = O(n), we match the bound in [DI04]. Thus for graphs with a constant number of shortest paths between any pair of vertices, our algorithm maintains APASP and BC scores in amortized time O(n^2 log n) under decremental updates, regardless of the number of edges in the graph. Comment: An extended abstract of this paper will appear in Proc. ISAAC 201
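
    To make the notion of a "shortest path dag rooted at a vertex" concrete, the following is a minimal static sketch: a Dijkstra variant that records every shortest-path predecessor and counts shortest paths from one source. It is not the decremental algorithm of the abstract, which maintains this information under updates; the function name, the toy graph, and the use of integer weights are assumptions for illustration.

```python
import heapq


def shortest_path_dag(graph, source):
    """Return (dist, preds, count) for all vertices reachable from source.

    graph maps a vertex to a list of (neighbor, positive_weight) pairs.
    preds[v] lists every predecessor of v on some shortest path, so the
    preds edges form the shortest path dag rooted at source.
    count[v] is the number of distinct shortest paths from source to v.
    """
    dist = {source: 0}
    preds = {source: []}
    count = {source: 1}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                # Strictly better path: restart v's predecessor list.
                dist[v] = nd
                preds[v] = [u]
                count[v] = count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                # Tie: u is one more predecessor on a shortest path to v.
                preds[v].append(u)
                count[v] += count[u]
    return dist, preds, count


# Hypothetical toy graph with two equally short routes from a to d.
graph = {
    "a": [("b", 1), ("c", 1)],
    "b": [("d", 1)],
    "c": [("d", 1)],
    "d": [("e", 2)],
}
dist, preds, count = shortest_path_dag(graph, "a")
print(dist["e"], sorted(preds["d"]), count["e"])
```

    Running this from every vertex gives the static APASP information; betweenness centrality scores can then be accumulated from the path counts.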

    Comparative Study of Popular Data Mining Algorithms

    Data science is an appealing field in the present world: with the advancement of science, a huge assortment of data exists in numerous forms. Such data must be handled with care and stored safely so that it can be retrieved as needed. Some of the popular or commonly used algorithms are the Apriori algorithm, K-Means clustering, Support Vector Machines (SVM), and Association Rule Mining algorithms. This paper focuses on the above-mentioned algorithms and compares them in terms of technique, time utilization, and software, using real-time data examples.

    Coupled discrete/continuum simulations of the impact of granular slugs with clamped beams: stand-off effects

    Coupled discrete particle/continuum simulations of the normal (zero obliquity) impact of granular slugs against the centre of deformable, end-clamped beams are reported. The simulations analyse the experiments of Uth et al. (2015), enabling a detailed interpretation of their observations of the temporal evolution of the granular slug and of a strong stand-off distance dependence of the structural response. The high velocity granular slugs were generated by the pushing action of a piston and develop a spatial velocity gradient due to elastic energy stored during the loading phase by the piston. The velocity gradient within the “stretching” slug is a strong function of the inter-particle contact stiffness and the time the piston takes to ramp up to its final velocity. Other inter-particle contact properties such as damping and friction are shown to have negligible effect on the evolution of the granular slug. The velocity gradients result in a slug density that decreases with increasing stand-off distance, and therefore the pressure imposed by the slug on the beams is reduced with increasing stand-off. This results in the stand-off dependence of the beam's deflection observed by Uth et al. (2015). The coupled simulations capture both the permanent deflections of the beams and their dynamic deformation modes with a high degree of fidelity. These simulations shed new light on the stand-off effect observed during the loading of structures by shallow-buried explosions.

    Error Measures for Noise-Free Surrogate Approximations

    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/76064/1/AIAA-2008-901-367.pd

    Budget feasible mechanisms on matroids

    Motivated by many practical applications, in this paper we study budget feasible mechanisms where the goal is to procure independent sets from matroids. More specifically, we are given a matroid M = (E, I) where each ground (indivisible) element is a selfish agent. The cost of each element (i.e., for selling the item or performing a service) is only known to the element itself. There is a buyer with a budget having additive valuations over the set of elements E. The goal is to design an incentive compatible (truthful) budget feasible mechanism which procures an independent set of the matroid under the given budget that yields the largest value possible to the buyer. Our result is a deterministic, polynomial-time, individually rational, truthful and budget feasible mechanism with 4-approximation to the optimal independent set. Then, we extend our mechanism to the setting of matroid intersections in which the goal is to procure common independent sets from multiple matroids. We show that, given a polynomial time deterministic blackbox that returns α-approximation solutions to the matroid intersection problem, there exists a deterministic, polynomial time, individually rational, truthful and budget feasible mechanism with (3α+1)-approximation to the optimal common independent set.

    Building a GUI Application for Viewing and Searching Apache Kafka Messages

    Apache Kafka is a scalable messaging system with the publish-subscribe model at its core. Several traditional messaging systems, such as MSMQ and RabbitMQ, exist, but they have limitations in terms of performance and throughput. Kafka, developed at LinkedIn, is the latest messaging technology being adopted by most of the top internet companies. The purpose of this paper is to provide a GUI and search tool to view and monitor messages inside Kafka.