When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors
Finding similar user pairs is a fundamental task in social networks, with
numerous applications in ranking and personalization tasks such as link
prediction and tie strength detection. A common manifestation of user
similarity is based upon network structure: each user is represented by a
vector that represents the user's network connections, where pairwise cosine
similarity among these vectors defines user similarity. The predominant task
for user similarity applications is to discover all similar pairs that have a
pairwise cosine similarity value larger than a given threshold σ. In
contrast to previous work where σ is assumed to be quite close to 1, we
focus on recommendation applications where σ is small, but still
meaningful. The all pairs cosine similarity problem is computationally
challenging on networks with billions of edges, and especially so for settings
with small σ. To the best of our knowledge, there is no practical solution
for computing all user pairs at such small thresholds on large social networks,
even using the power of distributed algorithms.
Our work directly addresses this challenge by introducing a new algorithm ---
WHIMP --- that solves this problem efficiently in the MapReduce model. The key
insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for
approximate matrix multiplication with the SimHash random projection techniques
of Charikar. We provide a theoretical analysis of WHIMP, proving that it has
near optimal communication costs while maintaining computation cost comparable
with the state of the art. We also empirically demonstrate WHIMP's scalability
by computing all highly similar pairs on four massive data sets, and show that
it accurately finds high similarity pairs. In particular, we note that WHIMP
successfully processes the entire Twitter network, which has tens of billions
of edges.
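The SimHash half of this combination is easy to sketch: Charikar's random-hyperplane signatures turn cosine similarity into Hamming distance on short bit vectors, so similarity can be estimated without touching the original high-dimensional connection vectors. Below is a minimal illustrative sketch (function names and parameters are our own, not WHIMP's):

```python
import numpy as np

def simhash_signatures(vectors, num_bits=256, seed=0):
    """Charikar-style random-hyperplane signatures: bit b of a vector's
    signature is the sign of its dot product with the b-th random plane."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((vectors.shape[1], num_bits))
    return vectors @ planes > 0  # boolean matrix, one row per vector

def estimated_cosine(sig_a, sig_b):
    """Two signature bits agree with probability 1 - theta/pi, so the
    fraction of disagreeing bits estimates the angle theta."""
    hamming_frac = np.mean(sig_a != sig_b)
    return np.cos(np.pi * hamming_frac)

rng = np.random.default_rng(1)
u = rng.standard_normal(100)
v = u + 0.3 * rng.standard_normal(100)  # a highly similar vector
true_cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
sigs = simhash_signatures(np.vstack([u, v]), num_bits=4096)
approx = estimated_cosine(sigs[0], sigs[1])
```

With enough bits the estimate concentrates tightly around the true cosine, which is what makes the signatures usable as a cheap filter before exact verification.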
Spatially and Temporally Directed Noise Cancellation Using Federated Learning
Machine learning models can be trained to cancel noise of diverse types or spectral characteristics, e.g. traffic noise, background chatter, etc. Such models are trained by feeding training data that includes labeled noise waveforms, which is an expensive and time-consuming procedure. Further, the effectiveness of such machine learning models is limited in canceling types of noise absent from training data. Trained models occupy significant amounts of memory, which limits their use in consumer devices. This disclosure describes the use of federated learning techniques to train noise canceling models locally at diverse device locations and times. With user permission, the trained models are tagged with timestamp and location, such that when a user device has time or location matching a particular noise cancellation model, the particular model is provided to the user device. Noise cancellation on the user device is then performed with a compact machine learning model that is suited to the time and location of the user device.
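The dispatch step described above, matching a device's current time and location against tagged models, can be sketched as follows; the data layout, thresholds, and scoring are hypothetical, not taken from the disclosure:

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class TaggedModel:
    """A locally trained noise-cancellation model, tagged (with user
    permission) with where and when its training data was collected."""
    name: str
    hour: int    # hour of day the model was trained for
    lat: float
    lon: float

def pick_model(models, device_hour, device_lat, device_lon,
               max_hour_gap=2, max_dist=0.5):
    """Return the tagged model whose time and location best match the
    device, or None if nothing is close enough (thresholds made up)."""
    best, best_score = None, float("inf")
    for m in models:
        # Hour gap wraps around midnight; distance is a flat approximation
        hour_gap = min(abs(m.hour - device_hour),
                       24 - abs(m.hour - device_hour))
        dist = hypot(m.lat - device_lat, m.lon - device_lon)
        if hour_gap <= max_hour_gap and dist <= max_dist:
            score = hour_gap + dist
            if score < best_score:
                best, best_score = m, score
    return best

models = [
    TaggedModel("rush-hour-traffic", hour=8, lat=37.78, lon=-122.41),
    TaggedModel("evening-chatter", hour=19, lat=37.78, lon=-122.41),
]
chosen = pick_model(models, device_hour=9, device_lat=37.79, device_lon=-122.40)
```

In the scheme described, this lookup would run server-side and only the single matching compact model would be pushed to the device.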
Decremental All-Pairs ALL Shortest Paths and Betweenness Centrality
We consider the all pairs all shortest paths (APASP) problem, which maintains
the shortest path dag rooted at every vertex in a directed graph G=(V,E) with
positive edge weights. For this problem we present a decremental algorithm
(that supports the deletion of a vertex, or weight increases on edges incident
to a vertex). Our algorithm runs in amortized O(\vstar^2 \cdot \log n) time per
update, where n=|V|, and \vstar bounds the number of edges that lie on shortest
paths through any given vertex. Our APASP algorithm can be used for the
decremental computation of betweenness centrality (BC), a graph parameter that
is widely used in the analysis of large complex networks. No nontrivial
decremental algorithm for either problem was known prior to our work. Our
method is a generalization of the decremental algorithm of Demetrescu and
Italiano [DI04] for unique shortest paths, and for graphs with \vstar =O(n), we
match the bound in [DI04]. Thus for graphs with a constant number of shortest
paths between any pair of vertices, our algorithm maintains APASP and BC scores
in amortized time O(n^2 \log n) under decremental updates, regardless of the
number of edges in the graph.
Comment: An extended abstract of this paper will appear in Proc. ISAAC 201
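For readers unfamiliar with the quantities involved, the sketch below computes betweenness centrality statically with Brandes' algorithm on an unweighted toy graph; the paper's setting is weighted, directed, and decremental, so this only illustrates what is being maintained, not how the decremental updates work:

```python
from collections import deque, defaultdict

def betweenness(adj):
    """Static Brandes betweenness centrality for an unweighted directed
    graph given as {vertex: [out-neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record the
        # shortest-path DAG via predecessor lists
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = defaultdict(list)
        order, q = [], deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Directed path a -> b -> c: b lies on the only a-to-c shortest path
scores = betweenness({"a": ["b"], "b": ["c"], "c": []})
```

The decremental algorithm's contribution is to keep exactly these scores (and the underlying shortest-path DAGs) current under deletions without rerunning such a static computation from scratch.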
Comparative Study of Popular Data Mining Algorithms
Data Science is an appealing field in the present world: due to the advancement of science, there is a huge assortment of data which exists in numerous forms. Such data must be handled with care and stored safely so that it can be retrieved as needed. Some of the popular or commonly used algorithms are the Apriori algorithm, K-Means clustering, Support Vector Machines (SVM), and Association Rule Mining algorithms. This paper focuses on the above-mentioned algorithms, and a comparison is made in terms of technique, time utilization, and software, using real-time data examples.
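As an illustration of one of the algorithms compared, here is a minimal level-wise Apriori sketch over toy market baskets; this is our own simplified implementation (full subset pruning of candidates is omitted for brevity), not the paper's code:

```python
def apriori(transactions, min_support):
    """Level-wise Apriori: candidate (k+1)-itemsets are joins of
    frequent k-itemsets, since any frequent set has frequent subsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        counts = {c: sum(c <= t for t in transactions) for c in k_sets}
        survivors = [c for c, n in counts.items() if n >= min_support]
        frequent.update((c, counts[c]) for c in survivors)
        # Join survivors to form candidates one item larger
        k_sets = list({a | b for a in survivors for b in survivors
                       if len(a | b) == len(a) + 1})
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
```

On these four baskets with support threshold 2, the frequent itemsets are the three single items plus {bread, milk} and {bread, butter}; {milk, butter} appears only once and is pruned.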
Coupled discrete/continuum simulations of the impact of granular slugs with clamped beams: stand-off effects
Coupled discrete particle/continuum simulations of the normal (zero obliquity) impact of granular slugs against the centre of deformable, end-clamped beams are reported. The simulations analyse the experiments of Uth et al. (2015), enabling a detailed interpretation of their observations of the temporal evolution of the granular slug and a strong stand-off distance dependence of the structural response. The high velocity granular slugs were generated by the pushing action of a piston and develop a spatial velocity gradient due to elastic energy stored during the loading phase by the piston. The velocity gradient within the “stretching” slug is a strong function of the inter-particle contact stiffness and the time the piston takes to ramp up to its final velocity. Other inter-particle contact properties such as damping and friction are shown to have negligible effect on the evolution of the granular slug. The velocity gradients result in a slug density that decreases with increasing stand-off distance, and therefore the pressure imposed by the slug on the beams is reduced with increasing stand-off. This results in the stand-off dependence of the beam's deflection observed by Uth et al. (2015). The coupled simulations capture both the permanent deflections of the beams and their dynamic deformation modes with a high degree of fidelity. These simulations shed new light on the stand-off effect observed during the loading of structures by shallow-buried explosions.
Error Measures for Noise-Free Surrogate Approximations
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/76064/1/AIAA-2008-901-367.pd
Budget feasible mechanisms on matroids
Motivated by many practical applications, in this paper we study budget feasible mechanisms where the goal is to procure independent sets from matroids. More specifically, we are given a matroid M = (E, I) where each ground (indivisible) element is a selfish agent. The cost of each element (i.e., for selling the item or performing a service) is only known to the element itself. There is a buyer with a budget who has additive valuations over the set of elements E. The goal is to design an incentive compatible (truthful) budget feasible mechanism which procures an independent set of the matroid under the given budget that yields the largest value possible to the buyer. Our result is a deterministic, polynomial-time, individually rational, truthful and budget feasible mechanism with a 4-approximation to the optimal independent set. Then, we extend our mechanism to the setting of matroid intersections, in which the goal is to procure common independent sets from multiple matroids. We show that, given a polynomial-time deterministic blackbox that returns α-approximate solutions to the matroid intersection problem, there exists a deterministic, polynomial-time, individually rational, truthful and budget feasible mechanism with a (3α + 1)-approximation to the optimal common independent set.
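To make the benchmark concrete, the sketch below brute-forces the optimal budget-feasible independent set on a toy uniform matroid. This is the (non-truthful) benchmark the 4-approximate mechanism is measured against, not the mechanism itself, and the costs, values, and rank are made up:

```python
from itertools import combinations

# Hypothetical instance: uniform matroid of rank 2 over agents a..d.
# Each agent has a private cost; the buyer has additive values and a budget.
costs  = {"a": 3, "b": 2, "c": 4, "d": 1}
values = {"a": 5, "b": 4, "c": 7, "d": 3}
budget, rank = 5, 2

def independent(s):
    """Independence oracle for the uniform matroid of the given rank."""
    return len(s) <= rank

def best_feasible_set(agents):
    """Brute-force benchmark: the max-value independent set whose total
    cost fits in the budget."""
    best, best_value = frozenset(), 0
    for k in range(len(agents) + 1):
        for s in combinations(agents, k):
            if independent(s) and sum(costs[i] for i in s) <= budget:
                v = sum(values[i] for i in s)
                if v > best_value:
                    best, best_value = frozenset(s), v
    return best, best_value

opt_set, opt_value = best_feasible_set(costs)
```

A truthful mechanism cannot simply solicit costs and run this search, since agents would misreport; the paper's contribution is a payment scheme that makes truth-telling optimal while losing at most a factor 4 against this benchmark.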
Building a GUI Application for Viewing and Searching Apache Kafka Messages
Apache Kafka is a scalable messaging system that follows the Publish-Subscribe model at its core. Several traditional messaging systems such as MSMQ and RabbitMQ exist, but they have limitations in terms of performance and throughput. Kafka, developed at LinkedIn, is the latest messaging technology being adopted by most of the top internet companies. The purpose of this paper is to provide a GUI and search tool to view and monitor messages inside Kafka.
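The search functionality such a tool needs can be sketched independently of any broker connection. The record fields below mirror what a Kafka consumer client exposes for each fetched record (topic, partition, offset, key, value); the filter function and its parameters are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class KafkaMessage:
    """The fields a consumer client exposes for each fetched record."""
    topic: str
    partition: int
    offset: int
    key: str
    value: str

def search_messages(messages, text=None, topic=None, partition=None):
    """Filter already-fetched messages the way a viewer's search box
    would: substring match on key/value plus topic/partition filters."""
    hits = []
    for m in messages:
        if topic is not None and m.topic != topic:
            continue
        if partition is not None and m.partition != partition:
            continue
        if text is not None and text not in m.value and text not in m.key:
            continue
        hits.append(m)
    return hits

msgs = [
    KafkaMessage("orders", 0, 0, "user-1", '{"item": "book"}'),
    KafkaMessage("orders", 1, 0, "user-2", '{"item": "lamp"}'),
    KafkaMessage("clicks", 0, 0, "user-1", '{"page": "home"}'),
]
book_hits = search_messages(msgs, text="book", topic="orders")
```

In a real tool the `msgs` list would be populated by consuming from a broker (e.g. with a client library such as kafka-python) before the GUI applies filters like these.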
Improving the Hydrodynamic Performance of Diffuser Vanes via Shape Optimization
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/77366/1/AIAA-2007-5551-433.pd