The Zipf-Polylog distribution: Modeling human interactions through social networks
The Zipf distribution attracts considerable attention because it helps describe data from natural as well as man-made systems. Nevertheless, in most cases the Zipf is only appropriate for fitting data in the upper tail. This is why it is important to have Zipf extensions that can fit the data over its entire range. In this paper, we introduce the Zipf-Polylog family of distributions as a two-parameter generalization of the Zipf. The extended family contains the Zipf, the geometric, the logarithmic series and the shifted negative binomial with two successes as particular distributions. We deduce important properties of the new family and demonstrate its suitability by analyzing the degree sequences of two real networks over their entire range. Peer Reviewed. Postprint (author's final draft).
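As a rough illustration of how such a family can nest the distributions named in the abstract, the sketch below assumes the probability mass function P(X = x) = β^x x^(-α) / Li_α(β) for x = 1, 2, ..., where Li_α is the polylogarithm. The abstract does not state this formula, so the parameterization, the function names and the series truncation length are all assumptions made for illustration.

```python
import math


def zipf_polylog_pmf(x, alpha, beta, terms=10_000):
    """Assumed pmf: beta**x * x**(-alpha) / Li_alpha(beta), for x = 1, 2, ...

    The normalising polylogarithm Li_alpha(beta) is approximated by a
    truncated series, so this sketch is only meant for 0 < beta < 1.
    """
    norm = sum(beta**k * k ** (-alpha) for k in range(1, terms + 1))
    return beta**x * x ** (-alpha) / norm


def geometric_pmf(x, beta):
    # Geometric on x = 1, 2, ... with success probability 1 - beta.
    return (1 - beta) * beta ** (x - 1)


def log_series_pmf(x, beta):
    # Logarithmic series distribution on x = 1, 2, ...
    return -(beta**x) / (x * math.log(1 - beta))


def shifted_negbin2_pmf(x, beta):
    # Negative binomial with two successes, shifted to start at x = 1.
    return x * beta ** (x - 1) * (1 - beta) ** 2


if __name__ == "__main__":
    beta = 0.5
    for x in (1, 2, 5):
        # Under the assumed pmf: alpha = 0, 1 and -1 give the three special cases.
        print(x,
              zipf_polylog_pmf(x, 0, beta), geometric_pmf(x, beta),
              zipf_polylog_pmf(x, 1, beta), log_series_pmf(x, beta),
              zipf_polylog_pmf(x, -1, beta), shifted_negbin2_pmf(x, beta))
```

Under this assumed form, setting β = 1 with α > 1 would give the Zipf itself, although the truncated series used here converges slowly in that case.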
Pareto and Zipf laws for city size distribution
Pareto and Zipf distributions have been used to model distinct phenomena in biology, demography, computer science, and economics, amongst others. This paper presents a short review of applications of these distributions to city sizes.
A review of power laws in real life phenomena
Power law distributions, also known as heavy tail distributions, model distinct real life
phenomena in the areas of biology, demography, computer science, economics, information
theory, language, and astronomy, amongst others. This paper presents a
review of the literature, with a view to applications and possible explanations for the
use of power laws in real phenomena. We also unravel some controversies around power
laws.
MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms
The increasing size of input graphs for graph neural networks (GNNs)
highlights the demand for using multi-GPU platforms. However, existing
multi-GPU GNN systems optimize the computation and communication individually
based on the conventional practice of scaling dense DNNs. For irregularly
sparse and fine-grained GNN workloads, such solutions miss the opportunity to
jointly schedule/optimize the computation and communication operations for
high-performance delivery. To this end, we propose MGG, a novel system design
to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its
novel dynamic software pipeline to facilitate fine-grained
computation-communication overlapping within a GPU kernel. Specifically, MGG
introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to
facilitate workload balancing and operation overlapping. MGG also incorporates
an intelligent runtime design with analytical modeling and optimization
heuristics to dynamically improve the execution performance. Extensive
evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems
across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL,
MGG-UVM, and ROC, respectively.
A Comparative Study Between Two Discrete Lindley Distributions
Methods for generating a probability function from a probability density function have been widely used in recent years. In general, the discretization process produces probability functions that can rival the traditional distributions used in the analysis of count data, such as the geometric, the Poisson and the negative binomial distributions. In this paper, using the method based on an infinite series, we study an alternative discrete Lindley distribution to those studied in Gomez (2011) and Bakouch (2014). For both distributions, a simulation study is carried out to examine the bias and mean squared error of the maximum likelihood estimators of the parameters, as well as the coverage probability and the width of the confidence intervals. For the discrete Lindley distribution obtained by the infinite series method, we present the analytical expression for bias reduction of the maximum likelihood estimator. Some examples using real data from the literature show the potential of these distributions.
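The infinite series method mentioned in this abstract can be sketched as follows, assuming the usual form of the method, P(X = x) = f(x) / Σ_j f(j) over x = 0, 1, 2, ..., and the standard Lindley density f(x; θ) = θ² (1 + x) e^(-θx) / (1 + θ). The support starting at 0, the function names and the truncation length are assumptions, and the paper's own parameterization may differ.

```python
import math


def lindley_pdf(x, theta):
    # Standard continuous Lindley density for x >= 0 and theta > 0.
    return theta**2 / (1 + theta) * (1 + x) * math.exp(-theta * x)


def discrete_lindley_pmf(x, theta, terms=2000):
    """Infinite series discretization: f(x) / sum_j f(j), j = 0, 1, 2, ...

    The normalising sum is truncated at `terms`; the tail is negligible
    for moderate theta because the density decays exponentially.
    """
    norm = sum(lindley_pdf(j, theta) for j in range(terms))
    return lindley_pdf(x, theta) / norm


def discrete_lindley_closed_form(x, theta):
    # With q = exp(-theta), sum_j (1 + j) q**j = 1 / (1 - q)**2, so the
    # normalised pmf reduces to (1 - q)**2 * (1 + x) * q**x.
    q = math.exp(-theta)
    return (1 - q) ** 2 * (1 + x) * q**x


if __name__ == "__main__":
    theta = 0.7
    for x in range(5):
        print(x, discrete_lindley_pmf(x, theta),
              discrete_lindley_closed_form(x, theta))
```

Under these assumptions the constant θ² / (1 + θ) cancels in the ratio, which is why the closed form depends on θ only through q = e^(-θ).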
A Behavior-Based Approach To Securing Email Systems
The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an offline data mining and analysis system for email archives, designed to assist in computing models of malicious email behavior for deployment in an online MET system. EMT includes a variety of behavior models for email attachments, user accounts and groups of accounts. Each computed model is used to detect anomalous and errant email behaviors. We report on the set of features implemented in the current version of EMT, and describe tests of the system and our plans for extensions to the set of models.