    The Zipf–Poisson-stopped-sum distribution with an application for modeling the degree sequence of social networks

    Under the Zipf distribution, the frequency of a value is a power function of its size. Thus, when the frequencies of data following this distribution are plotted against size on a log–log scale, one obtains a straight line. The Zipf distribution has been assumed to be appropriate for modeling highly skewed data from many different areas. Nevertheless, for many real data sets the linearity is observed only in the tail; thus, the Zipf is fitted only to values larger than a given threshold and, consequently, information is lost. The Zipf–Poisson-stopped-sum distribution is introduced as a more flexible alternative. It is proven that, on a log–log scale, it allows for top-concavity while maintaining linearity in the tail. Consequently, the distribution properly fits many data sets over their entire range. To demonstrate the suitability of our model, 16 network degree sequences describing the interaction between members of a given platform have been fitted. The results have been compared with the fits obtained through other bi-parametric distributions.
    Peer Reviewed. Postprint (author's final draft).
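
    To make the construction concrete, here is a minimal Python sketch; it is not the authors' code and not the zipfextR interface. It assumes the Zipf pmf P(X = x) = x^(-alpha) / zeta(alpha) for x = 1, 2, ... with alpha > 1, and takes the Zipf–Poisson-stopped-sum to be Y = X_1 + ... + X_N, where N ~ Poisson(lam) is independent of the i.i.d. Zipf variables X_i. The pmf of Y is computed with the Panjer recursion for compound Poisson sums, a standard technique swapped in here for illustration. The helper names (`zeta`, `zipf_pmf`, `zipf_pss_pmf`) are hypothetical.

```python
import math


def zeta(alpha: float, n_terms: int = 1000) -> float:
    """Riemann zeta(alpha), alpha > 1: partial sum for k < n_terms plus an
    Euler-Maclaurin estimate of the tail sum over k >= n_terms."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1")
    partial = sum(k ** -alpha for k in range(1, n_terms))
    n = float(n_terms)
    tail = (n ** (1 - alpha) / (alpha - 1)       # integral of x^-alpha from n to infinity
            + n ** -alpha / 2                    # f(n) / 2
            + alpha * n ** (-alpha - 1) / 12)    # -f'(n) / 12
    return partial + tail


def zipf_pmf(x: int, alpha: float) -> float:
    """P(X = x) = x^-alpha / zeta(alpha) for x = 1, 2, ...; zero otherwise."""
    return x ** -alpha / zeta(alpha) if x >= 1 else 0.0


def zipf_pss_pmf(max_y: int, alpha: float, lam: float) -> list[float]:
    """P(Y = y) for y = 0..max_y, with Y = X_1 + ... + X_N,
    N ~ Poisson(lam), X_i ~ Zipf(alpha) i.i.d. (Panjer recursion)."""
    z = zeta(alpha)
    f = [0.0] + [k ** -alpha / z for k in range(1, max_y + 1)]  # Zipf pmf, f[0] = 0
    p = [math.exp(-lam)]  # P(Y = 0) = P(N = 0)
    for y in range(1, max_y + 1):
        # P(Y = y) = (lam / y) * sum_{k=1}^{y} k * f(k) * P(Y = y - k)
        s = sum(k * f[k] * p[y - k] for k in range(1, y + 1))
        p.append(lam / y * s)
    return p


if __name__ == "__main__":
    probs = zipf_pss_pmf(20, alpha=2.0, lam=1.5)
    for y in (1, 2, 5, 10, 20):
        # (log y, log P(Y = y)) pairs for a log-log inspection
        print(y, math.log(y), math.log(probs[y]))
```

    For the Zipf itself, log P(X = x) = -alpha * log(x) - log(zeta(alpha)), which is the straight line of slope -alpha mentioned above. Plotting log P(Y = y) against log(y) for the stopped sum is a quick way to look at the top-concavity and tail linearity that the paper proves; the recursion costs O(max_y^2) operations.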

    Zipf extensions and their applications for modeling the degree sequences of real networks

    The Zipf distribution, also known as the discrete Pareto distribution, attracts considerable attention because it helps describe skewed data from many natural as well as man-made systems. Under the Zipf distribution, the frequency of a given value is a power function of its size. Consequently, when the frequencies of data following this distribution are plotted against size on a log–log scale, one obtains a straight line. Nevertheless, for many data sets the linearity is observed only in the tail, and when this happens the Zipf is fitted only to values larger than a given threshold. This procedure implies a loss of information and, unless one is interested only in the tail of the distribution, it shows the need for more flexible alternative distributions. The work conducted in this thesis revolves around four bi-parametric extensions of the Zipf distribution. The first two belong to the class of Random Stopped Extreme distributions. The third extension results from applying the concept of Poisson-Stopped-Sum to the Zipf distribution, and the last one is obtained by including an additional parameter in the probability generating function of the Zipf. An interesting characteristic of three of the models presented is that their parameters have a direct interpretation that gives some insight into the mechanism that generates the data. In order to analyze the performance of these models, we have fitted the degree sequences of real networks from different areas, such as social networks, protein interaction networks, and collaboration networks. The fits obtained have been compared with those obtained with other bi-parametric models such as the Zipf-Mandelbrot, the discrete Weibull, and the negative binomial. To facilitate the use of the models presented, they have been implemented in the zipfextR package, available on the Comprehensive R Archive Network (CRAN).
    Estadística i Investigació Operativa
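
    The Random Stopped Extreme idea can be sketched the same way: Y = max(X_1, ..., X_N), with the X_i i.i.d. Zipf(alpha) and N a random count. The abstract does not state which count distribution the thesis uses, so this minimal Python sketch assumes, for illustration only, that N is zero-truncated Poisson(lam). Under that assumption P(Y <= y) = E[F(y)^N] = (exp(lam * F(y)) - 1) / (exp(lam) - 1), where F is the Zipf cdf. All function names are hypothetical and do not correspond to the zipfextR interface.

```python
import math


def zeta(alpha: float, n_terms: int = 1000) -> float:
    """Riemann zeta(alpha), alpha > 1: partial sum for k < n_terms plus an
    Euler-Maclaurin estimate of the tail sum over k >= n_terms."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1")
    partial = sum(k ** -alpha for k in range(1, n_terms))
    n = float(n_terms)
    tail = n ** (1 - alpha) / (alpha - 1) + n ** -alpha / 2 + alpha * n ** (-alpha - 1) / 12
    return partial + tail


def zipf_cdf(y: int, alpha: float) -> float:
    """F(y) = P(X <= y) for X ~ Zipf(alpha) on {1, 2, ...}."""
    if y < 1:
        return 0.0
    return sum(k ** -alpha for k in range(1, y + 1)) / zeta(alpha)


def zipf_ztpoisson_max_cdf(y: int, alpha: float, lam: float) -> float:
    """P(Y <= y) for Y = max(X_1, ..., X_N), N ~ zero-truncated Poisson(lam)
    (an illustrative assumption): (exp(lam * F(y)) - 1) / (exp(lam) - 1)."""
    return math.expm1(lam * zipf_cdf(y, alpha)) / math.expm1(lam)


def zipf_ztpoisson_max_pmf(y: int, alpha: float, lam: float) -> float:
    """P(Y = y) as the difference of consecutive cdf values."""
    return zipf_ztpoisson_max_cdf(y, alpha, lam) - zipf_ztpoisson_max_cdf(y - 1, alpha, lam)


if __name__ == "__main__":
    for lam in (0.01, 1.0, 5.0):
        print(lam, [round(zipf_ztpoisson_max_pmf(y, 2.0, lam), 4) for y in range(1, 6)])
```

    As lam approaches 0, the zero-truncated Poisson concentrates on N = 1 and the model reduces to the Zipf itself; larger lam moves probability away from small values, which is one way such an extension gains flexibility at the top of the distribution.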