We collect and analyse messages exchanged on Twitter using two of the
platform's publicly available APIs (the search and stream specifications). We
assess the differences between the two samples, and compare the networks of
communication reconstructed from them. The empirical context is given by
political protests taking place in May 2012: we track online communication
around these protests for the period of one month, and reconstruct the network
of mentions and re-tweets according to the two samples. We find that the search
API over-represents the more central users and does not offer an accurate
picture of peripheral activity; we also find that the bias is greater for the
network of mentions. We discuss the implications of this bias for the study of
diffusion dynamics and collective action in the digital era, and advocate the
need for more uniform sampling procedures in the study of online communication.

Comment: 35 pages, 5 figures, 3 tables