Automated Discovery of Internet Censorship by Web Crawling
Censorship of the Internet is widespread around the world. As access to the
web becomes increasingly ubiquitous, filtering of this resource becomes more
pervasive. Transparency about specific content that citizens are denied access
to is atypical. To counter this, numerous techniques for maintaining URL filter
lists have been proposed by various individuals and organisations that aim to
provide empirical data on censorship for the benefit of the public and the
wider censorship research community.
We present a new approach for discovering filtered domains in different
countries. This method is fully automated and requires no human interaction.
The system uses web crawling techniques to traverse between filtered sites and
implements a robust method for determining if a domain is filtered. We
demonstrate the effectiveness of the approach by running experiments to search
for filtered content in four different censorship regimes. Our results show
that we perform better than the current state of the art and have built domain
filter lists an order of magnitude larger than the most widely available public
lists as of Jan 2018. Further, we build a dataset mapping the interlinking
nature of blocked content between domains and exhibit the tightly networked
nature of censored web resources.
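The crawl described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a breadth-first traversal that follows links only from domains already found to be filtered, and it takes two hypothetical callables, `is_filtered` (standing in for the paper's filtering test) and `outlinks` (standing in for link extraction, presumably done from an uncensored vantage point). The domain names and the toy link graph are invented for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def discover_filtered(
    seeds: List[str],
    is_filtered: Callable[[str], bool],
    outlinks: Callable[[str], Iterable[str]],
    max_tests: int = 1000,
) -> Set[str]:
    """Breadth-first search that expands only from filtered domains."""
    seen = set(seeds)      # domains already queued, to avoid retesting
    queue = deque(seeds)
    filtered: Set[str] = set()
    tested = 0
    while queue and tested < max_tests:
        domain = queue.popleft()
        tested += 1
        if not is_filtered(domain):
            continue       # assumption: unfiltered domains are not expanded
        filtered.add(domain)
        for linked in outlinks(domain):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return filtered


# Toy data (hypothetical): a link graph and a set of blocked domains.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["e.example"],
}
BLOCKED = {"a.example", "b.example", "d.example"}

found = discover_filtered(
    ["a.example"],
    is_filtered=lambda d: d in BLOCKED,
    outlinks=lambda d: LINKS.get(d, []),
)
print(sorted(found))
```

Passing the filtering test and the link extractor in as callables keeps the traversal independent of any particular measurement technique, so the sketch runs without network access.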
A Churn for the Better: Localizing Censorship using Network-level Path Churn and Network Tomography
Recent years have seen the Internet become a key vehicle for citizens around
the globe to express political opinions and organize protests. This fact has
not gone unnoticed, with countries around the world repurposing network
management tools (e.g., URL filtering products) and protocols (e.g., BGP, DNS)
for censorship. However, repurposing these products can have unintended
international impact, which we refer to as "censorship leakage". While there
have been anecdotal reports of censorship leakage, there has yet to be a
systematic study of censorship leakage at a global scale. In this paper, we
combine a global censorship measurement platform (ICLab) with a general-purpose
technique -- boolean network tomography -- to identify which AS on a network
path is performing censorship. At a high level, our approach exploits BGP churn
to narrow down the set of potential censoring ASes by over 95%. We exactly
identify 65 censoring ASes and find that the anomalies introduced by 24 of the
65 censoring ASes have an impact on users located in regions outside the
jurisdiction of the censoring AS, resulting in the leaking of regional
censorship policies.
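The idea behind boolean network tomography can be sketched as a simple elimination procedure. This is a simplified illustration, not the paper's actual formulation: it assumes noise-free measurements, assumes a censoring AS interferes on every path it appears on, and uses invented AS numbers. Every AS on a path with no anomaly is exonerated; an anomalous path whose remaining suspects shrink to a single AS identifies that AS exactly. Path churn helps because alternative routes between the same endpoints yield additional clean paths.

```python
from typing import Iterable, List, Set, Tuple

# One observation: the AS-level path and whether an anomaly was seen on it.
Observation = Tuple[List[int], bool]


def localize_censors(
    observations: Iterable[Observation],
) -> Tuple[Set[int], Set[int]]:
    """Return (exactly identified censors, still-ambiguous suspects)."""
    observations = list(observations)

    # Assumption: an AS seen on any clean path does not censor.
    clean: Set[int] = set()
    for path, anomalous in observations:
        if not anomalous:
            clean.update(path)

    definite: Set[int] = set()
    suspects: Set[int] = set()
    for path, anomalous in observations:
        if anomalous:
            remaining = set(path) - clean
            suspects |= remaining
            if len(remaining) == 1:
                definite |= remaining  # only one possible culprit
    return definite, suspects - definite


# Toy measurements (hypothetical AS numbers).
observations = [
    ([100, 200, 300], True),   # anomaly while routed via AS 200
    ([100, 400, 300], False),  # route churned to AS 400: no anomaly
    ([500, 600, 700], True),   # anomaly, two suspects remain
    ([500, 800], False),
]
definite, ambiguous = localize_censors(observations)
print(sorted(definite), sorted(ambiguous))
```

In the toy data, the route change from AS 200 to AS 400 exonerates ASes 100 and 300, which leaves AS 200 as the only explanation for the first anomaly, while ASes 600 and 700 cannot be told apart without further path diversity.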