Automated Discovery of Internet Censorship by Web Crawling
Censorship of the Internet is widespread around the world. As access to the
web becomes increasingly ubiquitous, filtering of this resource becomes more
pervasive. Transparency about specific content that citizens are denied access
to is atypical. To counter this, numerous techniques for maintaining URL
filter lists have been proposed by individuals and organisations that aim to
provide empirical data on censorship for the benefit of the public and the
wider censorship research community.
We present a new approach for discovering filtered domains in different
countries. This method is fully automated and requires no human interaction.
The system uses web crawling techniques to traverse between filtered sites and
implements a robust method for determining if a domain is filtered. We
demonstrate the effectiveness of the approach by running experiments to search
for filtered content in four different censorship regimes. Our results show
that we perform better than the current state of the art and have built domain
filter lists an order of magnitude larger than the most widely available public
lists as of Jan 2018. Further, we build a dataset mapping how blocked content
interlinks across domains, exhibiting the tightly networked nature of
censored web resources.
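The abstract does not spell out the crawl or the filtering test, so the
following is a minimal sketch of the described loop, assuming the test
compares direct reachability against an in-country vantage point; the proxy
address and all function names are hypothetical, not the authors'
implementation.

```python
# Hypothetical sketch: crawl outward from known-filtered domains, keeping
# only domains that also test as filtered. Not the authors' implementation.
from collections import deque
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

VANTAGE_PROXY = "http://vantage.example:3128"  # placeholder in-country proxy

def is_filtered(domain: str) -> bool:
    """Rough test: reachable directly but not via the in-country vantage."""
    def reachable(proxies=None):
        try:
            return requests.get(f"http://{domain}", timeout=10,
                                proxies=proxies).ok
        except requests.RequestException:
            return False
    return reachable() and not reachable({"http": VANTAGE_PROXY})

def outgoing_domains(html: str) -> set[str]:
    """Collect the hostnames of all outgoing links on a page."""
    soup = BeautifulSoup(html, "html.parser")
    return {urlparse(a["href"]).netloc
            for a in soup.find_all("a", href=True)
            if urlparse(a["href"]).netloc}

def crawl_filtered(seeds: set[str], limit: int = 1000) -> set[str]:
    """Traverse links between sites, returning domains that test as filtered."""
    queue, seen, blocked = deque(seeds), set(seeds), set()
    while queue and len(blocked) < limit:
        domain = queue.popleft()
        if not is_filtered(domain):
            continue
        blocked.add(domain)
        try:
            # Fetch directly (outside the censored network) so that pages
            # blocked in-country can still be downloaded and parsed.
            html = requests.get(f"http://{domain}", timeout=10).text
        except requests.RequestException:
            continue
        for neighbour in outgoing_domains(html) - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return blocked
```

Fetching pages from outside the censored network keeps the crawl moving even
when a domain is unreachable from within it, which is consistent with the
paper's observation that blocked resources link densely to one another.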
FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs
Various methods have been proposed for creating and maintaining lists of
potentially filtered URLs to allow for measurement of ongoing internet
censorship around the world. Whilst testing a known resource for evidence of
filtering can be relatively simple, given appropriate vantage points,
discovering previously unknown filtered web resources remains an open
challenge.
We present a new framework for automating the process of discovering filtered
resources through the use of adaptive queries to well-known search engines. Our
system applies information retrieval algorithms to isolate characteristic
linguistic patterns in known filtered web pages; these are then used as the
basis for web search queries. The results of these queries are then checked for
evidence of filtering, and newly discovered filtered resources are fed back
into the system to detect further filtered content.
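The framework's steps map naturally onto a small feedback loop. The skeleton
below assumes TF-IDF term ranking for the information-retrieval step (the
abstract names only "information retrieval algorithms"), and the three
stubbed functions stand in for a search-engine API, a filtering test, and a
page fetcher that the abstract does not specify.

```python
# Skeleton of the adaptive discovery loop described above. TF-IDF term
# ranking stands in for the unspecified information-retrieval step, and the
# three stubbed functions are assumptions, not the authors' API.
from sklearn.feature_extraction.text import TfidfVectorizer

def search(query: str) -> list[str]:
    """Stub: submit `query` to a well-known search engine, return result URLs."""
    raise NotImplementedError

def is_filtered(url: str) -> bool:
    """Stub: test `url` for evidence of filtering from a vantage point."""
    raise NotImplementedError

def fetch_text(url: str) -> str:
    """Stub: return the visible text of the page at `url`."""
    raise NotImplementedError

def characteristic_terms(pages: list[str], top_k: int = 5) -> list[str]:
    """Rank the terms most characteristic of the known-filtered pages."""
    vec = TfidfVectorizer(max_features=1000, stop_words="english")
    scores = vec.fit_transform(pages).sum(axis=0).A1
    terms = vec.get_feature_names_out()
    return [t for _, t in sorted(zip(scores, terms), reverse=True)[:top_k]]

def discover(seed_urls: set[str], rounds: int = 3) -> set[str]:
    """Alternate between querying characteristic terms and testing the hits,
    feeding each newly confirmed filtered URL back into the next round."""
    filtered = set(seed_urls)
    for _ in range(rounds):
        pages = [fetch_text(u) for u in filtered]
        for term in characteristic_terms(pages):
            for url in search(term):
                if url not in filtered and is_filtered(url):
                    filtered.add(url)
    return filtered
```

Because each round's confirmed URLs are added to the corpus before terms are
re-ranked, the queries adapt toward whatever vocabulary is currently being
targeted.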
Our implementation of this framework, applied to China as a case study, shows
that this approach is demonstrably effective at detecting significant numbers
of previously unknown filtered web pages, making a significant contribution to
the ongoing detection of internet filtering as it develops.
Our tool is currently deployed and has been used to discover 1355 domains
that are poisoned within China as of Feb 2017 - 30 times more than are
contained in the most widely-used public filter list. Of these, 759 are outside
of the Alexa Top 1000 domains list, demonstrating the capability of this
framework to find more obscure filtered content. Further, our initial analysis
of filtered URLs, and the search terms that were used to discover them, gives
further insight into the nature of the content currently being blocked in
China.
Comment: To appear in "Network Traffic Measurement and Analysis Conference
2017" (TMA2017).
Understanding ethical concerns in social media privacy studies
There are myriad ethical considerations in conducting social media studies,
in particular those investigating privacy concerns on such sites. We are
interested in understanding how to address these concerns, and in particular
wish to discuss our empirical work at this workshop and how to progress
further in this space.
Do Social Bots Dream of Electric Sheep? A Categorisation of Social Media Bot Accounts
So-called 'social bots' have garnered a lot of attention lately. Previous
research showed that they attempted to influence political events such as the
Brexit referendum and the US presidential elections. It remains, however,
somewhat unclear what exactly can be understood by the term 'social bot'. This
paper addresses the need to better understand the intentions of bots on social
media and to develop a shared understanding of how 'social' bots differ from
other types of bots. We thus describe a systematic review of publications that
researched bot accounts on social media. Based on the results of this
literature review, we propose a scheme for categorising bot accounts on social
media sites. Our scheme groups bot accounts by two dimensions - Imitation of
human behaviour and Intent.
Comment: Accepted for publication in the Proceedings of the Australasian
Conference on Information Systems, 2017.
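As a reading aid only, the proposed two-dimensional scheme could be rendered
as a small data structure; the level names below are invented for
illustration, since the abstract names just the dimensions themselves.

```python
# Illustrative encoding of the proposed two-dimensional scheme; the enum
# levels are assumptions, only the two dimensions come from the abstract.
from dataclasses import dataclass
from enum import Enum

class Imitation(Enum):
    LOW = "little or no imitation of human behaviour"
    HIGH = "convincing imitation of human behaviour"

class Intent(Enum):
    BENIGN = "benign"
    MALICIOUS = "malicious"

@dataclass
class BotAccount:
    handle: str
    imitation: Imitation
    intent: Intent

# Example: a news-feed bot that does not pretend to be human.
news_bot = BotAccount("@wire_headlines", Imitation.LOW, Intent.BENIGN)
```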
mHealth Research Applied to Regulated and Unregulated Behavioral Health Sciences
Behavioral scientists are developing new methods and frameworks that leverage mobile health technologies to optimize individual-level behavior change. Pervasive sensors and mobile apps allow researchers to passively observe human behaviors “in the wild” 24/7, which supports the delivery of personalized interventions in real-world environments. This is possible because these technologies contain a rich array of sensors: applications can constantly record user location, contextualize current environmental conditions through barometers, thermometers, and ambient light sensors, and capture audio and video of the user and their surroundings through integrated high-definition cameras and microphones. These tools are a game changer in behavioral health research and, not surprisingly, introduce the new ethical, regulatory/legal, and social implications described in this article.
Worse Than Spam: Issues In Sampling Software Developers
Background: Reaching out to professional software developers is a crucial
part of empirical software engineering research. One important method to
investigate the state of practice is survey research. As drawing a random
sample of professional software developers for a survey is rarely possible,
researchers rely on various sampling strategies. Objective: In this paper, we
report on our experience with different sampling strategies we employed,
highlight ethical issues, and motivate the need to maintain a collection of key
demographics about software developers to ease the assessment of the external
validity of studies. Method: Our report is based on data from two studies we
conducted in the past. Results: Contacting developers over public media proved
to be the most effective and efficient sampling strategy. However, we
describe not only the perspective of researchers interested in goals such as
a large number of participants or a high response rate, but also shed light
on the ethical implications of different sampling strategies. We present one
specific ethical guideline and point to debates in other research communities
to start a discussion in the software engineering research community about
which sampling strategies should be considered ethical.
Comment: 6 pages, 2 figures, Proceedings of the 2016 ACM/IEEE International
Symposium on Empirical Software Engineering and Measurement (ESEM 2016), ACM,
2016.