FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs
Various methods have been proposed for creating and maintaining lists of
potentially filtered URLs to allow for measurement of ongoing internet
censorship around the world. Whilst testing a known resource for evidence of
filtering can be relatively simple, given appropriate vantage points,
discovering previously unknown filtered web resources remains an open
challenge.
We present a new framework for automating the process of discovering filtered
resources through the use of adaptive queries to well-known search engines. Our
system applies information retrieval algorithms to isolate characteristic
linguistic patterns in known filtered web pages; these are then used as the
basis for web search queries. The results of these queries are then checked for
evidence of filtering, and newly discovered filtered resources are fed back
into the system to detect further filtered content.
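
The discovery loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract says only that "information retrieval algorithms" are used, so TF-IDF is an assumed stand-in, and `search`, `is_filtered`, `top_terms`, and `discover` are hypothetical names. The caller supplies `search` (query to a mapping of URL to page text) and `is_filtered` (URL to boolean, measured from a suitable vantage point).

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a placeholder for language-aware tokenization."""
    return re.findall(r"\w+", text.lower())


def top_terms(filtered_docs: List[str], background_docs: List[str], k: int = 3) -> List[str]:
    """Rank terms by frequency in filtered pages, weighted by IDF over all pages."""
    all_docs = filtered_docs + background_docs
    doc_freq: Counter = Counter()
    for doc in all_docs:
        doc_freq.update(set(tokenize(doc)))
    term_freq: Counter = Counter()
    for doc in filtered_docs:
        term_freq.update(tokenize(doc))
    n = len(all_docs)
    scores = {t: tf * math.log(n / doc_freq[t]) for t, tf in term_freq.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def discover(
    seed_pages: Dict[str, str],
    background_docs: List[str],
    search: Callable[[str], Dict[str, str]],  # hypothetical: query -> {url: text}
    is_filtered: Callable[[str], bool],       # hypothetical: filtering test for a URL
    rounds: int = 2,
    k: int = 3,
) -> Set[str]:
    """Return newly discovered filtered URLs, feeding each round's finds back in."""
    known = dict(seed_pages)
    for _ in range(rounds):
        new: Dict[str, str] = {}
        for term in top_terms(list(known.values()), background_docs, k):
            for url, text in search(term).items():
                if url not in known and is_filtered(url):
                    new[url] = text
        if not new:
            break  # no further filtered content found; stop early
        known.update(new)
    return set(known) - set(seed_pages)
```

Passing the search and filtering checks in as callables keeps the sketch independent of any particular search engine or measurement setup, both of which are outside what the abstract specifies.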
Our implementation of this framework, applied to China as a case study, shows
that this approach is demonstrably effective at detecting significant numbers
of previously unknown filtered web pages, making a significant contribution to
the ongoing detection of internet filtering as it develops.
Our tool is currently deployed and has been used to discover 1355 domains
that are poisoned within China as of February 2017, 30 times more than are
contained in the most widely-used public filter list. Of these, 759 are outside
of the Alexa Top 1000 domains list, demonstrating the capability of this
framework to find more obscure filtered content. Further, our initial analysis
of filtered URLs, and the search terms that were used to discover them, gives
further insight into the nature of the content currently being blocked in
China. Comment: To appear in "Network Traffic Measurement and Analysis Conference
2017" (TMA2017)
Latent sentiment model for weakly-supervised cross-lingual sentiment classification
In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation, in which sentiment labels are treated as topics. Prior information extracted from English sentiment lexicons through machine translation is incorporated into LSM model learning, where preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show that the weakly-supervised LSM model performs comparably to supervised classifiers such as Support Vector Machines, achieving an average accuracy of 81% over a total of 5484 review documents. Moreover, starting with a generic sentiment lexicon, the LSM model is able to extract highly domain-specific polarity words from text.
Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records---collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community. Comment: ACM CCS 1
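
The definition of combosquatting given above (a popular trademark combined with one or more extra phrases) can be illustrated with a small check. This is a naive sketch under stated assumptions, not the detection method used in the study: `registrable_label` and `is_combosquat` are hypothetical helpers, the label extraction assumes a single-label TLD such as `.com`, and plain substring matching will produce false positives on words that merely contain a trademark.

```python
def registrable_label(domain: str) -> str:
    """Return the label left of the final dot (naive: assumes a single-label TLD)."""
    # Undo the "[.]" defanging used when writing potentially abusive domains.
    cleaned = domain.lower().replace("[.]", ".").strip(".")
    parts = cleaned.split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def is_combosquat(domain: str, trademark: str) -> bool:
    """True if the label contains the trademark plus at least one extra character."""
    label = registrable_label(domain)
    mark = trademark.lower()
    return mark in label and label != mark
```

For instance, `is_combosquat("youtube-live[.]com", "youtube")` holds under this check, whereas the legitimate `youtube.com` does not, because its label is exactly the trademark.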