3 research outputs found
Pythia: a Framework for the Automated Analysis of Web Hosting Environments
A common approach when setting up a website is to utilize third party Web
hosting and content delivery networks. Without taking this trend into account,
any measurement study inspecting the deployment and operation of websites can
be heavily skewed. Unfortunately, the research community lacks generalizable
tools that can be used to identify how and where a given website is hosted.
Instead, a number of ad hoc techniques have emerged, e.g., using Autonomous
System databases or domain prefixes for CNAME records. In this work we propose
Pythia, a novel lightweight approach for identifying Web content hosted on
third-party infrastructures, including both traditional Web hosts and content
delivery networks. Our framework identifies the organization to which a given
Web page belongs, and it detects which Web servers are self-hosted and which
ones leverage third-party services to serve content. To test our framework
we run it on 40,000 URLs and evaluate its accuracy, both by comparing the
results with those of similar services and against a manually validated ground truth. Our
tool achieves an accuracy of 90% and detects that under 11% of popular domains
are self-hosted. We publicly release our tool to allow other researchers to
reproduce our findings and to apply it to their own studies.
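
The abstract mentions CNAME-based matching as one of the ad hoc techniques in use. As a rough illustration of that kind of heuristic (not of Pythia itself, whose method is not described here), the sketch below matches the targets in a CNAME chain against a small suffix table. The function name `classify_by_cname` and the table `CDN_CNAME_SUFFIXES` are hypothetical, the three suffixes are only illustrative, and the CNAME chain is assumed to have been resolved beforehand by some DNS client.

```python
# Illustrative suffix table (an assumption for this sketch, not Pythia's data).
CDN_CNAME_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront",
    "akamaiedge.net": "Akamai",
    "fastly.net": "Fastly",
}


def classify_by_cname(cname_chain):
    """Return the provider for the first CNAME target with a known suffix, else None."""
    for target in cname_chain:
        # Normalize: drop the trailing root dot and ignore case.
        name = target.rstrip(".").lower()
        for suffix, provider in CDN_CNAME_SUFFIXES.items():
            # Match whole DNS labels only, so "notfastly.net" does not match "fastly.net".
            if name == suffix or name.endswith("." + suffix):
                return provider
    return None
```

A heuristic like this only recognizes providers that are already in the table and says nothing about sites with no CNAME record, which is the sort of limitation the abstract attributes to ad hoc techniques.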
Herding Vulnerable Cats: A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting
Hosting providers play a key role in fighting web compromise, but their
ability to prevent abuse is constrained by the security practices of their own
customers. Shared hosting offers a unique perspective since customers
operate under restricted privileges and providers retain more control over
configurations. We present the first empirical analysis of the distribution of
web security features and software patching practices in shared hosting
providers, the influence of providers on these security practices, and their
impact on web compromise rates. We construct provider-level features on the
global market for shared hosting, which contains 1,259 providers, by gathering
indicators from 442,684 domains. Exploratory factor analysis of 15 indicators
identifies four main latent factors that capture security efforts: content
security, webmaster security, web infrastructure security and web application
security. We confirm, via a fixed-effect regression model, that providers exert
significant influence over the latter two factors, which are both related to
the software stack in their hosting environment. Finally, by means of GLM
regression analysis of these factors on phishing and malware abuse, we show
that the four security and software patching factors explain between 10% and
19% of the variance in abuse at providers, after controlling for size. For
web application security, for instance, we find that a provider moving from
the bottom 10% to the best-performing 10% would experience 4 times fewer
phishing incidents. We show that providers have influence over patch
levels, even higher in the stack, where CMSes can run as client-side
software, and that this influence is tied to a substantial reduction in abuse
levels.
The role of hosting providers in fighting command and control infrastructure of financial malware
A variety of botnets are used in attacks on financial services. Banks and security firms invest a lot of effort in detecting and combating malware-assisted takeover of customer accounts. A critical resource of these botnets is their command-and-control (C&C) infrastructure. Attackers rent or compromise servers to operate their C&C infrastructure. Hosting providers routinely take down C&C servers, but the effectiveness of this mitigation strategy depends on understanding how attackers select the hosting providers to host their servers. Do they prefer, for example, providers who are slow or unwilling to take down C&Cs? In this paper, we analyze 7 years of data on the C&C servers of botnets that have engaged in attacks on financial services. Our aim is to understand whether attackers prefer certain types of providers or whether their C&Cs are randomly distributed across the whole attack surface of the hosting industry. We extract a set of structural properties of providers to capture the attack surface. We model the distribution of C&Cs across providers and show that the mere size of the provider can explain around 71% of the variance in the number of C&Cs per provider, whereas the rule of law in the country only explains around 1%. We further observe that price, time in business, popularity and ratio of vulnerable websites of providers relate significantly to C&C counts. Finally, we find that the speed with which providers take down C&C domains has only a weak relation with C&C occurrence rates, adding only 1% explained variance. This suggests attackers have little to no preference for providers who allow long-lived C&C domains.