3 research outputs found

    Pythia: a Framework for the Automated Analysis of Web Hosting Environments

    Get PDF
    A common approach when setting up a website is to utilize third party Web hosting and content delivery networks. Without taking this trend into account, any measurement study inspecting the deployment and operation of websites can be heavily skewed. Unfortunately, the research community lacks generalizable tools that can be used to identify how and where a given website is hosted. Instead, a number of ad hoc techniques have emerged, e.g., using Autonomous System databases, domain prefixes for CNAME records. In this work we propose Pythia, a novel lightweight approach for identifying Web content hosted on third-party infrastructures, including both traditional Web hosts and content delivery networks. Our framework identifies the organization to which a given Web page belongs, and it detects which Web servers are self-hosted and which ones leverage third-party services to provide contents. To test our framework we run it on 40,000 URLs and evaluate its accuracy, both by comparing the results with similar services and with a manually validated groundtruth. Our tool achieves an accuracy of 90% and detects that under 11% of popular domains are self-hosted. We publicly release our tool to allow other researchers to reproduce our findings, and to apply it to their own studies

    Herding Vulnerable Cats: A Statistical Approach to Disentangle Joint Responsibility for Web Security in Shared Hosting

    Full text link
    Hosting providers play a key role in fighting web compromise, but their ability to prevent abuse is constrained by the security practices of their own customers. {\em Shared} hosting, offers a unique perspective since customers operate under restricted privileges and providers retain more control over configurations. We present the first empirical analysis of the distribution of web security features and software patching practices in shared hosting providers, the influence of providers on these security practices, and their impact on web compromise rates. We construct provider-level features on the global market for shared hosting -- containing 1,259 providers -- by gathering indicators from 442,684 domains. Exploratory factor analysis of 15 indicators identifies four main latent factors that capture security efforts: content security, webmaster security, web infrastructure security and web application security. We confirm, via a fixed-effect regression model, that providers exert significant influence over the latter two factors, which are both related to the software stack in their hosting environment. Finally, by means of GLM regression analysis of these factors on phishing and malware abuse, we show that the four security and software patching factors explain between 10\% and 19\% of the variance in abuse at providers, after controlling for size. For web-application security for instance, we found that when a provider moves from the bottom 10\% to the best-performing 10\%, it would experience 4 times fewer phishing incidents. We show that providers have influence over patch levels--even higher in the stack, where CMSes can run as client-side software--and that this influence is tied to a substantial reduction in abuse levels

    The role of hosting providers in fighting command and control infrastructure of financial malware

    No full text
    A variety of botnets are used in attacks on financial services. Banks and security firms invest a lot of effort in detecting and combating malware-assisted takeover of customer accounts. A critical resource of these botnets is their command-and-control (C&C) infrastructure. Attackers rent or compromise servers to operate their C&C infrastructure. Hosting providers routinely take down C&C servers, but the effectiveness of this mitigation strategy depends on understanding how attackers select the hosting providers to host their servers. Do they prefer, for example, providers who are slow or unwilling in taking down C&Cs? In this paper, we analyze 7 years of data on the C&C servers of botnets that have engaged in attacks on financial services. Our aim is to understand whether attackers prefer certain types of providers or whether their C&Cs are randomly distributed across the whole attack surface of the hosting industry. We extract a set of structural properties of providers to capture the attack surface. We model the distribution of C&Cs across providers and show that the mere size of the provider can explain around 71% of the variance in the number of C&Cs per provider, whereas the rule of law in the country only explains around 1%. We further observe that price, time in business, popularity and ratio of vulnerable websites of providers relate signi ficantly with C&C counts. Finally, we find that the speed with which providers take down C&C domains has only a weak relation with C&C occurrence rates, adding only 1% explained variance. This suggests attackers have little to no preference for providers who allow long-lived C&C domains.Organisation and Governanc
    corecore