122 research outputs found

    Early detection of spam-related activity

    Spam, the distribution of unsolicited bulk email, is a major security threat on the Internet. Recent studies show that approximately 70-90% of worldwide email traffic, about 70 billion messages a day, is spam. Spam consumes resources on the network and at mail servers, and it is also used to launch other attacks on users, such as distributing malware or phishing. Spammers have increased their virulence and resilience by sending spam from large collections of compromised machines (“botnets”). Spammers also make heavy use of URLs and domains to direct victims to point-of-sale Web sites, and miscreants register large numbers of domains to evade blacklisting efforts. To mitigate the threat of spam, users and network administrators need proactive techniques to distinguish spammers from legitimate senders and to take down online spam-advertised sites. In this dissertation, we focus on characterizing spam-related activities and developing systems to detect them early. Our work builds on the observation that spammers need to acquire attack agility to be profitable, which creates differences in how spammers and legitimate users interact with Internet services and exposes behavior that is detectable during the early period of an attack. We examine several important components across the spam life cycle, including spam dissemination that aims to reach users' inboxes, the hosting process during which spammers set up DNS servers and Web servers, and the naming process to acquire domain names via registration services. We first develop a new spam-detection system based on network-level features of spamming bots. These lightweight features allow the system to scale better and to be more robust. Next, we analyze DNS resource records and lookups from top-level domain servers during the initial stage after domain registrations, which provides a global view across the Internet to characterize spam hosting infrastructure.
We further examine the domain registration process and present the unique registration behavior of spammers. Finally, we build an early-warning system to identify spammer domains at time-of-registration rather than later at time-of-use. We have demonstrated that our detection systems are effective by using real-world datasets. Our work has also had practical impact. Some of the network-level features that we identified have since been incorporated into spam filtering products at Yahoo! and McAfee, and our work on detecting spammer domains at time-of-registration has directly influenced new projects at Verisign to investigate domain registrations. Ph.D.

    Monitoring the initial DNS behavior of malicious domains

    Attackers often use URLs to advertise scams or propagate malware. Because the reputation of a domain can be used to identify malicious behavior, miscreants often register these domains “just in time” before an attack. This paper explores the DNS behavior of attack domains, as identified by appearance in a spam trap, shortly after the domains were registered. We explore the behavioral properties of these domains from two perspectives: (1) the DNS infrastructure associated with the domain, as is observable from the resource records; and (2) the DNS lookup patterns from networks that are looking up the domains initially. Our analysis yields many findings that may ultimately be useful for early detection of malicious domains. By monitoring the infrastructure for these malicious domains, we find that about 55% of scam domains occur in attacks at least one day after registration, suggesting the potential for early discovery of malicious domains, solely based on properties of the DNS infrastructure that resolves those domains. We also find that there are a few regions of IP address space that host name servers and other types of servers for only malicious domains. Malicious domains have resource records that are distributed more widely across IP address space, and they are more quickly looked up by a variety of different networks. We also identify a set of “tainted” ASes that are used heavily by bad domains to host resource records. The features we observe are often evident before any attack even takes place; ultimately, they might serve as the basis for a DNS-based early warning system for attacks.
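
    The abstract above notes that malicious domains tend to have resource records spread more widely across IP address space. One simple way to approximate such a feature is sketched below. This is an illustration only, not the paper's method: counting distinct /24 prefixes is an assumption made here, `distinct_prefixes` is a hypothetical helper, and the addresses come from the reserved documentation ranges.

```python
import ipaddress


def distinct_prefixes(addresses, prefix_len=24):
    """Count how many distinct IPv4 prefixes a set of record addresses falls into.

    Assumption: the number of distinct /24 blocks is used as a rough proxy
    for how widely a domain's resource records are spread across IP space.
    """
    networks = {
        # strict=False masks the host bits, so 192.0.2.77/24 -> 192.0.2.0/24
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }
    return len(networks)


# Hypothetical A-record addresses for two domains (documentation ranges only).
concentrated = ["192.0.2.1", "192.0.2.2", "192.0.2.77"]
dispersed = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.200"]

spread_concentrated = distinct_prefixes(concentrated)
spread_dispersed = distinct_prefixes(dispersed)
```

    In this toy data the second domain spans more prefixes than the first. A real system would likely combine such a count with other signals the abstract mentions, such as lookup patterns and the ASes hosting the records.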

    Operational Domain Name Classification: From Automatic Ground Truth Generation to Adaptation to Missing Values

    With more than 350 million active domain names and at least 200,000 newly registered domains per day, it is technically and economically challenging for Internet intermediaries involved in domain registration and hosting to monitor them and accurately assess whether they are benign, likely registered with malicious intent, or have been compromised. This observation motivates the design and deployment of automated approaches to support investigators in preventing or effectively mitigating security threats. However, building a domain name classification system suitable for deployment in an operational environment requires meticulous design: from feature engineering and acquiring the underlying data to handling missing values resulting from, for example, data collection errors. The design flaws in some of the existing systems make them unsuitable for such usage despite their high theoretical accuracy. Even worse, they may lead to erroneous decisions, for example, by registrars, such as suspending a benign domain name that has been compromised at the website level, causing collateral damage to the legitimate registrant and website visitors. In this paper, we propose novel approaches to designing domain name classifiers that overcome the shortcomings of some existing systems. We validate these approaches with a prototype based on the COMAR (COmpromised versus MAliciously Registered domains) system focusing on its careful design, automated and reliable ground truth generation, feature selection, and the analysis of the extent of missing values. First, our classifier takes advantage of automatically generated ground truth based on publicly available domain name registration data. 
We then generate a large number of machine-learning models, each dedicated to handling a set of missing features: if we need to classify a domain name with a given set of missing values, we use the model without the missing feature set, thus allowing classification based on all other features. We estimate the importance of features using scatter plots and analyze the extent of missing values due to measurement errors. Finally, we apply the COMAR classifier to unlabeled phishing URLs and find, among other things, that 73% of corresponding domain names are maliciously registered. In comparison, only 27% are benign domains hosting malicious websites. The proposed system has been deployed at two ccTLD registry operators to support their anti-fraud practices.
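
    The model-per-missing-feature-set idea described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the COMAR implementation: the feature names, the training rows, and the nearest-centroid `CentroidModel` are all hypothetical stand-ins for whatever features and learners the actual system uses.

```python
from itertools import combinations

# Hypothetical feature names, chosen only for illustration.
FEATURES = ("domain_age_days", "num_subdomains", "has_tls_cert")


class CentroidModel:
    """Toy nearest-centroid classifier trained on a fixed subset of features."""

    def __init__(self, features):
        self.features = tuple(features)
        self.centroids = {}

    def fit(self, rows, labels):
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            # Skip training rows lacking any feature this model relies on.
            if any(row.get(f) is None for f in self.features):
                continue
            acc = sums.setdefault(label, [0.0] * len(self.features))
            for i, f in enumerate(self.features):
                acc[i] += row[f]
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()
        }
        return self

    def predict(self, row):
        point = [row[f] for f in self.features]

        def sq_dist(centroid):
            return sum((p - q) ** 2 for p, q in zip(point, centroid))

        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def train_model_family(rows, labels):
    """Train one model per non-empty feature subset, keyed by that subset."""
    models = {}
    for k in range(1, len(FEATURES) + 1):
        for subset in combinations(FEATURES, k):
            models[frozenset(subset)] = CentroidModel(subset).fit(rows, labels)
    return models


def classify(models, row):
    """Pick the model trained on exactly the features present in this row."""
    available = frozenset(f for f in FEATURES if row.get(f) is not None)
    if not available:
        raise ValueError("no features available for classification")
    return models[available].predict(row)


# Made-up training data, for demonstration only.
rows = [
    {"domain_age_days": 2, "num_subdomains": 1, "has_tls_cert": 0},
    {"domain_age_days": 5, "num_subdomains": 0, "has_tls_cert": 0},
    {"domain_age_days": 900, "num_subdomains": 12, "has_tls_cert": 1},
    {"domain_age_days": 1500, "num_subdomains": 8, "has_tls_cert": 1},
]
labels = ["maliciously_registered", "maliciously_registered",
          "compromised", "compromised"]

models = train_model_family(rows, labels)
only_age = classify(models, {"domain_age_days": 3, "num_subdomains": None,
                             "has_tls_cert": None})
no_age = classify(models, {"domain_age_days": None, "num_subdomains": 10,
                           "has_tls_cert": 1})
```

    The number of models grows exponentially with the number of features, so a practical deployment would presumably restrict training to the missing-value patterns that actually occur in the data.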

    Proactive Discovery of Phishing Related Domain Names

    The Cooperative Defense Overlay Network: A Collaborative Automated Threat Information Sharing Framework for a Safer Internet

    With the ever-growing proliferation of hardware and software-based computer security exploits and the increasing power and prominence of distributed attacks, network and system administrators are often forced to make a difficult decision: expend tremendous resources on defense from sophisticated and continually evolving attacks from an increasingly dangerous Internet with varying levels of success; or expend fewer resources on defending against common attacks on "low hanging fruit," hoping to avoid the less common but incredibly devastating zero-day worm or botnet attack. Home networks and small organizations are usually forced to choose the latter option and in so doing are left vulnerable to all but the simplest of attacks. While automated tools exist for sharing information about network-based attacks, this sharing is typically limited to administrators of large networks and dedicated security-conscious users, to the exclusion of smaller organizations and novice home users. In this thesis we propose a framework for a cooperative defense overlay network (CODON) in which participants with varying technical abilities and resources can contribute to the security and health of the Internet via automated crowdsourcing, rapid information sharing, and the principle of collateral defense.