Hiding in Plain Sight: A Longitudinal Study of Combosquatting Abuse
Domain squatting is a common adversarial practice where attackers register
domain names that are purposefully similar to popular domains. In this work, we
study a specific type of domain squatting called "combosquatting," in which
attackers register domains that combine a popular trademark with one or more
phrases (e.g., betterfacebook[.]com, youtube-live[.]com). We perform the first
large-scale, empirical study of combosquatting by analyzing more than 468
billion DNS records---collected from passive and active DNS data sources over
almost six years. We find that almost 60% of abusive combosquatting domains
live for more than 1,000 days, and even worse, we observe increased activity
associated with combosquatting year over year. Moreover, we show that
combosquatting is used to perform a spectrum of different types of abuse
including phishing, social engineering, affiliate abuse, trademark abuse, and
even advanced persistent threats. Our results suggest that combosquatting is a
real problem that requires increased scrutiny by the security community.
Comment: ACM CCS 1
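As an illustration of the definition above, a minimal check for combosquatting candidates can be sketched as follows. This is not the paper's measurement pipeline; the trademark list and function name are hypothetical, and a real system would also need to extract the registered domain from the full DNS name.

```python
# Hypothetical sketch: flag combosquatting candidates, i.e. domains whose
# registered label embeds a protected trademark verbatim but is not the
# trademark itself (e.g. betterfacebook.com, youtube-live.com).

TRADEMARKS = {"facebook", "youtube", "paypal"}  # illustrative list

def is_combosquatting(domain: str) -> bool:
    """Return True if the registered label combines a trademark
    with one or more extra characters or phrases."""
    label = domain.lower().split(".")[0]  # registered label only
    for mark in TRADEMARKS:
        if mark in label and label != mark:
            return True
    return False
```

Note that, unlike typosquatting, this check requires the trademark to appear verbatim; a misspelling such as faceb00k would not match, which mirrors how the paper distinguishes the two practices.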
Analysing Java Identifier Names
Identifier names are the principal means of recording and communicating ideas in source code and are a significant source of information for software developers and maintainers, and the tools that support their work. This research aims to increase understanding of identifier name content types - words, abbreviations, etc. - and phrasal structures - noun phrases, verb phrases, etc. - by improving techniques for the analysis of identifier names. The techniques and knowledge acquired can be applied to improve program comprehension tools that support internal code quality, concept location, traceability and model extraction. Previous detailed investigations of identifier names have focused on method names, and the content and structure of Java class and reference (field, parameter, and variable) names are less well understood.
I developed improved algorithms to tokenise names, and trained part-of-speech tagger models on identifier names to support the analysis of class and reference names in a corpus of 60 open source Java projects. I confirm that developers structure the majority of names according to identifier naming conventions, and use phrasal structures reported in the literature. I also show that developers use a wider variety of content types and phrasal structures than previously understood. Unusually structured class names are largely project-specific naming conventions, but could indicate design issues. Analysis of phrasal reference names showed that developers most often use the phrasal structures described in the literature and used to support the extraction of information from names, but also choose unexpected phrasal structures, and complex, multi-phrasal, names.
Using Nominal - software I created to evaluate adherence to naming conventions - I found that developers tend to follow naming conventions, but that adherence to published conventions varies between projects because developers also establish new conventions for the use of typography, content types, and phrasal structure to support their work, particularly to distinguish the roles of Java field names.
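The tokenisation step described above can be illustrated with a simple splitter. This is not the thesis's improved algorithm, only a common baseline: split on underscores and dollar signs, then on camel-case and digit boundaries, handling runs of capitals such as in 'XMLParser'.

```python
import re

# Illustrative baseline tokeniser for Java identifiers (not the thesis's
# algorithm): split on separators, then on camel-case transitions,
# uppercase runs, and digit groups.

def tokenise(identifier: str) -> list[str]:
    tokens = []
    for part in re.split(r"[_$]+", identifier):
        tokens.extend(
            # uppercase run before a capitalised word | capitalised or
            # lowercase word | trailing uppercase run | digit group
            re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
        )
    return tokens
```

For example, tokenise("XMLHttpRequest") yields ["XML", "Http", "Request"], and tokenise("user_name") yields ["user", "name"]. Abbreviation expansion and part-of-speech tagging, as studied in the thesis, would run on these tokens.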
Detecting a stochastic background of gravitational radiation: Signal processing strategies and sensitivities
We analyze the signal processing required for the optimal detection of a
stochastic background of gravitational radiation using laser interferometric
detectors. Starting with basic assumptions about the statistical properties of
a stochastic gravity-wave background, we derive expressions for the optimal
filter function and signal-to-noise ratio for the cross-correlation of the
outputs of two gravity-wave detectors. Sensitivity levels required for
detection are then calculated. Issues related to: (i) calculating the
signal-to-noise ratio for arbitrarily large stochastic backgrounds, (ii)
performing the data analysis in the presence of nonstationary detector noise,
(iii) combining data from multiple detector pairs to increase the sensitivity
of a stochastic background search, (iv) correlating the outputs of 4 or more
detectors, and (v) allowing for the possibility of correlated noise in the
outputs of two detectors are discussed. We briefly describe a computer
simulation which mimics the generation and detection of a simulated stochastic
gravity-wave signal in the presence of simulated detector noise. Numerous
graphs and tables of numerical data for the five major interferometers
(LIGO-WA, LIGO-LA, VIRGO, GEO-600, and TAMA-300) are also given. The treatment
given in this paper should be accessible to both theorists involved in data
analysis and experimentalists involved in detector design and data acquisition.
Comment: 81 pages, 30 postscript figures, REVTE
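For reference, the optimal filter and signal-to-noise ratio discussed above take the following standard form, where $\gamma(f)$ is the overlap reduction function for the detector pair, $\Omega_{\rm gw}(f)$ the spectrum of the background, and $P_1(f)$, $P_2(f)$ the one-sided noise power spectra; normalization conventions vary between treatments, so this is a sketch of the structure rather than a quotation of the paper's equations:

```latex
% Optimal filter for the cross-correlation of two detector outputs:
\tilde{Q}(f) \;\propto\; \frac{\gamma(f)\,\Omega_{\rm gw}(f)}{f^{3}\,P_1(f)\,P_2(f)}

% Optimal signal-to-noise ratio after observation time T:
\mathrm{SNR}^2 \;=\; \left(\frac{3H_0^2}{10\pi^2}\right)^{2} 2T
\int_0^{\infty} df\;
\frac{\gamma^2(f)\,\Omega_{\rm gw}^2(f)}{f^{6}\,P_1(f)\,P_2(f)}
```

The $f^{-3}$ weighting reflects the conversion between $\Omega_{\rm gw}(f)$ and the strain power spectrum, and the noise spectra in the denominator suppress frequency bands where either detector is noisy.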
The NASA Astrophysics Data System: Data Holdings
Since its inception in 1993, the ADS Abstract Service has become an
indispensable research tool for astronomers and astrophysicists worldwide. In
those seven years, much effort has been directed toward improving both the
quantity and the quality of references in the database. From the original
database of approximately 160,000 astronomy abstracts, our dataset has grown
almost tenfold to approximately 1.5 million references covering astronomy,
astrophysics, planetary sciences, physics, optics, and engineering. We collect
and standardize data from approximately 200 journals and present the resulting
information in a uniform, coherent manner. With the cooperation of journal
publishers worldwide, we have been able to place scans of full journal articles
on-line back to the first volumes of many astronomical journals, and we are
able to link to the current versions of articles, abstracts, and datasets for
essentially all of the current astronomy literature. The trend toward
electronic publishing in the field, the use of electronic submission of
abstracts for journal articles and conference proceedings, and the increasingly
prominent use of the World Wide Web to disseminate information have enabled the
ADS to build a database unparalleled in other disciplines.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 24 pages, 1 figure, 6 tables, 3 appendice
A Taxonomy of Privacy-Preserving Record Linkage Techniques
The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because personal identifying data, such as the names, addresses, and dates of birth of individuals, are commonly used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to the successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research.
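One widely studied family of PPRL techniques covered by such taxonomies encodes identifying values into Bloom filters, so that two parties can compare similarity without exchanging the raw names. The following is a minimal sketch of that idea, not any specific scheme from the survey; the filter size and number of hash functions are illustrative, and real deployments need hardening against cryptanalysis of the encodings.

```python
import hashlib

# Illustrative PPRL sketch: encode a string's character bigrams into a
# small Bloom filter (represented as a set of bit positions) and compare
# two encodings with the Dice coefficient. Parameters are illustrative.

BITS = 64          # filter size in bits
NUM_HASHES = 2     # hash functions per bigram

def bloom_encode(value: str) -> set[int]:
    """Map each character bigram of `value` to NUM_HASHES bit positions."""
    bigrams = [value[i:i + 2] for i in range(len(value) - 1)]
    bits = set()
    for gram in bigrams:
        for k in range(NUM_HASHES):
            digest = hashlib.sha256(f"{k}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % BITS)
    return bits

def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient on set bits; close to 1.0 for similar strings."""
    return 2 * len(a & b) / (len(a) + len(b))
```

Similar names such as "christine" and "christina" share most bigrams and hence most bit positions, so their Dice score stays high, while unrelated names score low; this is what lets a third party classify candidate pairs without ever seeing the cleartext values.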
Reliable file transfer across a 10 megabit ethernet
The Ethernet communications network is a broadcast, multi-access system for local computing networks. Such a network was used to connect six 68000-based Charles River Data Systems machines for the purpose of file transfer. Each system required hardware installation and connection to the Ethernet cable. The software is an implementation which conforms to the Xerox PUP File Transfer Protocol Specifications. This required the writing of two programs, the FTP user and the FTP server. Each program was built upon common communication packages which also had to be written. These communication routines transferred data over the Ethernet using the PARC Universal Packet (PUP) format.
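The PUP format mentioned above frames each packet with a fixed 20-byte header. The following sketch reconstructs that layout from the published PUP design (length, transport control, type, identifier, then destination and source ports as network/host/socket triples); it is an illustration rather than the thesis's code, and the checksum shown is a simple placeholder, not PUP's actual ones'-complement add-and-left-cycle checksum.

```python
import struct

# Illustrative reconstruction of a PUP packet (field widths per the
# published PUP layout; checksum is a placeholder, not PUP's real one).
# Header: length(2) transport-control(1) type(1) identifier(4)
#         dest net(1) host(1) socket(4)  src net(1) host(1) socket(4)
PUP_HEADER = struct.Struct(">HBBIBBIBBI")  # 20 bytes, big-endian

def build_pup(ptype: int, ident: int, dst, src, payload: bytes) -> bytes:
    """Build a PUP-style packet. dst/src are (network, host, socket)."""
    length = PUP_HEADER.size + len(payload) + 2   # header + data + checksum
    header = PUP_HEADER.pack(length, 0, ptype, ident,
                             dst[0], dst[1], dst[2],
                             src[0], src[1], src[2])
    checksum = sum(header + payload) & 0xFFFF     # placeholder checksum
    return header + payload + struct.pack(">H", checksum)
```

A file-transfer protocol like the one described would layer its own request and data packet types on top of this framing, with the socket fields distinguishing the FTP user and server endpoints.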
D-FENS: DNS Filtering & Extraction Network System for Malicious Domain Names
While the DNS (Domain Name System) has become a cornerstone for the operation of the Internet, it has also fostered creative forms of abuse, including phishing, typosquatting, and botnet communication. To address this problem, this dissertation focuses on identifying and mitigating such malicious domain names through prior knowledge and machine learning. In the first part of this dissertation, we explore a method of registering domain names with deliberate typographical mistakes (i.e., typosquatting) to masquerade as popular and well-established domain names. To understand the effectiveness of typosquatting, we conducted a user study which helped shed light on which techniques were more successful than others in deceiving users. While certain techniques fared better than others, they failed to take the context of the user into account. Therefore, in the second part of this dissertation we look at the possibility of an advanced attack which takes context into account when generating domain names. The main idea is determining the possibility for an adversary to improve their success rate of deceiving users with specifically targeted malicious domain names. While these malicious domains typically target users, other types of domain names are generated by botnets for command & control (C2) communication. Therefore, in the third part of this dissertation we investigate domain generation algorithms (DGA) used by botnets and propose a method to identify DGA-based domain names. By analyzing DNS traffic for certain patterns of NXDomain (non-existent domain) query responses, we can accurately predict DGA-based domain names before they are registered. Given all of these approaches to malicious domain names, we ultimately propose a system called D-FENS (DNS Filtering & Extraction Network System).
D-FENS uses machine learning and prior knowledge to accurately predict unreported malicious domain names in real-time, thereby preventing Internet devices from unknowingly connecting to a potentially malicious domain name.
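One signal behind the NXDomain analysis described above is that DGA-generated labels look random, while legitimately mistyped names do not. The following is a hypothetical sketch of that intuition, not D-FENS itself: flag a client as likely running a DGA when most of its NXDomain responses carry high-entropy labels (the cutoffs are illustrative).

```python
import math
from collections import Counter

# Hypothetical sketch (not D-FENS): classify a client's stream of
# NXDomain names as DGA-like when the fraction of high-entropy labels
# exceeds a cutoff. Thresholds below are illustrative, not from the work.

def shannon_entropy(label: str) -> float:
    """Per-character Shannon entropy (bits) of a domain label."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_dga(nxdomains: list[str], entropy_cutoff: float = 3.5,
              ratio_cutoff: float = 0.5) -> bool:
    labels = [d.split(".")[0] for d in nxdomains]
    high = sum(1 for lab in labels if shannon_entropy(lab) > entropy_cutoff)
    return high / max(len(labels), 1) >= ratio_cutoff
```

A production classifier, as the dissertation proposes, would combine such lexical features with machine learning over observed DNS traffic rather than rely on a single entropy threshold.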