19,640 research outputs found
A study of the personalization of spam content using Facebook public information
Millions of users per day are affected by unsolicited email campaigns. Spam filters detect and block an increasing number of messages, but researchers have quantified a response rate of 0.006% [1], still sufficient to turn a considerable profit when, as spammers do, millions of emails are sent. While current research addresses topics such as better spam filters or spam detection inside online social networks, in this paper we demonstrate that a classic spam model enriched with online social network information can achieve a 7.62% click-through rate. We collect email addresses from the Internet, complete the owners' information using their public social network profile data, and, using a fake website, analyze the response to personalized spam sent to users according to their profiles. Finally, we demonstrate the effectiveness of these profile-based emails in circumventing spam detection, and we compare results between typical spam and personalized spam.
A Broad Evaluation of the Tor English Content Ecosystem
Tor is among the most well-known dark nets in the world. It has noble uses,
including as a platform for free speech and information dissemination under the
guise of true anonymity, but may be culturally better known as a conduit for
criminal activity and as a platform to market illicit goods and data. Past
studies on the content of Tor support this notion, but were carried out by
targeting popular domains likely to contain illicit content. A survey of past
studies may thus not yield a complete evaluation of the content and use of Tor.
This work addresses this gap by presenting a broad evaluation of the content of
the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web
and, through topic and network analysis, characterize the types of information
and services hosted across a broad swath of Tor domains and their hyperlink
relational structure. We recover nine domain types defined by the information
or service they host and, among other findings, unveil how some types of
domains intentionally silo themselves from the rest of Tor. We also present
measurements that (regrettably) suggest how marketplaces of illegal drugs and
services do emerge as the dominant type of Tor domain. Our study is the product
of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a
collection of over 150,000 Tor pages. We intend to make the domain structure
publicly available as a dataset at
https://github.com/wsu-wacs/TorEnglishContent. Comment: 11 pages
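The domain-level hyperlink analysis described above can be sketched with a minimal, stdlib-only graph aggregation. This is an illustrative assumption of how such a step might look, not the authors' pipeline; the data shape and function names are invented for the example:

```python
from collections import defaultdict

def domain_link_graph(page_links):
    """Aggregate page-level hyperlinks into a domain-level directed graph.

    `page_links` is an iterable of (source_domain, linked_domains) pairs,
    one per crawled page. Self-links are dropped so that only the
    cross-domain relational structure remains.
    """
    graph = defaultdict(set)
    for src, targets in page_links:
        graph[src].update(t for t in targets if t != src)
    return graph

def siloed_domains(graph):
    """Domains with no outgoing cross-domain links, i.e. domains that
    silo themselves from the rest of the network."""
    return {d for d, out in graph.items() if not out}

# Toy crawl result standing in for the ~1M-page corpus.
pages = [
    ("market.onion", {"forum.onion"}),
    ("forum.onion", set()),  # links nowhere outside itself
    ("wiki.onion", {"market.onion", "forum.onion"}),
]
graph = domain_link_graph(pages)
print(siloed_domains(graph))  # {'forum.onion'}
```

On a real crawl the same aggregation feeds standard network measures (in-degree, components) over domains rather than pages.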
Let Your CyberAlter Ego Share Information and Manage Spam
Almost all of us have multiple cyberspace identities, and these {\em
cyber}alter egos are networked together to form a vast cyberspace social
network. This network is distinct from the world-wide-web (WWW), which is being
queried and mined to the tune of billions of dollars every day, and until
recently, has gone largely unexplored. Empirically, the cyberspace social
networks have been found to possess many of the same complex features that
characterize its real counterparts, including scale-free degree distributions,
low diameter, and extensive connectivity. We show that these topological
features make the latent networks particularly suitable for explorations and
management via local-only messaging protocols. {\em Cyber}alter egos can
communicate via their direct links (i.e., using only their own address books)
and set up a highly decentralized and scalable message passing network that can
allow large-scale sharing of information and data. As one particular example of
such collaborative systems, we provide a design of a spam filtering system, and
our large-scale simulations show that the system achieves a spam detection rate
close to 100%, while the false positive rate is kept around zero. This system
has several advantages over other recent proposals: (i) It uses an already
existing network, created by the same social dynamics that govern our daily
lives, and no dedicated peer-to-peer (P2P) systems or centralized server-based
systems need be constructed; (ii) It utilizes a percolation search algorithm
that makes the query-generated traffic scalable; (iii) The network has a
built-in trust system (just as in social networks) that can be used to thwart
malicious attacks; (iv) It can be implemented right now as a plugin to popular
email programs, such as MS Outlook, Eudora, and Sendmail. Comment: 13 pages, 10 figures
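The percolation search of advantage (ii) can be illustrated with a small simulation: a query floods out from a node, forwarded along each edge only with probability p (bond percolation), which keeps query traffic scalable while hubs in a scale-free network still make targets reachable. A hedged sketch under assumed data shapes, not the paper's implementation:

```python
import random

def percolation_search(graph, start, target, p, rng):
    """Flood a query from `start`, forwarding along each edge with
    probability `p` (bond percolation). Returns True if the query
    reaches `target`."""
    reached = {start}
    frontier = [start]
    while frontier:
        nxt = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in reached and rng.random() < p:
                    reached.add(neighbor)
                    nxt.append(neighbor)
        frontier = nxt
    return target in reached

# Toy address-book network; a real cyberspace social network is
# scale-free, where high-degree hubs let a small p suffice.
book = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": [], "dave": []}
rng = random.Random(42)
print(percolation_search(book, "alice", "dave", 1.0, rng))  # True
```

At p = 1 the search degenerates to full flooding; the point of percolation search is that on heavy-tailed networks a p well below 1 still finds most targets at a fraction of the traffic.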
A Guide to Distributed Digital Preservation
"This volume is devoted to the broad topic of distributed digital preservation, a still-emerging field of practice for the cultural memory arena. Replication and distribution hold out the promise of indefinite preservation of materials without degradation, but establishing effective organizational and technical processes to enable this form of digital preservation is daunting. Institutions need practical examples of how this task can be accomplished in manageable, low-cost ways." -- P. [4] of cover
The Digital Architectures of Social Media: Comparing Political Campaigning on Facebook, Twitter, Instagram, and Snapchat in the 2016 U.S. Election
The present study argues that political communication on social media is
mediated by a platform's digital architecture, defined as the technical
protocols that enable, constrain, and shape user behavior in a virtual space. A
framework for understanding digital architectures is introduced, and four
platforms (Facebook, Twitter, Instagram, and Snapchat) are compared along the
typology. Using the 2016 US election as a case, interviews with three
Republican digital strategists are combined with social media data to qualify
the study's theoretical claim that a platform's network structure,
functionality, algorithmic filtering, and datafication model affect political
campaign strategy on social media.
Harvesting Entities from the Web Using Unique Identifiers -- IBEX
In this paper we study the prevalence of unique entity identifiers on the
Web. These are, e.g., ISBNs (for books), GTINs (for commercial products), DOIs
(for documents), email addresses, and others. We show how these identifiers can
be harvested systematically from Web pages, and how they can be associated with
human-readable names for the entities at large scale.
Starting with a simple extraction of identifiers and names from Web pages, we
show how we can use the properties of unique identifiers to filter out noise
and clean up the extraction result on the entire corpus. The end result is a
database of millions of uniquely identified entities of different types, with
an accuracy of 73--96% and a very high coverage compared to existing knowledge
bases. We use this database to compute novel statistics on the presence of
products, people, and other entities on the Web. Comment: 30 pages, 5 figures, 9 tables. Complete technical report for A.
Talaika, J. A. Biega, A. Amarilli, and F. M. Suchanek. IBEX: Harvesting
Entities from the Web Using Unique Identifiers. WebDB workshop, 201
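The noise-filtering step the abstract describes, exploiting properties of unique identifiers to discard spurious matches, can be illustrated for ISBN-13, whose check digit rejects most random or mistyped digit strings. The regexes and function names below are illustrative assumptions, not IBEX's actual extraction rules:

```python
import re

# Illustrative candidate patterns (assumed, not from the paper).
ISBN13_RE = re.compile(r"\b97[89](?:-?\d){10}\b")
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")

def valid_isbn13(candidate):
    """ISBN-13 check: digits weighted 1,3,1,3,... must sum to 0 mod 10."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 13:
        return False
    return sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0

def harvest_isbns(text):
    """Extract ISBN-13 candidates, keeping only checksum-valid ones."""
    return [m.group() for m in ISBN13_RE.finditer(text)
            if valid_isbn13(m.group())]

text = "Buy 978-0-306-40615-7 today! Corrupted copy: 9780306406158."
print(harvest_isbns(text))  # ['978-0-306-40615-7']
```

Applied over an entire corpus, this kind of structural validation is one way a raw extraction can be cleaned before identifiers are associated with human-readable entity names.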