Clustering Memes in Social Media
The increasing pervasiveness of social media creates new opportunities to
study human social behavior, while challenging our capability to analyze their
massive data streams. One of the emerging tasks is to distinguish between
different kinds of activities, for example engineered misinformation campaigns
versus spontaneous communication. Such detection problems require a formal
definition of meme, or unit of information that can spread from person to
person through the social network. Once a meme is identified, supervised
learning methods can be applied to classify different types of communication.
The appropriate granularity of a meme, however, is hardly captured from
existing entities such as tags and keywords. Here we present a framework for
the novel task of detecting memes by clustering messages from large streams of
social data. We evaluate various similarity measures that leverage content,
metadata, network features, and their combinations. We also explore the idea of
pre-clustering on the basis of existing entities. A systematic evaluation is
carried out using a manually curated dataset as ground truth. Our analysis
shows that pre-clustering and a combination of heterogeneous features yield the
best trade-off between number of clusters and their quality, demonstrating that
a simple combination based on pairwise maximization of similarity is as
effective as a non-trivial optimization of parameters. Our approach is fully
automatic, unsupervised, and scalable for real-time detection of memes in
streaming data.
Comment: Proceedings of the 2013 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining (ASONAM'13), 2013
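The "pairwise maximization of similarity" mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measures are used, so Jaccard overlap and the message fields `words`, `hashtags`, and `users` are assumptions made here for illustration only.

```python
# Hypothetical feature names; the abstract only speaks of content,
# metadata, and network features in general terms.
FEATURES = ("words", "hashtags", "users")

def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def combined_similarity(m1, m2):
    """Combine per-feature similarities by taking their pairwise maximum."""
    return max(jaccard(m1[f], m2[f]) for f in FEATURES)

msg_a = {"words": ["vote", "today"], "hashtags": ["election"], "users": ["alice"]}
msg_b = {"words": ["polls", "open"], "hashtags": ["election"], "users": ["bob"]}
msg_c = {"words": ["cat", "video"], "hashtags": ["pets"], "users": ["carol"]}
```

Here `msg_a` and `msg_b` share no words or users, yet the shared hashtag alone makes the combined similarity maximal, while `msg_a` and `msg_c` have nothing in common. The maximum needs no per-feature weights, which is what makes it a parameter-free alternative to a tuned combination.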
Application of a Layered Hidden Markov Model in the Detection of Network Attacks
Network-based attacks against computer systems are a common and increasing problem. Attackers continue to increase the sophistication and complexity of their attacks with the goal of removing sensitive data or disrupting operations. Attack detection technology works very well for the detection of known attacks using a signature-based intrusion detection system. However, attackers can utilize attacks that are undetectable to those signature-based systems whether they are truly new attacks or modified versions of known attacks. Anomaly-based intrusion detection systems approach the problem of attack detection by detecting when traffic differs from a learned baseline. In the case of this research, the focus was on a relatively new area known as payload anomaly detection. In payload anomaly detection, the system focuses exclusively on the payload of packets and learns the normal contents of those payloads. When a payload's contents differ from the norm, an anomaly is detected and may be a potential attack. A risk with anomaly-based detection mechanisms is they suffer from high false positive rates which reduce their effectiveness. This research built upon previous research in payload anomaly detection by combining multiple techniques of detection in a layered approach. The layers of the system included a high-level navigation layer, a request payload analysis layer, and a request-response analysis layer. The system was tested using the test data provided by some earlier payload anomaly detection systems as well as new data sets. The results of the experiments showed that by combining these layers of detection into a single system, there were higher detection rates and lower false positive rates.
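The general idea of payload anomaly detection, learning the normal contents of payloads and flagging deviations, can be sketched as follows. This is not the layered hidden Markov model of the thesis: a simple byte-frequency baseline is swapped in for illustration, and the function names and the threshold value are assumptions.

```python
from collections import Counter

def byte_distribution(payload):
    """Relative frequency of each of the 256 byte values in a payload."""
    counts = Counter(payload)
    total = len(payload) or 1  # avoid division by zero on empty payloads
    return [counts.get(b, 0) / total for b in range(256)]

def learn_baseline(payloads):
    """Average the byte distributions of payloads assumed to be normal."""
    dists = [byte_distribution(p) for p in payloads]
    return [sum(d[i] for d in dists) / len(dists) for i in range(256)]

def anomaly_score(payload, baseline):
    """L1 distance to the baseline, ranging from 0 (same) to 2 (disjoint)."""
    dist = byte_distribution(payload)
    return sum(abs(x - y) for x, y in zip(dist, baseline))

def is_anomalous(payload, baseline, threshold=1.0):
    """Flag payloads whose score exceeds an illustrative threshold."""
    return anomaly_score(payload, baseline) > threshold

normal = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]
baseline = learn_baseline(normal)
```

With this baseline, a request resembling the training data scores low, while a payload made only of bytes never seen in training scores close to the maximum. The threshold controls the trade-off between detection rate and false positives that the abstract highlights.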
Semantic-free referencing in linked systems
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 43-45). The Web relies on the Domain Name System (DNS) to resolve the hostname portion of URLs into IP addresses. This marriage-of-convenience enabled the Web's meteoric rise, but the resulting entanglement is now hindering both infrastructures--the Web is overly constrained by the limitations of DNS, and DNS is unduly burdened by the demands of the Web. There has been much commentary on this sad state-of-affairs, but dissolving the ill-fated union between DNS and the Web requires a new way to resolve Web references. To this end, this thesis describes the design and implementation of Semantic Free Referencing (SFR), a reference resolution infrastructure based on distributed hash tables (DHTs). By Michael Walfish. S.M.
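Resolving an opaque reference through a distributed hash table can be sketched with a toy, in-process model. This is not the SFR design itself: the class `ToyDHT`, the node names, and the use of SHA-1 with successor-based placement are assumptions chosen only to show how a tag with no embedded meaning can still be mapped to a location.

```python
import hashlib
from bisect import bisect_left

def _h(data):
    """Map bytes to a point on the hash ring."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

class ToyDHT:
    """In-process stand-in for a DHT: each key lives on its successor node."""

    def __init__(self, node_names):
        self.ring = sorted((_h(n.encode()), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    def _owner(self, key):
        ids = [node_id for node_id, _ in self.ring]
        # First node at or after the key, wrapping around the ring.
        idx = bisect_left(ids, key) % len(self.ring)
        return self.ring[idx][1]

    def put(self, tag, location):
        key = _h(tag)
        self.store[self._owner(key)][key] = location

    def resolve(self, tag):
        key = _h(tag)
        return self.store[self._owner(key)].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put(b"opaque-tag-123", "http://example.org/doc")
```

Calling `dht.resolve(b"opaque-tag-123")` returns the stored location, and an unknown tag resolves to `None`. The tag carries no hostname or other semantics, so the referenced object can move by updating the stored value without changing the reference.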
Spear Phishing Attack Detection
This thesis addresses the problem of identifying email spear phishing attacks, which are indicative of cyber espionage. Spear phishing consists of targeted emails sent to entice a victim to open a malicious file attachment or click on a malicious link that leads to a compromise of their computer. Current detection methods fail to detect emails of this kind consistently. The SPEar phishing Attack Detection system (SPEAD) is developed to analyze all incoming emails on a network for the presence of spear phishing attacks. SPEAD analyzes the following file types: Windows Portable Executable and Common Object File Format (PE/COFF), Adobe Reader, and Microsoft Excel, Word, and PowerPoint. SPEAD's malware detection accuracy is compared against five commercially-available email anti-virus solutions. Finally, this research quantifies the time required to perform this detection with email traffic loads emulating an Air Force base network. Results show that SPEAD outperforms the anti-virus products in PE/COFF malware detection with an overall accuracy of 99.68% and an accuracy of 98.2% where new malware is involved. Additionally, SPEAD is comparable to the anti-virus products when it comes to the detection of new Adobe Reader malware with a rate of 88.79%. Ultimately, SPEAD demonstrates a strong tendency to focus its detection on new malware, which is a rare and desirable trait. Finally, after less than 4 minutes of sustained maximum email throughput, SPEAD's non-optimized configuration exhibits one-hour delays in processing files and links.
SPIN-ning Software Architectures: A Method for Exploring Complex Systems
When designing complex software systems that provide multiple non-functional properties, it is usual to try to reuse (and finally compose) simpler existing designs, which deal with each of these properties in isolation. The paper describes a method for automatically and quickly identifying all the different ways one can compose such designs, with the aid of a model checker.
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
Segurança e privacidade em terminologia de rede
Security and Privacy are now at the forefront of modern concerns, and drive
a significant part of the debate on digital society. One particular aspect that
holds significant bearing in these two topics is the naming of resources in the
network, because it not only directly impacts how networks work but also affects
how security mechanisms are implemented and what the privacy implications of
metadata disclosure are. This issue is further exacerbated by interoperability
mechanisms that imply this information is increasingly available regardless of
the intended scope.
This work focuses on the implications of naming with regard to security and
privacy in namespaces used in network protocols, and in particular on the
implementation of solutions that provide additional security through naming
policies or increase privacy. To achieve this, different techniques are used to
either embed security information in existing namespaces or to minimise privacy
exposure. The former allows bootstrapping secure transport protocols on top of
insecure discovery protocols, while the latter introduces privacy policies as
part of name assignment and resolution.
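One way to picture embedding security information in an existing namespace is a name that carries a fingerprint of the service's public key, so that a client can check the key it is later offered even if discovery itself was insecure. The abstract does not describe the actual encoding, so the name layout, the truncated SHA-256 fingerprint, and all function names below are illustrative assumptions.

```python
import base64
import hashlib
import hmac

def fingerprint(public_key):
    """Short, DNS-label-safe fingerprint of a public key (80 bits, base32)."""
    digest = hashlib.sha256(public_key).digest()
    return base64.b32encode(digest[:10]).decode().lower()

def make_name(label, public_key, suffix="local"):
    """Build a name of the assumed form <label>.<fingerprint>.<suffix>."""
    return f"{label}.{fingerprint(public_key)}.{suffix}"

def verify(name, presented_key):
    """Check that the key presented by a peer matches the one in the name."""
    parts = name.split(".")
    if len(parts) < 3:
        return False
    # Assumes a single-label suffix, so the fingerprint is second to last.
    return hmac.compare_digest(parts[-2], fingerprint(presented_key))

service_key = b"demo-public-key-bytes"  # placeholder, not a real key
name = make_name("printer", service_key)
```

A client that learned `name` over an untrusted discovery protocol can call `verify(name, key)` on the key offered during the secure transport handshake and reject a peer presenting a different key. Truncating the digest keeps the label short at the cost of a weaker fingerprint, which a real design would have to weigh.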
The main vehicle for implementing these solutions is general-purpose
protocols and services. However, there is a strong parallel with ongoing
research topics that leverage name resolution systems for interoperability, such
as the Internet of Things (IoT) and Information Centric Networks (ICN), where
these approaches are also applicable.
Security and privacy are two topics that shape the agenda in the discussion
about the digital society. A particularly subtle aspect of this discussion is
the way we assign names to resources in the network, a choice with practical
consequences for how the different network protocols operate, for how different
security mechanisms are implemented, and for the privacy of the various parties
involved. This problem becomes even more significant when one considers that,
to promote interoperability between different networks, autonomous mechanisms
make this information accessible in contexts beyond what was intended.
This thesis focuses on the consequences of different name assignment policies
in the context of different network protocols, with respect to security and
privacy. Based on the study of this problem, solutions are proposed that,
through different name assignment policies, make it possible to introduce
additional security mechanisms or to mitigate privacy problems in different
protocols. This results in the implementation of security mechanisms on top of
insecure discovery protocols, as well as the introduction of name assignment
and resolution mechanisms that focus on protecting privacy.
The main vehicle for implementing these solutions is general-purpose network
services and protocols. However, the applicability of these solutions also
extends to other research topics that rely on name resolution mechanisms to
implement interoperability solutions, namely the Internet of Things (IoT) and
Information Centric Networks (ICN).
Programa Doutoral em Informática