POISED: Spotting Twitter Spam Off the Beaten Paths

Author: Fernandez Jose
Kruegel Christopher
Labreche Francois
Nilizadeh Shirin
Sedighian Alireza
Stringhini Gianluca
Vigna Giovanni
Zand Ali
Publication date: 01/01/2017

Cybercriminals have found in online social networks a propitious medium to spread spam and malicious content. Existing techniques for detecting spam include predicting the trustworthiness of accounts and analyzing the content of their messages. However, advanced attackers can still successfully evade these defenses. Online social networks bring people who have personal connections or share common interests together to form communities. In this paper, we first show that users within a networked community share some topics of interest. Moreover, content shared on these social networks tends to propagate according to the interests of people. Dissemination paths may emerge where some communities post similar messages, based on the interests of those communities. Spam and other malicious content, on the other hand, follow different spreading patterns. We follow this insight and present POISED, a system that leverages the differences in propagation between benign and malicious messages on social networks to identify spam and other unwanted content. We test our system on a dataset of 1.3M tweets collected from 64K users, and we show that our approach is effective in detecting malicious messages, reaching 91% precision and 93% recall. We also show that POISED's detection is more comprehensive than previous systems, by comparing it to three state-of-the-art spam detection systems that have been proposed by the research community in the past. POISED significantly outperforms each of these systems. Moreover, through simulations, we show how POISED is effective in the early detection of spam messages and how it is resilient against two well-known adversarial machine learning attacks.

arXiv.org e-Print Archive

ZENODO

UCL Discovery

PolyPublie

Why (and How) Networks Should Run Themselves

Author: Feamster Nick
Rexford Jennifer
Publication date: 31/10/2017

The proliferation of networked devices, systems, and applications that we depend on every day makes managing networks more important than ever. The increasing security, availability, and performance demands of these applications suggest that these increasingly difficult network management problems be solved in real time, across a complex web of interacting protocols and systems. Alas, just as the importance of network management has increased, the network has grown so complex that it is seemingly unmanageable. In this new era, network management requires a fundamentally new approach. Instead of optimizations based on closed-form analysis of individual protocols, network operators need data-driven, machine-learning-based models of end-to-end and application performance based on high-level policy goals and a holistic view of the underlying components. Instead of anomaly detection algorithms that operate on offline analysis of network traces, operators need classification and detection algorithms that can make real-time, closed-loop decisions. Networks should learn to drive themselves. This paper explores this concept, discussing how we might attain this ambitious goal by more closely coupling measurement with real-time control and by relying on learning for inference and prediction about a networked application or system, as opposed to closed-form analysis of individual protocols.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Learning from networked examples

Author: Guo Zheng-Chu
Ramon Jan
Wang Yuyi
Publication date: 03/06/2017

Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption no longer holds when learning from a networked sample, because two or more training examples may share some common objects, and hence share the features of these shared objects. We show that the classic approach of ignoring this problem can potentially have a harmful effect on the accuracy of statistics, and then consider alternatives. One of these is to only use independent examples, discarding other information. However, this is clearly suboptimal. We analyze sample error bounds in this networked setting, providing significantly improved results. An important component of our approach is formed by efficient sample weighting schemes, which lead to novel concentration inequalities.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Probing the topological properties of complex networks modeling short written texts

Author: Amancio Diego R.
Publication venue: Public Library of Science (PLoS)
Publication date: 29/12/2014

In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of text or entire books. This approach has certainly worked well, and many informative discoveries have been made this way, but it raises an uncomfortable question: could there be important topological patterns in small pieces of text? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that a proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Universidade de São Paulo

FigShare

Crisis Analytics: Big Data Driven Crisis Response

Author: Ali Anwaar
Crowcroft Jon
Qadir Junaid
Rasool Raihan ur
Sathiaseelan Arjuna
Zwitter Andrej
Publication date: 25/02/2016

Disasters have long been a scourge for humanity. With the advances in technology (in terms of computing, communications, and the ability to process and analyze big data), our ability to respond to disasters is at an inflection point. There is great optimism that big data tools can be leveraged to process the large amounts of crisis-related data (in the form of user-generated data in addition to the traditional humanitarian data) to provide insight into the fast-changing situation and help drive an effective disaster response. This article introduces the history and the future of big crisis data analytics, along with a discussion of its promise, challenges, and pitfalls.

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Integration of Legacy Appliances into Home Energy Management Systems

Author: Egarter Dominik
Elmenreich Wilfried
Khatib Tamer
Monacchi Andrea
Publication date: 12/06/2014

The progressive installation of renewable energy sources requires the coordination of energy-consuming devices. At the consumer level, this coordination can be done by a home energy management system (HEMS). Interoperability issues need to be solved among smart appliances as well as between smart and non-smart (i.e., legacy) devices. We expect current standardization efforts to soon provide technologies to design smart appliances in order to cope with the current interoperability issues. Nevertheless, common electrical devices affect energy consumption significantly and therefore deserve consideration within energy management applications. This paper discusses the integration of smart and legacy devices into a generic system architecture and, subsequently, elaborates the requirements and components that are necessary to realize such an architecture, including an application of load detection for the identification of running loads and their integration into existing HEM systems. We assess the feasibility of such an approach with a case study based on a measurement campaign on real households. We show how the information on detected appliances can be extracted in order to create device profiles allowing for their integration and management within a HEMS.

arXiv.org e-Print Archive

CiteSeerX

Distributionally Robust Semi-Supervised Learning for People-Centric Sensing

Author: Chang Xiaojun
Chen Kaixuan
Long Guodong
Wang Sen
Yao Lina
Zhang Dalin
Publication date: 12/11/2018

Semi-supervised learning is crucial for alleviating labelling burdens in people-centric sensing. However, human-generated data inherently suffer from distribution shift in semi-supervised learning due to the diverse biological conditions and behavior patterns of humans. To address this problem, we propose a generic distributionally robust model for semi-supervised learning on distributionally shifted data. Considering both the discrepancy and the consistency between the labeled data and the unlabeled data, we learn the latent features that reduce person-specific discrepancy and preserve task-specific consistency. We evaluate our model in a variety of people-centric recognition tasks on real-world datasets, including intention recognition, activity recognition, muscular movement recognition and gesture recognition. The experimental results demonstrate that the proposed model outperforms the state-of-the-art methods.

Comment: 8 pages, accepted by AAAI201

arXiv.org e-Print Archive

OPUS - University of Technology Sydney

VBN

Monash University Research Portal

Association for the Advancement of Artificial Intelligence: AAAI Publications