REST: A thread embedding approach for identifying and classifying user-specified information in security forums

Author: Faloutsos Michalis
Publication venue: eScholarship, University of California
Publication date: 01/07/2020

eScholarship - University of California

IDAPro for IoT Malware analysis?

Author: Faloutsos Michalis
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

eScholarship - University of California

Sustainable growth in complex networks

Author: C. J. Tessone
F. Schweitzer
Faloutsos M.
Faloutsos P.
Faloutsos C.
Gamma E.
M. M. Geipel
Saichev A.
Simon H.
Valverde S.
Publication venue: IOP Publishing
Publication date: 08/07/2010

Based on the empirical analysis of the dependency network in 18 Java projects, we develop a novel model of network growth which considers both an attachment mechanism and the addition of new nodes with a heterogeneous distribution of their initial degree, $k_0$. Empirically we find that the cumulative degree distributions of initial degrees and of the final network follow power-law behaviors: $P(k_0) \propto k_0^{1-\alpha}$ and $P(k) \propto k^{1-\gamma}$, respectively. For the total number of links as a function of the network size, we find empirically $K(N) \propto N^{\beta}$, where $\beta$ is (at the beginning of the network evolution) between 1.25 and 2, while converging to $\sim 1$ for large $N$. This indicates a transition from a growth regime with increasing network density towards a sustainable regime, which prevents a collapse caused by ever-increasing dependencies. Our theoretical framework is able to predict relations between the exponents $\alpha$, $\beta$ and $\gamma$, which also link issues of software engineering and developer activity. These relations are verified by means of computer simulations and empirical investigations. They indicate that the growth of real Open Source Software networks occurs on the edge between two regimes, dominated either by the initial degree distribution of added nodes or by the preferential attachment mechanism. Hence, the heterogeneous degree distribution of newly added nodes, found empirically, is essential to describe the laws of sustainable growth in networks. Comment: 5 pages, 2 figures, 1 table
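The growth process summarized in this abstract can be explored with a small simulation. The following is a minimal Python sketch under stated assumptions, not the authors' model: each new node draws an initial degree k0 from a continuous Pareto distribution whose cumulative tail is P(K0 >= k0) ∝ k0^(1-α), capped at a hypothetical k_max and at the number of existing nodes, and attaches to distinct existing nodes with probability proportional to degree + 1 (a common preferential-attachment variant, assumed here). The names `sample_initial_degree`, `grow_network` and `fit_exponent` are hypothetical; `fit_exponent` estimates β as the least-squares slope of log K against log N.

```python
import math
import random


def sample_initial_degree(rng, alpha, k_min=1, k_max=1000):
    """Draw k0 with cumulative tail P(K0 >= k0) ~ k0**(1 - alpha); needs alpha > 1."""
    u = 1.0 - rng.random()  # u in (0, 1], so the power below never divides by zero
    k = k_min * u ** (-1.0 / (alpha - 1.0))  # inverse CDF of a continuous Pareto
    return min(int(k), k_max)  # k_max is a hypothetical hard cap


def grow_network(n_nodes, alpha, seed=0):
    """Add nodes one at a time; return (degrees, links) with links[n - 1] = K(n)."""
    rng = random.Random(seed)
    degrees = [0]  # start from a single isolated node
    pool = [0]     # node i appears degrees[i] + 1 times, so sampling is ∝ degree + 1
    links = [0]
    for new in range(1, n_nodes):
        k0 = min(sample_initial_degree(rng, alpha), new)  # at most one link per existing node
        if k0 == new:
            targets = set(range(new))  # link to every existing node
        else:
            targets = set()
            while len(targets) < k0:  # distinct targets, preferential by degree + 1
                targets.add(rng.choice(pool))
        degrees.append(0)
        pool.append(new)
        for t in targets:
            degrees[t] += 1
            degrees[new] += 1
            pool.extend((t, new))
        links.append(links[-1] + k0)
    return degrees, links


def fit_exponent(xs, ys):
    """Least-squares slope of log(ys) against log(xs), e.g. beta in K(N) ~ N**beta."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


if __name__ == "__main__":
    degrees, links = grow_network(3000, alpha=1.8, seed=1)
    sizes = list(range(10, 3001, 10))
    early = [n for n in sizes if n <= 300]   # early window of the evolution
    late = [n for n in sizes if n >= 1500]   # late window of the evolution
    print("beta, early window:", fit_exponent(early, [links[n - 1] for n in early]))
    print("beta, late window:", fit_exponent(late, [links[n - 1] for n in late]))
```

In this simplified version each new node contributes exactly its capped k0 links, so K(N) is set by the k0 distribution alone; preferential attachment only decides which existing nodes receive those links, i.e. it shapes P(k).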

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

PhishDef: URL Names Say It All

Author: Faloutsos Michalis
Le Anh
Markopoulou Athina
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 12/09/2010

Phishing is an increasingly sophisticated method to steal personal user information using sites that pretend to be legitimate. In this paper, we take the following steps to identify phishing URLs. First, we carefully select lexical features of the URLs that are resistant to obfuscation techniques used by attackers. Second, we evaluate the classification accuracy when using only lexical features, both automatically and hand-selected, vs. when using additional features. We show that lexical features are sufficient for all practical purposes. Third, we thoroughly compare several classification algorithms, and we propose to use an online method (AROW) that is able to overcome noisy training data. Based on the insights gained from our analysis, we propose PhishDef, a phishing detection system that uses only URL names and combines the above three elements. PhishDef is a highly accurate method (when compared to state-of-the-art approaches over real datasets), lightweight (thus appropriate for online and client-side deployment), proactive (based on online classification rather than blacklists), and resilient to training data inaccuracies (thus enabling the use of large noisy training data). Comment: 9 pages, submitted to IEEE INFOCOM 201
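The abstract names the learner (AROW) but not the exact features. As a minimal sketch under stated assumptions, the following Python code pairs a hypothetical lexical feature extractor (hashed URL tokens plus a scaled dot count and URL length; this is not PhishDef's published feature set) with the AROW update of Crammer, Kulesza and Dredze, simplified here to a diagonal covariance. `lexical_features`, `DiagonalAROW` and `N_BUCKETS` are illustrative names, and the URLs in the usage example are made up.

```python
import re
import zlib

N_BUCKETS = 2 ** 16  # hypothetical size of the hashed token space


def lexical_features(url):
    """Hypothetical lexical features: hashed URL tokens plus two scaled counts."""
    feats = {}
    for tok in re.split(r"[^a-z0-9]+", url.lower()):
        if tok:
            idx = zlib.crc32(tok.encode("utf-8")) % N_BUCKETS  # stable across runs
            feats[idx] = feats.get(idx, 0.0) + 1.0
    feats[N_BUCKETS] = url.count(".") / 10.0   # number of dots, scaled
    feats[N_BUCKETS + 1] = len(url) / 100.0    # URL length, scaled
    return feats


class DiagonalAROW:
    """AROW with a diagonal covariance; labels are +1 (phishing) or -1 (legitimate)."""

    def __init__(self, r=1.0):
        self.r = r       # regularization: larger r gives smaller, more conservative updates
        self.mean = {}   # sparse weight vector mu, default 0.0
        self.var = {}    # diagonal of Sigma, default 1.0

    def margin(self, x):
        return sum(self.mean.get(i, 0.0) * v for i, v in x.items())

    def predict(self, x):
        return 1 if self.margin(x) >= 0.0 else -1

    def update(self, x, y):
        m = y * self.margin(x)
        if m >= 1.0:
            return  # no hinge loss: leave mu and Sigma unchanged
        confidence = sum(self.var.get(i, 1.0) * v * v for i, v in x.items())  # x^T Sigma x
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - m) * beta
        for i, v in x.items():
            s = self.var.get(i, 1.0)
            self.mean[i] = self.mean.get(i, 0.0) + alpha * y * s * v  # mu += alpha*y*Sigma x
            self.var[i] = s - beta * (s * v) ** 2  # diagonal of Sigma - beta*Sigma x x^T Sigma


if __name__ == "__main__":
    stream = [  # hypothetical labelled stream
        ("http://secure-login.example.com.verify-account.example.net/signin", 1),
        ("https://www.example.org/docs/index.html", -1),
        ("http://update-billing.example.com.session-check.example.info/", 1),
        ("https://www.example.org/about", -1),
    ]
    model = DiagonalAROW(r=1.0)
    for url, label in stream:
        model.update(lexical_features(url), label)
    print(model.predict(lexical_features("http://verify-account.example.net/signin")))
```

The model is updated one URL at a time and skips examples it already classifies with margin at least 1, which matches the online, client-side setting the abstract describes; keeping only the diagonal of the covariance makes memory linear in the number of features.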

arXiv.org e-Print Archive

CiteSeerX

Crossref

REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums

Author: Faloutsos Michalis
Gharibshah Joobin
Papalexakis Evangelos E
Publication venue: eScholarship, University of California
Publication date: 08/01/2020

How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offerings of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a possibly incomplete bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes into appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods, with an accuracy between 63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest.
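The abstract does not specify which embeddings or weights REST uses, so the following is only a minimal Python sketch of the general idea of establishing relevance and similarity in an embedding space, not the authors' method. It assumes hand-made 4-dimensional word vectors (`VOCAB`), a cosine-similarity threshold of 0.8 for keyword expansion, a thread relevance score equal to the fraction of the thread's words found in the expanded keyword set, and class vectors built as the mean of a few seed words per class (`CLASS_SEEDS`); all of these names, vectors and thresholds are hypothetical.

```python
import math

# Hypothetical 4-d word vectors; axes loosely mean (attack, sale, how-to, experience).
VOCAB = {
    "ddos": (0.9, 0.1, 0.0, 0.0),
    "attack": (0.85, 0.2, 0.0, 0.1),
    "botnet": (0.8, 0.3, 0.1, 0.0),
    "selling": (0.1, 0.9, 0.0, 0.0),
    "price": (0.0, 0.95, 0.1, 0.0),
    "tutorial": (0.1, 0.0, 0.9, 0.1),
    "happened": (0.1, 0.0, 0.1, 0.9),
    "hello": (0.3, 0.3, 0.3, 0.3),
}

# Hypothetical seed words for the four classes named in the abstract.
CLASS_SEEDS = {
    "alert": ["ddos", "attack"],
    "service": ["selling", "price"],
    "hacking": ["tutorial"],
    "experience": ["happened"],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_keywords(seeds, vocab, threshold=0.8):
    """Return the known seeds plus every word whose vector is close to some seed."""
    known = [s for s in seeds if s in vocab]
    expanded = set(known)
    for word, vec in vocab.items():
        if any(cosine(vec, vocab[s]) >= threshold for s in known):
            expanded.add(word)
    return expanded


def relevance(words, keywords):
    """Fraction of the thread's words that are (expanded) keywords."""
    return sum(w in keywords for w in words) / len(words) if words else 0.0


def thread_vector(words, vocab, weights=None):
    """Weighted mean of the known word vectors; weights (e.g. idf) default to 1.0."""
    weights = weights or {}
    dim = len(next(iter(vocab.values())))
    total, mass = [0.0] * dim, 0.0
    for w in words:
        if w in vocab:
            wt = weights.get(w, 1.0)
            total = [t + wt * x for t, x in zip(total, vocab[w])]
            mass += wt
    return [t / mass for t in total] if mass else total


def classify(vec, class_vectors):
    """Pick the class whose vector has the highest cosine similarity to vec."""
    return max(class_vectors, key=lambda c: cosine(vec, class_vectors[c]))


CLASS_VECTORS = {c: thread_vector(ws, VOCAB) for c, ws in CLASS_SEEDS.items()}

if __name__ == "__main__":
    keywords = expand_keywords({"ddos"}, VOCAB)  # the user supplied a single keyword
    thread = ["hello", "botnet", "attack", "ddos"]
    if relevance(thread, keywords) >= 0.5:  # hypothetical relevance cutoff
        print(classify(thread_vector(thread, VOCAB), CLASS_VECTORS))
```

Here "botnet" and "attack" are recovered from the single seed "ddos" only because their hypothetical vectors lie close to it, which illustrates how an embedding space can compensate for an incomplete keyword list.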

arXiv.org e-Print Archive

eScholarship - University of California

Association for the Advancement of Artificial Intelligence: AAAI Publications