11,806 research outputs found
Probabilistic Clustering of Time-Evolving Distance Data
We present a novel probabilistic clustering model for objects that are
represented via pairwise distances and observed at different time points. The
proposed method utilizes the information given by adjacent time points to find
the underlying cluster structure and obtain a smooth cluster evolution. This
approach allows the number of objects and clusters to differ at every time
point, and no identification on the identities of the objects is needed.
Further, the model does not require the number of clusters being specified in
advance -- they are instead determined automatically using a Dirichlet process
prior. We validate our model on synthetic data showing that the proposed method
is more accurate than state-of-the-art clustering methods. Finally, we use our
dynamic clustering model to analyze and illustrate the evolution of brain
cancer patients over time
HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis
Recently, the development and implementation of phishing attacks require little technical skills and costs. This uprising has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become extremely necessary. In this paper, we propose HTMLPhish, a deep learning based datadriven end-to-end automatic phishing web page classification approach. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, our proposed approach of the concatenation of the word and character embeddings allows our model to manage new features and ensure easy extrapolation to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents that provides a distribution of phishing to benign web pages obtainable in the real-world that yields over 93% Accuracy and True Positive Rate. Also, HTMLPhish is a completely language-independent and client-side strategy which can, therefore, conduct web page phishing detection regardless of the textual language
emgr - The Empirical Gramian Framework
System Gramian matrices are a well-known encoding for properties of
input-output systems such as controllability, observability or minimality.
These so-called system Gramians were developed in linear system theory for
applications such as model order reduction of control systems. Empirical
Gramian are an extension to the system Gramians for parametric and nonlinear
systems as well as a data-driven method of computation. The empirical Gramian
framework - emgr - implements the empirical Gramians in a uniform and
configurable manner, with applications such as Gramian-based (nonlinear) model
reduction, decentralized control, sensitivity analysis, parameter
identification and combined state and parameter reduction
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.Comment: 96 pages, 14 figures, 333 reference
- …