Parallel architectures for fuzzy triadic similarity learning
In a context of document co-clustering, we define a new similarity measure
which iteratively computes similarity while combining fuzzy sets in a
three-partite graph. The fuzzy triadic similarity (FT-Sim) model can deal with
the uncertainty offered by the fuzzy sets. Moreover, with the development of the
Web and the high availability of storage space, more and more documents become
accessible. Documents can be provided from multiple sites, which makes similarity
computation expensive. This problem motivated us to use parallel
computing. In this paper, we introduce parallel architectures which are able to
treat large and multi-source data sets by a sequential, a merging or a
splitting-based process. Then, we perform local and central (or global)
computation using the basic FT-Sim measure. The idea behind these architectures
is to reduce both time and space complexities thanks to parallel computation.
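A splitting-based process with a local and a central step can be pictured with the small sketch below. FT-Sim itself is not reproduced: plain cosine similarity is an assumed stand-in, and the names `local_block` and `split_similarity` are hypothetical. Each worker computes the similarities between one chunk of documents and all documents (local computing), and the blocks are then stacked into the global matrix (central computing).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def local_block(rows: np.ndarray, all_docs: np.ndarray) -> np.ndarray:
    """Local step: similarities between one chunk of documents and all documents."""
    return rows @ all_docs.T


def split_similarity(doc_term: np.ndarray, n_chunks: int) -> np.ndarray:
    """Split documents into chunks, compute the blocks in parallel, then merge.

    Cosine similarity stands in for FT-Sim (an assumption of this sketch).
    """
    norms = np.linalg.norm(doc_term, axis=1, keepdims=True)
    normalized = doc_term / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    chunks = np.array_split(np.arange(doc_term.shape[0]), n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        blocks = list(pool.map(lambda idx: local_block(normalized[idx], normalized), chunks))
    # Central (global) step: stack the local blocks back into one similarity matrix.
    return np.vstack(blocks)
```

Running `split_similarity(X, n_chunks=4)` on a document-term matrix `X` should give the same matrix as `n_chunks=1`; only the way the work is distributed changes. A merging-based or multi-source variant would instead compute blocks per site and combine them.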
Higher-order Relation Schema Induction using Tensor Factorization with Back-off and Aggregation
Relation Schema Induction (RSI) is the problem of identifying type signatures
of arguments of relations from unlabeled text. Most of the previous work in
this area has focused only on binary RSI, i.e., inducing only the subject and
object type signatures per relation. However, in practice, many relations are
higher-order, i.e., they have more than two arguments, and inducing type
signatures of all arguments is necessary. For example, in the sports domain,
inducing a schema win(WinningPlayer, OpponentPlayer, Tournament, Location) is
more informative than inducing just win(WinningPlayer, OpponentPlayer). We
refer to this problem as Higher-order Relation Schema Induction (HRSI). In this
paper, we propose Tensor Factorization with Back-off and Aggregation (TFBA), a
novel framework for the HRSI problem. To the best of our knowledge, this is the
first attempt at inducing higher-order relation schemata from unlabeled text.
Through experiments on three real-world datasets, we show how TFBA
helps in dealing with sparsity and in inducing higher-order schemata.
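The intuition that backing off to fewer arguments helps with sparsity can be checked on a toy count tensor. This sketch is not TFBA (which factorizes the backed-off tensors and then aggregates the results); it only shows that summing out one argument of a non-negative count tensor never lowers its density. The function names and the tiny `win(...)` data are hypothetical illustrations.

```python
import numpy as np


def count_tensor(tuples, vocabularies):
    """Build an order-N count tensor from N-ary relation tuples.

    vocabularies[m] lists the possible values of argument m (hypothetical input format).
    """
    index = [{value: i for i, value in enumerate(vocab)} for vocab in vocabularies]
    tensor = np.zeros([len(vocab) for vocab in vocabularies])
    for tup in tuples:
        tensor[tuple(index[m][arg] for m, arg in enumerate(tup))] += 1
    return tensor


def back_off(tensor, dropped_mode):
    """Marginalize one argument out, giving a lower-order tensor."""
    return tensor.sum(axis=dropped_mode)


def density(tensor):
    """Fraction of non-zero cells."""
    return np.count_nonzero(tensor) / tensor.size


# Toy win(WinningPlayer, OpponentPlayer, Tournament, Location) tuples.
players = ["p1", "p2", "p3"]
tournaments = ["t1", "t2"]
locations = ["l1", "l2"]
tuples = [("p1", "p2", "t1", "l1"), ("p2", "p3", "t2", "l2"), ("p1", "p3", "t1", "l2")]
win = count_tensor(tuples, [players, players, tournaments, locations])
win_no_location = back_off(win, dropped_mode=3)
```

Here the order-4 tensor has 3 non-zeros in 36 cells, while the backed-off order-3 tensor has 3 non-zeros in 18 cells, so dropping `Location` doubles the density.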
Fastest Mixing Markov Chain on Symmetric K-Partite Network
Solving the fastest mixing Markov chain problem (i.e., finding transition
probabilities on the edges to minimize the second largest eigenvalue modulus
(SLEM) of the transition probability matrix) over networks with different
topologies is one of the primary areas of research in computer science, and one
of the well-known network families in this area is the K-partite network. In
this work, we present analytical solutions for the fastest mixing Markov
chain problem by means of stratification and semidefinite programming, for four
particular types of K-partite networks, namely Symmetric K-PPDR, Semi Symmetric
K-PPDR, Cycle K-PPDR and Semi Cycle K-PPDR networks. Our method is based on the
convexity of the fastest mixing Markov chain problem and on an inductive
comparison of the characteristic polynomials initiated by slackness conditions
in order to find the optimal transition probabilities. The presented results
show that a Symmetric K-PPDR network and its equivalent Semi Symmetric K-PPDR
network have the same SLEM, even though the Semi Symmetric K-PPDR network
has fewer edges than its equivalent Symmetric K-PPDR network; at the same
time, the Symmetric K-PPDR network has a better mixing rate per step than its
equivalent Semi Symmetric K-PPDR network during the first few iterations. The
same results hold for Cycle K-PPDR and Semi Cycle K-PPDR networks. We also
compare the obtained optimal transition probabilities with those obtained from
the Metropolis-Hastings method by comparing mixing time improvements
numerically.
Comment: 19 pages, 6 figures
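The Metropolis-Hastings baseline and the SLEM objective can be made concrete with a short sketch. It uses a complete K-partite graph as an assumed, generic topology (the K-PPDR families are not reconstructed here) and the usual uniform-target Metropolis-Hastings weights P_ij = min(1/d_i, 1/d_j) on each edge, with the remaining mass on the self-loop. It does not solve the semidefinite program; the function names are hypothetical.

```python
import itertools

import numpy as np


def complete_k_partite_adjacency(part_sizes):
    """Adjacency matrix where nodes are adjacent iff they lie in different parts."""
    labels = [p for p, size in enumerate(part_sizes) for _ in range(size)]
    n = len(labels)
    adj = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        if labels[i] != labels[j]:
            adj[i, j] = adj[j, i] = 1.0
    return adj


def metropolis_hastings_matrix(adj):
    """Symmetric transition matrix with P_ij = min(1/d_i, 1/d_j) on edges."""
    deg = adj.sum(axis=1)
    n = adj.shape[0]
    P = np.zeros_like(adj)
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                P[i, j] = min(1.0 / deg[i], 1.0 / deg[j])
        P[i, i] = 1.0 - P[i].sum()  # self-loop keeps each row stochastic
    return P


def slem(P):
    """Second largest eigenvalue modulus of a symmetric stochastic matrix."""
    eig = np.sort(np.linalg.eigvalsh(P))  # eig[-1] == 1 for a connected chain
    return max(abs(eig[-2]), abs(eig[0]))
```

With K equal parts of size m every degree is (K-1)m, so P = A/((K-1)m) has eigenvalues 1, 0 and -1/(K-1); for `part_sizes=[2, 2, 2]` the SLEM is therefore 0.5. An optimal chain from the semidefinite program could then be compared against this value.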
A Survey on Social Media Anomaly Detection
Social media anomaly detection is of critical importance to prevent malicious
activities such as bullying, terrorist attack planning, and fraud information
dissemination. With the recent popularity of social media, new types of
anomalous behaviors arise, causing concern among various parties. While a large
amount of work has been dedicated to traditional anomaly detection problems,
we observe a surge of research interest in the new realm of social media
anomaly detection. In this paper, we present a survey of existing approaches to
this problem. We focus on the new types of anomalous phenomena in
social media and review the recently developed techniques to detect these special
types of anomalies. We provide a general overview of the problem domain, common
formulations, existing methodologies and potential directions. With this work,
we hope to draw the research community's attention to this
challenging problem and open up new directions to which we can contribute in the
future.
Comment: 23 pages
E-commerce Anomaly Detection: A Bayesian Semi-Supervised Tensor Decomposition Approach using Natural Gradients
Anomaly detection has several important applications. In this paper, our
focus is on detecting anomalies in seller-reviewer data using tensor
decomposition. While tensor decomposition is mostly unsupervised, we formulate
Bayesian semi-supervised tensor decomposition to take advantage of sparse
labeled data. In addition, we use Pólya-Gamma data augmentation for the
semi-supervised Bayesian tensor decomposition. Finally, we show that the
Pólya-Gamma formulation simplifies calculation of the Fisher information
matrix for partial natural gradient learning. Our experimental results show
that our semi-supervised approach outperforms state-of-the-art unsupervised
baselines, and that partial natural gradient learning outperforms
stochastic gradient learning and Online-EM with sufficient statistics.
Comment: Citations rendering
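For intuition about tensor-based anomaly scoring, the sketch below shows a plain unsupervised baseline of the kind the abstract compares against, not the Bayesian semi-supervised Pólya-Gamma model. It assumes a hypothetical seller × reviewer × time tensor, fits a rank-R CP decomposition by alternating least squares, and scores each seller by the residual norm of its slice.

```python
import numpy as np


def cp_als(tensor, rank, n_iter=50, seed=0):
    """Rank-R CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = tensor.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two factors fixed.
        A = np.einsum("ijk,jr,kr->ir", tensor, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", tensor, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", tensor, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


def seller_anomaly_scores(tensor, rank, n_iter=50, seed=0):
    """Residual norm of each seller slice after a low-rank fit (higher = more anomalous)."""
    A, B, C = cp_als(tensor, rank, n_iter, seed)
    reconstruction = np.einsum("ir,jr,kr->ijk", A, B, C)
    return np.sqrt(((tensor - reconstruction) ** 2).sum(axis=(1, 2)))
```

Sellers whose activity is poorly explained by the shared low-rank structure get large residuals. The semi-supervised approach in the abstract additionally exploits the few labeled anomalies, which this baseline ignores.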
Latent Network Summarization: Bridging Network Embedding and Summarization
Motivated by the computational and storage challenges that dense embeddings
pose, we introduce the problem of latent network summarization that aims to
learn a compact, latent representation of the graph structure with
dimensionality that is independent of the input graph size (i.e., #nodes and
#edges), while retaining the ability to derive node representations on the fly.
We propose Multi-LENS, an inductive multi-level latent network summarization
approach that leverages a set of relational operators and relational functions
(compositions of operators) to capture the structure of egonets and
higher-order subgraphs, respectively. The structure is stored in low-rank,
size-independent structural feature matrices, which along with the relational
functions comprise our latent network summary. Multi-LENS is general and
naturally supports both homogeneous and heterogeneous graphs with or without
directionality, weights, attributes or labels. Extensive experiments on real
graphs show 3.5-34.3% improvement in AUC for link prediction, while requiring
80-2152x less output storage space than baseline embedding methods on large
datasets. As applications, we show the effectiveness of Multi-LENS in
detecting anomalies and events in the Enron email communication graph and
the Twitter co-mention graph.
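The idea of composing relational operators into relational functions, and of keeping a summary whose size does not depend on the number of nodes, can be illustrated as follows. This is an assumed simplification, not Multi-LENS: the base feature (degree), the operator set (`mean`, `sum`, `max`) and the SVD-based summary are hypothetical choices, and Multi-LENS's own binning and factorization are not reproduced.

```python
import numpy as np

# Hypothetical relational operators applied over a node's neighbors.
OPERATORS = {
    "mean": lambda values: values.mean() if values.size else 0.0,
    "sum": lambda values: values.sum(),
    "max": lambda values: values.max() if values.size else 0.0,
}


def apply_operator(neighbors, feature, op):
    """Aggregate a node feature over each node's neighbors (egonet view)."""
    return np.array([op(feature[nbrs]) for nbrs in neighbors], dtype=float)


def relational_features(neighbors, levels=2):
    """Compose operators level by level, starting from degree as the base feature."""
    current = {"degree": np.array([len(nbrs) for nbrs in neighbors], dtype=float)}
    all_features = dict(current)
    for _ in range(levels):
        nxt = {}
        for name, feature in current.items():
            for op_name, op in OPERATORS.items():
                # e.g. "mean(max(degree))" reaches two hops beyond the node.
                nxt[f"{op_name}({name})"] = apply_operator(neighbors, feature, op)
        all_features.update(nxt)
        current = nxt
    return all_features


def size_independent_summary(features, k=2):
    """Keep a k x d right factor of the n x d feature matrix; its shape ignores n."""
    matrix = np.column_stack(list(features.values()))
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:k]
```

Node representations can then be derived on the fly by recomputing the relational functions for a node and projecting them onto the stored factor (`feature_matrix @ summary.T`), so only the function names and the k × d factor need to be stored.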
Machine Learning Spatial Geometry from Entanglement Features
Motivated by the close relations of the renormalization group with both
holographic duality and deep learning, we propose that the holographic
geometry can emerge from deep learning the entanglement feature of a quantum
many-body state. We develop a concrete algorithm, called entanglement feature
learning (EFL), based on the random tensor network (RTN) model for tensor
network holography. We show that each RTN can be mapped to a Boltzmann machine,
trained by the entanglement entropies over all subregions of a given quantum
many-body state. The goal is to construct the optimal RTN that best reproduces
the entanglement feature. The RTN geometry can then be interpreted as the
emergent holographic geometry. We demonstrate the EFL algorithm on a 1D free
fermion system and observe the emergence of the hyperbolic geometry (AdS
spatial geometry) as we tune the fermion system towards the gapless critical
point (CFT point).
Comment: 14 pages, 14 figures
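The entanglement feature that EFL trains on is the set of entanglement entropies over all subregions. For a free-fermion (Gaussian) state these can be computed from the two-point correlation matrix restricted to a subregion A, using S_A = -Σ [ν ln ν + (1-ν) ln(1-ν)] over the eigenvalues ν of C_A. The sketch assumes an open nearest-neighbour hopping chain as the free fermion system (a hypothetical choice) and only produces the training data; it does not build or train the RTN.

```python
import itertools

import numpy as np


def ground_state_correlation(n_sites, n_particles):
    """C[i, j] = <c_i^dagger c_j> for the ground state of an open hopping chain."""
    hamiltonian = np.zeros((n_sites, n_sites))
    for i in range(n_sites - 1):
        hamiltonian[i, i + 1] = hamiltonian[i + 1, i] = -1.0
    _, modes = np.linalg.eigh(hamiltonian)  # eigenvalues in ascending order
    occupied = modes[:, :n_particles]       # fill the lowest single-particle modes
    return occupied @ occupied.T


def entanglement_entropy(corr, region):
    """Von Neumann entropy of a subregion of a free-fermion state."""
    if len(region) == 0:
        return 0.0
    nu = np.linalg.eigvalsh(corr[np.ix_(region, region)])
    nu = np.clip(nu, 1e-12, 1 - 1e-12)  # guard against log(0)
    return float(-np.sum(nu * np.log(nu) + (1 - nu) * np.log(1 - nu)))


def entanglement_features(corr):
    """Entropies of all subregions, keyed by the tuple of sites in the region."""
    n = corr.shape[0]
    return {
        region: entanglement_entropy(corr, list(region))
        for size in range(n + 1)
        for region in itertools.combinations(range(n), size)
    }
```

Because the ground state is pure, a region and its complement carry the same entropy, which is a quick sanity check on the generated features. Tuning the chain towards criticality would change these entropies, which is what EFL reads as a change of geometry.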
Graph Embedding with Rich Information through Heterogeneous Network
Graph embedding has attracted increasing attention due to its critical
application in social network analysis. Most existing algorithms for graph
embedding rely only on topology information and fail to use the copious
information in nodes as well as edges. As a result, their performance on many
tasks may not be satisfactory. In this paper, we propose a novel and general
framework of representation learning for graphs with rich text information
by constructing a bipartite heterogeneous network. Specifically, we design
a biased random walk to explore the constructed heterogeneous network with the
notion of a flexible neighborhood. The efficacy of our method is demonstrated by
extensive comparison experiments with several baselines on various datasets. It
improves the Micro-F1 and Macro-F1 of node classification by 10% and 7% on the
Cora dataset.
Comment: 9 pages, 7 figures, 4 tables
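A biased random walk with a flexible neighborhood can be sketched with node2vec-style second-order weights (return parameter `p`, in-out parameter `q`). The graph construction here is an assumption, not necessarily the authors' bipartite network: each node is linked to the words of its text (prefixed `word:`) alongside the original edges, and the helper names are hypothetical.

```python
import random
from collections import defaultdict


def build_hetero_graph(edges, node_texts):
    """Undirected graph linking nodes to each other and to the words in their text."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    for node, text in node_texts.items():
        for word in set(text.lower().split()):
            graph[node].add("word:" + word)
            graph["word:" + word].add(node)
    return graph


def biased_walk(graph, start, length, p, q, rng):
    """Second-order walk: 1/p to return, 1 to stay near prev, 1/q to move outward."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = [
            1.0 / p if x == previous else 1.0 if x in graph[previous] else 1.0 / q
            for x in neighbors
        ]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

A small `q` pushes walks outward (DFS-like), while a small `p` keeps them local (BFS-like); the resulting walks would then be fed to a skip-gram model to learn node and word embeddings.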
node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching
Identity stitching, the task of identifying and matching various online
references (e.g., sessions over different devices and timespans) to the same
user in real-world web services, is crucial for personalization and
recommendations. However, traditional user stitching approaches, such as
grouping or blocking, require quadratic pairwise comparisons between a massive
number of user activities, thus posing both computational and storage
challenges. Recent works, which are often application-specific, heuristically
seek to reduce the number of comparisons, but they suffer from low precision
and recall. To solve the problem in an application-independent way, we take a
heterogeneous network-based approach in which users (nodes) interact with
content (e.g., sessions, websites), and may have attributes (e.g., location).
We propose node2bits, an efficient framework that represents multi-dimensional
features of node contexts with binary hashcodes. node2bits leverages
feature-based temporal walks to encapsulate short- and long-term interactions
between nodes in heterogeneous web networks, and adopts SimHash to obtain
compact, binary representations and avoid the quadratic complexity for
similarity search. Extensive experiments on large-scale real networks show that
node2bits outperforms traditional techniques and existing works that generate
real-valued embeddings by up to 5.16% in F1 score on user stitching, while
taking only up to 1.56% as much storage.
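The SimHash step can be illustrated with the random-hyperplane variant: each bit records which side of a random hyperplane a feature vector falls on, and the Hamming distance between codes approximates the angle between vectors. node2bits' temporal-walk feature construction is not reproduced here; the input is an assumed plain feature vector and the helper names are hypothetical.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """One Gaussian random hyperplane (normal vector) per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vector, hyperplanes):
    """Binary code: bit b is 1 if the vector lies on the positive side of plane b."""
    return tuple(int(sum(v * h for v, h in zip(vector, plane)) >= 0) for plane in hyperplanes)


def hamming(code_a, code_b):
    """Number of differing bits; a cheap proxy for angular distance."""
    return sum(a != b for a, b in zip(code_a, code_b))
```

For random Gaussian hyperplanes, each bit of two vectors differs with probability θ/π, where θ is the angle between them, so comparing short bit strings replaces comparing real-valued vectors and the codes take far less storage.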
Exploiting the Structure of Bipartite Graphs for Algebraic and Spectral Graph Theory Applications
In this article, we extend several algebraic graph analysis methods to
bipartite networks. In various areas of science, engineering and commerce, many
types of information can be represented as networks, and thus the discipline of
network analysis plays an important role in these domains. A powerful and
widespread class of network analysis methods is based on algebraic graph
theory, i.e., representing graphs as square adjacency matrices. However, many
networks are of a very specific form that clashes with that representation:
They are bipartite. That is, they consist of two node types, with each edge
connecting a node of one type with a node of the other type. Examples of
bipartite networks (also called two-mode networks) are persons and the
social groups they belong to, musical artists and the musical genres they play,
and text documents and the words they contain. In fact, any type of feature
that can be represented by a categorical variable can be interpreted as a
bipartite network. Although bipartite networks are widespread, most literature
in the area of network analysis focuses on unipartite networks, i.e., those
networks with only a single type of node. The purpose of this article is to
extend a selection of important algebraic network analysis methods to bipartite
networks, showing that many methods from algebraic graph theory can be applied
to bipartite networks with only minor modifications. We show methods for
clustering, visualization and link prediction. Additionally, we introduce new
algebraic methods for measuring the bipartivity in near-bipartite graphs.
Comment: 37 pages; fixed references
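One structural fact behind such extensions can be checked numerically: the adjacency matrix of a bipartite graph has the block form [[0, B], [Bᵀ, 0]], so its eigenvalues are plus and minus the singular values of the biadjacency matrix B (padded with zeros), and the smaller B can be decomposed instead. The sketch also computes Estrada's spectral bipartivity, tr cosh(A) / tr exp(A), an established measure that equals 1 exactly for bipartite graphs; it is not presented as one of the new measures introduced in the article, and the function names are hypothetical.

```python
import numpy as np


def adjacency_from_biadjacency(B):
    """Square adjacency matrix [[0, B], [B^T, 0]] of a bipartite graph."""
    m, n = B.shape
    return np.block([[np.zeros((m, m)), B], [B.T, np.zeros((n, n))]])


def bipartite_spectrum(B):
    """Adjacency eigenvalues obtained from the SVD of the smaller matrix B."""
    s = np.linalg.svd(B, compute_uv=False)
    m, n = B.shape
    padding = np.zeros(m + n - 2 * s.size)  # remaining eigenvalues are zero
    return np.sort(np.concatenate([s, -s, padding]))


def spectral_bipartivity(A):
    """Estrada's measure tr cosh(A) / tr exp(A); equals 1 iff no odd cycles."""
    eig = np.linalg.eigvalsh(A)
    return float(np.cosh(eig).sum() / np.exp(eig).sum())
```

For example, a persons × groups membership matrix B yields the full adjacency spectrum from a single SVD of B, while a triangle (an odd cycle) gives a bipartivity strictly below 1.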