630 research outputs found

    Parallel architectures for fuzzy triadic similarity learning

    Full text link
    In a context of document co-clustering, we define a new similarity measure which iteratively computes similarity while combining fuzzy sets in a three-partite graph. The fuzzy triadic similarity (FT-Sim) model can deal with uncertainty offers by the fuzzy sets. Moreover, with the development of the Web and the high availability of storage spaces, more and more documents become accessible. Documents can be provided from multiple sites and make similarity computation an expensive processing. This problem motivated us to use parallel computing. In this paper, we introduce parallel architectures which are able to treat large and multi-source data sets by a sequential, a merging or a splitting-based process. Then, we proceed to a local and a central (or global) computing using the basic FT-Sim measure. The idea behind these architectures is to reduce both time and space complexities thanks to parallel computation

    Higher-order Relation Schema Induction using Tensor Factorization with Back-off and Aggregation

    Full text link
    Relation Schema Induction (RSI) is the problem of identifying type signatures of arguments of relations from unlabeled text. Most of the previous work in this area have focused only on binary RSI, i.e., inducing only the subject and object type signatures per relation. However, in practice, many relations are high-order, i.e., they have more than two arguments and inducing type signatures of all arguments is necessary. For example, in the sports domain, inducing a schema win(WinningPlayer, OpponentPlayer, Tournament, Location) is more informative than inducing just win(WinningPlayer, OpponentPlayer). We refer to this problem as Higher-order Relation Schema Induction (HRSI). In this paper, we propose Tensor Factorization with Back-off and Aggregation (TFBA), a novel framework for the HRSI problem. To the best of our knowledge, this is the first attempt at inducing higher-order relation schemata from unlabeled text. Using the experimental analysis on three real world datasets, we show how TFBA helps in dealing with sparsity and induce higher order schemata

    Fastest Mixing Markov Chain on Symmetric K-Partite Network

    Full text link
    Solving fastest mixing Markov chain problem (i.e. finding transition probabilities on the edges to minimize the second largest eigenvalue modulus of the transition probability matrix) over networks with different topologies is one of the primary areas of research in the context of computer science and one of the well known networks in this issue is K-partite network. Here in this work we present analytical solution for the problem of fastest mixing Markov chain by means of stratification and semidefinite programming, for four particular types of K-partite networks, namely Symmetric K-PPDR, Semi Symmetric K-PPDR, Cycle K-PPDR and Semi Cycle K-PPDR networks. Our method in this paper is based on convexity of fastest mixing Markov chain problem, and inductive comparing of the characteristic polynomials initiated by slackness conditions in order to find the optimal transition probabilities. The presented results shows that a Symmetric K-PPDR network and its equivalent Semi Symmetric K-PPDR network have the same SLEM despite the fact that Semi symmetric K-PPDR network has less edges than its equivalent symmetric K-PPDR network and at the same time symmetric K-PPDR network has better mixing rate per step than its equivalent semi symmetric K-PPDR network at first few iterations. The same results are true for Cycle K-PPDR and Semi Cycle K-PPDR networks. Also the obtained optimal transition probabilities have been compared with the transition probabilities obtained from Metropolis-Hasting method by comparing mixing time improvements numerically.Comment: 19 pages, 6 figure

    A Survey on Social Media Anomaly Detection

    Full text link
    Social media anomaly detection is of critical importance to prevent malicious activities such as bullying, terrorist attack planning, and fraud information dissemination. With the recent popularity of social media, new types of anomalous behaviors arise, causing concerns from various parties. While a large amount of work have been dedicated to traditional anomaly detection problems, we observe a surge of research interests in the new realm of social media anomaly detection. In this paper, we present a survey on existing approaches to address this problem. We focus on the new type of anomalous phenomena in the social media and review the recent developed techniques to detect those special types of anomalies. We provide a general overview of the problem domain, common formulations, existing methodologies and potential directions. With this work, we hope to call out the attention from the research community on this challenging problem and open up new directions that we can contribute in the future.Comment: 23 page

    E-commerce Anomaly Detection: A Bayesian Semi-Supervised Tensor Decomposition Approach using Natural Gradients

    Full text link
    Anomaly Detection has several important applications. In this paper, our focus is on detecting anomalies in seller-reviewer data using tensor decomposition. While tensor-decomposition is mostly unsupervised, we formulate Bayesian semi-supervised tensor decomposition to take advantage of sparse labeled data. In addition, we use Polya-Gamma data augmentation for the semi-supervised Bayesian tensor decomposition. Finally, we show that the P\'olya-Gamma formulation simplifies calculation of the Fisher information matrix for partial natural gradient learning. Our experimental results show that our semi-supervised approach outperforms state of the art unsupervised baselines. And that the partial natural gradient learning outperforms stochastic gradient learning and Online-EM with sufficient statistics.Comment: Citations renderin

    Latent Network Summarization: Bridging Network Embedding and Summarization

    Full text link
    Motivated by the computational and storage challenges that dense embeddings pose, we introduce the problem of latent network summarization that aims to learn a compact, latent representation of the graph structure with dimensionality that is independent of the input graph size (i.e., #nodes and #edges), while retaining the ability to derive node representations on the fly. We propose Multi-LENS, an inductive multi-level latent network summarization approach that leverages a set of relational operators and relational functions (compositions of operators) to capture the structure of egonets and higher-order subgraphs, respectively. The structure is stored in low-rank, size-independent structural feature matrices, which along with the relational functions comprise our latent network summary. Multi-LENS is general and naturally supports both homogeneous and heterogeneous graphs with or without directionality, weights, attributes or labels. Extensive experiments on real graphs show 3.5 - 34.3% improvement in AUC for link prediction, while requiring 80 - 2152x less output storage space than baseline embedding methods on large datasets. As application areas, we show the effectiveness of Multi-LENS in detecting anomalies and events in the Enron email communication graph and Twitter co-mention graph

    Machine Learning Spatial Geometry from Entanglement Features

    Full text link
    Motivated by the close relations of the renormalization group with both the holography duality and the deep learning, we propose that the holographic geometry can emerge from deep learning the entanglement feature of a quantum many-body state. We develop a concrete algorithm, call the entanglement feature learning (EFL), based on the random tensor network (RTN) model for the tensor network holography. We show that each RTN can be mapped to a Boltzmann machine, trained by the entanglement entropies over all subregions of a given quantum many-body state. The goal is to construct the optimal RTN that best reproduce the entanglement feature. The RTN geometry can then be interpreted as the emergent holographic geometry. We demonstrate the EFL algorithm on 1D free fermion system and observe the emergence of the hyperbolic geometry (AdS3_3 spatial geometry) as we tune the fermion system towards the gapless critical point (CFT2_2 point).Comment: 14 pages, 14 figure

    Graph Embedding with Rich Information through Heterogeneous Network

    Full text link
    Graph embedding has attracted increasing attention due to its critical application in social network analysis. Most existing algorithms for graph embedding only rely on the typology information and fail to use the copious information in nodes as well as edges. As a result, their performance for many tasks may not be satisfactory. In this paper, we proposed a novel and general framework of representation learning for graph with rich text information through constructing a bipartite heterogeneous network. Specially, we designed a biased random walk to explore the constructed heterogeneous network with the notion of flexible neighborhood. The efficacy of our method is demonstrated by extensive comparison experiments with several baselines on various datasets. It improves the Micro-F1 and Macro-F1 of node classification by 10% and 7% on Cora dataset.Comment: 9 pages, 7 figures, 4 table

    node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching

    Full text link
    Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for personalization and recommendations. However, traditional user stitching approaches, such as grouping or blocking, require quadratic pairwise comparisons between a massive number of user activities, thus posing both computational and storage challenges. Recent works, which are often application-specific, heuristically seek to reduce the amount of comparisons, but they suffer from low precision and recall. To solve the problem in an application-independent way, we take a heterogeneous network-based approach in which users (nodes) interact with content (e.g., sessions, websites), and may have attributes (e.g., location). We propose node2bits, an efficient framework that represents multi-dimensional features of node contexts with binary hashcodes. node2bits leverages feature-based temporal walks to encapsulate short- and long-term interactions between nodes in heterogeneous web networks, and adopts SimHash to obtain compact, binary representations and avoid the quadratic complexity for similarity search. Extensive experiments on large-scale real networks show that node2bits outperforms traditional techniques and existing works that generate real-valued embeddings by up to 5.16% in F1 score on user stitching, while taking only up to 1.56% as much storage

    Exploiting the Structure of Bipartite Graphs for Algebraic and Spectral Graph Theory Applications

    Full text link
    In this article, we extend several algebraic graph analysis methods to bipartite networks. In various areas of science, engineering and commerce, many types of information can be represented as networks, and thus the discipline of network analysis plays an important role in these domains. A powerful and widespread class of network analysis methods is based on algebraic graph theory, i.e., representing graphs as square adjacency matrices. However, many networks are of a very specific form that clashes with that representation: They are bipartite. That is, they consist of two node types, with each edge connecting a node of one type with a node of the other type. Examples of bipartite networks (also called \emph{two-mode networks}) are persons and the social groups they belong to, musical artists and the musical genres they play, and text documents and the words they contain. In fact, any type of feature that can be represented by a categorical variable can be interpreted as a bipartite network. Although bipartite networks are widespread, most literature in the area of network analysis focuses on unipartite networks, i.e., those networks with only a single type of node. The purpose of this article is to extend a selection of important algebraic network analysis methods to bipartite networks, showing that many methods from algebraic graph theory can be applied to bipartite networks with only minor modifications. We show methods for clustering, visualization and link prediction. Additionally, we introduce new algebraic methods for measuring the bipartivity in near-bipartite graphs.Comment: 37 pages; fixed reference
    • …
    corecore