11 research outputs found

    Modeling Relational Data via Latent Factor Blockmodel

    Full text link
    In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks.Comment: 10 pages, 12 figure

    Community Detection over Social Media: A Compressive Survey

    Get PDF
    Social media mining is an emerging field with a lot of research areas such as, sentiment analysis, link prediction, spammer detection, and community detection. In today’s scenario, researchers are working in the area of community detection and sentiment analysis because the main component of social media is user. Users create different types of community in social world. The ideas and discussions in the community may be negative or positive. To detect the communities and their behavior researcher have done a lot of work, but still two major issues are presents per survey, Scalability and Quality of the community. These issues of community detection motivate to work in this area of social media mining. This paper gives a bird eye view over social media and community detection

    Co-Clustering with Generative Models

    Get PDF
    In this paper, we present a generative model for co-clustering and develop algorithms based on the mean field approximation for the corresponding modeling problem. These algorithms can be viewed as generalizations of the traditional model-based clustering; they extend hard co-clustering algorithms such as Bregman co-clustering to include soft assignments. We show empirically that these model-based algorithms offer better performance than their hard-assignment counterparts, especially with increasing problem complexity

    Categories and functional units: An infinite hierarchical model for brain activations

    Get PDF
    We present a model that describes the structure in the responses of different brain areas to a set of stimuli in terms of stimulus categories (clusters of stimuli) and functional units (clusters of voxels). We assume that voxels within a unit respond similarly to all stimuli from the same category, and design a nonparametric hierarchical model to capture inter-subject variability among the units. The model explicitly encodes the relationship between brain activations and fMRI time courses. A variational inference algorithm derived based on the model learns categories, units, and a set of unit-category activation probabilities from data. When applied to data from an fMRI study of object recognition, the method finds meaningful and consistent clusterings of stimuli into categories and voxels into units.National Science Foundation (U.S.) (Grant IIS/CRCNS 0904625)National Science Foundation (U.S.) (CAREER Grant 0642971)McGovern Institute for Brain Research at MIT (Neurotechnology Program Grant)National Institutes of Health (U.S.) (Grant NIBIB NAMIC U54-EB005149)National Institutes of Health (U.S.) (Grant NCRR NAC P41-RR13218

    Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes

    Full text link
    With the rapid development of online social media, online shopping sites and cyber-physical systems, heterogeneous information networks have become increasingly popular and content-rich over time. In many cases, such networks contain multiple types of objects and links, as well as different kinds of attributes. The clustering of these objects can provide useful insights in many applications. However, the clustering of such networks can be challenging since (a) the attribute values of objects are often incomplete, which implies that an object may carry only partial attributes or even no attributes to correctly label itself; and (b) the links of different types may carry different kinds of semantic meanings, and it is a difficult task to determine the nature of their relative importance in helping the clustering for a given purpose. In this paper, we address these challenges by proposing a model-based clustering algorithm. We design a probabilistic model which clusters the objects of different types into a common hidden space, by using a user-specified set of attributes, as well as the links from different relations. The strengths of different types of links are automatically learned, and are determined by the given purpose of clustering. An iterative algorithm is designed for solving the clustering problem, in which the strengths of different types of links and the quality of clustering results mutually enhance each other. Our experimental results on real and synthetic data sets demonstrate the effectiveness and efficiency of the algorithm.Comment: VLDB201

    Transforming Graph Representations for Statistical Relational Learning

    Full text link
    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed

    Relational network-service clustering analysis with set evidences

    Get PDF
    Network administrators are faced with a large amount of network data that they need to sift through to analyze user behaviors and detect anomalies. Through a network monitoring tool, we obtained TCP and UDP connection records together with additional information of the associated users and software in an enterprise network. Instead of using traditional payload inspection techniques, we propose a method that clusters such network traffic data by using relations between entities so that it can be analyzed for frequent behaviors and anomalies. Relational methods like Markov Logic Networks is able to avoid the feature extraction stage and directly handle multi-relation situations. We extend the common pairwise representation in relational models by adopting set evidence to build a better objective for the network service clustering problem. The automatic clustering process helps the administrator filter out normal traffic in shorter time and get an abstract overview of opening transport layer ports in the whole network, which is beneficial for assessing network security risks. Experimental results on synthetic and real datasets suggest that our method is able to discover underlying services and anomalies (malware or abused ports) with good interpretations. © 2010 ACM

    Community evolution in dynamic multi-mode networks

    Full text link
    corecore