Modeling Relational Data via Latent Factor Blockmodel
In this paper we address the problem of modeling relational data, which
appear in many applications such as social network analysis, recommender
systems and bioinformatics. Previous studies either consider latent feature
based models but disregard local structure in the network, or focus
exclusively on capturing local structure of objects based on latent blockmodels
without coupling with latent characteristics of objects. To combine the
benefits of the previous work, we propose a novel model that can simultaneously
incorporate the effect of latent features and covariates if any, as well as the
effect of latent structure that may exist in the data. To achieve this, we
model the relation graph as a function of both latent feature factors and
latent cluster memberships of objects to collectively discover globally
predictive intrinsic properties of objects and capture latent block structure
in the network to improve prediction performance. We also develop an
optimization transfer algorithm based on the generalized EM-style strategy to
learn the latent factors. We demonstrate the efficacy of our proposed model on
the link prediction and cluster analysis tasks; extensive experiments
on synthetic data and several real-world datasets suggest that our proposed
LFBM model outperforms other state-of-the-art approaches in the evaluated
tasks. Comment: 10 pages, 12 figures
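As a rough illustration of modeling a link as a function of both latent feature factors and latent cluster memberships, the Python sketch below scores a pair of objects with a logistic function of a feature inner product plus a block-interaction term. The function name, the logistic link, and the additive form are assumptions made for illustration only; the abstract does not state the LFBM likelihood, so this should not be read as the authors' formulation.

```python
import math


def link_probability(u_i, u_j, z_i, z_j, block):
    """Hypothetical link score: sigmoid(u_i . u_j + z_i^T B z_j).

    u_i, u_j : latent feature vectors of the two objects (lists of floats)
    z_i, z_j : cluster membership vectors (one-hot or soft, lists of floats)
    block    : K x K block-interaction matrix B (list of lists)
    """
    # Contribution of the latent features (global, per-object properties).
    feature_term = sum(a * b for a, b in zip(u_i, u_j))
    # Contribution of the latent block structure (cluster-to-cluster affinity).
    block_term = sum(
        z_i[k] * block[k][l] * z_j[l]
        for k in range(len(z_i))
        for l in range(len(z_j))
    )
    return 1.0 / (1.0 + math.exp(-(feature_term + block_term)))
```

With all-zero features and an all-zero block matrix the score is 0.5; raising the block entry shared by two objects' clusters raises their predicted link probability.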
Community Detection over Social Media: A Compressive Survey
Social media mining is an emerging field with many research areas, such as sentiment analysis, link prediction, spammer detection, and community detection. Researchers currently focus on community detection and sentiment analysis because the main component of social media is the user. Users create different types of communities in the social world, and the ideas and discussions in a community may be negative or positive. Researchers have done a lot of work on detecting communities and their behavior, but according to the survey two major issues remain: scalability and the quality of the detected communities. These issues motivate further work in this area of social media mining. This paper gives a bird's-eye view of social media and community detection.
Co-Clustering with Generative Models
In this paper, we present a generative model for co-clustering and develop algorithms based on the mean field approximation for the corresponding modeling problem. These algorithms can be viewed as generalizations of traditional model-based clustering; they extend hard co-clustering algorithms such as Bregman co-clustering to include soft assignments. We show empirically that these model-based algorithms offer better performance than their hard-assignment counterparts, especially with increasing problem complexity.
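The distinction between hard and soft assignments can be sketched in a few lines of Python. This is only a generic illustration for a single item and its distances to cluster prototypes; the function names and the `temperature` parameter are assumptions, and the code is not the mean field algorithm of the paper.

```python
import math


def hard_assignment(distances):
    """Return the index of the single closest cluster."""
    return min(range(len(distances)), key=distances.__getitem__)


def soft_assignment(distances, temperature=1.0):
    """Return a probability for every cluster (softmax of negative distances)."""
    # Subtract the smallest distance first for numerical stability.
    d_min = min(distances)
    weights = [math.exp(-(d - d_min) / temperature) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]
```

A soft assignment keeps some probability on every cluster, so an item that lies between two clusters contributes to both; as `temperature` approaches zero, the soft assignment concentrates on the cluster chosen by the hard rule.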
Categories and functional units: An infinite hierarchical model for brain activations
We present a model that describes the structure in the responses of different brain areas to a set of stimuli in terms of stimulus categories (clusters of stimuli) and functional units (clusters of voxels). We assume that voxels within a unit respond similarly to all stimuli from the same category, and design a nonparametric hierarchical model to capture inter-subject variability among the units. The model explicitly encodes the relationship between brain activations and fMRI time courses. A variational inference algorithm derived from the model learns categories, units, and a set of unit-category activation probabilities from data. When applied to data from an fMRI study of object recognition, the method finds meaningful and consistent clusterings of stimuli into categories and voxels into units. Funding: National Science Foundation (U.S.) (Grant IIS/CRCNS 0904625); National Science Foundation (U.S.) (CAREER Grant 0642971); McGovern Institute for Brain Research at MIT (Neurotechnology Program Grant); National Institutes of Health (U.S.) (Grant NIBIB NAMIC U54-EB005149); National Institutes of Health (U.S.) (Grant NCRR NAC P41-RR13218)
Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes
With the rapid development of online social media, online shopping sites and
cyber-physical systems, heterogeneous information networks have become
increasingly popular and content-rich over time. In many cases, such networks
contain multiple types of objects and links, as well as different kinds of
attributes. The clustering of these objects can provide useful insights in many
applications. However, the clustering of such networks can be challenging since
(a) the attribute values of objects are often incomplete, which implies that an
object may carry only partial attributes or even no attributes to correctly
label itself; and (b) the links of different types may carry different kinds of
semantic meanings, and it is a difficult task to determine the nature of their
relative importance in helping the clustering for a given purpose. In this
paper, we address these challenges by proposing a model-based clustering
algorithm. We design a probabilistic model which clusters the objects of
different types into a common hidden space, by using a user-specified set of
attributes, as well as the links from different relations. The strengths of
different types of links are automatically learned, and are determined by the
given purpose of clustering. An iterative algorithm is designed for solving the
clustering problem, in which the strengths of different types of links and the
quality of clustering results mutually enhance each other. Our experimental
results on real and synthetic data sets demonstrate the effectiveness and
efficiency of the algorithm. Comment: VLDB201
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
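The taxonomy described above, with four transformation tasks applied symmetrically to nodes and links, can be written down as a small data structure. The Python sketch below is one possible encoding; the class and function names are hypothetical and do not come from the article.

```python
from enum import Enum
from itertools import product


class Target(Enum):
    """What is being transformed."""
    NODE = "node"
    LINK = "link"


class Task(Enum):
    """The four transformation tasks listed in the abstract."""
    EXISTENCE = "predict existence"
    LABEL = "predict label or type"
    WEIGHT = "estimate weight or importance"
    FEATURES = "construct relevant features"


def taxonomy():
    """Enumerate every (target, task) cell of the taxonomy."""
    return list(product(Target, Task))
```

Enumerating the product gives eight cells, for example link existence prediction (commonly called link prediction) or node weight estimation.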
Relational network-service clustering analysis with set evidences
Network administrators are faced with a large amount of network data that they need to sift through to analyze user behaviors and detect anomalies. Through a network monitoring tool, we obtained TCP and UDP connection records together with additional information about the associated users and software in an enterprise network. Instead of using traditional payload inspection techniques, we propose a method that clusters such network traffic data by using relations between entities so that it can be analyzed for frequent behaviors and anomalies. Relational methods like Markov Logic Networks are able to avoid the feature extraction stage and directly handle multi-relation situations. We extend the common pairwise representation in relational models by adopting set evidence to build a better objective for the network service clustering problem. The automatic clustering process helps the administrator filter out normal traffic in less time and get an abstract overview of open transport-layer ports in the whole network, which is beneficial for assessing network security risks. Experimental results on synthetic and real datasets suggest that our method is able to discover underlying services and anomalies (malware or abused ports) with good interpretations. © 2010 ACM