Learning and Inference in Massive Social Networks
Researchers and practitioners are increasingly gaining access
to data on explicit social networks. For example, telecommunications
and technology firms record data on consumer
networks (via phone calls, emails, voice-over-IP, instant messaging),
and social-network portal sites such as MySpace,
Friendster and Facebook record consumer-generated data
on social networks. Inference for fraud detection [5, 3, 8],
marketing [9], and other tasks can be improved with learned
models that take social networks into account and with collective
inference [12], which allows inferences about nodes
in the network to affect each other. However, these social-network
graphs can be huge, comprising millions to billions
of nodes and one or two orders of magnitude more links.
This paper studies the application of collective inference
to improve prediction over a massive graph. Faced initially
with a social network comprising hundreds of millions of
nodes and a few billion edges, our goal is to produce an
approximate consumer network that is orders of magnitude
smaller, but still facilitates improved performance via collective
inference. We introduce a sampling technique designed
to reduce the size of the network by many orders of magnitude,
but to keep linkages that facilitate improved prediction
via collective inference.
In short, the sampling scheme operates as follows: (1)
choose a set of nodes of interest; (2) then, in analogy to
snowball sampling [14], grow local graphs around these nodes,
adding their social networks, their neighbors’ social networks,
and so on; (3) next, prune from these local graphs the edges
that are expected to contribute little to the collective inference;
(4) finally, connect the local graphs together to form
a graph with (hopefully) useful inference connectivity.
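The four-step scheme above can be sketched in code. This is a minimal illustration, not the paper's actual procedure: the adjacency structure, the breadth-first depth limit, and the `min_weight` edge-weight threshold are hypothetical stand-ins (in particular, the threshold stands in for step (3)'s notion of an edge's expected contribution to collective inference).

```python
from collections import deque

def snowball_sample(adj, seeds, depth=2, min_weight=2):
    """Grow local graphs around seed nodes breadth-first (steps 1-2,
    in analogy to snowball sampling), then prune weak edges (step 3).
    `adj` maps node -> {neighbor: edge_weight}."""
    kept = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nbr in adj.get(node, {}):
            if nbr not in kept:
                kept.add(nbr)
                frontier.append((nbr, d + 1))
    # Step (3): keep only sufficiently strong edges among sampled nodes.
    # Step (4): local graphs sharing nodes/edges are thereby connected.
    edges = {(u, v): w
             for u in kept for v, w in adj.get(u, {}).items()
             if v in kept and w >= min_weight}
    return kept, edges

# Toy weighted network: the weak edge a-c survives sampling but is pruned.
adj = {"a": {"b": 3, "c": 1}, "b": {"a": 3, "d": 2},
       "c": {"a": 1}, "d": {"b": 2}}
nodes, edges = snowball_sample(adj, ["a"], depth=2, min_weight=2)
```

On a real consumer graph the pruning criterion would be chosen to preserve inference-relevant connectivity rather than a raw weight cutoff.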
We apply this sampling method to assess whether collective
inference can improve learned targeted-marketing models
for a social network of consumers of telecommunication
services. Prior work [9] has shown improvement to the learning
of targeting models by including social-neighborhood
information—in particular, information on existing customers
in the immediate social network of a potential target. However,
the improvement was restricted to the “network neighbors”,
those targets directly linked to a prior customer, who are thought
to be good candidates for the new service. Collective inference
techniques may extend the predictive influence of existing
customers beyond their immediate neighborhoods. For the
present work, our motivating conjecture has been that this
influence can improve prediction for consumers who are not
strongly connected to existing customers. Our results show
that this is indeed the case: collective inference on the approximate
network enables significantly improved predictive
performance for non-network-neighbor consumers, and for
consumers who have few links to existing customers.
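One common family of collective-inference procedures is relaxation labeling over a relational classifier. The following minimal sketch (not the paper's actual method; the graph and all names are illustrative) shows how scores clamped at existing customers propagate beyond their immediate neighborhoods:

```python
def collective_inference(adj, customers, iters=20):
    """Relaxation-labeling sketch: each non-customer's score is
    repeatedly set to the mean of its neighbors' current scores,
    while existing customers stay clamped at 1.0.
    `adj` maps node -> set of neighbors."""
    score = {n: (1.0 if n in customers else 0.0) for n in adj}
    for _ in range(iters):
        new = {}
        for n, nbrs in adj.items():
            if n in customers:
                new[n] = 1.0
            elif nbrs:
                new[n] = sum(score[m] for m in nbrs) / len(nbrs)
            else:
                new[n] = 0.0
        score = new
    return score

# Chain c - x - y: node y has no direct customer link, yet receives a
# nonzero score through x after a few iterations.
adj = {"c": {"x"}, "x": {"c", "y"}, "y": {"x"}}
scores = collective_inference(adj, {"c"})
```

This illustrates the conjecture above: influence from existing customers reaches consumers who are not network neighbors of any customer.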
In the rest of this extended abstract we motivate our approach,
describe our sampling method, present results on
applying our approach to a large real-world target marketing
campaign in the telecommunications industry, and finally
discuss our findings.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
Latent Space Model for Multi-Modal Social Data
With the emergence of social networking services, researchers enjoy the
increasing availability of large-scale heterogeneous datasets capturing online
user interactions and behaviors. Traditional analysis of techno-social systems
data has focused mainly on describing either the dynamics of social
interactions, or the attributes and behaviors of the users. However,
overwhelming empirical evidence suggests that the two dimensions affect one
another, and therefore they should be jointly modeled and analyzed in a
multi-modal framework. The benefits of such an approach include the ability to
build better predictive models, leveraging social network information as well
as user behavioral signals. To this purpose, here we propose the Constrained
Latent Space Model (CLSM), a generalized framework that combines Mixed
Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA)
incorporating a constraint that forces the latent space to concurrently
describe the multiple data modalities. We derive an efficient inference
algorithm based on Variational Expectation Maximization that has a
computational cost linear in the size of the network, thus making it feasible
to analyze massive social datasets. We validate the proposed framework on two
problems: prediction of social interactions from user attributes and behaviors,
and behavior prediction exploiting network information. We perform experiments
with a variety of multi-modal social systems, spanning location-based social
networks (Gowalla), social media services (Instagram, Orkut), e-commerce and
review sites (Amazon, Ciao), and finally citation networks (Cora). The results
indicate significant improvement in prediction accuracy over state of the art
methods, and demonstrate the flexibility of the proposed approach for
addressing a variety of different learning problems commonly occurring with
multi-modal social data.
Comment: 12 pages, 7 figures, 2 tables
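The shared-latent-space idea underlying the abstract above can be illustrated with a drastically simplified sketch. This is not CLSM's actual MMSB+LDA formulation or its variational EM algorithm; the names `theta` and `W` and the tiny dimensions are hypothetical. The point is only that one latent representation per user concurrently drives both modalities, link structure and attributes:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, k, n_attrs = 5, 3, 4
theta = rng.dirichlet(np.ones(k), size=n)   # shared latent memberships, one row per user
W = rng.normal(size=(k, n_attrs))           # attribute loadings

# The same latent space describes both data modalities:
link_prob = sigmoid(theta @ theta.T)        # social interactions (n x n)
attr_logits = theta @ W                     # user attributes/behaviors
attr_prob = np.exp(attr_logits) / np.exp(attr_logits).sum(axis=1, keepdims=True)
```

Constraining a single `theta` to explain both outputs is what lets network information improve behavior prediction and vice versa; CLSM's actual constraint and inference are considerably more involved.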
Zero-Truncated Poisson Tensor Factorization for Massive Binary Tensors
We present a scalable Bayesian model for low-rank factorization of massive
tensors with binary observations. The proposed model has the following key
properties: (1) in contrast to the models based on the logistic or probit
likelihood, using a zero-truncated Poisson likelihood for binary data allows
our model to scale up in the number of \emph{ones} in the tensor, which is
especially appealing for massive but sparse binary tensors; (2)
side-information in form of binary pairwise relationships (e.g., an adjacency
network) between objects in any tensor mode can also be leveraged, which can be
especially useful in "cold-start" settings; and (3) the model admits simple
Bayesian inference via batch, as well as \emph{online} MCMC; the latter allows
scaling up even for \emph{dense} binary data (i.e., when the number of ones in
the tensor/network is also massive). In addition, non-negative factor matrices
in our model provide easy interpretability, and the tensor rank can be inferred
from the data. We evaluate our model on several large-scale real-world binary
tensors, achieving excellent computational scalability, and also demonstrate
its usefulness in leveraging side-information provided in form of
mode-network(s).
Comment: UAI (Uncertainty in Artificial Intelligence) 201
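The key scaling property claimed above can be made concrete with a sketch (not the paper's actual MCMC inference; the CP factor names U, V, W and the toy data are illustrative). Under a zero-truncated-Poisson-style binary likelihood with rate lam_ijk = sum_r U_ir V_jr W_kr, zeros contribute -lam to the log-likelihood, and the sum of lam over *all* entries factorizes per rank component, so only the observed ones need to be touched individually:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 30, 20, 10, 4          # tensor dimensions and CP rank
U = rng.gamma(1.0, 0.5, size=(I, R))  # non-negative factor matrices
V = rng.gamma(1.0, 0.5, size=(J, R))
W = rng.gamma(1.0, 0.5, size=(K, R))

# Sparse list of observed ones: (i, j, k) index triples.
ones = [(0, 1, 2), (3, 4, 5), (7, 8, 9)]

# Sum of lam_ijk over all I*J*K entries factorizes per component:
# cost O((I+J+K)*R), with no loop over the zeros.
total_rate = (U.sum(axis=0) * V.sum(axis=0) * W.sum(axis=0)).sum()

lam_ones = np.array([(U[i] * V[j] * W[k]).sum() for i, j, k in ones])
# Binary log-likelihood:  y=0 -> -lam ;  y=1 -> log(1 - exp(-lam))
loglik = -total_rate + (lam_ones + np.log1p(-np.exp(-lam_ones))).sum()
```

Contrast this with a logistic or probit likelihood, whose normalizer does not decompose this way, forcing computation over every entry, ones and zeros alike.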