Latent Space Model for Multi-Modal Social Data
With the emergence of social networking services, researchers enjoy the
increasing availability of large-scale heterogeneous datasets capturing online
user interactions and behaviors. Traditional analysis of techno-social systems
data has focused mainly on describing either the dynamics of social
interactions, or the attributes and behaviors of the users. However,
overwhelming empirical evidence suggests that the two dimensions affect one
another, and therefore they should be jointly modeled and analyzed in a
multi-modal framework. The benefits of such an approach include the ability to
build better predictive models, leveraging social network information as well
as user behavioral signals. To this purpose, here we propose the Constrained
Latent Space Model (CLSM), a generalized framework that combines Mixed
Membership Stochastic Blockmodels (MMSB) and Latent Dirichlet Allocation (LDA)
incorporating a constraint that forces the latent space to concurrently
describe the multiple data modalities. We derive an efficient inference
algorithm based on Variational Expectation Maximization that has a
computational cost linear in the size of the network, thus making it feasible
to analyze massive social datasets. We validate the proposed framework on two
problems: prediction of social interactions from user attributes and behaviors,
and behavior prediction exploiting network information. We perform experiments
with a variety of multi-modal social systems, spanning location-based social
networks (Gowalla), social media services (Instagram, Orkut), e-commerce and
review sites (Amazon, Ciao), and finally citation networks (Cora). The results
indicate significant improvement in prediction accuracy over state-of-the-art
methods, and demonstrate the flexibility of the proposed approach for
addressing a variety of different learning problems commonly occurring with
multi-modal social data.
Comment: 12 pages, 7 figures, 2 tables
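For readers unfamiliar with one of the two building blocks named in the abstract, the following is a minimal sketch of the standard MMSB generative process only. It does not implement CLSM, its constraint, or its variational inference. The function names, the two-block setup, and all parameter values are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mmsb(num_nodes, alpha, block_matrix, seed=0):
    """Sample mixed memberships and a directed adjacency matrix.

    alpha        -- Dirichlet concentration, one entry per latent block
    block_matrix -- block_matrix[g][h] is the link probability when the
                    sender acts as block g and the receiver as block h
    """
    rng = random.Random(seed)
    k = len(alpha)
    # Each node gets its own membership vector over the k blocks.
    memberships = [sample_dirichlet(alpha, rng) for _ in range(num_nodes)]
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for p in range(num_nodes):
        for q in range(num_nodes):
            if p == q:
                continue  # no self-links in this sketch
            # Per-pair block indicators for sender p and receiver q.
            z_send = rng.choices(range(k), weights=memberships[p])[0]
            z_recv = rng.choices(range(k), weights=memberships[q])[0]
            if rng.random() < block_matrix[z_send][z_recv]:
                adjacency[p][q] = 1
    return memberships, adjacency


# Hypothetical example: two assortative blocks, six nodes.
memberships, adjacency = sample_mmsb(
    num_nodes=6,
    alpha=[0.5, 0.5],
    block_matrix=[[0.9, 0.05], [0.05, 0.8]],
)
```

The nested loop over node pairs is quadratic and is used here only for clarity; the linear cost stated in the abstract refers to the authors' inference algorithm, which this sketch does not reproduce.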
Heterogeneous Sensor Signal Processing for Inference with Nonlinear Dependence
Inferring events of interest by fusing data from multiple heterogeneous sources has been an interesting and important topic in recent years. Several issues related to inference using heterogeneous data with complex and nonlinear dependence are investigated in this dissertation. We apply copula theory to characterize the dependence among heterogeneous data.
In centralized detection, where sensor observations are available at the fusion center (FC), we study copula-based fusion. We design detection algorithms based on sample-wise copula selection and a mixture-of-copulas model for different scenarios of the true dependence. The proposed approaches are theoretically justified and perform well when applied to fuse acoustic and seismic sensor data for personnel detection. Beyond traditional sensors, access to massive amounts of social media data provides a unique opportunity for extracting information about unfolding events. We further study how sensor networks and social media complement each other in facilitating the data-to-decision making process. We propose a copula-based joint characterization of multiple dependent time series from sensors and social media. As a proof of concept, this model is applied to the fusion of Google Trends (GT) data and stock/flu data for prediction, where the stock/flu data serves as a surrogate for sensor data.
In energy-constrained networks, local observations are compressed before they are transmitted to the FC. In these cases, conditional dependence and heterogeneity particularly complicate the system design. We consider the classification of discrete random signals in Wireless Sensor Networks (WSNs), where, for communication efficiency, only local decisions are transmitted. We derive the necessary conditions for the optimal decision rules at the sensors and the FC by introducing a hidden random variable. An iterative algorithm is designed to search for the optimal decision rules. Its convergence and asymptotic optimality are also proved. The performance of the proposed scheme is illustrated for the distributed Automatic Modulation Classification (AMC) problem. Censoring is another communication-efficient strategy, in which sensors transmit only informative observations to the FC and censor those deemed uninformative. We design detectors that take into account the spatial dependence among observations. Fusion rules for censored data are proposed with continuous and discrete local messages, respectively. Their computationally efficient counterparts, based on the key idea of injecting controlled noise at the FC before fusion, are also investigated.
In this thesis, with heterogeneous and dependent sensor observations, we consider not only inference in parallel frameworks but also the problem of collaborative inference, where collaboration exists among local sensors. Each sensor forms a coalition with other sensors and shares information within the coalition to maximize its inference performance. The collaboration strategy is investigated under a communication constraint. To characterize the influence of inter-sensor dependence on inference performance and thus on the collaboration strategy, we quantify the gain and loss in forming a coalition by introducing copula-based definitions of diversity gain and redundancy loss for both estimation and detection problems. A coalition formation game is proposed for the distributed inference problem, through which the information contained in the inter-sensor dependence is fully explored and utilized for improved inference performance.
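The central idea in the abstract above, using a copula to couple observations whose marginal distributions differ, can be illustrated with a small sketch. It assumes a bivariate Gaussian copula with an exponential and a uniform marginal. These choices, the function names, and the parameter values are illustrative assumptions; they are not the copula families or detectors studied in the dissertation.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Return n pairs (u1, u2) of uniforms coupled by a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlated standard normal with correlation rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform on (0, 1).
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def to_heterogeneous_marginals(pairs, rate=2.0, low=-1.0, high=1.0):
    """Apply inverse CDFs: Exponential(rate) for u1, Uniform(low, high) for u2."""
    samples = []
    for u1, u2 in pairs:
        u1 = min(u1, 1.0 - 1e-12)  # guard against log(0)
        x = -math.log(1.0 - u1) / rate
        y = low + (high - low) * u2
        samples.append((x, y))
    return samples


def pearson(xs, ys):
    """Sample Pearson correlation, used here only as a quick dependence check."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


uniforms = sample_gaussian_copula(n=2000, rho=0.8)
samples = to_heterogeneous_marginals(uniforms)
dependence = pearson([x for x, _ in samples], [y for _, y in samples])
```

The dependence structure is set entirely by `rho` in the copula step, while the marginals are chosen independently afterwards. That separation is what makes copulas convenient for fusing heterogeneous data.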
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figure
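The abstract mentions evaluation metrics without naming them. As a hedged illustration, the sketch below computes two distance-based measures that are widely used for geolocation prediction: the median error distance and the accuracy within a radius. The 161 km (about 100 miles) threshold, the function names, and the sample coordinates are assumptions and are not taken from the survey.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def evaluate(predicted, actual, threshold_km=161.0):
    """Return the median error distance and the share of predictions within threshold_km."""
    errors = [haversine_km(p[0], p[1], a[0], a[1])
              for p, a in zip(predicted, actual)]
    within = sum(1 for e in errors if e <= threshold_km)
    return {
        "median_error_km": median(errors),
        "accuracy_within_threshold": within / len(errors),
    }


# Hypothetical predictions and ground truth as (latitude, longitude) pairs.
predicted = [(0.0, 0.0), (0.0, 0.0)]
actual = [(0.0, 0.0), (0.0, 90.0)]
report = evaluate(predicted, actual)
```

Distance-based measures suit coordinate outputs; when a task is framed as classification over cities or regions, plain label accuracy is the more natural choice.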
Connection Discovery using Shared Images by Gaussian Relational Topic Model
Social graphs, representing online friendships among users, are one of the
fundamental types of data for many applications, such as recommendation,
virality prediction and marketing in social media. However, this data may be
unavailable due to the privacy concerns of users, or kept private by social
network operators, which makes such applications difficult. Inferring user
interests and discovering user connections through their shared multimedia
content has attracted more and more attention in recent years. This paper
proposes a Gaussian relational topic model for connection discovery using user
shared images in social media. The proposed model not only models user
interests as latent variables through their shared images, but also considers
the connections between users as a result of their shared images. It explicitly
relates user shared images to user connections in a hierarchical, systematic
and supervisory way and provides an end-to-end solution for the problem. This
paper also derives efficient variational inference and learning algorithms for
the posterior of the latent variables and model parameters. It is demonstrated
through experiments with over 200k images from Flickr that the proposed method
significantly outperforms the methods in previous works.
Comment: IEEE International Conference on Big Data 201
Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given; otherwise, we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
with an emphasis on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments.