Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Notwithstanding recent work which has demonstrated the potential of using
Twitter messages for content-specific data mining and analysis, the depth of
such analysis is inherently limited by the scarcity of data imposed by the 140-character
tweet limit. In this paper we describe a novel approach for targeted
knowledge exploration which uses tweet content analysis as a preliminary step.
This step is used to bootstrap more sophisticated data collection from directly
related but much richer content sources. In particular we demonstrate that
valuable information can be collected by following URLs included in tweets. We
automatically extract content from the corresponding web pages and, treating
each web page as a document linked to the original tweet, show how a temporal
topic model based on a hierarchical Dirichlet process can be used to track the
evolution of a complex topic structure of a Twitter community. Using
autism-related tweets we demonstrate that our method is capable of capturing a
much more meaningful picture of information exchange than user-chosen hashtags.
Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 201
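The bootstrap step of following URLs included in tweets can be illustrated with a minimal sketch, assuming tweets arrive as (id, text) pairs and that a simple regular expression is good enough to find links. The pattern and the function name `link_pages_to_tweets` are hypothetical, and fetching the pages and the HDP-based temporal topic model are not shown. The sketch groups tweets by the URL they share, so each web page becomes one document linked back to the tweets that cited it.

```python
import re
from collections import defaultdict

# Hypothetical pattern: an http(s) URL runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def link_pages_to_tweets(tweets):
    """Map each URL to the ids of the tweets that contain it.

    `tweets` is an iterable of (tweet_id, text) pairs. Each distinct URL is
    treated as one document linked back to every tweet that cited it.
    """
    pages = defaultdict(list)
    for tweet_id, text in tweets:
        for raw in URL_PATTERN.findall(text):
            # Drop punctuation that often trails a URL in running text
            # (this can clip URLs that legitimately end in such characters).
            url = raw.rstrip(".,;:!?)")
            pages[url].append(tweet_id)
    return dict(pages)


tweets = [
    (1, "New study on early screening https://example.org/study1 #autism"),
    (2, "Worth a read: https://example.org/study1."),
    (3, "No link in this one #autism"),
]
print(link_pages_to_tweets(tweets))
# {'https://example.org/study1': [1, 2]}
```

Grouping by URL rather than by tweet means a page shared by many community members is fetched and modelled once, while still keeping the links to every tweet (and hence every timestamp) that referenced it.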
Locality statistics for anomaly detection in time series of graphs
The ability to detect change-points in a dynamic network or a time series of
graphs is an increasingly important task in many applications of the emerging
discipline of graph signal processing. This paper formulates change-point
detection as a hypothesis testing problem in terms of a generative latent
position model, focusing on the special case of the Stochastic Block Model time
series. We analyze two classes of scan statistics, based on distinct underlying
locality statistics presented in the literature. Our main contribution is the
derivation of the limiting distributions and power characteristics of the
competing scan statistics. Performance is compared theoretically, on synthetic
data, and on the Enron email corpus. We demonstrate that both statistics are
admissible in one simple setting, while one of the statistics is inadmissible in a
second setting.
Comment: 15 pages, 6 figures
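To make the setting concrete, here is a minimal sketch of a scan statistic on a Stochastic Block Model time series. It assumes one locality statistic found in the scan-statistics literature (the number of edges in a vertex's closed 1-hop neighborhood) and a simple moving-window standardization of the maximum. It is not either of the specific statistics analysed in the paper, and the helper names, window size, threshold, and SBM parameters are all hypothetical.

```python
import random


def sample_sbm(block_of, probs, rng):
    """Sample an undirected Stochastic Block Model graph as adjacency sets.

    block_of[v] is the block of vertex v; probs[a][b] is the edge
    probability between blocks a and b.
    """
    n = len(block_of)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[block_of[i]][block_of[j]]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def local_edge_count(adj, v):
    """Locality statistic: edges within v's closed 1-hop neighborhood."""
    nbhd = adj[v] | {v}
    # Every internal edge is seen from both endpoints, hence the halving.
    return sum(len(adj[u] & nbhd) for u in nbhd) // 2


def scan_statistic(adj):
    """Scan statistic: the maximum locality statistic over all vertices."""
    return max(local_edge_count(adj, v) for v in range(len(adj)))


def detect_change(graphs, window=3, threshold=4.0):
    """Flag time steps whose scan statistic is far above the recent past.

    The standard deviation is floored at 1.0 so a flat history does not
    make every small fluctuation look significant.
    """
    series = [scan_statistic(g) for g in graphs]
    flagged = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        mean = sum(past) / window
        var = sum((x - mean) ** 2 for x in past) / (window - 1)
        sd = max(var ** 0.5, 1.0)
        if (series[t] - mean) / sd > threshold:
            flagged.append(t)
    return flagged


rng = random.Random(0)
blocks = [1 if v < 15 else 0 for v in range(60)]
null_probs = [[0.1, 0.1], [0.1, 0.1]]
alt_probs = [[0.1, 0.1], [0.1, 1.0]]  # block 1 becomes a clique at t = 7
graphs = [sample_sbm(blocks, alt_probs if t == 7 else null_probs, rng)
          for t in range(10)]
print(detect_change(graphs))  # the list should include 7
```

Because block 1 is fully connected at t = 7, each of its vertices has at least C(15, 2) = 105 edges in its neighborhood, well above typical null values, so index 7 is flagged. Under these arbitrary settings other indices may occasionally be flagged by chance; the hypothesis-testing analysis in the paper is what makes such thresholds principled.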
A framework for the construction of generative models for mesoscale structure in multilayer networks
Multilayer networks allow one to represent diverse and coupled connectivity patterns (such as time-dependence, multiple subsystems, or both) that arise in many applications and which are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-defined nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It unifies and generalizes many existing models for mesoscale structures in fully ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-specified dependency structure between layers. We discuss the parameters and properties of our framework, and we illustrate examples of its use with benchmark models for community-detection methods and algorithms in multilayer networks.
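As an illustration of one special case, the sketch below generates a fully ordered (temporal) multilayer network. It assumes that the interlayer dependency is governed by a single copying probability `p_copy` (each node keeps its community label from the previous layer or redraws it uniformly) and that edges within each layer follow a planted-partition model. All names and parameters are hypothetical; this is not presented as the framework's exact formulation, which also covers unordered and partially ordered layers.

```python
import random


def sample_temporal_partitions(n_nodes, n_layers, n_communities, p_copy, rng):
    """Community labels for an ordered sequence of layers.

    Layer 0 draws every label uniformly. In each later layer a node keeps
    its label from the previous layer with probability p_copy and otherwise
    redraws it uniformly (a hypothetical 'null' distribution).
    """
    layers = [[rng.randrange(n_communities) for _ in range(n_nodes)]]
    for _ in range(1, n_layers):
        prev = layers[-1]
        layers.append([label if rng.random() < p_copy
                       else rng.randrange(n_communities)
                       for label in prev])
    return layers


def planted_partition_edges(labels, p_in, p_out, rng):
    """Intralayer edges: probability p_in within a community, p_out across."""
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges


def sample_temporal_multilayer(n_nodes, n_layers, n_communities,
                               p_copy, p_in, p_out, seed=None):
    """Return (partitions, edges): one label list and one edge list per layer."""
    rng = random.Random(seed)
    partitions = sample_temporal_partitions(n_nodes, n_layers, n_communities,
                                            p_copy, rng)
    edges = [planted_partition_edges(labels, p_in, p_out, rng)
             for labels in partitions]
    return partitions, edges


partitions, edges = sample_temporal_multilayer(
    n_nodes=50, n_layers=4, n_communities=3,
    p_copy=0.9, p_in=0.3, p_out=0.02, seed=42)
for t, (labels, layer_edges) in enumerate(zip(partitions, edges)):
    print(f"layer {t}: {len(set(labels))} communities used, {len(layer_edges)} edges")
```

Setting `p_copy = 1` yields identical partitions in every layer, while `p_copy = 0` makes the layers' partitions independent, so sweeping it gives a simple benchmark for how well a community-detection method tracks persistent versus changing structure.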
Phytoplankton Hotspot Prediction With an Unsupervised Spatial Community Model
Many interesting natural phenomena are sparsely distributed and discrete.
Locating the hotspots of such sparsely distributed phenomena is often difficult
because their density gradient is likely to be very noisy. We present a novel
approach to this search problem, where we model the co-occurrence relations
between a robot's observations with a Bayesian nonparametric topic model. This
approach makes it possible to produce a robust estimate of the spatial
distribution of the target, even in the absence of direct target observations.
We apply the proposed approach to the problem of finding the spatial locations
of the hotspots of a specific phytoplankton taxon in the ocean. We use
classified image data from Imaging FlowCytobot (IFCB), which automatically
measures individual microscopic cells and colonies of cells. Given these
individual taxon-specific observations, we learn a phytoplankton community
model that characterizes the co-occurrence relations between taxa. We present
experiments with simulated robot missions drawn from real observation data
collected during a research cruise traversing the US Atlantic coast. Our
results show that the proposed approach outperforms nearest-neighbor and
k-means-based methods for predicting the spatial distribution of hotspots from
in-situ observations.
Comment: To appear in ICRA 2017, Singapore
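The idea of inferring a target from co-occurring taxa can be illustrated with a deliberately simple stand-in, not the Bayesian nonparametric topic model the abstract describes. The sketch estimates from training samples how often each non-target taxon co-occurs with the target, then scores a new location by the count-weighted average of those frequencies, so a site can score above zero even when the target itself was not observed there. The sample format (a dict of taxon counts per observation) and the function names are hypothetical.

```python
from collections import Counter


def cooccurrence_with_target(samples, target):
    """Estimate P(target present | taxon present) for every non-target taxon.

    `samples` is a list of dicts mapping taxon name -> count in one sample.
    """
    seen = Counter()
    with_target = Counter()
    for sample in samples:
        present = {taxon for taxon, count in sample.items() if count > 0}
        has_target = target in present
        for taxon in present - {target}:
            seen[taxon] += 1
            if has_target:
                with_target[taxon] += 1
    return {taxon: with_target[taxon] / seen[taxon] for taxon in seen}


def target_score(observation, cond, target):
    """Count-weighted mean of P(target | taxon) over observed non-target taxa.

    Taxa absent from `cond` are ignored; returns 0.0 if nothing is usable.
    """
    total = weighted = 0.0
    for taxon, count in observation.items():
        if taxon != target and taxon in cond and count > 0:
            total += count
            weighted += count * cond[taxon]
    return weighted / total if total else 0.0


train = [
    {"A": 5, "B": 2, "T": 1},
    {"A": 3, "T": 2},
    {"B": 4, "C": 1},
    {"C": 6},
]
cond = cooccurrence_with_target(train, "T")
print(cond)  # A always co-occurs with T, B half the time, C never
# site2 has no direct target observation but still gets a nonzero score via A.
sites = {"site1": {"C": 3}, "site2": {"A": 2, "C": 2}}
print({name: target_score(obs, cond, "T") for name, obs in sites.items()})
```

Ranking candidate locations by such a score is only a crude proxy. The approach in the abstract instead learns a community model of the co-occurrence relations between taxa, which is what it relies on to estimate the target's spatial distribution even in the absence of direct target observations.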