5,598 research outputs found

    Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

    Full text link
    Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and treating each web page as a document linked to the original tweet show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 201

    Locality statistics for anomaly detection in time series of graphs

    Full text link
    The ability to detect change-points in a dynamic network or a time series of graphs is an increasingly important task in many applications of the emerging discipline of graph signal processing. This paper formulates change-point detection as a hypothesis testing problem in terms of a generative latent position model, focusing on the special case of the Stochastic Block Model time series. We analyze two classes of scan statistics, based on distinct underlying locality statistics presented in the literature. Our main contribution is the derivation of the limiting distributions and power characteristics of the competing scan statistics. Performance is compared theoretically, on synthetic data, and on the Enron email corpus. We demonstrate that both statistics are admissible in one simple setting, while one of the statistics is inadmissible a second setting.Comment: 15 pages, 6 figure

    A framework for the construction of generative models for mesoscale structure in multilayer networks

    Get PDF
    Multilayer networks allow one to represent diverse and coupled connectivity patterns—such as time-dependence, multiple subsystems, or both—that arise in many applications and which are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-defined nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It unifies and generalizes many existing models for mesoscale structures in fully ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-specified dependency structure between layers. We discuss the parameters and properties of our framework, and we illustrate examples of its use with benchmark models for community-detection methods and algorithms in multilayer networks

    Phytoplankton Hotspot Prediction With an Unsupervised Spatial Community Model

    Full text link
    Many interesting natural phenomena are sparsely distributed and discrete. Locating the hotspots of such sparsely distributed phenomena is often difficult because their density gradient is likely to be very noisy. We present a novel approach to this search problem, where we model the co-occurrence relations between a robot's observations with a Bayesian nonparametric topic model. This approach makes it possible to produce a robust estimate of the spatial distribution of the target, even in the absence of direct target observations. We apply the proposed approach to the problem of finding the spatial locations of the hotspots of a specific phytoplankton taxon in the ocean. We use classified image data from Imaging FlowCytobot (IFCB), which automatically measures individual microscopic cells and colonies of cells. Given these individual taxon-specific observations, we learn a phytoplankton community model that characterizes the co-occurrence relations between taxa. We present experiments with simulated robot missions drawn from real observation data collected during a research cruise traversing the US Atlantic coast. Our results show that the proposed approach outperforms nearest neighbor and k-means based methods for predicting the spatial distribution of hotspots from in-situ observations.Comment: To appear in ICRA 2017, Singapor
    • …
    corecore