
    Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

    We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch and bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: to compute temporal strong components and to compress graphs. Comment: 11 pages
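    The pruning idea mentioned in this abstract (core numbers combined with a heuristic clique) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_adj`, `core_numbers`, `greedy_clique`, and `prune_by_core` are hypothetical, the peeling loop is a simple quadratic version, and the greedy heuristic is only one possible choice. It relies on the standard fact that every vertex of a clique of size k has core number at least k - 1, so once a clique of size `lower_bound` is known, any vertex with core number below `lower_bound` cannot belong to a larger clique.

```python
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """Core number of every vertex, found by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # linear scan; fine for a sketch
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def greedy_clique(adj, core):
    """Heuristic lower bound: grow cliques greedily, preferring high core numbers."""
    best = []
    for start in sorted(adj, key=core.__getitem__, reverse=True):
        if core[start] + 1 <= len(best):
            break  # no remaining start vertex can lie in a larger clique
        clique = [start]
        candidates = set(adj[start])
        while candidates:
            v = max(candidates, key=core.__getitem__)
            clique.append(v)
            candidates &= adj[v]  # keep only vertices adjacent to the whole clique
        if len(clique) > len(best):
            best = clique
    return best


def prune_by_core(adj, core, lower_bound):
    """Drop vertices that cannot be part of a clique larger than lower_bound."""
    keep = {v for v in adj if core[v] >= lower_bound}
    return {v: adj[v] & keep for v in keep}


# A 4-clique on {0, 1, 2, 3} with a path 3-4-5-6 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
adj = build_adj(edges)
core = core_numbers(adj)
best = greedy_clique(adj, core)
remaining = prune_by_core(adj, core, len(best))
```

    In this toy graph the heuristic already finds the 4-clique, and pruning with that bound leaves no vertices, so no branch and bound search would be needed at all.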

    Network Sampling: From Static to Streaming Graphs

    Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, the population and units of interest, and the classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Our experimental results indicate that our proposed family of sampling methods more accurately preserves the underlying properties of the graph for both static and streaming graphs. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.
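    One plausible reading of "sampling based on graph induction" in the static setting is sketched below: sample edges uniformly, collect their endpoints, then add back every original edge whose endpoints were both sampled. The abstract does not give the procedure, so this is an assumption rather than the authors' method; the function name `induced_edge_sample` and the `fraction` and `seed` parameters are hypothetical, and a streaming variant would need a different design.

```python
import random


def induced_edge_sample(edges, fraction, seed=0):
    """Uniform edge sampling followed by an induction step on the sampled nodes."""
    edges = list(edges)
    if not edges:
        return set(), []
    rng = random.Random(seed)
    k = max(1, round(fraction * len(edges)))
    sampled = rng.sample(edges, k)          # step 1: uniform sample of edges
    nodes = {u for edge in sampled for u in edge}
    # step 2 (induction): keep every original edge with both endpoints sampled
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, induced


# Triangle 0-1-2 with a pendant vertex 3.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
nodes, induced = induced_edge_sample(toy_edges, fraction=0.5, seed=1)
```

    The induction step is what distinguishes this from plain edge sampling: the returned edge list is never smaller than the edge sample itself.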

    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.

    Statistical analysis of network data motivated by problems in online social media

    Networks have been widely used to represent and analyze a system of connected elements. Online social media networks, as a result of the expansion of the Internet and an increased need for communication, have become an increasingly important part of people's lives. This thesis focuses on the statistical analysis of network data motivated by problems in online social media. It discusses problems arising from both explicit network data and implicit network data. Explicit network data are data where network structures are observable; implicit network data are those that do not have a network structure but occur under the influence of an underlying network. For the explicit network data analysis, we develop a novel method of recovering a fundamental characteristic -- network degree distributions -- under sampling. We formulate the problem of estimating the degree distribution as an inverse problem. We show that this problem is ill-conditioned for many sampling methods in practice, and accordingly propose a constrained, penalized weighted least-squares approach to solve it. We demonstrate the ability of our method to accurately reconstruct the degree distributions from simulated network data and real-world social network data. We also propose practical uses of the estimates relevant to marketing and advertising. For the implicit network data analysis, we look at review data from popular review websites. Motivated by articles from the popular press and the research community which publicized that the average rating for top review sites is above 4 out of 5 stars, we study the phenomena of review rating trends and convergence using restaurant review data from TripAdvisor. We analyze the trend on different levels -- a rough analysis of the characteristics of the ratings, and a subtler statistical modeling with ordinal logistic regressions.
    Taking into account the implicit network underlying the review data, we suggest that the upward trend observed in restaurant review ratings may be explained by social influence on an individual's perception of quality. We use the intensity of review postings as an indicator of how popular a restaurant is and test to what extent the increase in review intensity explains increases in average rating. After that, we consider a more nuanced approach to the joint modeling of ratings and review intensity which would allow for interaction between the two, rather than intensity serving only as an explanatory variable for ratings. Specifically, a state-space model is used to test the interaction between review intensity and review ratings.
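    The degree-distribution part of this abstract describes an inverse problem solved by constrained, penalized weighted least squares. A generic form of such a problem is written out below as a sketch. All symbols are assumptions introduced for illustration and do not come from the abstract: N is the vector of true degree counts, N* the counts observed in the sample, P a linear operator determined by the sampling design, C a weight (for example covariance) matrix, D a penalty operator such as a difference matrix, and λ a tuning parameter. The nonnegativity constraint is likewise only one possible choice of constraint.

```latex
\[
\mathbb{E}[N^{*}] = P\,N, \qquad
\hat{N} = \arg\min_{N \ge 0}\;
(P N - N^{*})^{\top} C^{-1} (P N - N^{*})
+ \lambda\,\lVert D N \rVert_{2}^{2}
\]
```

    When P is ill-conditioned, directly inverting the first relation amplifies sampling noise, which is the usual motivation for adding the penalty term and the constraint.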

    Identification of Influentials in virtual social network: an agent-based simulation model of social influence processes

    The increasing virtualization of social structures through social media, and in particular through virtual social networks, poses new challenges for marketing research. The technological development of Web 2.0 gives consumers fast and simple ways to communicate and interact in order to share experiences about a company's products and services. Within a virtual social network there are influentials who, owing to their communicative behavior and their structural embedding in the network, have a unique capacity for social influence. For corporate marketing, understanding social influence processes and identifying influentials are the central success factors for steering consumer interaction toward the company's objectives. Existing methods for analyzing and identifying these influentials, however, neglect the important interpersonal perspective. The structural embedding of consumers in the network, together with their communication and interaction processes with one another, gives rise to a dynamic, nonlinear, and complex social system. Traditional analytical methods of marketing research reach their limits when examining these dynamics. The author therefore develops an agent-based simulation model to represent individual consumer behavior as a complex and dynamic system. The simulation results indicate that influentials neither occupy a structurally especially significant position within the network nor exhibit heightened social activity. The previously used methods of structural social network analysis and social activity analysis are therefore of only limited use for identifying influentials. From an interpersonal analytical perspective, influentials turn out to have particularly high perceived credibility, and their social environment is characterized by high susceptibility to social influence. The agent-based simulation thus extends the understanding of socially influenced consumer behavior and provides valuable pointers for the practical identification of influentials in a virtual social network.
    Virtual social networking sites have become more and more popular over the last few years, attracting millions of users worldwide and growing exponentially. The increasing number of virtually connected consumers leads to a socially driven information exchange about products, brands, or services. Within virtual social networks, influentials can be considered key users with high influence capabilities, unique communication patterns, and important structural network positions. For marketers, an understanding of social influence is key to benefiting from consumer-to-consumer interaction and to addressing potential new customers by utilizing these influentials. So far, virtual social network analysis neglects interpersonal factors of influence as well as an individual consumer decision-making perspective. The analysis of individual interaction and the lack of empirical data from virtual social networks require a research method that models individual consumer behavior as a complex and adaptive system. Therefore, the author develops an agent-based simulation model to explore and investigate social influence processes by integrating perceived social activity, perceived structural positions, and interpersonal relationship characteristics with an individual decision-making perspective. Simulation results indicate that important members in virtual social networks are inadequately identified through either structural network analysis or activity analysis. Hence, these methods are less appropriate for identifying influentials within a virtual social network. The interpersonal analysis of the social influence processes shows that influentials are characterized by high perceived credibility. Moreover, the social contacts of the influentials are highly susceptible to social influence. The agent-based simulation model provides a deeper understanding of social influence processes in virtual social networks and offers marketers a superior opportunity for identifying socially influential network members.