Semantic Networks of Interests in Online NSSI Communities
Persons who engage in non-suicidal self-injury (NSSI) often conceal their
practices, which limits the examination and understanding of those who engage in
NSSI. The goal of this research is to utilize public online social networks
(namely, in LiveJournal, a major blogging network) to observe the NSSI
population's communication in a naturally occurring setting. Specifically,
LiveJournal users can publicly declare their interests. We collected the
self-declared interests of 22,000 users who are members of or participate in 43
NSSI-related communities. We extracted a bimodal socio-semantic network of
users and interests based on their similarity. The semantic subnetwork of
interests contains NSSI terms (such as "self-injury" and "razors"), references
to music performers (such as "Nine Inch Nails"), and general daily life and
creativity related terms (such as "poetry" and "boys"). Assuming users are
genuine in their declarations, the words reveal distinct patterns of interest
and may signal keys to NSSI.
Comment: 5 pages, 3 figures. Presented at Words and Networks: Language Use in
Socio-Technical Networks (workshop at 2012 ACM Web Science Conference).
Ringo: Interactive Graph Analytics on Big-Memory Machines
We present Ringo, a system for analysis of large graphs. Graphs provide a way
to represent and analyze systems of interacting objects (people, proteins,
webpages) with edges between the objects denoting interactions (friendships,
physical interactions, links). Mining graphs provides valuable insights about
individual objects as well as the relationships among them.
In building Ringo, we take advantage of the fact that machines with large
memory and many cores are widely available and also relatively affordable. This
allows us to build an easy-to-use interactive high-performance graph analytics
system. Graphs also need to be built from input data, which often resides in
the form of relational tables. Thus, Ringo provides rich functionality for
manipulating raw input data tables into various kinds of graphs. Furthermore,
Ringo also provides over 200 graph analytics functions that can then be applied
to constructed graphs.
We show that a single big-memory machine provides a very attractive platform
for performing analytics on all but the largest graphs as it offers excellent
performance and ease of use as compared to alternative approaches. With Ringo,
we also demonstrate how to integrate graph analytics with an iterative process
of trial-and-error data exploration and rapid experimentation, common in data
mining workloads.
Comment: 6 pages, 2 figures
Line graphs as social networks
The line graphs are clustered and assortative. They share these topological
features with some social networks. We argue that this similarity reveals the
cliquey character of the social networks. In the model proposed here, a social
network is the line graph of an initial network of families, communities,
interest groups, school classes and small companies. These groups play the role
of nodes, and individuals are represented by links between these nodes. The
picture is supported by the data on the LiveJournal network of about 8 x 10^6
people. In particular, sharp maxima of the observed data of the degree
dependence of the clustering coefficient C(k) are associated with cliques in
the social network.
Comment: 11 pages, 4 figures
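The construction described in this abstract can be illustrated with a minimal sketch. It assumes a simple undirected graph of groups given as an edge list, where each edge stands for one individual who belongs to both groups. The function names `line_graph` and `clustering` and the sample group names are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def line_graph(edges):
    """Build the line graph of a simple undirected graph.

    Each input edge becomes a node (identified by its index in `edges`).
    Two such nodes are adjacent when the original edges share an endpoint.
    """
    incident = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    adj = {idx: set() for idx in range(len(edges))}
    # All edges meeting at one original node form a clique in the line graph.
    for members in incident.values():
        for a, b in combinations(members, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical groups; each pair is one individual belonging to both groups.
groups = [
    ("family", "school"),
    ("family", "club"),
    ("family", "company"),
    ("school", "club"),
]
people = line_graph(groups)
```

In this toy input the three individuals attached to "family" are mutually adjacent in `people`, which shows how shared group membership produces cliques and hence high clustering.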
The role of reciprocation in social network formation, with an application to blogging
This paper deals with the role of reciprocation in the formation of individuals' social networks, that is, to what extent initiating a relation brings about its reciprocation. Following the activity of a panel of bloggers over more than a year, we seek to establish whether bloggers are mainly involved in social networking or are part of the media industry. We adapt a standard capital investment model to study the effect of reciprocation on the building of social capital. Results of our analysis confirm that activity and reciprocation both play a role in the dynamics of social media.
Keywords: Bloggers, Friendship, LiveJournal, Media, Panel Data, Reciprocation, Reciprocity, Social Capital, Social Networks
NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud
There is an increasing interest in executing complex analyses over large
graphs, many of which require processing a large number of multi-hop
neighborhoods or subgraphs. Examples include ego network analysis, motif
counting, personalized recommendations, and others. These tasks are not well
served by existing vertex-centric graph processing frameworks, where user
programs are only able to directly access the state of a single vertex. This
paper introduces NSCALE, a novel end-to-end graph processing framework that
enables the distributed execution of complex subgraph-centric analytics over
large-scale graphs in the cloud. NSCALE enables users to write programs at the
level of subgraphs rather than at the level of vertices. Unlike most previous
graph processing frameworks, which apply the user program to the entire graph,
NSCALE allows users to declaratively specify subgraphs of interest. Our
framework includes a novel graph extraction and packing (GEP) module that
utilizes a cost-based optimizer to partition and pack the subgraphs of interest
into memory on as few machines as possible. The distributed execution engine
then takes over and runs the user program in parallel, while respecting the
scope of the various subgraphs. Our experimental results show
orders-of-magnitude improvements in performance and drastic reductions in the
cost of analytics compared to vertex-centric approaches.
Comment: 26 pages, 15 figures, 5 tables
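To make the idea of a multi-hop neighborhood concrete, the following is a minimal single-machine sketch of extracting a k-hop ego network with breadth-first search. It is not NSCALE's API or its extraction module; the function name `ego_network` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def ego_network(adj, center, hops):
    """Return the subgraph induced by all nodes within `hops` of `center`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    The result uses the same format, restricted to the extracted nodes.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    # Keep only edges whose endpoints are both inside the neighborhood.
    return {n: {m for m in adj.get(n, ()) if m in nodes} for n in nodes}


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
two_hop = ego_network(path, "a", 2)
```

A user program written at the subgraph level would receive something like `two_hop` as a whole, rather than seeing one vertex's state at a time.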
EAGr: Supporting Continuous Ego-centric Aggregate Queries over Large Dynamic Graphs
In this work, we present EAGr, a system for supporting large numbers of
continuous neighborhood-based ("ego-centric") aggregate queries over large,
highly dynamic, and rapidly evolving graphs. Examples of such queries include
computation of personalized, tailored trends in social networks, anomaly/event
detection in financial transaction networks, local search and alerts in
spatio-temporal networks, to name a few. Key challenges in supporting such
continuous queries include high update rates typically seen in these
situations, large numbers of queries that need to be executed simultaneously,
and stringent low latency requirements. We propose a flexible, general, and
extensible in-memory framework for executing different types of ego-centric
aggregate queries over large dynamic graphs with low latencies. Our framework
is built around the notion of an aggregation overlay graph, a pre-compiled data
structure that encodes the computations to be performed when an update/query is
received. The overlay graph enables sharing of partial aggregates across
multiple ego-centric queries (corresponding to the nodes in the graph), and
also allows partial pre-computation of the aggregates to minimize the query
latencies. We present several highly scalable techniques for constructing an
overlay graph given an aggregation function, and also design incremental
algorithms for handling structural changes to the underlying graph. We also
present an optimal, polynomial-time algorithm for making the pre-computation
decisions given an overlay graph, and evaluate an approach to incrementally
adapt those decisions as the workload changes. Although our approach is
naturally parallelizable, we focus on a single-machine deployment and show that
our techniques can easily handle graphs of size up to 320 million nodes and
edges, and achieve update/query throughputs of over 500K/s using a single,
powerful machine.
Comment: 18 pages, 1 table, 14 figures
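As a rough illustration of a continuous ego-centric aggregate, the sketch below keeps a running sum over each node's neighbors and pushes deltas on every update. It deliberately omits the aggregation overlay graph and the sharing of partial aggregates described in the abstract; the class `EgoSum` and its methods are hypothetical names, and SUM is an assumed aggregation function.

```python
from collections import defaultdict


class EgoSum:
    """Maintain, for every node, the sum of its neighbors' current values."""

    def __init__(self):
        self.neighbors = defaultdict(set)
        self.value = defaultdict(float)
        self.agg = defaultdict(float)

    def add_edge(self, u, v):
        """Structural change: insert an undirected edge and fix both aggregates."""
        if u == v or v in self.neighbors[u]:
            return  # ignore self-loops and duplicate edges
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)
        self.agg[u] += self.value[v]
        self.agg[v] += self.value[u]

    def update(self, node, new_value):
        """Value update: push only the delta to each neighbor's aggregate."""
        delta = new_value - self.value[node]
        self.value[node] = new_value
        for nbr in self.neighbors[node]:
            self.agg[nbr] += delta

    def query(self, node):
        """Ego-centric query: constant-time read of the precomputed sum."""
        return self.agg[node]


g = EgoSum()
g.add_edge("a", "b")
g.add_edge("a", "c")
g.update("b", 5.0)
g.update("c", 2.0)
```

This version precomputes everything at write time, so queries are cheap and updates cost one step per neighbor. The pre-computation decisions mentioned in the abstract concern exactly this trade-off between update cost and query latency.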
EmptyHeaded: A Relational Engine for Graph Processing
There are two types of high-performance graph processing engines: low- and
high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide
optimized data structures and computation models but require users to write
low-level imperative code, hence ensuring that efficiency is the burden of the
user. In high-level engines, users write in query languages like datalog
(SociaLite) or SQL (Grail). High-level engines are easier to use but are orders
of magnitude slower than the low-level graph engines. We present EmptyHeaded, a
high-level engine that supports a rich datalog-like query language and achieves
performance comparable to that of low-level engines. At the core of
EmptyHeaded's design is a new class of join algorithms that satisfy strong
theoretical guarantees but have thus far not achieved performance comparable to
that of specialized graph processing engines. To achieve high performance,
EmptyHeaded introduces a new join engine architecture, including a novel query
optimizer and data layouts that leverage single-instruction multiple data
(SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level
approaches by up to three orders of magnitude on graph pattern queries,
PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude
faster than many low-level baselines. We validate that EmptyHeaded competes
with the best-of-breed low-level engine (Galois), achieving comparable
performance on PageRank and at most 3x worse performance on SSSP.
PyTorch-BigGraph: A Large-scale Graph Embedding System
Graph embedding methods produce unsupervised node features from graphs that
can then be used for a variety of machine learning tasks. Modern graphs,
particularly in industrial applications, contain billions of nodes and
trillions of edges, which exceeds the capability of existing embedding systems.
We present PyTorch-BigGraph (PBG), an embedding system that incorporates
several modifications to traditional multi-relation embedding systems that
allow it to scale to graphs with billions of nodes and trillions of edges. PBG
uses graph partitioning to train arbitrarily large embeddings on either a
single machine or in a distributed environment. We demonstrate comparable
performance with existing embedding systems on common benchmarks, while
allowing for scaling to arbitrarily large graphs and parallelization on
multiple machines. We train and evaluate embeddings on several large social
network graphs as well as the full Freebase dataset, which contains over 100
million nodes and 2 billion edges.
Being Rational or Aggressive? A Revisit to Dunbar's Number in Online Social Networks
Recent years have witnessed the explosion of online social networks (OSNs).
They provide powerful IT innovations for online social activities such as
organizing contacts, publishing content, and sharing interests between friends
who may never have met before. As more and more people become active users of
online social networks, one may ponder questions such as: (1) Do OSNs indeed
improve our sociability? (2) To what extent can we expand our offline social
spectrum in OSNs? (3) Can we identify some interesting user behaviors in OSNs?
This paper aims to answer these questions. To this
end, we revisit the well-known Dunbar's number in online social
networks. Our main research contributions are as follows. First, to the best of our
knowledge, our work is the first to systematically validate the
existence of an online Dunbar's number in the range [200,300]. To do
so, we combine local-structure analysis and user-interaction analysis
across extensive real-world OSNs. Second, we divide OSN users into two
categories, rational and aggressive, and find that rational users intend to
develop close and reciprocated relationships, whereas aggressive users have no
consistent behaviors. Third, we build a simple model to capture the constraints
of time and cognition that affect the evolution of online social networks.
Finally, we show the potential use of our findings in viral marketing and
privacy management in online social networks.
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms.
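One plausible reading of sampling "based on the concept of graph induction" on a static graph is sketched below: sample edges uniformly to collect a node set, then add every original edge whose endpoints both fall in that set. The function name `edge_sample_with_induction`, the `target_nodes` parameter, and the fixed random seed are assumptions for illustration, and the streaming variants discussed in the abstract are not covered.

```python
import random


def edge_sample_with_induction(edges, target_nodes, seed=0):
    """Sample nodes via uniformly chosen edges, then induce the subgraph.

    `edges` is a list of (u, v) pairs. Sampling stops once at least
    `target_nodes` nodes are collected, so the result may exceed the
    target by one node.
    """
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)  # uniform random order over the edges
    nodes = set()
    for u, v in order:
        if len(nodes) >= target_nodes:
            break
        nodes.update((u, v))
    # Induction step: keep every original edge between two sampled nodes.
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, induced


# Hypothetical toy graph: a 6-cycle with one chord.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
sampled_nodes, sampled_edges = edge_sample_with_induction(toy_edges, 4)
```

The induction step recovers edges that plain edge sampling would miss, which is one way a sample can stay closer to the original graph's local structure.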