The Role of Gender in Social Network Organization
The digital traces we leave behind when engaging with the modern world offer
an interesting lens through which we study behavioral patterns as an expression of
gender. Although gender differentiation has been observed in a number of
settings, the majority of studies focus on a single data stream in isolation.
Here we use a dataset of high resolution data collected using mobile phones, as
well as detailed questionnaires, to study gender differences in a large cohort.
We consider mobility behavior and individual personality traits among a group
of more than university students. We also investigate interactions among
them expressed via person-to-person contacts, interactions on online social
networks, and telecommunication. Thus, we are able to study the differences
between male and female behavior captured through a multitude of channels for a
single cohort. We find that while the two genders are similar in a number of
aspects, there are robust deviations that include multiple facets of social
interactions, suggesting the existence of inherent behavioral differences.
Finally, we quantify how aspects of an individual's characteristics and social
behavior reveal their gender by posing it as a classification problem. We ask:
How well can we distinguish between male and female study participants based on
behavior alone? Which behavioral features are most predictive?
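
The classification framing in the abstract above can be illustrated with a minimal sketch. It assumes synthetic data and two hypothetical features (`feature_a`, which differs between the two groups, and `feature_b`, which does not), and it uses a plain logistic regression trained by gradient descent. The abstract does not say which classifier or features the authors used, so this should not be read as their method.

```python
import math
import random


def make_synthetic_data(n, seed=0):
    """Toy data: label 0/1; feature_a shifts with the label, feature_b is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        feature_a = rng.gauss(1.5 if label else -1.5, 1.0)  # informative (assumed)
        feature_b = rng.gauss(0.0, 1.0)  # uninformative (assumed)
        X.append([feature_a, feature_b])
        y.append(label)
    return X, y


def predict_proba(w, b, x):
    """Logistic model: probability that x belongs to class 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the mean logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(X)


X, y = make_synthetic_data(400)
X_train, y_train = X[:300], y[:300]  # fit on the first 300 samples
X_test, y_test = X[300:], y[300:]    # evaluate on the held-out 100
w, b = train_logistic(X_train, y_train)
print("held-out accuracy:", accuracy(w, b, X_test, y_test))
print("weights (feature_a, feature_b):", w)
```

Held-out accuracy addresses "how well can we distinguish", and the weight magnitudes give a rough answer to "which features are most predictive". Comparing weights this way is only meaningful when the features are on comparable scales, as they roughly are in this toy data.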
Supersampling and network reconstruction of urban mobility
Understanding human mobility is of vital importance for urban planning,
epidemiology, and many other fields that aim to draw policies from the
activities of humans in space. Despite recent availability of large scale data
sets related to human mobility such as GPS traces, mobile phone data, etc., it
is still true that such data sets represent a subsample of the population of
interest, and thus might give an incomplete picture of the entire population in
question. Notwithstanding the abundant usage of such inherently limited data
sets, the impact of sampling biases on mobility patterns is unclear -- we do
not have methods available to reliably infer mobility information from a
limited data set. Here, we investigate the effects of sampling using a data set
of millions of taxi movements in New York City. On the one hand, we show that
mobility patterns are highly stable once an appropriate simple rescaling is
applied to the data, implying negligible loss of information due to subsampling
over long time scales. On the other hand, contrasting an appropriate null model
on the weighted network of vehicle flows reveals distinctive features which
need to be accounted for. Accordingly, we formulate a "supersampling"
methodology which allows us to reliably extrapolate mobility data from a
reduced sample and propose a number of network-based metrics to reliably assess
its quality (and that of other human mobility models). Our approach provides a
well founded way to exploit temporal patterns to save effort in recording
mobility data, and opens the possibility to scale up data from limited records
when information on the full system is needed.
Comment: 14 pages, 4 figures
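
The "simple rescaling" idea in the abstract above can be sketched on toy data. The example assumes uniform random subsampling of trips and a constant scale factor of 1 / fraction; the zone names and trips are synthetic, and the sketch covers neither the null-model comparison nor the authors' supersampling methodology, whose details the abstract does not give.

```python
import random
from collections import Counter


def flow_counts(trips):
    """Weighted flow network: (origin, destination) -> number of trips."""
    return Counter(trips)


def subsample(trips, fraction, rng):
    """Keep each trip independently with probability `fraction`."""
    return [trip for trip in trips if rng.random() < fraction]


def rescale(counts, fraction):
    """Scale subsampled flows back up by 1 / fraction."""
    return {edge: count / fraction for edge, count in counts.items()}


def relative_total_error(full, estimate):
    """Relative error in total flow between the full and estimated networks."""
    total_full = sum(full.values())
    return abs(sum(estimate.values()) - total_full) / total_full


rng = random.Random(0)
zones = ["A", "B", "C", "D"]
# Hypothetical trips between four zones (synthetic, not taxi data).
trips = [(rng.choice(zones), rng.choice(zones)) for _ in range(10_000)]

full = flow_counts(trips)
sampled = flow_counts(subsample(trips, 0.1, rng))
estimate = rescale(sampled, 0.1)
print("relative error in total flow:", relative_total_error(full, estimate))
```

Total flow is the coarsest possible check. A per-edge comparison between `full` and `estimate` would be the natural next step toward the network-based quality metrics the abstract mentions.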
Earthquake Arrival Association with Backprojection and Graph Theory
The association of seismic wave arrivals with causative earthquakes becomes
progressively more challenging as arrival detection methods become more
sensitive, and particularly when earthquake rates are high. For instance,
seismic waves arriving across a monitoring network from several sources may
overlap in time, false arrivals may be detected, and some arrivals may be of
unknown phase (e.g., P- or S-waves). We propose an automated method to
associate arrivals with earthquake sources and obtain source locations
applicable to such situations. To do so we use a pattern detection metric based
on the principle of backprojection to reveal candidate sources, followed by
graph-theory-based clustering and an integer linear optimization routine to
associate arrivals with the minimum number of sources necessary to explain the
data. This method solves for all sources and phase assignments simultaneously,
rather than in a sequential greedy procedure as is common in other association
routines. We demonstrate our method on both synthetic and real data from the
Integrated Plate Boundary Observatory Chile (IPOC) seismic network of northern
Chile. For the synthetic tests we report results for cases with varying
complexity, including rates of 500 earthquakes/day and 500 false
arrivals/station/day, for which we measure true positive detection accuracy of
> 95%. For the real data we develop a new catalog spanning January 1, 2010 to
December 31, 2017 containing 817,548 earthquakes, with detection rates
averaging 279 earthquakes/day, and a magnitude-of-completion of ~M1.8. A subset
of detections are identified as sources related to quarry and industrial site
activity, and we also detect thousands of foreshocks and aftershocks of the
April 1, 2014 Mw 8.2 Iquique earthquake. During the highest rates of aftershock
activity, > 600 earthquakes/day are detected in the vicinity of the Iquique
earthquake rupture zone.
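
The goal of explaining the arrivals with the minimum number of sources can be illustrated as a small set-cover problem. The sketch below swaps the integer linear optimization named in the abstract for an exhaustive search, which is only practical for toy inputs. It also assumes every arrival must be explained, so it ignores false arrivals, phase assignment, and travel times; the source names and arrival IDs are hypothetical.

```python
from itertools import combinations


def minimum_sources(candidates, arrivals):
    """Smallest set of candidate sources whose arrivals cover `arrivals`.

    `candidates` maps a source name to the set of arrival IDs it could
    explain. Subsets are tried in order of increasing size, so the first
    hit is a minimum cover. Returns None if no combination explains
    every arrival.
    """
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            explained = set().union(*(candidates[name] for name in subset))
            if arrivals <= explained:
                return subset
    return None


# Hypothetical candidate sources and the arrival IDs each could explain.
candidates = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5}}
arrivals = {1, 2, 3, 4, 5}
print(minimum_sources(candidates, arrivals))  # ('s1', 's3')
```

Because all subsets of a given size are checked before any larger one, the result is a global minimum rather than the outcome of a sequential greedy pass, which is the property the abstract emphasizes.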
Arriving on time: estimating travel time distributions on large-scale road networks
Most optimal routing problems focus on minimizing travel time or distance
traveled. Oftentimes, a more useful objective is to maximize the probability of
on-time arrival, which requires statistical distributions of travel times,
rather than just mean values. We propose a method to estimate travel time
distributions on large-scale road networks, using probe vehicle data collected
from GPS. We present a framework that handles large volumes of input data and
scales linearly with the size of the network. Leveraging the planar topology of
the graph, the method efficiently computes the time correlations between
neighboring streets. First, raw probe vehicle traces are compressed into pairs
of travel times and number of stops for each traversed road segment using a
"stop-and-go" algorithm developed for this work. The compressed data is then
used as input for training a path travel time model, which couples a Markov
model with a Gaussian Markov random field. Finally, scalable inference
algorithms are developed for obtaining path travel time distributions from the
composite MM-GMRF model. We illustrate the accuracy and scalability of our
model on a 505,000 road link network spanning the San Francisco Bay Area.
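
To see why the abstract above argues for full travel time distributions rather than mean values, consider a minimal sketch with two hypothetical routes. It assumes each route's travel time is normally distributed, which is a simplification chosen for illustration and not the MM-GMRF model described in the abstract. The route names, means, standard deviations, and deadline are all made up.

```python
import math


def on_time_probability(mean, std, deadline):
    """P(travel time <= deadline) for a normally distributed travel time."""
    z = (deadline - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical routes: (mean, standard deviation) of travel time in minutes.
routes = {"A": (20.0, 10.0), "B": (22.0, 2.0)}
deadline = 25.0

# Objective 1: minimize mean travel time.
fastest_on_average = min(routes, key=lambda r: routes[r][0])
# Objective 2: maximize the probability of arriving by the deadline.
most_reliable = max(routes, key=lambda r: on_time_probability(*routes[r], deadline))

print("fastest on average:", fastest_on_average)
print("most likely on time:", most_reliable)
```

Route A has the lower mean but a much wider spread, so the two objectives pick different routes for this deadline.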