
    The Role of Gender in Social Network Organization

    The digital traces we leave behind when engaging with the modern world offer an interesting lens through which to study behavioral patterns as expressions of gender. Although gender differentiation has been observed in a number of settings, the majority of studies focus on a single data stream in isolation. Here we use a high-resolution dataset collected using mobile phones, together with detailed questionnaires, to study gender differences in a large cohort. We consider mobility behavior and individual personality traits among a group of more than 800 university students. We also investigate interactions among them expressed via person-to-person contacts, interactions on online social networks, and telecommunication. Thus, we are able to study the differences between male and female behavior captured through a multitude of channels for a single cohort. We find that while the two genders are similar in a number of aspects, there are robust deviations that include multiple facets of social interactions, suggesting the existence of inherent behavioral differences. Finally, we quantify how aspects of an individual's characteristics and social behavior reveal their gender by posing it as a classification problem. We ask: How well can we distinguish between male and female study participants based on behavior alone? Which behavioral features are most predictive?

    Supersampling and network reconstruction of urban mobility

    Understanding human mobility is of vital importance for urban planning, epidemiology, and many other fields that aim to draw policies from the activities of humans in space. Despite the recent availability of large-scale data sets related to human mobility, such as GPS traces and mobile phone data, such data sets still represent only a subsample of the population of interest and thus may give an incomplete picture of the entire population in question. Notwithstanding the abundant use of such inherently limited data sets, the impact of sampling biases on mobility patterns is unclear: we lack methods to reliably infer mobility information from a limited data set. Here, we investigate the effects of sampling using a data set of millions of taxi movements in New York City. On the one hand, we show that mobility patterns are highly stable once an appropriate simple rescaling is applied to the data, implying negligible loss of information due to subsampling over long time scales. On the other hand, contrasting the weighted network of vehicle flows with an appropriate null model reveals distinctive features that need to be accounted for. Accordingly, we formulate a "supersampling" methodology that allows us to reliably extrapolate mobility data from a reduced sample, and we propose a number of network-based metrics to reliably assess its quality (and that of other human mobility models). Our approach provides a well-founded way to exploit temporal patterns to save effort in recording mobility data, and opens the possibility of scaling up data from limited records when information on the full system is needed. Comment: 14 pages, 4 figures
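    A minimal sketch of the "simple rescaling" idea, assuming trips are subsampled uniformly at random with a known fraction, so that each observed trip is scaled up by 1/fraction. The helpers `rescale_flows` and `subsample`, the zone labels, and the toy trips are hypothetical and synthetic (not the New York City taxi data); the supersampling methodology and null-model comparison described above go beyond this.

```python
import random
from collections import Counter


def rescale_flows(sample_counts, fraction):
    """Scale origin-destination trip counts from a subsample to full-population estimates.

    Assumption: trips were kept independently with probability `fraction`,
    so each observed trip stands in for 1 / fraction trips.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return {edge: count / fraction for edge, count in sample_counts.items()}


def subsample(trips, fraction, seed=0):
    """Hypothetical helper: keep each trip independently with probability `fraction`."""
    rng = random.Random(seed)
    return [t for t in trips if rng.random() < fraction]


# Synthetic origin-destination trips between hypothetical zones A, B, C.
trips = [("A", "B")] * 6000 + [("B", "C")] * 3000 + [("C", "A")] * 1000
full = Counter(trips)
estimate = rescale_flows(Counter(subsample(trips, 0.1)), 0.1)

for edge in sorted(full):
    print(edge, "true:", full[edge], "rescaled estimate:", round(estimate.get(edge, 0)))
```

    Under this uniform-sampling assumption the rescaled counts are unbiased, and their relative error shrinks as flows get larger, which is consistent with the stability over long time scales reported above.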

    Earthquake Arrival Association with Backprojection and Graph Theory

    The association of seismic wave arrivals with causative earthquakes becomes progressively more challenging as arrival detection methods become more sensitive, and particularly when earthquake rates are high. For instance, seismic waves arriving across a monitoring network from several sources may overlap in time, false arrivals may be detected, and some arrivals may be of unknown phase (e.g., P- or S-waves). We propose an automated method, applicable to such situations, that associates arrivals with earthquake sources and obtains source locations. To do so, we use a pattern detection metric based on the principle of backprojection to reveal candidate sources, followed by graph-theory-based clustering and an integer linear optimization routine to associate arrivals with the minimum number of sources necessary to explain the data. This method solves for all sources and phase assignments simultaneously, rather than in a sequential greedy procedure as is common in other association routines. We demonstrate our method on both synthetic and real data from the Integrated Plate Boundary Observatory Chile (IPOC) seismic network of northern Chile. For the synthetic tests, we report results for cases of varying complexity, including rates of 500 earthquakes/day and 500 false arrivals/station/day, for which we measure a true positive detection accuracy of > 95%. For the real data, we develop a new catalog spanning January 1, 2010 to December 31, 2017 that contains 817,548 earthquakes, with an average detection rate of 279 earthquakes/day and a magnitude of completion of ~M1.8. A subset of detections is identified as sources related to quarry and industrial site activity, and we also detect thousands of foreshocks and aftershocks of the April 1, 2014 Mw 8.2 Iquique earthquake. During the highest rates of aftershock activity, > 600 earthquakes/day are detected in the vicinity of the Iquique earthquake rupture zone.
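    The "minimum number of sources" step can be illustrated as a minimum set-cover problem. In this sketch, each candidate source (assumed to come from a backprojection step that is not shown) is represented only by the set of arrivals it could explain. An exhaustive search over subsets stands in for the integer linear optimization described above, so it is only practical for a handful of candidates, and phase assignment is ignored. Arrivals that no candidate explains are left unassociated, mimicking false detections. The function name and all source and arrival ids are hypothetical.

```python
from itertools import combinations


def min_source_association(candidates):
    """Choose the fewest candidate sources that together explain every explainable arrival.

    `candidates` maps a hypothetical source id to the set of arrival ids it could explain.
    Exhaustive search over subsets replaces an integer linear program here.
    """
    explainable = set().union(*candidates.values())
    ids = sorted(candidates)
    for k in range(len(ids) + 1):  # try smaller source sets first
        for chosen in combinations(ids, k):
            covered = set().union(*(candidates[c] for c in chosen))
            if covered == explainable:
                # Assign each arrival to the first chosen source that explains it.
                assignment = {}
                for c in chosen:
                    for a in sorted(candidates[c]):
                        assignment.setdefault(a, c)
                return list(chosen), assignment
    return [], {}


arrivals = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
candidates = {
    "s1": {"a1", "a2", "a3"},
    "s2": {"a3", "a4"},
    "s3": {"a4", "a5", "a6"},
    "s4": {"a1", "a5"},
}
sources, assignment = min_source_association(candidates)
# Arrivals no candidate can explain are treated as false detections.
unassociated = [a for a in arrivals if a not in assignment]
print(sources, assignment, unassociated)
```

    Because every subset of a given size is checked before moving to the next size, the first cover found is guaranteed to be minimal, unlike a greedy pick-the-largest-source heuristic; the cost is exponential run time, which is why an optimization solver is needed at catalog scale.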

    Arriving on time: estimating travel time distributions on large-scale road networks

    Most optimal routing problems focus on minimizing travel time or distance traveled. Often, a more useful objective is to maximize the probability of on-time arrival, which requires statistical distributions of travel times rather than just mean values. We propose a method to estimate travel time distributions on large-scale road networks using probe vehicle data collected from GPS. We present a framework that handles large volumes of input data and scales linearly with the size of the network. Leveraging the planar topology of the graph, the method efficiently computes the time correlations between neighboring streets. First, raw probe vehicle traces are compressed into pairs of travel times and number of stops for each traversed road segment, using a 'stop-and-go' algorithm developed for this work. The compressed data are then used as input for training a path travel time model, which couples a Markov model (MM) with a Gaussian Markov random field (GMRF). Finally, scalable inference algorithms are developed for obtaining path travel time distributions from the composite MM-GMRF model. We illustrate the accuracy and scalability of our model on a network of 505,000 road links spanning the San Francisco Bay Area.
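    The case for distributions rather than means can be made concrete with a simplified model. This sketch assumes link travel times along a path are jointly Gaussian, with nonzero covariance only between neighboring links (in the spirit of a GMRF, but omitting the Markov model over congestion states). The path travel time is then Gaussian, with mean equal to the sum of link means and variance equal to the sum of all covariance entries. The function names, link values, and the alternative route are hypothetical.

```python
import math


def path_travel_time_distribution(means, cov):
    """Mean and standard deviation of total travel time along a path.

    Assumes link travel times are jointly Gaussian with mean vector `means`
    and covariance matrix `cov`; Var(sum X_i) = sum_i sum_j Cov(X_i, X_j).
    """
    mu = sum(means)
    var = sum(sum(row) for row in cov)
    return mu, math.sqrt(var)


def on_time_probability(mu, sigma, budget):
    """P(total travel time <= budget) under the Gaussian assumption (normal CDF via erf)."""
    return 0.5 * (1.0 + math.erf((budget - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical three-link route A (seconds); only neighboring links are correlated.
means = [60.0, 90.0, 45.0]
cov = [
    [100.0, 60.0, 0.0],
    [60.0, 225.0, 48.0],
    [0.0, 48.0, 64.0],
]
mu, sigma = path_travel_time_distribution(means, cov)
p_a = on_time_probability(mu, sigma, 210.0)
# Hypothetical route B: slower on average but far more predictable.
p_b = on_time_probability(200.0, 5.0, 210.0)
print(f"route A: mean={mu:.0f}s sd={sigma:.1f}s P(on time)={p_a:.3f}")
print(f"route B: P(on time)={p_b:.3f}")
```

    With these toy numbers, correlation between neighboring links raises the path variance from 389 (diagonal terms only) to 605, and for a 210 s budget route B, with a mean of 200 s and a standard deviation of 5 s, is more likely to arrive on time than route A despite its larger mean. A mean-only router would pick route A.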