
    Supersampling and network reconstruction of urban mobility

    Understanding human mobility is of vital importance for urban planning, epidemiology, and many other fields that aim to draw policies from the activities of humans in space. Despite the recent availability of large-scale data sets related to human mobility, such as GPS traces and mobile phone records, such data sets still represent a subsample of the population of interest and thus may give an incomplete picture of the entire population in question. Notwithstanding the abundant use of such inherently limited data sets, the impact of sampling biases on mobility patterns is unclear: we lack methods to reliably infer mobility information from a limited data set. Here, we investigate the effects of sampling using a data set of millions of taxi movements in New York City. On the one hand, we show that mobility patterns are highly stable once an appropriate simple rescaling is applied to the data, implying negligible loss of information due to subsampling over long time scales. On the other hand, contrasting an appropriate null model on the weighted network of vehicle flows reveals distinctive features that need to be accounted for. Accordingly, we formulate a "supersampling" methodology that allows us to reliably extrapolate mobility data from a reduced sample, and we propose a number of network-based metrics to assess its quality (and that of other human mobility models). Our approach provides a well-founded way to exploit temporal patterns to save effort in recording mobility data, and it opens the possibility of scaling up data from limited records when information on the full system is needed. Comment: 14 pages, 4 figures
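
    As a rough illustration of the rescaling idea (a sketch, not the authors' exact procedure), the snippet below subsamples a synthetic set of trips, rebuilds the origin-destination flow network, and rescales flows by the inverse sampling fraction; the data and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical trip records: each row is (origin_zone, destination_zone).
n_trips, n_zones = 100_000, 50
trips = rng.integers(0, n_zones, size=(n_trips, 2))

def flow_matrix(trip_list, n_zones):
    """Aggregate individual trips into an origin-destination flow matrix."""
    W = np.zeros((n_zones, n_zones))
    np.add.at(W, (trip_list[:, 0], trip_list[:, 1]), 1)
    return W

W_full = flow_matrix(trips, n_zones)

# Keep a fraction p of the trips, then rescale flows by 1/p: the simplest
# form of scaling a reduced record back up to full-system size.
p = 0.1
sample = trips[rng.random(n_trips) < p]
W_rescaled = flow_matrix(sample, n_zones) / p

# Relative error of the rescaled subsample against the full network.
err = np.abs(W_rescaled - W_full).sum() / W_full.sum()
print(f"mean relative flow error at p={p}: {err:.3f}")
```

    In the paper's setting the comparison would run against real taxi records, with the proposed network-based metrics (rather than this crude aggregate error) used to judge the quality of the extrapolation.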

    Finding Rumor Sources on Random Trees

    We consider the problem of detecting the source of a rumor that has spread in a network, using only observations of which set of nodes is infected with the rumor and no information as to \emph{when} these nodes became infected. In a recent work \citep{ref:rc} this rumor source detection problem was introduced and studied. The authors proposed the graph score function \emph{rumor centrality} as an estimator for detecting the source, and established it to be the maximum likelihood estimator for regular trees under the popular Susceptible-Infected (SI) model with exponential spreading times. They showed that as the size of the infected graph increases, for a path graph (2-regular tree) the probability of source detection goes to $0$, while for $d$-regular trees with $d \geq 3$ the probability of detection, say $\alpha_d$, remains bounded away from $0$ and is less than $1/2$. However, their results stop short of providing insights into the performance of the rumor centrality estimator in more general settings such as irregular trees or the SI model with non-exponential spreading times. This paper overcomes this limitation and establishes the effectiveness of rumor centrality for source detection for generic random trees and the SI model with a generic spreading time distribution. The key result is an interesting connection between a continuous-time branching process and the effectiveness of rumor centrality, through which the detection probability can be quantified precisely. As a consequence, we recover all previous results as special cases and obtain a variety of novel results, including the \emph{universality} of rumor centrality in the context of tree-like graphs and the SI model with a generic spreading time distribution. Comment: 38 pages, 6 figures
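
    On trees, rumor centrality admits the closed form $R(v, T) = n! / \prod_{u \in T} T_u^v$, where $T_u^v$ is the size of the subtree rooted at $u$ when the tree is rooted at $v$. A minimal sketch of that score (an $O(n^2)$ version for clarity; the toy infected tree is hypothetical):

```python
import math
from collections import defaultdict

def rumor_centrality(tree, root):
    """R(root) = n! / product of subtree sizes, with the tree rooted at
    `root` -- the closed form of the rumor centrality score on trees."""
    sizes = {}

    def subtree_size(u, parent):
        s = 1
        for w in tree[u]:
            if w != parent:
                s += subtree_size(w, u)
        sizes[u] = s
        return s

    n = subtree_size(root, None)
    return math.factorial(n) // math.prod(sizes.values())

# Hypothetical infected subtree: a star with center 0 and leaves 1..3.
tree = defaultdict(list)
for leaf in (1, 2, 3):
    tree[0].append(leaf)
    tree[leaf].append(0)

# The estimator picks the node with the highest score -- here the center.
scores = {v: rumor_centrality(tree, v) for v in list(tree)}
print(max(scores, key=scores.get), scores)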

    Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization

    Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in various applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. In this context, the fresh look advocated here combines benefits from variable selection and compressive sampling to robustify PCA against outliers. A least-trimmed-squares estimator of a low-rank bilinear factor analysis model is shown to be closely related to the estimator obtained from an $\ell_0$-(pseudo)norm-regularized criterion encouraging sparsity in a matrix that explicitly models the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling the sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate novel and scalable algorithms that: i) track the low-rank signal subspace robustly as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, unveil communities in social networks, and detect intruders in video surveillance data. Comment: 30 pages, submitted to IEEE Transactions on Signal Processing
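
    One common way to instantiate this kind of outlier-aware decomposition (a sketch under simplifying assumptions, not the paper's exact algorithm) is to alternate a truncated-SVD fit of the low-rank part with row-wise soft-thresholding of the residual, i.e. the proximal step of a group-Lasso penalty on the outlier matrix:

```python
import numpy as np

def outlier_aware_pca(X, rank, lam, n_iter=50):
    """Sketch of robust PCA as a bilinear decomposition X ~ L + O:
    alternate (i) a rank-`rank` truncated-SVD fit of X - O with
    (ii) row-wise soft-thresholding of the residual, the group-Lasso
    proximal step that flags whole outlying rows of the data matrix."""
    O = np.zeros_like(X)
    for _ in range(n_iter):
        # (i) Low-rank step: best rank-r approximation of the cleaned data.
        U, s, Vt = np.linalg.svd(X - O, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # (ii) Outlier step: shrink residual rows toward zero.
        R = X - L
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        O = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * R
    return L, O

# Hypothetical low-rank data with a few grossly corrupted rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
X[:5] += 10 * rng.normal(size=(5, 20))   # planted outliers
L, O = outlier_aware_pca(X, rank=2, lam=5.0)
print("rows flagged as outliers:", np.flatnonzero(np.linalg.norm(O, axis=1)))
```

    Sweeping `lam` traces out the robustification path mentioned in the abstract: larger values flag fewer rows as outliers.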

    How much data is enough to track tourists? The tradeoff between data granularity and storage costs

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

    In an increasingly technology-dependent world, data is one of the key strategic resources for organizations. A challenge many decision-makers face is determining which data to collect, how much of it, and what needs to be kept in storage: enough information must be preserved to inform decisions, but without incurring excessive storage and processing costs. In this thesis, the challenge is studied in the context of mobile signaling data collected for studying tourists' behavioral patterns. Given the number of mobile phones in use and the frequency of their interactions with the network infrastructure and of location reporting, mobile data sets represent a rich source of information for mobility studies. The objective of this research is to analyze to what extent individual trajectories can be reconstructed when only a fraction of the original location data is preserved, providing insight into the tradeoff between the volume of data available and the accuracy of the reconstructed paths. To this end, signaling data from 277,093 anonymized foreign travelers are sampled at different sampling rates, and the full trajectories are reconstructed using the last-seen, linear, and cubic interpolation completion methods. The results of the comparison are discussed from the perspective of data management and the implications for research, especially research based on lower time-density mobile phone data.
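
    A minimal sketch of this reconstruction experiment on a synthetic 1-D trace (the thesis works with real signaling data; all numbers here are made up): downsample the trajectory, rebuild it with the three completion methods, and compare errors.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)

# Hypothetical ground truth: a 1-D position trace with one fix per minute.
t_full = np.arange(600.0)
x_full = np.cumsum(rng.normal(size=t_full.size))

# Keep only 10% of the fixes, as if the operator stored less data.
keep = np.sort(rng.choice(t_full.size, size=60, replace=False))
t_obs, x_obs = t_full[keep], x_full[keep]

# The three completion methods: last-seen, linear, and cubic interpolation.
idx = np.clip(np.searchsorted(t_obs, t_full, side="right") - 1, 0, None)
reconstructions = {
    "last-seen": x_obs[idx],
    "linear": np.interp(t_full, t_obs, x_obs),
    "cubic": CubicSpline(t_obs, x_obs)(t_full),
}

for name, x_rec in reconstructions.items():
    print(f"{name:9s} mean abs. error: {np.mean(np.abs(x_rec - x_full)):.2f}")
```

    Repeating this for a range of sampling rates gives the volume-versus-accuracy tradeoff curve that the thesis investigates.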

    Using mobile phone data to map evacuation and displacement: a case study of the central Italy earthquake

    Population displacement is one of the most common consequences of disasters, and it can profoundly affect communities and territories. However, accurately measuring the size of displacement in the days and weeks following a major disaster can be extremely difficult. This study uses aggregated Call Detail Records as an inexpensive and efficient means of measuring post-disaster displacement in four Italian regions affected by repeated earthquakes in 2016-2017. By comparing post-disaster mobile phone counts with a forecast computed before the earthquake hit, we compute an index of change in the presence of mobile phones (MPE). This measure provides a reliable indication of the earthquake's effect in terms of immediate and medium-term displacement. We validate it against census data and in combination with other datasets. Drawing on available data on economic activities and on requests for financial support to rebuild damaged buildings, we can explain MPE and identify significant factors affecting population displacement. The methodology can be applied to other disaster scenarios and used by policymakers who want to understand the determinants of population displacement.
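
    The abstract does not spell out the MPE formula; a plausible reading is the relative deviation of observed post-disaster counts from the pre-disaster forecast, as in this hypothetical sketch:

```python
import numpy as np

# Hypothetical daily mobile phone counts for one affected municipality:
# `forecast` is the pre-earthquake model prediction, `observed` what the
# network actually recorded after the event.
forecast = np.array([1200.0, 1180.0, 1210.0, 1195.0, 1205.0])
observed = np.array([1190.0,  640.0,  710.0,  820.0,  900.0])

# Assumed definition: relative deviation of observed counts from forecast.
# Strongly negative values indicate phones (and people) leaving the area.
mpe = (observed - forecast) / forecast
print(np.round(mpe, 2))
```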

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur, Belgium, from Wednesday, August 27th to Friday, August 29th, 2014. The workshop was conveniently located in "The Arsenal" building, within walking distance of both hotels and the town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application, and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low-dimensional subspaces; Beyond linear and convex inverse problems; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry, and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1