24,957 research outputs found
No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone
It is generally recognized that the traffic generated by an individual
connected to a network acts as his biometric signature. Several tools exploit
this fact to fingerprint and monitor users. Often, though, these tools assume
access to the entire traffic, including IP addresses and payloads. This is not
feasible on the grounds that both performance and privacy would be negatively
affected. In reality, most ISPs convert user traffic into NetFlow records for a
concise representation that does not include, for instance, any payloads. More
importantly, large and distributed networks are usually NAT'd, so a few IP
addresses may be associated with thousands of users. We devised a new
fingerprinting framework that overcomes these hurdles. Our system is able to
analyze a huge amount of network traffic represented as NetFlows, with the
intent to track people. It does so by accurately inferring when users are
connected to the network and which IP addresses they are using, even though
thousands of users are hidden behind NAT. Our prototype implementation was
deployed and tested within an existing large metropolitan WiFi network serving
about 200,000 users, with an average load of more than 1,000 users
simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned
out to be very effective, with an accuracy greater than 90%. We also devised
new tools and refined existing ones that may be applied to other contexts
related to NetFlow analysis.
Impact of the spatial context on human communication activity
Technology development produces terabytes of data generated by human
activity in space and time. This enormous amount of data, often called big
data, has become crucial for delivering new insights to decision makers. It
contains behavioral information on different types of human activity
influenced by many external factors, such as geographic information and
weather forecasts. Early recognition and prediction of those human behaviors
are of great importance in many societal applications such as health care,
risk management and urban planning. In this paper, we investigate relevant
geographical areas based on their categories of human activities (i.e.,
working and shopping), which are identified from geographic information
(i.e., OpenStreetMap). We use spectral clustering followed by the k-means
clustering algorithm, based on a TF/IDF cosine similarity metric. We
evaluate the quality of the observed clusters using silhouette coefficients,
which are estimated from the similarities of the temporal patterns of mobile
communication activity. The area clusters are further used to explain
typical or exceptional communication activities. We demonstrate the study
using a real dataset containing 1 million Call Detail Records. This type of
analysis and its applications are important for analyzing the dependency of
human behaviors on external factors, and for uncovering hidden
relationships, unknown correlations and other useful information that can
support decision-making.
Comment: 12 pages, 11 figures
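The clustering pipeline this abstract describes (TF/IDF vectors over per-area activity categories, a cosine-similarity affinity, spectral clustering, silhouette-based evaluation) can be sketched roughly as follows. The area "documents", cluster count and all parameters below are illustrative toy values, not data or settings from the paper.

```python
# Sketch of the described pipeline: TF/IDF vectors per area -> cosine
# similarity affinity -> spectral clustering -> silhouette evaluation.
# Toy inputs only; not the paper's data or configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import cosine_similarity

# Toy "activity documents": POI category tags per geographic area.
areas = [
    "office office bank office",
    "office bank office office",
    "shop mall shop supermarket",
    "shop supermarket mall shop",
    "office shop bank mall",
    "supermarket shop shop mall",
]

tfidf = TfidfVectorizer().fit_transform(areas)      # areas x terms
affinity = cosine_similarity(tfidf)                 # areas x areas

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)

# Cluster quality: silhouette coefficient on cosine distances.
score = silhouette_score(tfidf, labels, metric="cosine")
print(labels, round(score, 2))
```

In the paper the clusters are then compared against temporal patterns of communication activity; here the silhouette score simply quantifies how well the toy areas separate into "working" and "shopping" groups.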
Creating Full Individual-level Location Timelines from Sparse Social Media Data
In many domain applications, a continuous timeline of human locations is
critical: for example, for understanding possible locations where a disease may
spread, or the flow of traffic. While data sources such as GPS trackers or Call
Data Records are temporally-rich, they are expensive, often not publicly
available or garnered only in select locations, restricting their wide use.
Conversely, geo-located social media data are publicly and freely available,
but present challenges especially for full timeline inference due to their
sparse nature. We propose a stochastic framework, Intermediate Location
Computing (ILC) which uses prior knowledge about human mobility patterns to
predict every missing location from an individual's social media timeline. We
compare ILC with a state-of-the-art RNN baseline as well as methods that are
optimized for next-location prediction only. For three major cities, ILC
predicts the top-1 location for all missing locations in a timeline, at 1- and
2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than
all compared methods). ILC also outperforms the RNN in low-data settings:
both with a very small number of users (under 50), and with more users but
sparser timelines. In general, the RNN model needs a higher number of users
to achieve the same performance as ILC. Overall, this work illustrates the
tradeoff between prior knowledge in the form of heuristics and more data, for
the important societal problem of filling in entire timelines using freely
available, but sparse, social media data.
Comment: 10 pages, 8 figures, 2 tables
Logical analysis of data as a tool for the analysis of probabilistic discrete choice behavior
Probabilistic Discrete Choice Models (PDCM) have been extensively used to interpret the behavior of heterogeneous decision makers that face discrete alternatives. The classification approach of Logical Analysis of Data (LAD) uses discrete optimization to generate patterns, which are logic formulas characterizing the different classes. Patterns can be seen as rules explaining the phenomenon under analysis. In this work we discuss how LAD can be used as the first phase of the specification of PDCM. Since the number of patterns generated in this task may be extremely large, and many of them may be nearly equivalent, additional processing is necessary to obtain practically meaningful information. Hence, we propose computationally viable techniques for obtaining small sets of patterns that constitute meaningful representations of the phenomenon and make it possible to discover significant associations between subsets of explanatory variables and the output. We consider the complex socio-economic problem of the analysis of Internet utilization in Italy, using real data gathered by the Italian National Institute of Statistics.
A survey on Human Mobility and its applications
Human mobility has attracted attention from different fields of study, such
as epidemic modeling, traffic engineering, traffic prediction and urban
planning. In this survey we review major characteristics of human mobility
studies, ranging from trajectory-based studies to studies using graph and
network theory. In trajectory-based studies, statistical measures such as the
jump length distribution and the radius of gyration are analyzed in order to
investigate how people move in their daily life, and whether it is possible
to model these individual movements and make predictions based on them.
Using graphs in mobility studies helps to investigate the dynamic behavior
of the system, such as diffusion and flow in the network, and makes it
easier to estimate how much one part of the network influences another by
using metrics like centrality measures. We aim to study population flow in
transportation networks using mobility data to derive models and patterns,
and to develop new applications for predicting phenomena such as congestion.
Human mobility studies with the new generation of mobility data provided by
cellular phone networks raise new challenges, such as data storage, data
representation, data analysis and computational complexity. A comparative
review of the different data types used in current tools and applications of
human mobility studies leads us to new approaches for dealing with these
challenges.
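Two of the trajectory statistics named in this abstract, jump lengths and the radius of gyration, are straightforward to compute. The sketch below uses a made-up planar trajectory in metres rather than real mobility data.

```python
# Illustrative computation of two trajectory statistics: jump lengths
# (distance between consecutive positions) and the radius of gyration
# (RMS distance of visited points from their centre of mass).
# The trajectory is a toy example, not data from the survey.
import math

# Successive (x, y) positions of one person, in metres.
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]

# Jump lengths between consecutive recorded positions.
jumps = [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]

# Radius of gyration: how far the person typically roams from the
# centre of mass of their visited locations.
cx = sum(x for x, _ in trajectory) / len(trajectory)
cy = sum(y for _, y in trajectory) / len(trajectory)
r_g = math.sqrt(
    sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in trajectory)
    / len(trajectory)
)

print(jumps, round(r_g, 2))  # [5.0, 0.0, 5.0] 3.54
```

The jump length distribution aggregated over many users, and the distribution of per-user radii of gyration, are the population-level quantities such studies typically analyze.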
You never surf alone. Ubiquitous tracking of users' browsing habits
In the early age of the Internet, users enjoyed a large degree of anonymity.
At the time, web pages were just hypertext documents; almost no
personalisation of the user experience was offered. The Web today has
evolved into a world-wide distributed system following specific
architectural paradigms. On the Web now, an enormous quantity of
user-generated data is shared and consumed by a network of applications and
services, reasoning upon users' expressed preferences and their social and
physical connections. Advertising networks follow users' browsing habits
while they surf the Web, continuously collecting their traces and surfing
patterns. We analyse how user tracking happens on the Web by measuring
users' online footprint and estimating how quickly advertising networks are
able to profile users by their browsing habits.
The Feasibility of Dynamically Granted Permissions: Aligning Mobile Privacy with User Preferences
Current smartphone operating systems regulate application permissions by
prompting users on an ask-on-first-use basis. Prior research has shown that
this method is ineffective because it fails to account for context: the
circumstances under which an application first requests access to data may be
vastly different than the circumstances under which it subsequently requests
access. We performed a longitudinal 131-person field study to analyze the
contextuality behind user privacy decisions to regulate access to sensitive
resources. We built a classifier to make privacy decisions on the user's behalf
by detecting when context has changed and, when necessary, inferring privacy
preferences based on the user's past decisions and behavior. Our goal is to
automatically grant appropriate resource requests without further user
intervention, deny inappropriate requests, and only prompt the user when the
system is uncertain of the user's preferences. We show that our approach can
accurately predict users' privacy decisions 96.8% of the time, which is a
four-fold reduction in error rate compared to current systems.
Comment: 17 pages, 4 figures
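The reported numbers can be cross-checked with a line of arithmetic: 96.8% accuracy corresponds to a 3.2% error rate, so a four-fold error reduction implies a baseline error rate of roughly 12.8%. The baseline figure is inferred here from the abstract's two claims, not stated in it.

```python
# Sanity check of the error-rate arithmetic implied by the abstract:
# 96.8% accuracy -> 3.2% error; a four-fold reduction implies the
# current systems' error rate is about 4 x 3.2% = 12.8% (inferred).
classifier_accuracy = 0.968
classifier_error = 1 - classifier_accuracy   # 0.032
baseline_error = 4 * classifier_error        # implied baseline error
print(round(classifier_error, 3), round(baseline_error, 3))
```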