It is generally recognized that the traffic generated by an individual
connected to a network acts as his biometric signature. Several tools exploit
this fact to fingerprint and monitor users. Often, though, these tools assume
to access the entire traffic, including IP addresses and payloads. This is not
feasible on the grounds that both performance and privacy would be negatively
affected. In reality, most ISPs convert user traffic into NetFlow records for a
concise representation that does not include, for instance, any payloads. More
importantly, large and distributed networks are usually NAT'd, thus a few IP
addresses may be associated to thousands of users. We devised a new
fingerprinting framework that overcomes these hurdles. Our system is able to
analyze a huge amount of network traffic represented as NetFlows, with the
intent to track people. It does so by accurately inferring when users are
connected to the network and which IP addresses they are using, even though
thousands of users are hidden behind NAT. Our prototype implementation was
deployed and tested within an existing large metropolitan WiFi network serving
about 200,000 users, with an average load of more than 1,000 users
simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned
out to be very effective, with an accuracy greater than 90%. We also devised
new tools and refined existing ones that may be applied to other contexts
related to NetFlow analysis