Thursday, October 22, 2009

Verifying Dispersion in Human Mobility

Another article in Nature supports the model I put together for human mobility patterns
http://mobjectivist.blogspot.com/2009/10/scaling-laws-of-human-travel.html

My model seems to match the observed trends even more precisely, which further reinforces the fundamental idea of entropic dispersion of travel velocities. Instead of using a paper-money tracking system as in the previous Brockmann article, the authors (Gonzalez et al.) used public cell-phone calling records. This monitors human mobility more directly than the proxy records of bill tracking do.

Given that money is carried by individuals, bank note dispersal is a proxy for human movement, suggesting that human trajectories are best modeled as a continuous time random walk with fat tailed displacements and waiting time distributions.
...
Contrary to bank notes, mobile phones are carried by the same individual during his/her daily routine, offering the best proxy to capture individual human trajectories.
...
Individuals display significant regularity, as they return to a few highly frequented locations, like home or work. This regularity does not apply to the bank notes: a bill always follows the trajectory of its current owner, i.e. dollar bills diffuse, but humans do not.

Even though the proxy records give the same general fat-tail trends, the essential problem with the bank note approach remains the transaction process. Very likely a dollar bill changes hands among at least three different individuals between reporting instances, whereas the cell-phone data records a single individual, at more randomized and therefore less deterministic intervals.

So I don't expect the average rates of travel to necessarily agree between the two data-sets.

The following fit uses the same model as I used previously, with data sampled at 1 week intervals. Notice that the data fits the Maximum Entropy Dispersion (the green curve) even better than bill tracking at 10 day intervals.
dP/dr = beta*t/(beta*t+r)^2
The value of beta for this data set is 0.36, instead of 1 for the bill dispersion data set. I placed a cutoff on the dispersion by preventing a smearing into faster rates of 400 km/day and above, but this seems fairly reasonable as the model otherwise works over 5 orders of magnitude. It actually works so well that it detects a probability offset in the original data calibration: the probability density should integrate to one over the entire interval, yet the Gonzalez data exhibits a bias, creeping slightly above the normalized curve. The bias is real, as their own heuristic function (when I took the time to plot it) shows it too.
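The normalization argument can be checked directly, since the density above has the closed-form cumulative P(r) = r/(beta*t + r), which tends to one as r grows. The sketch below is only illustrative: the function names are hypothetical, beta is assumed to be in km/day, and the 400 km/day limit is applied as a hard truncation at r = 400*t, which may not be exactly how the cutoff entered the fit.

```python
import math

def dispersion_pdf(r, t, beta):
    """Density from the text: dP/dr = beta*t / (beta*t + r)^2."""
    bt = beta * t
    return bt / (bt + r) ** 2

def dispersion_cdf(r, t, beta):
    """Closed-form integral of the density from 0 to r: r / (beta*t + r)."""
    bt = beta * t
    return r / (bt + r)

def integrate_pdf(t, beta, r_max, r_min=1e-6, n=20000):
    """Trapezoid rule on a log-spaced grid from r_min to r_max.

    A log grid is used because the density spans many orders of magnitude.
    The tiny mass below r_min is ignored.
    """
    step = math.log(r_max / r_min) / n
    total = 0.0
    r_prev = r_min
    f_prev = dispersion_pdf(r_prev, t, beta)
    for i in range(1, n + 1):
        r = r_min * math.exp(i * step)
        f = dispersion_pdf(r, t, beta)
        total += 0.5 * (f + f_prev) * (r - r_prev)
        r_prev, f_prev = r, f
    return total

if __name__ == "__main__":
    # Assumed units: t in days, beta and v_max in km/day.
    t, beta, v_max = 7.0, 0.36, 400.0
    r_cut = v_max * t  # hard cutoff distance (an assumption, see lead-in)
    print("numeric mass below cutoff:    ", integrate_pdf(t, beta, r_cut))
    print("closed-form mass below cutoff:", dispersion_cdf(r_cut, t, beta))
    print("mass removed by the cutoff:   ", 1.0 - dispersion_cdf(r_cut, t, beta))
```

Under these assumptions the truncated tail holds beta*t/(beta*t + 400*t) of the probability, about 0.09% at t = 7 days, so a curve that is correctly normalized should sit essentially on top of the data rather than below it.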

Another figure that Gonzalez plots mines the data according to a different sampling process, yet the general trend remains.


Moreover, the researchers almost got the Maximum Entropy dispersion function right by doing a blind curve fit, but ultimately could not explain it. Instead of the predicted power-law exponent of -2, they use -1.75. Yet since they do not normalize their curve correctly with the beta parameter that a probability distribution requires, they think this value of -1.75 holds some significance. Instead, the -1.75 power law is likely an erroneous fit and -2 works better, without violating Occam's razor.
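A rough way to see how a blind fit could land on an exponent shallower than -2: on a log-log plot the dispersion density has local slope -2r/(beta*t + r), which only approaches -2 for r much larger than beta*t. The sketch below fits a straight line to log(dP/dr) against log(r) over a finite window. The window bounds and function names are hypothetical, and this is not a reproduction of the Gonzalez fit, which used their own heuristic function.

```python
import math

def log_pdf(r, bt):
    """log of dP/dr = bt / (bt + r)^2, with bt = beta*t."""
    return math.log(bt) - 2.0 * math.log(bt + r)

def local_slope(r, bt):
    """d log(p) / d log(r) for the dispersion density."""
    return -2.0 * r / (bt + r)

def fitted_exponent(bt, r_lo, r_hi, n=200):
    """Ordinary least-squares slope of log(p) vs log(r) on [r_lo, r_hi]."""
    lo, hi = math.log(r_lo), math.log(r_hi)
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    ys = [log_pdf(math.exp(x), bt) for x in xs]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

if __name__ == "__main__":
    bt = 0.36 * 7.0  # beta*t for the weekly data, units assumed to be km
    for r in (1.0, 10.0, 100.0, 1000.0):
        print(f"local slope at r={r:7.1f} km: {local_slope(r, bt):.3f}")
    # Hypothetical fitting window of 1 km to 1000 km.
    print("fitted exponent:", fitted_exponent(bt, 1.0, 1000.0))
```

Because the fitted slope is a weighted average of local slopes that all lie between 0 and -2, any single power law fitted over a finite range of this curve comes out shallower than -2.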

Entropy always wins out on these phenomena and it really tells us how (in the sense of having no additional information, i.e. a Jaynesian model of entropy) people will statistically use different forms of transportation. The smearing occurs over such a wide range because people will walk, bicycle, residentially drive, freeway drive, or take air transportation. The entropy of all these different velocities serves to generate the dispersion curve that we empirically observe. The fact that it takes so little effort to show this with a basic probability model truly demonstrates how universal the model remains. The bottom line is that a single parameter indicating an average value of dispersive velocity is able to map over several orders of magnitude; only a second-order correction, having as much to do with the constrained physical breadth of the USA and how fast people can ultimately travel in a short period of time, prevents a complete "single parameter" model fit.
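As an illustration of how a spread of rates can produce this curve, the density beta*t/(beta*t + r)^2 is what results when the displacement is exponentially distributed and its rate is itself exponentially distributed with mean 1/(beta*t). That two-level construction is an assumption made for this sketch (it is one route to the same formula, not necessarily the derivation behind the fit), and the function names are hypothetical. The Monte Carlo draw below compares the sampled cumulative against r/(beta*t + r).

```python
import random

def sample_displacements(beta, t, n, seed=0):
    """Draw n displacements from an exponential whose rate is itself
    exponentially distributed with mean 1/(beta*t)."""
    rng = random.Random(seed)
    bt = beta * t
    samples = []
    for _ in range(n):
        rate = rng.expovariate(bt)  # expovariate(lambd) has mean 1/lambd
        while rate == 0.0:          # guard against a zero rate
            rate = rng.expovariate(bt)
        samples.append(rng.expovariate(rate))
    return samples

def empirical_cdf(samples, r):
    """Fraction of samples with displacement <= r."""
    return sum(1 for s in samples if s <= r) / len(samples)

def model_cdf(r, beta, t):
    """Cumulative of the dispersion density: r / (beta*t + r)."""
    bt = beta * t
    return r / (bt + r)

if __name__ == "__main__":
    beta, t = 0.36, 7.0  # values from the text; units assumed km/day and days
    samples = sample_displacements(beta, t, 200000)
    for r in (0.1, 1.0, 10.0, 100.0, 1000.0):
        print(f"r={r:7.1f}  sampled={empirical_cdf(samples, r):.4f}  "
              f"model={model_cdf(r, beta, t):.4f}")
```

The only input is the product beta*t, which is the sense in which one parameter covers the whole range of displacements.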

Ultimately, what I find interesting is how the researchers in the field seem to flail about trying to explain the data with non-intuitive heuristics and obscure random walk models. Gonzalez et al. have gotten tantalizingly close to coming up with a good interpretation, much closer than Brockmann in fact, yet they did not quite make the connection. If I could tell them, I would hint that their random walk is random but the randomness itself is not randomized. That explains so many phenomena, yet they can't quite grasp it.


References:
  1. Understanding individual human mobility patterns, Marta C. González, César A. Hidalgo & Albert-László Barabási, Nature 453, 779-782 (5 June 2008)

There is an Addendum (12 March 2009) associated with this document.