Sunday, February 28, 2010

Quaking

What causes the relative magnitude distribution in earthquakes?
In other words, why do we measure many more small earthquakes than large ones? And why do the really large ones happen rarely enough to classify as Mandelbrotian gray swans?

Of course physicists want to ascribe it to the properties of what they call "critical phenomena". Read this assertion that makes the claim for a universal model of earthquakes:
Unified Scaling Laws for Earthquakes (2002)
Because only critical phenomena exhibit scaling laws, this result supports the hypothesis that earthquakes are self-organized critical (SOC) phenomena (6–11).
I consider that a strong assertion, because it reads as if scaling laws could never arise from a noncritical phenomenon.

In just a few steps I will show how garden-variety disorder can accomplish the same thing. First, the premises:
  1. Stress (a force) causes a rupture in the Earth resulting in an earthquake.
  2. Strain (a displacement) within the crust results from the jostling between moving plates which the earthquake relieves.
  3. Strain builds up gradually over long time periods by the relentlessness of stress.

This build-up can occur over various time spans. We don't know the average time span, although one must exist (call it T), so by the Maximum Entropy Principle we assign the build-up time a dispersive exponential distribution around that mean:
p(t) = 1/T * exp(-t/T)
Next the cumulative probability of achieving a strain (x) in time T is
P(x, v | T) = integral of p(t) for all t such that t is less than x/v
The term x/v indicates that the strain x grows linearly in time at some velocity v. This results in the conditional cumulative probability:
P(x, v | T) = 1- exp (-x/(vT))
At some point the accumulated strain reaches a threshold value of x at which the crust ruptures (similar to the breakdown of a component's reliability). We don't know this threshold either, but we know an average exists, which we call X, so by the Maximum Entropy Principle we integrate over a dispersed range of x:
P(v | T, X) = integral of P(x,v|T) * p(x) over all x where p(x) = 1/X*exp(-x/X)
This results in:
P(v | T, X) = X /(v*T+X)
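For completeness, the two integration steps behind these results work out as follows (a standard exponential-mixture calculation, using the same T and X as above):

```latex
P(x, v \mid T) = \int_0^{x/v} \frac{1}{T}\, e^{-t/T}\, dt = 1 - e^{-x/(vT)}

P(v \mid T, X) = \int_0^{\infty} \left(1 - e^{-x/(vT)}\right) \frac{1}{X}\, e^{-x/X}\, dx
               = 1 - \frac{vT}{vT + X} = \frac{X}{vT + X}
```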
So we have an expression with two unknown constants, X and T, and one variate, the velocity v (i.e. the stress). Since the displacement x grows in proportion to v*T, we can rewrite this as
P(x | T, X) = X /(X+x)
This gives the cumulative distribution of strains leading to an earthquake. The derivative of this cumulative is the density function which has the power-law exponent 2 for large x.
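As a sanity check on the algebra, here is a minimal Monte Carlo sketch (my own illustration, not code from the article): it draws exponentially dispersed build-up times and rupture thresholds and confirms that the probability of the accumulated strain v*t staying below the threshold follows X/(v*T + X).

```python
import numpy as np

# Assumed toy parameters: mean build-up time T and mean rupture threshold X
T, X, n = 1.0, 1.0, 1_000_000
rng = np.random.default_rng(1)

t = rng.exponential(T, n)          # MaxEnt-dispersed build-up times
threshold = rng.exponential(X, n)  # MaxEnt-dispersed breaking strains

for v in (0.5, 1.0, 5.0, 20.0):
    simulated = (v * t < threshold).mean()   # strain v*t has not yet reached the threshold
    predicted = X / (v * T + X)
    print(f"v = {v:4.1f}: simulated {simulated:.3f}  predicted {predicted:.3f}")
```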

If we plot this as a best fit to California earthquakes in the article referenced above, we get the following curve, with the green curve showing the entropic dispersion:


This becomes another success in applying entroplets to understanding disordered phenomena. Displacements of faults in the vertical direction contribute to a potential energy that eventually gets released. All the stored potential energy gets released in proportion to the seismic moment. The measured magnitude follows a 2/3 power law because seismometers only measure deflections, not energy. Two competing mechanisms drive the result: a slow growth in strain with an entropic dispersion in growth rates, and an entropic (or narrower) distribution of the points at which the fault gives way.
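For reference, the 2/3 exponent enters through the standard textbook moment-magnitude relation (quoted here for context, not derived in this post):

```latex
M_w = \tfrac{2}{3}\,\log_{10} M_0 - 6.07 \qquad (M_0 \text{ in N}\cdot\text{m})
```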

The result leads to the inverse power law beyond the knee and the near-perfect fit to the data. So we have an example of a scaling law that arises from a noncritical phenomenon.




Research physicists have this impulsive need to discover a revolutionary new law instead of settling for the simple, parsimonious explanation. Do these guys really want us to believe that a self-organized critical phenomenon, associated with something akin to a phase transition, causes earthquakes?

I can't really say, but it seems to me that explaining things away as arising simply from elementary considerations of randomness and disorder within the Earth's heterogeneous crust and upper mantle won't win any Nobel prizes.

The originator of self-organized criticality apparently had a streak of arrogance:
A sample of Prof. Bak's statements at conferences: After a young and hopeful researcher had presented his recent work, Prof. Bak stood up and almost screamed: "Perhaps I'm the only crazy person in here, but I understand zero - I mean ZERO - of what you said!". Another young scholar was met with the gratifying question: "Excuse me, but what is actually non-trivial about what you did?"
Was it possible that other physicists quaked in their boots at the prospect of ridicule for proposing the rather obvious?

Saturday, February 27, 2010

Entroplets

I have a new full article on TheOilDrum.com:
Dispersion, Diversity, and Resilience

It introduces the concept of entroplets, which tries to unify many of the concepts I have worked on over the last few years.

entroplet
(Click on the above to animate; it's about 1 MB in size)

Tuesday, February 16, 2010

Project delays and Extremistan

As a follow-up to the post on project bottlenecks, I had forgotten about this quote from Taleb's The Black Swan:
With human projects and ventures we have another story. These are often scalable, as I said in Chapter 3. With scalable variables, the ones from Extremistan, you will witness the exact opposite effect. Let's say a project is expected to terminate in 79 days, the same expectation in days as the newborn female has in years. On the 79th day, if the project is not finished, it will be expected to take another 25 days to complete. But on the 90th day, if the project is still not completed, it should have about 58 days to go. On the 100th, it should have 89 days to go. On the 119th, it should have an extra 149 days. On day 600, if the project is not done, you will be expected to need an extra 1,590 days. As you see, the longer you wait, the longer you will be expected to wait.
In the context of the staged process I described, I can definitely see this happening. As each stage misses its cut-off date, the latencies keep building up. The way Bayes' rule works, the earlier probabilities get discounted as you reach each new milestone, and the weight of the future fat-tail data factors more prominently into the new estimate.
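Here is a minimal sketch of Taleb's point (my own stand-in numbers and distribution, not his): with a fat-tailed Pareto waiting time, the expected remaining time grows the longer you have already waited, whereas a thin-tailed, memoryless exponential estimate stays flat.

```python
import numpy as np

def remaining_exponential(t, tau=79.0):
    # Memoryless case: the expected remaining time never changes, however long you've waited
    return tau

def remaining_pareto(t, xm=1.0, alpha=1.5):
    # Pareto survival S(t) = (xm/t)^alpha for t >= xm gives a mean residual life of t/(alpha-1),
    # i.e. the expected extra wait grows in proportion to the time already spent
    return np.maximum(t, xm) / (alpha - 1.0)

for days in (79, 90, 100, 119, 600):
    print(f"day {days:3d}: thin tail expects {remaining_exponential(days):5.0f} more days, "
          f"fat tail expects {remaining_pareto(days):6.0f} more days")
```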

Since the same dispersion math figures into reserve growth and oil creaming curves, one can see how the rule of diminishing returns on long-term payouts plays into resource depletion as well. Once a reservoir reaches a certain level of depletion and the creaming curve starts flattening out, the estimated time to reach that same level of relative depletion will keep growing. The operator has little choice but to shut in the reservoir at that point. Again, since project scheduling and oil reservoir estimates possess the same level of uncertainty in early projections, it makes sense that no one really understands exactly how long these operations will take to play out. Thus, you see businesses take a "wait and see" attitude on when to shut down.

As expected, business schools have actively researched the scheduling problem. This paper by Francesca Gino and Gary Pisano, "Toward a Theory of Behavioral Operations", describes most of the issues that crop up. They start with what I referred to as the 80/20 rule (they call it the "90% syndrome"). Helpfully, they do make the larger point of discussing the essential uncertainty in the process (which is why I rely on the Maximum Entropy Principle).
Why do product development projects always run late?
The first anomaly has to do with product development project performance. One of the most vexing management problems in product development is the tendency for projects to run late and over-budget. For instance, Ford and Sterman (2003a, 2003b) use the term “90% syndrome” to describe “a common concurrent development problem in which a project reaches about 90% completion according to the original schedule but then stalls, finally finishing about twice [the time] the original project duration has elapsed” (Ford and Sterman 2003b, p. 212). The phenomena has been documented in numerous case studies and examples (e.g. Cohen, Eliashberg and Ho, 1996) and widely discussed in the product development literature (e.g. Wheelwright and Clark, 1992).

Several studies in the OM literature have investigated the issue of improving timeliness and predictability in product development. Many prescriptions and formulas have been offered (see, for instance, Crawford, 1992; Gold, 1987). Most of this work has explained lateness in product development with the uncertainty inherent in development processes. At any of the stages of R&D, indeed, product development managers are asked to evaluate the available information about currently-under-development products and based on such information make decisions about whether or not to continue development. As many case studies have shown (e.g., Gino and Pisano, 2006a; 2006b), product development decisions are made under uncertainty, they are far from perfect and are affected by behavioral factors (such as sunk costs, escalation of commitment or even emotions).

To deal with uncertainty, OM scholars have developed tools and suggested approaches to better manage product development. These suggestions include the use of experimentation to resolve uncertainty early in the development process (e.g., Thomke, 2003), the use of parallel versus sequential sequencing of product development tasks (e.g., Loch, Terwiesch and Thomke, 2001), the use of cross functional or heavy-weight teams (Clark and Fujimoto, 1991) or the utilization of standardized project management tools like 3-point estimation techniques. In sum, OM research has tackled the lateness problem in product development by suggesting better tools, processes or organizational structures.

One of the puzzles behind the lateness problem, which should make one suspect of the uncertainty explanation, is that project duration is not symmetrically distributed. If true uncertainty were at work, then in fact, we would expect to see (and hear about) projects which were both late and early. The asymmetric nature of duration suggests other factors may be at work. Furthermore, case study research suggests that behavioral factors may underpin the lateness problem (see e.g. Gino and Pisano 2006a, Pisano et al 2005) Extant behavioral literature provides a rich trove of possible underlying explanations for project lateness, including the planning fallacy (e.g., Buehler, Griffin, and Ross, 1994; 2002), wishful thinking (e.g., Babad, 1987; Babad and Katz, 1991), and overconfidence bias (Einhorn and Hogarth, 1978; Fischhoff, Slovic and Lichtenstein, 1977; Oskamp, 1962; 1965).
Good points in mentioning the role of concurrency in reducing bottlenecks, but amazingly they see the asymmetry and attribute it to something other than uncertainty. They evidently don't do the math, and thus haven't discovered what the maximum entropy principle says about productivity rates; ultimately they rely on their faulty intuition.

Monday, February 8, 2010

Why we can't finish stuff: Bottlenecks and Pareto

Why does the law-making process take much longer to finish than expected? Why does a project schedule consistently overrun? Why does it take so long to write a book?

In any individual case, you can often pinpoint the specific reason for an unwanted delay. But you will find (not surprisingly) that a single reason won't universally explain all delays across a population. And we don't seem to learn too quickly: if we learned what went wrong in one case, ideally we would no longer do such a poor job the next time we tried to get something done. Yet we inevitably face a different set of bottlenecks the next time we try. I will demonstrate that these delays invariably derive from a basic uncertainty in our estimates of the rate at which we can do a task.

The explanation that follows belongs under the category of fat-tail and gray swan statistics. As far as I can tell, no one has treated the analysis quite in this fashion, even though it comes from some very basic concepts. In my own opinion, this has huge implications for how we look at bottom-line human productivity and how effectively we can manage uncertainty.


The classical (old-fashioned) development cycle follows a sequential process. In the bureaucratic sense it starts from a customer's specification of a desired product. Thereafter, the basic stages include requirements decomposition, preliminary design, detailed design, development, integration, and test. Iterations and spirals can exist on individual cycles, but the general staging remains -- the completion of each stage creates a natural succession to the next stage, thus leading to an overall sequential process. For now, I won't suggest either that this works well or that it falls flat. This just describes the way things typically get done under some managed regime. You find this sequence in many projects, and it gets a very thorough analysis by Eggers and O'Leary in their book "If We Can Put a Man on the Moon ... Getting Big Things Done in Government".


Figure 1: Eggers and O'Leary's project roadmap consists of five stages.
The stage called Stargate refers to the transition between virtual and real.

If done methodically, one can't really find too much to criticize about this process. It fosters a thorough approach to information gathering and careful review of the system design as it gains momentum. Given that the cycle depends on planning up front, someone needs to critically estimate each stage's duration and lay these into the project's master schedule. Only then can the project's team leader generate a bottom-line estimate for the final product delivery. Unfortunately, for any project of some size or complexity, the timed development cycle routinely overshoots the original estimate by a significant amount. Many of us have ideas about why this happens, and Eggers and O'Leary describe some of the specific problems, but no one has put a quantitative spin on the analysis.

The premise of this analysis is to put the cycle into a probabilistic perspective. We thus interpret each stage in the journey in stochastic terms and see exactly how it evolves. We don't have to know exactly the reasons for delays, just that we have uncertainty in the range of the delays.

Uncertainty in Rates

In Figure 1, we assume that we don't have extensive knowledge about how long a specific stage will take to complete. At the very least, we can estimate an average time for each stage's length. If only this average gets considered, then we can estimate the aggregated duration as the sum of the individual stage averages. That becomes the bottom-line number that management and the customer will take a deep interest in, but it does not tell the entire story. In fact, we instead need to consider a range around the average, and, more importantly, we have to pick the right measure (or metric) to take as the average. As the initial premise, let's build the analysis around these two points:
  1. We have limited knowledge of the spread around the average.
  2. Use something other than time as the correct metric to evaluate.
I suggest using speed or rate, rather than time, as the variate in the probability density function (PDF) for each stage.

But what happens if we lack an estimate for a realistic range in rates? Development projects do not share the relative predictability of marathons, and so we have to deal with the added uncertainty of project completion. We do this by applying the principle of maximum entropy (MaxEnt). This postulates that if we only have knowledge about some constraint (say an average value), then the most likely distribution of values within the constraint is the one that maximizes the entropy. This amounts to maximizing uncertainty, and it works out as a completely non-speculative procedure in that we introduce as preconditions only the information that we know. Fortunately, the maximum entropy distribution parametrized solely by a single mean value is well known -- the exponential -- and I have used that same principle elsewhere to solve some sticky physics problems.

If we string several of these PDFs together to emulate a real staged process, then we can understand how the rate-based spread causes the estimates to diverge from the expected value. In the following graphs, we chain 5 stochastic stages together (see Figure 1 for the schematic), with each stage parameterized by the average value T = τ = 10 time units. The time-based convolution is simply a Gamma distribution of order 5. This has a mean of 5*10 and a mode (peak value) of (5-1)*10, so the bulk of the time-based outcomes sits close to the sum of the individual expected stage times. However, the convolution of the rate-based distributions does not sharpen much (if at all), and the majority of the completed efforts extend well beyond the expected values. This turns a highly predictable critical path into a routinely and severely bottlenecked process.
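Here is a minimal Monte Carlo sketch of the two models, under a parameterization I am assuming (time-based: each stage duration drawn from an exponential PDF with mean 10; rate-based: each stage is 10 units of work done at an exponentially distributed rate with mean 1). It reproduces the qualitative divergence discussed below, though not necessarily the exact figures in the plots.

```python
import numpy as np

rng = np.random.default_rng(0)
n, stages, work, tau = 200_000, 5, 10.0, 10.0

# Time-based: five exponential stage durations (their sum is a Gamma distribution of order 5)
time_based = rng.exponential(tau, size=(n, stages)).sum(axis=1)

# Rate-based: five stages of fixed work done at MaxEnt-dispersed rates (fat-tailed durations)
rate_based = (work / rng.exponential(1.0, size=(n, stages))).sum(axis=1)

for label, total in (("time-based", time_based), ("rate-based", rate_based)):
    print(f"{label}: P(done by 50) = {(total <= 50).mean():.2f}, "
          f"60% confidence reached at t = {np.quantile(total, 0.6):.0f}")
```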


Figure 2: Convolution of five PDF's assuming a mean time and mean rate

The time integration of the PDFs gives the cumulative distribution function (CDF); this becomes equivalent to the MSCP for the project as a function of the scheduled completion date. You can see that the time-based estimate has a narrower envelope and reaches a 60% success rate for meeting the original scheduled goal of 50 time units. On the other hand, the rate-based model has a comparatively poor MSCP, achieving only a 9% success rate for the original goal of 50 time units. By the same token, it takes about 150 time units to reach the same 60% confidence level that we had for the time-based model, roughly 3 times as long as desired.

The reason for the divergence has to do with the fat-tail power laws in the PDFs of the rate-based curves. The cumulative success probability of our original single stage clearly degrades as we chain stages together, and it only gets worse the more stages we add.

Bad News for Management

This gives us an explanation of why scheduled projects never meet their deadlines. Now, what do we do about it?

Not much. We can either (1) get our act together and remove all uncertainties and perhaps obtain funding for a zipper manufacturing plant, or (2) we can try to do tasks in parallel.

For case 1, we do our best to avoid piling on extra stages in the sequential process unless we can characterize each stage to the utmost degree. The successful chip companies in Silicon Valley, with their emphasis on formal design and empirical material science processes, have the characterization nailed. They can schedule each stage with certainty and retain confidence that the resultant outcome stays within their confidence limits.

For case 2, let us take the example of writing a book as a set of chapters. We have all heard about the author who spends 10 years trying to complete a manuscript. In many of these cases, the author simply got stuck at a certain stage. This reflects the uncertainty in setting a baseline pages-per-day writing pace. If you want to write a long manuscript, slowing down your writing pace will absolutely kill your ability to meet deadlines. You can get bursts of creativity where you briefly maintain a good pace, but the long interludes of slow writing totally outweigh this effect. As a way around this, you can also write stream-of-consciousness, meet your deadlines, and leave the result in that state when done. This explains why we see many bad movies as well; works of art don't have to "work as advertised" as an exit criterion. On the other hand, a successful and prolific author will never let his output stall when he lacks motivation to write a specific chapter. Instead, he will jump around and multitask on various chapters to keep his productivity up.

I don't lay complete blame on just the variances in productivity, as the peculiar properties of rates also play a part. A similar rate fate awaits you if you ever go biking in hilly country. You would think that the fast pace going down hills would counterbalance the slower pace going up, and that you could keep up the same rate as on flat terrain. Not even close: the slow rates going uphill absolutely drag you down in the long run. That results from the mathematical properties of working with rates. Say you had a 10 mile uphill stretch followed by a 10 mile downhill stretch, both equally steep. If you could achieve a rate of 20 MPH on flat terrain, you would feel good if you could maintain 12 MPH going up the hill. Over 10 miles, this would take you 50 minutes to complete. But then you would have to go down the hill at 60 MPH (!) to match the ground covered on flat terrain over that same total distance. This might seem non-intuitive until you do the math and realize how much slow rates drag your overall time down.
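A quick check of that arithmetic (the overall pace is the harmonic mean of the leg speeds, which is why the slow leg dominates):

```python
flat_mph, uphill_mph, miles_each = 20.0, 12.0, 10.0

flat_time = 2 * miles_each / flat_mph                  # 1 hour to cover 20 flat miles
uphill_time = miles_each / uphill_mph                  # time spent just on the climb
downhill_mph = miles_each / (flat_time - uphill_time)  # speed needed to break even

print(f"uphill leg: {uphill_time * 60:.0f} minutes; "
      f"downhill speed needed to match flat terrain: {downhill_mph:.0f} MPH")
```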

But then look at a more rigidly designed transportation system such as a bus line. Even though a bus route consists of many segments, most buses routinely arrive on schedule. The piling on of stages actually improves the statistics, just as the Central Limit Theorem would predict. The scheduling turns out highly predictable because the schedulers understand the routine delays, such as traffic lights, and have the characteristics nailed.

On the other hand, software development efforts and other complex designs do not have the process nailed. At best, we can guess at programmer productivity (a rate-based metric, i.e. lines of code per day) with a high degree of uncertainty, and then we wonder why we don't meet schedules. For software, we can use some of the same tricks as book writing and, for example, skip around the class hierarchy when we get stuck. But pesky debugging can really kill the progress, as it effectively slows down a programmer's productivity.

The legislative process also has little by way of alternatives. Since most laws follow a sequential process, they become very prone to delays. Consider just the fact that no one can ascertain the potential for filibuster or the page count of the proposed bill itself. Actually reading the contents of a bill could add so much uncertainty to that stage that the estimate for completion never matches the actual time. No wonder that no legislator actually reads the bills that quickly get pushed through the system. We get marginal laws full of loopholes as a result. Only a total reworking of the process into more concurrent activities will accelerate it.

And then the Pareto Principle comes in

If we look again at Figure 2, we can see another implication of the fat over-run tail. Many software developers have heard of the 80/20 rule, aka the Pareto Principle, where 80% of the time gets spent on 20% of the overall scheduled effort. In the figure below, I have placed a few duration bars to show how the 80/20 rule manifests itself in rate-driven scheduling.

Because this curve shows probability, we need to consider the 80/20 law in probabilistic terms. For a single stage, 80% of the effort routinely completes in the predicted time, but the last 20% of the effort, depending on how you set the exit criteria, can easily consume over 80% of the time. Although this describes only a single stage of development and most people would ascribe the 80/20 rule to variations within the stage, a general class equivalence holds.

Figure 3: Normalizing time ratios, we on average spend less than 20% of our time on 80% of the phase effort, and at least 80% of our time on the rest of the effort. This is the famous Pareto Principle or the 80-20 rule known to management.

In terms of software, the long poles in the 80/20 tent relate to cases of prolonged debugging efforts, and the shorter durations to where a steady development pace occurs.

These rules essentially show the asymmetry between effort and time, and quantitatively they depend on how far you extend the tail. A variation of the Pareto Principle gives the 90-9-1 rule (see the numerical sketch after this list):

  • 100 time units to get 90% done
  • 1000 time units to get the next 9% done
  • 10000 time units to get the “final” 1% done
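Here is one way to read those numbers probabilistically, a sketch under my assumed single-stage model from above (10 units of work done at a MaxEnt-dispersed unit rate, so the chance of being done by time t is exp(-10/t)): each extra "nine" of completion probability costs roughly another factor of ten in time.

```python
import math

work = 10.0  # assumed work content of a single stage, done at a MaxEnt-dispersed unit mean rate

for p in (0.90, 0.99, 0.999):
    t = work / math.log(1.0 / p)   # invert P(done by t) = exp(-work / t)
    print(f"{p:.1%} chance of being done takes about {t:,.0f} time units")
```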

In strict probability terms, nothing ever quite finishes, and business decisions determine the exit criteria. This only happens with fat-tail statistics, and again it only gets worse as we add stages. The possibility exists that the stochastic uncertainty in these schedule estimates won't turn out as bad as I suggest. If we can improve our production processes to eliminate the potentially slow productivity paths through the system, this analysis will become moot. That may indeed occur, but our model describes the real development world as we currently practice it quite well. Empirically, complex projects often take five times as long to build as originally desired, and projects do get canceled because of this delay. Bills don't get turned into laws, and the next great novel never gets finished.

The Bottom Line

Lengthy delays arise in this model entirely from applying an uncertainty to our estimates. In other words, without further information about the actual productivity rates we can ultimately obtain, an average rate becomes our best estimate. A rate-based application of the maximum entropy principle thus helps guide our intuition, and to best solve the problem, we need to characterize and understand the entropic nature of the fundamental process. For now, we can only harness the beast of entropy; we cannot tame it.

Will our society ever get in gear and accommodate change effectively? History tells us that just about everything happens in stages. Watch how long it will take the ideas that I present here to eventually get accepted. I will observe and report back in several years.