Monday, August 18, 2008

Pipes and the Oil Shock Model

No one outside of engineering seems to understand the usefulness of mathematical convolution. To me, it seems a natural operation, as obvious as better-known statistical operators such as auto-correlation. Yet, you can't find convolution listed in the @function list of a spreadsheet program such as M$ Excel. I have an interest in this because convolution remains at the heart of the Oil Shock Model.

The shock model essentially expresses an oil production curve as a series of temporal convolution operators acting on an initial oil discovery stimulus profile. Each convolution corresponds to a phase in the lifetime of an average discovery, expressed probabilistically to account for the spread in potential outcomes. The original discovery profile first gets convolved with a damped exponential representing the range of fallow times. That output then gets convolved with another profile representing the range of construction times, then in turn with the range of maturation times, and finally with the extraction rate profile.
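In symbols (my shorthand, not anything the scripts require): if D stands for the discovery profile and f, c, m, e for the fallow, construction, maturation, and extraction profiles, the production curve comes out as the chained discrete convolution

$$P = (((D \ast f) \ast c) \ast m) \ast e, \qquad (x \ast y)(t) = \sum_{\tau=0}^{t} x(\tau)\, y(t-\tau)$$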

I had previously posted a full program that internally generated the entire set of convolutions, but I finally relented and decided to simplify the approach, using the idea of data flow and short scripts to perhaps make the concept more accessible (and potentially more flexible). Via UNIX or a Windows command shell, one can use the operating system ideas of pipes and the standard input/output streams to build a very simple instantiation of the Oil Shock Model out of a short script and some data files.

The pipes in the title of this post serve both as an abstraction of what occurs in the physical world and as a useful mathematical device for solving the shock model.

A typical invocation would look something like this:
cat discover.dat | conv fallow.dat | conv cons.dat | conv mature.dat | conv extract.dat
Each phase uses the standard output of the previous phase as its standard input via the operating system pipe, signified by the vertical bar "|". The term conv refers to a shell script or executable that takes the data piped into it and convolves it with data from a file given by the script's command-line argument. The initial cat call reads from the discovery data file and pushes it into the first pipe.

In terms of a popular scripting language such as Ruby, the conv script looks like this:
# conv.rb:
# convolution of stdin input array against array from file

def conv(a, b)
  # the full convolution spans a.length + b.length - 1 points
  length = a.length + b.length - 2
  for i in 0..length do
    sum = 0.0
    i.downto(0) do |j|
      # out-of-range entries come back nil, which to_f maps to 0.0
      sum += a[i-j].to_f * b[j].to_f
    end
    puts sum
  end
end

conv(STDIN.readlines, IO.readlines(ARGV[0]))
The last line calls the defined convolution function with two arrays generated automatically by Ruby's readlines parsing call. Each line in a file represents a year's worth of data, so one array gets built from the standard input stream and the other from the file named on the command line. In other words, "a = the input data" and "b = the convolution data".

Operationally, to run this file through the Ruby interpreter, one invokes it with something akin to "ruby conv.rb file.dat". The data in each of the profiles has to contain enough entries to cover the range of years you want the production profile to span. The convolution function takes care of the ranges automatically (i.e. the full convolution generates a range that covers the sum of the individual time ranges).
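Note that the pipeline above says just conv rather than ruby conv.rb, which assumes you have wrapped the script in a small executable somewhere on your PATH. Without such a wrapper, the equivalent pipeline reads:
cat discover.dat | ruby conv.rb fallow.dat | ruby conv.rb cons.dat | ruby conv.rb mature.dat | ruby conv.rb extract.dat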

A typical data file for a normalized damped exponential with a 10-year time constant would look like:
0.1
0.09048
0.08187
0.07408
....
The ... would go on for N lines corresponding to N years. Of course, we can easily generate the data files themselves with other tiny scripts. The UNIX shell has the back-tick "`" operator, which when invoked within the command line substitutes a script's output in-place. This means that we have many convenient ways, including providing a lambda function to Ruby itself, to generate the data profiles needed by the convolution operator.
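As a minimal sketch of such a generator (the script name, rate, and year count here are placeholders of my own choosing):

# gen_exp.rb: emit a damped-exponential profile, one year per line
# usage: ruby gen_exp.rb 0.1 60 > fallow.dat
rate = ARGV[0].to_f
years = ARGV[1].to_i
years.times { |t| puts rate * Math.exp(-rate * t) }

With a rate of 0.1 this reproduces the 0.1, 0.09048, 0.08187, ... sequence shown above.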

In general, I find nothing complicated about the convolution operation, and it remains a shame that we don't have this functionality built into the set of standard spreadsheet operators. Even though this alternative pipe approach looks incredibly simple in principle, enough people stay away from the command line that it will never achieve the popularity of a spreadsheet formulation. Something like Matlab has the convolution operator built in, but it costs big coin and caters to the engineering crowd (of course). Alas, for the moment we will have to satisfy ourselves with pipes.

Update:
I should have added the Shock function to the list of scripts. It acts essentially the same as a convolution, but since it relies on perturbations to a Markov (memoryless) process, we can only add it to the end of the sequence of convolutions. The file it works with contains a list of fractional extraction rates (acting proportionally on the current reserve) matched to the years since the first discovery occurred. For a stationary process, these rates stay relatively constant from year to year, but global events can suddenly change them, leading to blips and dips in yearly production numbers.
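In equation form (my notation, not anything in the script): with I(t) the input stream from the preceding convolutions, R(t) the accumulated reserve, and e(t) the extraction rate read from the file, each year produces

$$P(t) = \left(R(t) + I(t)\right) e(t), \qquad R(t+1) = \left(R(t) + I(t)\right)\left(1 - e(t)\right)$$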
The Shock script:

# shock.rb:
# Markov extraction of stdin data using perturbed rates from file

def shock(a, b)
  temp = 0.0                # accumulated, as-yet-unextracted reserve
  # one iteration per year of rate data (note the exclusive range)
  for i in 0...b.length do
    # extract this year's fraction b[i] of the reserve plus new input
    output = (a[i].to_f + temp) * b[i].to_f
    temp = (a[i].to_f + temp) * (1.0 - b[i].to_f)
    puts output
  end
end

shock(STDIN.readlines, IO.readlines(ARGV[0]))

The extraction rate file would look like this:
0.1
0.1
0.1
0.12
0.1
....
The fourth entry shows a 20% upward bump in the extraction rate. The complete shock model invocation would thus look like this:
cat discover.dat | conv fallow.dat | conv cons.dat | conv mature.dat | shock rate.dat


Update 2:
The following script takes the standard input and applies a constant extraction rate to the accumulated reserve data. Notice how the convolution simplifies given a Markov approximation.
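To see why (a quick derivation of my own, not part of the original scripts): extracting a constant fraction b of the reserve each year amounts to convolving the input with the discrete damped-exponential kernel

$$k(t) = b\,(1-b)^{t}, \qquad t = 0, 1, 2, \ldots$$

so the O(N^2) convolution loop collapses to the O(N) recurrence in the script below.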
The un-Shocked script:

# exp.rb:
# Markov extraction of stdin data at a constant rate given as arg value

def exp(a, b)
  temp = 0.0                # accumulated, as-yet-unextracted reserve
  for i in 0...a.length do  # one iteration per year of input data
    output = (a[i].to_f + temp) * b
    temp = (a[i].to_f + temp) * (1.0 - b)
    puts output
  end
end

exp(STDIN.readlines, ARGV[0].to_f)
This becomes a "shock"-less model and gets invoked as "ruby exp.rb 0.1", where the argument becomes a single floating-point extraction rate instead of a file. The extraction extends only as long as the input data does, which means you need to pad or extrapolate the input data if you want the production profile to extend further into the future. The same technique proves useful for every one of the scripts.
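As a minimal sketch of that padding idea (the script name and the zero-fill assumption are mine; a smarter version could append an extrapolated tail instead):

# pad.rb: pass stdin through unchanged, then append N years of zeros
# usage: cat discover.dat | ruby pad.rb 50 | conv fallow.dat | ...
STDIN.readlines.each { |line| puts line.to_f }
ARGV[0].to_i.times { puts 0.0 }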

All of these scripts generate a granularity of only one year, so don't expect great results if the rates have time constants that get too close to one year. In that case I would suggest switching over to a finer granularity than a year; you just have to remember that the resultant output data will carry this same granularity.