Thursday, April 8, 2010

Entroplet / Species Area Relationships

Ecologists use a concept called Species Area Relationships (SAR) to estimate diversity and richness in a plot of land. The field ecologist parcels up a few hectares of land into plots and starts cumulatively counting the number of unique species that appear on each subarea of land.

Lots of research gets produced on this topic, primarily because everyone wants to figure out a brilliant technique for deriving SAR. I do it here because it shows similarity to the oil discovery problem. Interestingly, after working out some simple probability equations describing the problem, I came up with a very simple solution.

The species diversity follows the entroplet formulation for the probability density function:

D(x) = d/(d+x)^2
where x gives the relative abundance of a species and D(x) is the probability density function for this abundance. This shows many species of low relative abundance and progressively fewer species of high abundance.
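
A minimal Python sketch of this density, for illustration only. The function names and the value of d are assumptions chosen for the example. The cumulative form x/(d+x) follows from integrating D(x) from 0 to x, so half of all species fall below a relative abundance of x = d.

```python
def abundance_pdf(x, d):
    """Entroplet density D(x) = d / (d + x)^2 for relative abundance x >= 0."""
    return d / (d + x) ** 2


def abundance_cdf(x, d):
    """Integral of D from 0 to x, which works out to x / (d + x)."""
    return x / (d + x)


d = 23.0  # illustrative dispersion value, chosen only for this example

for x in (0.1 * d, d, 10.0 * d):
    print(f"x = {x:6.1f}   D(x) = {abundance_pdf(x, d):.5f}   "
          f"fraction of species below x = {abundance_cdf(x, d):.3f}")
```

The density is largest at x = 0 and falls off as 1/x^2 for x much larger than d, which is the "many rare species, few abundant ones" shape described above.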

The cumulative probability of finding any particular species of abundance x within an area A follows:

C(A=Area|x) = x*k*A /(1+x*k*A)

Arguably we could use a Poisson form, 1 - exp(-x*k*A), but as we have little information on the boundaries of the area and the searching effectiveness, the hyperbolic formulation works well. Species with higher abundance get accumulated more quickly, with the value k providing a cross-sectional efficiency that accommodates absolute densities.
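
To see how the two candidate forms differ, here is a small Python sketch comparing them over a range of areas. The function names and the values of x and k are illustrative assumptions, not fitted quantities.

```python
import math


def capture_hyperbolic(x, k, area):
    """C(A|x) = x*k*A / (1 + x*k*A), the form used in the text."""
    z = x * k * area
    return z / (1.0 + z)


def capture_poisson(x, k, area):
    """Alternative form 1 - exp(-x*k*A)."""
    return 1.0 - math.exp(-x * k * area)


x = 1.0   # illustrative relative abundance
k = 0.11  # illustrative capture efficiency

for area in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"A = {area:7.1f}   hyperbolic = {capture_hyperbolic(x, k, area):.4f}   "
          f"Poisson = {capture_poisson(x, k, area):.4f}")
```

Both forms grow linearly in x*k*A for small areas, but the hyperbolic form approaches 1 much more slowly. One way to read it, offered as an interpretation rather than as the derivation used here: averaging the Poisson form over an exponentially distributed capture rate gives exactly z/(1+z) with z = x*k*A.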

Integrating this over all the species and their individual relative abundances:

Integral ( C(A|x) * D(x) ) dx =
A*k*d/(1-A*k*d)^2 * (A*k*d - 1 - ln(A*k*d))
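
The closed form can be checked numerically. The sketch below assumes the integral runs over all abundances, x from 0 to infinity, and uses the substitution x = d*t/(1-t), under which D(x) dx becomes simply dt on the interval (0, 1). The function names, the midpoint rule, and the test values of A, k, and d are choices made for this example.

```python
import math


def closed_form(q):
    """q/(1-q)^2 * (q - 1 - ln q) with q = A*k*d; the limit at q = 1 is 1/2."""
    if abs(q - 1.0) < 1e-9:
        return 0.5
    return q / (1.0 - q) ** 2 * (q - 1.0 - math.log(q))


def numeric_integral(area, k, d, n=20000):
    """Midpoint-rule estimate of the integral of C(A|x) * D(x) over x >= 0."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n            # midpoint of the i-th subinterval of (0, 1)
        x = d * t / (1.0 - t)        # maps t in (0, 1) to x in (0, infinity)
        z = x * k * area
        total += z / (1.0 + z)       # C(A|x); the D(x) dx factor is just dt
    return total / n


k, d = 0.11, 23.0  # illustrative values
for area in (0.01, 0.1, 1.0, 10.0, 100.0):
    q = area * k * d
    print(f"A*k*d = {q:8.3f}   numeric = {numeric_integral(area, k, d):.6f}   "
          f"closed form = {closed_form(q):.6f}")
```

The expression looks singular at A*k*d = 1, but the numerator vanishes there as well and the value tends smoothly to 1/2.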

The value k*d mixes the entropic dispersion factor (d) with the cross-sectional capture efficiency (k), so we replace it with the combined density parameter P. For a specific plot of land, N gives the maximum count of species.

S(A) = N * A*P/(1-A*P)^2 * (A*P - 1 - ln(A*P))
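
A short Python sketch of S(A) as a function, with the removable singularity at A*P = 1 handled explicitly. The function name and the values of P and N are hypothetical, picked only to show the behavior.

```python
import math


def species_count(area, P, N):
    """S(A) = N * q/(1-q)^2 * (q - 1 - ln q), where q = A*P."""
    q = area * P
    if q <= 0.0:
        return 0.0                 # no area sampled, no species found
    if abs(q - 1.0) < 1e-9:
        return 0.5 * N             # limiting value at A*P = 1
    return N * q / (1.0 - q) ** 2 * (q - 1.0 - math.log(q))


P, N = 2.0, 500  # hypothetical parameters, not taken from any plot

for area in (0.001, 0.01, 0.1, 0.5, 1.0, 10.0, 100.0):
    print(f"A = {area:7.3f}   S(A) = {species_count(area, P, N):7.1f}")
```

Under this form, half of the N species have been found at A = 1/P, so P acts as an inverse characteristic area, and S(A) approaches N only slowly as the area grows.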

Species Area Curve for Tropical Forest Plots



The figure plots the tree Species Area curves for five different tropical forested island or nature reserve regions. Note that each of these curves has the single parameter P, which describes the dispersion/areal density cross-section for a specific region's diversity. The value of N comes from the data table below. The black curves are data and the markers are my solution. The red region is the classic power-law region that has historically been used to describe the SAR.

Site                             P      N
Lambir, Malaysia                 1.1    1174
Barro Colorado Island, Panama    2.55   300
Huai Khae Khaeng, Thailand       1.15   231
Mudumalai, India                 0.9    71
Pasoh, Malaysia                  1.5    817

In this solution, I used exactly the same mathematics for counting trees on plots of land (per area) that I derived for Dispersive Aggregation and Discovery of oil (per volume). Don't forget that I plotted it on log-log axes, so the agreement holds over several orders of magnitude.
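
Since the curves are read on log-log axes, it can help to look at the local slope d ln S / d ln A, which is the exponent a power-law fit would pick up over a narrow range of areas. The sketch below estimates it by finite differences. The function names and the step size are assumptions for this example, and the slope depends only on the product A*P, not on N.

```python
import math


def sar_fraction(q):
    """S/N as a function of q = A*P, using the closed form given earlier."""
    if abs(q - 1.0) < 1e-9:
        return 0.5
    return q / (1.0 - q) ** 2 * (q - 1.0 - math.log(q))


def local_slope(q, step=1e-3):
    """Central-difference estimate of d ln S / d ln A at q = A*P."""
    lo = q * math.exp(-step)
    hi = q * math.exp(step)
    return (math.log(sar_fraction(hi)) - math.log(sar_fraction(lo))) / (2.0 * step)


for q in (1e-4, 1e-2, 1.0, 1e2, 1e4):
    print(f"A*P = {q:8.0e}   local log-log slope = {local_slope(q):.3f}")
```

The slope is not constant: it sits below 1 at small areas, passes through 1/3 at A*P = 1, and decays toward 0 as S saturates at N. A single power law can therefore only match the curve over a limited window of areas.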

Update:
In my TOD post, Dispersion, Diversity, and Resilience, I have data for Pasoh and BCI on values of d derived from the Relative Abundance Distributions (RAD). RAD is a more fundamental measure than SAR, so we can cull out the dispersion factor nicely.

BCI: d = 23
Pasoh: d = 14

Then from P = k*d, I get k of approximately 0.11 in each case, which means that the SAR model is consistent with the RAD model for both BCI and Pasoh.
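
The arithmetic behind this consistency check is short enough to spell out. In the sketch below, the P values are the ones read from the data table (2.55 for BCI and 1.5 for Pasoh, which is an interpretation of the table layout) and the d values are those quoted above; the function name is hypothetical.

```python
def capture_efficiency(P, d):
    """Solve P = k*d for the cross-sectional capture efficiency k."""
    return P / d


# (P from the SAR fit as read from the table, d from the RAD fit)
sites = {
    "BCI": (2.55, 23.0),
    "Pasoh": (1.5, 14.0),
}

for name, (P, d) in sites.items():
    print(f"{name}: k = P/d = {capture_efficiency(P, d):.3f}")
```

This gives k of about 0.111 for BCI and 0.107 for Pasoh, both rounding to 0.11.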

This seems an incredibly important insight, if not for ecological modeling, then for the entire mathematical approach I am advocating, which covers everything from oil discovery to earthquake statistics and beyond. The SAR and RAD models are simply special cases of the entroplet approach and a straightforward application of the idea.


Refs:
  1. Predicting species diversity in tropical forests
  2. Biodiversity shapes tree species aggregations in tropical forests