Soil Carbon Modeling with RStan

Overview

Experimental Data

Soil incubation experiments are used to estimate parameters of physical models of carbon flow in soil under different conditions. In the laboratory, external conditions such as temperature and moisture can be controlled, as can inputs such as leaf litter or microbial content and diversity. Measurements of carbon emissions are taken over time, perhaps along with other information such as microbial enzyme activity.

Soil incubation experiment in progress at the Max Planck Institute, Jena.

Physical Models

Most soil-carbon physical models approximate soil heterogeneity with discrete pools of carbon. For example, a two-compartment model posits a labile pool of carbon with readily available carbon and a recalcitrant pool from which carbon evolves more slowly. Systems may also posit additional pools for microbes, enzymes, and other physical features of soil ecology. Some of these pools may be measured. System dynamics are modeled with differential equations describing flows among the compartments.

Statistical Models and Inference

Statistical models are overlayed on the deterministic physical models to represent measurement error and unmodeled variation among replicated experiments. Bayesian statistical inference can be applied to estimate parameters based on combining experimental measurements with prior scientific knowledge of soil carbon dynamics or the experimental mechanism. Poststratification may then be used to predict behavior on regional or global scales based on information about biogeochemical conditions, which are typically estimated for geographical grids on the 10–100 kilometer square scale.

Pan-European soil organic carbon (SOC) stock of agricultural soils. Image from European Commission, Joint Research Centre, European Soil Portal.

Stan provides an expressive modeling language for statistical models including a full linear algebra and statistical library along with ordinary differential equation solvers. Stan also provides an efficient and scalable, fully Bayesian, inference mechanism based on adaptive Hamiltonian Monte Carlo sampling. Stan is integrated into the R statistical programming environment through the RStan interface.

This Document

The rest of this document describes

data from a soil incubation study of evolved CO₂,
a differential equation model of the process with two compartments and feedback,
an RStan implementation of the model, and
summaries of Bayesian parameter estimates (a.k.a., data integration, inverse analysis).

Data

Sierra and Müller (2014) performed a soil incubation experiment in which they took a series of 25 measurements over time of evolved CO₂. At each time point, multiple readings were taken from replicates and aggregated into a mean and standard deviation.

The data can be read in through Sierra and Müller’s (2014) SoilR package.

Load Data

library(SoilR,quietly=TRUE);

Munge Data

totalC_t0 <- 7.7;         # not included in data, so hard code here
t0 <- 0;
N_t <- 25;                # calculated by inspection
ts <- eCO2[1:25,2];
eCO2mean <- eCO2[1:25,3];   
eCO2sd <- eCO2[1:25,4];

Plot Data

library(ggplot2,quietly=TRUE);
df <- data.frame(list(ts=ts,eCO2mean=eCO2mean,eCO2sd=eCO2sd));
interval_95pct <- aes(ymin = eCO2mean + 1.96 * eCO2sd, 
                      ymax = eCO2mean - 1.96 * eCO2sd);
plot_data <- 
  ggplot(df, aes(x=ts, y=eCO2mean)) +
  geom_point() + 
  geom_errorbar(interval_95pct, width=0, colour="blue") +
  scale_x_continuous(name="time (days)") +
  scale_y_continuous(name="evolved CO2 (mgC g-1 soil)") +
  ggtitle("Evolved CO2 Measurements (with 95% intervals)");
plot(plot_data);

Caption Goes here and how will it look?

Physical Model

Two Pool Linear Model with Feedback

Sierra and Müller (2014) present a two-pool model with decomposition rates for each pool as well as transfer rates from one pool to another.

Let \(C(t) = (C_1(t), C_2(t))\) be the concentration of carbon in pools 1 and 2. The model assumes pool-specific decomposition rates \(\kappa_1, \kappa_2\) and transfer rates \(\alpha_{2,1}\) and \(\alpha_{1,2}\) from pool 2 to pool 1 and vice-versa. The system dynamics are modeled through the equations \[ \frac{d C_1(t)}{dt} = - \kappa_1 \, C_1(t) + \alpha_{2,1} \, \kappa_2 \, C_2(t) \] and \[ \frac{d C_2(t)}{dt} = - \kappa_2 \, C_2(t) + \alpha_{1,2} \, \kappa_1 \, C_1(t). \]

Given initial carbon concentrations \(C(0)\) at \(t = 0\) and values for parameters \(\kappa, \alpha\), these equations may be solved for \(C(t_n)\) at measurement times \(t_n > 0\).

Statistical Model

Measurements

Initial State Measurement

Because the two pools are theoretical constructs to account for soil heterogeneity, they are not measured directly. Instead, only the total initial carbon \(C_1(0) + C_2(0)\) is measured, not how it is partitioned between the two pools.

Evolved Carbon Measurement

For each measurement time \(t_n\), the total carbon evolved from the system since the initial time \(t = 0\) is measured and recorded as \(\mbox{eCO}_2(t_n)\). This measurement is of the quantity predicted by the system of differential equations to be the total initial carbon minus what is left in the pools at time \(t_n\), \[ \left[ C_1(0) + C_2(0) \right] - \left[ C_1(t_n) + C_2(t_n) \right]. \] The SoilR package provides the \(\mbox{eCO2}_2(t_n)\) measurements.

Statistical Model Parameters

The statistical model assumes the following parameters.

Param	Range	Description
\(\gamma\)	\((0,1)\)	proportion of initial carbon in pool 1
\(\kappa_1\)	\((0,\infty)\)	decomposition rate (pool 1)
\(\kappa_2\)	\((0,\infty)\)	decomposition rate (pool 2)
\(\alpha_{2,1}\)	\((0,\infty)\)	transfer rate (pool 2 to 1)
\(\alpha_{1,2}\)	\((0,\infty)\)	transfer rate (pool 1 to 2)
\(\sigma\)	\((0,\infty)\)	residual error scale

Probability Model

Priors

Weakly informative priors are supplied for all parameters, whose rough values are known from previous experiments.

\[\gamma \sim \mbox{Beta}(10,1)\]

\[\kappa_1, \kappa_2, \alpha_{1,2}, \alpha_{2,1} \sim \mbox{Normal}(0,1)\]

\[\sigma \sim \mbox{Cauchy}(0,1)\]

Sampling Distribution

The sampling distribution is a standard predictive error model assuming normal error of predictions,

\[\mbox{eCO}_2(t) \sim \mbox{Normal}(\hat{\mbox{eCO}_2}(t), \sigma),\]

where \(\mbox{eCO}_2(t)\) is the measured evolved carbon and \(\hat{\mbox{eCO}_2}(t)\) is the estimate of evolved carbon based on the solutions \(C(t)\) to the system equations given initial condition \(C(0)\) and model parameters \((\gamma,\kappa,\alpha,\sigma)\).

With this model, the noise scale \(\sigma\) is accounting both for measurement error in \(\mbox{eCO}_2(t)\) and the misprediction of the model with estimated parameters.

Posterior Distribution

By Bayes’s rule, the posterior distribution of the parameters given the observations is \[ p(\gamma,\kappa,\alpha,\sigma \, | \, \mbox{eCO}_2,C(0)) \ \propto \ p(\mbox{eCO}_2 \, | \, \gamma,\kappa,\alpha,\sigma,C(0)) \ p(\gamma,\kappa,\alpha,\sigma). \] Here, \(\mbox{eCO}_2\) is the full sequence of evolved carbon measurements.

Stan Program

The following is a dump of a file containing the Stan program to implement the 2-pool model with feedback as described above; the model code is fully documented, and is explained at more length below.

file_path <- "soil_incubation.stan";
lines <- readLines(file_path, encoding="ASCII");
for (n in 1:length(lines)) cat(lines[n],'\n');

functions { 
 
  /** 
   * ODE system for two pool model with feedback and no inputs. 
   * 
   * This is the version that does not deal with measurement error. 
   * 
   * System State C is two dimensional with C[1] and C[2]  
   * being carbon in pools 1 and 2. 
   * 
   * The system has parameters 
   * 
   *   theta = (k1, k2, alpha21, alpha12) 
   * 
   * where 
   * 
   *   k1:       pool 1 decomposition rate 
   *   k2:       pool 2 decomposition rate 
   *   alpha21:  transfer coefficient from pool 2 to pool 1 
   *   alpha12:  transfer coefficient from pool 1 to pool 2 
   * 
   * The system time derivatives are 
   * 
   *   d.C[1] / d.t  =  -k1 * C[1]  +  alpha12 * k2 * C[2] 
   * 
   *   d.C[2] / d.t  =  alpha21 * k1 * C[1]  -  k2 * C[2] 
   * 
   * @param t time at which derivatives are evaluated. 
   * @param C system state at which derivatives are evaluated. 
   * @param theta parameters for system. 
   * @param x_r real constants for system (empty). 
   * @param x_i integer constants for system (empty). 
   */ 
  real[] two_pool_feedback(real t, real[] C, real[] theta, 
                           real[] x_r, int[] x_i) { 
    real k1; 
    real k2; 
    real alpha21; 
    real alpha12; 
 
    real dC_dt[2]; 
 
    k1 <- theta[1]; 
    k2 <- theta[2]; 
    alpha21 <- theta[3]; 
    alpha12 <- theta[4]; 
     
    dC_dt[1] <- -k1 * C[1] + alpha12 * k2 * C[2]; 
    dC_dt[2] <- - k2 * C[2] + alpha21 * k1 * C[1] ; 
 
    return dC_dt; 
  } 
 
  /** 
   * Compute total evolved CO2 from the system given the specified 
   * parameters and times.  This is done by simulating the system 
   * defined by the ODE function two_pool_feedback and then 
   * subtracting the sum of the CO2 estimated in each pool from the 
   * initial CO2. 
   * 
   * @param T number of times. 
   * @param t0 initial time. 
   * @param ts observation times. 
   * @param gamma partitioning coefficient. 
   * @param k1 decomposition rate for pool 1 
   * @param k2 decomposition rate for pool 2 
   * @param alpha21 transfer coefficient from pool 2 to 1 
   * @param alpha12 transfer coefficient from pool 1 to 2 
   * @param x_r real data (empty) 
   * @param x_i integer data (empty) 
   * @return evolved CO2 for times ts 
   */ 
  real[] evolved_CO2(int N_t, real t0, real[] ts, 
                     real gamma, real totalC_t0, 
                     real k1, real k2,  
                     real alpha21, real alpha12, 
                     real[] x_r, int[] x_i) { 
 
    real C_t0[2];               // initial state 
    real theta[4];              // ODE parameters 
    real C_hat[N_t,2];          // predicted pool content 
 
    real eCO2_hat[N_t]; 
 
    C_t0[1] <- gamma * totalC_t0; 
    C_t0[2] <- (1 - gamma) * totalC_t0; 
 
    theta[1] <- k1; 
    theta[2] <- k2; 
    theta[3] <- alpha21; 
    theta[4] <- alpha12; 
 
    C_hat <- integrate_ode(two_pool_feedback,  
                           C_t0, t0, ts, theta, x_r, x_i); 
 
    for (t in 1:N_t) 
      eCO2_hat[t] <- totalC_t0 - sum(C_hat[t]); 
    return eCO2_hat; 
  } 
 
} 
data { 
  real<lower=0> totalC_t0;     // initial total carbon 
 
  real t0;                     // initial time 
  int<lower=0> N_t;            // number of measurement times 
  real<lower=t0> ts[N_t];      // measurement times 
 
 
  real<lower=0> eCO2mean[N_t]; // measured cumulative evolved carbon 
} 
transformed data { 
  real x_r[0];                 // no real data for ODE system 
  int x_i[0];                  // no integer data for ODE system 
} 
parameters { 
  real<lower=0> k1;            // pool 1 decomposition rate 
  real<lower=0> k2;            // pool 2 decomposition rate 
 
  real<lower=0> alpha21;       // transfer coeff from pool 2 to 1 
  real<lower=0> alpha12;       // transfer coeff from pool 1 to 2 
 
  real<lower=0,upper=1> gamma; // partitioning coefficient 
 
  real<lower=0> sigma;         // observation std dev 
} 
transformed parameters { 
  real eCO2_hat[N_t]; 
  eCO2_hat <- evolved_CO2(N_t, t0, ts, gamma, totalC_t0, 
                          k1, k2, alpha21, alpha12, x_r, x_i); 
} 
model { 
  // priors 
  gamma ~ beta(10,1);         // identifies pools 
  k1 ~ normal(0,1);           // weakly informative 
  k2 ~ normal(0,1); 
  alpha21 ~ normal(0,1); 
  alpha12 ~ normal(0,1); 
  sigma ~ cauchy(0,1); 
 
  // likelihood 
  for (t in 1:N_t) 
    eCO2mean[t] ~ normal(eCO2_hat[t], sigma);   // normal error 
}

The Stan program begins by defining two convenience functions in the functions block.

two_pool_feedback() computes the solutions to the system of differential equations given the model parameters, initial conditions, and requested solution times in an array. The second.
evolved_CO2() computes the cumulative evolved CO₂ given the compartment concentrations computed by the first function.

The data block declares the data that is read in, including the initial total carbon, initial time, measurement times, and measured total evolved carbon. The parameter block declares parameters with constraints as defined above. The transformed parameter block is used to compute the evolved CO₂ estimate given the system equations and parameter values. Finally, the model block just repeats the sampling distributions for all of the parameters.

Fitting the Model with RStan

library(rstan);
fit <- stan("soil_incubation.stan",
            data=c("totalC_t0", "t0", "N_t", "ts", "eCO2mean"),
            control=list(adapt_delta=0.90,
                         stepsize=0.005),
            chains=2, iter=200, seed=1234);

## 
## TRANSLATING MODEL 'soil_incubation' FROM Stan CODE TO C++ CODE NOW.
## COMPILING THE C++ CODE FOR MODEL 'soil_incubation' NOW.
## 
## SAMPLING FOR MODEL 'soil_incubation' NOW (CHAIN 1).
## 
## Iteration:   1 / 200 [  0%]  (Warmup)
## Iteration:  20 / 200 [ 10%]  (Warmup)
## Iteration:  40 / 200 [ 20%]  (Warmup)
## Iteration:  60 / 200 [ 30%]  (Warmup)
## Iteration:  80 / 200 [ 40%]  (Warmup)
## Iteration: 100 / 200 [ 50%]  (Warmup)
## Iteration: 101 / 200 [ 50%]  (Sampling)
## Iteration: 120 / 200 [ 60%]  (Sampling)
## Iteration: 140 / 200 [ 70%]  (Sampling)
## Iteration: 160 / 200 [ 80%]  (Sampling)
## Iteration: 180 / 200 [ 90%]  (Sampling)
## Iteration: 200 / 200 [100%]  (Sampling)
## #  Elapsed Time: 10.4934 seconds (Warm-up)
## #                5.7446 seconds (Sampling)
## #                16.238 seconds (Total)
## 
## 
## SAMPLING FOR MODEL 'soil_incubation' NOW (CHAIN 2).
## 
## Iteration:   1 / 200 [  0%]  (Warmup)
## Iteration:  20 / 200 [ 10%]  (Warmup)
## Iteration:  40 / 200 [ 20%]  (Warmup)
## Iteration:  60 / 200 [ 30%]  (Warmup)
## Iteration:  80 / 200 [ 40%]  (Warmup)
## Iteration: 100 / 200 [ 50%]  (Warmup)
## Iteration: 101 / 200 [ 50%]  (Sampling)
## Iteration: 120 / 200 [ 60%]  (Sampling)
## Iteration: 140 / 200 [ 70%]  (Sampling)
## Iteration: 160 / 200 [ 80%]  (Sampling)
## Iteration: 180 / 200 [ 90%]  (Sampling)
## Iteration: 200 / 200 [100%]  (Sampling)
## #  Elapsed Time: 8.25357 seconds (Warm-up)
## #                14.8398 seconds (Sampling)
## #                23.0934 seconds (Total)

Fit Summary

The following R code will print the summary of a model fit; explanations below.

options(width="100");
print(fit,digits=2);

## Inference for Stan model: soil_incubation.
## 2 chains, each with iter=200; warmup=100; thin=1; 
## post-warmup draws per chain=100, total post-warmup draws=200.
## 
##              mean se_mean   sd  2.5%  25%  50%  75% 97.5% n_eff Rhat
## k1           0.27    0.11 0.35  0.07 0.09 0.12 0.21  1.42    10 1.18
## k2           0.49    0.21 0.42  0.10 0.19 0.34 0.61  1.73     4 1.14
## alpha21      0.64    0.28 0.42  0.02 0.21 0.69 0.99  1.45     2 1.60
## alpha12      0.79    0.52 0.68  0.03 0.26 0.54 1.43  2.28     2 1.86
## gamma        0.90    0.04 0.09  0.64 0.86 0.93 0.98  1.00     5 1.26
## sigma        0.36    0.00 0.06  0.27 0.32 0.34 0.39  0.49   138 1.00
## eCO2_hat[1]  0.28    0.02 0.15 -0.04 0.18 0.30 0.40  0.50    55 1.01
## eCO2_hat[2]  0.61    0.02 0.17  0.25 0.49 0.62 0.75  0.89    98 1.00
## eCO2_hat[3]  1.81    0.01 0.16  1.48 1.70 1.82 1.92  2.09   200 1.00
## eCO2_hat[4]  2.22    0.01 0.15  1.92 2.11 2.22 2.31  2.46   200 1.01
## eCO2_hat[5]  2.59    0.01 0.14  2.32 2.50 2.60 2.69  2.83   200 1.01
## eCO2_hat[6]  2.93    0.01 0.13  2.68 2.85 2.94 3.01  3.16   145 1.01
## eCO2_hat[7]  3.26    0.01 0.12  3.03 3.18 3.27 3.35  3.47   142 1.01
## eCO2_hat[8]  4.13    0.01 0.10  3.93 4.06 4.13 4.21  4.32   115 1.00
## eCO2_hat[9]  4.39    0.01 0.10  4.19 4.33 4.39 4.45  4.58   116 0.99
## eCO2_hat[10] 4.60    0.01 0.10  4.42 4.55 4.60 4.66  4.78   115 0.99
## eCO2_hat[11] 4.82    0.01 0.10  4.63 4.76 4.82 4.88  5.00   112 1.00
## eCO2_hat[12] 5.02    0.01 0.10  4.83 4.96 5.02 5.08  5.20   108 1.00
## eCO2_hat[13] 5.56    0.01 0.10  5.35 5.50 5.56 5.63  5.75    86 1.01
## eCO2_hat[14] 5.71    0.01 0.10  5.50 5.65 5.71 5.77  5.91    80 1.02
## eCO2_hat[15] 5.85    0.01 0.10  5.63 5.79 5.85 5.91  6.06    76 1.02
## eCO2_hat[16] 5.98    0.01 0.10  5.77 5.92 5.98 6.04  6.19    71 1.03
## eCO2_hat[17] 6.10    0.01 0.10  5.88 6.04 6.10 6.16  6.30    68 1.03
## eCO2_hat[18] 6.42    0.01 0.10  6.22 6.37 6.42 6.48  6.63    52 1.04
## eCO2_hat[19] 6.52    0.01 0.10  6.32 6.47 6.52 6.58  6.72    50 1.04
## eCO2_hat[20] 6.59    0.01 0.10  6.39 6.54 6.59 6.65  6.79    49 1.05
## eCO2_hat[21] 6.67    0.01 0.10  6.48 6.62 6.67 6.73  6.86    47 1.05
## eCO2_hat[22] 6.74    0.01 0.09  6.56 6.70 6.74 6.80  6.93    46 1.05
## eCO2_hat[23] 7.24    0.02 0.07  7.11 7.20 7.24 7.29  7.38    19 1.06
## eCO2_hat[24] 7.28    0.02 0.07  7.15 7.24 7.27 7.32  7.40    19 1.07
## eCO2_hat[25] 7.31    0.01 0.06  7.18 7.27 7.30 7.34  7.43    19 1.07
## lp__         3.02    0.32 1.86 -1.11 2.04 3.22 4.25  6.10    33 1.09
## 
## Samples were drawn using NUTS(diag_e) at Thu Nov 27 15:06:02 2014.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

There is one row per parameter, with the following columns, with statistics computed over all draws making up the sample.

In addition to the parameters, the row lp__ indicates the log density evaluated by the model; note that this quantity is equal to the log posterior up to an additive constant.

Column	Description
`mean`	mean
`sd`	standard deviation
`N%`	percentiles (N/100 quantile)
`50%`	median
`n_eff`	number of effective samples
`R_hat`	potential scale reduction statistic (\(\hat{R}\))
`se_mean`	standard error of posterior mean estimate (`sd / sqrt(n_eff)`)

Inspecting the output, the Stan implementation provided comparable posterior estimates to those estimated with SoilR by Sierra and Müller (2014). This is a nice sanity check on both implementations.

Traceplots

traceplot(fit,c("k1","k2","alpha21","alpha12","gamma","sigma"));

The warmup draws are highlighted with grey background; rather than sampling, Stan is adapting the sampling parameters (discretization interval and mass matrix). The traceplots during the sampling stage indicate Stan is mixing well.

If Stan does not mix well, try lowering the initial step size, increasing the target acceptance rate, and increasing the number of warmup iterations.

Posterior Histograms and Scatterplots (pairs)

The pairs() function in RStan displays a grid of plots. The diagonal contains a histogram of the posterior samples for each parameter; the above and below diagonals provide scatterplots for pairs of parameters. The above and below plots are based on draws from the two different Markov chains.

pairs(fit, pars=c("k1","k2","alpha21","alpha12","gamma","sigma"));

## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009

Model Fit

The following figure, based on a figure in Sierra and Müller (2014), overlays the earlier plot of data and measurement error (blue) with the posterior median prediction for evolved CO₂ (red) with 95% intervals (yellow) calculated from the median of the residual noise (\(\sigma\)) in the posterior.

library(ggplot2);
fit_ss <- extract(fit);
sigma_hat <- median(fit_ss$sigma);
eCO2_hat <- rep(NA,N_t);
for (t in 1:N_t) {
  eCO2_hat[t] <- median(fit_ss$eCO2_hat[,t]);
}
df_post <- data.frame(list(ts = ts, 
                           eCO2meas = eCO2mean,
                           eCO2_hat = eCO2_hat));

library(ggplot2);
      
interval_95pct <- aes(ymin = eCO2mean + 1.96 * eCO2sd, 
                      ymax = eCO2mean - 1.96 * eCO2sd);

ggplot(df_post, aes(x = ts)) +
  geom_ribbon(aes(ymin = eCO2_hat - 1.96 * sigma_hat,
                  ymax = eCO2_hat + 1.96 * sigma_hat),
                  fill="lightyellow") +
  geom_line(aes(y=eCO2_hat),colour="darkred") +
  geom_point(aes(y=eCO2meas),colour="darkblue") +
  geom_errorbar(interval_95pct, width=0, colour="blue") +
  labs(x="time (days)", 
       y="evolved CO2 (mgC g-1 soil)") +
  ggtitle("Soil Incubation: 2 pools, feedback\nposterior median (red), predictive 95% (yellow)");

Measurement Error Model

It is straightforward to add a measurement error component to the model.

file_path <- "soil_incubation_measurement_err.stan";
lines <- readLines(file_path, encoding="ASCII");
for (n in 1:length(lines)) cat(lines[n],'\n');

functions { 
 
  real[] two_pool_feedback(real t, real[] C, real[] theta, 
                           real[] x_r, int[] x_i) { 
    real k1; 
    real k2; 
    real alpha21; 
    real alpha12; 
 
    real dC_dt[2]; 
 
    k1 <- theta[1]; 
    k2 <- theta[2]; 
    alpha21 <- theta[3]; 
    alpha12 <- theta[4]; 
     
    dC_dt[1] <- -k1 * C[1] + alpha12 * k2 * C[2]; 
    dC_dt[2] <- alpha21 * k1 * C[1] - k2 * C[2]; 
 
    return dC_dt; 
  } 
 
  real[] evolved_CO2(int N_t, real t0, real[] ts, 
                     real gamma, real totalC_t0, 
                     real k1, real k2,  
                     real alpha21, real alpha12, 
                     real[] x_r, int[] x_i) { 
 
    real C_t0[2];               // initial state 
    real theta[4];              // ODE parameters 
    real C_hat[N_t,2];          // predicted pool content 
 
    real eCO2_hat[N_t]; 
 
    C_t0[1] <- gamma * totalC_t0; 
    C_t0[2] <- (1 - gamma) * totalC_t0; 
 
    theta[1] <- k1; 
    theta[2] <- k2; 
    theta[3] <- alpha21; 
    theta[4] <- alpha12; 
 
    C_hat <- integrate_ode(two_pool_feedback,  
                           C_t0, t0, ts, theta, x_r, x_i); 
 
    for (t in 1:N_t) 
      eCO2_hat[t] <- totalC_t0 - sum(C_hat[t]); 
    return eCO2_hat; 
  } 
 
} 
data { 
  real<lower=0> totalC_t0;     // initial total carbon 
 
  real t0;                     // initial time 
  int<lower=0> N_t;            // number of measurement times 
  real<lower=t0> ts[N_t];      // measurement times 
 
  vector<lower=0>[N_t] eCO2mean; // measured cumulative evolved carbon 
  real<lower=0> eCO2sd[N_t];   // measured cumulative evolved carbon 
} 
transformed data { 
  real x_r[0];                 // no real data for ODE system 
  int x_i[0];                  // no integer data for ODE system 
} 
parameters { 
  real<lower=0> k1;            // pool 1 decomposition rate 
  real<lower=0> k2;            // pool 2 decomposition rate 
 
  real<lower=0> alpha21;       // transfer coeff from pool 2 to 1 
  real<lower=0> alpha12;       // transfer coeff from pool 1 to 2 
 
  real<lower=0,upper=1> gamma; // partitioning coefficient 
 
  real<lower=0> sigma;         // observation std dev 
 
  vector<lower=0>[N_t] eCO2;   // evolved CO2 
} 
transformed parameters { 
  real eCO2_hat[N_t]; 
  eCO2_hat <- evolved_CO2(N_t, t0, ts, gamma, totalC_t0, 
                          k1, k2, alpha21, alpha12, x_r, x_i); 
} 
model { 
  gamma ~ beta(10,1);         // identifies pools 
  k1 ~ normal(0,1);           // weakly informative 
  k2 ~ normal(0,1); 
  alpha21 ~ normal(0,1); 
  alpha12 ~ normal(0,1); 
  sigma ~ normal(0,1); 
 
  // likelihood is INTEGRAL_(0,inf) p(eCO2,eCO2mean|...) d.eCO2  
 
  eCO2 ~ normal(eCO2_hat, sigma); 
 
  eCO2mean ~ normal(eCO2, eCO2sd);  // measurement error 
}

This can be fit as before; note the additional data variable, eCO2sd.

fit_me <- stan("soil_incubation_measurement_err.stan",
              data=c("totalC_t0", "t0", "N_t", "ts", "eCO2mean", "eCO2sd"),
              control=list(adapt_delta=0.90,
                           stepsize=0.005),
              chains=2, iter=100, seed=1234);

## 
## TRANSLATING MODEL 'soil_incubation_measurement_err' FROM Stan CODE TO C++ CODE NOW.
## COMPILING THE C++ CODE FOR MODEL 'soil_incubation_measurement_err' NOW.
## 
## SAMPLING FOR MODEL 'soil_incubation_measurement_err' NOW (CHAIN 1).
## 
## Iteration:  1 / 100 [  1%]  (Warmup)
## Iteration: 10 / 100 [ 10%]  (Warmup)
## Iteration: 20 / 100 [ 20%]  (Warmup)
## Iteration: 30 / 100 [ 30%]  (Warmup)
## Iteration: 40 / 100 [ 40%]  (Warmup)
## Iteration: 50 / 100 [ 50%]  (Warmup)
## Iteration: 51 / 100 [ 51%]  (Sampling)
## Iteration: 60 / 100 [ 60%]  (Sampling)
## Iteration: 70 / 100 [ 70%]  (Sampling)
## Iteration: 80 / 100 [ 80%]  (Sampling)
## Iteration: 90 / 100 [ 90%]  (Sampling)
## Iteration: 100 / 100 [100%]  (Sampling)
## #  Elapsed Time: 1.52226 seconds (Warm-up)
## #                1.10213 seconds (Sampling)
## #                2.62439 seconds (Total)
## 
## 
## SAMPLING FOR MODEL 'soil_incubation_measurement_err' NOW (CHAIN 2).
## 
## Iteration:  1 / 100 [  1%]  (Warmup)
## Iteration: 10 / 100 [ 10%]  (Warmup)
## Iteration: 20 / 100 [ 20%]  (Warmup)
## Iteration: 30 / 100 [ 30%]  (Warmup)
## Iteration: 40 / 100 [ 40%]  (Warmup)
## Iteration: 50 / 100 [ 50%]  (Warmup)
## Iteration: 51 / 100 [ 51%]  (Sampling)
## Iteration: 60 / 100 [ 60%]  (Sampling)
## Iteration: 70 / 100 [ 70%]  (Sampling)
## Iteration: 80 / 100 [ 80%]  (Sampling)
## Iteration: 90 / 100 [ 90%]  (Sampling)
## Iteration: 100 / 100 [100%]  (Sampling)
## #  Elapsed Time: 18.2091 seconds (Warm-up)
## #                3.06753 seconds (Sampling)
## #                21.2766 seconds (Total)

Here is the summary of the posterior samples from the measurement error model.

options(width="100");
print(fit_me,digits=2);

## Inference for Stan model: soil_incubation_measurement_err.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##               mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## k1            0.32    0.02 0.35  0.08  0.12  0.19  0.36  1.36   404 1.01
## k2            0.47    0.02 0.41  0.09  0.18  0.33  0.63  1.59   564 1.00
## alpha21       1.02    0.02 0.32  0.30  0.88  1.02  1.14  1.77   239 1.01
## alpha12       0.40    0.04 0.32  0.02  0.19  0.36  0.53  1.36    78 1.06
## gamma         0.92    0.00 0.08  0.71  0.89  0.94  0.98  1.00   636 1.01
## sigma         0.29    0.00 0.07  0.18  0.24  0.28  0.33  0.44   618 1.00
## eCO2[1]       0.03    0.00 0.02  0.00  0.02  0.03  0.05  0.08  1119 1.00
## eCO2[2]       0.71    0.00 0.06  0.60  0.67  0.71  0.75  0.83  1859 1.00
## eCO2[3]       1.90    0.01 0.21  1.50  1.76  1.90  2.05  2.33  1291 1.00
## eCO2[4]       1.89    0.00 0.12  1.65  1.81  1.90  1.98  2.13  1197 1.00
## eCO2[5]       2.77    0.00 0.08  2.62  2.72  2.77  2.82  2.93  1530 1.00
## eCO2[6]       2.68    0.00 0.06  2.55  2.64  2.68  2.72  2.80  2000 1.00
## eCO2[7]       3.25    0.00 0.11  3.04  3.17  3.25  3.32  3.46  1296 1.00
## eCO2[8]       3.90    0.00 0.08  3.74  3.85  3.90  3.96  4.06   986 1.00
## eCO2[9]       4.30    0.01 0.22  3.86  4.15  4.30  4.44  4.73  1372 1.00
## eCO2[10]      4.82    0.00 0.06  4.70  4.78  4.82  4.86  4.93  1389 1.00
## eCO2[11]      4.72    0.01 0.22  4.28  4.58  4.73  4.88  5.14  1445 1.00
## eCO2[12]      5.09    0.01 0.20  4.71  4.96  5.09  5.22  5.50  1484 1.00
## eCO2[13]      5.81    0.00 0.18  5.45  5.69  5.81  5.93  6.17  1661 1.00
## eCO2[14]      5.77    0.01 0.27  5.24  5.59  5.76  5.94  6.31  1211 1.00
## eCO2[15]      5.92    0.01 0.25  5.43  5.76  5.91  6.08  6.41  1161 1.00
## eCO2[16]      5.81    0.00 0.02  5.77  5.80  5.81  5.83  5.85  2000 1.00
## eCO2[17]      5.94    0.01 0.21  5.52  5.81  5.95  6.08  6.33  1226 1.01
## eCO2[18]      6.31    0.01 0.24  5.82  6.15  6.31  6.46  6.76  1342 1.00
## eCO2[19]      6.63    0.00 0.20  6.25  6.49  6.63  6.77  7.01  1859 1.00
## eCO2[20]      6.61    0.01 0.23  6.20  6.45  6.60  6.75  7.06  1256 1.00
## eCO2[21]      6.58    0.01 0.26  6.05  6.41  6.58  6.76  7.08  1503 1.00
## eCO2[22]      6.80    0.00 0.13  6.54  6.71  6.80  6.89  7.06  1529 1.00
## eCO2[23]      7.85    0.01 0.20  7.47  7.72  7.85  7.99  8.24  1109 1.00
## eCO2[24]      7.61    0.00 0.16  7.31  7.50  7.60  7.71  7.92  1603 1.00
## eCO2[25]      7.63    0.00 0.05  7.52  7.59  7.63  7.66  7.73  2000 1.00
## eCO2_hat[1]   0.22    0.01 0.16 -0.12  0.13  0.24  0.33  0.48   798 1.00
## eCO2_hat[2]   0.54    0.01 0.17  0.21  0.43  0.54  0.65  0.85  1011 1.00
## eCO2_hat[3]   1.75    0.00 0.15  1.48  1.66  1.75  1.85  2.05  1193 1.00
## eCO2_hat[4]   2.17    0.00 0.14  1.92  2.08  2.17  2.26  2.45  1178 1.00
## eCO2_hat[5]   2.56    0.00 0.13  2.32  2.48  2.56  2.64  2.82  1196 1.00
## eCO2_hat[6]   2.91    0.00 0.12  2.69  2.83  2.91  2.98  3.14  1248 1.00
## eCO2_hat[7]   3.25    0.00 0.11  3.04  3.18  3.25  3.32  3.47  1341 1.00
## eCO2_hat[8]   4.13    0.00 0.10  3.95  4.07  4.13  4.20  4.33  1588 1.00
## eCO2_hat[9]   4.40    0.00 0.10  4.22  4.33  4.40  4.46  4.59  1595 1.00
## eCO2_hat[10]  4.62    0.00 0.10  4.44  4.55  4.62  4.68  4.81  1577 1.00
## eCO2_hat[11]  4.84    0.00 0.10  4.66  4.77  4.83  4.90  5.03  1539 1.00
## eCO2_hat[12]  5.04    0.00 0.10  4.86  4.98  5.04  5.11  5.23  1491 1.00
## eCO2_hat[13]  5.59    0.00 0.10  5.39  5.52  5.59  5.66  5.78  1338 1.00
## eCO2_hat[14]  5.74    0.00 0.10  5.54  5.67  5.73  5.81  5.93  1286 1.01
## eCO2_hat[15]  5.88    0.00 0.10  5.68  5.81  5.88  5.95  6.07  1180 1.01
## eCO2_hat[16]  6.01    0.00 0.10  5.81  5.94  6.01  6.08  6.21  1132 1.01
## eCO2_hat[17]  6.13    0.00 0.10  5.93  6.06  6.13  6.20  6.33  1095 1.01
## eCO2_hat[18]  6.46    0.00 0.10  6.26  6.39  6.46  6.53  6.65   909 1.01
## eCO2_hat[19]  6.55    0.00 0.10  6.35  6.48  6.55  6.62  6.74   890 1.01
## eCO2_hat[20]  6.62    0.00 0.10  6.43  6.56  6.62  6.69  6.81   876 1.01
## eCO2_hat[21]  6.70    0.00 0.10  6.51  6.64  6.70  6.77  6.89   862 1.01
## eCO2_hat[22]  6.78    0.00 0.09  6.59  6.71  6.78  6.84  6.95   849 1.01
## eCO2_hat[23]  7.27    0.00 0.07  7.13  7.22  7.27  7.31  7.40   781 1.01
## eCO2_hat[24]  7.30    0.00 0.07  7.16  7.25  7.30  7.34  7.42   778 1.01
## eCO2_hat[25]  7.33    0.00 0.06  7.20  7.29  7.33  7.37  7.45   774 1.01
## lp__         29.72    0.23 4.68 19.71 26.79 30.03 33.07 37.77   433 1.00
## 
## Samples were drawn using NUTS(diag_e) at Thu Nov 27 16:04:56 2014.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

And the traceplots.

traceplot(fit_me, c("k1","k2","alpha21","alpha12","gamma","sigma"));

Next Steps

There are several next steps which may depend on each other.

Two-Pool Model

Move noise to log scale to respect positivity constraint. This will make it multiplicative instead of additive; verify this with exploratory data analysis comparing residual to value.
Evaluate some reparameterizations
- independent \(a_{2,1} = \alpha_{2,1} \, \kappa_2\) and \(a_{1,2} = \alpha_{1,2} \, \kappa_1\), and
- total decay \(\kappa_1\) for pool 1 and then split that with a parameter in \((0,1)\) between evolution and pool 2
Implement a measurement error model for each aggregated observation. This will presuppose an underlying true carbon concentration which is measured with error on several samples to produce the reported mean and standard deviations.
Work directly with the individual measurements in a hierarchical model.

Michaelis-Menten, Forward and Backward

These are models that involve an enzyme pool and biomass data and data from two carbon pools (or maybe just the aggregate).

Get the data into usable format.
Write out math for forward and reverse models.
Write out statistical models.
Implement forward Michaelis-Menten kinetics.
Implement reverse Michaelis-Menten kinetics.

Mixture Models

Evaluate mixture model at each observation level, which will require running the integrator to solve the diff eqs just between observation times. Each new integration will have to take up where the last solver left off.
Evaluate change point model going from forward to reverse kinetics. This will require evaluating the integrator over intial and final intervals of time points.

Replication and Tutorial

Implement the second example of radiocarbon from Sierra and Müller (2014).

References

Papers

Schädel, C., Y. Luo, R. David Evans, S. Fei, and S. Schaeffer. 2013. Separating soil CO₂ efflux into C-pool-specific decay rates via inverse analysis of soil incubation data. Oecologia 171:721–732.
Sierra, Carlos A. and Markus Müller. 2014. Parameter estimation of compartment models in SoilR using classical and Bayesian methods. CRAN Vignette. [pdf]

R Packages

Statistics: RStan
Plots: ggplot2
Data: SoilR
Presentation: knitr

Colophon

All output, including plots, was generated directly from the code with the knitr R package using style Cerulean from within the RStudio integrated development environment.

This document is distributed under the CC-BY-3 license. The code is distrbuted under the new BSD license.