## 'data.frame':    481 obs. of  25 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : int  13 40 76 104 11 30 45 75 9 27 ...
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : int  13 27 36 28 11 19 15 30 9 18 ...
##  $ rock_perc           : num  0 0 5 40 0 15 40 55 0 0 ...
##  $ BD_g_cm3            : num  0.07 0.16 0.51 0.88 0.38 0.49 0.7 0.57 0.72 0.95 ...
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num  0.543 0.543 0.543 0.543 0.579 ...
##  $ Geomorphon_Class    : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Temp                : num  10.2 10.2 10.2 10.2 10.2 ...
##  $ Precip              : num  3070 3070 3070 3070 3054 ...
##  $ NDVI                : num  0.84 0.84 0.84 0.84 0.862 ...
##  $ MNDWI               : num  -0.477 -0.477 -0.477 -0.477 -0.446 ...
##  $ EVI                 : num  0.455 0.455 0.455 0.455 0.534 ...
##  $ SCI                 : num  0.649 0.649 0.649 0.649 0.648 ...
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num  0.873 0.873 0.873 0.873 0.896 ...
##  $ x                   : num  411350 411350 411350 411350 409931 ...
##  $ y                   : num  5295086 5295086 5295086 5295086 5294415 ...
## 'data.frame':    276 obs. of  25 variables:
##  $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID        : chr  "CC_S26_R3" "CC_S26_R3" "CC_S26_R3" "CC_S27_R3" ...
##  $ upper_depth      : int  0 30 60 0 30 60 0 30 60 0 ...
##  $ lower_depth      : int  30 60 100 30 60 100 30 60 100 30 ...
##  $ SOC_stock_spline : num  0.1636 0.0983 0.0531 0.6126 0.6936 ...
##  $ site             : chr  "COL" "COL" "COL" "COL" ...
##  $ geomorphons      : chr  "f" "f" "f" "i" ...
##  $ HLI              : num  0.849 0.849 0.849 0.502 0.502 ...
##  $ Precip           : num  629 629 629 664 664 ...
##  $ Temp             : num  6.66 6.66 6.66 6.03 6.03 ...
##  $ WIP              : num  0.209 0.209 0.209 0.127 0.127 ...
##  $ tree_canopy_cover: num  31 31 31 56 56 ...
##  $ NDVI             : num  0.537 0.537 0.537 0.739 0.739 ...
##  $ MNDWI            : num  -0.517 -0.517 -0.517 -0.554 -0.554 ...
##  $ EVI              : num  0.292 0.292 0.292 0.387 0.387 ...
##  $ SCI              : num  0.511 0.511 0.511 0.616 0.616 ...
##  $ SAVI             : num  0.29 0.29 0.29 0.383 0.383 ...
##  $ EMBI             : num  0.188 0.188 0.188 0.162 0.162 ...
##  $ DSI              : num  0.92 0.92 0.92 0.625 0.625 ...
##  $ DSWI1            : num  1.16 1.16 1.16 1.61 1.61 ...
##  $ NDYI             : num  0.226 0.226 0.226 0.36 0.36 ...
##  $ LSWI             : num  0.0556 0.0556 0.0556 0.2314 0.2314 ...
##  $ ANDWI            : num  -0.539 -0.539 -0.539 -0.643 -0.643 ...
##  $ LITHOLOGY        : chr  "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" ...
##  $ GEOLOGIC_AGE     : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...

With many potential predictors we can eliminate some by examining correlation plots

There is weak correlation between SOC_stock_spline and the selected covariates. Additionally there is some collinearity between predictors. The spectral predictors such as EVI (Enhanced Vegetation Index) and SCI (Soil Condition Index) are well correlated with each other and also with NDVI (Normalized Difference Vegetation Index). We can remove these based on the correlation coefficient > 0.7.

Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.

## 'data.frame':    276 obs. of  25 variables:
##  $ X                : num [1:276, 1] -1.72 -1.71 -1.7 -1.69 -1.67 ...
##   ..- attr(*, "scaled:center")= num 138
##   ..- attr(*, "scaled:scale")= num 79.8
##  $ sample_ID        : chr  "CC_S26_R3" "CC_S26_R3" "CC_S26_R3" "CC_S27_R3" ...
##  $ upper_depth      : num [1:276, 1] -1.1849 0.0492 1.2833 -1.1849 0.0492 ...
##   ..- attr(*, "scaled:center")= num 28.8
##   ..- attr(*, "scaled:scale")= num 24.3
##  $ lower_depth      : int  30 60 100 30 60 100 30 60 100 30 ...
##  $ SOC_stock_spline : num  0.1636 0.0983 0.0531 0.6126 0.6936 ...
##  $ site             : chr  "COL" "COL" "COL" "COL" ...
##  $ geomorphons      : chr  "f" "f" "f" "i" ...
##  $ HLI              : num [1:276, 1] 1.264 1.264 1.264 -0.907 -0.907 ...
##   ..- attr(*, "scaled:center")= num 0.647
##   ..- attr(*, "scaled:scale")= num 0.16
##  $ Precip           : num [1:276, 1] -1.21 -1.21 -1.21 -1.18 -1.18 ...
##   ..- attr(*, "scaled:center")= num 1870
##   ..- attr(*, "scaled:scale")= num 1022
##  $ Temp             : num [1:276, 1] -1.09 -1.09 -1.09 -1.48 -1.48 ...
##   ..- attr(*, "scaled:center")= num 8.42
##   ..- attr(*, "scaled:scale")= num 1.62
##  $ WIP              : num [1:276, 1] -0.801 -0.801 -0.801 -1.08 -1.08 ...
##   ..- attr(*, "scaled:center")= num 0.443
##   ..- attr(*, "scaled:scale")= num 0.292
##  $ tree_canopy_cover: num [1:276, 1] -2.25 -2.25 -2.25 -0.264 -0.264 ...
##   ..- attr(*, "scaled:center")= num 59.3
##   ..- attr(*, "scaled:scale")= num 12.6
##  $ NDVI             : num [1:276, 1] -3.5 -3.5 -3.5 -1.1 -1.1 ...
##   ..- attr(*, "scaled:center")= num 0.832
##   ..- attr(*, "scaled:scale")= num 0.0843
##  $ MNDWI            : num [1:276, 1] -0.324 -0.324 -0.324 -1.105 -1.105 ...
##   ..- attr(*, "scaled:center")= num -0.502
##   ..- attr(*, "scaled:scale")= num 0.0475
##  $ EVI              : num [1:276, 1] -1.932 -1.932 -1.932 -0.953 -0.953 ...
##   ..- attr(*, "scaled:center")= num 0.48
##   ..- attr(*, "scaled:scale")= num 0.0972
##  $ SCI              : num [1:276, 1] -2.81 -2.81 -2.81 -0.8 -0.8 ...
##   ..- attr(*, "scaled:center")= num 0.658
##   ..- attr(*, "scaled:scale")= num 0.0526
##  $ SAVI             : num [1:276, 1] -2.17 -2.17 -2.17 -1.02 -1.02 ...
##   ..- attr(*, "scaled:center")= num 0.466
##   ..- attr(*, "scaled:scale")= num 0.0811
##  $ EMBI             : num [1:276, 1] 1.52 1.52 1.52 1.3 1.3 ...
##   ..- attr(*, "scaled:center")= num 0.00762
##   ..- attr(*, "scaled:scale")= num 0.119
##  $ DSI              : num [1:276, 1] 3.03 3.03 3.03 1.2 1.2 ...
##   ..- attr(*, "scaled:center")= num 0.432
##   ..- attr(*, "scaled:scale")= num 0.161
##  $ DSWI1            : num [1:276, 1] -1.99 -1.99 -1.99 -1.36 -1.36 ...
##   ..- attr(*, "scaled:center")= num 2.58
##   ..- attr(*, "scaled:scale")= num 0.714
##  $ NDYI             : num [1:276, 1] -2.215 -2.215 -2.215 -0.683 -0.683 ...
##   ..- attr(*, "scaled:center")= num 0.419
##   ..- attr(*, "scaled:scale")= num 0.0874
##  $ LSWI             : num [1:276, 1] -2.61 -2.61 -2.61 -1.33 -1.33 ...
##   ..- attr(*, "scaled:center")= num 0.413
##   ..- attr(*, "scaled:scale")= num 0.137
##  $ ANDWI            : num [1:276, 1] 2.92 2.92 2.92 1.08 1.08 ...
##   ..- attr(*, "scaled:center")= num -0.703
##   ..- attr(*, "scaled:scale")= num 0.0564
##  $ LITHOLOGY        : chr  "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" ...
##  $ GEOLOGIC_AGE     : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...

Explicit parameter model building

Now build models using log transformed carbon stock data. We need to specify that sample_ID is a random effect because of the multiple samples at one location. lower_depth will be a random slope to adjust model based on how it is affected by depth. We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. We will model with the log-transformed soil carbon stocks in the linear mixed model

# Full model with hypothesized parameters
mod1 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp *
    ANDWI * NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# No interactions
mod2 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# No spectral
mod3 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No geology
mod4 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp +
    ANDWI + NDYI + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No WIP w/ interaction
mod5 <- lmer(log10(SOC_stock_spline) ~ Precip * Temp + ANDWI +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 1.5562 (tol = 0.002, component 1)
# No WIP w/o interaction
mod6 <- lmer(log10(SOC_stock_spline) ~ Precip + Temp + ANDWI +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No climate
mod7 <- lmer(log10(SOC_stock_spline) ~ WIP + +ANDWI + NDYI +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No climate No spectral
mod8 <- lmer(log10(SOC_stock_spline) ~ WIP + (GEOLOGIC_AGE) +
    lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0035401 (tol = 0.002, component 1)
# Just WIP
mod9 <- lmer(log10(SOC_stock_spline) ~ WIP + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# NULL
mod10 <- lmer(log10(SOC_stock_spline) ~ lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.431569 (tol = 0.002, component 1)

Note models 5, 8, and 10, failed to converge

Pairwise comparisons between the top, largest model and the rest show if models with fewer parameters are significantly different. Considering AIC will help determine the most parsimonious model with the best fit to the data.

Models npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2 15 220.4259 274.7319 -95.21296 190.4259 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 36.52080 26 0.0824988
mod3 17 223.7913 285.3381 -94.89564 189.7913 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 35.88615 24 0.0562988
mod4 14 227.0389 277.7245 -99.51945 199.0389 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 45.13377 27 0.0157401 ***
mod5 15 229.4435 283.7495 -99.72175 199.4435 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 45.53837 26 0.0102674 ***
mod6 14 229.4083 280.0939 -100.70415 201.4083 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 47.50318 27 0.0087196 ***
mod7 13 230.4420 277.5072 -102.22099 204.4420 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 50.53685 28 0.0056307 ***
mod8 11 247.7352 287.5596 -112.86760 225.7352 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 71.83007 30 0.0000275 ***
mod9 6 245.6252 267.3476 -116.81262 233.6252 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 79.72011 35 0.0000243 ***
mod10 5 247.8359 265.9379 -118.91797 237.8359 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 83.93081 36 0.0000107 ***

From the ANOVAs it looks like model 2 & model 3 performed just as well as model 1meaning we should proceed with these. model 2 removed interactions and model 3 removed all the spectral metrics but kept interaction. However, model 2 has 2 less parameters than model 3 which gives it a lower AIC when compared together.

npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
mod2 15 220.4259 274.7319 -95.21296 190.4259 NA NA NA
mod3 17 223.7913 285.3381 -94.89564 189.7913 0.6346545 2 0.7280925

To hone in on differences between variations in model 2 we can remove some of the variables and compare AICs for a more parsimonious fit.

# No ANDWI
mod2.1 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No ANDWI, No NDYI
mod2.2 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0070341 (tol = 0.002, component 1)
# No NDYI
mod2.3 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00238014 (tol = 0.002, component 1)
# No Geology
mod2.4 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + NDYI + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.171981 (tol = 0.002, component 1)
# No NDYI or Geology
mod2.5 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0427664 (tol = 0.002, component 1)
# No ANDWI or Geology
mod2.6 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    NDYI + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
# No spectral or Geology
mod2.7 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)

Note models 2, 3, 4, 5 failed to converge

Models npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2.1 14 218.8356 269.5212 -95.41779 190.8356 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 0.409648 1 0.5221482
mod2.2 13 219.4581 266.5234 -96.72907 193.4581 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 3.032215 2 0.2195649
mod2.3 14 221.3225 272.0082 -96.66127 193.3225 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 2.896615 1 0.0887658
mod2.4 10 221.4128 257.6168 -100.70639 201.4128 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 10.986856 5 0.0516412
mod2.5 9 227.2392 259.8228 -104.61960 209.2392 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 18.813263 6 0.0044909 ***
mod2.6 9 220.0621 252.6457 -101.03105 202.0621 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 11.636166 6 0.0705954
mod2.7 8 226.0611 255.0243 -105.03055 210.0611 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 19.635164 7 0.0064141 ***

Model 2.1 and 2.2 are the lowest AIC and not significantly different from the fit in Model 2. Because Model 2.2 failed to converge, we will move forward with Model 2.1.

Now we can look at the table of all models and compare AICs. We find that Model 2.1 has the lowest AIC followed by Model 2.2 which failed to converge

AIC formula name delta
235.91 WIP * Precip * Temp * ANDWI * NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod1 17.07
220.43 WIP + Precip + Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2 1.59
223.79 WIP * Precip * Temp + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod3 4.95
227.04 WIP * Precip * Temp + ANDWI + NDYI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod4 8.20
229.44 Precip * Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod5 10.60
229.41 Precip + Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod6 10.57
230.44 WIP + +ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod7 11.60
247.74 WIP + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod8 28.90
245.63 WIP + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod9 26.79
218.84 WIP + Precip + Temp + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.1 0.00
219.46 WIP + Precip + Temp + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.2 0.62
221.32 WIP + Precip + Temp + ANDWI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.3 2.48
221.41 WIP + Precip + Temp + ANDWI + NDYI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.4 2.57
227.24 WIP + Precip + Temp + ANDWI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.5 8.40
220.06 WIP + Precip + Temp + NDYI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.6 1.22
226.06 WIP + Precip + Temp + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.7 7.22

The model \(R^2\) for mod2.1 is 0.892

Confidence intervals calculated with bootstrapping show that WIP and Precip are the most significant contributors to SOC stock. the overlapping confidence intervals do not show much separation between WIP and Precip but there is more variation in the Precip confidence interval.

  Model 2.1
Predictors Estimates CI p
(Intercept) -0.24 -0.63 – 0.11 0.192
WIP 0.12 0.05 – 0.19 <0.001
Precip 0.18 0.03 – 0.32 0.020
Temp -0.00 -0.12 – 0.12 0.992
NDYI 0.07 -0.02 – 0.16 0.116
GEOLOGIC AGE
[Miocene-Eocene]
-0.44 -0.90 – 0.09 0.108
GEOLOGIC AGE
[Oligocene-Eocene]
0.09 -0.29 – 0.52 0.654
GEOLOGIC AGE
[Pleistocene]
0.04 -0.33 – 0.46 0.844
GEOLOGIC AGE
[pre-Tertiary]
-0.05 -0.56 – 0.48 0.846
GEOLOGIC AGE [Quaternary] -0.28 -0.70 – 0.20 0.280
lower depth -0.01 -0.01 – -0.01 <0.001
Random Effects
σ2 0.05
τ00 sample_ID 0.04
τ11 sample_ID.lower_depth 0.00
ρ01  
ρ01  
ICC 0.40
N sample_ID 96
Observations 276
Marginal R2 / Conditional R2 0.554 / 0.731

Looking at some partial dependence plots we can see some relationships between the predicted values and WIP and Precip. The pattern in Precip shows the differences between the two study areas. with wetter areas having a higher range of SOC stock

## 
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
## 
##     get_legend
## The following object is masked from 'package:terra':
## 
##     rotate

We can now visualize our candidate model 2.1 in a predicted vs. actual plot


Dredging for model selection

With the explicit hypothesized approach, we can only evaluate so many models. However, there may be interactions between variables that might be more powerful controls and could give insight into relationships between controls.

The dredge function can be used to look through multiple combinations of models from a globally defined model. The number of parameters are limited so we run two models: one with the NDYI parameter and another with the MNDWI parameter included with Temp, Precip, and WIP fully interacting. GEOLOGIC_AGE is also included as an additional, non-interaction parameter and random effects are constant.

gmod1 <- lmer(log10(SOC_stock_spline) ~ WIP * Temp * Precip *
    NDYI + GEOLOGIC_AGE + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F, na.action = "na.fail")
dredge1 <- dredge(gmod1, beta = "sd")
(Intercept) GEOLOGIC_AGE lower_depth NDYI Precip Temp WIP NDYI:Precip NDYI:Temp NDYI:WIP Precip:Temp Precip:WIP Temp:WIP NDYI:Precip:Temp NDYI:Precip:WIP NDYI:Temp:WIP Precip:Temp:WIP NDYI:Precip:Temp:WIP df logLik AICc delta weight
24575 0 NA -0.3859947 0.7855336 -0.1358170 0.3672985 0.1104673 -0.3213815 0.4300134 0.1622869 -0.1453279 -0.2295751 0.0928943 -0.5578764 NA 0.1604055 NA NA 17 -90.27224 216.9166 0.000000 0.2995148
48 0
-0.3836525 0.1317391 0.3088780 NA 0.2106672 NA NA NA NA NA NA NA NA NA NA NA 13 -95.41784 218.2250 1.308419 0.1557036
14335 0 NA -0.3851104 0.7752096 -0.0954976 0.3346213 0.0855746 -0.1921094 0.2650052 0.1003702 -0.1163907 -0.1221449 NA -0.5543885 0.1354660 NA NA NA 16 -92.10787 218.3161 1.399547 0.1487683
57343 0 NA -0.3860028 0.7545347 -0.1191113 0.3462410 0.1768655 -0.3177870 0.4410082 0.1212566 -0.1486504 -0.1461895 0.0151525 -0.5340127 NA 0.2205693 -0.11752 NA 18 -89.84743 218.3563 1.439757 0.1458072
1072 0
-0.3836264 0.1396997 0.2998503 NA 0.2037453 NA NA NA NA -0.0801549 NA NA NA NA NA NA 14 -94.51771 218.6446 1.728044 0.1262346
32767 0 NA -0.3865633 0.7828342 -0.1474136 0.3813838 0.1424312 -0.3788402 0.4947691 0.1940555 -0.1510079 -0.2464318 0.0962508 -0.5516318 -0.1055494 0.2452163 NA NA 18 -90.00966 218.6808 1.764225 0.1239715

By using dredge we find that there are a few candidate models that fit the data well.

The difference between the model AICs is lower than a common threshold of 2 but there is a difference between the top model and others. This model contains multiple interactions including NDYI:Precip, NDYI:Temp, NDYI:WIP Precip:Temp, Precip:WIP, Temp:WIP,NDYI:Precip:Temp

We can visualize some of these interactions with interaction plots against SOC stock. It looks like there is strong interaction and separation with the NDYI parameter and WIP

We can look at the bootstrapped confidence intervals to examine the significance of the predictors between the two models. Interestingly the WIP parameter alone is not significant. And there is high effect sizes for NDYI, NDYI:Temp, and (NDYI × Precip) × Temp

  Dredge Model
Predictors Estimates CI p
(Intercept) -0.16 -0.30 – -0.02 0.014
lower depth -0.01 -0.01 – -0.01 <0.001
NDYI 0.44 0.24 – 0.64 <0.001
Precip -0.07 -0.27 – 0.12 0.428
Temp 0.21 0.02 – 0.39 0.028
WIP 0.06 -0.01 – 0.13 0.074
NDYI × Precip -0.20 -0.33 – -0.06 0.002
NDYI × Temp 0.25 0.08 – 0.42 0.006
NDYI × WIP 0.09 0.01 – 0.18 0.038
Precip × Temp -0.11 -0.26 – 0.03 0.104
Precip × WIP -0.14 -0.27 – 0.00 0.064
Temp × WIP 0.06 -0.07 – 0.16 0.326
(NDYI × Precip) × Temp -0.29 -0.47 – -0.11 0.002
(NDYI × Temp) × WIP 0.08 0.01 – 0.15 0.032
Random Effects
σ2 0.05
τ00 sample_ID 0.02
τ11 sample_ID.lower_depth 0.00
ρ01  
ρ01  
ICC 0.32
N sample_ID 96
Observations 276
Marginal R2 / Conditional R2 0.611 / 0.735

The R\(^{2}\) of the Dredge Model is R\(^{2}\) = 0.897

We now can examine the plot with the Dredge Model

Random Forest Machine Learning

Using Random Forest can see if a more flexible, machine learning model can capture any non-linear relationships or interaction relationships between parameters.

lower_depth is included as a predictor here in the data setup

We then use the tuneRF to choose the appropriate mtry number

rf_model <- randomForest((SOC_stock_spline) ~ ., ntree = 1000,
    mtry = 4, importance = TRUE, data = full)
plot(rf_model)

rf.full <- predict(rf_model, newdata = full)
vip::vip(rf_model)

61.63% of the out of bag (OOB) variation is explained. Looks likePrecip, lower_depth, WIP, and NDYI are the big drivers.

error.variable value
MAE 0.1306526
MAE^2 0.0170701
RMSE 0.1768086
RMSE non-log 1.1934027
Stdev of non-log SOC stock 0.4999442
R^2 on full dataset 0.9230161

Looking at partial dependency plots we can see how Precip and NDYI have higher influence on separating variation in model predictions. WIP less so but seems to play a role on the higher end.

Here, we show the fit between sampled and predicted SOC stocks for the full Random Forest Model