Analysis of soil carbon controls across the Pacific Northwest

## 'data.frame':	481 obs. of  25 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : int  13 40 76 104 11 30 45 75 9 27 ...
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : int  13 27 36 28 11 19 15 30 9 18 ...
##  $ rock_perc           : num  0 0 5 40 0 15 40 55 0 0 ...
##  $ BD_g_cm3            : num  0.07 0.16 0.51 0.88 0.38 0.49 0.7 0.57 0.72 0.95 ...
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num  0.543 0.543 0.543 0.543 0.579 ...
##  $ Geomorphon_Class    : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Temp                : num  10.2 10.2 10.2 10.2 10.2 ...
##  $ Precip              : num  3070 3070 3070 3070 3054 ...
##  $ NDVI                : num  0.84 0.84 0.84 0.84 0.862 ...
##  $ MNDWI               : num  -0.477 -0.477 -0.477 -0.477 -0.446 ...
##  $ EVI                 : num  0.455 0.455 0.455 0.455 0.534 ...
##  $ SCI                 : num  0.649 0.649 0.649 0.649 0.648 ...
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num  0.873 0.873 0.873 0.873 0.896 ...
##  $ x                   : num  411350 411350 411350 411350 409931 ...
##  $ y                   : num  5295086 5295086 5295086 5295086 5294415 ...
## 'data.frame':	276 obs. of  25 variables:
##  $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID        : chr  "CC_S26_R3" "CC_S26_R3" "CC_S26_R3" "CC_S27_R3" ...
##  $ upper_depth      : int  0 30 60 0 30 60 0 30 60 0 ...
##  $ lower_depth      : int  30 60 100 30 60 100 30 60 100 30 ...
##  $ SOC_stock_spline : num  0.1636 0.0983 0.0531 0.6126 0.6936 ...
##  $ site             : chr  "COL" "COL" "COL" "COL" ...
##  $ geomorphons      : chr  "f" "f" "f" "i" ...
##  $ HLI              : num  0.849 0.849 0.849 0.502 0.502 ...
##  $ Precip           : num  629 629 629 664 664 ...
##  $ Temp             : num  6.66 6.66 6.66 6.03 6.03 ...
##  $ WIP              : num  0.209 0.209 0.209 0.127 0.127 ...
##  $ tree_canopy_cover: num  31 31 31 56 56 ...
##  $ NDVI             : num  0.537 0.537 0.537 0.739 0.739 ...
##  $ MNDWI            : num  -0.517 -0.517 -0.517 -0.554 -0.554 ...
##  $ EVI              : num  0.292 0.292 0.292 0.387 0.387 ...
##  $ SCI              : num  0.511 0.511 0.511 0.616 0.616 ...
##  $ SAVI             : num  0.29 0.29 0.29 0.383 0.383 ...
##  $ EMBI             : num  0.188 0.188 0.188 0.162 0.162 ...
##  $ DSI              : num  0.92 0.92 0.92 0.625 0.625 ...
##  $ DSWI1            : num  1.16 1.16 1.16 1.61 1.61 ...
##  $ NDYI             : num  0.226 0.226 0.226 0.36 0.36 ...
##  $ LSWI             : num  0.0556 0.0556 0.0556 0.2314 0.2314 ...
##  $ ANDWI            : num  -0.539 -0.539 -0.539 -0.643 -0.643 ...
##  $ LITHOLOGY        : chr  "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" ...
##  $ GEOLOGIC_AGE     : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...

Examining covariate predictors using correlation plots

plot of chunk unnamed-chunk-96

plot of chunk unnamed-chunk-96

There is weak correlation between SOC_stock_spline and the selected covariates. Additionally there is some collinearity between predictors. The spectral predictors such as EVI (Enhanced Vegetation Index) and SCI (Soil Condition Index) are well correlated with each other and also with NDVI (Normalized Difference Vegetation Index). We can remove these based on the correlation coefficient > 0.7.

Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.

## 'data.frame':	276 obs. of  25 variables:
##  $ X                : num [1:276, 1] -1.72 -1.71 -1.7 -1.69 -1.67 ...
##   ..- attr(*, "scaled:center")= num 138
##   ..- attr(*, "scaled:scale")= num 79.8
##  $ sample_ID        : chr  "CC_S26_R3" "CC_S26_R3" "CC_S26_R3" "CC_S27_R3" ...
##  $ upper_depth      : num [1:276, 1] -1.1849 0.0492 1.2833 -1.1849 0.0492 ...
##   ..- attr(*, "scaled:center")= num 28.8
##   ..- attr(*, "scaled:scale")= num 24.3
##  $ lower_depth      : int  30 60 100 30 60 100 30 60 100 30 ...
##  $ SOC_stock_spline : num  0.1636 0.0983 0.0531 0.6126 0.6936 ...
##  $ site             : chr  "COL" "COL" "COL" "COL" ...
##  $ geomorphons      : chr  "f" "f" "f" "i" ...
##  $ HLI              : num [1:276, 1] 1.264 1.264 1.264 -0.907 -0.907 ...
##   ..- attr(*, "scaled:center")= num 0.647
##   ..- attr(*, "scaled:scale")= num 0.16
##  $ Precip           : num [1:276, 1] -1.21 -1.21 -1.21 -1.18 -1.18 ...
##   ..- attr(*, "scaled:center")= num 1870
##   ..- attr(*, "scaled:scale")= num 1022
##  $ Temp             : num [1:276, 1] -1.09 -1.09 -1.09 -1.48 -1.48 ...
##   ..- attr(*, "scaled:center")= num 8.42
##   ..- attr(*, "scaled:scale")= num 1.62
##  $ WIP              : num [1:276, 1] -0.801 -0.801 -0.801 -1.08 -1.08 ...
##   ..- attr(*, "scaled:center")= num 0.443
##   ..- attr(*, "scaled:scale")= num 0.292
##  $ tree_canopy_cover: num [1:276, 1] -2.25 -2.25 -2.25 -0.264 -0.264 ...
##   ..- attr(*, "scaled:center")= num 59.3
##   ..- attr(*, "scaled:scale")= num 12.6
##  $ NDVI             : num [1:276, 1] -3.5 -3.5 -3.5 -1.1 -1.1 ...
##   ..- attr(*, "scaled:center")= num 0.832
##   ..- attr(*, "scaled:scale")= num 0.0843
##  $ MNDWI            : num [1:276, 1] -0.324 -0.324 -0.324 -1.105 -1.105 ...
##   ..- attr(*, "scaled:center")= num -0.502
##   ..- attr(*, "scaled:scale")= num 0.0475
##  $ EVI              : num [1:276, 1] -1.932 -1.932 -1.932 -0.953 -0.953 ...
##   ..- attr(*, "scaled:center")= num 0.48
##   ..- attr(*, "scaled:scale")= num 0.0972
##  $ SCI              : num [1:276, 1] -2.81 -2.81 -2.81 -0.8 -0.8 ...
##   ..- attr(*, "scaled:center")= num 0.658
##   ..- attr(*, "scaled:scale")= num 0.0526
##  $ SAVI             : num [1:276, 1] -2.17 -2.17 -2.17 -1.02 -1.02 ...
##   ..- attr(*, "scaled:center")= num 0.466
##   ..- attr(*, "scaled:scale")= num 0.0811
##  $ EMBI             : num [1:276, 1] 1.52 1.52 1.52 1.3 1.3 ...
##   ..- attr(*, "scaled:center")= num 0.00762
##   ..- attr(*, "scaled:scale")= num 0.119
##  $ DSI              : num [1:276, 1] 3.03 3.03 3.03 1.2 1.2 ...
##   ..- attr(*, "scaled:center")= num 0.432
##   ..- attr(*, "scaled:scale")= num 0.161
##  $ DSWI1            : num [1:276, 1] -1.99 -1.99 -1.99 -1.36 -1.36 ...
##   ..- attr(*, "scaled:center")= num 2.58
##   ..- attr(*, "scaled:scale")= num 0.714
##  $ NDYI             : num [1:276, 1] -2.215 -2.215 -2.215 -0.683 -0.683 ...
##   ..- attr(*, "scaled:center")= num 0.419
##   ..- attr(*, "scaled:scale")= num 0.0874
##  $ LSWI             : num [1:276, 1] -2.61 -2.61 -2.61 -1.33 -1.33 ...
##   ..- attr(*, "scaled:center")= num 0.413
##   ..- attr(*, "scaled:scale")= num 0.137
##  $ ANDWI            : num [1:276, 1] 2.92 2.92 2.92 1.08 1.08 ...
##   ..- attr(*, "scaled:center")= num -0.703
##   ..- attr(*, "scaled:scale")= num 0.0564
##  $ LITHOLOGY        : chr  "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" ...
##  $ GEOLOGIC_AGE     : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
plot of chunk unnamed-chunk-97plot of chunk unnamed-chunk-97

plot of chunk unnamed-chunk-97

Explicit parameter model building

Now build models using log transformed carbon stock data. We need to specify that sample_ID is a random effect because of the multiple samples at one location. depth can be a random slope to adjust model based on how it is affected by depth. We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. I am starting with the log-transformed linear mixed model first to test the hypotheses

# Full model with hypothesized parameters
mod1 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp *
    ANDWI * NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# No interactions
mod2 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# No spectral
mod3 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No geology
mod4 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp +
    ANDWI + NDYI + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No WIP w/ interaction
mod5 <- lmer(log10(SOC_stock_spline) ~ Precip * Temp + ANDWI +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 1.5562 (tol = 0.002, component
## 1)
# No WIP w/o interaction
mod6 <- lmer(log10(SOC_stock_spline) ~ Precip + Temp + ANDWI +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No climate
mod7 <- lmer(log10(SOC_stock_spline) ~ WIP + +ANDWI + NDYI +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No climate No spectral
mod8 <- lmer(log10(SOC_stock_spline) ~ WIP + (GEOLOGIC_AGE) +
    lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.0035401 (tol = 0.002,
## component 1)
# Just WIP
mod9 <- lmer(log10(SOC_stock_spline) ~ WIP + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# NULL
mod10 <- lmer(log10(SOC_stock_spline) ~ lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.431569 (tol = 0.002, component
## 1)

Pairwise comparisons between the top, global model and the rest. Note the table contains mod1 multiple times and has been encoded with an additional number

Models npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2 15 220.4259 274.7319 -95.21296 190.4259 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 36.52080 26 0.0824988
mod3 17 223.7913 285.3381 -94.89564 189.7913 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 35.88615 24 0.0562988
mod4 14 227.0389 277.7245 -99.51945 199.0389 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 45.13377 27 0.0157401 *
mod5 15 229.4435 283.7495 -99.72175 199.4435 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 45.53837 26 0.0102674 *
mod6 14 229.4083 280.0939 -100.70415 201.4083 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 47.50318 27 0.0087196 **
mod7 13 230.4420 277.5072 -102.22099 204.4420 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 50.53685 28 0.0056307 **
mod8 11 247.7352 287.5596 -112.86760 225.7352 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 71.83007 30 0.0000275 ***
mod9 6 245.6252 267.3476 -116.81262 233.6252 NA NA NA
mod1 41 235.9051 384.3416 -76.95256 153.9051 79.72011 35 0.0000243 ***
mod10 5 247.8359 265.9379 -118.91797 237.8359 NA NA NA
mod18 41 235.9051 384.3416 -76.95256 153.9051 83.93081 36 0.0000107 ***

From the ANOVAs it looks like model 2 & model 3 performed just as well as model 1meaning we should proceed with these. model 2 removed interactions and model 3 removed all the spectral metrics but kept interaction. However, model 2 has 2 less parameters than model 3 which gives it a lower AIC when compared together.

npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
mod2 15 220.4259 274.7319 -95.21296 190.4259 NA NA NA
mod3 17 223.7913 285.3381 -94.89564 189.7913 0.6346545 2 0.7280925
# No ANDWI
mod2.1 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No ANDWI, No NDYI
mod2.2 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.0070341 (tol = 0.002,
## component 1)
# No NDYI
mod2.3 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.00238014 (tol = 0.002,
## component 1)
# No Geology
mod2.4 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + NDYI + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.171981 (tol = 0.002, component
## 1)
# No NDYI or Geology
mod2.5 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.0427664 (tol = 0.002,
## component 1)
# No ANDWI or Geology
mod2.6 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    NDYI + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
# No spectral or Geology
mod2.7 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
Models npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2.1 14 218.8356 269.5212 -95.41779 190.8356 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 0.409648 1 0.5221482
mod2.2 13 219.4581 266.5234 -96.72907 193.4581 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 3.032215 2 0.2195649
mod2.3 14 221.3225 272.0082 -96.66127 193.3225 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 2.896615 1 0.0887658
mod2.4 10 221.4128 257.6168 -100.70639 201.4128 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 10.986856 5 0.0516412
mod2.5 9 227.2392 259.8228 -104.61960 209.2392 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 18.813263 6 0.0044909 **
mod2.6 9 220.0621 252.6457 -101.03105 202.0621 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 11.636166 6 0.0705954
mod2.7 8 226.0611 255.0243 -105.03055 210.0611 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 19.635164 7 0.0064141 **

Model 2.2 and 2.3 are the lowest AIC and not significantly different from the fit in Model 2. The choice is either between MNDWI or NDYI to be included in the model with WIP, Precip, Temp, and Geology. To test , I want to see if WIP interacts with these variables.

# Interaction
mod2.1.1 <- lmer(log10(SOC_stock_spline) ~ Precip + Temp + WIP:NDYI +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,
## : Model failed to converge with max|grad| = 0.0620781 (tol = 0.002,
## component 1)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2.1.1 13 232.6320 279.6972 -103.31599 206.6320 NA NA NA
mod2 15 220.4259 274.7319 -95.21296 190.4259 16.20606 2 0.0003026 ***
mod2.1.11 13 232.6320 279.6972 -103.31599 206.6320 NA NA NA
mod2.1 14 218.8356 269.5212 -95.41779 190.8356 15.79641 1 0.0000705 ***

Doesn’t look like there are any significant interactions that improve model fit

Now we can look at the table of all models and compare AICs

AIC formula name delta
235.91 WIP * Precip * Temp * ANDWI * NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod1 17.07
220.43 WIP + Precip + Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2 1.59
223.79 WIP * Precip * Temp + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod3 4.95
227.04 WIP * Precip * Temp + ANDWI + NDYI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod4 8.20
229.44 Precip * Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod5 10.60
229.41 Precip + Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod6 10.57
230.44 WIP + +ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod7 11.60
247.74 WIP + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod8 28.90
245.63 WIP + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod9 26.79
218.84 WIP + Precip + Temp + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.1 0.00
219.46 WIP + Precip + Temp + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.2 0.62
221.32 WIP + Precip + Temp + ANDWI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.3 2.48
221.41 WIP + Precip + Temp + ANDWI + NDYI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.4 2.57
227.24 WIP + Precip + Temp + ANDWI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.5 8.40
220.06 WIP + Precip + Temp + NDYI + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.6 1.22
226.06 WIP + Precip + Temp + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.7 7.22
232.63 Precip + Temp + WIP:NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 | sample_ID) + (0 + lower_depth | sample_ID)) mod2.1.1 13.79

Looks like the best model fit to the data according to AIC is the mod2.2 or mod2.3 which do not include any interactions. The model $R^2$ for mod2.2 is 0.892 compared to the $R^2$ for mod2.2 is 0.894

Confidence intervals calculated with bootstrapping between these two models show that NDYI is a significant predictor compared to MNDWI. Therefore, we use mod2.2 as our candidate model

</table> --- We can now visualize our candidate model 2.2.
plot of chunk unnamed-chunk-108

plot of chunk unnamed-chunk-108

### Dredging for model selection The `dredge` function can be used to look through multiple combinations of models from a globally defined model. The number of parameters are limited so we run two models: one with the `NDYI` parameter and another with the `MNDWI` parameter included with `Temp`, `Precip`, and `WIP` fully interacting. `GEOLOGIC_AGE` is also included as an additional, non-interaction parameter and random effects are constant. ```r gmod1 <- lmer(log10(SOC_stock_spline) ~ WIP * Temp * Precip * NDYI + GEOLOGIC_AGE + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale, REML = F, na.action = "na.fail") gmod2 <- lmer(log10(SOC_stock_spline) ~ WIP * Temp * Precip * ANDWI + GEOLOGIC_AGE + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale, REML = F, na.action = "na.fail") dredge1 <- dredge(gmod1, beta = "sd") dredge2 <- dredge(gmod2, beta = "sd") ```
 Model 2.1Model 2.2
PredictorsEstimatesCIpEstimatesCIp
(Intercept)-0.24-0.63 – 0.110.192-0.21-0.61 – 0.160.266
WIP0.120.05 – 0.19<0.0010.130.06 – 0.21<0.001
Precip0.180.03 – 0.320.0200.240.12 – 0.35<0.001
Temp-0.00-0.12 – 0.120.992-0.02-0.14 – 0.100.716
NDYI0.07-0.02 – 0.160.116
GEOLOGIC AGE
[Miocene-Eocene]
-0.44-0.90 – 0.090.108-0.50-0.96 – 0.050.078
GEOLOGIC AGE
[Oligocene-Eocene]
0.09-0.29 – 0.520.6540.09-0.30 – 0.520.676
GEOLOGIC AGE
[Pleistocene]
0.04-0.33 – 0.460.8440.02-0.35 – 0.450.934
GEOLOGIC AGE
[pre-Tertiary]
-0.05-0.56 – 0.480.846-0.09-0.63 – 0.440.726
GEOLOGIC AGE [Quaternary]-0.28-0.70 – 0.200.280-0.37-0.80 – 0.080.130
lower depth-0.01-0.01 – -0.01<0.001-0.01-0.01 – -0.01<0.001
Random Effects
σ20.050.05
τ000.04 sample_ID0.04 sample_ID
τ110.00 sample_ID.lower_depth0.00 sample_ID.lower_depth
ρ01  
ρ01  
ICC0.400.41
N96 sample_ID96 sample_ID
Observations276276
Marginal R2 / Conditional R20.554 / 0.7310.541 / 0.731
(Intercept) GEOLOGIC_AGE lower_depth NDYI Precip Temp WIP NDYI:Precip NDYI:Temp NDYI:WIP Precip:Temp Precip:WIP Temp:WIP NDYI:Precip:Temp NDYI:Precip:WIP NDYI:Temp:WIP Precip:Temp:WIP NDYI:Precip:Temp:WIP df logLik AICc delta weight
24575 0 NA -0.3859947 0.7855336 -0.1358170 0.3672985 0.1104673 -0.3213815 0.4300134 0.1622869 -0.1453279 -0.2295751 0.0928943 -0.5578764 NA 0.1604055 NA NA 17 -90.27224 216.9166 0.000000 0.2995148
48 0 + -0.3836525 0.1317391 0.3088780 NA 0.2106672 NA NA NA NA NA NA NA NA NA NA NA 13 -95.41784 218.2250 1.308419 0.1557036
14335 0 NA -0.3851104 0.7752096 -0.0954976 0.3346213 0.0855746 -0.1921094 0.2650052 0.1003702 -0.1163907 -0.1221449 NA -0.5543885 0.1354660 NA NA NA 16 -92.10787 218.3161 1.399547 0.1487683
57343 0 NA -0.3860028 0.7545347 -0.1191113 0.3462410 0.1768655 -0.3177870 0.4410082 0.1212566 -0.1486504 -0.1461895 0.0151525 -0.5340127 NA 0.2205693 -0.11752 NA 18 -89.84743 218.3563 1.439757 0.1458072
1072 0 + -0.3836264 0.1396997 0.2998503 NA 0.2037453 NA NA NA NA -0.0801549 NA NA NA NA NA NA 14 -94.51771 218.6446 1.728044 0.1262346
32767 0 NA -0.3865633 0.7828342 -0.1474136 0.3813838 0.1424312 -0.3788402 0.4947691 0.1940555 -0.1510079 -0.2464318 0.0962508 -0.5516318 -0.1055494 0.2452163 NA NA 18 -90.00966 218.6808 1.764225 0.1239715
(Intercept) ANDWI GEOLOGIC_AGE lower_depth Precip Temp WIP ANDWI:Precip ANDWI:Temp ANDWI:WIP Precip:Temp Precip:WIP Temp:WIP ANDWI:Precip:Temp ANDWI:Precip:WIP ANDWI:Temp:WIP Precip:Temp:WIP ANDWI:Precip:Temp:WIP df logLik AICc delta weight
47 0 NA + -0.3835239 0.3925743 NA 0.2366648 NA NA NA NA NA NA NA NA NA NA NA 12 -96.79139 218.7691 0.0000000 0.2865913
112 0 -0.0257032 + -0.3836962 0.3443734 NA 0.2064126 0.1209479 NA NA NA NA NA NA NA NA NA NA 14 -94.93212 219.4734 0.7043450 0.2015192
1071 0 NA + -0.3834712 0.3890182 NA 0.2318531 NA NA NA NA -0.072221 NA NA NA NA NA NA 13 -96.07931 219.5479 0.7788413 0.1941510
128 0 -0.0407113 + -0.3837535 0.4241922 -0.1170883 0.1912894 0.1395886 NA NA NA NA NA NA NA NA NA NA 15 -94.42212 220.6904 1.9213171 0.1096615
110 0 -0.0537485 NA -0.3855814 0.2607821 NA 0.1815864 0.1699411 NA NA NA NA NA NA NA NA NA NA 9 -101.03425 220.7452 1.9761163 0.1066976
63 0 NA + -0.3835387 0.4240789 -0.0394235 0.2339897 NA NA NA NA NA NA NA NA NA NA NA 13 -96.72907 220.8475 2.0783743 0.1013794
By using `dredge` we find that there are a few candidate models that fit the data well. The difference between the model AICs is negligible. The first includes no interactions between the parameters and removes `Temp` as a predictor. The second includes an interaction between `MNDWI` and `Precip` while also excluding `Temp`. We can look at the bootstrapped confidence intervals to examine the significance of the predictors between the two models.</table> Dredge Model 1 includes `NDYI`, `Precip`, `WIP`, and `GEOLOGIC_AGE` as significant predictors. Dredge Model 2 does not have significant confidence intervals for the `MNDWI` and `MNDWIxPrecip` parameters. There is also minimal differences between the R$^{2}$ between the two models where Dredge Model 1 has R$^{2}$ = 0.897 and Dredge Model 2 has R$^{2}$ = 0.894 We now can examine the plot with the Dredge Model 1
plot of chunk unnamed-chunk-113

plot of chunk unnamed-chunk-113

### Random Forest Machine Learning I also want to try Random Forest in order to see if a more flexible, machine learning model can capture any non-linear relationships with SOC and other variables. `lower_depth` is included as a predictor here in the data setup We then use the `tuneRF` to choose the appropriate `mtry` number
plot of chunk unnamed-chunk-115

plot of chunk unnamed-chunk-115

```r rf_model <- randomForest((SOC_stock_spline) ~ ., ntree = 1000, mtry = 4, importance = TRUE, data = full) plot(rf_model) rf.full <- predict(rf_model, newdata = full) vip::vip(rf_model) ```
plot of chunk unnamed-chunk-116plot of chunk unnamed-chunk-116

plot of chunk unnamed-chunk-116

~50% of the out of bag (OOB) variation is explained. Looks like after `lower_depth` `Precip`, `Temp`, `WIP`, and `NDYI` are the big drivers. The RF model does not appear to have the greatest fit...
 Dredge Model 1Dredge Model 2
PredictorsEstimatesCIpEstimatesCIp
(Intercept)-0.16-0.30 – -0.020.014-0.19-0.56 – 0.180.266
lower depth-0.01-0.01 – -0.01<0.001-0.01-0.01 – -0.01<0.001
NDYI0.440.24 – 0.64<0.001
Precip-0.07-0.27 – 0.120.4280.220.16 – 0.29<0.001
Temp0.210.02 – 0.390.028
WIP0.06-0.01 – 0.130.0740.130.06 – 0.20<0.001
NDYI × Precip-0.20-0.33 – -0.060.002
NDYI × Temp0.250.08 – 0.420.006
NDYI × WIP0.090.01 – 0.180.038
Precip × Temp-0.11-0.26 – 0.030.104
Precip × WIP-0.14-0.27 – 0.000.064
Temp × WIP0.06-0.07 – 0.160.326
(NDYI × Precip) × Temp-0.29-0.47 – -0.110.002
(NDYI × Temp) × WIP0.080.01 – 0.150.032
GEOLOGIC AGE
[Miocene-Eocene]
-0.50-0.98 – 0.030.072
GEOLOGIC AGE
[Oligocene-Eocene]
0.06-0.33 – 0.460.732
GEOLOGIC AGE
[Pleistocene]
-0.01-0.37 – 0.380.962
GEOLOGIC AGE
[pre-Tertiary]
-0.11-0.63 – 0.410.676
GEOLOGIC AGE [Quaternary]-0.39-0.80 – 0.030.060
Random Effects
σ20.050.05
τ000.02 sample_ID0.04 sample_ID
τ110.00 sample_ID.lower_depth0.00 sample_ID.lower_depth
ρ01  
ρ01  
ICC0.320.42
N96 sample_ID96 sample_ID
Observations276276
Marginal R2 / Conditional R20.611 / 0.7350.540 / 0.731
error.variable value
MAE 0.1306526
MAE^2 0.0170701
RMSE 0.1768086
RMSE non-log 1.1934027
Stdev of non-log SOC stock 0.4999442
R^2 on full dataset 0.9230161
Look at partial dependency plots
plot of chunk unnamed-chunk-118

plot of chunk unnamed-chunk-118

``` ## [1] "0.03 mean square error rf model from training" ## [1] "0.902 R^2 rf model from training" ## [1] "0.835 R^2 rf model from testing " ``` Here, we show the fit between sampled and predicted SOC stocks for the full Random Forest Model
plot of chunk unnamed-chunk-120

plot of chunk unnamed-chunk-120