Introduction

Mapping soil organic carbon (SOC) as a function climatic, topographic, and geologic factors across spatial extents is key for identifying vulnerable SOC stocks. Yet, large, vulnerable SOC stocks are contained in the organic soils of wetlands and can occur in close proximity to non-wetland soils but are controlled and function differently due to saturated soil conditions. Thus, wetland SOC is often omitted or underrepresented in SOC mapping and modeling making it difficult to assess controls compared to non-wetland soil types. Here, we use geospatial data and SOC stocks from 481 soil horizons in 96 pedons in both wetlands and non-wetlands across three ecoregion study areas of the Pacific Northwest to assess climatic, geologic, and topographic controls on SOC.

Hypotheses

  1. WIP is a significant predictor of SOC stock

  2. WIP interaction with climate (Precip & Temp) is a significant predictor of SOC stock

  3. Geology is a significant predictor of SOC stock

Data

This dataset has observations of SOC% and SOC stock for 96 locations in Washington state. There are geospatial covariates as well that are used to build models to predict SOC% and/or SOC stock

## 'data.frame':    481 obs. of  25 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : int  13 40 76 104 11 30 45 75 9 27 ...
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : int  13 27 36 28 11 19 15 30 9 18 ...
##  $ rock_perc           : num  0 0 5 40 0 15 40 55 0 0 ...
##  $ BD_g_cm3            : num  0.07 0.16 0.51 0.88 0.38 0.49 0.7 0.57 0.72 0.95 ...
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num  0.543 0.543 0.543 0.543 0.579 ...
##  $ Geomorphon_Class    : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Temp                : num  10.2 10.2 10.2 10.2 10.2 ...
##  $ Precip              : num  3070 3070 3070 3070 3054 ...
##  $ NDVI                : num  0.84 0.84 0.84 0.84 0.862 ...
##  $ MNDWI               : num  -0.477 -0.477 -0.477 -0.477 -0.446 ...
##  $ EVI                 : num  0.455 0.455 0.455 0.455 0.534 ...
##  $ SCI                 : num  0.649 0.649 0.649 0.649 0.648 ...
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num  0.873 0.873 0.873 0.873 0.896 ...
##  $ x                   : num  411350 411350 411350 411350 409931 ...
##  $ y                   : num  5295086 5295086 5295086 5295086 5294415 ...

Examining covariate predictors using correlation plots

There is weak correlation between carbon_stock_g_cm2 and the selected covariates. Additionally there is some collinearity between predictors. The spectral predictors such as EVI (Enhanced Vegetation Index) and SCI (Soil Condition Index) are well correlated with each other and also with NDVI (Normalized Difference Vegetation Index). We can remove these based on the correlation coefficient > 0.7.

Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.

## 'data.frame':    481 obs. of  25 variables:
##  $ X                   : num [1:481, 1] -1.73 -1.72 -1.71 -1.71 -1.7 ...
##   ..- attr(*, "scaled:center")= num 241
##   ..- attr(*, "scaled:scale")= num 139
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : num [1:481, 1] -0.983 -0.306 0.596 1.297 -1.033 ...
##   ..- attr(*, "scaled:center")= num 52.2
##   ..- attr(*, "scaled:scale")= num 39.9
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : num [1:481, 1] -0.501 0.443 1.05 0.51 -0.636 ...
##   ..- attr(*, "scaled:center")= num 20.4
##   ..- attr(*, "scaled:scale")= num 14.8
##  $ rock_perc           : num [1:481, 1] -0.693 -0.693 -0.477 1.031 -0.693 ...
##   ..- attr(*, "scaled:center")= num 16.1
##   ..- attr(*, "scaled:scale")= num 23.2
##  $ BD_g_cm3            : num [1:481, 1] -1.805 -1.516 -0.391 0.797 -0.809 ...
##   ..- attr(*, "scaled:center")= num 0.632
##   ..- attr(*, "scaled:scale")= num 0.311
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num [1:481, 1] 1.51 1.51 1.51 1.51 1.93 ...
##   ..- attr(*, "scaled:center")= num 0.414
##   ..- attr(*, "scaled:scale")= num 0.0858
##  $ Geomorphon_Class    : num [1:481, 1] 0.777 0.777 0.777 0.777 0.777 ...
##   ..- attr(*, "scaled:center")= num 8.02
##   ..- attr(*, "scaled:scale")= num 2.55
##  $ Temp                : num [1:481, 1] 1.14 1.14 1.14 1.14 1.16 ...
##   ..- attr(*, "scaled:center")= num 8.33
##   ..- attr(*, "scaled:scale")= num 1.61
##  $ Precip              : num [1:481, 1] 1.23 1.23 1.23 1.23 1.21 ...
##   ..- attr(*, "scaled:center")= num 1804
##   ..- attr(*, "scaled:scale")= num 1031
##  $ NDVI                : num [1:481, 1] 0.138 0.138 0.138 0.138 0.407 ...
##   ..- attr(*, "scaled:center")= num 0.829
##   ..- attr(*, "scaled:scale")= num 0.0823
##  $ MNDWI               : num [1:481, 1] 0.475 0.475 0.475 0.475 1.154 ...
##   ..- attr(*, "scaled:center")= num -0.499
##   ..- attr(*, "scaled:scale")= num 0.0467
##  $ EVI                 : num [1:481, 1] -0.171 -0.171 -0.171 -0.171 0.645 ...
##   ..- attr(*, "scaled:center")= num 0.471
##   ..- attr(*, "scaled:scale")= num 0.0965
##  $ SCI                 : num [1:481, 1] -0.0964 -0.0964 -0.0964 -0.0964 -0.1109 ...
##   ..- attr(*, "scaled:center")= num 0.654
##   ..- attr(*, "scaled:scale")= num 0.0511
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num [1:481, 1] 1.52 1.52 1.52 1.52 1.61 ...
##   ..- attr(*, "scaled:center")= num 0.436
##   ..- attr(*, "scaled:scale")= num 0.286
##  $ x                   : num [1:481, 1] -1.07 -1.07 -1.07 -1.07 -1.08 ...
##   ..- attr(*, "scaled:center")= num 610372
##   ..- attr(*, "scaled:scale")= num 185155
##  $ y                   : num [1:481, 1] -0.0158 -0.0158 -0.0158 -0.0158 -0.0238 ...
##   ..- attr(*, "scaled:center")= num 5296406
##   ..- attr(*, "scaled:scale")= num 83700

Explicit parameter model building

Now build models using log transformed carbon stock data. We need to specify that sample_ID is a random effect because of the multiple samples at one location. depth can be a random slope to adjust model based on how it is affected by depth. We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. I am starting with the log-transformed linear mixed model first to test the hypotheses

# Full model with hypothesized parameters
mod1 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Precip * Temp +
    MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) |
    sample_ID), data = wa_dat_scale, REML = F)
# No interactions
mod2 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) |
    sample_ID), data = wa_dat_scale, REML = F)
# No spectral
mod3 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Precip * Temp +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No geology
mod4 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Precip * Temp +
    MNDWI + NDVI + NDYI + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No WIP w/ interaction
mod5 <- lmer(log(carbon_stock_g_cm2) ~ Precip * Temp + MNDWI +
    NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No WIP w/o interaction
mod6 <- lmer(log(carbon_stock_g_cm2) ~ Precip + Temp + MNDWI +
    NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No climate
mod7 <- lmer(log(carbon_stock_g_cm2) ~ WIP + MNDWI + NDVI + NDYI +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No climate No spectral
mod8 <- lmer(log(carbon_stock_g_cm2) ~ WIP + (GEOLOGIC_AGE) +
    (1 + (depth_cm) | sample_ID), data = wa_dat_scale, REML = F)
# Just WIP
mod9 <- lmer(log(carbon_stock_g_cm2) ~ WIP + (1 + (depth_cm) |
    sample_ID), data = wa_dat_scale, REML = F)

Pairwise comparisons between the top, global model and the rest. Note the table contains mod1 multiple times and has been encoded with an additional number

Models npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2 16 1394.726 1461.540 -681.3628 1362.726 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 4.541993 4 0.3375972
mod3 17 1397.114 1468.104 -681.5572 1363.114 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 4.930755 3 0.1769384
mod4 15 1402.769 1465.407 -686.3847 1372.769 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 14.585647 5 0.0122876
mod5 16 1406.090 1472.904 -687.0450 1374.090 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 15.906251 4 0.0031476 **
mod6 15 1405.705 1468.343 -687.8523 1375.705 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 17.521028 5 0.0036107 **
mod7 14 1402.863 1461.325 -687.4314 1374.863 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 16.679065 6 0.0105380
mod8 11 1424.304 1470.238 -701.1519 1402.304 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 44.120165 9 0.0000013 ***
mod9 6 1422.216 1447.271 -705.1081 1410.216 NA NA NA
mod1 20 1398.184 1481.701 -679.0918 1358.184 52.032483 14 0.0000028 ***

From the ANOVAs it looks like model 2 & model 3 performed just as well as model 1meaning we should proceed with these. model 2 removed interactions and model 3 removed all the spectral metrics but kept interaction. However, model 2 has 1 less parameter than model 3 which gives it a lower AIC when compared together.

npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
mod2 16 1394.726 1461.540 -681.3628 1362.726 NA NA NA
mod3 17 1397.114 1468.104 -681.5572 1363.114 0 1 1
# No MNDWI
mod2.1 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No MNDWI, No NDVI
mod2.2 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDYI, No NDVI
mod2.3 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDYI, No MNDWI
mod2.4 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDVI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDVI
mod2.5 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDYI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No NDYI
mod2.6 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDVI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No spectral
mod2.7 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
Models npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2.1 15 1393.356 1455.994 -681.6780 1363.356 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 0.6303396 1 0.4272308
mod2.2 14 1391.475 1449.937 -681.7375 1363.475 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 0.7492708 2 0.6875399
mod2.3 14 1391.954 1450.417 -681.9772 1363.954 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 1.2287240 2 0.5409859
mod2.4 14 1394.233 1452.695 -683.1163 1366.233 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 3.5070131 2 0.1731657
mod2.5 15 1392.893 1455.531 -681.4467 1362.893 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 0.1677554 1 0.6821144
mod2.6 15 1393.914 1456.552 -681.9568 1363.914 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 1.1879664 1 0.2757401
mod2.7 13 1393.302 1447.588 -683.6508 1367.302 NA NA NA
mod2 16 1394.726 1461.540 -681.3628 1362.726 4.5758474 3 0.2056238

Model 2.2 and 2.3 are the lowest AIC and not significantly different from the fit in Model 2. The choice is either between MNDWI or NDYI to be included in the model with WIP, Precip, Temp, and Geology. To test , I want to see if WIP interacts with these variables.

# No MNDWI, No NDVI
mod2.2.1 <- lmer(log(carbon_stock_g_cm2) ~ Precip + Temp + WIP:NDYI +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDYI, No NDVI
mod2.3.1 <- lmer(log(carbon_stock_g_cm2) ~ Precip + Temp + WIP:MNDWI +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) Significant
mod2.2.1 13 1409.712 1463.998 -691.8560 1383.712 NA NA NA
mod2.2 14 1391.475 1449.937 -681.7375 1363.475 20.23698 1 6.8e-06 ***
mod2.3.1 13 1409.469 1463.755 -691.7343 1383.469 NA NA NA
mod2.3 14 1391.954 1450.417 -681.9772 1363.954 19.51414 1 1.0e-05 ***

Doesn’t look like there are any significant interactions that improve model fit

Now we can look at the table of all models and compare AICs

AIC formula name delta
1398.18 WIP * Precip * Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod1 6.71
1394.73 WIP + Precip + Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2 3.26
1397.11 WIP * Precip * Temp + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod3 5.64
1402.77 WIP * Precip * Temp + MNDWI + NDVI + NDYI + (1 + (depth_cm) | sample_ID) mod4 11.30
1406.09 Precip * Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod5 14.62
1405.7 Precip + Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod6 14.23
1402.86 WIP + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod7 11.39
1424.3 WIP + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod8 32.83
1422.22 WIP + (1 + (depth_cm) | sample_ID) mod9 30.75
1393.36 WIP + Precip + Temp + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.1 1.89
1391.47 WIP + Precip + Temp + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.2 0.00
1391.95 WIP + Precip + Temp + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.3 0.48
1394.23 WIP + Precip + Temp + NDVI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.4 2.76
1392.89 WIP + Precip + Temp + NDYI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.5 1.42
1393.91 WIP + Precip + Temp + NDVI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.6 2.44
1393.3 WIP + Precip + Temp + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.7 1.83
1409.71 Precip + Temp + WIP:NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.2.1 18.24
1409.47 Precip + Temp + WIP:MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID) mod2.3.1 18.00

Looks like the best model fit to the data according to AIC is the mod2.2 or mod2.3 which do not include any interactions. The model \(R^2\) for mod2.2 is 0.699 compared to the \(R^2\) for mod2.3 is 0.701

Confidence intervals calculated with bootstrapping between these two models show that NDYI is a significant predictor compared to MNDWI. Therefore, we use mod2.2 as our candidate model

  Model 2.2 Model 2.3
Predictors Estimates CI p Estimates CI p
(Intercept) -1.21 -1.91 – -0.51 0.002 -1.27 -1.98 – -0.56 0.002
WIP 0.24 0.12 – 0.38 <0.001 0.28 0.15 – 0.40 <0.001
Precip 0.35 0.10 – 0.59 0.006 0.48 0.28 – 0.69 <0.001
Temp -0.01 -0.24 – 0.21 0.912 -0.03 -0.26 – 0.20 0.772
NDYI 0.16 0.00 – 0.33 0.046
GEOLOGIC AGE
[Miocene-Eocene]
-1.08 -2.09 – -0.12 0.038 -1.14 -2.13 – -0.17 0.028
GEOLOGIC AGE
[Oligocene-Eocene]
0.21 -0.54 – 0.95 0.546 0.26 -0.50 – 1.00 0.462
GEOLOGIC AGE
[Pleistocene]
-0.04 -0.79 – 0.66 0.908 0.00 -0.73 – 0.70 0.994
GEOLOGIC AGE
[pre-Tertiary]
0.02 -0.97 – 0.92 0.974 0.03 -0.96 – 0.96 0.942
GEOLOGIC AGE [Quaternary] -0.59 -1.45 – 0.26 0.168 -0.67 -1.49 – 0.18 0.122
MNDWI 0.11 -0.01 – 0.23 0.070
Random Effects
σ2 0.64 0.64
τ00 0.65 sample_ID 0.61 sample_ID
τ11 0.51 sample_ID.depth_cm 0.50 sample_ID.depth_cm
ρ01 0.94 sample_ID 0.93 sample_ID
ICC 0.51 0.49
N 96 sample_ID 96 sample_ID
Observations 481 481
Marginal R2 / Conditional R2 0.174 / 0.591 0.182 / 0.582

We can now visualize our candidate model 2.2.

Dredging for model selection

The dredge function can be used to look through multiple combinations of models from a globally defined model. The number of parameters are limited so we run two models: one with the NDYI parameter and another with the MNDWI parameter included with Temp, Precip, and WIP fully interacting. GEOLOGIC_AGE is also included as an additional, non-interaction parameter and random effects are constant.

gmod1 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Temp * Precip *
    NDYI + GEOLOGIC_AGE + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F, na.action = "na.fail")
gmod2 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Temp * Precip *
    MNDWI + GEOLOGIC_AGE + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F, na.action = "na.fail")
dredge1 <- dredge(gmod1, beta = "sd")
dredge2 <- dredge(gmod2, beta = "sd")
(Intercept) GEOLOGIC_AGE NDYI Precip Temp WIP NDYI:Precip NDYI:Temp NDYI:WIP Precip:Temp Precip:WIP Temp:WIP NDYI:Precip:Temp NDYI:Precip:WIP NDYI:Temp:WIP Precip:Temp:WIP NDYI:Precip:Temp:WIP df logLik AICc delta weight
24 0
0.1299973 0.2590203 NA 0.1872057 NA NA NA NA NA NA NA NA NA NA NA 13 -681.7415 1390.262 0.0000000 0.3260432
536 0
0.1361008 0.2516699 NA 0.1831620 NA NA NA NA -0.0527466 NA NA NA NA NA NA 14 -681.1778 1391.257 0.9944499 0.1983047
22 0
NA 0.3486281 NA 0.2158639 NA NA NA NA NA NA NA NA NA NA NA 12 -683.7578 1392.182 1.9198364 0.1248498
56 0
0.1448841 0.2478062 NA 0.1834030 -0.022364 NA NA NA NA NA NA NA NA NA NA 14 -681.6511 1392.204 1.9410966 0.1235297
152 0
0.1270359 0.2618069 NA 0.1888159 NA NA -0.0070658 NA NA NA NA NA NA NA NA 14 -681.7318 1392.365 2.1023649 0.1139600
32 0
0.1289036 0.2660776 -0.0081215 0.1868638 NA NA NA NA NA NA NA NA NA NA NA 14 -681.7375 1392.376 2.1137575 0.1133127
(Intercept) GEOLOGIC_AGE MNDWI Precip Temp WIP MNDWI:Precip MNDWI:Temp MNDWI:WIP Precip:Temp Precip:WIP Temp:WIP MNDWI:Precip:Temp MNDWI:Precip:WIP MNDWI:Temp:WIP Precip:Temp:WIP MNDWI:Precip:Temp:WIP df logLik AICc delta weight
56 0
0.0668338 0.3640831 NA 0.2067567 0.0735801 NA NA NA NA NA NA NA NA NA NA 14 -680.9016 1390.704 0.0000000 0.24640525
24 0
0.0907875 0.3500011 NA 0.2136153 NA NA NA NA NA NA NA NA NA NA NA 13 -682.0105 1390.800 0.0959663 0.23486113
568 0
0.0722226 0.3608701 NA 0.2036170 0.0747569 NA NA NA -0.0550311 NA NA NA NA NA NA 15 -680.2725 1391.577 0.8727253 0.15927223
536 0
0.0965784 0.3466353 NA 0.2106885 NA NA NA NA -0.0544050 NA NA NA NA NA NA 14 -681.4154 1391.732 1.0276861 0.14739772
22 0
NA 0.3486281 NA 0.2158639 NA NA NA NA NA NA NA NA NA NA NA 12 -683.7578 1392.182 1.4778537 0.11768961
64 0
0.0634709 0.3963320 -0.0408569 0.2036933 0.0765553 NA NA NA NA NA NA NA NA NA NA 15 -680.7958 1392.624 1.9194222 0.09437407

By using dredge we find that there are a few candidate models that fit the data well.

The difference between the model AICs is negligible. The first includes no interactions between the parameters and removes Temp as a predictor. The second includes an interaction between MNDWI and Precip while also excluding Temp.

We can look at the bootstrapped confidence intervals to examine the significance of the predictors between the two models.
  Dredge Model 1 Dredge Model 2
Predictors Estimates CI p Estimates CI p
(Intercept) -1.21 -1.86 – -0.52 0.002 -1.25 -1.89 – -0.56 0.002
GEOLOGIC AGE
[Miocene-Eocene]
-1.09 -2.10 – -0.14 0.036 -1.23 -2.22 – -0.23 0.010
GEOLOGIC AGE
[Oligocene-Eocene]
0.21 -0.53 – 0.92 0.554 0.25 -0.48 – 0.97 0.472
GEOLOGIC AGE
[Pleistocene]
-0.05 -0.77 – 0.60 0.872 -0.04 -0.73 – 0.63 0.916
GEOLOGIC AGE
[pre-Tertiary]
0.00 -1.01 – 0.90 0.994 -0.03 -1.02 – 0.86 0.960
GEOLOGIC AGE [Quaternary] -0.63 -1.43 – 0.23 0.154 -0.72 -1.46 – 0.09 0.080
NDYI 0.16 0.01 – 0.33 0.038
Precip 0.34 0.17 – 0.50 <0.001 0.47 0.35 – 0.59 <0.001
WIP 0.24 0.12 – 0.38 <0.001 0.27 0.15 – 0.40 <0.001
MNDWI 0.09 -0.04 – 0.21 0.194
MNDWI × Precip 0.09 -0.04 – 0.20 0.152
Random Effects
σ2 0.64 0.64
τ00 0.65 sample_ID 0.58 sample_ID
τ11 0.51 sample_ID.depth_cm 0.50 sample_ID.depth_cm
ρ01 0.94 sample_ID 0.93 sample_ID
ICC 0.50 0.47
N 96 sample_ID 96 sample_ID
Observations 481 481
Marginal R2 / Conditional R2 0.175 / 0.591 0.193 / 0.576

Dredge Model 1 includes NDYI, Precip, WIP, and GEOLOGIC_AGE as significant predictors. Dredge Model 2 does not have significant confidence intervals for the MNDWI and MNDWIxPrecip parameters.

There is also minimal differences between the R\(^{2}\) between the two models where Dredge Model 1 has R\(^{2}\) = 0.699 and Dredge Model 2 has R\(^{2}\) = 0.698

We now can examine the plot with the Dredge Model 1

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

### Random Forest Machine Learning

I also want to try Random Forest in order to see if a more flexible, machine learning model can capture any non-linear relationships with SOC and other variables.

depth_cm is included as a predictor here in the data setup

## mtry = 3  OOB error = 0.8515942 
## Searching left ...
## mtry = 2     OOB error = 0.8907029 
## -0.04592407 1e-04 
## Searching right ...
## mtry = 4     OOB error = 0.8483588 
## 0.003799248 1e-04 
## mtry = 6     OOB error = 0.8291902 
## 0.02259492 1e-04 
## mtry = 9     OOB error = 0.8306786 
## -0.001794988 1e-04
##   mtry  OOBError
## 2    2 0.8907029
## 3    3 0.8515942
## 4    4 0.8483588
## 6    6 0.8291902
## 9    9 0.8306786

rf_model <- randomForest((carbon_stock_g_cm2) ~ ., ntree = 1000,
    mtry = 6, importance = TRUE, data = full)


rf_model
## 
## Call:
##  randomForest(formula = (carbon_stock_g_cm2) ~ ., data = full,      ntree = 1000, mtry = 6, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 1000
## No. of variables tried at each split: 6
## 
##           Mean of squared residuals: 0.8269377
##                     % Var explained: 50.93
plot(rf_model)
varImpPlot(rf_model)

~50% of the out of bag (OOB) variation is explained. Looks like after depth_cm Precip, Temp, WIP, and NDYI are the big drivers. The RF model does not appear to have

error.variable value
MAE 0.3397694
MAE^2 0.1154432
RMSE 0.4511324
RMSE non-log 1.5700892
Stdev of non-log SOC stock 0.4698826
R^2 on full dataset 0.8994381

Look at partial dependency plots

## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:randomForest':
## 
##     combine
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:spatialEco':
## 
##     combine
## 
## Attaching package: 'grid'
## The following object is masked from 'package:spatialEco':
## 
##     explode
## The following object is masked from 'package:terra':
## 
##     depth

## [1] "1.578 mean square error rf model from training"
## [1] "0.113 R^2 rf model from training"
## [1] "0.07 R^2 rf model from testing "

Here, we show the fit between sampled and predicted SOC stocks for the full Random Forest Model

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'