SOC controls across Washington

Introduction

Mapping soil organic carbon (SOC) as a function climatic, topographic, and geologic factors across spatial extents is key for identifying vulnerable SOC stocks. Yet, large, vulnerable SOC stocks are contained in the organic soils of wetlands and can occur in close proximity to non-wetland soils but are controlled and function differently due to saturated soil conditions. Thus, wetland SOC is often omitted or underrepresented in SOC mapping and modeling making it difficult to assess controls compared to non-wetland soil types. Here, we use geospatial data and SOC stocks from 481 soil horizons in 96 pedons in both wetlands and non-wetlands across three ecoregion study areas of the Pacific Northwest to assess climatic, geologic, and topographic controls on SOC.

Hypotheses

WIP is a significant predictor of SOC stock
WIP interaction with climate (Precip & Temp) is a significant predictor of SOC stock
Geology is a significant predictor of SOC stock

Data

This dataset has observations of SOC% and SOC stock for 96 locations in Washington state. There are geospatial covariates as well that are used to build models to predict SOC% and/or SOC stock

## 'data.frame':    481 obs. of  25 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : int  13 40 76 104 11 30 45 75 9 27 ...
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : int  13 27 36 28 11 19 15 30 9 18 ...
##  $ rock_perc           : num  0 0 5 40 0 15 40 55 0 0 ...
##  $ BD_g_cm3            : num  0.07 0.16 0.51 0.88 0.38 0.49 0.7 0.57 0.72 0.95 ...
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num  0.543 0.543 0.543 0.543 0.579 ...
##  $ Geomorphon_Class    : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Temp                : num  10.2 10.2 10.2 10.2 10.2 ...
##  $ Precip              : num  3070 3070 3070 3070 3054 ...
##  $ NDVI                : num  0.84 0.84 0.84 0.84 0.862 ...
##  $ MNDWI               : num  -0.477 -0.477 -0.477 -0.477 -0.446 ...
##  $ EVI                 : num  0.455 0.455 0.455 0.455 0.534 ...
##  $ SCI                 : num  0.649 0.649 0.649 0.649 0.648 ...
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num  0.873 0.873 0.873 0.873 0.896 ...
##  $ x                   : num  411350 411350 411350 411350 409931 ...
##  $ y                   : num  5295086 5295086 5295086 5295086 5294415 ...

Examining covariate predictors using correlation plots

There is weak correlation between carbon_stock_g_cm2 and the selected covariates. Additionally there is some collinearity between predictors. The spectral predictors such as EVI (Enhanced Vegetation Index) and SCI (Soil Condition Index) are well correlated with each other and also with NDVI (Normalized Difference Vegetation Index). We can remove these based on the correlation coefficient > 0.7.

Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.

## 'data.frame':    481 obs. of  25 variables:
##  $ X                   : num [1:481, 1] -1.73 -1.72 -1.71 -1.71 -1.7 ...
##   ..- attr(*, "scaled:center")= num 241
##   ..- attr(*, "scaled:scale")= num 139
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : num [1:481, 1] -0.983 -0.306 0.596 1.297 -1.033 ...
##   ..- attr(*, "scaled:center")= num 52.2
##   ..- attr(*, "scaled:scale")= num 39.9
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : num [1:481, 1] -0.501 0.443 1.05 0.51 -0.636 ...
##   ..- attr(*, "scaled:center")= num 20.4
##   ..- attr(*, "scaled:scale")= num 14.8
##  $ rock_perc           : num [1:481, 1] -0.693 -0.693 -0.477 1.031 -0.693 ...
##   ..- attr(*, "scaled:center")= num 16.1
##   ..- attr(*, "scaled:scale")= num 23.2
##  $ BD_g_cm3            : num [1:481, 1] -1.805 -1.516 -0.391 0.797 -0.809 ...
##   ..- attr(*, "scaled:center")= num 0.632
##   ..- attr(*, "scaled:scale")= num 0.311
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num [1:481, 1] 1.51 1.51 1.51 1.51 1.93 ...
##   ..- attr(*, "scaled:center")= num 0.414
##   ..- attr(*, "scaled:scale")= num 0.0858
##  $ Geomorphon_Class    : num [1:481, 1] 0.777 0.777 0.777 0.777 0.777 ...
##   ..- attr(*, "scaled:center")= num 8.02
##   ..- attr(*, "scaled:scale")= num 2.55
##  $ Temp                : num [1:481, 1] 1.14 1.14 1.14 1.14 1.16 ...
##   ..- attr(*, "scaled:center")= num 8.33
##   ..- attr(*, "scaled:scale")= num 1.61
##  $ Precip              : num [1:481, 1] 1.23 1.23 1.23 1.23 1.21 ...
##   ..- attr(*, "scaled:center")= num 1804
##   ..- attr(*, "scaled:scale")= num 1031
##  $ NDVI                : num [1:481, 1] 0.138 0.138 0.138 0.138 0.407 ...
##   ..- attr(*, "scaled:center")= num 0.829
##   ..- attr(*, "scaled:scale")= num 0.0823
##  $ MNDWI               : num [1:481, 1] 0.475 0.475 0.475 0.475 1.154 ...
##   ..- attr(*, "scaled:center")= num -0.499
##   ..- attr(*, "scaled:scale")= num 0.0467
##  $ EVI                 : num [1:481, 1] -0.171 -0.171 -0.171 -0.171 0.645 ...
##   ..- attr(*, "scaled:center")= num 0.471
##   ..- attr(*, "scaled:scale")= num 0.0965
##  $ SCI                 : num [1:481, 1] -0.0964 -0.0964 -0.0964 -0.0964 -0.1109 ...
##   ..- attr(*, "scaled:center")= num 0.654
##   ..- attr(*, "scaled:scale")= num 0.0511
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num [1:481, 1] 1.52 1.52 1.52 1.52 1.61 ...
##   ..- attr(*, "scaled:center")= num 0.436
##   ..- attr(*, "scaled:scale")= num 0.286
##  $ x                   : num [1:481, 1] -1.07 -1.07 -1.07 -1.07 -1.08 ...
##   ..- attr(*, "scaled:center")= num 610372
##   ..- attr(*, "scaled:scale")= num 185155
##  $ y                   : num [1:481, 1] -0.0158 -0.0158 -0.0158 -0.0158 -0.0238 ...
##   ..- attr(*, "scaled:center")= num 5296406
##   ..- attr(*, "scaled:scale")= num 83700

Explicit parameter model building

Now build models using log transformed carbon stock data. We need to specify that sample_ID is a random effect because of the multiple samples at one location. depth can be a random slope to adjust model based on how it is affected by depth. We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. I am starting with the log-transformed linear mixed model first to test the hypotheses

# Full model with hypothesized parameters
mod1 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Precip * Temp +
    MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) |
    sample_ID), data = wa_dat_scale, REML = F)
# No interactions
mod2 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) |
    sample_ID), data = wa_dat_scale, REML = F)
# No spectral
mod3 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Precip * Temp +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No geology
mod4 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Precip * Temp +
    MNDWI + NDVI + NDYI + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No WIP w/ interaction
mod5 <- lmer(log(carbon_stock_g_cm2) ~ Precip * Temp + MNDWI +
    NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No WIP w/o interaction
mod6 <- lmer(log(carbon_stock_g_cm2) ~ Precip + Temp + MNDWI +
    NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No climate
mod7 <- lmer(log(carbon_stock_g_cm2) ~ WIP + MNDWI + NDVI + NDYI +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No climate No spectral
mod8 <- lmer(log(carbon_stock_g_cm2) ~ WIP + (GEOLOGIC_AGE) +
    (1 + (depth_cm) | sample_ID), data = wa_dat_scale, REML = F)
# Just WIP
mod9 <- lmer(log(carbon_stock_g_cm2) ~ WIP + (1 + (depth_cm) |
    sample_ID), data = wa_dat_scale, REML = F)

Pairwise comparisons between the top, global model and the rest. Note the table contains mod1 multiple times and has been encoded with an additional number

Models	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)	Significant
mod2	16	1394.726	1461.540	-681.3628	1362.726	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	4.541993	4	0.3375972
mod3	17	1397.114	1468.104	-681.5572	1363.114	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	4.930755	3	0.1769384
mod4	15	1402.769	1465.407	-686.3847	1372.769	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	14.585647	5	0.0122876
mod5	16	1406.090	1472.904	-687.0450	1374.090	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	15.906251	4	0.0031476	**
mod6	15	1405.705	1468.343	-687.8523	1375.705	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	17.521028	5	0.0036107	**
mod7	14	1402.863	1461.325	-687.4314	1374.863	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	16.679065	6	0.0105380
mod8	11	1424.304	1470.238	-701.1519	1402.304	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	44.120165	9	0.0000013	***
mod9	6	1422.216	1447.271	-705.1081	1410.216	NA	NA	NA
mod1	20	1398.184	1481.701	-679.0918	1358.184	52.032483	14	0.0000028	***

From the ANOVAs it looks like model 2 & model 3 performed just as well as model 1meaning we should proceed with these. model 2 removed interactions and model 3 removed all the spectral metrics but kept interaction. However, model 2 has 1 less parameter than model 3 which gives it a lower AIC when compared together.

	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)
mod2	16	1394.726	1461.540	-681.3628	1362.726	NA	NA	NA
mod3	17	1397.114	1468.104	-681.5572	1363.114	0	1	1

# No MNDWI
mod2.1 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No MNDWI, No NDVI
mod2.2 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDYI, No NDVI
mod2.3 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDYI, No MNDWI
mod2.4 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDVI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDVI
mod2.5 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDYI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No NDYI
mod2.6 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    NDVI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID),
    data = wa_dat_scale, REML = F)
# No spectral
mod2.7 <- lmer(log(carbon_stock_g_cm2) ~ WIP + Precip + Temp +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)

Models	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)
mod2.1	15	1393.356	1455.994	-681.6780	1363.356	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	0.6303396	1	0.4272308
mod2.2	14	1391.475	1449.937	-681.7375	1363.475	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	0.7492708	2	0.6875399
mod2.3	14	1391.954	1450.417	-681.9772	1363.954	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	1.2287240	2	0.5409859
mod2.4	14	1394.233	1452.695	-683.1163	1366.233	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	3.5070131	2	0.1731657
mod2.5	15	1392.893	1455.531	-681.4467	1362.893	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	0.1677554	1	0.6821144
mod2.6	15	1393.914	1456.552	-681.9568	1363.914	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	1.1879664	1	0.2757401
mod2.7	13	1393.302	1447.588	-683.6508	1367.302	NA	NA	NA
mod2	16	1394.726	1461.540	-681.3628	1362.726	4.5758474	3	0.2056238

Model 2.2 and 2.3 are the lowest AIC and not significantly different from the fit in Model 2. The choice is either between MNDWI or NDYI to be included in the model with WIP, Precip, Temp, and Geology. To test , I want to see if WIP interacts with these variables.

# No MNDWI, No NDVI
mod2.2.1 <- lmer(log(carbon_stock_g_cm2) ~ Precip + Temp + WIP:NDYI +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)
# No NDYI, No NDVI
mod2.3.1 <- lmer(log(carbon_stock_g_cm2) ~ Precip + Temp + WIP:MNDWI +
    (GEOLOGIC_AGE) + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F)

	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)	Significant
mod2.2.1	13	1409.712	1463.998	-691.8560	1383.712	NA	NA	NA
mod2.2	14	1391.475	1449.937	-681.7375	1363.475	20.23698	1	6.8e-06	***
mod2.3.1	13	1409.469	1463.755	-691.7343	1383.469	NA	NA	NA
mod2.3	14	1391.954	1450.417	-681.9772	1363.954	19.51414	1	1.0e-05	***

Doesn’t look like there are any significant interactions that improve model fit

Now we can look at the table of all models and compare AICs

AIC	formula	name	delta
1398.18	WIP * Precip * Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod1	6.71
1394.73	WIP + Precip + Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2	3.26
1397.11	WIP * Precip * Temp + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod3	5.64
1402.77	WIP * Precip * Temp + MNDWI + NDVI + NDYI + (1 + (depth_cm) \| sample_ID)	mod4	11.30
1406.09	Precip * Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod5	14.62
1405.7	Precip + Temp + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod6	14.23
1402.86	WIP + MNDWI + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod7	11.39
1424.3	WIP + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod8	32.83
1422.22	WIP + (1 + (depth_cm) \| sample_ID)	mod9	30.75
1393.36	WIP + Precip + Temp + NDVI + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.1	1.89
1391.47	WIP + Precip + Temp + NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.2	0.00
1391.95	WIP + Precip + Temp + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.3	0.48
1394.23	WIP + Precip + Temp + NDVI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.4	2.76
1392.89	WIP + Precip + Temp + NDYI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.5	1.42
1393.91	WIP + Precip + Temp + NDVI + MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.6	2.44
1393.3	WIP + Precip + Temp + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.7	1.83
1409.71	Precip + Temp + WIP:NDYI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.2.1	18.24
1409.47	Precip + Temp + WIP:MNDWI + (GEOLOGIC_AGE) + (1 + (depth_cm) \| sample_ID)	mod2.3.1	18.00

Looks like the best model fit to the data according to AIC is the mod2.2 or mod2.3 which do not include any interactions. The model \(R^2\) for mod2.2 is 0.699 compared to the \(R^2\) for mod2.3 is 0.701

Confidence intervals calculated with bootstrapping between these two models show that NDYI is a significant predictor compared to MNDWI. Therefore, we use mod2.2 as our candidate model

	Model 2.2			Model 2.3
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	-1.21	-1.91 – -0.51	0.002	-1.27	-1.98 – -0.56	0.002
WIP	0.24	0.12 – 0.38	<0.001	0.28	0.15 – 0.40	<0.001
Precip	0.35	0.10 – 0.59	0.006	0.48	0.28 – 0.69	<0.001
Temp	-0.01	-0.24 – 0.21	0.912	-0.03	-0.26 – 0.20	0.772
NDYI	0.16	0.00 – 0.33	0.046
GEOLOGIC AGE [Miocene-Eocene]	-1.08	-2.09 – -0.12	0.038	-1.14	-2.13 – -0.17	0.028
GEOLOGIC AGE [Oligocene-Eocene]	0.21	-0.54 – 0.95	0.546	0.26	-0.50 – 1.00	0.462
GEOLOGIC AGE [Pleistocene]	-0.04	-0.79 – 0.66	0.908	0.00	-0.73 – 0.70	0.994
GEOLOGIC AGE [pre-Tertiary]	0.02	-0.97 – 0.92	0.974	0.03	-0.96 – 0.96	0.942
GEOLOGIC AGE [Quaternary]	-0.59	-1.45 – 0.26	0.168	-0.67	-1.49 – 0.18	0.122
MNDWI				0.11	-0.01 – 0.23	0.070
Random Effects
σ²	0.64			0.64
τ₀₀	0.65 _{sample_ID}			0.61 _{sample_ID}
τ₁₁	0.51 _{sample_ID.depth_cm}			0.50 _{sample_ID.depth_cm}
ρ₀₁	0.94 _{sample_ID}			0.93 _{sample_ID}
ICC	0.51			0.49
N	96 _{sample_ID}			96 _{sample_ID}
Observations	481			481
Marginal R² / Conditional R²	0.174 / 0.591			0.182 / 0.582

We can now visualize our candidate model 2.2.

Dredging for model selection

The dredge function can be used to look through multiple combinations of models from a globally defined model. The number of parameters are limited so we run two models: one with the NDYI parameter and another with the MNDWI parameter included with Temp, Precip, and WIP fully interacting. GEOLOGIC_AGE is also included as an additional, non-interaction parameter and random effects are constant.

gmod1 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Temp * Precip *
    NDYI + GEOLOGIC_AGE + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F, na.action = "na.fail")
gmod2 <- lmer(log(carbon_stock_g_cm2) ~ WIP * Temp * Precip *
    MNDWI + GEOLOGIC_AGE + (1 + (depth_cm) | sample_ID), data = wa_dat_scale,
    REML = F, na.action = "na.fail")
dredge1 <- dredge(gmod1, beta = "sd")
dredge2 <- dredge(gmod2, beta = "sd")

	NDYI	Precip	Temp	WIP	NDYI:Precip	NDYI:Temp	NDYI:WIP	Precip:Temp	Precip:WIP	Temp:WIP	NDYI:Precip:Temp	NDYI:Precip:WIP	NDYI:Temp:WIP	Precip:Temp:WIP	NDYI:Precip:Temp:WIP	df	logLik	AICc	delta	weight
24	0.1299973	0.2590203	NA	0.1872057	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	13	-681.7415	1390.262	0.0000000	0.3260432
536	0.1361008	0.2516699	NA	0.1831620	NA	NA	NA	NA	-0.0527466	NA	NA	NA	NA	NA	NA	14	-681.1778	1391.257	0.9944499	0.1983047
22	NA	0.3486281	NA	0.2158639	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	12	-683.7578	1392.182	1.9198364	0.1248498
56	0.1448841	0.2478062	NA	0.1834030	-0.022364	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	14	-681.6511	1392.204	1.9410966	0.1235297
152	0.1270359	0.2618069	NA	0.1888159	NA	NA	-0.0070658	NA	NA	NA	NA	NA	NA	NA	NA	14	-681.7318	1392.365	2.1023649	0.1139600
32	0.1289036	0.2660776	-0.0081215	0.1868638	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	14	-681.7375	1392.376	2.1137575	0.1133127

	MNDWI	Precip	Temp	WIP	MNDWI:Precip	MNDWI:Temp	MNDWI:WIP	Precip:Temp	Precip:WIP	Temp:WIP	MNDWI:Precip:Temp	MNDWI:Precip:WIP	MNDWI:Temp:WIP	Precip:Temp:WIP	MNDWI:Precip:Temp:WIP	df	logLik	AICc	delta	weight
56	0.0668338	0.3640831	NA	0.2067567	0.0735801	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	14	-680.9016	1390.704	0.0000000	0.24640525
24	0.0907875	0.3500011	NA	0.2136153	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	13	-682.0105	1390.800	0.0959663	0.23486113
568	0.0722226	0.3608701	NA	0.2036170	0.0747569	NA	NA	NA	-0.0550311	NA	NA	NA	NA	NA	NA	15	-680.2725	1391.577	0.8727253	0.15927223
536	0.0965784	0.3466353	NA	0.2106885	NA	NA	NA	NA	-0.0544050	NA	NA	NA	NA	NA	NA	14	-681.4154	1391.732	1.0276861	0.14739772
22	NA	0.3486281	NA	0.2158639	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	12	-683.7578	1392.182	1.4778537	0.11768961
64	0.0634709	0.3963320	-0.0408569	0.2036933	0.0765553	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	15	-680.7958	1392.624	1.9194222	0.09437407

By using dredge we find that there are a few candidate models that fit the data well.

The difference between the model AICs is negligible. The first includes no interactions between the parameters and removes Temp as a predictor. The second includes an interaction between MNDWI and Precip while also excluding Temp.

We can look at the bootstrapped confidence intervals to examine the significance of the predictors between the two models.

	Dredge Model 1			Dredge Model 2
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	-1.21	-1.86 – -0.52	0.002	-1.25	-1.89 – -0.56	0.002
GEOLOGIC AGE [Miocene-Eocene]	-1.09	-2.10 – -0.14	0.036	-1.23	-2.22 – -0.23	0.010
GEOLOGIC AGE [Oligocene-Eocene]	0.21	-0.53 – 0.92	0.554	0.25	-0.48 – 0.97	0.472
GEOLOGIC AGE [Pleistocene]	-0.05	-0.77 – 0.60	0.872	-0.04	-0.73 – 0.63	0.916
GEOLOGIC AGE [pre-Tertiary]	0.00	-1.01 – 0.90	0.994	-0.03	-1.02 – 0.86	0.960
GEOLOGIC AGE [Quaternary]	-0.63	-1.43 – 0.23	0.154	-0.72	-1.46 – 0.09	0.080
NDYI	0.16	0.01 – 0.33	0.038
Precip	0.34	0.17 – 0.50	<0.001	0.47	0.35 – 0.59	<0.001
WIP	0.24	0.12 – 0.38	<0.001	0.27	0.15 – 0.40	<0.001
MNDWI				0.09	-0.04 – 0.21	0.194
MNDWI × Precip				0.09	-0.04 – 0.20	0.152
Random Effects
σ²	0.64			0.64
τ₀₀	0.65 _{sample_ID}			0.58 _{sample_ID}
τ₁₁	0.51 _{sample_ID.depth_cm}			0.50 _{sample_ID.depth_cm}
ρ₀₁	0.94 _{sample_ID}			0.93 _{sample_ID}
ICC	0.50			0.47
N	96 _{sample_ID}			96 _{sample_ID}
Observations	481			481
Marginal R² / Conditional R²	0.175 / 0.591			0.193 / 0.576

Dredge Model 1 includes NDYI, Precip, WIP, and GEOLOGIC_AGE as significant predictors. Dredge Model 2 does not have significant confidence intervals for the MNDWI and MNDWIxPrecip parameters.

There is also minimal differences between the R\(^{2}\) between the two models where Dredge Model 1 has R\(^{2}\) = 0.699 and Dredge Model 2 has R\(^{2}\) = 0.698

We now can examine the plot with the Dredge Model 1

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

### Random Forest Machine Learning

I also want to try Random Forest in order to see if a more flexible, machine learning model can capture any non-linear relationships with SOC and other variables.

depth_cm is included as a predictor here in the data setup

## mtry = 3  OOB error = 0.8515942 
## Searching left ...
## mtry = 2     OOB error = 0.8907029 
## -0.04592407 1e-04 
## Searching right ...
## mtry = 4     OOB error = 0.8483588 
## 0.003799248 1e-04 
## mtry = 6     OOB error = 0.8291902 
## 0.02259492 1e-04 
## mtry = 9     OOB error = 0.8306786 
## -0.001794988 1e-04
##   mtry  OOBError
## 2    2 0.8907029
## 3    3 0.8515942
## 4    4 0.8483588
## 6    6 0.8291902
## 9    9 0.8306786

rf_model <- randomForest((carbon_stock_g_cm2) ~ ., ntree = 1000,
    mtry = 6, importance = TRUE, data = full)


rf_model
## 
## Call:
##  randomForest(formula = (carbon_stock_g_cm2) ~ ., data = full,      ntree = 1000, mtry = 6, importance = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 1000
## No. of variables tried at each split: 6
## 
##           Mean of squared residuals: 0.8269377
##                     % Var explained: 50.93
plot(rf_model)
varImpPlot(rf_model)

~50% of the out of bag (OOB) variation is explained. Looks like after depth_cm Precip, Temp, WIP, and NDYI are the big drivers. The RF model does not appear to have

error.variable	value
MAE	0.3397694
MAE^2	0.1154432
RMSE	0.4511324
RMSE non-log	1.5700892
Stdev of non-log SOC stock	0.4698826
R^2 on full dataset	0.8994381

Look at partial dependency plots

## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:randomForest':
## 
##     combine
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:spatialEco':
## 
##     combine
## 
## Attaching package: 'grid'
## The following object is masked from 'package:spatialEco':
## 
##     explode
## The following object is masked from 'package:terra':
## 
##     depth

## [1] "1.578 mean square error rf model from training"
## [1] "0.113 R^2 rf model from training"
## [1] "0.07 R^2 rf model from testing "

Here, we show the fit between sampled and predicted SOC stocks for the full Random Forest Model

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'