Analysis of soil carbon controls across the Pacific Northwest

## 'data.frame':    481 obs. of  25 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID           : chr  "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" "H-C-01-DNR-WET" ...
##  $ site                : chr  "HOH" "HOH" "HOH" "HOH" ...
##  $ depth_cm            : int  13 40 76 104 11 30 45 75 9 27 ...
##  $ helper              : chr  "H-C-01-DNR-WET|13" "H-C-01-DNR-WET|40" "H-C-01-DNR-WET|76" "H-C-01-DNR-WET|104" ...
##  $ thickness_cm        : int  13 27 36 28 11 19 15 30 9 18 ...
##  $ rock_perc           : num  0 0 5 40 0 15 40 55 0 0 ...
##  $ BD_g_cm3            : num  0.07 0.16 0.51 0.88 0.38 0.49 0.7 0.57 0.72 0.95 ...
##  $ field_texture       : chr  "P" "MP" "MM" "C" ...
##  $ field_texture_binned: chr  "O" "O" "O" "C" ...
##  $ redox               : chr  NA NA NA NA ...
##  $ carbon_perc         : num  39.56 38.31 12.14 2.37 15.65 ...
##  $ carbon_stock_g_cm2  : num  0.376 1.678 2.105 0.352 0.65 ...
##  $ NDYI                : num  0.543 0.543 0.543 0.543 0.579 ...
##  $ Geomorphon_Class    : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Temp                : num  10.2 10.2 10.2 10.2 10.2 ...
##  $ Precip              : num  3070 3070 3070 3070 3054 ...
##  $ NDVI                : num  0.84 0.84 0.84 0.84 0.862 ...
##  $ MNDWI               : num  -0.477 -0.477 -0.477 -0.477 -0.446 ...
##  $ EVI                 : num  0.455 0.455 0.455 0.455 0.534 ...
##  $ SCI                 : num  0.649 0.649 0.649 0.649 0.648 ...
##  $ GEOLOGIC_AGE        : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...
##  $ WIP                 : num  0.873 0.873 0.873 0.873 0.896 ...
##  $ x                   : num  411350 411350 411350 411350 409931 ...
##  $ y                   : num  5295086 5295086 5295086 5295086 5294415 ...
## 'data.frame':    276 obs. of  25 variables:
##  $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sample_ID        : chr  "CC_S26_R3" "CC_S26_R3" "CC_S26_R3" "CC_S27_R3" ...
##  $ upper_depth      : int  0 30 60 0 30 60 0 30 60 0 ...
##  $ lower_depth      : int  30 60 100 30 60 100 30 60 100 30 ...
##  $ SOC_stock_spline : num  0.1636 0.0983 0.0531 0.6126 0.6936 ...
##  $ site             : chr  "COL" "COL" "COL" "COL" ...
##  $ geomorphons      : chr  "f" "f" "f" "i" ...
##  $ HLI              : num  0.849 0.849 0.849 0.502 0.502 ...
##  $ Precip           : num  629 629 629 664 664 ...
##  $ Temp             : num  6.66 6.66 6.66 6.03 6.03 ...
##  $ WIP              : num  0.209 0.209 0.209 0.127 0.127 ...
##  $ tree_canopy_cover: num  31 31 31 56 56 ...
##  $ NDVI             : num  0.537 0.537 0.537 0.739 0.739 ...
##  $ MNDWI            : num  -0.517 -0.517 -0.517 -0.554 -0.554 ...
##  $ EVI              : num  0.292 0.292 0.292 0.387 0.387 ...
##  $ SCI              : num  0.511 0.511 0.511 0.616 0.616 ...
##  $ SAVI             : num  0.29 0.29 0.29 0.383 0.383 ...
##  $ EMBI             : num  0.188 0.188 0.188 0.162 0.162 ...
##  $ DSI              : num  0.92 0.92 0.92 0.625 0.625 ...
##  $ DSWI1            : num  1.16 1.16 1.16 1.61 1.61 ...
##  $ NDYI             : num  0.226 0.226 0.226 0.36 0.36 ...
##  $ LSWI             : num  0.0556 0.0556 0.0556 0.2314 0.2314 ...
##  $ ANDWI            : num  -0.539 -0.539 -0.539 -0.643 -0.643 ...
##  $ LITHOLOGY        : chr  "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" ...
##  $ GEOLOGIC_AGE     : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...

With many potential predictors we can eliminate some by examining correlation plots

There is weak correlation between SOC_stock_spline and the selected covariates. Additionally there is some collinearity between predictors. The spectral predictors such as EVI (Enhanced Vegetation Index) and SCI (Soil Condition Index) are well correlated with each other and also with NDVI (Normalized Difference Vegetation Index). We can remove these based on the correlation coefficient > 0.7.

Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.

## 'data.frame':    276 obs. of  25 variables:
##  $ X                : num [1:276, 1] -1.72 -1.71 -1.7 -1.69 -1.67 ...
##   ..- attr(*, "scaled:center")= num 138
##   ..- attr(*, "scaled:scale")= num 79.8
##  $ sample_ID        : chr  "CC_S26_R3" "CC_S26_R3" "CC_S26_R3" "CC_S27_R3" ...
##  $ upper_depth      : num [1:276, 1] -1.1849 0.0492 1.2833 -1.1849 0.0492 ...
##   ..- attr(*, "scaled:center")= num 28.8
##   ..- attr(*, "scaled:scale")= num 24.3
##  $ lower_depth      : int  30 60 100 30 60 100 30 60 100 30 ...
##  $ SOC_stock_spline : num  0.1636 0.0983 0.0531 0.6126 0.6936 ...
##  $ site             : chr  "COL" "COL" "COL" "COL" ...
##  $ geomorphons      : chr  "f" "f" "f" "i" ...
##  $ HLI              : num [1:276, 1] 1.264 1.264 1.264 -0.907 -0.907 ...
##   ..- attr(*, "scaled:center")= num 0.647
##   ..- attr(*, "scaled:scale")= num 0.16
##  $ Precip           : num [1:276, 1] -1.21 -1.21 -1.21 -1.18 -1.18 ...
##   ..- attr(*, "scaled:center")= num 1870
##   ..- attr(*, "scaled:scale")= num 1022
##  $ Temp             : num [1:276, 1] -1.09 -1.09 -1.09 -1.48 -1.48 ...
##   ..- attr(*, "scaled:center")= num 8.42
##   ..- attr(*, "scaled:scale")= num 1.62
##  $ WIP              : num [1:276, 1] -0.801 -0.801 -0.801 -1.08 -1.08 ...
##   ..- attr(*, "scaled:center")= num 0.443
##   ..- attr(*, "scaled:scale")= num 0.292
##  $ tree_canopy_cover: num [1:276, 1] -2.25 -2.25 -2.25 -0.264 -0.264 ...
##   ..- attr(*, "scaled:center")= num 59.3
##   ..- attr(*, "scaled:scale")= num 12.6
##  $ NDVI             : num [1:276, 1] -3.5 -3.5 -3.5 -1.1 -1.1 ...
##   ..- attr(*, "scaled:center")= num 0.832
##   ..- attr(*, "scaled:scale")= num 0.0843
##  $ MNDWI            : num [1:276, 1] -0.324 -0.324 -0.324 -1.105 -1.105 ...
##   ..- attr(*, "scaled:center")= num -0.502
##   ..- attr(*, "scaled:scale")= num 0.0475
##  $ EVI              : num [1:276, 1] -1.932 -1.932 -1.932 -0.953 -0.953 ...
##   ..- attr(*, "scaled:center")= num 0.48
##   ..- attr(*, "scaled:scale")= num 0.0972
##  $ SCI              : num [1:276, 1] -2.81 -2.81 -2.81 -0.8 -0.8 ...
##   ..- attr(*, "scaled:center")= num 0.658
##   ..- attr(*, "scaled:scale")= num 0.0526
##  $ SAVI             : num [1:276, 1] -2.17 -2.17 -2.17 -1.02 -1.02 ...
##   ..- attr(*, "scaled:center")= num 0.466
##   ..- attr(*, "scaled:scale")= num 0.0811
##  $ EMBI             : num [1:276, 1] 1.52 1.52 1.52 1.3 1.3 ...
##   ..- attr(*, "scaled:center")= num 0.00762
##   ..- attr(*, "scaled:scale")= num 0.119
##  $ DSI              : num [1:276, 1] 3.03 3.03 3.03 1.2 1.2 ...
##   ..- attr(*, "scaled:center")= num 0.432
##   ..- attr(*, "scaled:scale")= num 0.161
##  $ DSWI1            : num [1:276, 1] -1.99 -1.99 -1.99 -1.36 -1.36 ...
##   ..- attr(*, "scaled:center")= num 2.58
##   ..- attr(*, "scaled:scale")= num 0.714
##  $ NDYI             : num [1:276, 1] -2.215 -2.215 -2.215 -0.683 -0.683 ...
##   ..- attr(*, "scaled:center")= num 0.419
##   ..- attr(*, "scaled:scale")= num 0.0874
##  $ LSWI             : num [1:276, 1] -2.61 -2.61 -2.61 -1.33 -1.33 ...
##   ..- attr(*, "scaled:center")= num 0.413
##   ..- attr(*, "scaled:scale")= num 0.137
##  $ ANDWI            : num [1:276, 1] 2.92 2.92 2.92 1.08 1.08 ...
##   ..- attr(*, "scaled:center")= num -0.703
##   ..- attr(*, "scaled:scale")= num 0.0564
##  $ LITHOLOGY        : chr  "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" "continental glacial outwash, Fraser-age" ...
##  $ GEOLOGIC_AGE     : chr  "Pleistocene" "Pleistocene" "Pleistocene" "Pleistocene" ...

Explicit parameter model building

Now build models using log transformed carbon stock data. We need to specify that sample_ID is a random effect because of the multiple samples at one location. lower_depth will be a random slope to adjust model based on how it is affected by depth. We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. We will model with the log-transformed soil carbon stocks in the linear mixed model

# Full model with hypothesized parameters
mod1 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp *
    ANDWI * NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# No interactions
mod2 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# No spectral
mod3 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No geology
mod4 <- lmer(log10(SOC_stock_spline) ~ WIP * Precip * Temp +
    ANDWI + NDYI + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No WIP w/ interaction
mod5 <- lmer(log10(SOC_stock_spline) ~ Precip * Temp + ANDWI +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 1.5562 (tol = 0.002, component 1)
# No WIP w/o interaction
mod6 <- lmer(log10(SOC_stock_spline) ~ Precip + Temp + ANDWI +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No climate
mod7 <- lmer(log10(SOC_stock_spline) ~ WIP + +ANDWI + NDYI +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No climate No spectral
mod8 <- lmer(log10(SOC_stock_spline) ~ WIP + (GEOLOGIC_AGE) +
    lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0035401 (tol = 0.002, component 1)
# Just WIP
mod9 <- lmer(log10(SOC_stock_spline) ~ WIP + lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
# NULL
mod10 <- lmer(log10(SOC_stock_spline) ~ lower_depth + (lower_depth ||
    sample_ID), data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.431569 (tol = 0.002, component 1)

Note models 5, 8, and 10, failed to converge

Pairwise comparisons between the top, largest model and the rest show if models with fewer parameters are significantly different. Considering AIC will help determine the most parsimonious model with the best fit to the data.

Models	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)	Significant
mod2	15	220.4259	274.7319	-95.21296	190.4259	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	36.52080	26	0.0824988
mod3	17	223.7913	285.3381	-94.89564	189.7913	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	35.88615	24	0.0562988
mod4	14	227.0389	277.7245	-99.51945	199.0389	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	45.13377	27	0.0157401	***
mod5	15	229.4435	283.7495	-99.72175	199.4435	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	45.53837	26	0.0102674	***
mod6	14	229.4083	280.0939	-100.70415	201.4083	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	47.50318	27	0.0087196	***
mod7	13	230.4420	277.5072	-102.22099	204.4420	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	50.53685	28	0.0056307	***
mod8	11	247.7352	287.5596	-112.86760	225.7352	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	71.83007	30	0.0000275	***
mod9	6	245.6252	267.3476	-116.81262	233.6252	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	79.72011	35	0.0000243	***
mod10	5	247.8359	265.9379	-118.91797	237.8359	NA	NA	NA
mod1	41	235.9051	384.3416	-76.95256	153.9051	83.93081	36	0.0000107	***

From the ANOVAs it looks like model 2 & model 3 performed just as well as model 1meaning we should proceed with these. model 2 removed interactions and model 3 removed all the spectral metrics but kept interaction. However, model 2 has 2 less parameters than model 3 which gives it a lower AIC when compared together.

	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)
mod2	15	220.4259	274.7319	-95.21296	190.4259	NA	NA	NA
mod3	17	223.7913	285.3381	-94.89564	189.7913	0.6346545	2	0.7280925

To hone in on differences between variations in model 2 we can remove some of the variables and compare AICs for a more parsimonious fit.

# No ANDWI
mod2.1 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    NDYI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
# No ANDWI, No NDYI
mod2.2 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0070341 (tol = 0.002, component 1)
# No NDYI
mod2.3 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + (GEOLOGIC_AGE) + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00238014 (tol = 0.002, component 1)
# No Geology
mod2.4 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + NDYI + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.171981 (tol = 0.002, component 1)
# No NDYI or Geology
mod2.5 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    ANDWI + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0427664 (tol = 0.002, component 1)
# No ANDWI or Geology
mod2.6 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    NDYI + lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)
# No spectral or Geology
mod2.7 <- lmer(log10(SOC_stock_spline) ~ WIP + Precip + Temp +
    lower_depth + (lower_depth || sample_ID), data = wa_spl_dat_scale,
    REML = F)

Note models 2, 3, 4, 5 failed to converge

Models	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)	Significant
mod2.1	14	218.8356	269.5212	-95.41779	190.8356	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	0.409648	1	0.5221482
mod2.2	13	219.4581	266.5234	-96.72907	193.4581	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	3.032215	2	0.2195649
mod2.3	14	221.3225	272.0082	-96.66127	193.3225	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	2.896615	1	0.0887658
mod2.4	10	221.4128	257.6168	-100.70639	201.4128	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	10.986856	5	0.0516412
mod2.5	9	227.2392	259.8228	-104.61960	209.2392	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	18.813263	6	0.0044909	***
mod2.6	9	220.0621	252.6457	-101.03105	202.0621	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	11.636166	6	0.0705954
mod2.7	8	226.0611	255.0243	-105.03055	210.0611	NA	NA	NA
mod2	15	220.4259	274.7319	-95.21296	190.4259	19.635164	7	0.0064141	***

Model 2.1 and 2.2 are the lowest AIC and not significantly different from the fit in Model 2. Because Model 2.2 failed to converge, we will move forward with Model 2.1.

Now we can look at the table of all models and compare AICs. We find that Model 2.1 has the lowest AIC followed by Model 2.2 which failed to converge

AIC	formula	name	delta
235.91	WIP * Precip * Temp * ANDWI * NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod1	17.07
220.43	WIP + Precip + Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2	1.59
223.79	WIP * Precip * Temp + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod3	4.95
227.04	WIP * Precip * Temp + ANDWI + NDYI + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod4	8.20
229.44	Precip * Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod5	10.60
229.41	Precip + Temp + ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod6	10.57
230.44	WIP + +ANDWI + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod7	11.60
247.74	WIP + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod8	28.90
245.63	WIP + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod9	26.79
218.84	WIP + Precip + Temp + NDYI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.1	0.00
219.46	WIP + Precip + Temp + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.2	0.62
221.32	WIP + Precip + Temp + ANDWI + (GEOLOGIC_AGE) + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.3	2.48
221.41	WIP + Precip + Temp + ANDWI + NDYI + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.4	2.57
227.24	WIP + Precip + Temp + ANDWI + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.5	8.40
220.06	WIP + Precip + Temp + NDYI + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.6	1.22
226.06	WIP + Precip + Temp + lower_depth + ((1 \| sample_ID) + (0 + lower_depth \| sample_ID))	mod2.7	7.22

The model \(R^2\) for mod2.1 is 0.892

Confidence intervals calculated with bootstrapping show that WIP and Precip are the most significant contributors to SOC stock. the overlapping confidence intervals do not show much separation between WIP and Precip but there is more variation in the Precip confidence interval.

	Model 2.1
Predictors	Estimates	CI	p
(Intercept)	-0.24	-0.63 – 0.11	0.192
WIP	0.12	0.05 – 0.19	<0.001
Precip	0.18	0.03 – 0.32	0.020
Temp	-0.00	-0.12 – 0.12	0.992
NDYI	0.07	-0.02 – 0.16	0.116
GEOLOGIC AGE [Miocene-Eocene]	-0.44	-0.90 – 0.09	0.108
GEOLOGIC AGE [Oligocene-Eocene]	0.09	-0.29 – 0.52	0.654
GEOLOGIC AGE [Pleistocene]	0.04	-0.33 – 0.46	0.844
GEOLOGIC AGE [pre-Tertiary]	-0.05	-0.56 – 0.48	0.846
GEOLOGIC AGE [Quaternary]	-0.28	-0.70 – 0.20	0.280
lower depth	-0.01	-0.01 – -0.01	<0.001
Random Effects
σ²	0.05
τ₀₀ _{sample_ID}	0.04
τ₁₁ _{sample_ID.lower_depth}	0.00
ρ₀₁
ρ₀₁
ICC	0.40
N _{sample_ID}	96
Observations	276
Marginal R² / Conditional R²	0.554 / 0.731

Looking at some partial dependence plots we can see some relationships between the predicted values and WIP and Precip. The pattern in Precip shows the differences between the two study areas. with wetter areas having a higher range of SOC stock

## 
## Attaching package: 'ggpubr'
## The following object is masked from 'package:cowplot':
## 
##     get_legend
## The following object is masked from 'package:terra':
## 
##     rotate

We can now visualize our candidate model 2.1 in a predicted vs. actual plot

Dredging for model selection

With the explicit hypothesized approach, we can only evaluate so many models. However, there may be interactions between variables that might be more powerful controls and could give insight into relationships between controls.

The dredge function can be used to look through multiple combinations of models from a globally defined model. The number of parameters are limited so we run two models: one with the NDYI parameter and another with the MNDWI parameter included with Temp, Precip, and WIP fully interacting. GEOLOGIC_AGE is also included as an additional, non-interaction parameter and random effects are constant.

gmod1 <- lmer(log10(SOC_stock_spline) ~ WIP * Temp * Precip *
    NDYI + GEOLOGIC_AGE + lower_depth + (lower_depth || sample_ID),
    data = wa_spl_dat_scale, REML = F, na.action = "na.fail")
dredge1 <- dredge(gmod1, beta = "sd")

	GEOLOGIC_AGE	lower_depth	NDYI	Precip	Temp	WIP	NDYI:Precip	NDYI:Temp	NDYI:WIP	Precip:Temp	Precip:WIP	Temp:WIP	NDYI:Precip:Temp	NDYI:Precip:WIP	NDYI:Temp:WIP	Precip:Temp:WIP	NDYI:Precip:Temp:WIP	df	logLik	AICc	delta	weight
24575	NA	-0.3859947	0.7855336	-0.1358170	0.3672985	0.1104673	-0.3213815	0.4300134	0.1622869	-0.1453279	-0.2295751	0.0928943	-0.5578764	NA	0.1604055	NA	NA	17	-90.27224	216.9166	0.000000	0.2995148
48		-0.3836525	0.1317391	0.3088780	NA	0.2106672	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	13	-95.41784	218.2250	1.308419	0.1557036
14335	NA	-0.3851104	0.7752096	-0.0954976	0.3346213	0.0855746	-0.1921094	0.2650052	0.1003702	-0.1163907	-0.1221449	NA	-0.5543885	0.1354660	NA	NA	NA	16	-92.10787	218.3161	1.399547	0.1487683
57343	NA	-0.3860028	0.7545347	-0.1191113	0.3462410	0.1768655	-0.3177870	0.4410082	0.1212566	-0.1486504	-0.1461895	0.0151525	-0.5340127	NA	0.2205693	-0.11752	NA	18	-89.84743	218.3563	1.439757	0.1458072
1072		-0.3836264	0.1396997	0.2998503	NA	0.2037453	NA	NA	NA	NA	-0.0801549	NA	NA	NA	NA	NA	NA	14	-94.51771	218.6446	1.728044	0.1262346
32767	NA	-0.3865633	0.7828342	-0.1474136	0.3813838	0.1424312	-0.3788402	0.4947691	0.1940555	-0.1510079	-0.2464318	0.0962508	-0.5516318	-0.1055494	0.2452163	NA	NA	18	-90.00966	218.6808	1.764225	0.1239715

By using dredge we find that there are a few candidate models that fit the data well.

The difference between the model AICs is lower than a common threshold of 2 but there is a difference between the top model and others. This model contains multiple interactions including NDYI:Precip, NDYI:Temp, NDYI:WIP Precip:Temp, Precip:WIP, Temp:WIP,NDYI:Precip:Temp

We can visualize some of these interactions with interaction plots against SOC stock. It looks like there is strong interaction and separation with the NDYI parameter and WIP

We can look at the bootstrapped confidence intervals to examine the significance of the predictors between the two models. Interestingly the WIP parameter alone is not significant. And there is high effect sizes for NDYI, NDYI:Temp, and (NDYI × Precip) × Temp

	Dredge Model
Predictors	Estimates	CI	p
(Intercept)	-0.16	-0.30 – -0.02	0.014
lower depth	-0.01	-0.01 – -0.01	<0.001
NDYI	0.44	0.24 – 0.64	<0.001
Precip	-0.07	-0.27 – 0.12	0.428
Temp	0.21	0.02 – 0.39	0.028
WIP	0.06	-0.01 – 0.13	0.074
NDYI × Precip	-0.20	-0.33 – -0.06	0.002
NDYI × Temp	0.25	0.08 – 0.42	0.006
NDYI × WIP	0.09	0.01 – 0.18	0.038
Precip × Temp	-0.11	-0.26 – 0.03	0.104
Precip × WIP	-0.14	-0.27 – 0.00	0.064
Temp × WIP	0.06	-0.07 – 0.16	0.326
(NDYI × Precip) × Temp	-0.29	-0.47 – -0.11	0.002
(NDYI × Temp) × WIP	0.08	0.01 – 0.15	0.032
Random Effects
σ²	0.05
τ₀₀ _{sample_ID}	0.02
τ₁₁ _{sample_ID.lower_depth}	0.00
ρ₀₁
ρ₀₁
ICC	0.32
N _{sample_ID}	96
Observations	276
Marginal R² / Conditional R²	0.611 / 0.735

The R\(^{2}\) of the Dredge Model is R\(^{2}\) = 0.897

We now can examine the plot with the Dredge Model

Random Forest Machine Learning

Using Random Forest can see if a more flexible, machine learning model can capture any non-linear relationships or interaction relationships between parameters.

lower_depth is included as a predictor here in the data setup

We then use the tuneRF to choose the appropriate mtry number

rf_model <- randomForest((SOC_stock_spline) ~ ., ntree = 1000,
    mtry = 4, importance = TRUE, data = full)
plot(rf_model)

rf.full <- predict(rf_model, newdata = full)
vip::vip(rf_model)

61.63% of the out of bag (OOB) variation is explained. Looks likePrecip, lower_depth, WIP, and NDYI are the big drivers.

error.variable	value
MAE	0.1306526
MAE^2	0.0170701
RMSE	0.1768086
RMSE non-log	1.1934027
Stdev of non-log SOC stock	0.4999442
R^2 on full dataset	0.9230161

Looking at partial dependency plots we can see how Precip and NDYI have higher influence on separating variation in model predictions. WIP less so but seems to play a role on the higher end.

Here, we show the fit between sampled and predicted SOC stocks for the full Random Forest Model

Analysis of soil carbon controls across the Pacific Northwest

Anthony Stewart

Explicit parameter model building

Dredging for model selection

Random Forest Machine Learning