Wetland Feature Maps
Background
In my PhD, I did a lot of wetland mapping with the Wetland Intrinsic Potential (WIP) tool. This tool took a pixel-based approach, meaning that every pixel was modeled using points that extracted feature information at that location. For example, a digital elevation model (DEM) would be used to calculate slope, and then a point location would extract that slope value for that pixel (or for a group of pixels, with some sampling interpolation). With a bunch of features, especially those calculated at different length scales (see the paper for details), we could confidently map wetlands using a random forest algorithm and produce a continuous probability.
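To make the pixel-based idea concrete, here is a minimal sketch of computing slope from a DEM and reading it back at point locations. It is not the WIP tool's code: the function names `slope_degrees` and `extract_at_points`, the synthetic DEM, and the point locations are all made up for illustration, and the slope uses plain finite differences rather than whatever method WIP uses.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from a 2-D elevation grid using finite differences."""
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def extract_at_points(raster, rows, cols):
    """Nearest-pixel extraction: read the raster value under each point."""
    return raster[np.asarray(rows), np.asarray(cols)]

# Synthetic DEM (assumption): a plane rising 1 m per 1 m cell in the x direction
dem = np.tile(np.arange(5, dtype=float), (5, 1))
slope = slope_degrees(dem, cell_size=1.0)

# Hypothetical training points given as (row, col) pixel indices
rows, cols = [1, 3], [2, 4]
features = extract_at_points(slope, rows, cols)
print(features)
```

In a real workflow, values like these would be extracted for many features and stacked into a table, one row per point, before being handed to the random forest.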

But one thing that these pixel-based approaches find difficult is segmenting groups of pixels into objects. In wetland terms, a group of high wetland probability pixels could be grouped into one “wetland”. There are pros and cons to this. Wetlands rarely have sharp boundaries, but wetland delineators often need to establish boundaries for regulations. Segmentation, also known as object-based image analysis, can happen before the modeling, for example by segmenting groups of pixels and then feeding the mean (or another metric) of their feature values into the model. It can also happen after the modeling: in the WIP's case, we could smooth the pixel values to remove spottiness and gaps and then create groups above certain thresholds.
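The post-modeling route can be sketched in a few lines: smooth the probability surface, threshold it, then group touching pixels into objects. This is only an illustration under my own assumptions (a 3x3 mean filter, a 0.5 threshold, 4-connected grouping, and a toy probability grid), not the procedure we actually ran on WIP outputs.

```python
from collections import deque
import numpy as np

def box_smooth(prob, radius=1):
    """Mean filter over a (2*radius+1)^2 window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(prob, radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1))

def label_objects(mask):
    """Label 4-connected groups of True pixels; 0 means background."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue  # already assigned to an earlier object
        current += 1
        labels[r, c] = current
        queue = deque([(r, c)])
        while queue:  # breadth-first flood fill from the seed pixel
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current

# Toy probability surface (assumption): two wetland patches, one with a gap
prob = np.zeros((8, 8))
prob[1:4, 1:4] = 0.9
prob[5:8, 5:8] = 0.8
prob[2, 2] = 0.1  # a "spotty" low pixel inside the first patch

smoothed = box_smooth(prob, radius=1)
mask = smoothed > 0.5
labels, n_objects = label_objects(mask)
print(n_objects)
```

Smoothing fills the gap inside the first patch, but it also trims the patch corners, which is one reason the window size and threshold are judgment calls.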
However, deep learning algorithms are able to complete image segmentation without these pre- and post-processing steps. These algorithms, like convolutional neural networks, search images for features and then group pixels together based on those features, giving each pixel a label and a probability. For wetland mapping this was very interesting, since we could tackle the segmentation problem with the algorithm itself.
But what are these deep learning features? In the WIP random forest model, we calculated several of our own features at multiple length scales to capture the spatial variability that might correspond to wetland presence. In convolutional neural networks such as the U-Net, features are captured with convolutions and resampling down to multiple levels of resolution. The features are similar to hot spots of variation in an image, but they draw on all bands. They might not look anything like the engineered terrain features that we calculated for WIP.
So that brings me to this attempt to visualize those features as they are built in the deep learning model.
The deep learning model architecture
We are using a convolutional neural network architecture called a U-Net. Originally used for medical images, it has been adapted to other areas, including geospatial. Specifically, we are using U-Net3+, which aims to take advantage of all scales used in the regular U-Net.

The U-Net architecture has three main components: encoders, decoders, and a bottleneck. Input images from a training dataset enter the model pipeline at the encoder. Encoders are where convolutions take place and the feature maps are created. The convolutions usually use 3x3 kernels, and each kernel produces one feature map. In our U-Net, the first encoder starts with 64 convolution kernels, creating 64 feature maps. A ReLU activation function then thresholds each map at 0, setting negative values to zero so that only positive responses pass through. A max pooling filter then moves across each of these activated feature maps to downsample it (usually by 1/2, e.g. 256x256 -> 128x128), and the downsampled maps move on to the next encoder. The next encoder doubles to 128 convolution kernels and therefore 128 feature maps, which similarly go through ReLU activation and max pooling. At each encoder level the feature maps get downsampled further until they reach the bottleneck, where the lowest resolution but highest number of feature maps reside.
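The shape bookkeeping in one encoder step can be sketched with NumPy alone. This is not our model code: the 4-band input and the random weights are assumptions standing in for a real image and learned kernels, and a real U-Net encoder typically stacks more than one convolution per level. The point is only to show where the 64 feature maps come from and which operation halves the resolution.

```python
import numpy as np

def conv3x3_same(x, weights):
    """3x3 convolution with zero padding: (C_in, H, W) -> (C_out, H, W).

    As in deep learning libraries, this is technically cross-correlation.
    """
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    # windows has shape (C_in, H, W, 3, 3): one 3x3 patch per pixel and band
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(1, 2))
    # contract over input bands and the 3x3 patch -> (C_out, H, W)
    return np.tensordot(weights, windows, axes=([1, 2, 3], [0, 3, 4]))

def relu(x):
    """Set negative responses to zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """2x2 max pooling applied to each feature map separately."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.standard_normal((4, 256, 256))    # assumed 4-band input tile
weights = rng.standard_normal((64, 4, 3, 3))  # 64 kernels (random, not learned)

features = relu(conv3x3_same(image, weights))  # 64 maps at full resolution
pooled = max_pool2x2(features)                 # 64 maps at half resolution
print(features.shape, pooled.shape)
```

Note that the convolution keeps the 256x256 grid and the pooling step is what halves it, while the number of maps stays at 64 until the next encoder's kernels double it.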
Below is a sampling of images that moved through the encoder layers.

The bottleneck is the most abstract level.
Visualizing the feature maps learned by a deep learning model for wetland mapping. The interactive notebook is embedded below.