S. Hu, H. Qi, Z. Wang et al.
Environmental Science and Ecotechnology 30 (2026) 100682
In the decoder, shallow features from the ResNet-101 encoder were first passed through a 1 × 1 convolution layer. In parallel, the ASPP-derived deep semantic features were upsampled by a factor of four (bilinear interpolation) to restore their spatial resolution to that of the shallow feature maps. The transformed shallow features were concatenated with the upsampled deep features along the channel dimension and then refined by a 3 × 3 convolution layer. Finally, the resulting feature map was upsampled by an additional factor of four (bilinear interpolation) to recover the original input image resolution and produce a pixel-level semantic segmentation map.

Fig. 3. Architecture of the DeepLab v3+ model. The DeepLab v3+ model adopts an encoder-decoder architecture. The encoder employs a ResNet-101 backbone integrated with atrous spatial pyramid pooling (ASPP) to capture multiscale contextual information for semantic segmentation. The decoder fuses shallow and deep features to refine object boundaries and progressively upsamples feature maps to generate high-resolution segmentation maps. Conv: convolution.

2.4.2. BERT model

The BERT architecture comprises three main components: a vector embedding layer, a pretrained BERT model, and a linear classifier [40]. The embedding layer transformed the textual input into word, text, and position vectors via token, segment, and position embeddings (Fig. 4). These representations were then fed into the pretrained BERT model, which consisted of multiple stacked Transformer encoder layers with identical architectures, each comprising two sublayers: a multihead attention mechanism and a feedforward neural network.

Fig. 4. Architecture of the BERT model. Text inputs are vectorized using token, segment, and position embeddings and encoded by stacked Transformer layers. The final representations are fed into a softmax-linear classifier to generate predicted class labels. The two panels on the right show enlarged views of the Transformer block and the classifier head block. BERT: bidirectional encoder representations from transformers.

The linear classifier included a fully connected layer and a Softmax activation function. We used the fully connected layer to map the output of the BERT model into raw score vectors (z) for the different categories according to equation (1):

z = Wx + b (1)

where x is the output of the BERT model, W is the weight matrix, b is the bias vector, and z is the raw score vector. Subsequently, we mapped z to the 0-1 range using the Softmax activation function, which ensures that all elements sum to 1. Finally, we classified the text using equation (2):
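The decoder's fusion and upsampling steps can be sketched shape-by-shape in NumPy; nearest-neighbour repetition stands in for bilinear interpolation, and random arrays stand in for convolution outputs (all dimensions are illustrative, not taken from the paper):

```python
import numpy as np

def upsample4(x):
    # Stand-in for 4x bilinear upsampling: repeat each pixel 4x along H and W.
    return x.repeat(4, axis=1).repeat(4, axis=2)

# Illustrative feature maps in (channels, height, width) layout for a 512x512 input
shallow = np.random.rand(48, 128, 128)   # shallow encoder features after the 1x1 conv (1/4 resolution)
deep = np.random.rand(256, 32, 32)       # ASPP output (1/16 resolution)

deep_up = upsample4(deep)                           # 4x upsample -> (256, 128, 128)
fused = np.concatenate([shallow, deep_up], axis=0)  # channel-wise concat -> (304, 128, 128)
# A 3x3 conv would refine `fused` here; this sketch only tracks the shapes.
out = upsample4(fused)                              # final 4x upsample -> (304, 512, 512)
```

The two 4× steps recover the full 512 × 512 input resolution, matching the 1/16 and 1/4 scales of the deep and shallow feature maps.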
p_i = exp(z_i) / Σ_{j=1}^{C} exp(z_j) (2)
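Taken together, equations (1) and (2) amount to a linear layer followed by a softmax. A minimal NumPy sketch with random stand-in weights (the 768-dimensional BERT output is an assumption based on BERT-base, not stated in the text):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H = 5, 768                    # 5 classes; assumed 768-dim BERT output (BERT-base)

x = rng.standard_normal(H)       # stand-in for the BERT output of one text
W = rng.standard_normal((C, H)) * 0.01   # weight matrix (random, untrained)
b = np.zeros(C)                  # bias vector

z = W @ x + b                    # equation (1): raw score vector
p = np.exp(z) / np.exp(z).sum()  # equation (2): softmax probabilities
label = int(np.argmax(p))        # predicted class
```

The softmax guarantees that `p` is a valid probability distribution over the five categories, so the predicted class is simply the index of the largest probability.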
where p_i is the probability that the input sample belongs to class i, z_i is the raw score of the input sample for class i, and C is the number of classes (C = 5).

2.4.3. Carbon accounting model

To better capture the relationship between functional area extents and the carbon emissions generated by PPPs, we developed an area-based carbon accounting model according to equation (3):

E_{c,i} = a_i × A_{p,i} + b_i × A_{r,i} + c_i × A_{o,i} + d_i × A_{w,i} + e_i × A_{t,i} + f_i × A_{b,i} + ε_i (3)
where E_{c,i} refers to the carbon emissions of the i-th type of PPP (ton of CO2); A_{p,i}, A_{r,i}, A_{o,i}, A_{w,i}, A_{t,i}, and A_{b,i} represent the primary
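Because equation (3) is linear in the six functional-area extents, its coefficients a_i through f_i can be estimated by ordinary least squares. A hedged sketch on synthetic data (the sample size, area values, and coefficients are all illustrative, not from the study):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50                                    # hypothetical number of parks of one PPP type

# Columns: A_p, A_r, A_o, A_w, A_t, A_b (functional-area extents, arbitrary units)
A = rng.uniform(1, 100, size=(n, 6))
true_coef = np.array([2.0, 0.5, 0.3, 0.8, 1.2, 0.1])      # stand-ins for a_i..f_i
E = A @ true_coef + rng.normal(0, 0.1, n)                  # emissions with noise term ε_i

# Least-squares estimates of the six coefficients in equation (3)
coef, *_ = np.linalg.lstsq(A, E, rcond=None)
```

With low noise the recovered coefficients closely match the ones used to generate the data, illustrating how per-type emission intensities could be fitted from observed areas and emissions.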