LSTM Model Variants

news 2025/11/9 2:07:03

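I. GRU

1. What is GRU

The GRU (Gated Recurrent Unit) is a variant of the recurrent neural network (RNN) designed to mitigate the gradient problems, chiefly vanishing gradients, that a plain RNN can run into on long sequences. It uses gating to control how information flows through time, which lets the model capture long-range dependencies more reliably.

Its main distinguishing feature is a simpler structure than the LSTM (Long Short-Term Memory): a GRU keeps a single hidden state and has no separate cell state. It can still learn long-term dependencies effectively, and with three gate weight blocks instead of the LSTM's four it is usually cheaper to compute.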
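As a rough check on the "simpler structure" claim, the sketch below counts trainable parameters in PyTorch's `nn.GRU` and `nn.LSTM` for the same sizes. The helper `count_params` and the sizes are illustrative choices, not from the original post; the expected counts assume PyTorch's layout of separate input and hidden bias vectors for each gate block.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module (illustrative helper)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


input_size, hidden_size = 3, 2  # arbitrary sizes for the comparison

gru_params = count_params(nn.GRU(input_size, hidden_size))
lstm_params = count_params(nn.LSTM(input_size, hidden_size))

# Each gate block has weights of shape (hidden, input + hidden) plus two bias vectors.
per_block = hidden_size * (input_size + hidden_size) + 2 * hidden_size

print(gru_params, 3 * per_block)   # GRU: reset, update, candidate
print(lstm_params, 4 * per_block)  # LSTM: input, forget, cell candidate, output
```

For equal sizes the GRU therefore carries three quarters of the LSTM's parameters, which is where most of its speed advantage comes from.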

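2. Basic structure of a GRU

A GRU cell contains two main gates: the reset gate and the update gate. Together they decide how new input is combined with the previous memory.

The reset gate ($r_t$) determines how much of the previous state $h_{t-1}$ is used when computing the current candidate activation $\tilde{h}_t$.
The update gate ($z_t$) determines how the previous state $h_{t-1}$ and the candidate activation $\tilde{h}_t$ are combined to produce the current state $h_t$.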
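Written out, one common formulation of these gates is given below. It follows the convention of the from-scratch code in this article, where each weight matrix acts on the concatenation $[x_t; h_{t-1}]$ and $z_t$ weights the candidate. Some libraries, PyTorch's `nn.GRU` among them, swap the roles of $z_t$ and $1 - z_t$ and use separate input and hidden weight matrices.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z\,[x_t;\, h_{t-1}] + b_z\right) \\
r_t &= \sigma\left(W_r\,[x_t;\, h_{t-1}] + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h\,[x_t;\, r_t \odot h_{t-1}] + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication. When $z_t$ is close to 0, the cell copies $h_{t-1}$ forward almost unchanged, which is what lets information and gradients survive across many steps.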

3. Model structure

[Figure: GRU model diagram]

[Figure: internal structure of a GRU cell]

4. Code implementation

From-scratch implementation

import numpy as np


class GRU:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        # Each weight matrix acts on the concatenation [x_t, h_{t-1}]; biases start at zero
        # Update gate
        self.W_z = np.random.randn(hidden_size, hidden_size + input_size)
        self.b_z = np.zeros(hidden_size)
        # Reset gate
        self.W_r = np.random.randn(hidden_size, hidden_size + input_size)
        self.b_r = np.zeros(hidden_size)
        # Candidate hidden state
        self.W_h = np.random.randn(hidden_size, hidden_size + input_size)
        self.b_h = np.zeros(hidden_size)

    def tanh(self, x):
        return np.tanh(x)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def forward(self, x, h_prev=None):
        # Start from a zero hidden state only when no previous state is passed in
        if h_prev is None:
            h_prev = np.zeros((self.hidden_size,))
        concat_input = np.concatenate([x, h_prev], axis=0)
        z_t = self.sigmoid(np.dot(self.W_z, concat_input) + self.b_z)
        r_t = self.sigmoid(np.dot(self.W_r, concat_input) + self.b_r)
        # The reset gate scales the previous state before the candidate is computed
        concat_reset_input = np.concatenate([x, r_t * h_prev], axis=0)
        h_hat_t = self.tanh(np.dot(self.W_h, concat_reset_input) + self.b_h)
        # The update gate interpolates between the old state and the candidate
        h_t = (1 - z_t) * h_prev + z_t * h_hat_t
        return h_t


# Test data
input_size = 3
hidden_size = 2
seq_len = 4

x = np.random.randn(seq_len, input_size)
gru = GRU(input_size, hidden_size)

all_h = []
h_t = None
for t in range(seq_len):
    # Feed the previous hidden state back in so the state carries across time steps
    h_t = gru.forward(x[t, :], h_t)
    all_h.append(h_t)

print(np.array(all_h).shape)  # (seq_len, hidden_size)

nn.GRUCell

import torch
import torch.nn as nn


class GRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.gru_cell = nn.GRUCell(input_size, hidden_size)

    def forward(self, x, h=None):
        # nn.GRUCell falls back to a zero hidden state when h is None
        h_t = self.gru_cell(x, h)
        return h_t


# Test data
input_size = 3
hidden_size = 2
seq_len = 4

gru_model = GRUCell(input_size, hidden_size)
x = torch.randn(seq_len, input_size)

all_h = []
h_t = None
for t in range(seq_len):
    # Pass the previous hidden state so each step builds on the last one
    h_t = gru_model(x[t], h_t)
    all_h.append(h_t)

print(all_h)
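To connect `nn.GRUCell` with the gate equations, the following sketch recomputes one step by hand from the cell's own parameters and compares it with the module's output. It assumes PyTorch's documented layout, in which `weight_ih` and `weight_hh` stack the reset, update, and candidate ("new") blocks in that order, and in which the update gate weights the previous state (the opposite of the from-scratch class's convention). The function name `manual_gru_step` is illustrative.

```python
import torch
import torch.nn as nn


def manual_gru_step(cell: nn.GRUCell, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Recompute one nn.GRUCell step from its parameters (illustrative helper)."""
    # Input and hidden projections, each split into reset / update / new blocks
    gi = x @ cell.weight_ih.T + cell.bias_ih
    gh = h @ cell.weight_hh.T + cell.bias_hh
    i_r, i_z, i_n = gi.chunk(3, dim=-1)
    h_r, h_z, h_n = gh.chunk(3, dim=-1)

    r = torch.sigmoid(i_r + h_r)   # reset gate
    z = torch.sigmoid(i_z + h_z)   # update gate
    n = torch.tanh(i_n + r * h_n)  # candidate state
    return (1 - z) * n + z * h     # PyTorch convention: z keeps the old state


torch.manual_seed(0)
cell = nn.GRUCell(input_size=3, hidden_size=2)
x = torch.randn(5, 3)  # batch of 5 inputs
h = torch.randn(5, 2)  # matching previous hidden states

with torch.no_grad():
    matches = torch.allclose(manual_gru_step(cell, x, h), cell(x, h), atol=1e-6)
print(matches)
```

Note that the reset gate here multiplies the hidden projection after the matrix product, not the hidden state before it, so the result differs slightly from the concatenation-based formulation even with identical weights.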

nn.GRU

import torch
import torch.nn as nn


class GRU(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.gru = nn.GRU(input_size, hidden_size)

    def forward(self, x):
        # nn.GRU returns the per-step outputs and the final hidden state
        out, h_n = self.gru(x)
        return out, h_n


# Test data
input_size = 3
hidden_size = 2
seq_len = 4
batch_size = 5

# Default layout is (seq_len, batch_size, input_size)
x = torch.randn(seq_len, batch_size, input_size)
gru_model = GRU(input_size, hidden_size)
out, h_n = gru_model(x)

print(out)
print(out.shape, h_n.shape)
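`nn.GRU` returns two values, and it helps to see how they relate. The sketch below, using illustrative sizes, checks the shapes of both and confirms that for a single-layer, unidirectional GRU the last time step of the output sequence equals the final hidden state.

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 4, 5, 3, 2

gru = nn.GRU(input_size, hidden_size)  # expects (seq_len, batch, input_size) by default
x = torch.randn(seq_len, batch_size, input_size)

output, h_n = gru(x)

print(output.shape)  # hidden state at every time step: (seq_len, batch, hidden_size)
print(h_n.shape)     # final hidden state per layer: (num_layers, batch, hidden_size)

# With one layer and one direction, the last output step is the final hidden state.
same = torch.allclose(output[-1], h_n[0])
print(same)
```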

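II. BiLSTM

1. What is BiLSTM

BiLSTM (Bidirectional Long Short-Term Memory) is an extension of the LSTM that reads a sequence in both the forward and the backward direction. A standard LSTM processes a sequence only from past to future, whereas a BiLSTM sees, at every time step, both the preceding and the following context, which improves results on many tasks.

2. How BiLSTM works

A BiLSTM contains two independent LSTM layers that each read the whole sequence:

Forward LSTM: processes the input in normal time order, from the first time step to the last.
Backward LSTM: processes the input in reverse time order, from the last time step to the first.

At each time step t, each of the two layers produces a hidden state, and the two are concatenated to form the final hidden state for that step. This concatenated state is then used for further prediction or computation.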
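In PyTorch this concatenation is visible in the output of a bidirectional `nn.LSTM`: the first `hidden_size` features of each time step come from the forward pass and the rest from the backward pass. The sketch below uses illustrative sizes and checks where each direction's final state ends up.

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 4, 6, 3, 8

lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
x = torch.randn(seq_len, batch_size, input_size)

out, (h_n, c_n) = lstm(x)  # out: (seq_len, batch, 2 * hidden_size)

forward_out = out[:, :, :hidden_size]   # forward-direction hidden states
backward_out = out[:, :, hidden_size:]  # backward-direction hidden states

# The forward pass finishes at the last time step, the backward pass at the first.
forward_ok = torch.allclose(forward_out[-1], h_n[0])
backward_ok = torch.allclose(backward_out[0], h_n[1])
print(out.shape, forward_ok, backward_ok)
```

This is why taking only `out[-1]` as a sentence representation gives the backward direction very little context: its state at the last position has seen a single token.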

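Advantages:

Context awareness: a BiLSTM can draw on the entire sequence, so it is particularly effective for tasks that depend on understanding context.
Better performance: because it captures richer sequence information, it typically does well on natural language processing tasks such as named entity recognition and sentiment analysis.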

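3. Tag sets

BMES tagging: each Chinese character is labeled by its position within a word: B (Begin), M (Middle), E (End), or S (Single, a one-character word). These four cases cover every possible segmentation. For example, the sentence "参观了北京天安门", segmented as 参观 / 了 / 北京 / 天安门, is tagged BESBEBME.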
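The tagging rule can be sketched in a few lines. The function name `bmes_tags` is illustrative, and it assumes the sentence has already been segmented into words.

```python
def bmes_tags(words):
    """Convert a list of segmented words into one BMES tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend("M" * (len(word) - 2))  # zero or more middle characters
            tags.append("E")
    return tags


words = ["参观", "了", "北京", "天安门"]
print("".join(bmes_tags(words)))  # BESBEBME
```

A sequence-labeling model is then trained to predict one of these four tags for every character, and word boundaries are read back off the predicted tags.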

Part-of-speech tagging

4. Code implementation

From-scratch implementation

import numpy as np


class BILSTM:
    def __init__(self, input_size, hidden_size, output_size):
        # Arguments: word-vector size, hidden size, number of output classes
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        # Forward direction
        self.lstm_forward = LSTM(input_size, hidden_size, output_size)
        # Backward direction
        self.lstm_backward = LSTM(input_size, hidden_size, output_size)

    def forward(self, x):
        # Forward LSTM reads the sequence in its original order
        output, _, _ = self.lstm_forward.forward(x)
        # Backward LSTM reads it reversed along the time axis only;
        # np.flip(x) without an axis would also reverse the feature axis
        output_backward, _, _ = self.lstm_backward.forward(np.flip(x, axis=0))
        # Flip the backward states back so that index t means the same time step in both
        output_backward = np.flip(output_backward, axis=0)
        # Concatenate the two hidden states at each time step
        combine_output = [np.concatenate((h_f, h_b), axis=0) for h_f, h_b in zip(output, output_backward)]
        return combine_output


class LSTM:
    def __init__(self, input_size, hidden_size, output_size):
        # Arguments: word-vector size, hidden size, number of output classes
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        # Weights and biases; each gate's input matrix W and recurrent matrix U are stacked into one matrix
        self.W_f = np.random.rand(hidden_size, input_size + hidden_size)
        self.b_f = np.random.rand(hidden_size)
        self.W_i = np.random.rand(hidden_size, input_size + hidden_size)
        self.b_i = np.random.rand(hidden_size)
        self.W_c = np.random.rand(hidden_size, input_size + hidden_size)
        self.b_c = np.random.rand(hidden_size)
        self.W_o = np.random.rand(hidden_size, input_size + hidden_size)
        self.b_o = np.random.rand(hidden_size)
        # Output layer
        self.W_y = np.random.rand(output_size, hidden_size)
        self.b_y = np.random.rand(output_size)

    def tanh(self, x):
        return np.tanh(x)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def forward(self, x):
        h_t = np.zeros((self.hidden_size,))  # initial hidden state
        c_t = np.zeros((self.hidden_size,))  # initial cell state
        h_states = []  # hidden state at every time step
        c_states = []  # cell state at every time step
        for t in range(x.shape[0]):
            x_t = x[t]  # input at the current time step (one word vector)
            # Concatenate the input with the previous hidden state
            concat = np.concatenate([x_t, h_t])
            # Forget gate. np.dot is a matrix-vector product here:
            # (hidden, input + hidden) times (input + hidden,) gives (hidden,)
            f_t = self.sigmoid(np.dot(self.W_f, concat) + self.b_f)
            # Input gate
            i_t = self.sigmoid(np.dot(self.W_i, concat) + self.b_i)
            # Candidate cell state
            c_hat_t = self.tanh(np.dot(self.W_c, concat) + self.b_c)
            # Update the cell state; "*" is element-wise multiplication
            c_t = f_t * c_t + i_t * c_hat_t
            # Output gate
            o_t = self.sigmoid(np.dot(self.W_o, concat) + self.b_o)
            # Update the hidden state
            h_t = o_t * self.tanh(c_t)
            h_states.append(h_t)
            c_states.append(c_t)
        # Output layer: class probabilities from the last hidden state (numerically stable softmax)
        y_t = np.dot(self.W_y, h_t) + self.b_y
        exp_y = np.exp(y_t - np.max(y_t))
        output = exp_y / exp_y.sum()
        return np.array(h_states), np.array(c_states), output


# Test data
input_size = 3
hidden_size = 2
seq_len = 4

x = np.random.randn(seq_len, input_size)
bilstm = BILSTM(input_size, hidden_size, 5)
out = bilstm.forward(x)

print(out)
print(np.array(out).shape)  # (seq_len, 2 * hidden_size)

Using the API

import torch
import torch.nn as nn


class BiLstm(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Bidirectional LSTM
        self.lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
        # Both directions are concatenated, so the linear layer takes twice the hidden size
        self.linear = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)   # (seq_len, batch_size, 2 * hidden_size)
        out = self.linear(out)  # (seq_len, batch_size, output_size)
        return out


# Test data
input_size = 3
hidden_size = 8
seq_len = 4
output_size = 5
batch_size = 6

x = torch.randn(seq_len, batch_size, input_size)
bilstm = BiLstm(input_size, hidden_size, output_size)
output = bilstm(x)

print(output.shape)  # torch.Size([4, 6, 5])
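To train such a model for tagging, the per-token scores are typically compared with gold tag indices using cross-entropy. The sketch below is one way to do that under stated assumptions: it uses a bare `nn.LSTM` plus `nn.Linear` rather than the class above, four output classes to match the BMES tag set, and random tensors as stand-ins for real embeddings and labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 4, 6, 3, 8
num_tags = 4  # B, M, E, S

lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
linear = nn.Linear(hidden_size * 2, num_tags)

x = torch.randn(seq_len, batch_size, input_size)          # placeholder embeddings
gold = torch.randint(0, num_tags, (seq_len, batch_size))  # placeholder tag indices

out, _ = lstm(x)
logits = linear(out)  # (seq_len, batch, num_tags)

# CrossEntropyLoss expects (N, C) scores and (N,) targets, so flatten time and batch.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, num_tags), gold.reshape(-1))
pred = logits.argmax(dim=-1)  # predicted tag index per token

print(logits.shape, pred.shape, loss.item())
```

Real batches contain sentences of different lengths, so padded positions would normally be excluded from the loss, for example through the `ignore_index` argument of `nn.CrossEntropyLoss`.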