
[Memory Leak][PyTorch](create_graph=True)

news 2026/2/11 1:42:31

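Memory Leaks in PyTorch Caused by Retaining the Computation Graph

1. Definition of a memory leak
2. How the problem was discovered
3. Discussion of the issue in the PyTorch community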

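1. Definition of a memory leak

A memory leak occurs when heap memory that a program has dynamically allocated is, for whatever reason, never freed or can no longer be freed. The wasted memory can slow the program down and, in severe cases, crash the system.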

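2. How the problem was discovered

When solving PDEs with deep learning, higher-order derivatives are needed all the time. In PyTorch this means passing create_graph=True to torch.autograd.grad() or to Tensor.backward(), and that is where I ran into this memory leak. When the computation graph is kept so that higher-order derivatives can be computed, the memory it occupies is not released. In my runs, with create_graph=True the memory held by the retained graph was only freed when the program exited. This becomes a problem when the graph has to be kept inside a loop: if, say, a Hessian matrix is computed in every iteration, memory usage keeps growing until the program fails with an out of memory error.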
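As a minimal sketch of why create_graph=True is needed for higher-order derivatives, the snippet below differentiates tanh twice with torch.autograd.grad and compares the result with the closed-form derivatives. The helper name tanh_derivatives is hypothetical and used only for illustration; it runs on the CPU so it does not depend on a GPU.

```python
import torch
from torch.autograd import grad


def tanh_derivatives(x):
    """Return (dy/dx, d2y/dx2) for y = tanh(x), evaluated elementwise."""
    x = x.detach().clone().requires_grad_(True)
    y = x.tanh()
    # create_graph=True records the gradient computation itself,
    # so the first derivative can be differentiated again.
    (dy,) = grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)
    # No create_graph here: the second derivative is the final result.
    (d2y,) = grad(dy, x, grad_outputs=torch.ones_like(dy))
    return dy.detach(), d2y


points = torch.linspace(-2.0, 2.0, steps=5)
dy, d2y = tanh_derivatives(points)
t = points.tanh()
# Compare against d/dx tanh = 1 - tanh^2 and d2/dx2 tanh = -2 tanh (1 - tanh^2).
print(torch.allclose(dy, 1 - t**2, atol=1e-6))
print(torch.allclose(d2y, -2 * t * (1 - t**2), atol=1e-6))
```

Without create_graph=True in the first call, dy would not be attached to a graph and the second grad call would raise an error.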

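3. Discussion of the issue in the PyTorch community

The issue is discussed on the PyTorch GitHub issue tracker at https://github.com/pytorch/pytorch/issues/7343. The example of the leak given there is: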

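import torch
import gc

# Warm up CUDA so that context initialization is not counted below.
_ = torch.randn(1, device='cuda')
del _
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # baseline
x = torch.randn(1, device='cuda', requires_grad=True)
y = x.tanh()
# backward() with create_graph=True stores a graph-attached gradient in x.grad.
y.backward(torch.ones_like(y), create_graph=True)
del x, y
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # compare with the baseline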

As the output shows, memory is still leaked even though the variables were deleted. The warning printed in red refers to exactly this leak:

UserWarning: Using backward() with create_graph=True will create a reference cycle between
the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad 
when creating the graph to avoid this. If you have to use this function, make sure to reset 
the .grad fields of your parameters to None after use to break the cycle and avoid the leak. 
(Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\autograd\engine.cpp:1000.)
allow_unreachable=True, accumulate_grad=True) 
# Calls into the C++ engine to run the backward pass
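The warning also names a fallback for code that has to keep calling backward(): reset the .grad field to None afterwards. A minimal sketch of that suggestion follows, assuming a single input tensor rather than model parameters; the helper name second_derivative_with_backward is hypothetical, and the backward() call may print the warning quoted above.

```python
import torch


def second_derivative_with_backward(x_value):
    """Hypothetical helper: d2/dx2 tanh(x) via backward(create_graph=True)."""
    x = x_value.detach().clone().requires_grad_(True)
    y = x.tanh()
    y.backward(torch.ones_like(y), create_graph=True)
    # x.grad is attached to the graph, and the graph refers back to x:
    # this is the reference cycle the warning describes.
    first = x.grad
    (second,) = torch.autograd.grad(
        first, x, grad_outputs=torch.ones_like(first)
    )
    # Break the cycle, as the warning recommends.
    x.grad = None
    return second, x


second, x = second_derivative_with_backward(torch.tensor([-1.0, 0.0, 0.5]))
print(second)
print(x.grad is None)
```

For a model, the same reset would have to be applied to every parameter whose .grad was populated by the backward() call.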

The UserWarning suggests using torch.autograd.grad() to avoid the memory leak, so the code is changed accordingly:

import torch
import gc
from torch.autograd import grad

# Warm up CUDA so that context initialization is not counted below.
_ = torch.randn(1, device='cuda')
del _
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # baseline
x = torch.randn(1, device='cuda', requires_grad=True)
y = x.tanh()
# grad() returns a tuple of gradients and does not write to x.grad.
z = grad(y, x, retain_graph=True, create_graph=True)
# y.backward(torch.ones_like(y), create_graph=True)
del x, y, z
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # compare with the baseline

The output shows no memory leak. Going a step further, we compute the second derivative:

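import torch
import gc
from torch.autograd import grad

# Warm up CUDA so that context initialization is not counted below.
_ = torch.randn(1, device='cuda')
del _
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # baseline
x = torch.randn(1, device='cuda', requires_grad=True)
y = x.tanh()
# First derivative, kept in the graph so it can be differentiated again.
z = grad(y, x, retain_graph=True, create_graph=True)
print(torch.cuda.memory_allocated())  # with the graph alive
# Second derivative; z is a one-element tuple, which grad() accepts.
q = grad(z, x)
del x, y, z, q
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # compare with the baseline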

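Again there is no memory leak. Now suppose we do not delete the second derivative q, which is the situation when the code lives in a function and q has to be returned: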

import torch
import gc
from torch.autograd import grad

# Warm up CUDA so that context initialization is not counted below.
_ = torch.randn(1, device='cuda')
del _
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # baseline
x = torch.randn(1, device='cuda', requires_grad=True)
y = x.tanh()
z = grad(y, x, retain_graph=True, create_graph=True)
print(torch.cuda.memory_allocated())  # with the graph alive
q = grad(z, x)
# q is deliberately kept, as if it were the return value of a function.
del x, y, z
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # compare with the baseline

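As shown, part of the memory is still not released. (q is still a live CUDA tensor, so at least its own storage remains allocated.) If q must be returned without leaking memory, one approach I came up with is to convert q to a NumPy array and return that array instead, which avoids the leak.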

import torch
import gc
from torch.autograd import grad

# Warm up CUDA so that context initialization is not counted below.
_ = torch.randn(1, device='cuda')
del _
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # baseline
x = torch.randn(1, device='cuda', requires_grad=True)
y = x.tanh()
z = grad(y, x, retain_graph=True, create_graph=True)
print(torch.cuda.memory_allocated())  # with the graph alive
q = grad(z, x)
# Copy the result to host memory so that no CUDA tensor has to be kept.
k = q[0].cpu().numpy()
del x, y, z, q
torch.cuda.synchronize()
gc.collect()
print(torch.cuda.memory_allocated())  # compare with the baseline
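The same idea can be wrapped in a function and called in a loop, which is the situation described in the background section. This is only a sketch under stated assumptions: the helper name second_derivative_numpy and its device argument are hypothetical, and the script falls back to the CPU when no GPU is available, in which case the memory printout is skipped.

```python
import gc

import torch
from torch.autograd import grad


def second_derivative_numpy(x_value, device="cpu"):
    """Hypothetical helper: return d2/dx2 tanh(x) as a NumPy array."""
    x = x_value.detach().clone().to(device).requires_grad_(True)
    y = x.tanh()
    (z,) = grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)
    (q,) = grad(z, x, grad_outputs=torch.ones_like(z))
    # Copy to host memory; x, y, z and q go out of scope on return.
    return q.detach().cpu().numpy()


device = "cuda" if torch.cuda.is_available() else "cpu"
points = torch.linspace(-1.0, 1.0, steps=4)
for step in range(3):
    result = second_derivative_numpy(points, device)
    if device == "cuda":
        torch.cuda.synchronize()
        gc.collect()
        # Expected to stay flat across iterations if nothing leaks.
        print(step, torch.cuda.memory_allocated())
print(result)
```

Returning a NumPy array means the caller can no longer backpropagate through the result, so this only fits cases where the second derivative is a final output.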


