Machine Learning Notes on Optimization Algorithms (15): A Brief Look at the Baillon Haddad Theorem

news 2026/5/19 0:02:26

Introduction

This section gives a brief introduction to the $\text{Baillon Haddad Theorem}$ (nicknamed 白老爹定理 in Chinese) and provides a proof.

A Brief Look at the $\text{Baillon Haddad Theorem}$

If the function $f(\cdot)$ is convex and differentiable on its domain, then the following conditions are equivalent to one another.

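Condition $1$: the gradient $\nabla f(\cdot)$ of $f(\cdot)$ is $\mathcal L$-Lipschitz continuous;
$\begin{cases} \exists \mathcal L > 0, \ \forall x,\hat x \in \mathbb R^n: \quad ||\nabla f(x) - \nabla f(\hat x)|| \leq \mathcal L \cdot ||x - \hat x|| \\ \quad \\ \begin{aligned} n = 1, \ f \text{ twice differentiable}: \quad \exists \xi \in (x,\hat x) \Rightarrow \frac{|f'(x) - f'(\hat x)|}{|x - \hat x|} = |f''(\xi)| \leq \mathcal L \end{aligned} \end{cases}$
For background on Lipschitz continuity, see the quadratic upper bound lemma. Intuitively, the condition says that the rate at which the slope of $f(\cdot)$ changes is bounded by the Lipschitz constant $\mathcal L$. In terms of the graph, $\mathcal L$ rules out regions where the slope changes too abruptly.
See the figure: moving from $x$ to $y$, the gradient changes drastically from $\nabla f(x)$ to $\nabla f(y)$. This is what happens when $f(\cdot)$ bends too sharply on the interval $[x, y]$.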
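Condition $2$: the function $\begin{aligned}\mathcal G(x) = \frac{\mathcal L}{2} x^T x - f(x)\end{aligned}$ is also convex.

Looking at $\mathcal G(x)$, it consists of two parts: a quadratic term in the variable $x$ with coefficient $\begin{aligned}\frac{\mathcal L}{2}\end{aligned}$, and $f (x)$ itself. The quadratic function $\begin{aligned}\frac{\mathcal L}{2}x^Tx\end{aligned}$ is necessarily convex. The condition therefore says that the difference of these two convex functions is again convex.

Intuitively, the difference of two convex functions need not be convex: if $f (x)$ curves more sharply than $\begin{aligned}\frac{\mathcal L}{2}x^Tx\end{aligned}$ somewhere, the difference $\begin{aligned}\frac{\mathcal L}{2}x^Tx - f(x)\end{aligned}$ fails to be convex there. So this condition is in essence still a bound on how fast the slope of $f (x)$ changes, and that bound is tied to the Lipschitz constant $\mathcal L$.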
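As a concrete check, consider an assumed example (the quadratic below is an illustration, not part of the original notes): $\begin{aligned}f(x) = \frac{a}{2}x^Tx\end{aligned}$ with $a \geq 0$. Then:
$\begin{aligned} \nabla f(x) = a \cdot x & \Rightarrow ||\nabla f(x) - \nabla f(\hat x)|| = a \cdot ||x - \hat x|| \\ \mathcal G(x) & = \frac{\mathcal L - a}{2} x^Tx \end{aligned}$
So $\nabla f(\cdot)$ is $\mathcal L$-Lipschitz exactly when $a \leq \mathcal L$, and $\mathcal G(x)$ is convex exactly when $\mathcal L - a \geq 0$. For this $f$ the two conditions single out the same constants $\mathcal L$.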
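Condition $3$: the gradient $\nabla f(\cdot)$ is co-coercive $(\text{Co-coercive})$, that is:
$\left[\nabla f(x) - \nabla f(y)\right]^T(x - y) \geq \frac{1}{\mathcal L} ||\nabla f(x) - \nabla f(y)||^2$
First, a word on coercivity $(\text{Coercive})$, which in this context is also called strong monotonicity $(\text{Strong monotonicity})$. As the name suggests, it is stronger than ordinary monotonicity. For $f(\cdot) :\mathbb R \mapsto \mathbb R$, monotonicity is defined as:
$\forall x,y \in \mathbb R: \quad [f(x) - f(y)] \cdot (x - y) \geq 0$
- The difference of the arguments and the difference of the corresponding function values have the same sign.
- For a map on the $n$-dimensional space, $f(\cdot):\mathbb R^n \mapsto \mathbb R^n$, both $f (x) - f (y)$ and $x - y$ are vectors, and monotonicity is defined as: $[f(x) - f(y)]^T(x - y) \geq 0$

Strong monotonicity tightens this: the $0$ on the right-hand side is replaced by a quantity that is strictly positive whenever $x \neq y$. It is usually written as a coefficient $\alpha > 0$ times the squared norm of the increment of $x$, $||x - y||^2$:
$[f(x) - f(y)]^T (x - y) \geq \alpha \cdot ||x - y||^2$
If the quantity is instead expressed through the increment of $f (x)$, the property is called co-coercivity, also known as inverse strong monotonicity $(\text{Inverse strong monotonicity})$:
$[f(x) - f(y)]^T (x - y) \geq \alpha \cdot ||f(x) - f(y)||^2$
Returning to condition $3$: the left-hand side is the expression from the definition of monotonicity, applied to $\nabla f(\cdot)$; the right-hand side is the co-coercivity lower bound. The point to note is that the coefficient $\alpha$ is tied to the Lipschitz constant $\mathcal L$: $\begin{aligned}\alpha = \frac{1}{\mathcal L}\end{aligned}$.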
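The three conditions can be checked numerically on a concrete function. The sketch below is only an illustration under stated assumptions: it picks the one-dimensional softplus $f(x) = \log(1 + e^x)$, which is convex with $f''(x) \leq 1/4$, so $\mathcal L = 1/4$ is assumed. The function names (`check_conditions` and so on) are hypothetical, and condition $2$ is tested through midpoint convexity on random pairs rather than proven.

```python
import math
import random

L = 0.25  # assumed Lipschitz constant: softplus has f''(x) <= 1/4

def f(x):
    # Softplus log(1 + exp(x)), written in a numerically stable form.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def grad_f(x):
    # The derivative of softplus is the logistic sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def G(x):
    # G(x) = (L/2) x^2 - f(x) from condition 2.
    return 0.5 * L * x * x - f(x)

def check_conditions(num_pairs=10_000, seed=0, tol=1e-12):
    """Test conditions 1, 2, 3 on random pairs; returns three booleans."""
    rng = random.Random(seed)
    ok1 = ok2 = ok3 = True
    for _ in range(num_pairs):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        dg, dx = grad_f(x) - grad_f(y), x - y
        ok1 &= abs(dg) <= L * abs(dx) + tol                  # Lipschitz gradient
        ok2 &= G(0.5 * (x + y)) <= 0.5 * (G(x) + G(y)) + tol  # midpoint convexity of G
        ok3 &= dg * dx >= dg * dg / L - tol                  # co-coercivity
    return ok1, ok2, ok3

print(check_conditions())
```

A random test of this kind can only fail to find a counterexample; it does not replace the proof that follows.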

Proof

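The equivalence of the $3$ conditions is established by proving the cycle: condition $1 \Rightarrow$ condition $2$, condition $2 \Rightarrow$ condition $3$, and condition $3 \Rightarrow$ condition $1$.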

Proof: condition $1 \Rightarrow$ condition $2$

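Suppose $f(\cdot)$ is convex and differentiable on its domain, and its gradient $\nabla f(\cdot)$ is $\mathcal L$-Lipschitz continuous. To prove: the function $\begin{aligned}\mathcal G(x) = \frac{\mathcal L}{2} x^Tx - f(x)\end{aligned}$ is convex.
One way to prove that a differentiable function is convex is to show that its gradient is monotone. A further reason to work with the gradient is that it turns $\begin{aligned}\frac{\mathcal L}{2} x^Tx\end{aligned}$ into a linear term.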

Proof: from $\begin{aligned}\mathcal G(x) = \frac{\mathcal L}{2} x^Tx -f(x)\end{aligned}$, the gradient $\nabla \mathcal G(x)$ of $\mathcal G(x)$ is:
$\nabla \mathcal G(x) = \mathcal L \cdot x - \nabla f(x)$
Now examine the monotonicity of $\nabla \mathcal G(x)$. Define:
$\forall x_1,x_2 \in \mathbb R^n \Rightarrow \mathcal I = [\nabla \mathcal G(x_1) - \nabla \mathcal G(x_2)]^T (x_1 - x_2)$
It suffices to show that $\mathcal I \geq 0$ always holds.
Substituting the gradient above and expanding:
$\begin{aligned} \mathcal I & = [\mathcal L \cdot x_1 - \nabla f(x_1) - \mathcal L \cdot x_2 + \nabla f(x_2)]^T (x_1 - x_2) \\ & = \mathcal L\cdot (x_1 - x_2)^T(x_1 - x_2) - [\nabla f(x_1) - \nabla f(x_2)]^T(x_1 - x_2) \end{aligned}$
Consider the second term, $-[\nabla f(x_1) - \nabla f(x_2)]^T (x_1 - x_2)$, which is an inner product of two vectors. By the Cauchy-Schwarz inequality:
$[\nabla f(x_1) - \nabla f(x_2)]^T(x_1 - x_2) \leq ||\nabla f(x_1) - \nabla f(x_2)|| \cdot ||x_1 - x_2||$
The same bound follows from the geometric form of the inner product: $a^Tb = ||a||\cdot||b|| \cdot \cos \theta \leq ||a|| \cdot ||b||$, because $\cos \theta \in [-1,1]$.
Negating this bound and adding the first term, where $(x_1 - x_2)^T(x_1 - x_2) = ||x_1 - x_2||^2$ (a vector makes an angle of $0$ with itself), gives:
$\mathcal I \geq \mathcal L \cdot ||x_1 - x_2||^2 - ||\nabla f(x_1) - \nabla f(x_2)|| \cdot ||x_1 - x_2||$
Since the gradient $\nabla f(\cdot)$ is $\mathcal L$-Lipschitz continuous, $||\nabla f(x_1) - \nabla f(x_2)|| \leq \mathcal L \cdot ||x_1 - x_2||$. The norm $||\nabla f(x_1) - \nabla f(x_2)||$ enters the bound above with a negative sign, so replacing it by the larger quantity $\mathcal L \cdot ||x_1 - x_2||$ can only decrease the right-hand side, and the direction of the inequality is preserved:
$\begin{cases} -||\nabla f(x_1) - \nabla f(x_2)|| \geq -\mathcal L \cdot ||x_1 - x_2|| \\ \quad \\ \begin{aligned} \mathcal I & \geq \mathcal L \cdot ||x_1 - x_2||^2 - ||\nabla f(x_1) - \nabla f(x_2)|| \cdot ||x_1 - x_2|| \\ & \geq \mathcal L \cdot ||x_1 - x_2||^2 - (\mathcal L \cdot ||x_1 - x_2||) \cdot ||x_1 - x_2|| \\ & = 0 \end{aligned} \end{cases}$

This proves $\mathcal I \geq 0$, so the gradient $\nabla \mathcal G(x)$ is monotone, and therefore $\mathcal G(x)$ is convex.
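The chain of bounds in this proof can be traced numerically. The following is a minimal sketch under assumptions that are not in the original notes: it uses the separable function $f(x) = \sum_i \log(1 + e^{x_i})$ on $\mathbb R^4$, whose gradient is the elementwise sigmoid and is taken to be $\frac{1}{4}$-Lipschitz, and the helper names are hypothetical. For each random pair it computes $\mathcal I$ directly and the Cauchy-Schwarz lower bound from the proof.

```python
import math
import random

L = 0.25  # assumed gradient Lipschitz constant of the separable softplus sum

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def grad_f(x):
    # f(x) = sum_i log(1 + exp(x_i)), so the gradient is the elementwise sigmoid.
    return [sigmoid(t) for t in x]

def grad_G(x):
    # G(x) = (L/2) x^T x - f(x)  =>  grad G(x) = L x - grad f(x).
    return [L * t - g for t, g in zip(x, grad_f(x))]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def monotonicity_gap(x1, x2):
    """Return (I, lower): I from the proof and its Cauchy-Schwarz lower bound."""
    d = [p - q for p, q in zip(x1, x2)]
    dG = [p - q for p, q in zip(grad_G(x1), grad_G(x2))]
    df = [p - q for p, q in zip(grad_f(x1), grad_f(x2))]
    I = dot(dG, d)
    lower = L * dot(d, d) - math.sqrt(dot(df, df)) * math.sqrt(dot(d, d))
    return I, lower

rng = random.Random(1)
results = []
for _ in range(5000):
    x1 = [rng.uniform(-5, 5) for _ in range(4)]
    x2 = [rng.uniform(-5, 5) for _ in range(4)]
    results.append(monotonicity_gap(x1, x2))

# I >= lower (Cauchy-Schwarz) and lower >= 0 (Lipschitz gradient), up to rounding.
all_ok = all(I >= lower - 1e-9 and lower >= -1e-9 for I, lower in results)
print(all_ok)
```

The small tolerances only absorb floating-point rounding; the two comparisons correspond to the two inequality steps of the proof.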

Proof: condition $3 \Rightarrow$ condition $1$

If the gradient $\nabla f(\cdot)$ is co-coercive, then $\nabla f(\cdot)$ is $\mathcal L$-Lipschitz continuous.

Proof: start from the co-coercivity of $\nabla f(\cdot)$ and use the Cauchy-Schwarz inequality to bound the left-hand side by a product of norms:
$\begin{cases} \begin{aligned} \left[\nabla f(x_1) - \nabla f(x_2)\right]^T(x_1 - x_2) & \geq \frac{1}{\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2 \\ & \Downarrow \\ ||\nabla f(x_1) - \nabla f(x_2)|| \cdot ||x_1 - x_2|| & \geq [\nabla f(x_1) - \nabla f(x_2)]^T (x_1 - x_2) \\ & \geq \frac{1}{\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2 \end{aligned} \end{cases}$
If $||\nabla f(x_1) - \nabla f(x_2)|| = 0$ the inequality below holds trivially; otherwise divide both sides by $||\nabla f(x_1) - \nabla f(x_2)||$ and rearrange:
$||\nabla f(x_1) - \nabla f(x_2)|| \leq \mathcal L \cdot ||x_1 - x_2||$
This proves that $\nabla f(\cdot)$ is $\mathcal L$-Lipschitz continuous.
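To see the chain $||\nabla f(x_1) - \nabla f(x_2)|| \cdot ||x_1 - x_2|| \geq [\nabla f(x_1) - \nabla f(x_2)]^T(x_1 - x_2) \geq \frac{1}{\mathcal L}||\nabla f(x_1) - \nabla f(x_2)||^2$ at work, here is a small sketch on an assumed example that does not appear in the original notes: the one-dimensional Huber loss with threshold $1$, whose derivative is $x$ clipped to $[-1, 1]$, with $\mathcal L = 1$. The names `grad_huber` and `chain` are hypothetical.

```python
import random

L = 1.0  # assumed: the clipped-identity gradient below is 1-Lipschitz

def grad_huber(x):
    # Huber loss with threshold 1: f(x) = x^2/2 if |x| <= 1, else |x| - 1/2.
    # Its derivative is x clipped to the interval [-1, 1].
    return max(-1.0, min(1.0, x))

def chain(x1, x2):
    """Return the three quantities of the chain, largest first."""
    dg = grad_huber(x1) - grad_huber(x2)
    dx = x1 - x2
    return abs(dg) * abs(dx), dg * dx, dg * dg / L

rng = random.Random(2)
pairs = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(10_000)]

# Cauchy-Schwarz step, then the co-coercivity step.
chain_holds = all(
    a >= b - 1e-12 and b >= c - 1e-12
    for a, b, c in (chain(x1, x2) for x1, x2 in pairs)
)
# The conclusion of the proof: the gradient is L-Lipschitz.
lipschitz_holds = all(
    abs(grad_huber(x1) - grad_huber(x2)) <= L * abs(x1 - x2) + 1e-12
    for x1, x2 in pairs
)
print(chain_holds, lipschitz_holds)
```

This example was chosen because its gradient is Lipschitz without being differentiable everywhere, so the check does not lean on a second derivative.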

Proof: condition $2 \Rightarrow$ condition $3$

If $\begin{aligned}\mathcal G(x) = \frac{\mathcal L}{2}x^Tx - f(x)\end{aligned}$ is convex, then the gradient $\nabla f(\cdot)$ is co-coercive.

Proof idea: before the proof proper, introduce a few auxiliary quantities.
Denote the left-hand side of the co-coercivity inequality, $[\nabla f(x_1) - \nabla f(x_2)]^T (x_1 - x_2)$, by $\Delta$, and decompose it as follows. Here $x_1 - x_2$ is rewritten as $-(x_2 - x_1)$ with the minus sign pulled out, and $[\nabla f(x_1) - \nabla f(x_2)]^T = \left\{[\nabla f(x_1)]^T - [\nabla f(x_2)]^T\right\}$.
$\begin{aligned} \Delta & = \underbrace{[f(x_1) + f(x_2)] - [f(x_1) + f(x_2)]}_{=0} - \left\{[\nabla f(x_1)]^T - [\nabla f(x_2)]^T\right\}(x_2 - x_1) \\ & = \underbrace{f(x_2) - \{f(x_1) + [\nabla f(x_1)]^T (x_2 - x_1)\}}_{\Delta_1} + \underbrace{f(x_1) - \left\{f(x_2) + [\nabla f(x_2)]^T(x_1 - x_2)\right\}}_{\Delta_2} \\ & = \Delta_1 + \Delta_2 \end{aligned}$

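$\Delta_1$ and $\Delta_2$ can be read off the graph of $f(\cdot)$:

The expression $f(x_1) + [\nabla f(x_1)]^T (x_2 - x_1)$ is the value at $x= x_2$ of the tangent line to $f(\cdot)$ at $x_1$; see the yellow solid segment in the figure.
$\Delta_1$ is then the gap between $f(x_2)$ and $f(x_1) + [\nabla f(x_1)]^T (x_2 - x_1)$; see the red solid segment.
The picture for $\Delta_2$ is analogous, with the roles of $x_1$ and $x_2$ exchanged:
$\Delta_2$ corresponds to the green solid segment in the figure.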

Since $\Delta = \Delta_1 + \Delta_2$, it suffices to show that $\Delta_1$ and $\Delta_2$ each satisfy: $\begin{aligned}\Delta_1,\Delta_2 \geq \frac{1}{2\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2\end{aligned}$.
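The decomposition and the target bound can be sanity-checked before going through the proof. The sketch below assumes the one-dimensional softplus $f(x) = \log(1 + e^x)$ with $\mathcal L = 1/4$ (an illustrative choice, not from the original notes); `bregman` and `decompose` are hypothetical helper names. Each $\Delta_i$ is the gap between $f$ and one of its tangent lines.

```python
import math
import random

L = 0.25  # assumed gradient Lipschitz constant of softplus

def f(x):
    # Softplus log(1 + exp(x)) in a numerically stable form.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def grad_f(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bregman(a, b):
    # f(a) - [f(b) + f'(b) (a - b)]: gap at a between f and its tangent at b.
    return f(a) - f(b) - grad_f(b) * (a - b)

def decompose(x1, x2):
    """Return (Delta, Delta_1, Delta_2, bound) for the pair (x1, x2)."""
    dg = grad_f(x1) - grad_f(x2)
    delta = dg * (x1 - x2)
    delta1 = bregman(x2, x1)  # tangent taken at x1, evaluated at x2
    delta2 = bregman(x1, x2)  # tangent taken at x2, evaluated at x1
    bound = dg * dg / (2 * L)
    return delta, delta1, delta2, bound

rng = random.Random(3)
all_ok = True
for _ in range(10_000):
    x1, x2 = rng.uniform(-8, 8), rng.uniform(-8, 8)
    delta, d1, d2, bound = decompose(x1, x2)
    all_ok &= abs(delta - (d1 + d2)) < 1e-9                # Delta = Delta_1 + Delta_2
    all_ok &= d1 >= bound - 1e-9 and d2 >= bound - 1e-9    # each term meets the bound
print(all_ok)
```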

Proof:
Take $\Delta_1$ as the example. Expanding $\Delta_1$:
$\begin{aligned} \Delta_1 & = \underbrace{f(x_2) - [\nabla f(x_1)]^T x_2}_{1} - \underbrace{\left\{f(x_1) - [\nabla f(x_1)]^T x_1 \right\}}_{2} \end{aligned}$
Parts $1$ and $2$ have the same form, so define a function:
$\mathcal H_{x_1}(\mathcal Z) = f(\mathcal Z) - [\nabla f(x_1)]^T \mathcal Z$
In $\mathcal H_{x_1}(\mathcal Z)$, $\mathcal Z$ is the variable and $x_1$ is treated as an adjustable parameter.
Then $\Delta_1$ can be written as:
$\Delta_1 = \mathcal H_{x_1}(x_2) - \mathcal H_{x_1}(x_1)$
Look at $\mathcal H_{x_1}(\mathcal Z)$: $f(\mathcal Z)$ is convex in $\mathcal Z$, while $-[\nabla f(x_1)]^T \mathcal Z$ is linear in $\mathcal Z$ and hence also convex. By convexity-preserving operations, $\mathcal H_{x_1}(\mathcal Z)$ is convex. Moreover, both $f(\mathcal Z)$ and $-[\nabla f(x_1)]^T \mathcal Z$ are differentiable on the domain of $\mathcal Z$, so $\mathcal H_{x_1}(\mathcal Z)$ is differentiable as well, with gradient:
$\begin{aligned}\nabla \mathcal H_{x_1}(\mathcal Z) = \nabla f(\mathcal Z) - \nabla f(x_1) \end{aligned}$
At $\mathcal Z = x_1$ this gives $\nabla \mathcal H_{x_1}(x_1) = 0$, so $\mathcal Z = x_1$ is a stationary point of $\mathcal H_{x_1}(\mathcal Z)$. Because $\mathcal H_{x_1}(\mathcal Z)$ is convex, this stationary point is a global minimizer. Denoting the minimum value of $\mathcal H_{x_1}(\mathcal Z)$ by $\mathcal H_{x_1}^*$, we obtain:
$\mathcal H_{x_1}^* = \mathcal H_{x_1}(x_1)$
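For intuition, here is what $\mathcal H_{x_1}$ looks like for the assumed example $\begin{aligned}f(\mathcal Z) = \frac{a}{2}\mathcal Z^T\mathcal Z\end{aligned}$ with $0 < a \leq \mathcal L$ (an illustration only, not taken from the original notes). Completing the square:
$\begin{aligned} \mathcal H_{x_1}(\mathcal Z) & = \frac{a}{2}||\mathcal Z||^2 - a \cdot x_1^T \mathcal Z = \frac{a}{2}||\mathcal Z - x_1||^2 - \frac{a}{2}||x_1||^2 \\ \mathcal H_{x_1}^* & = \mathcal H_{x_1}(x_1) = -\frac{a}{2}||x_1||^2 \\ \Delta_1 & = \mathcal H_{x_1}(x_2) - \mathcal H_{x_1}(x_1) = \frac{a}{2}||x_2 - x_1||^2 \end{aligned}$
In this case the target bound reads $\begin{aligned}\frac{1}{2\mathcal L}||\nabla f(x_1) - \nabla f(x_2)||^2 = \frac{a^2}{2\mathcal L}||x_1 - x_2||^2\end{aligned}$, which is at most $\begin{aligned}\frac{a}{2}||x_2 - x_1||^2\end{aligned}$ precisely because $a \leq \mathcal L$.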
By condition $2$, $\begin{aligned}\mathcal G(\mathcal Z) = \frac{\mathcal L}{2} \mathcal Z^T \mathcal Z - f(\mathcal Z) \end{aligned}$ is convex. Here the variable symbol $x$ is renamed $\mathcal Z$ to ease the computation below, and $\mathcal Z^T\mathcal Z$ is written as $||\mathcal Z||^2$. Substituting $f(\mathcal Z) = \mathcal H_{x_1}(\mathcal Z) + [\nabla f(x_1)]^T \mathcal Z$ into condition $2$ gives:
$\begin{aligned} \mathcal G(\mathcal Z) & = \frac{\mathcal L}{2}||\mathcal Z||^2 - f(\mathcal Z) \\ & = \frac{\mathcal L}{2}||\mathcal Z||^2 - \mathcal H_{x_1}(\mathcal Z) - [\nabla f(x_1)]^T \mathcal Z \\ & \quad \\ \Rightarrow \mathcal G(\mathcal Z) + & [\nabla f(x_1)]^T \mathcal Z = \frac{\mathcal L}{2}||\mathcal Z||^2 - \mathcal H_{x_1}(\mathcal Z) \end{aligned}$
For the left-hand side, $\mathcal G(\mathcal Z) + [\nabla f(x_1)]^T \mathcal Z$, proceed in the same way as for $\mathcal H_{x_1}(\mathcal Z) = f(\mathcal Z) - [\nabla f(x_1)]^T \mathcal Z$ and define a symbol $\mathcal G_{x_1}(\mathcal Z)$ such that:
$\mathcal G_{x_1}(\mathcal Z) = \mathcal G(\mathcal Z) + [\nabla f(x_1)]^T \mathcal Z$
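Properties of $\mathcal G_{x_1}(\mathcal Z)$:

The first term, $\mathcal G(\mathcal Z)$, is convex by condition $2$, and differentiable;
The second term has the same form as the second term of $\mathcal H_{x_1}(\mathcal Z)$: the linear function $[\nabla f(x_1)]^T \mathcal Z$ of $\mathcal Z$ is convex and differentiable on its domain.

So, again by convexity-preserving operations, $\mathcal G_{x_1}(\mathcal Z)$ is convex and differentiable on its domain. Using $\begin{aligned}\mathcal G_{x_1}(\mathcal Z) = \frac{\mathcal L}{2}||\mathcal Z||^2 - \mathcal H_{x_1}(\mathcal Z)\end{aligned}$, its gradient $\nabla \mathcal G_{x_1}(\mathcal Z)$ is:
$\begin{aligned} \nabla \mathcal G_{x_1}(\mathcal Z) & = \frac{\mathcal L}{2} \cdot 2 \cdot \mathcal Z - \nabla \mathcal H_{x_1}(\mathcal Z) \\ & = \mathcal L \cdot \mathcal Z - \nabla \mathcal H_{x_1}(\mathcal Z) \end{aligned}$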
Because $\mathcal G_{x_1}(\mathcal Z)$ is convex and differentiable, the first-order condition holds for any $z_1,z_2 \in \mathbb R^n$ in its domain:
$\mathcal G_{x_1}(z_2) \geq \mathcal G_{x_1}(z_1) + \left[\nabla \mathcal G_{x_1}(z_1)\right]^T(z_2 - z_1)$
This is the same tangent-line picture as before: the gap between a convex function and its tangent, like $\Delta_1 \geq 0$, is always nonnegative. Substituting $\begin{aligned}\mathcal G_{x_1}(\mathcal Z) = \frac{\mathcal L}{2}||\mathcal Z||^2 - \mathcal H_{x_1}(\mathcal Z)\end{aligned}$ gives:
$\underbrace{\frac{\mathcal L}{2} ||z_2||^2 - \mathcal H_{x_1}(z_2)}_{\mathcal G_{x_1}(z_2)} \geq \underbrace{\frac{\mathcal L}{2}||z_1||^2 - \mathcal H_{x_1}(z_1)}_{\mathcal G_{x_1}(z_1)} + \underbrace{[\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1)]^T}_{[\nabla \mathcal G_{x_1}(z_1)]^T} \cdot (z_2 - z_1)$
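At this point the inequality expressing the convexity of $\mathcal G_{x_1}(\mathcal Z)$ is written entirely in terms of $\mathcal H_{x_1}(\mathcal Z)$. Rearranging:
$\mathcal H_{x_1}(z_2) \leq \frac{\mathcal L}{2}||z_2||^2 - \frac{\mathcal L}{2} ||z_1||^2 + \mathcal H_{x_1}(z_1) + \left[\nabla \mathcal H_{x_1}(z_1) - \mathcal L \cdot z_1\right]^T(z_2 - z_1)$
Compare this with the quadratic upper bound lemma. Taken alone, $\begin{aligned}\frac{\mathcal L}{2}||z_2||^2 - \frac{\mathcal L}{2}||z_1||^2\end{aligned}$ generally differs from $\begin{aligned}\frac{\mathcal L}{2}||z_2 - z_1||^2\end{aligned}$. Together with the term $-\mathcal L \cdot z_1^T(z_2 - z_1)$, however, the two agree: $\begin{aligned}\frac{\mathcal L}{2}||z_2||^2 - \frac{\mathcal L}{2}||z_1||^2 - \mathcal L \cdot z_1^T(z_2 - z_1) = \frac{\mathcal L}{2}||z_2 - z_1||^2\end{aligned}$, so the inequality is the quadratic upper bound $\begin{aligned}\mathcal H_{x_1}(z_2) \leq \mathcal H_{x_1}(z_1) + [\nabla \mathcal H_{x_1}(z_1)]^T(z_2 - z_1) + \frac{\mathcal L}{2}||z_2 - z_1||^2\end{aligned}$ written in expanded form.
As in the quadratic upper bound lemma, treat $z_1$ as a fixed, known point (for example the numerical solution produced by the previous iteration), so that the right-hand side is a function of $z_2$ alone, denoted $\phi(z_2)$:
$\mathcal H_{x_1}(z_2) \leq \phi(z_2) \triangleq \frac{\mathcal L}{2}||z_2||^2 - \frac{\mathcal L}{2} ||z_1||^2 + \mathcal H_{x_1}(z_1) + \left[\nabla \mathcal H_{x_1}(z_1) - \mathcal L \cdot z_1\right]^T(z_2 - z_1)$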
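Now look at the terms of $\phi(z_2)$ that depend on $z_2$ (terms depending only on $z_1$ are treated as constants):

$\begin{aligned}\frac{\mathcal L}{2}||z_2||^2\end{aligned}$ is quadratic in $z_2$ and convex; since its coefficient satisfies $\begin{aligned}\frac{\mathcal L}{2} > 0\end{aligned}$, a minimum exists;
$\left[\nabla \mathcal H_{x_1}(z_1) - \mathcal L \cdot z_1\right]^T(z_2 - z_1)$ is linear in $z_2$ and hence also convex.

By convexity-preserving operations, $\phi(z_2)$ is a convex quadratic function, and it attains its minimum: $\inf \{\phi(z_2)\} = \mathop{\min} \phi(z_2)$. Since $\mathcal H_{x_1}(z_2) \leq \phi(z_2)$ holds for every $z_2$, it holds in particular at the minimizer $z_{2;min}$ of $\phi(\cdot)$:
$\mathcal H_{x_1}(z_{2;min}) \leq \phi(z_{2;min}) = \inf \{\phi(z_2)\}$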
Next, compute $\inf\{\phi(z_2)\}$:

Compute the gradient $\nabla \phi(z_2)$:
$\nabla \phi(z_2) = \mathcal L \cdot z_2 + \nabla \mathcal H_{x_1}(z_1) - \mathcal L \cdot z_1$
Setting $\nabla \phi(z_2) = 0$ gives:
$z_{2;min} =z_1 - \frac{\nabla \mathcal H_{x_1}(z_1)}{\mathcal L}$
That is, $\phi(z_{2;min}) = \min \phi(z_2)$.
Substituting $z_{2;min}$ back into $\phi(\cdot)$ gives $\min \phi(z_2)$:
$\phi(z_{2;min}) = \frac{\mathcal L}{2} ||\frac{\mathcal L\cdot z_1 - \nabla \mathcal H_{x_1}(z_1)}{\mathcal L}||^2 - \frac{\mathcal L}{2}||z_1||^2 + \mathcal H_{x_1}(z_1) + [\nabla \mathcal H_{x_1}(z_1) - \mathcal L \cdot z_1]^T\left[- \frac{\nabla \mathcal H_{x_1}(z_1)}{\mathcal L}\right]$
Clearly only the known point $z_1$ remains. Simplifying:
- Factor out $\begin{aligned}\frac{1}{2\mathcal L}[\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1)]^T\end{aligned}$
- Apply the distributive law
  $\begin{aligned} \phi(z_{2;min}) & = \frac{1}{2\mathcal L}||\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1)||^2 - \frac{\mathcal L}{2}||z_1||^2 + \mathcal H_{x_1}(z_1) + \frac{1}{\mathcal L} [\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1)]^T \nabla \mathcal H_{x_1}(z_1) \\ & = \frac{1}{2\mathcal L} [\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1)]^T \left\{\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1) + 2 \nabla \mathcal H_{x_1}(z_1)\right\} + \mathcal H_{x_1}(z_1) - \frac{\mathcal L}{2}||z_1||^2 \\ & = \frac{1}{2\mathcal L} \underbrace{[\mathcal L \cdot z_1 - \nabla \mathcal H_{x_1}(z_1)]^T \left\{\mathcal L \cdot z_1 + \nabla \mathcal H_{x_1}(z_1) \right\}}_{\text{distributive law}} + \mathcal H_{x_1}(z_1) - \frac{\mathcal L}{2}||z_1||^2 \\ & = \frac{1}{2\mathcal L} \left[\mathcal L^2 \cdot ||z_1||^2 - ||\nabla \mathcal H_{x_1}(z_1)||^2\right] + \mathcal H_{x_1}(z_1) - \frac{\mathcal L}{2}||z_1||^2 \\ & = \mathcal H_{x_1}(z_1) - \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(z_1)||^2 \end{aligned}$
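The closed form for $\phi(z_{2;min})$ is easy to confirm numerically. The sketch below is an illustration under assumptions that are not in the original notes: it works in one dimension with the softplus $f(z) = \log(1 + e^z)$ and $\mathcal L = 1/4$, and the points `x1 = 0.7` and `z1 = -1.3` are arbitrary hypothetical values. It evaluates $\phi$ at the stationary point and compares it with $\mathcal H_{x_1}(z_1) - \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(z_1)||^2$.

```python
import math

L = 0.25   # assumed gradient Lipschitz constant of softplus
x1 = 0.7   # hypothetical parameter point of H_{x1}
z1 = -1.3  # hypothetical fixed point at which the bound is built

def f(z):
    # Softplus log(1 + exp(z)) in a numerically stable form.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def grad_f(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def H(z):
    # H_{x1}(z) = f(z) - f'(x1) z
    return f(z) - grad_f(x1) * z

def grad_H(z):
    return grad_f(z) - grad_f(x1)

def phi(z2):
    # Right-hand side of the upper bound, as a function of z2 with z1 fixed.
    return (0.5 * L * z2 * z2 - 0.5 * L * z1 * z1 + H(z1)
            + (grad_H(z1) - L * z1) * (z2 - z1))

z2_min = z1 - grad_H(z1) / L                      # stationary point of phi
closed_form = H(z1) - grad_H(z1) ** 2 / (2 * L)   # claimed value of min phi

print(abs(phi(z2_min) - closed_form) < 1e-12)
print(H(x1) <= H(z2_min) <= phi(z2_min))
```

The second printed comparison mirrors the ordering used in the rest of the proof: the minimum of $\mathcal H_{x_1}$ sits below its value at $z_{2;min}$, which in turn sits below $\min \phi$.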

This yields an upper bound on the value of $\mathcal H_{x_1}(\cdot)$ at $z_{2;min}$, obtained from the quadratic upper bound:
$\mathcal H_{x_1}(z_{2;min}) \leq \mathcal H_{x_1}(z_1) - \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(z_1)||^2$
Since $\mathcal H_{x_1}^*$ is the minimum of $\mathcal H_{x_1}(\cdot)$, it is no larger than the value at any point (an iterate can at best approach the minimum), so:
$\mathcal H_{x_1}^* \leq \mathcal H_{x_1}(z_{2;min}) \leq \mathcal H_{x_1}(z_1) - \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(z_1)||^2$
Because $\mathcal H_{x_1}(\cdot)$ attains its minimum at $x_1$, $\mathcal H_{x_1}(x_1) = \mathcal H_{x_1}^*$, and $z_1$ is an arbitrary point of the same domain as $x_1$, we may take $z_1 = x_2$:
$\begin{aligned} & \mathcal H_{x_1}(x_1) \leq \mathcal H_{x_1}(x_2) - \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(x_2)||^2 \\ \Rightarrow & \mathcal H_{x_1}(x_2) - \mathcal H_{x_1}(x_1) \geq \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(x_2)||^2 \end{aligned}$
Since $\Delta_1 = \mathcal H_{x_1}(x_2) - \mathcal H_{x_1}(x_1)$, substituting $\nabla \mathcal H_{x_1}(\mathcal Z = x_2) = \nabla f(x_2) - \nabla f(x_1)$ finally gives:
$\begin{aligned} \Delta_1 & \geq \frac{1}{2\mathcal L}||\nabla \mathcal H_{x_1}(x_2)||^2 \\ & = \frac{1}{2\mathcal L} ||\nabla f(x_2) - \nabla f(x_1)||^2 \\ & = \frac{1}{2\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2 \end{aligned}$
This is only half of the proof; the same procedure must be carried out for $\Delta_2$. The steps are identical except that the parameter changes from $x_1$ to $x_2$, so they are not repeated here.
$\begin{aligned} \Delta_2 & = [f(x_1) - f(x_2)] - \left\{[\nabla f(x_2)]^T x_1 - [\nabla f(x_2)]^T x_2 \right\} \\ & = \underbrace{f(x_1) - [\nabla f(x_2)]^T x_1}_{1} - \underbrace{\{f(x_2) - [\nabla f(x_2)]^T x_2\}}_{2} \\ & = \mathcal H_{x_2}(x_1) - \mathcal H_{x_2}(x_2) \end{aligned}$
The analogous result follows:
$\Delta_2 \geq \frac{1}{2\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2$
Adding the two bounds gives:
$\begin{aligned} \Delta_1 + \Delta_2 & \geq 2 \cdot \frac{1}{2\mathcal L}||\nabla f(x_1) - \nabla f(x_2)||^2 \\ & = \frac{1}{\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2 \end{aligned}$
Since $\Delta_1 + \Delta_2 = \Delta = [\nabla f(x_1) - \nabla f(x_2)]^T(x_1 - x_2)$, this is:
$[\nabla f(x_1) - \nabla f(x_2)]^T(x_1 - x_2) \geq \frac{1}{\mathcal L} ||\nabla f(x_1) - \nabla f(x_2)||^2$
So the gradient $\nabla f(\cdot)$ is co-coercive, which completes the proof.

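References:
[Optimization Algorithms] Gradient Descent: the Baillon Haddad Theorem (Part 1)
[Optimization Algorithms] Gradient Descent: the Baillon Haddad Theorem (Part 2)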
