
Double/Debiased Machine Learning

article 2025/9/26 11:11:43

Suppose we observe i.i.d. data $\{W_i=(Y_i,D_i,X_i)| i\in \{1,...,n\}\}$, where $Y_i$ is the outcome variable, $D_i$ is the treatment variable, and $X_i$ is the vector of control variables.

The target parameter $\theta_0$ is defined in general as the solution of a moment condition:

$E[m(W;\theta_0,\eta_0)] = 0$

Here $W$ denotes the observed variables, $\theta_0\in \Theta$ is the target parameter, and $\eta_0\in \mathcal{T}$ is the nuisance parameter.

For example, the ATE is defined as:

$\theta_0^{ATE}\equiv E[E[Y_i|D_i=1,X_i] - E[Y_i|D_i=0,X_i]]$

The IPW score for the ATE is defined as:

$m_{IPW}(W_i;\theta,\alpha)\equiv \alpha(D_i,X_i)Y_i - \theta \equiv [\frac{D_i}{E[D_i|X_i]} - \frac{1-D_i}{1-E[D_i|X_i]}]Y_i - \theta$

The doubly robust (DR) score for the ATE is defined as:

$m_{DR}(W_i;\theta,\eta)\equiv \alpha(D_i,X_i)(Y_i - E[Y_i|D_i,X_i]) + E[Y_i|D_i=1,X_i]- E[Y_i|D_i=0,X_i]-\theta$

$\equiv [\frac{D_i}{E[D_i|X_i]} - \frac{1-D_i}{1-E[D_i|X_i]}] (Y_i - E[Y_i|D_i,X_i]) + E[Y_i|D_i=1,X_i]- E[Y_i|D_i=0,X_i]-\theta$
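As a rough illustration of the two ATE scores, the sketch below solves each sample moment condition for $\theta$ on simulated data. Everything in it is an assumption made for the example: the data-generating process (true ATE of 2.0), the helper names `simulate`, `propensity`, and `outcome`, and the use of the true nuisance functions in place of estimates. It uses only the Python standard library so that it runs on its own.

```python
import random

def simulate(n, seed=0):
    """Draw (Y, D, X) from an assumed process whose true ATE is 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        d = 1 if rng.random() < propensity(x) else 0
        y = outcome(d, x) + rng.gauss(0.0, 0.5)
        rows.append((y, d, x))
    return rows

def propensity(x):
    # Assumed true E[D | X = x]; stays inside [0.2, 0.8].
    return 0.2 + 0.6 * x

def outcome(d, x):
    # Assumed true E[Y | D = d, X = x].
    return 2.0 * d + x

def ipw_estimate(rows):
    # Root of mean(alpha * Y - theta) = 0.
    total = 0.0
    for y, d, x in rows:
        e = propensity(x)
        alpha = d / e - (1 - d) / (1 - e)
        total += alpha * y
    return total / len(rows)

def dr_estimate(rows):
    # Root of mean(alpha * (Y - mu_D) + mu_1 - mu_0 - theta) = 0.
    total = 0.0
    for y, d, x in rows:
        e = propensity(x)
        alpha = d / e - (1 - d) / (1 - e)
        total += alpha * (y - outcome(d, x)) + outcome(1, x) - outcome(0, x)
    return total / len(rows)

rows = simulate(20000)
theta_ipw = ipw_estimate(rows)
theta_dr = dr_estimate(rows)
print(f"IPW: {theta_ipw:.3f}  DR: {theta_dr:.3f}  (true ATE = 2.0)")
```

Because both scores are linear in $\theta$, the root of each moment condition is just a sample average. With the true nuisances plugged in, both estimates should land near 2.0, and the DR estimate is typically the less noisy of the two.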

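In general, the estimator $\hat{\theta}$ of the target parameter $\theta_0$ is defined as the solution of the sample moment condition: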

$\hat{\theta}:\frac{1}{n}\sum_{i=1}^nm(W_i;\hat{\theta},\hat{\eta}) = 0$

A first-order Taylor expansion around $(\theta_0,\eta_0)$ gives:

$\frac{1}{n}\sum_{i=1}^nm(W_i;\hat{\theta},\hat{\eta}) \approx \frac{1}{n}\sum_{i=1}^nm(W_i;\theta_0,\eta_0) + \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)(\hat{\theta} - \theta_0) + \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\eta}m(W_i;\theta_0,\eta_0)(\hat{\eta} - \eta_0) \approx 0$

$(\theta_0 - \hat{\theta})\approx [\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)]^{-1}\frac{1}{n}\sum_{i=1}^nm(W_i;\theta_0,\eta_0) + [\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)]^{-1}(\hat{\eta} - \eta_0)\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\eta}m(W_i;\theta_0,\eta_0)$

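The estimation error $(\theta_0 - \hat{\theta})$ of the target parameter is therefore affected by the estimation error $(\hat{\eta} - \eta_0)$ of the nuisance parameter. The bias in the target estimate has two sources:

The nuisance estimation error $(\hat{\eta} - \eta_0)$ itself, called regularization bias
The strong dependence between the nuisance estimation error $(\hat{\eta} - \eta_0)$ and $W_i$, which arises when $\hat{\eta}$ is fitted on the same observations, called overfitting bias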

Neyman Orthogonality

$\frac{\partial}{\partial\lambda}\{E[\psi(W_i;\theta_0,\eta_0 + \lambda(\eta-\eta_0))]\}|_{\lambda=0}= 0,\forall\eta\in \mathcal{T}$

$m_{IPW}$ is not Neyman orthogonal, $m_{DR}$ is Neyman orthogonal.
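The orthogonality claim can be checked numerically. The sketch below perturbs the propensity score along an assumed direction $\Delta(x)=1$ and takes a central finite difference of the sample mean of each score at $\lambda=0$. The simulated data, the direction, and the function names `mean_score` and `directional_derivative` are all illustrative assumptions; the outcome regression is held at its true value.

```python
import random

rng = random.Random(1)
data = []
for _ in range(20000):
    x = rng.random()
    d = 1 if rng.random() < 0.2 + 0.6 * x else 0   # true propensity 0.2 + 0.6x
    y = 2.0 * d + x + rng.gauss(0.0, 0.5)          # true ATE is 2.0
    data.append((y, d, x))

def mean_score(kind, lam, theta=2.0):
    """Sample mean of the score with the propensity perturbed to e0(x) + lam."""
    total = 0.0
    for y, d, x in data:
        e = 0.2 + 0.6 * x + lam            # perturbation direction delta(x) = 1
        alpha = d / e - (1 - d) / (1 - e)
        if kind == "ipw":
            total += alpha * y - theta
        else:
            # Doubly robust score with the true outcome regression 2d + x.
            mu_d = 2.0 * d + x
            total += alpha * (y - mu_d) + 2.0 - theta
    return total / len(data)

def directional_derivative(kind, h=1e-4):
    # Central finite difference in lambda, evaluated at lambda = 0.
    return (mean_score(kind, h) - mean_score(kind, -h)) / (2 * h)

d_ipw = directional_derivative("ipw")
d_dr = directional_derivative("dr")
print(f"IPW derivative: {d_ipw:.3f}, DR derivative: {d_dr:.3f}")
```

Under these assumptions the IPW derivative is far from zero, while the DR derivative is zero up to sampling noise. This is the practical meaning of orthogonality: small errors in the propensity estimate move the IPW moment at first order but leave the DR moment unchanged at first order.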

Cross Fitting

$\hat{\theta}:\frac{1}{n}\sum_{k=1}^K\sum_{i\in I_k}m(W_i;\hat{\theta},\hat{\eta}_{-k}) = 0$

DML

$\hat{\theta}:\frac{1}{n}\sum_{k=1}^K\sum_{i\in I_k}\psi(W_i;\hat{\theta},\hat{\eta}_{-k}) = 0$
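A minimal sketch of cross-fitting with the doubly robust ATE score follows. The nuisance functions for fold $I_k$ are fitted only on the observations outside $I_k$. The learner here is a deliberately simple piecewise-constant (binned mean) regression, used as a stand-in for a real machine learning model; it, the simulated data, and the names `fit_bin_means` and `cross_fit_dr` are assumptions for illustration only.

```python
import random

def simulate(n, seed=2):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        d = 1 if rng.random() < 0.2 + 0.6 * x else 0
        y = 2.0 * d + x + rng.gauss(0.0, 0.5)   # true ATE is 2.0
        rows.append((y, d, x))
    return rows

N_BINS = 10

def bin_of(x):
    return min(int(x * N_BINS), N_BINS - 1)

def fit_bin_means(pairs):
    """Piecewise-constant regression of a target on x in [0, 1)."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for x, target in pairs:
        sums[bin_of(x)] += target
        counts[bin_of(x)] += 1
    overall = sum(sums) / max(sum(counts), 1)
    # Empty bins fall back to the overall mean.
    means = [s / c if c > 0 else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]

def cross_fit_dr(rows, n_folds=5):
    total = 0.0
    for k in range(n_folds):
        held_out = [r for i, r in enumerate(rows) if i % n_folds == k]
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        # Nuisances are fitted on the complement of fold k only.
        e_hat = fit_bin_means([(x, d) for _, d, x in train])
        mu1_hat = fit_bin_means([(x, y) for y, d, x in train if d == 1])
        mu0_hat = fit_bin_means([(x, y) for y, d, x in train if d == 0])
        for y, d, x in held_out:
            e = min(max(e_hat(x), 0.01), 0.99)   # clip to avoid dividing by ~0
            mu1, mu0 = mu1_hat(x), mu0_hat(x)
            mu_d = mu1 if d == 1 else mu0
            alpha = d / e - (1 - d) / (1 - e)
            total += alpha * (y - mu_d) + mu1 - mu0
    # The DR score is linear in theta, so its root is a plain average.
    return total / len(rows)

theta_hat = cross_fit_dr(simulate(10000))
print(f"cross-fitted DR estimate: {theta_hat:.3f} (true ATE = 2.0)")
```

Folds are assigned by index modulo `n_folds`, which is adequate here only because the simulated rows are i.i.d.; with real data a random split is the safer choice.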

Direct regression does not satisfy Neyman orthogonality

$Y = \theta T + g(X) + \epsilon$

$m(W;\theta,g) = (Y - \theta T - g(X))T$

$\frac{\partial }{\partial \lambda}E[m(W;\theta,g + \lambda\Delta g)]|_{\lambda=0} = -E[\Delta g(X)T] \ne 0$

DML satisfies Neyman orthogonality

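$Y - l(X) = \theta (T - m(X)) + \epsilon',\quad l(x) = E[Y|X=x],\quad m(x)=E[T|X=x]$

$\psi(W;\theta,\eta) = (Y-l(X) - \theta (T - m(X)))(T - m(X)),\quad \eta = (l, m)$

$\frac{\partial}{\partial\lambda}E[\psi(W;\theta,\eta + \lambda\Delta\eta)]|_{\lambda=0} = E[(-\Delta l(X) + \theta\Delta m(X))(T - m(X))] - E[\epsilon'\Delta m(X)] = 0$

Both terms vanish because $E[T - m(X)|X] = 0$ and $E[\epsilon'|X] = 0$.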
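The practical consequence of the two derivations can be seen by feeding both estimators a slightly wrong nuisance function. In the sketch below, the partially linear data-generating process, the constant nuisance error `delta`, and the function names `naive_theta` and `orthogonal_theta` are assumptions for illustration; the true nuisance functions are used as the baseline so that only the injected error differs.

```python
import random

THETA0 = 1.5
rng = random.Random(3)
data = []
for _ in range(20000):
    x = rng.random()
    t = 1.0 + x + rng.gauss(0.0, 1.0)               # m(x) = 1 + x
    y = THETA0 * t + x * x + rng.gauss(0.0, 1.0)    # g(x) = x^2
    data.append((y, t, x))

def naive_theta(delta):
    """Direct regression: solve mean((Y - theta*T - g_hat(X)) * T) = 0
    with g_hat = g + delta."""
    num = sum((y - (x * x + delta)) * t for y, t, x in data)
    den = sum(t * t for y, t, x in data)
    return num / den

def orthogonal_theta(delta):
    """Residual-on-residual regression with l_hat = l + delta, m_hat = m + delta."""
    num = den = 0.0
    for y, t, x in data:
        m_true = 1.0 + x
        l_true = THETA0 * m_true + x * x    # l(x) = E[Y | X = x]
        t_res = t - (m_true + delta)
        y_res = y - (l_true + delta)
        num += t_res * y_res
        den += t_res * t_res
    return num / den

delta = 0.1
naive_shift = naive_theta(delta) - naive_theta(0.0)
orth_shift = orthogonal_theta(delta) - orthogonal_theta(0.0)
print(f"naive shift: {naive_shift:+.4f}, orthogonal shift: {orth_shift:+.4f}")
```

In this setup the direct-regression estimate moves roughly in proportion to `delta`, while the partialled-out estimate moves by an amount of order `delta ** 2` plus sampling noise, which is the first-order insensitivity that Neyman orthogonality describes.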

Example

Simulated data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy.datasets, dowhy.plotter
rvar = 1 if np.random.uniform() > 0.2 else 0  # true effect: 1 with probability 0.8, otherwise 0
is_linear = False  # A non-linear dataset. Change to True to see results for a linear dataset.
data_dict = dowhy.datasets.xy_dataset(10000, effect=rvar, num_common_causes=2, is_linear=is_linear, sd_error=0.2)
df = data_dict['df']
print(df.head())
dowhy.plotter.plot_treatment_outcome(df[data_dict["treatment_name"]], df[data_dict["outcome_name"]], df[data_dict["time_val"]])


Causal assumptions:

Formulate the causal assumptions from domain knowledge and define the model structure.

from dowhy import CausalModel
model = CausalModel(
    data=df,
    treatment=data_dict["treatment_name"],
    outcome=data_dict["outcome_name"],
    common_causes=data_dict["common_causes_names"],
    instruments=data_dict["instrument_names"],
)
model.view_model(layout="dot")


Causal effect identification:

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

Causal effect estimation:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
dml_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.DML",
    control_value=0,
    treatment_value=1,
    confidence_intervals=False,
    method_params={
        "init_params": {
            "model_y": GradientBoostingRegressor(),
            "model_t": GradientBoostingRegressor(),
            "model_final": LassoCV(fit_intercept=False),
            "featurizer": PolynomialFeatures(degree=2, include_bias=True),
        },
        "fit_params": {},
    },
)
print(dml_estimate)

Refutation test:

res_placebo=model.refute_estimate(identified_estimand, dml_estimate,method_name="placebo_treatment_refuter", placebo_type="permute",num_simulations=20)
print(res_placebo)
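The idea behind a permutation placebo can be sketched without DoWhy: replace the treatment with a random permutation of itself, which breaks any link to the outcome, and re-estimate. A credible estimator should then report an effect near zero. The example below is only a conceptual stand-in; it uses a plain OLS slope instead of the DML estimator, an assumed unconfounded data-generating process, and a hypothetical helper `slope`.

```python
import random

rng = random.Random(4)
n = 5000
t = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 * ti + rng.gauss(0.0, 1.0) for ti in t]   # assumed true effect is 1.0

def slope(t_vals, y_vals):
    """OLS slope of y on t, standing in for the effect estimator."""
    t_bar = sum(t_vals) / len(t_vals)
    y_bar = sum(y_vals) / len(y_vals)
    cov = sum((a - t_bar) * (b - y_bar) for a, b in zip(t_vals, y_vals))
    var = sum((a - t_bar) ** 2 for a in t_vals)
    return cov / var

original = slope(t, y)

# Re-estimate on 20 permuted copies of the treatment, mirroring num_simulations=20.
placebo_effects = []
for _ in range(20):
    t_perm = t[:]
    rng.shuffle(t_perm)
    placebo_effects.append(slope(t_perm, y))
placebo_mean = sum(placebo_effects) / len(placebo_effects)

print(f"original: {original:.3f}, mean placebo effect: {placebo_mean:.3f}")
```

If the placebo effect were far from zero, that would suggest the estimator is picking up something other than the treatment, which is the signal the refutation step is meant to surface.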
