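A complete implementation of building, experimenting with, and evaluating a recommender system, providing a reproducible framework for comparing different recommendation algorithms on the same dataset.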
{"cells": [{"cell_type": "markdown","metadata": {},"source": ["# User-based collaborative filtering"]},{"cell_type": "code","execution_count": 1,"metadata": {},"outputs": [],"source": ["# Import packages\n","import random\n","import math\n","import time\n","from tqdm import tqdm"]},{"cell_type": "markdown","metadata": {},"source": ["## I. Common function definitions"]},{"cell_type": "code","execution_count": 2,"metadata": {},"outputs": [],"source": ["# Decorator that reports a function's running time\n","def timmer(func):\n","    def wrapper(*args, **kwargs):\n","        start_time = time.time()\n","        res = func(*args, **kwargs)\n","        stop_time = time.time()\n","        print('Func %s, run time: %s' % (func.__name__, stop_time - start_time))\n","        return res\n","    return wrapper"]},{"cell_type": "markdown","metadata": {},"source": ["### 1. Data processing\n","1. load data\n","2. split data"]},{"cell_type": "code","execution_count": 3,"metadata": {},"outputs": [],"source": ["class Dataset():\n","    \n","    def __init__(self, fp):\n","        # fp: data file path\n","        self.data = self.loadData(fp)\n","    \n","    @timmer\n","    def loadData(self, fp):\n","        data = []\n","        # Each line is '::'-separated; keep only the (user, item) fields\n","        with open(fp) as f:\n","            for l in f:\n","                data.append(tuple(map(int, l.strip().split('::')[:2])))\n","        return data\n","    \n","    @timmer\n","    def splitData(self, M, k, seed=1):\n","        '''\n","        :params: data, all loaded (user, item) records (self.data)\n","        :params: M, number of folds; results are averaged over the M folds\n","        :params: k, index of the current fold, k in [0, M)\n","        :params: seed, random seed; must be the same for every k\n","        :return: train, test\n","        '''\n","        train, test = [], []\n","        random.seed(seed)\n","        for user, item in self.data:\n","            # This differs from the book: M-1 seems more reasonable because randint includes both endpoints\n","            if random.randint(0, M-1) == k:\n","                test.append((user, item))\n","            else:\n","                train.append((user, item))\n","\n","        # Convert to dict form: user -> list of distinct items\n","        def convert_dict(data):\n","            data_dict = {}\n","            for user, item in data:\n","                if user not in data_dict:\n","                    data_dict[user] = set()\n","                data_dict[user].add(item)\n","            data_dict = {k: list(data_dict[k]) for k in data_dict}\n","            return data_dict\n","\n","        return convert_dict(train), 
convert_dict(test)"]},{"cell_type": "markdown","metadata": {},"source": ["### 2. Evaluation metrics\n","1. Precision\n","2. Recall\n","3. Coverage\n","4. Popularity(Novelty)"]},{"cell_type": "code","execution_count": 4,"metadata": {},"outputs": [],"source": ["class Metric():\n","    \n","    def __init__(self, train, test, GetRecommendation):\n","        '''\n","        :params: train, training data\n","        :params: test, test data\n","        :params: GetRecommendation, interface function that returns the recommended items for a user\n","        '''\n","        self.train = train\n","        self.test = test\n","        self.GetRecommendation = GetRecommendation\n","        self.recs = self.getRec()\n","    \n","    # Generate recommendations for every user in test\n","    def getRec(self):\n","        recs = {}\n","        for user in self.test:\n","            rank = self.GetRecommendation(user)\n","            recs[user] = rank\n","        return recs\n","    \n","    # Precision: hits / number of recommended items, in percent\n","    def precision(self):\n","        all, hit = 0, 0\n","        for user in self.test:\n","            test_items = set(self.test[user])\n","            rank = self.recs[user]\n","            for item, score in rank:\n","                if item in test_items:\n","                    hit += 1\n","            all += len(rank)\n","        return round(hit / all * 100, 2)\n","    \n","    # Recall: hits / number of test items, in percent\n","    def recall(self):\n","        all, hit = 0, 0\n","        for user in self.test:\n","            test_items = set(self.test[user])\n","            rank = self.recs[user]\n","            for item, score in rank:\n","                if item in test_items:\n","                    hit += 1\n","            all += len(test_items)\n","        return round(hit / all * 100, 2)\n","    \n","    # Coverage: distinct recommended items / distinct training items, in percent\n","    def coverage(self):\n","        all_item, recom_item = set(), set()\n","        for user in self.test:\n","            for item in self.train[user]:\n","                all_item.add(item)\n","            rank = self.recs[user]\n","            for item, score in rank:\n","                recom_item.add(item)\n","        return round(len(recom_item) / len(all_item) * 100, 2)\n","    \n","    # Novelty, measured as the average log-popularity of recommended items\n","    def popularity(self):\n","        # Compute item popularity\n","        item_pop = {}\n","        for user in self.train:\n","            for item in self.train[user]:\n","                if item not in item_pop:\n","                    item_pop[item] = 0\n","                item_pop[item] += 1\n","\n","        num, pop = 0, 0\n","        for user in self.test:\n","            rank = self.recs[user]\n","            for item, 
score in rank:\n","                # Take the log so that, given the long-tail distribution, popular items do not dominate the average\n","                pop += math.log(1 + item_pop[item])\n","                num += 1\n","        return round(pop / num, 6)\n","    \n","    def eval(self):\n","        metric = {'Precision': self.precision(),\n","                  'Recall': self.recall(),\n","                  'Coverage': self.coverage(),\n","                  'Popularity': self.popularity()}\n","        print('Metric:', metric)\n","        return metric"]},{"cell_type": "markdown","metadata": {},"source": ["## II. Algorithm implementations\n","1. Random\n","2. MostPopular\n","3. UserCF\n","4. UserIIF"]},{"cell_type": "code","execution_count": 5,"metadata": {},"outputs": [],"source": ["# 1. Random recommendation\n","def Random(train, K, N):\n","    '''\n","    :params: train, training dataset\n","    :params: K, unused (can be ignored)\n","    :params: N, hyperparameter, number of items in the Top-N recommendation\n","    :return: GetRecommendation, recommendation interface function\n","    '''\n","    items = {}\n","    for user in train:\n","        for item in train[user]:\n","            items[item] = 1\n","    \n","    def GetRecommendation(user):\n","        # Randomly recommend N unseen items\n","        user_items = set(train[user])\n","        rec_items = {k: items[k] for k in items if k not in user_items}\n","        rec_items = list(rec_items.items())\n","        random.shuffle(rec_items)\n","        return rec_items[:N]\n","    \n","    return GetRecommendation"]},{"cell_type": "code","execution_count": 6,"metadata": {},"outputs": [],"source": ["# 2. Most-popular recommendation\n","def MostPopular(train, K, N):\n","    '''\n","    :params: train, training dataset\n","    :params: K, unused (can be ignored)\n","    :params: N, hyperparameter, number of items in the Top-N recommendation\n","    :return: GetRecommendation, recommendation interface function\n","    '''\n","    items = {}\n","    for user in train:\n","        for item in train[user]:\n","            if item not in items:\n","                items[item] = 0\n","            items[item] += 1\n","    \n","    def GetRecommendation(user):\n","        # Recommend the N most popular items the user has not seen\n","        user_items = set(train[user])\n","        rec_items = {k: items[k] for k in items if k not in user_items}\n","        rec_items = list(sorted(rec_items.items(), key=lambda x: x[1], reverse=True))\n","        return rec_items[:N]\n","    \n","    return GetRecommendation"]},{"cell_type": "code","execution_count": 7,"metadata": {},"outputs": [],"source": ["# 3. 
Recommendation based on user cosine similarity\n","def UserCF(train, K, N):\n","    '''\n","    :params: train, training dataset\n","    :params: K, hyperparameter, number of most similar users (Top-K) to use\n","    :params: N, hyperparameter, number of items in the Top-N recommendation\n","    :return: GetRecommendation, recommendation interface function\n","    '''\n","    # Build the item -> users inverted index\n","    item_users = {}\n","    for user in train:\n","        for item in train[user]:\n","            if item not in item_users:\n","                item_users[item] = []\n","            item_users[item].append(user)\n","    \n","    # Compute the user similarity matrix\n","    sim = {}\n","    num = {}\n","    for item in item_users:\n","        users = item_users[item]\n","        for i in range(len(users)):\n","            u = users[i]\n","            if u not in num:\n","                num[u] = 0\n","            num[u] += 1\n","            if u not in sim:\n","                sim[u] = {}\n","            for j in range(len(users)):\n","                if j == i: continue\n","                v = users[j]\n","                if v not in sim[u]:\n","                    sim[u][v] = 0\n","                sim[u][v] += 1\n","    for u in sim:\n","        for v in sim[u]:\n","            sim[u][v] /= math.sqrt(num[u] * num[v])\n","    \n","    # Sort each user's neighbours by similarity\n","    sorted_user_sim = {k: list(sorted(v.items(), \\\n","                               key=lambda x: x[1], reverse=True)) \\\n","                       for k, v in sim.items()}\n","    \n","    # Recommendation interface function\n","    def GetRecommendation(user):\n","        items = {}\n","        seen_items = set(train[user])\n","        for u, _ in sorted_user_sim[user][:K]:\n","            for item in train[u]:\n","                # Skip items the user has already seen\n","                if item not in seen_items:\n","                    if item not in items:\n","                        items[item] = 0\n","                    items[item] += sim[user][u]\n","        recs = list(sorted(items.items(), key=lambda x: x[1], reverse=True))[:N]\n","        return recs\n","    \n","    return GetRecommendation"]},{"cell_type": "code","execution_count": 8,"metadata": {},"outputs": [],"source": ["# 4. 
Recommendation based on improved user cosine similarity\n","def UserIIF(train, K, N):\n","    '''\n","    :params: train, training dataset\n","    :params: K, hyperparameter, number of most similar users (Top-K) to use\n","    :params: N, hyperparameter, number of items in the Top-N recommendation\n","    :return: GetRecommendation, recommendation interface function\n","    '''\n","    # Build the item -> users inverted index\n","    item_users = {}\n","    for user in train:\n","        for item in train[user]:\n","            if item not in item_users:\n","                item_users[item] = []\n","            item_users[item].append(user)\n","    \n","    # Compute the user similarity matrix\n","    sim = {}\n","    num = {}\n","    for item in item_users:\n","        users = item_users[item]\n","        for i in range(len(users)):\n","            u = users[i]\n","            if u not in num:\n","                num[u] = 0\n","            num[u] += 1\n","            if u not in sim:\n","                sim[u] = {}\n","            for j in range(len(users)):\n","                if j == i: continue\n","                v = users[j]\n","                if v not in sim[u]:\n","                    sim[u][v] = 0\n","                # The main change relative to UserCF: a shared item counts less the more users it has\n","                sim[u][v] += 1 / math.log(1 + len(users))\n","    for u in sim:\n","        for v in sim[u]:\n","            sim[u][v] /
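The notebook text breaks off in the middle of the UserIIF normalization step. The sketch below shows one way the IIF-weighted user similarity could be completed as a standalone function. It assumes the normalization is the same as in UserCF (dividing by `sqrt(num[u] * num[v])`), which the truncated source does not confirm; the helper name `user_iif_similarity` and the toy `train` dictionary are hypothetical and not part of the original notebook.

```python
import math


def user_iif_similarity(train):
    """Hypothetical helper: IIF-weighted user-user similarity.

    train maps each user to a list of distinct items, in the same shape
    that splitData returns.
    """
    # Build the item -> users inverted index.
    item_users = {}
    for user, items in train.items():
        for item in items:
            item_users.setdefault(item, []).append(user)

    sim = {}  # sim[u][v]: accumulated similarity between users u and v
    num = {}  # num[u]: number of items user u interacted with
    for users in item_users.values():
        # An item shared by many users contributes less to similarity.
        weight = 1 / math.log(1 + len(users))
        for u in users:
            num[u] = num.get(u, 0) + 1
            sim.setdefault(u, {})
            for v in users:
                if v == u:
                    continue
                sim[u][v] = sim[u].get(v, 0) + weight

    # ASSUMPTION: same cosine-style normalization as UserCF.
    for u in sim:
        for v in sim[u]:
            sim[u][v] /= math.sqrt(num[u] * num[v])
    return sim


# Toy data (hypothetical): item 'a' is shared by all three users.
train = {1: ['a', 'b'], 2: ['a', 'c'], 3: ['a']}
sim = user_iif_similarity(train)
print(sim[1][2], sim[1][3])
```

With this weighting, an item that nearly everyone has interacted with adds little to any pair's similarity, while a rarely shared item adds more. That is the only difference from the plain co-occurrence count used in UserCF.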
