当前位置: 首页 > news >正文

Constitutional AI

用中文以结构树的方式列出这篇讲稿的知识点:
Although you can use a reward model to eliminate the need for human evaluation during RLHF fine tuning, the human effort required to produce the trained reward model in the first place is huge. The labeled data set used to train the reward model typically requires large teams of labelers, sometimes many thousands of people to evaluate many prompts each. This work requires a lot of time and other resources which can be important limiting factors. As the number of models and use cases increases, human effort becomes a limited resource. Methods to scale human feedback are an active area of research. One idea to overcome these limitations is to scale through model self supervision. Constitutional AI is one approach of scale supervision. First proposed in 2022 by researchers at Anthropic, Constitutional AI is a method for training models using a set of rules and principles that govern the model's behavior. Together with a set of sample prompts, these form the constitution. You then train the model to self critique and revise its responses to comply with those principles. Constitutional AI is useful not only for scaling feedback, it can also help address some unintended consequences of RLHF. For example, depending on how the prompt is structured, an aligned model may end up revealing harmful information as it tries to provide the most helpful response it can. As an example, imagine you ask the model to give you instructions on how to hack your neighbor's WiFi. Because this model has been aligned to prioritize helpfulness, it actually tells you about an app that lets you do this, even though this activity is illegal. Providing the model with a set of constitutional principles can help the model balance these competing interests and minimize the harm. Here are some example rules from the research paper that Constitutional AI I asks LLMs to follow. For example, you can tell the model to choose the response that is the most helpful, honest, and harmless. But you can play some bounds on this, asking the model to prioritize harmlessness by assessing whether it's response encourages illegal, unethical, or immoral activity. Note that you don't have to use the rules from the paper, you can define your own set of rules that is best suited for your domain and use case. When implementing the Constitutional AI method, you train your model in two distinct phases. In the first stage, you carry out supervised learning, to start your prompt the model in ways that try to get it to generate harmful responses, this process is called red teaming. You then ask the model to critique its own harmful responses according to the constitutional principles and revise them to comply with those rules. Once done, you'll fine-tune the model using the pairs of red team prompts and the revised constitutional responses. Let's look at an example of how one of these prompt completion pairs is generated. Let's return to the WiFi hacking problem. As you saw earlier, this model gives you a harmful response as it tries to maximize its helpfulness. To mitigate this, you augment the prompt using the harmful completion and a set of predefined instructions that ask the model to critique its response. Using the rules outlined in the Constitution, the model detects the problems in its response. In this case, it correctly acknowledges that hacking into someone's WiFi is illegal. Lastly, you put all the parts together and ask the model to write a new response that removes all of the harmful or illegal content. The model generates a new answer that puts the constitutional principles into practice and does not include the reference to the illegal app. The original red team prompt, and this final constitutional response can then be used as training data. You'll build up a data set of many examples like this to create a fine-tuned NLM that has learned how to generate constitutional responses. The second part of the process performs reinforcement learning. This stage is similar to RLHF, except that instead of human feedback, we now use feedback generated by a model. This is sometimes referred to as reinforcement learning from AI feedback or RLAIF. Here you use the fine-tuned model from the previous step to generate a set of responses to your prompt. You then ask the model which of the responses is preferred according to the constitutional principles. The result is a model generated preference dataset that you can use to train a reward model. With this reward model, you can now fine-tune your model further using a reinforcement learning algorithm like PPO, as discussed earlier. Aligning models is a very important topic and an active area of research. The foundations of RLHF that you've explored in this lesson will allow you to follow along as the field evolves. I'm really excited to see what new discoveries researchers make in this area. I encourage you to keep an eye out for any new methods and best practices that emerge in the coming months and years

  • RLHF
    • Use of reward model to eliminate need for human evaluation
      • Large human effort required to produce trained reward model
        • Large teams of labelers needed for labeled data set used to train reward model
      • Human effort becomes limited resource as number of models and use cases increases
      • Methods to scale human feedback an active area of research
    • Constitutional AI as approach to scale through model self supervision
      • Method for training models using set of rules and principles that govern model's behavior and form constitution
      • Train model to self critique and revise responses to comply with principles
      • Can help address unintended consequences of RLHF, such as revealing harmful information
      • Example constitutional principles/rules:
        • Choose most helpful, honest, and harmless response
        • Prioritize harmlessness by assessing whether response encourages illegal, unethical, or immoral activity
        • Can define own set of rules suited for domain/use case
      • Train model using two distinct phases:
        • Supervised learning to generate harmful responses and critique and revise them according to constitutional principles (red teaming)
        • Reinforcement learning using feedback generated by model to train reward model
  • Fine-tuned NLM
  • Reinforcement learning algorithms (PPO)

  • 深度强化学习 (Deep Reinforcement Learning)
  • 奖励模型 (Reward Model)
  • 人工评估 (Human Evaluation)
  • 训练奖励模型的数据集 (Labeled Dataset)
  • 大规模标签队伍 (Large Teams of Labelers)
  • 自我监督 (Self Supervision)
  • 宪法型人工智能 (Constitutional AI)
  • 宪法中的规则和原则 (Rules and Principles in the Constitution)
  • RLHF的意外后果 (Unintended Consequences of RLHF)
  • 宪法中的规则示例 (Example Rules in the Constitution)
  • 监督学习 (Supervised Learning)
  • 红队测试 (Red Teaming)
  • 奖励模型的训练 (Training of Reward Model)
  • 强化学习 (Reinforcement Learning)
  • AI反馈的强化学习 (RLAIF)

  • 使用奖励模型消除RLHF微调过程中人工评估的需求
  • 为训练奖励模型需要大量的人力资源
  • 通过模型自我监督来扩展人类反馈的方法
  • 宪法AI是一种扩展反馈的方法,通过一组规则和原则来训练模型的行为
  • 使用宪法AI能够避免RLHF的一些意外后果
  • 宪法AI的规则可以根据领域和用例的需要进行定义和调整
  • 使用宪法AI的方法进行训练分为两个阶段:第一阶段进行有监督学习,第二阶段进行强化学习
  • 在强化学习阶段,使用奖励模型进行模型反馈,称为RLAIF
  • 定期关注领域内新的方法和最佳实践

相关文章:

Constitutional AI

用中文以结构树的方式列出这篇讲稿的知识点: Although you can use a reward model to eliminate the need for human evaluation during RLHF fine tuning, the human effort required to produce the trained reward model in the first place is huge. The label…...

TDengine 资深研发整理:基于 SpringBoot 多语言实现 API 返回消息国际化

作为一款在 Java 开发社区中广受欢迎的技术框架,SpringBoot 在开发者和企业的具体实践中应用广泛。具体来说,它是一个用于构建基于 Java 的 Web 应用程序和微服务的框架,通过简化开发流程、提供约定大于配置的原则以及集成大量常用库和组件&a…...

数据结构-冒泡排序Java实现

目录 一、引言二、算法步骤三、原理演示四、代码实战五、结论 一、引言 冒泡排序是一种基础的比较排序算法,它的思想很简单:重复地遍历待排序的元素列表,比较相邻元素,如果它们的顺序不正确,则交换它们。这个过程不断重…...

完整教程:Java+Vue+Websocket实现OSS文件上传进度条功能

引言 文件上传是Web应用开发中常见的需求之一,而实时显示文件上传的进度条可以提升用户体验。本教程将介绍如何使用Java后端和Vue前端实现文件上传进度条功能,借助阿里云的OSS服务进行文件上传。 技术栈 后端:Java、Spring Boot 、WebSock…...

【微服务 SpringCloud】实用篇 · 服务拆分和远程调用

微服务(2) 文章目录 微服务(2)1. 服务拆分原则2. 服务拆分示例1.2.1 导入demo工程1.2.2 导入Sql语句 3. 实现远程调用案例1.3.1 案例需求:1.3.2 注册RestTemplate1.3.3 实现远程调用1.3.4 查看效果 4. 提供者与消费者 …...

Linux 下I/O操作

一、文件IO 文件 IO 是 Linux 系统提供的接口,针对文件和磁盘进行操作,不带缓存机制;标准IO是C 语言函数库里的标准 I/O 模型,在 stdio.h 中定义,通过缓冲区操作文件,带缓存机制。   标准 IO 和文件 IO 常…...

C#内映射lua表

都是通过同一个方法得到的 例如得到List List<int> list LuaMgr.GetInstance().Global.Get<List<int>>("testList"); 只要把Get的泛型换成对应的类型即可 得到Dictionnary Dictionary<string, int> dic2 LuaMgr.GetInstance().Global…...

android studio检测不到真机

我的情况是&#xff1a; 以前能检测到&#xff0c;有一天我使用无线调试&#xff0c;发现调试有问题&#xff0c;想改为USB调试&#xff0c;但是半天没反应&#xff0c;我就点了手机上的撤销USB调试授权&#xff0c;然后就G了。 解决办法&#xff1a; 我这个情况比较简单&…...

【Eclipse】设置自动提示

前言&#xff1a; eclipse默认有个快捷键&#xff1a;alt /就可以弹出自动提示&#xff0c;但是这样也太麻烦啦&#xff01;每次都需要手动按这个快捷键&#xff0c;下面给大家介绍的是&#xff1a;如何设置敲的过程中就会出现自动提示的教程&#xff01; 先按路线找到需要的页…...

单片机TDL的功能、应用与技术特点 | 百能云芯

在现代电子领域中&#xff0c;单片机&#xff08;Microcontroller&#xff09;是一种至关重要的电子元件&#xff0c;广泛应用于各种应用中。TDL&#xff08;Time Division Multiplexing&#xff0c;时分多路复用&#xff09;是一种数据传输技术&#xff0c;结合单片机的应用&a…...

解决笔记本无线网络5G比2.4还慢的奇怪问题

环境&#xff1a;笔记本Dell XPS15 9570&#xff0c;内置无线网卡Killer Wireless-n/a/ac 1535 Wireless Network Adapter&#xff0c;系统win10家庭版&#xff0c;路由器H3C Magic R2Pro千兆版 因为笔记本用的不多&#xff0c;一直没怎么注意网络速度&#xff0c;直到最近因为…...

GitHub Action 通过SSH 自动部署到云服务器上

准备 正式开始之前&#xff0c;你需要掌握 GitHub Action 的基础语法&#xff1a; workflow &#xff08;工作流程&#xff09;&#xff1a;持续集成一次运行的过程&#xff0c;就是一个 workflow。name: 工作流的名称。on: 指定次工作流的触发器。push 表示只要有人将更改推…...

【AOP系列】7.数据校验

在Java中&#xff0c;我们可以使用Spring AOP&#xff08;面向切面编程&#xff09;和自定义注解来做数据校验。以下是一个简单的示例&#xff1a; 首先&#xff0c;我们创建一个自定义注解&#xff0c;用于标记需要进行数据校验的方法&#xff1a; import java.lang.annotat…...

黑马JVM总结(三十七)

&#xff08;1&#xff09;synchronized-轻量级锁-无竞争 &#xff08;2&#xff09;synchronized-轻量级锁-锁膨胀 重量级锁就是我们前面介绍过的Monitor enter &#xff08;3&#xff09;synchronized-重量级锁-自旋 &#xff08;4&#xff09;synchronized-偏向锁 轻量级锁…...

企业如何通过媒体宣传扩大自身影响力

传媒如春雨&#xff0c;润物细无声&#xff0c;大家好&#xff0c;我是51媒体网胡老师。 企业可以通过媒体宣传来扩大自身的影响力。可以通过以下的方法。 1. 制定媒体宣传战略&#xff1a; - 首先&#xff0c;制定一份清晰的媒体宣传战略&#xff0c;明确您的宣传目标、目标…...

处理vue直接引入图片地址时显示不出来的问题 src=“[object Module]“

在webpack中使用vue-loader编译template之后&#xff0c;发现图片加载不出来了&#xff0c;开发人员工具中显示src“[object Module]” 这是因为当vue-loader编译template块之后&#xff0c;会将所有的资源url转换为webpack模块请求 这是因为vue使用的是commonjs语法规范&…...

vue3 v-md-editor markdown编辑器(VMdEditor)和预览组件(VMdPreview )的使用

vue3 v-md-editor markdown编辑器和预览组件的使用 概述安装支持vue3版本使用1.使用markdown编辑器 VMdEditor2.markdown文本格式前端渲染 VMdPreview 例子效果代码部分 完整代码 概述 v-md-editor 是基于 Vue 开发的 markdown 编辑器组件 轻量版编辑器 轻量版编辑器左侧编辑…...

java正则表达式 及应用场景爬虫,捕获分组非捕获分组

正则表达式 通常用于校验 比如说qq号 看输入的是否符合规则就可以用这个 public class regex {public static void main(String[] args) {//正则表达式判断qq号是否正确//规则 6位及20位以内 0不能再开头 必须全是数子String qq"1234567890";System.out.println(qq…...

基于 Debian 稳定分支发行版的Zephix 7 发布

Zephix 是一个基于 Debian 稳定版的实时 Linux 操作系统。它可以完全从可移动媒介上运行&#xff0c;而不触及用户系统磁盘上存储的任何文件。 Zephix 是一个基于 Debian 稳定版的实时 Linux 操作系统。它可以完全从可移动媒介上运行&#xff0c;而不触及用户系统磁盘上存储的…...

MBR20100CT-ASEMI肖特基MBR20100CT参数、规格、尺寸

编辑&#xff1a;ll MBR20100CT-ASEMI肖特基MBR20100CT参数、规格、尺寸 型号&#xff1a;MBR20100CT 品牌&#xff1a;ASEMI 芯片个数&#xff1a;2 封装&#xff1a;TO-220 恢复时间&#xff1a;&#xff1e;50ns 工作温度&#xff1a;-65C~175C 浪涌电流&#xff1a…...

k8s从入门到放弃之Ingress七层负载

k8s从入门到放弃之Ingress七层负载 在Kubernetes&#xff08;简称K8s&#xff09;中&#xff0c;Ingress是一个API对象&#xff0c;它允许你定义如何从集群外部访问集群内部的服务。Ingress可以提供负载均衡、SSL终结和基于名称的虚拟主机等功能。通过Ingress&#xff0c;你可…...

STM32F4基本定时器使用和原理详解

STM32F4基本定时器使用和原理详解 前言如何确定定时器挂载在哪条时钟线上配置及使用方法参数配置PrescalerCounter ModeCounter Periodauto-reload preloadTrigger Event Selection 中断配置生成的代码及使用方法初始化代码基本定时器触发DCA或者ADC的代码讲解中断代码定时启动…...

《用户共鸣指数(E)驱动品牌大模型种草:如何抢占大模型搜索结果情感高地》

在注意力分散、内容高度同质化的时代&#xff0c;情感连接已成为品牌破圈的关键通道。我们在服务大量品牌客户的过程中发现&#xff0c;消费者对内容的“有感”程度&#xff0c;正日益成为影响品牌传播效率与转化率的核心变量。在生成式AI驱动的内容生成与推荐环境中&#xff0…...

srs linux

下载编译运行 git clone https:///ossrs/srs.git ./configure --h265on make 编译完成后即可启动SRS # 启动 ./objs/srs -c conf/srs.conf # 查看日志 tail -n 30 -f ./objs/srs.log 开放端口 默认RTMP接收推流端口是1935&#xff0c;SRS管理页面端口是8080&#xff0c;可…...

Cinnamon修改面板小工具图标

Cinnamon开始菜单-CSDN博客 设置模块都是做好的&#xff0c;比GNOME简单得多&#xff01; 在 applet.js 里增加 const Settings imports.ui.settings;this.settings new Settings.AppletSettings(this, HTYMenusonichy, instance_id); this.settings.bind(menu-icon, menu…...

Cloudflare 从 Nginx 到 Pingora:性能、效率与安全的全面升级

在互联网的快速发展中&#xff0c;高性能、高效率和高安全性的网络服务成为了各大互联网基础设施提供商的核心追求。Cloudflare 作为全球领先的互联网安全和基础设施公司&#xff0c;近期做出了一个重大技术决策&#xff1a;弃用长期使用的 Nginx&#xff0c;转而采用其内部开发…...

【学习笔记】深入理解Java虚拟机学习笔记——第4章 虚拟机性能监控,故障处理工具

第2章 虚拟机性能监控&#xff0c;故障处理工具 4.1 概述 略 4.2 基础故障处理工具 4.2.1 jps:虚拟机进程状况工具 命令&#xff1a;jps [options] [hostid] 功能&#xff1a;本地虚拟机进程显示进程ID&#xff08;与ps相同&#xff09;&#xff0c;可同时显示主类&#x…...

视觉slam十四讲实践部分记录——ch2、ch3

ch2 一、使用g++编译.cpp为可执行文件并运行(P30) g++ helloSLAM.cpp ./a.out运行 二、使用cmake编译 mkdir build cd build cmake .. makeCMakeCache.txt 文件仍然指向旧的目录。这表明在源代码目录中可能还存在旧的 CMakeCache.txt 文件,或者在构建过程中仍然引用了旧的路…...

SQL慢可能是触发了ring buffer

简介 最近在进行 postgresql 性能排查的时候,发现 PG 在某一个时间并行执行的 SQL 变得特别慢。最后通过监控监观察到并行发起得时间 buffers_alloc 就急速上升,且低水位伴随在整个慢 SQL,一直是 buferIO 的等待事件,此时也没有其他会话的争抢。SQL 虽然不是高效 SQL ,但…...

[大语言模型]在个人电脑上部署ollama 并进行管理,最后配置AI程序开发助手.

ollama官网: 下载 https://ollama.com/ 安装 查看可以使用的模型 https://ollama.com/search 例如 https://ollama.com/library/deepseek-r1/tags # deepseek-r1:7bollama pull deepseek-r1:7b改token数量为409622 16384 ollama命令说明 ollama serve #&#xff1a…...