A Brief History: from GPT-1 to GPT-3
These are my reading notes on 《Developing Apps with GPT-4 and ChatGPT》.
In this section, we will trace the evolution of the OpenAI GPT models from GPT-1 to GPT-3.
GPT-1
In mid-2018, OpenAI published a paper titled “Improving Language Understanding by Generative Pre-Training” by Radford et al., in which they introduced GPT-1.
GPT stands for Generative Pre-trained Transformer.
Before GPT-1, the common approach to building high-performance NLP models relied on supervised learning, which requires large amounts of manually labeled data. This reliance on well-annotated data limited the performance of these techniques, because such datasets are both difficult and expensive to produce.
The authors of GPT-1 proposed a new learning process that introduces an unsupervised pre-training step. In this step, no labeled data is needed; instead, the model is trained to predict the next token.
The GPT-1 model was pre-trained on the BookCorpus dataset, which contains the text of approximately 11,000 unpublished books.
In this unsupervised learning phase, the model learned to predict the next token in the texts of the BookCorpus dataset.
However, because the model was small, it was unable to perform complex tasks well without fine-tuning.
To adapt the model to a specific target task, a second supervised learning step, called fine-tuning, was performed on a small set of manually labeled data.
Fine-tuning allowed the parameters learned in the initial pre-training phase to be adjusted to better fit the task at hand.
In contrast to other NLP neural models, GPT-1 showed remarkable performance on several NLP tasks using only a small amount of manually labeled data for fine-tuning.
NOTE
GPT-1 was trained in two stages:
Stage 1: Unsupervised Pre-training
Goal: To learn general language patterns and representations.
Method: The model is trained to predict the next token in the sentence.
Data: A large unlabeled text dataset
Type of Learning: Unsupervised learning – no manual labels are needed.
Outcome: The model learns a strong general understanding of language, but it’s not yet specialized for specific tasks (e.g., sentiment analysis or question answering).
Stage 2: Supervised Fine-tuning
Goal: To adapt the pre-trained model to a specific downstream task.
Method: The model is further trained on a small labeled dataset.
Type of Learning: Supervised learning – the data includes input-output pairs (e.g., a sentence and its sentiment label).
Outcome: The model’s parameters are fine-tuned so it performs better on that particular task.
Summary:
- Pre-training teaches the model how language works (general knowledge).
- Fine-tuning teaches the model how to perform a specific task (specialized skills).
A good analogy would be:
The model first reads lots of books to become knowledgeable (pre-training), and then takes a short course to learn a particular job (fine-tuning).
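To make the two stages concrete, here is a minimal PyTorch sketch (not GPT-1’s actual code): a toy transformer backbone with a language-modelling head for the unsupervised pre-training objective, and a small classification head for supervised fine-tuning. The model, the random data, and the sentiment task are illustrative assumptions, and the causal attention mask a real GPT needs is omitted for brevity.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10000, 768

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)   # next-token prediction head
        self.cls_head = nn.Linear(d_model, 2)           # task head, e.g. sentiment

    def forward(self, tokens):
        # NOTE: a real GPT applies a causal (left-to-right) attention mask here;
        # it is omitted to keep the sketch short.
        return self.backbone(self.embed(tokens))

model = TinyLM()
loss_fn = nn.CrossEntropyLoss()

# Stage 1: unsupervised pre-training -- predict token t+1 from the tokens before it.
tokens = torch.randint(0, vocab_size, (8, 128))        # a batch of unlabeled text
logits = model.lm_head(model(tokens[:, :-1]))
pretrain_loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))

# Stage 2: supervised fine-tuning -- a small labeled dataset and a task-specific head.
sentences = torch.randint(0, vocab_size, (8, 128))     # e.g. movie reviews
labels = torch.randint(0, 2, (8,))                     # manual sentiment labels
pooled = model(sentences).mean(dim=1)                  # simple mean pooling
finetune_loss = loss_fn(model.cls_head(pooled), labels)

print(pretrain_loss.item(), finetune_loss.item())
```

In practice both stages update the same backbone parameters; fine-tuning simply continues training from the pre-trained weights with a task-specific objective.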
The architecture of GPT-1 was similar to the decoder of the original transformer, introduced in 2017, and had 117 million parameters.
This first GPT model paved the way for future models, trained on larger datasets with more parameters, to take better advantage of the potential of the transformer architecture.
GPT-2
In early 2019, OpenAI proposed GPT-2, a scaled-up version of the GPT-1 model, increasing the number of parameters and the size of the training dataset tenfold.
The number of parameters of this new version was 1.5 billion, trained on 40 GB of text.
In November 2019, OpenAI released the full version of the GPT-2 language model.
GPT-2 is publicly available and can be downloaded from Hugging Face or GitHub.
GPT-2 showed that training a larger language model on a larger dataset improves its ability to understand tasks and outperforms the state of the art on many of them.
GPT-3
GPT-3 was released by OpenAI in June 2020.
The main differences between GPT-2 and GPT-3 are the size of the model and the quantity of data used for the training.
GPT-3 is a much larger model, with 175 billion parameters, allowing it to capture more complex patterns.
In addition, GPT-3 is trained on a more extensive dataset.
This includes Common Crawl, a large web archive containing text from billions of web pages, as well as other sources such as Wikipedia.
This training dataset, which includes content from websites, books, and articles, allows GPT-3 to develop a deeper understanding of the language and context.
As a result, GPT-3 delivers improved performance on a variety of linguistic tasks.
GPT-3 eliminates the need for a fine-tuning step that was mandatory for its predecessors.
NOTE
How GPT-3 eliminates the need for fine-tuning:
GPT-3 is trained on a massive amount of data, and it’s much larger than GPT-1 and GPT-2 – with 175 billion parameters.
Because of the scale, GPT-3 learns very strong general language skills during pre-training alone.
Instead of fine-tuning, GPT-3 uses:
- Zero-shot learning: just give it a task description in plain text, with no examples needed.
- One-shot learning: give it one example in the prompt to show what kind of answer you want.
- Few-shot learning: give it a few examples in the prompt, and it learns the pattern on the fly.
So in short:
GPT-3 doesn’t need fine-tuning because it can understand and adapt to new tasks just by seeing a few examples in the input prompt — thanks to its massive scale and powerful pre-training.
GPT-3 is indeed capable of handling many tasks without traditional fine-tuning, but that doesn’t mean it lacks support for fine-tuning or never uses it.
GPT-3’s default approach: Few-shot / Zero-shot Learning
What makes GPT-3 so impressive is that it can:
- Perform tasks without retraining (fine-tuning)
- Learn through prompts alone
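As a concrete illustration, here is a hedged sketch of few-shot prompting: the examples live entirely in the prompt text, and no model weights are updated. The model name, the dummy reviews, and the legacy (pre-1.0) openai Python SDK call are assumptions for illustration; adapt them to whichever client and model you actually use.

```python
import openai  # legacy SDK, e.g. pip install "openai<1.0" (assumed for this sketch)

openai.api_key = "sk-..."  # placeholder; set your own key

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day, I love it.
Sentiment: Positive

Review: The screen cracked after one week.
Sentiment: Negative

Review: Shipping was fast and the sound quality is great.
Sentiment:"""

response = openai.Completion.create(
    model="text-davinci-003",   # assumed GPT-3-family completion model
    prompt=few_shot_prompt,
    max_tokens=3,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # expected: Positive
```

The two in-prompt examples play the role that a labeled fine-tuning dataset played for GPT-1: they show the model the task format on the fly.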
Does GPT-3 support fine-tuning?
Yes! OpenAI eventually provided a fine-tuning API for GPT-3, which is useful in scenarios like:
- When you have domain-specific data (e.g., legal or medical text).
- When you want the model to maintain a consistent tone or writing style.
- When you need a stable and structured output format (e.g., JSON).
- When prompt engineering isn’t sufficient.
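For the fine-tuning route, the sketch below shows the general shape of the workflow, assuming the legacy (pre-1.0) openai Python SDK and its JSONL prompt/completion format; the file name, the dummy training pairs, and the base model are illustrative assumptions, not an example from the book.

```python
import json
import openai  # legacy SDK, e.g. pip install "openai<1.0" (assumed for this sketch)

# A couple of dummy domain-specific training pairs (illustrative only).
examples = [
    {"prompt": "Q: What is the notice period for termination?\nA:",
     "completion": " Sixty days, given in writing.\n"},
    {"prompt": "Q: Who is responsible for routine maintenance?\nA:",
     "completion": " The tenant, per the maintenance clause.\n"},
]

# GPT-3 fine-tuning expected a JSONL file of prompt/completion pairs.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file, then launch a fine-tune on a base GPT-3 model (legacy endpoints).
uploaded = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=uploaded["id"], model="davinci")
print(job["id"])
```

Once the job finishes, the resulting fine-tuned model can be called like any other model, which is what makes this route attractive for consistent tone or structured output.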
To summarize:
Does GPT-3 need fine-tuning?
Usually no — few-shot/zero-shot learning is enough for most tasks.
Does GPT-3 support fine-tuning?
Yes — it is especially useful for domain-specific or high-requirement tasks.