
1. Machine Learning Fundamentals (5): Exercises (Reference Answers)

news 2026/2/11 5:07:56

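20. 🔗 Link to this chapter's code notebook 📓 (a VPN 🪜 may be required): (01_the_machine_learning_landscape.ipynb - Colab (google.com))

If you would rather not download this chapter's notebook from the official address above, you can also download it from the attachment to this post. That said, I recommend supporting the original book and the original site.

21. Reference answers in the original English, each followed by its Chinese translation: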

Machine Learning is about building systems that can learn from data.
机器学习是关于构建能够从数据中学习的系统。

Learning means getting better at some task, given some performance measure.
学习意味着在某些任务上变得更好，这是根据某些性能度量来衡量的。

Machine Learning is great for complex problems for which we have no algorithmic solution, to replace long lists of hand-tuned rules, to build systems that adapt to fluctuating environments, and finally to help humans learn (e.g., data mining).
机器学习非常适合那些我们没有算法解决方案的复杂问题，用来替代长长的手工调整规则列表，构建能够适应波动环境的系统，最终帮助人类学习（例如，数据挖掘）。

A labeled training set is a training set that contains the desired solution (a.k.a. a label) for each instance.
一个被标记的训练集是一个训练集，它为每个实例包含了期望的解决方案（即标签）。

The two most common supervised tasks are regression and classification.
两种最常见的监督任务是回归和分类。

Common unsupervised tasks include clustering, visualization, dimensionality reduction, and association rule learning.
常见的无监督任务包括聚类、可视化、降维和关联规则学习。

Reinforcement Learning is likely to perform best if we want a robot to learn to walk in various unknown terrains, since this is typically the type of problem that Reinforcement Learning tackles. It might be possible to express the problem as a supervised or semi-supervised learning problem, but it would be less natural.
如果我们希望机器人学会在各种未知地形中行走，强化学习可能会表现得最好，因为这是强化学习通常处理的问题类型。虽然有可能将问题表达为监督或半监督学习问题，但这样做会显得不那么自然。

If you don’t know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.
如果你不知道如何定义组别，那么可以使用聚类算法（无监督学习）将客户分割成相似客户的群集。然而，如果你知道你想要的组别，那么你可以向分类算法（监督学习）提供每个组的许多示例，它将把所有客户分类到这些组中。
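The clustering half of this answer can be illustrated with a minimal sketch. It assumes a single hypothetical feature (monthly spend per customer) and uses a plain 1-D k-means written from scratch; the function name `kmeans_1d` and the data are illustrative only, not taken from the book's notebook.

```python
def kmeans_1d(values, k, iterations=20):
    """Group 1-D values into k clusters; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialization: spread the starting centroids over the sorted values.
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical data: no labels are provided, only the feature itself.
monthly_spend = [12.0, 15.0, 11.0, 14.0, 95.0, 102.0, 99.0, 98.0]
centroids, labels = kmeans_1d(monthly_spend, k=2)
print(centroids)  # [13.0, 98.5]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

No group definitions were supplied; the two segments emerge from the data. If the groups were known in advance, labeled examples of each group would be fed to a classifier instead.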

Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).
垃圾邮件检测是一个典型的监督学习问题：算法被输入了许多电子邮件及其标签（垃圾邮件或非垃圾邮件）。

An online learning system can learn incrementally, as opposed to a batch learning system. This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.
与批量学习系统不同，在线学习系统可以增量地学习。这使得它能够快速适应不断变化的数据和自主系统，并且能够在海量数据上进行训练。

Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer’s main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
核外（out-of-core）算法可以处理无法装入计算机主存的海量数据。核外学习算法将数据切分成小批量，并使用在线学习技术从这些小批量中学习。
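As a rough sketch of the mini-batch idea, the example below streams data through a generator so the full dataset never sits in memory, and updates a linear model one mini-batch at a time with gradient descent. The generator `data_stream` stands in for reading from disk, and all names, the batch size, and the learning rate are assumptions made for illustration.

```python
def mini_batches(stream, batch_size):
    """Yield lists of at most batch_size items without materializing the whole stream."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def data_stream(n):
    """Stand-in for reading from disk: lazily yields (x, y) pairs with y = 3x + 2."""
    for i in range(n):
        x = ((i * 37) % 100) / 100.0
        yield x, 3.0 * x + 2.0


def train_incrementally(batches, learning_rate=0.5):
    """Update the weight w and bias b once per mini-batch (gradient of the batch MSE)."""
    w, b = 0.0, 0.0
    for batch in batches:
        m = len(batch)
        grad_w = sum(2.0 * ((w * x + b) - y) * x for x, y in batch) / m
        grad_b = sum(2.0 * ((w * x + b) - y) for x, y in batch) / m
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train_incrementally(mini_batches(data_stream(20000), batch_size=50))
print(round(w, 3), round(b, 3))
```

Only one mini-batch of 50 instances is held in memory at any time, which is the property that lets the same loop scale to data that does not fit in main memory.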

An instance-based learning system learns the training data by heart; then, when given a new instance, it uses a similarity measure to find the most similar learned instances and uses them to make predictions.
基于实例的学习系统会把训练数据熟记下来；然后，当给定一个新的实例时，它使用相似性度量来找到最相似的已学习实例，并用它们进行预测。
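One common instance-based method is k-nearest neighbors. The sketch below assumes Euclidean distance as the similarity measure and a majority vote among the `k` closest stored instances; the toy training set and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training_set, new_instance, k=3):
    """Predict a label by majority vote among the k most similar stored instances."""
    # "Learning by heart": the training set is kept as-is and searched at prediction time.
    by_distance = sorted(training_set, key=lambda pair: math.dist(pair[0], new_instance))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


training_set = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(training_set, (1.1, 1.0)))  # A
print(knn_predict(training_set, (5.1, 5.0)))  # B
```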

A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope of a linear model). A learning algorithm tries to find optimal values for these parameters such that the model generalizes well to new instances. A hyperparameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply).
模型有一个或多个模型参数，这些参数决定了它将对新实例进行什么预测（例如，线性模型的斜率）。学习算法试图找到这些参数的最优值，以便模型能够很好地泛化到新实例。超参数是学习算法本身的参数，而不是模型的参数（例如，要应用的正则化量）。
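To make the distinction concrete, here is a small sketch assuming a one-feature linear model through the origin with an L2 (ridge) penalty. The slope is the model parameter found by the learning algorithm; `alpha`, the amount of regularization, is a hyperparameter chosen before training. The closed-form solution and the names are illustrative assumptions.

```python
def fit_slope(xs, ys, alpha):
    """Minimize sum((w*x - y)^2) + alpha * w^2 over the slope w (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x

slope_no_reg = fit_slope(xs, ys, alpha=0.0)   # hyperparameter: no regularization
slope_reg = fit_slope(xs, ys, alpha=30.0)     # hyperparameter: strong regularization
print(slope_no_reg)  # 2.0
print(slope_reg)     # 1.0
```

The same training data yields different parameter values depending on the hyperparameter: the learning algorithm sets the slope, while `alpha` is set by whoever configures the algorithm.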

Model-based learning algorithms search for an optimal value for the model parameters such that the model will generalize well to new instances. We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on the training data, plus a penalty for model complexity if the model is regularized. To make predictions, we feed the new instance’s features into the model’s prediction function, using the parameter values found by the learning algorithm.
基于模型的学习算法寻找模型参数的最优值，以便模型能够很好地泛化到新实例。我们通常通过最小化一个代价函数来训练这样的系统，该函数衡量系统在训练数据上进行预测的表现有多差，如果模型进行了正则化，还会加上模型复杂性的惩罚。要进行预测，我们将新实例的特征输入到模型的预测函数中，使用学习算法找到的参数值。
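The three pieces named in this answer (prediction function, cost function with an optional complexity penalty, and a search for good parameter values) can be sketched as follows. This assumes a linear model, mean squared error, an L2 penalty on the weight, and plain gradient descent; the learning rate, step count, and all names are assumptions for illustration.

```python
def predict(theta, x):
    """Prediction function of a linear model with parameters theta = (bias, weight)."""
    bias, weight = theta
    return bias + weight * x


def cost(theta, data, alpha=0.0):
    """Mean squared error on the training data plus an L2 penalty on the weight."""
    mse = sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)
    return mse + alpha * theta[1] ** 2


def train(data, alpha=0.0, learning_rate=0.1, steps=2000):
    """Search for parameter values that minimize cost() using gradient descent."""
    bias, weight = 0.0, 0.0
    m = len(data)
    for _ in range(steps):
        errors = [(bias + weight * x) - y for x, y in data]
        grad_bias = 2.0 * sum(errors) / m
        grad_weight = 2.0 * sum(e * x for e, (x, _) in zip(errors, data)) / m
        grad_weight += 2.0 * alpha * weight  # gradient of the complexity penalty
        bias -= learning_rate * grad_bias
        weight -= learning_rate * grad_weight
    return bias, weight


training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
theta = train(training_data)
print(round(theta[0], 3), round(theta[1], 3))  # 1.0 2.0
print(round(predict(theta, 10.0), 3))          # 21.0
```

Calling `train(training_data, alpha=1.0)` instead returns a smaller weight, since the penalty term trades some training accuracy for a simpler model.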

Some of the main challenges in Machine Learning are the lack of data, poor data quality, nonrepresentative data, uninformative features, excessively simple models that underfit the training data, and excessively complex models that overfit the data.
机器学习面临的一些主要挑战包括数据缺乏、数据质量差、数据不具代表性、特征不具信息量、过于简单的模型导致训练数据欠拟合，以及过于复杂的模型导致数据过拟合。

If a model performs great on the training data but generalizes poorly to new instances, the model is likely overfitting the training data (or we got extremely lucky on the training data). Possible solutions to overfitting are getting more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or features used, or regularizing the model), or reducing the noise in the training data.
如果一个模型在训练数据上表现很好，但对新实例的泛化能力很差，那么模型可能过拟合了训练数据（或者我们在训练数据上非常幸运）。解决过拟合的可能方案是获取更多数据、简化模型（选择一个更简单的算法、减少使用的参数或特征数量，或者对模型进行正则化）或减少训练数据中的噪声。

A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.
测试集用于在模型投入生产之前，估计模型在新实例上的泛化误差。

A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.
验证集用于比较模型。它使得选择最佳模型和调整超参数成为可能。
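The roles of the validation and test sets can be sketched with a simple holdout split. The example assumes synthetic noisy data, a 60/20/20 split, and a ridge slope model whose regularization strength `alpha` is the hyperparameter being tuned; none of these choices come from the original text.

```python
import random


def fit_slope(xs, ys, alpha):
    """Ridge fit through the origin; alpha is the regularization hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(slope, rows):
    """Mean squared prediction error of the slope model on (x, y) rows."""
    return sum((slope * x - y) ** 2 for x, y in rows) / len(rows)


# Synthetic data (assumption): y = 2x plus Gaussian noise, shuffled with a fixed seed.
rng = random.Random(0)
rows = [(i / 10, 2.0 * (i / 10) + rng.gauss(0.0, 1.0)) for i in range(100)]
rng.shuffle(rows)
train_rows, validation_rows, test_rows = rows[:60], rows[60:80], rows[80:]


def validation_error(alpha):
    """Train on the training rows only, then score on the validation rows."""
    xs = [x for x, _ in train_rows]
    ys = [y for _, y in train_rows]
    return mse(fit_slope(xs, ys, alpha), validation_rows)


# Compare candidate models on the validation set, never on the test set.
candidate_alphas = [0.0, 1.0, 10.0, 100.0]
best_alpha = min(candidate_alphas, key=validation_error)

# The test set is touched once, at the end, to estimate the generalization error.
final_slope = fit_slope([x for x, _ in train_rows], [y for _, y in train_rows], best_alpha)
test_error = mse(final_slope, test_rows)
print(best_alpha, test_error)
```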

The train-dev set is used when there is a risk of mismatch between the training data and the data used in the validation and test datasets (which should always be as close as possible to the data used once the model is in production). The train-dev set is a part of the training set that’s held out (the model is not trained on it). The model is trained on the rest of the training set, and evaluated on both the train-dev set and the validation set. If the model performs well on the training set but not on the train-dev set, then the model is likely overfitting the training set. If it performs well on both the training set and the train-dev set, but not on the validation set, then there is probably a significant data mismatch between the training data and the validation + test data, and you should try to improve the training data to make it look more like the validation + test data.
当训练数据与验证和测试数据集使用的数据之间存在不匹配的风险时，使用训练-开发集（train-dev set）。训练-开发集是保留出来的训练集的一部分（模型未在此部分上训练）。模型在训练集的其余部分上进行训练，并在训练-开发集和验证集上进行评估。如果模型在训练集上表现良好，但在训练-开发集上表现不佳，那么模型可能过拟合了训练集。如果它在训练集和训练-开发集上都表现良好，但在验证集上表现不佳，那么训练数据与验证+测试数据之间可能存在显著的数据不匹配，你应该尝试改进训练数据，使其更接近验证+测试数据。
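The diagnostic reasoning above can be written down as a small decision function. This is only a sketch: it assumes error rates where lower is better and that the training error itself is already acceptable, and the `gap` threshold of 0.05 is a hypothetical cut-off for what counts as "performs noticeably worse", not a value from the book.

```python
def diagnose(train_error, train_dev_error, validation_error, gap=0.05):
    """Return a rough diagnosis from three error rates (lower is better)."""
    # Good on the training set but not on held-out training data: overfitting.
    if train_dev_error - train_error > gap:
        return "overfitting the training set"
    # Good on training and train-dev data but not on validation data: data mismatch.
    if validation_error - train_dev_error > gap:
        return "data mismatch between training and validation/test data"
    return "no obvious problem"


print(diagnose(0.02, 0.15, 0.16))  # overfitting the training set
print(diagnose(0.02, 0.03, 0.20))  # data mismatch between training and validation/test data
print(diagnose(0.02, 0.03, 0.04))  # no obvious problem
```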

If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).
如果你使用测试集来调整超参数，就有过拟合测试集的风险，你测得的泛化误差将会过于乐观（你上线的模型的实际表现可能比你预期的要差）。


