
[Hive in Practice] Hive MetaStore Upgrade Investigation (MySQL)

news 2026/3/18 22:40:11

Hive MetaStore Upgrade Investigation (MySQL Database)

Table of Contents

Hive MetaStore Upgrade Investigation (MySQL Database)
- Upgrade Steps
- Script Notes
- Original Text

The main part of a MetaStore upgrade is upgrading the schema of its backing store, the MySQL database.

Upgrade Steps

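Shut down the MetaStore instance and restrict access to the MetaStore's MySQL database. It is very important that no one else accesses or modifies the contents of the database while the schema upgrade is running. [Note: stopping the metadata service means that every Hive-related service is unavailable for the duration of the upgrade.]
Create a backup of the MySQL metastore database. If something goes wrong, this lets you revert any changes made during the upgrade. [Note: this is a backup of the data in the MySQL metadata database.] The mysqldump utility is the easiest way to create a backup of a MySQL database: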
```
% mysqldump --opt <metastore_db_name> > metastore_backup.sql
```
Note that you may also need to specify a hostname and username with the --host and --user command-line switches.
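As a minimal sketch of the backup command with those switches filled in: the function name `dump_metastore`, the `DRY_RUN` variable, and the host, user, and database values are hypothetical placeholders, not taken from the article.

```shell
# Sketch: back up the metastore database on a remote MySQL server.
# HOST, USER and DB are placeholders supplied by the caller.
# Usage: dump_metastore HOST USER DB > metastore_backup.sql
dump_metastore() {
    # -p makes mysqldump prompt for the password instead of reading it
    # from the command line. When DRY_RUN is set to a non-empty value,
    # the command is only printed, which is handy for checking it first.
    ${DRY_RUN:+echo} mysqldump --opt --host="$1" --user="$2" -p "$3"
}
```

For example, `DRY_RUN=1 dump_metastore db1 hive metastore` prints the command line without touching the database.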
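Dump the metastore database schema to a file. [Note: this backs up only the schema of the MySQL metadata database.] We use mysqldump again, but this time with command-line options that restrict the dump to the DDL statements needed to create the schema: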
```
% mysqldump --skip-add-drop-table --no-data <metastore_db_name> > my-schema-x.y.z.mysql.sql
```
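The schema upgrade scripts assume that the schema you are upgrading closely matches the official schema for your version of Hive. The files in the upgrade script directory (metastore/scripts/upgrade/mysql/ in the Hive source tree) named like hive-schema-x.y.z.mysql.sql contain dumps of the official schema for each released version of Hive. To find the differences between your schema and the official one, diff the official dump against the schema dump created in the previous step. Some differences are acceptable and do not interfere with the upgrade, but others must be resolved manually or the upgrade scripts will fail to complete.
- Missing tables: Hive's default configuration causes the MetaStore to create schema elements only when they are needed. If you have not created the corresponding Hive catalog objects, some tables may be missing from the MetaStore schema; for example, the PARTITIONS table may not exist if you have never created a table partition in the MetaStore. You must create these missing tables before running the upgrade scripts. The easiest way is to execute the official schema DDL script against your schema. Every CREATE TABLE statement in the schema script includes an IF NOT EXISTS clause, so tables that already exist are left alone and missing ones are created. [Note: the upgrade scripts do not create unused schema elements themselves, so run the DDL script for the corresponding version before the upgrade scripts to make sure no table is missing.]
- Extra tables: your schema may include a table named NUCLEUS_TABLES or a table named SEQUENCE_TABLE. These tables are managed by the DataNucleus ORM layer and are created automatically if they do not exist. No action is required. [Note: this can be ignored; if you run the table-creation SQL, these two tables are necessarily included.]
- Reversed constraint names within the same table: tables with multiple constraints may have the constraint names swapped. For example, the PARTITIONS table contains two foreign key constraints named PARTITIONS_FK1 and PARTITIONS_FK2, which reference SDS.SD_ID and TBLS.TBL_ID respectively. In your schema, however, you may find that PARTITIONS_FK1 references TBLS.TBL_ID and PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable; the only requirement is that the constraints actually exist. [Note: which constraint name refers to which column may differ, but the full set of constraints must be present.]
- Differences in column/constraint names: your schema may contain tables with columns named IDX or unique keys named UNIQUE<tab_name>. If you find either of these, rename them to INTEGER_IDX and UNIQUE_<tab_name> before running the upgrade scripts. For more background on this issue, see HIVE-1435. [Note: the UNIQUE_ prefix requirement is somewhat doubtful, because the official DDL itself contains unique keys whose names do not start with UNIQUE_.]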
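Raw mysqldump output tends to differ in ways that do not matter for the upgrade, such as header comments and per-table AUTO_INCREMENT counters. The comparison described above could be sketched as follows, assuming those are the only kinds of noise worth filtering; the function names `normalize_schema` and `schema_diff` are illustrative and not part of Hive.

```shell
# Strip noise from a mysqldump schema file so that only DDL is compared:
# "--" comment lines, /*! ... */; version-specific directives,
# AUTO_INCREMENT=N table counters, and blank lines.
normalize_schema() {
    sed -e '/^--/d' \
        -e '/^\/\*!.*\*\/;$/d' \
        -e 's/ AUTO_INCREMENT=[0-9][0-9]*//' \
        -e '/^[[:space:]]*$/d' "$1"
}

# Print the differences between your dump ($1) and the official dump ($2).
# Returns 0 when the normalized schemas are identical, 1 when they differ.
schema_diff() {
    sd_mine=$(mktemp) || return 2
    sd_official=$(mktemp) || return 2
    normalize_schema "$1" > "$sd_mine"
    normalize_schema "$2" > "$sd_official"
    diff "$sd_mine" "$sd_official"
    sd_status=$?
    rm -f "$sd_mine" "$sd_official"
    return "$sd_status"
}
```

A call such as `schema_diff my-schema-x.y.z.mysql.sql hive-schema-x.y.z.mysql.sql` then shows only the remaining differences. These still have to be judged by hand against the four cases listed above, since table order and swapped constraint names are not normalized here.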
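You are now ready to run the schema upgrade scripts. To upgrade from Hive 0.5.0 to Hive 0.6.0, run the upgrade-0.5.0-to-0.6.0.mysql.sql script; to upgrade from 0.5.0 to 0.7.0, first run the 0.5.0-to-0.6.0 upgrade script and then the 0.6.0-to-0.7.0 upgrade script.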

[Note: skipping over releases is not supported; the upgrade scripts must be run in sequence.]
```
% mysql --verbose
mysql> use <metastore_db_name>;
Database changed
mysql> source upgrade-1.2.0-to-2.0.0.mysql.sql
mysql> source upgrade-2.0.0-to-2.1.0.mysql.sql
mysql> source upgrade-2.1.0-to-2.2.0.mysql.sql
mysql> source upgrade-2.2.0-to-2.3.0.mysql.sql
```
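These scripts should run to completion without any errors. If you do hit an error, analyze the cause and try to trace it back to one of the preceding steps.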
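Because each upgrade script covers exactly one release step, the list of scripts to run can be derived from the ordered list of releases. A small sketch, assuming the release order shown in the session above; the `VERSIONS` variable and the `upgrade_chain` function are illustrative names, not part of Hive.

```shell
# Release order taken from the upgrade scripts used in the session above.
VERSIONS="1.2.0 2.0.0 2.1.0 2.2.0 2.3.0"

# Print, one per line and in order, the upgrade scripts needed to go
# from version $1 to version $2. Returns 1 if no forward path exists.
upgrade_chain() {
    uc_from=$1; uc_to=$2; uc_prev=; uc_chain=; uc_started=
    for uc_v in $VERSIONS; do
        # Once the starting version has been seen, every further release
        # adds one single-step upgrade script to the chain.
        if [ -n "$uc_started" ]; then
            uc_chain="$uc_chain upgrade-$uc_prev-to-$uc_v.mysql.sql"
        fi
        if [ "$uc_v" = "$uc_from" ]; then uc_started=1; fi
        uc_prev=$uc_v
        if [ -n "$uc_started" ] && [ "$uc_v" = "$uc_to" ]; then
            # Word splitting on $uc_chain is intentional: one name per line.
            if [ -n "$uc_chain" ]; then printf '%s\n' $uc_chain; fi
            return 0
        fi
    done
    echo "no upgrade path from $uc_from to $uc_to" >&2
    return 1
}
```

`upgrade_chain 1.2.0 2.3.0` lists the same four scripts as the session above. When running them, start the mysql client from the directory that contains the scripts, because the SOURCE statements inside each upgrade script use relative file names.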
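The final step of the upgrade is to validate the freshly upgraded schema against the official schema for your version of Hive. This is done by **repeating steps (3) and (4)** (dumping the schema and diffing it against the official dump), but this time comparing against the official schema of the target version; for example, if you upgraded the schema to Hive 0.7.0, compare your schema dump with the contents of hive-schema-0.7.0.mysql.sql. [Note: after upgrading from 1.2.0 to 2.0.0, dump the schema and compare it with the official 2.0.0 schema. If they match, upgrade from 2.0.0 to 2.1.0 and compare with the official 2.1.0 schema, then upgrade from 2.1.0 to 2.2.0, and continue step by step up to 2.3.0.]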

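Script Notes

Script source: the metastore/scripts/upgrade/mysql/ directory of the Hive 2.3.4 source tree.

The official scripts that create a schema directly (rather than upgrading one) come in two kinds, hive-schema-a.b.c.mysql.sql and hive-txn-schema-a.b.c.mysql.sql, for example hive-schema-2.3.0.mysql.sql and hive-txn-schema-2.3.0.mysql.sql.

An upgrade script mainly runs a series of scripts named like XXX-HIVE-XXXXX.mysql.sql, so understanding an upgrade means going through the operations in each of those scripts one by one. For example: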

SELECT 'Upgrading MetaStore schema from 1.2.0 to 2.0.0' AS ' ';
SOURCE 021-HIVE-7018.mysql.sql;
SOURCE 022-HIVE-11970.mysql.sql;
SOURCE 023-HIVE-12807.mysql.sql;
SOURCE 024-HIVE-12814.mysql.sql;
SOURCE 025-HIVE-12816.mysql.sql;
SOURCE 026-HIVE-12818.mysql.sql;
SOURCE 027-HIVE-12819.mysql.sql;
SOURCE 028-HIVE-12821.mysql.sql;
SOURCE 029-HIVE-12822.mysql.sql;
SOURCE 030-HIVE-12823.mysql.sql;
SOURCE 031-HIVE-12831.mysql.sql;
SOURCE 032-HIVE-12832.mysql.sql;
UPDATE VERSION SET SCHEMA_VERSION='2.0.0', VERSION_COMMENT='Hive release version 2.0.0' where VER_ID=1;
SELECT 'Finished upgrading MetaStore schema from 1.2.0 to 2.0.0' AS ' ';
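To go through the sub-scripts one by one, it helps to extract their names from an upgrade script first. A minimal sketch, assuming the SOURCE statements are written in upper case as in the listing above; `list_sourced_scripts` is an illustrative name, not part of Hive.

```shell
# Print the file name of every script pulled in by a SOURCE statement
# in the upgrade script $1, in the order in which they are executed.
# Anything following the terminating ";" on the same line is ignored.
list_sourced_scripts() {
    sed -n 's/^[[:space:]]*SOURCE[[:space:]][[:space:]]*\([^;[:space:]]*\);.*$/\1/p' "$1"
}
```

For the 1.2.0-to-2.0.0 listing above, `list_sourced_scripts upgrade-1.2.0-to-2.0.0.mysql.sql` would print the twelve file names from 021-HIVE-7018.mysql.sql to 032-HIVE-12832.mysql.sql, each of which can then be opened and reviewed.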

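If an error occurs while running an upgrade script:

Option 1: restore the schema, fix the problem, and run the upgrade script again.
Option 2: repair the schema by hand, following the statements in the upgrade script, until it matches the result of the official direct schema script.

Original Text

Source: metastore/scripts/upgrade/mysql/README in the Hive 2.3.4 source tree.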

This document describes how to upgrade the schema of a MySQL backed Hive MetaStore instance from one release version of Hive to another release version of Hive. For example, by following the steps listed below it is possible to upgrade a Hive 0.5.0 MetaStore schema to a Hive 0.7.0 MetaStore schema. Before attempting this project we strongly recommend that you read through all of the steps in this document and familiarize yourself with the required tools.

MetaStore Upgrade Steps

Shutdown your MetaStore instance and restrict access to the MetaStore’s MySQL database. It is very important that no one else accesses or modifies the contents of database while you are performing the schema upgrade.
Create a backup of your MySQL metastore database. This will allow you to revert any changes made during the upgrade process if something goes wrong. The mysqldump utility is the easiest way to create a backup of a MySQL database:
```
% mysqldump --opt <metastore_db_name> > metastore_backup.sql
```
Note that you may also need to specify a hostname and username using the --host and --user command line switches.
Dump your metastore database schema to a file. We use the mysqldump utility again, but this time with a command line option that specifies we are only interested in dumping the DDL statements required to create the schema:
```
% mysqldump --skip-add-drop-table --no-data <metastore_db_name> > my-schema-x.y.z.mysql.sql
```
The schema upgrade scripts assume that the schema you are upgrading closely matches the official schema for your particular version of Hive. The files in this directory with names like “hive-schema-x.y.z.mysql.sql” contain dumps of the official schemas corresponding to each of the released versions of Hive. You can determine differences between your schema and the official schema by diffing the contents of the official dump with the schema dump you created in the previous step. Some differences are acceptable and will not interfere with the upgrade process, but others need to be resolved manually or the upgrade scripts will fail to complete.
- Missing Tables: Hive’s default configuration causes the MetaStore to create schema elements only when they are needed. Some tables may be missing from your MetaStore schema if you have not created the corresponding Hive catalog objects, e.g. the PARTITIONS table will probably not exist if you have not created any table partitions in your MetaStore. You MUST create these missing tables before running the upgrade scripts. The easiest way to do this is by executing the official schema DDL script against your schema. Each of the CREATE TABLE statements in the schema script include an IF NOT EXISTS clause, so tables which already exist in your schema will be ignored, and those which don’t exist will get created.
- Extra Tables: Your schema may include a table named NUCLEUS_TABLES or a table named SEQUENCE_TABLE. These tables are managed by the DataNucleus ORM layer and will be created automatically if they don’t exist. No action on your part is required.
- Reversed Column Constraint Names in the Same Table: Tables with multiple constraints may have the names of the constraints reversed. For example, the PARTITIONS table contains two foreign key constraints named PARTITIONS_FK1 and PARTITIONS_FK2 which reference SDS.SD_ID and TBLS.TBL_ID respectively. However, in your schema you may find that PARTITIONS_FK1 references TBLS.TBL_ID and PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable – the only requirement is that these constraints actually exist.
- Differences in Column/Constraint Names: Your schema may contain tables with columns named “IDX” or unique keys named “UNIQUE<tab_name>”. If you find either of these in your schema you will need to change the names to “INTEGER_IDX” and “UNIQUE_<tab_name>” before running the upgrade scripts. For more background on this issue please refer to HIVE-1435.
You are now ready to run the schema upgrade scripts. If you are upgrading from Hive 0.5.0 to Hive 0.6.0 you need to run the upgrade-0.5.0-to-0.6.0.mysql.sql script, but if you are upgrading from 0.5.0 to 0.7.0 you will need to run the 0.5.0 to 0.6.0 upgrade script followed by the 0.6.0 to 0.7.0 upgrade script.
```
% mysql --verbose
mysql> use <metastore_db_name>;
Database changed
mysql> source upgrade-0.5.0-to-0.6.0.mysql.sql
mysql> source upgrade-0.6.0-to-0.7.0.mysql.sql
```
These scripts should run to completion without any errors. If you do encounter errors you need to analyze the cause and attempt to trace it back to one of the preceding steps.
The final step of the upgrade process is validating your freshly upgraded schema against the official schema for your particular version of Hive. This is accomplished by repeating steps (3) and (4), but this time comparing against the official version of the upgraded schema, e.g. if you upgraded the schema to Hive 0.7.0 then you will want to compare your schema dump against the contents of hive-schema-0.7.0.mysql.sql
