Using Apache Paimon: Creating Catalogs

news 2026/5/19 21:34:41

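Paimon catalogs currently support two types of metastore:

filesystem metastore (default), which stores both metadata and table files in the filesystem.
hive metastore, which additionally stores metadata in the Hive metastore. Users can then access the tables directly from Hive.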

1. Creating a Catalog with the Filesystem Metastore

Flink engine

The following Flink SQL registers and uses a Paimon catalog named my_catalog. Metadata and table files are stored under hdfs:///path/to/warehouse.

CREATE CATALOG my_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs:///path/to/warehouse'
);

USE CATALOG my_catalog;

For tables created in the catalog, you can define any default table option with the table-default. prefix.
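As a minimal sketch of the table-default. prefix, the catalog below is assumed to want Parquet as the file format of every table created in it. The catalog name my_catalog_with_defaults and the choice of file.format as the option to default are illustrative assumptions, not part of the original example.

```sql
-- Hypothetical catalog name; file.format is used only as an example option.
CREATE CATALOG my_catalog_with_defaults WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs:///path/to/warehouse',
    -- Used as the default 'file.format' for tables created in this catalog
    'table-default.file.format' = 'parquet'
);
```

Tables created in this catalog should then pick up file.format = parquet unless they set the option themselves.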

Spark3 engine

The following shell command registers a Paimon catalog named paimon. Metadata and table files are stored under hdfs:///path/to/warehouse.

spark-sql ... \
    --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
    --conf spark.sql.catalog.paimon.warehouse=hdfs:///path/to/warehouse

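For tables created in the catalog, you can define default table options with the spark.sql.catalog.paimon.table-default. prefix.

Once spark-sql has started, use the following SQL to switch to the default database of the paimon catalog.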

USE paimon.default;

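2. Creating a Catalog with the Hive Metastore

With the Paimon Hive catalog, changes to the catalog directly affect the corresponding Hive metastore, and tables created in such a catalog can be accessed directly from Hive.

To use the Hive catalog, database names, table names, and field names should all be lowercase.

Flink engine

The Paimon Hive catalog in Flink depends on the Flink Hive connector bundled jar. Download the Hive connector bundled jar first and add it to the classpath.

The following Flink SQL registers and uses a Paimon Hive catalog named my_hive. Metadata and table files are stored under hdfs:///path/to/warehouse, and the metadata is also stored in the Hive metastore.

If Hive requires security authentication such as Kerberos, LDAP, or Ranger, or if you want Paimon tables to be managed by Apache Atlas (by setting "hive.metastore.event.listeners" in hive-site.xml), set the hive-conf-dir and hadoop-conf-dir parameters to the directories that contain hive-site.xml and the Hadoop configuration files.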

CREATE CATALOG my_hive WITH (
    'type' = 'paimon',
    'metastore' = 'hive'
    -- Optional settings; when uncommenting one, add a comma after the preceding option.
    -- 'uri' = 'thrift://<hive-metastore-host-name>:<port>', default use 'hive.metastore.uris' in HiveConf
    -- 'hive-conf-dir' = '...', this is recommended in the kerberos environment
    -- 'hadoop-conf-dir' = '...', this is recommended in the kerberos environment
    -- 'warehouse' = 'hdfs:///path/to/warehouse', default use 'hive.metastore.warehouse.dir' in HiveConf
);

USE CATALOG my_hive;
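For a secured environment such as Kerberos, a variant with both configuration directories set might look like the sketch below. The catalog name my_secure_hive and the paths /etc/hive/conf and /etc/hadoop/conf are hypothetical placeholders, so substitute the directories used by your own cluster. The uri and warehouse options are left out here on the assumption that they are picked up from HiveConf.

```sql
-- Hypothetical name and paths: point the two directories at the locations
-- that hold your own hive-site.xml and Hadoop configuration files.
CREATE CATALOG my_secure_hive WITH (
    'type' = 'paimon',
    'metastore' = 'hive',
    'hive-conf-dir' = '/etc/hive/conf',
    'hadoop-conf-dir' = '/etc/hadoop/conf'
);

USE CATALOG my_secure_hive;
```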

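For tables created in the catalog, you can define default table options with the table-default. prefix.

In addition, you can create a Flink Generic Catalog.

Spark3 engine

Spark needs to include the Hive dependencies.

The following shell command registers a Paimon Hive catalog named paimon. Metadata and table files are stored under hdfs:///path/to/warehouse, and the metadata is also stored in the Hive metastore.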

spark-sql ... \
    --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
    --conf spark.sql.catalog.paimon.warehouse=hdfs:///path/to/warehouse \
    --conf spark.sql.catalog.paimon.metastore=hive \
    --conf spark.sql.catalog.paimon.uri=thrift://<hive-metastore-host-name>:<port>

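For tables created in the catalog, you can define default table options with the spark.sql.catalog.paimon.table-default. prefix.

Once spark-sql has started, you can use the following SQL to switch to the default database of the paimon catalog.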

USE paimon.default;

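In addition, you can create a Spark Generic Catalog.

When using the Hive catalog to change a column to an incompatible type through ALTER TABLE, you need to configure hive.metastore.disallow.incompatible.col.type.changes=false.

If you are using Hive 3, disable Hive ACID: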

hive.strict.managed.tables=false
hive.create.as.insert.only=false
metastore.create.as.acid=false

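3. Setting the Location in Properties

If you are using object storage and do not want the location of Paimon tables/databases to be accessed through Hive's filesystem, which may lead to errors such as "No filesystem for scheme: s3a", you can set the location-in-properties option to true so that the table/database location is stored in the properties instead.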
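A minimal Flink SQL sketch of this setting follows. The catalog name my_hive_s3 and the bucket path s3a://my-bucket/path/to/warehouse are hypothetical, and the sketch assumes that an S3 filesystem is already configured for Paimon.

```sql
-- Hypothetical catalog name and bucket; only 'location-in-properties'
-- is the option under discussion.
CREATE CATALOG my_hive_s3 WITH (
    'type' = 'paimon',
    'metastore' = 'hive',
    'uri' = 'thrift://<hive-metastore-host-name>:<port>',
    'warehouse' = 's3a://my-bucket/path/to/warehouse',
    -- Keep the table/database location in the Hive properties
    'location-in-properties' = 'true'
);
```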

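4. Synchronizing Partitions to the Hive Metastore

By default, Paimon does not synchronize newly created partitions to the Hive metastore. Users see an unpartitioned table in Hive, and partition push-down is carried out through filter push-down instead.

If you want to see a partitioned table in Hive and have newly created partitions synchronized to the Hive metastore, set the table property metastore.partitioned-table to true.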
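The property can be set when the table is created, as in the following sketch. The table orders_by_day and its columns are made up for illustration, and the statement is assumed to run in Flink SQL inside a Paimon Hive catalog.

```sql
-- Hypothetical table; names are lowercase, as the Hive catalog expects.
CREATE TABLE orders_by_day (
    order_id BIGINT,
    amount DOUBLE,
    dt STRING,
    PRIMARY KEY (dt, order_id) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
    -- Expose the table as partitioned in Hive and sync new partitions
    'metastore.partitioned-table' = 'true'
);
```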

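5. Adding Parameters to a Hive Table

Table options are a convenient way to define Hive table parameters: options prefixed with hive. are automatically defined in the TBLPROPERTIES of the Hive table. For example, using the option hive.table.owner=Jon automatically adds the parameter table.owner=Jon to the table properties during creation.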
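The hive.table.owner example above could be written as the following Flink SQL sketch. The table owned_table and its columns are hypothetical, and the statement is assumed to run inside a Paimon Hive catalog.

```sql
-- Hypothetical table; only the 'hive.'-prefixed option matters here.
CREATE TABLE owned_table (
    id BIGINT,
    name STRING
) WITH (
    -- Expected to appear in the Hive TBLPROPERTIES as table.owner=Jon
    'hive.table.owner' = 'Jon'
);
```

Running SHOW TBLPROPERTIES owned_table in Hive afterwards is one way to check that the parameter was added.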

6. CatalogOptions

Key	Default	Type	Description
fs.allow-hadoop-fallback	true	Boolean	Whether to allow falling back to Hadoop File IO when no file IO is found for the scheme.
lineage-meta	(none)	String	The lineage meta used to store table and data lineage information. Possible values: "jdbc": use standard JDBC to store table and data lineage information; "custom": implement LineageMetaFactory and LineageMeta to store lineage information in customized storage.
lock-acquire-timeout	8 min	Duration	The maximum time to wait for acquiring the lock.
lock-check-max-sleep	8 s	Duration	The maximum sleep time when retrying to check the lock.
lock.enabled	false	Boolean	Enable Catalog Lock.
metastore	"filesystem"	String	Metastore of the Paimon catalog; supports filesystem and hive.
table.type	managed	Enum	Type of table. Possible values: "managed": a Paimon-owned table where the entire lifecycle of the table data is managed; "external": a table where Paimon is loosely coupled with data stored in external locations.
uri	(none)	String	Uri of metastore server.
warehouse	(none)	String	The warehouse root path of catalog.
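As a sketch of how these catalog options are passed, the following Flink SQL enables the catalog lock and spells out the two lock durations. The catalog name my_locked_hive is hypothetical, and the duration strings simply repeat the default values from the table above on the assumption that the same notation is accepted as input.

```sql
-- Hypothetical catalog name; the lock durations repeat the documented defaults.
CREATE CATALOG my_locked_hive WITH (
    'type' = 'paimon',
    'metastore' = 'hive',
    'uri' = 'thrift://<hive-metastore-host-name>:<port>',
    'warehouse' = 'hdfs:///path/to/warehouse',
    'lock.enabled' = 'true',
    'lock-acquire-timeout' = '8 min',
    'lock-check-max-sleep' = '8 s'
);
```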

FilesystemCatalogOptions

Key	Default	Type	Description
case-sensitive	true	Boolean	Whether the catalog is case sensitive. For a case-insensitive catalog, set this option to false; the table name and fields will be converted to lowercase.

HiveCatalogOptions

Key	Default	Type	Description
hadoop-conf-dir	(none)	String	Directory containing core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml. Currently, only local file system paths are supported. If not configured, Paimon tries to load it from the 'HADOOP_CONF_DIR' or 'HADOOP_HOME' system environment. Configuration priority: 1. 'hadoop-conf-dir'; 2. HADOOP_CONF_DIR; 3. HADOOP_HOME/conf; 4. HADOOP_HOME/etc/hadoop.
hive-conf-dir	(none)	String	Directory containing hive-site.xml, used to create the HiveMetastoreClient and for security authentication such as Kerberos, LDAP, Ranger, and so on. If not configured, Paimon tries to load it from the 'HIVE_CONF_DIR' environment variable.
location-in-properties	false	Boolean	Sets the location in the properties of the Hive table/database. If you do not want the location to be accessed through Hive's filesystem when using an object storage such as S3 or OSS, set this option to true.

FlinkCatalogOptions

Key	Default	Type	Description
default-database	“default”	String
disable-create-table-in-default-db	false	Boolean	If true, creating tables in the default database is not allowed. Default is false.
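The Flink-specific options are passed in the same WITH clause as the other catalog options. The sketch below uses a hypothetical catalog name, my_flink_catalog, keeps default-database at its documented default, and turns off table creation in that database.

```sql
-- Hypothetical catalog name; 'default-database' is left at its default value.
CREATE CATALOG my_flink_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'hdfs:///path/to/warehouse',
    'default-database' = 'default',
    -- Reject CREATE TABLE statements that target the default database
    'disable-create-table-in-default-db' = 'true'
);
```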
