Big Data Real-Time Data Warehouse Hologres (Part 3): Introduction to Storage Formats

news 2026/5/12 8:43:31

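Contents

Introduction to Storage Formats

I. Formats

II. Usage Recommendations

III. Technical Principles

1. Column Store

2. Row Store

3. Row-Column Hybrid

IV. Usage Examples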

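Introduction to Storage Formats

I. Formats

Hologres supports three storage formats: row store, column store, and row-column hybrid. Each format suits different scenarios. You specify a table's storage format at creation time by setting the orientation property, using the following syntax: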

-- Supported since V2.1
CREATE TABLE <table_name> (...) WITH (orientation = '[column | row | row,column]');

-- Supported in all versions
BEGIN;
CREATE TABLE <table_name> (...);
call set_table_property('<table_name>', 'orientation', '[column | row | row,column]');
COMMIT;

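Notes:

orientation: specifies whether a table is stored in Hologres as a column store or a row store. Hologres supports the row-column hybrid mode starting from V1.1.
Tables are created as column store by default. Row store or row-column hybrid must be specified explicitly when the table is created. Changing a table's storage format requires recreating the table; it cannot be converted in place.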

II. Usage Recommendations

Recommendations for choosing a table's storage format are as follows:

III. Technical Principles

1. Column Store

begin;
create table public.tbl_col (
    id text NOT NULL,
    name text NOT NULL,
    class text NOT NULL,
    in_time TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (id)
);
-- Store the table in column-oriented format
call set_table_property('public.tbl_col', 'orientation', 'column');
call set_table_property('public.tbl_col', 'clustering_key', 'class');
call set_table_property('public.tbl_col', 'bitmap_columns', 'name');
call set_table_property('public.tbl_col', 'event_time_column', 'in_time');
commit;

-- Point query on the primary key
select * from public.tbl_col where id = '3333';
-- Range query on the primary key, reading only three columns
select id, class, name from public.tbl_col where id < '3333' order by id;

[Figure: schematic of column store layout]

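2. Row Store

If a Hologres table is set to row store, its data is stored row by row. Row store uses the SST format by default: data is sorted by key and stored in compressed blocks. Indexes such as the Block Index and Bloom Filter, together with a background Compaction mechanism that reorganizes the files, optimize point-query efficiency.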
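
How a block index helps a point query can be illustrated with a toy model. The sketch below is not Hologres's SST implementation: the class name `SortedBlockFile`, the block size, and the sample rows are all hypothetical, and compression, Bloom filters, and compaction are left out. It only shows why keeping keys sorted in blocks lets a lookup go straight to one block instead of scanning the whole file.

```python
import bisect


class SortedBlockFile:
    """Toy sorted, block-structured file (hypothetical; not Hologres code)."""

    def __init__(self, rows, block_size=2):
        # Sort rows by key, then cut them into fixed-size blocks.
        ordered = sorted(rows.items())
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        # Block index: the first key of every block.
        self.block_index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Binary-search the block index for the last block whose first key <= key.
        pos = bisect.bisect_right(self.block_index, key) - 1
        if pos < 0:
            return None
        # Only this single block is searched.
        for k, value in self.blocks[pos]:
            if k == key:
                return value
        return None


sst = SortedBlockFile(
    {"1111": "Ann", "2222": "Bob", "3333": "Cid", "4444": "Dee", "5555": "Eve"}
)
print(sst.block_index)  # first key of each block
print(sst.get("3333"))  # found in the second block
print(sst.get("9999"))  # not present
```

With five keys and a block size of two, the lookup for '3333' touches only the second block; a real engine would additionally consult a Bloom filter to skip files that cannot contain the key.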

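PK and Clustering Key are the same

For every table, the system stores a primary key index file in the underlying storage; for details, see the Primary Key documentation. When a row store table has a Primary Key (PK), the system automatically generates a Row Identifier (RID), which is used to locate an entire row. The system also sets the PK as the Distribution Key and the Clustering Key, so it can quickly locate the Shard and the file that hold the data. For primary-key queries, a single scan on the primary key is enough to fetch all columns of the row, which improves query efficiency.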

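PK and Clustering Key are different

If a table is created as a row store table with the PK and the Clustering Key set to different columns, a query first uses the PK to find the Clustering Key and the RID, and then uses the Clustering Key and the RID to locate the full row. This amounts to two scans and costs some performance.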
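
The cost difference between the two cases can be made concrete with a small counting model. This is a simplified sketch under stated assumptions, not Hologres internals: `RowTable`, its dictionaries, and the `lookups` counter are hypothetical names introduced only to count how many structures a primary-key query has to touch.

```python
class RowTable:
    """Toy row store table (hypothetical model; not Hologres internals)."""

    def __init__(self, pk_is_clustering_key):
        self.pk_is_clustering_key = pk_is_clustering_key
        self.pk_index = {}  # pk -> (clustering key, RID); used only when the keys differ
        self.data = {}      # storage key -> full row
        self.next_rid = 0
        self.lookups = 0    # number of structures touched by queries

    def insert(self, row, pk="id", clustering_key="id"):
        rid = self.next_rid
        self.next_rid += 1
        if self.pk_is_clustering_key:
            # Rows are stored directly under the primary key.
            self.data[row[pk]] = row
        else:
            # Rows are stored under (clustering key, RID); the PK index points there.
            self.pk_index[row[pk]] = (row[clustering_key], rid)
            self.data[(row[clustering_key], rid)] = row

    def get_by_pk(self, pk_value):
        self.lookups += 1
        if self.pk_is_clustering_key:
            return self.data.get(pk_value)        # one step: PK -> row
        location = self.pk_index.get(pk_value)    # step 1: PK -> (clustering key, RID)
        if location is None:
            return None
        self.lookups += 1
        return self.data[location]                # step 2: (clustering key, RID) -> row


row = {"id": "1111", "name": "Ann", "class": "A"}

same = RowTable(pk_is_clustering_key=True)
same.insert(row, pk="id", clustering_key="id")

different = RowTable(pk_is_clustering_key=False)
different.insert(row, pk="id", clustering_key="name")

assert same.get_by_pk("1111") == different.get_by_pk("1111") == row
print(same.lookups, different.lookups)  # 1 2
```

Both tables return the same row, but the table whose clustering key differs from the PK needs an extra hop through the PK index.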

(Recommended) Set a Primary Key

begin;
create table public.tbl_row (
    id text NOT NULL,
    name text NOT NULL,
    class text,
    PRIMARY KEY (id)
);
call set_table_property('public.tbl_row', 'orientation', 'row');
call set_table_property('public.tbl_row', 'clustering_key', 'id');
call set_table_property('public.tbl_row', 'distribution_key', 'id');
commit;

-- Point query by PK
select * from public.tbl_row where id = '1111';

-- Query multiple keys
select * from public.tbl_row where id in ('1111','2222','3333');

PK and Clustering Key set to different columns (not recommended)

begin;
create table public.tbl_row (
    id text NOT NULL,
    name text NOT NULL,
    class text,
    PRIMARY KEY (id)
);
call set_table_property('public.tbl_row', 'orientation', 'row');
call set_table_property('public.tbl_row', 'clustering_key', 'name');
call set_table_property('public.tbl_row', 'distribution_key', 'id');
commit;

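Row store summary:

Row store tables are well suited to PK-based point queries and can sustain high point-query QPS.
When creating the table, it is best to set only the PK; the system then automatically uses the PK as the Distribution Key and the Clustering Key, which improves query performance.
Setting the PK and the Clustering Key to different columns is not recommended, because it costs some performance.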

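3. Row-Column Hybrid

In practice, one table may serve both primary-key point queries and OLAP queries, so Hologres added the row-column hybrid storage format in V1.1. A hybrid table has the capabilities of both row store and column store: it supports high-performance PK-based point queries as well as OLAP analysis. The data is stored twice in the underlying storage, once in row store format and once in column store format, which increases storage overhead.

When data is written, a row store copy and a column store copy are written at the same time, and success is returned only after both copies have been written, which guarantees atomicity.
When data is queried, the optimizer parses the SQL into an execution plan, and the execution engine uses that plan to decide whether the row store or the column store is more efficient. A row-column hybrid table must have a primary key.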

As a result, hybrid tables perform well in general query scenarios, especially point queries on non-primary-key columns. For example:

begin;
create table public.tbl_row_col (
    id text NOT NULL,
    name text NOT NULL,
    class text,
    PRIMARY KEY (id)
);
call set_table_property('public.tbl_row_col', 'orientation', 'row,column');
call set_table_property('public.tbl_row_col', 'distribution_key', 'id');
call set_table_property('public.tbl_row_col', 'clustering_key', 'class');
call set_table_property('public.tbl_row_col', 'bitmap_columns', 'name');
commit;

SELECT * FROM public.tbl_row_col where id = '2222'; -- point query on the primary key
SELECT * FROM public.tbl_row_col where class = '二班'; -- point query on a non-primary-key column
SELECT * FROM public.tbl_row_col where id = '2222' and class = '二班'; -- ordinary OLAP query

[Figure: schematic of row-column hybrid storage]
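
The dual-copy write path and the choice between the two copies at query time can be mimicked with a toy model. Everything here is an assumption made for illustration: `HybridTable` is a hypothetical class, the all-or-nothing write is simulated by validating before touching either copy, and the routing rule (primary-key equality goes to the row copy, everything else to the column copy) is a deliberately crude stand-in for the optimizer's plan-based decision.

```python
class HybridTable:
    """Toy table keeping a row copy and a column copy (hypothetical model)."""

    def __init__(self, columns, pk):
        self.columns = columns
        self.pk = pk
        self.row_store = {}                         # pk value -> full row
        self.col_store = {c: [] for c in columns}   # column name -> list of values

    def insert(self, row):
        # Validate first, so that either both copies are written or neither is.
        if set(row) != set(self.columns):
            raise ValueError("row does not match the table columns")
        if row[self.pk] in self.row_store:
            raise ValueError("duplicate primary key")
        self.row_store[row[self.pk]] = dict(row)
        for c in self.columns:
            self.col_store[c].append(row[c])

    def select(self, where):
        """Return (path, rows) for an equality filter such as {"id": "2222"}."""
        if set(where) == {self.pk}:
            # Primary-key point query: a single lookup in the row copy.
            hit = self.row_store.get(where[self.pk])
            return "row", ([hit] if hit is not None else [])
        # Any other filter: scan only the filtered columns in the column copy.
        n = len(self.col_store[self.pk])
        matches = [i for i in range(n)
                   if all(self.col_store[c][i] == v for c, v in where.items())]
        return "column", [{c: self.col_store[c][i] for c in self.columns}
                          for i in matches]


t = HybridTable(["id", "name", "class"], pk="id")
t.insert({"id": "1111", "name": "Ann", "class": "A"})
t.insert({"id": "2222", "name": "Bob", "class": "B"})

print(t.select({"id": "2222"}))   # served from the row copy
print(t.select({"class": "B"}))   # served from the column copy
```

The model also makes the storage overhead visible: every inserted row exists once in `row_store` and once, split by column, in `col_store`.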

IV. Usage Examples

The following examples create tables with each storage format:

-- Create a row store table
begin;
create table public.tbl_row (
    a integer NOT NULL,
    b text NOT NULL,
    PRIMARY KEY (a)
);
call set_table_property('public.tbl_row', 'orientation', 'row');
commit;

-- Create a column store table
begin;
create table tbl_col (a int not null, b text not null);
call set_table_property('tbl_col', 'orientation', 'column');
commit;

-- Create a row-column hybrid table
begin;
create table tbl_col_row (pk text not null, col1 text, col2 text, col3 text, PRIMARY KEY (pk));
call set_table_property('tbl_col_row', 'orientation', 'row,column');
commit;

📢 Blog homepage: https://lansonli.blog.csdn.net
📢 This article is an original work by Lansonli, first published on the CSDN blog
