Big Data Course E7: Flume Interceptors

news 2025/12/20 16:22:35

Author email: yugongshiye@sina.cn  Location: Huizhou, Guangdong

▲ Chapter objectives

⚪ Understand what an Interceptor is and how it is configured;

⚪ Learn how to use Interceptors;

⚪ Learn the Host Interceptor;

⚪ Learn the Static Interceptor;

⚪ Learn the UUID Interceptor;

⚪ Learn the Search And Replace Interceptor;

⚪ Learn the Regex Filtering Interceptor;

⚪ Learn how to write a Custom Interceptor.

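I. Timestamp Interceptor

1. Overview

1. The Timestamp Interceptor adds a timestamp field to the event headers, recording the time (in milliseconds) at which the data was collected.

2. Combined with the HDFS Sink, the Timestamp Interceptor makes it possible to store data in one directory per day.

2. Configuration properties

Property	Description
type	Must be timestamp

3. Example

1. Create a configuration file (in.conf in this example) with the following content: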

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = netcat

a1.sources.s1.bind = 0.0.0.0

a1.sources.s1.port = 8090

# Name the interceptor

a1.sources.s1.interceptors = i1

# Use the Timestamp Interceptor

a1.sources.s1.interceptors.i1.type = timestamp

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f in.conf -Dflume.root.logger=INFO,console

4. Storing data by day

1. Create a configuration file (hdfsin.conf in this example) with the following content:

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = netcat

a1.sources.s1.bind = hadoop01

a1.sources.s1.port = 8090

a1.sources.s1.interceptors = i1

a1.sources.s1.interceptors.i1.type = timestamp

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://hadoop01:9000/flumedata/date=%Y-%m-%d

a1.sinks.k1.hdfs.fileType = DataStream

a1.sinks.k1.hdfs.rollInterval = 3600

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f hdfsin.conf -Dflume.root.logger=INFO,console
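How the timestamp header turns into the date=%Y-%m-%d directory of the path above can be sketched as follows. This is only an illustration, not Flume's own code: the class TimestampBucket and the method bucketDir are names made up for this example, and the sketch assumes that the header holds epoch milliseconds as a string and that only the %Y, %m and %d escapes are used.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flume: shows how a "timestamp" header
// (epoch milliseconds as a string) maps to a date=%Y-%m-%d directory name.
class TimestampBucket {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    static String bucketDir(String timestampHeader, ZoneId zone) {
        long millis = Long.parseLong(timestampHeader);
        String day = DAY.format(Instant.ofEpochMilli(millis).atZone(zone));
        return "date=" + day;
    }

    public static void main(String[] args) {
        // The kind of value the Timestamp Interceptor would store right now.
        String header = String.valueOf(System.currentTimeMillis());
        System.out.println(bucketDir(header, ZoneId.systemDefault()));

        // A fixed instant (2023-08-01T00:00:00Z) for a reproducible result.
        System.out.println(bucketDir("1690848000000", ZoneOffset.UTC)); // date=2023-08-01
    }
}
```

All events collected on the same calendar day resolve to the same directory, which is what gives the per-day layout. Without a timestamp header the HDFS Sink cannot resolve these escapes, unless hdfs.useLocalTimeStamp is set to true as in the Regex Filtering example later in this chapter.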

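II. Host Interceptor

1. Overview

1. The Host Interceptor adds a host field to the event headers.

2. It can be used to mark which host the data came from.

2. Configuration properties

Property	Description
type	Must be host

3. Example

1. Create a configuration file (in.conf in this example) with the following content: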

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = netcat

a1.sources.s1.bind = 0.0.0.0

a1.sources.s1.port = 8090

# Name the interceptors

a1.sources.s1.interceptors = i1 i2

# Use the Timestamp Interceptor

a1.sources.s1.interceptors.i1.type = timestamp

# Use the Host Interceptor

a1.sources.s1.interceptors.i2.type = host

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

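2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f in.conf -Dflume.root.logger=INFO,console

III. Static Interceptor

1. Overview

1. The Static Interceptor adds a user-specified field, with a fixed key and value, to the event headers.

2. It can be used to tag the type of the data.

2. Configuration properties

Property	Description
type	Must be static
key	Name of the field to add to the headers
value	Value of the field to add to the headers

3. Example

1. Create a configuration file (in.conf in this example) with the following content: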

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = netcat

a1.sources.s1.bind = 0.0.0.0

a1.sources.s1.port = 8090

# Name the interceptors

a1.sources.s1.interceptors = i1 i2 i3

# Use the Timestamp Interceptor

a1.sources.s1.interceptors.i1.type = timestamp

# Use the Host Interceptor

a1.sources.s1.interceptors.i2.type = host

# Use the Static Interceptor

a1.sources.s1.interceptors.i3.type = static

a1.sources.s1.interceptors.i3.key = kind

a1.sources.s1.interceptors.i3.value = log

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

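2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f in.conf -Dflume.root.logger=INFO,console

IV. UUID Interceptor

1. Overview

1. The UUID Interceptor adds an id field to the event headers.

2. It can be used to uniquely identify each event.

2. Configuration properties

Property	Description
type	Must be org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder

3. Example

1. Create a configuration file (in.conf in this example) with the following content: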

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = netcat

a1.sources.s1.bind = 0.0.0.0

a1.sources.s1.port = 8090

# Name the interceptors

a1.sources.s1.interceptors = i1 i2 i3 i4

# Use the Timestamp Interceptor

a1.sources.s1.interceptors.i1.type = timestamp

# Use the Host Interceptor

a1.sources.s1.interceptors.i2.type = host

# Use the Static Interceptor

a1.sources.s1.interceptors.i3.type = static

a1.sources.s1.interceptors.i3.key = kind

a1.sources.s1.interceptors.i3.value = log

# Use the UUID Interceptor

a1.sources.s1.interceptors.i4.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f in.conf -Dflume.root.logger=INFO,console
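To make the combined effect of the chain i1 to i4 easier to picture, the sketch below builds the kind of header map an event would roughly carry after the four interceptors have run in order. It is an illustration only and does not use the Flume API: the class HeaderChainSketch and the method enrich are hypothetical, and the "unknown" fallback for the host value is an assumption of this sketch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration, not Flume code: builds the headers an event
// would roughly carry after interceptors i1..i4 have been applied in order.
class HeaderChainSketch {
    static Map<String, String> enrich(String kindValue) {
        Map<String, String> headers = new LinkedHashMap<>();
        // i1 (timestamp): processing time in epoch milliseconds.
        headers.put("timestamp", String.valueOf(System.currentTimeMillis()));
        // i2 (host): address of the machine running the agent.
        headers.put("host", localAddress());
        // i3 (static): the fixed key/value pair from the configuration.
        headers.put("kind", kindValue);
        // i4 (UUID): a random identifier for this event.
        headers.put("id", UUID.randomUUID().toString());
        return headers;
    }

    private static String localAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unknown"; // fallback chosen for this sketch only
        }
    }

    public static void main(String[] args) {
        // Prints a map with the keys timestamp, host, kind and id.
        System.out.println(enrich("log"));
    }
}
```

With the logger sink used in the example, these header entries appear next to the body of each logged event, so the configuration can be checked by sending a line to port 8090 and reading the console output.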

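V. Search And Replace Interceptor

1. Overview

1. The Search And Replace Interceptor requires a regular expression; any text that matches the expression is replaced with the specified string.

2. The replacement is applied to the event body only; the headers are left unchanged.

2. Configuration properties

Property	Description
type	Must be search_replace
searchPattern	Regular expression to search for
replaceString	String that replaces each match

3. Example

1. Create a configuration file (searchin.conf in this example) with the following content: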

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = http

a1.sources.s1.port = 8090

# Name the interceptor

a1.sources.s1.interceptors = i1

# Set the interceptor type

a1.sources.s1.interceptors.i1.type = search_replace

a1.sources.s1.interceptors.i1.searchPattern = [0-9]

a1.sources.s1.interceptors.i1.replaceString = *

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f searchin.conf -Dflume.root.logger=INFO,console
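The effect of searchPattern = [0-9] and replaceString = * on an event body can be reproduced with plain java.util.regex. The sketch below is an approximation of the behaviour, not the interceptor's source code; the class SearchReplaceSketch and the method replaceInBody are hypothetical names, and UTF-8 is assumed as the body encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration, not Flume code: applies a search pattern and a
// replacement string to an event body; headers are not involved at all.
class SearchReplaceSketch {
    static byte[] replaceInBody(byte[] body, String searchPattern, String replaceString) {
        String text = new String(body, StandardCharsets.UTF_8);
        String replaced = Pattern.compile(searchPattern).matcher(text).replaceAll(replaceString);
        return replaced.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] in = "order 2023 from user 42".getBytes(StandardCharsets.UTF_8);
        byte[] out = replaceInBody(in, "[0-9]", "*");
        System.out.println(new String(out, StandardCharsets.UTF_8)); // order **** from user **
    }
}
```

Because the example uses the HTTP source, a test event is normally posted to port 8090 as a JSON array such as [{"headers": {}, "body": "order 2023 from user 42"}], assuming the source's default JSON handler is in use; the logger sink should then show the body with every digit masked.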

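VI. Regex Filtering Interceptor

1. Overview

1. The Regex Filtering Interceptor requires a regular expression.

2. If the excludeEvents property is not specified, it defaults to false.

3. If excludeEvents is not configured or is set to false, only the events that match the regular expression are kept and all other events are filtered out; if excludeEvents is set to true, the events that match the regular expression are filtered out and all other events are kept.

2. Configuration properties

Property	Description
type	Must be regex_filter
regex	Regular expression to match against
excludeEvents	true or false (default: false)

3. Example

1. Create a configuration file (regexin.conf in this example) with the following content: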

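# Declare the source (input), channel (buffer) and sink (output)

a1.sources = r1

a1.channels = c1

a1.sinks = k1

# Source

a1.sources.r1.type = spooldir

a1.sources.r1.spoolDir = /opt/upload

a1.sources.r1.fileSuffix = .done

# Interceptor

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = regex_filter

# Keep only the events that match the regular expression

a1.sources.r1.interceptors.i1.regex = ^.*INFO.*$

# Uncomment to drop the matching events instead

# a1.sources.r1.interceptors.i1.excludeEvents = true

# Sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://flume45:9000/interceptors/%Y%m%d/%H

# Use the local time, rather than a timestamp header, to resolve the escape sequences in the path

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Write the events as a plain data stream

a1.sinks.k1.hdfs.fileType = DataStream

a1.sinks.k1.hdfs.rollInterval = 0

# Use a channel that buffers events in memory

a1.channels.c1.type = memory

# Bind the source and the sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1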

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

2. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f regexin.conf -Dflume.root.logger=INFO,console
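The keep-or-drop rule controlled by excludeEvents can be sketched in a few lines. This is a simplified model of the behaviour described above, not the interceptor's implementation; the class RegexFilterSketch and the method filter are hypothetical, and event bodies are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not Flume code: models which event bodies
// survive a regex filter for a given excludeEvents setting.
class RegexFilterSketch {
    static List<String> filter(List<String> bodies, String regex, boolean excludeEvents) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String body : bodies) {
            boolean matches = pattern.matcher(body).find();
            // excludeEvents = false keeps matches; true keeps non-matches.
            if (matches != excludeEvents) {
                kept.add(body);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("2023-08-01 INFO started", "2023-08-01 ERROR failed");
        System.out.println(filter(lines, "^.*INFO.*$", false)); // [2023-08-01 INFO started]
        System.out.println(filter(lines, "^.*INFO.*$", true));  // [2023-08-01 ERROR failed]
    }
}
```

With the configuration above, where excludeEvents stays commented out, only the lines containing INFO from the files dropped into /opt/upload would reach HDFS.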

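VII. Custom Interceptor

1. Overview

1. Flume also allows custom interceptors. Unlike other components, however, a custom Interceptor must additionally implement the interface's inner interface, Interceptor.Builder.

2. Steps:

a. Create a Maven project and add the required dependencies.

b. Define a class that implements the Interceptor interface and override its initialize, intercept and close methods.

c. Define a static nested class that implements the inner interface Interceptor.Builder.

d. Package the project as a jar and place it in the lib directory of the Flume installation.

e. Create a configuration file (authin.conf in this example) with the following content: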

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = netcat

a1.sources.s1.bind = 0.0.0.0

a1.sources.s1.port = 8090

# Use the custom interceptor

a1.sources.s1.interceptors = i1

a1.sources.s1.interceptors.i1.type = cn.tedu.flume.interceptor.AuthInterceptor$Builder

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

f. Start Flume:

../bin/flume-ng agent -n a1 -c ../conf -f authin.conf -Dflume.root.logger=INFO,console
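Steps b and c can be sketched as below. The chapter does not show what AuthInterceptor actually does, so the rule used here (keep only events whose body starts with "token:" and tag them with an auth header) is purely an assumption for illustration. To keep the sketch runnable without the Flume jar, it declares small stand-in Event and Interceptor interfaces that mirror the shape of org.apache.flume.Event and org.apache.flume.interceptor.Interceptor; in a real project those would be imported from the Flume dependency (typically flume-ng-core), the class would live in the package cn.tedu.flume.interceptor, and the Builder would also have to implement configure(Context). SimpleEvent is a hypothetical helper for the demo only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-ins so the sketch compiles without the Flume jar. In a real project,
// delete these two interfaces and import the Flume types instead.
interface Event {
    Map<String, String> getHeaders();
    byte[] getBody();
}

interface Interceptor {
    void initialize();
    Event intercept(Event event);
    List<Event> intercept(List<Event> events);
    void close();

    interface Builder {
        Interceptor build();
    }
}

// Hypothetical event implementation used only by the demo below.
class SimpleEvent implements Event {
    private final Map<String, String> headers = new HashMap<>();
    private final byte[] body;
    SimpleEvent(String body) { this.body = body.getBytes(StandardCharsets.UTF_8); }
    @Override public Map<String, String> getHeaders() { return headers; }
    @Override public byte[] getBody() { return body; }
}

class AuthInterceptor implements Interceptor {
    private static final String PREFIX = "token:"; // assumed rule, not from the source

    @Override public void initialize() { /* nothing to set up in this sketch */ }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        if (!body.startsWith(PREFIX)) {
            return null; // returning null drops the event
        }
        event.getHeaders().put("auth", "passed");
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> kept = new ArrayList<>();
        for (Event event : events) {
            Event result = intercept(event);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    @Override public void close() { /* nothing to release in this sketch */ }

    // The configuration names AuthInterceptor$Builder because the
    // interceptor is created through this nested builder.
    public static class Builder implements Interceptor.Builder {
        @Override public Interceptor build() { return new AuthInterceptor(); }
    }

    public static void main(String[] args) {
        Interceptor interceptor = new AuthInterceptor.Builder().build();
        List<Event> batch = Arrays.asList(new SimpleEvent("token:abc hello"), new SimpleEvent("hello"));
        System.out.println(interceptor.intercept(batch).size()); // 1
    }
}
```

The batch version of intercept simply delegates to the single-event version, so the filtering rule lives in one place.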
