
(23) Big Data in Practice: Flume Data Collection, a Hands-On Data Aggregation Case

news 2026/5/15 18:41:02

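Preface

This section shows how, when collecting data with Flume, the data from several collection points can be aggregated in one place for analysis. The agent on hadoop101 collects data sent with nc, the agent on hadoop102 collects data from a file, and both forward what they collect to hadoop103, which prints it to the console. In the overall architecture, each of the two upstream agents uses an avro sink that delivers events to a single avro source on hadoop103.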

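Main Content

① On hadoop101, create the job-nc-flume-avro.conf configuration file in the /opt/module/apache-flume-1.9.0/job/group1 directory. It listens for data sent with nc and forwards it through an avro sink to the avro source.

- The job-nc-flume-avro.conf configuration file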

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# netcat source: receives the lines sent with nc.
# Port 44444 is an assumed value; the text does not state which port is used.
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 4141
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

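② On hadoop102, create the job-file-flume-avro.conf configuration file in the /opt/module/apache-flume-1.9.0/job/group1 directory. It tails the file /opt/module/apache-flume-1.9.0/a.log and forwards new lines through an avro sink to the avro source.

- The job-file-flume-avro.conf configuration file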

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/apache-flume-1.9.0/a.log
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 4141
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

③ On hadoop103, create the job-avro-flume-console.conf configuration file in the /opt/module/apache-flume-1.9.0/job/group1 directory. It prints the data aggregated by the avro source to the console.

- The job-avro-flume-console.conf configuration file

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 4141
# Describe the sink
a1.sinks.k1.type = logger
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
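
Aggregation only works if both upstream avro sinks point at the address the avro source on hadoop103 is bound to. As a rough sketch, a small pre-flight check could compare those values before any agent is started. The helper names `conf_get` and `sink_matches_source` are hypothetical and not part of Flume; the property keys are the ones used in the listings above, and the check assumes copies of both files are available on one machine.

```shell
#!/usr/bin/env bash
# conf_get FILE KEY
# Hypothetical helper: print the value of KEY from a Flume properties FILE.
# Comment lines are skipped; if KEY appears more than once, the last one wins.
conf_get() {
  local file="$1" key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }
    {
      pos = index($0, "=")
      if (pos == 0) next
      name = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) found = val
    }
    END { if (found != "") print found }
  ' "$file"
}

# sink_matches_source UPSTREAM_CONF DOWNSTREAM_CONF
# Succeeds when the avro sink k1 of agent a1 in UPSTREAM_CONF targets the same
# host and port that the avro source r1 in DOWNSTREAM_CONF is bound to.
# This is a plain string comparison, so a bind of 0.0.0.0 would not match.
sink_matches_source() {
  local upstream="$1" downstream="$2"
  local sink_host sink_port src_host src_port
  sink_host=$(conf_get "$upstream" a1.sinks.k1.hostname)
  sink_port=$(conf_get "$upstream" a1.sinks.k1.port)
  src_host=$(conf_get "$downstream" a1.sources.r1.bind)
  src_port=$(conf_get "$downstream" a1.sources.r1.port)
  [ -n "$sink_port" ] && [ "$sink_host" = "$src_host" ] && [ "$sink_port" = "$src_port" ]
}
```

For example, `sink_matches_source job-file-flume-avro.conf job-avro-flume-console.conf && echo "sink and source agree"` would report whether the hadoop102 sink and the hadoop103 source use the same host and port.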

④ Start the job-avro-flume-console.conf job on hadoop103

- Command:

bin/flume-ng agent -c conf/ -n a1 -f job/group1/job-avro-flume-console.conf -Dflume.root.logger=INFO,console

⑤ Start the job-nc-flume-avro.conf job on hadoop101

- Command:

bin/flume-ng agent -c conf/ -n a1 -f job/group1/job-nc-flume-avro.conf -Dflume.root.logger=INFO,console

⑥ Start the job-file-flume-avro.conf job on hadoop102

- Command:

bin/flume-ng agent -c conf/ -n a1 -f job/group1/job-file-flume-avro.conf -Dflume.root.logger=INFO,console
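
The steps above start the agent on hadoop103 first, presumably so that its avro source is already listening on port 4141 when the two upstream avro sinks try to connect. A minimal sketch of a helper that waits for that port before an upstream agent is launched might look like this. `wait_for_port` is a hypothetical name, and the sketch assumes bash, because it relies on bash's `/dev/tcp` redirection.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [ATTEMPTS]
# Hypothetical helper: succeed as soon as a TCP connection to HOST:PORT can be
# opened, pausing one second after each failed try, for up to ATTEMPTS tries
# (default 30). Returns non-zero if the port never becomes reachable.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # In bash, redirecting to /dev/tcp/HOST/PORT opens a TCP connection.
    # The subshell closes the descriptor again when it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

On hadoop101 or hadoop102 this could be combined with the start command, for example `wait_for_port hadoop103 4141 && bin/flume-ng agent -c conf/ -n a1 -f job/group1/job-file-flume-avro.conf -Dflume.root.logger=INFO,console`.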

⑦ Use the nc tool to send data to hadoop101

- Send data with nc

- hadoop103 receives the data
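
As a sketch of this step, the function below is a scripted equivalent of typing a line into an interactive `nc hadoop101 44444` session. The port 44444 is an assumption, since the text does not state which port the source on hadoop101 listens on, and `send_event` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# send_event HOST PORT MESSAGE
# Hypothetical wrapper: send one newline-terminated line to a TCP listener
# using nc. -w 1 sets a one-second timeout; how nc behaves after end of input
# differs between netcat variants, so treat this as a sketch.
send_event() {
  local host="$1" port="$2" message="$3"
  printf '%s\n' "$message" | nc -w 1 "$host" "$port"
}
```

A call such as `send_event hadoop101 44444 "hello flume"` sends one line, which the logger sink on hadoop103 should then print as one event in its console output.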

⑧ Write data to a.log on hadoop102

- Write the data

- hadoop103 receives the data
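
Because the exec source on hadoop102 runs `tail -F` on a.log, only lines appended to the file are picked up. A minimal sketch of appending test lines follows; `append_event` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# append_event FILE MESSAGE
# Hypothetical helper: append one line to FILE. The >> redirection appends
# rather than truncating, so lines already in the file stay in place.
append_event() {
  local file="$1" message="$2"
  printf '%s\n' "$message" >> "$file"
}
```

On hadoop102, `append_event /opt/module/apache-flume-1.9.0/a.log "hello from hadoop102"` adds one line to the monitored file, which should then appear as one event on the hadoop103 console.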

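Conclusion

Flume data aggregation brings data with the same attributes together in one place, which makes it easier to manage, analyze, and compute statistics on. That concludes this hands-on case on aggregating collected data with Flume. See you next time.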
