
Big Data Middleware: Kafka

news 2025/10/19 14:11:31

Kafka Installation and Configuration

First, we upload the Kafka installation package to the virtual machine.

Then we extract it into the target directory and rename the extracted directory.
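The extract-and-rename step can be sketched as a small shell helper. This is only an illustration: the function name and its three arguments are hypothetical, and the package file name depends on the Kafka version you downloaded.

```shell
#!/bin/sh
# Hypothetical helper: extract a .tgz package and give its top-level
# directory a shorter name.
extract_and_rename() {
    archive="$1"      # path to the .tgz package
    dest_parent="$2"  # directory to extract into
    new_name="$3"     # directory name to use afterwards
    # Read the name of the top-level directory stored in the archive.
    top_dir=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
    # Extract, then rename the extracted directory.
    tar -xzf "$archive" -C "$dest_parent" &&
        mv "${dest_parent}/${top_dir}" "${dest_parent}/${new_name}"
}
```

Calling it with your own package path, a parent directory such as /opt/model (an assumption inferred from the log.dirs value used later), and the name kafka would leave the installation at /opt/model/kafka.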

Next, we go to Kafka's config directory. The first file to modify is server.properties, and the changes are as follows:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
# This is the broker's identity within the cluster; every id in the cluster must be unique.
broker.id=0

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
# Where Kafka stores its data. The default is a temporary directory, so change it to a directory of your own.
log.dirs=/opt/model/kafka/datas

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# The ZooKeeper cluster to connect to; list every node that runs ZooKeeper.
# ZooKeeper stores data as a directory tree. If nothing is changed, Kafka writes its data directly
# under the ZooKeeper root, so if that data ever needs to be modified later, separating it from
# everything else stored in ZooKeeper is very troublesome.
# We therefore keep Kafka's data in its own branch of the tree, which is why /kafka is appended at the end.
# Several nodes are listed so that if one ZooKeeper node cannot be reached, another one can be used.
zookeeper.connect=node1:2181,node2:2181,node3:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

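There are three main changes: the unique broker id (broker.id), the directory where Kafka stores its data (log.dirs), and the ZooKeeper node addresses (zookeeper.connect).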
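To double-check those three settings after editing, a small filter like the following may help. It is a sketch only: the function name is hypothetical, and you pass it the path to your own server.properties.

```shell
#!/bin/sh
# Hypothetical helper: print only the three settings changed in this walkthrough.
# The pattern ends in "=" so that zookeeper.connection.timeout.ms is not matched.
show_key_settings() {
    grep -E '^(broker\.id|log\.dirs|zookeeper\.connect)=' "$1"
}
```

Running it against the config file (for instance config/server.properties under your Kafka directory) should list one line per setting, which makes a wrong path or a forgotten /kafka suffix easy to spot.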

Then we distribute the Kafka installation directory to the other nodes.
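One possible way to do the distribution is with scp. The sketch below only prints the commands so they can be reviewed before anything is copied. The function name is hypothetical, and the example path /opt/model/kafka and the host names node2 and node3 are assumptions inferred from the log.dirs and zookeeper.connect values above.

```shell
#!/bin/sh
# Hypothetical helper: print one scp command per target host (dry run only).
print_distribute_cmds() {
    src_dir="$1"   # local Kafka installation directory
    shift          # remaining arguments are the target host names
    parent_dir=$(dirname "$src_dir")
    for host in "$@"; do
        # Copy the directory into the same parent path on the remote host.
        echo "scp -r ${src_dir} ${host}:${parent_dir}/"
    done
}
```

For example, print_distribute_cmds /opt/model/kafka node2 node3 prints two scp commands. Once they look right, they can be run by hand or by piping the output to sh, assuming SSH access between the nodes is already set up.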

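Note that after the distribution is complete, you must change the broker.id value on each node so that every broker has a different id.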
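Editing broker.id by hand on each node works fine. As an alternative, here is a minimal sketch of a helper that rewrites the line with sed. The function name is hypothetical, and it writes to a temporary file first because in-place editing with sed differs between platforms.

```shell
#!/bin/sh
# Hypothetical helper: set broker.id in a server.properties file.
set_broker_id() {
    props_file="$1"   # path to server.properties
    new_id="$2"       # integer id, unique per broker
    tmp_file="${props_file}.tmp"
    # Replace the whole broker.id line; all other lines pass through unchanged.
    sed "s/^broker\.id=.*/broker.id=${new_id}/" "$props_file" > "$tmp_file" &&
        mv "$tmp_file" "$props_file"
}
```

On the second node you might call set_broker_id with the path to that node's server.properties and the value 1, and on the third node with the value 2. Any values work as long as no two brokers share one.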

Now we can start the Kafka service. Note that the ZooKeeper service must be running before Kafka is started.

As with ZooKeeper, the Kafka start script has to be run on every node separately, and it must be given the configuration file explicitly:

./kafka-server-start.sh -daemon ../config/server.properties

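Note that my Kafka path may differ from yours. All you need to know is that the start script is in the bin directory and the configuration file is in the config directory. We run the script on each of the three virtual machines.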

Once a Kafka process is visible on every node, the Kafka cluster has started successfully.
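A common way to look for the process is jps, which ships with the JDK and lists a running Kafka broker under the name Kafka. The check below is a minimal sketch with a hypothetical function name. It reads a jps-style listing from standard input, so it can also be tried out without a running cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the jps-style listing on stdin
# contains a line of the form "<pid> Kafka".
has_kafka_process() {
    grep -Eq '^[0-9]+ Kafka$'
}
```

On each node, something like jps | has_kafka_process && echo "Kafka is running" reports whether the broker is up. If nothing is printed, the server log under Kafka's logs directory is usually the first place to look.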
