当前位置: 首页 > news >正文

大数据技术-Hadoop(四)Yarn的介绍与使用

目录

一、Yarn 基本结构

1、Yarn基本结构

2、Yarn的工作机制

二、Yarn常用的命令

三、调度器

1、Capacity Scheduler(容量调度器)

1.1、特点

1.2、配置

1.2.1、yarn-site.xml

1.2.2、capacity-scheduler.xml

 1.3、重启yarn、刷新队列 测试 向hive 队列提交数据

1.4、优先级配置

2、Fair Scheduler(公平调度器)

2.1、特点

2.2、配置

2.2.1、yarn-site.xml

 2.2.2、fair-scheduler.xml

 2.3、测试

四、Yarn的Tool接口案例

1、代码

2、测试

3、完整代码


一、Yarn 基本结构

1、Yarn基本结构

YARN主要由ResourceManager、NodeManager、ApplicationMaster和Container等组件构成。

ResourceManager(RM)主要作用

  • 处理客户端请求
  • 监控NodeManager
  • 启动监控ApplicationMaster
  • 资源的分配和调度(cpu, memory, disk, network)

NodeManager(NM)主要作用

  • 管理单个基点上的资源(cpu, memory, disk, network)
  • 处理来自ResourceManager的命令
  • 处理来自ApplicationMaster的命令

ApplicationMaster (Am)主要作用

  • 为应用程序申请资源并分配任务
  • 任务监控与容错

Container 主要作用

它基于资源容器的抽象概念,该容器包含内存、CPU、磁盘、网络等元素。

2、Yarn的工作机制

  1. MR程序提交到客户端所在的节点。
  2. YarnRunner向ResourceManager申请一个Application。
  3. RM将该应用程序的资源路径返回给YarnRunner。
  4. 该程序将运行所需资源提交到HDFS上。
  5. 程序资源提交完毕后,申请运行mrAppMaster。
  6. RM将用户的请求初始化成一个Task。
  7. 其中一个NodeManager领取到Task任务。
  8. 该NodeManager创建容器Container,并产生MRAppmaster。
  9. Container从HDFS上拷贝资源到本地。
  10. MRAppmaster向RM 申请运行MapTask资源。
  11. RM将运行MapTask任务分配给另外两个NodeManager,另两个NodeManager分别领取任务并创建容器。
  12. MR向两个接收到任务的NodeManager发送程序启动脚本,这两个NodeManager分别启动MapTask,MapTask对数据分区排序。
  13. MrAppMaster等待所有MapTask运行完毕后,向RM申请容器,运行ReduceTask。
  14. ReduceTask向MapTask获取相应分区的数据。
  15. 程序运行完毕后,MR会向RM申请注销自己。

二、Yarn常用的命令

# 列出运行的应用
yarn app -list 或者 yarn application -list#根据状态查询  ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED
yarn application -list -appStates ALL#查看日志
yarn logs -applicationId application_1735538046453_0007#停止任务
yarn app -kill application_1735538046453_0007 #查看尝试运行的任务
yarn applicationattempt -list application_1735538046453_0007#查看ApplicationAttemp状态
yarn applicationattempt -status appattempt_1735538046453_0007_000001 #这个id为attempt的id# 列出容器
yarn container -list appattempt_1735538046453_0007_000001#查看容器状态yarn container -status <container-id>#查看节点状态yarn node -list -all#刷新队列yarn rmadmin -refreshQueues#查看队列yarn queue -list all#查看队列状态 default为队列名称yarn queue -status default  更多命令 https://hadoop.apache.org/docs/r3.4.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html

三、调度器

1、Capacity Scheduler(容量调度器)

1.1、特点

  • 分层队列- 支持队列层次结构,以确保在允许其他队列使用可用资源之前,在组织的子队列之间共享资源,从而提供更多的控制和可预测性。
  • 容量保证- 队列分配到网格容量的一小部分,即一定容量的资源可供队列使用。提交到队列的所有应用程序都可以访问分配给该队列的容量。管理员可以对分配给每个队列的容量配置软限制和可选的硬限制。
  • 安全性- 每个队列都有严格的 ACL,控制哪些用户可以向各个队列提交申请。
  • 弹性- 可以将空闲资源分配给超出其容量的任何队列。
  • 多租户- 提供全面的限制,以防止单个应用程序、用户和队列垄断队列或整个集群的资源,以确保集群不会不堪重负。
  • 可操作性
    • 运行时配置 - 管理员可以在运行时以安全的方式更改队列定义和属性(例如容量、ACL),以尽量减少对用户的干扰
    • 清空应用程序 - 管理员可以在运行时停止队列,以确保在现有应用程序运行完成时,不会提交任何新应用程序。
  • 基于资源的调度- 支持资源密集型应用程序。
  • 基于默认或用户定义的放置规则的队列映射接口: 此功能允许用户根据某些默认放置规则将作业映射到特定队列。
  • 优先级调度- 此功能允许以不同的优先级提交和调度应用程序。整数值越高,应用程序的优先级越高。目前,应用程序优先级仅支持 FIFO 排序策略。
  • 百分比资源配置-管理员可以指定队列的资源百分比。
  • 绝对资源配置- 管理员可以指定队列的绝对资源,而不是提供基于百分比的值。这为管理员提供了更好的控制,可以配置给定队列所需的资源量。
  • 权重资源配置- 管理员可以指定队列的权重,而不是提供基于百分比的值。这为管理员提供了更好的控制,可以在动态变化的队列层次结构中为队列配置资源。
  • 通用容量向量资源配置- 管理员可以针对每个定义的资源类型使用绝对、权重或百分比模式以混合方式为队列指定资源。这为配置给定队列所需的资源量提供了最灵活的方式。
  • 动态自动创建和管理叶队列- 此功能支持自动创建叶队列以及队列映射,目前支持基于用户组的队列映射,以便将应用程序放置到队列。调度程序还支持根据父队列上配置的策略对这些队列进行容量管理。

1.2、配置

1.2.1、yarn-site.xml
<!-- 选择调度器,默认容量调度器 -->
<property><description>The class to use as the resource scheduler.</description><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property><!-- ResourceManager处理调度器请求的线程数量,默认50;如果提交的任务数大于50,可以增加该值,但是不能超过3台 * 4线程 = 12线程(去除其他应用程序实际不能超过8) -->
<property><description>Number of threads to handle scheduler interface.</description><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>8</value>
</property><!-- 是否让yarn自动检测硬件进行配置,默认是false,如果该节点有很多其他应用程序,建议手动配置。如果该节点没有其他应用程序,可以采用自动 -->
<property><description>Enable auto-detection of node capabilities such asmemory and CPU.</description><name>yarn.nodemanager.resource.detect-hardware-capabilities</name><value>false</value>
</property><!-- 是否将虚拟核数当作CPU核数,默认是false,采用物理CPU核数 -->
<property><description>Flag to determine if logical processors(such ashyperthreads) should be counted as cores. Only applicable on Linuxwhen yarn.nodemanager.resource.cpu-vcores is set to -1 andyarn.nodemanager.resource.detect-hardware-capabilities is true.</description><name>yarn.nodemanager.resource.count-logical-processors-as-cores</name><value>false</value>
</property><!-- 虚拟核数和物理核数乘数,默认是1.0 -->
<property><description>Multiplier to determine how to convert phyiscal cores tovcores. This value is used if yarn.nodemanager.resource.cpu-vcoresis set to -1(which implies auto-calculate vcores) andyarn.nodemanager.resource.detect-hardware-capabilities is set to true. The	number of vcores will be calculated as	number of CPUs * multiplier.</description><name>yarn.nodemanager.resource.pcores-vcores-multiplier</name><value>1.0</value>
</property><!-- NodeManager使用内存数,默认8G,修改为4G内存 -->
<property><description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 andyarn.nodemanager.resource.detect-hardware-capabilities is true, it isautomatically calculated(in case of Windows and Linux).In other cases, the default is 8192MB.</description><name>yarn.nodemanager.resource.memory-mb</name><value>4096</value>
</property><!-- nodemanager的CPU核数,不按照硬件环境自动设定时默认是8个,修改为4个 -->
<property><description>Number of vcores that can be allocatedfor containers. This is used by the RM scheduler when allocatingresources for containers. This is not used to limit the number ofCPUs used by YARN containers. If it is set to -1 andyarn.nodemanager.resource.detect-hardware-capabilities is true, it isautomatically determined from the hardware in case of Windows and Linux.In other cases, number of vcores is 8 by default.</description><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value>
</property><!-- 容器最小内存,默认1G -->
<property><description>The minimum allocation for every container request at the RM	in MBs. Memory requests lower than this will be set to the value of this	property. Additionally, a node manager that is configured to have less memory	than this value will be shut down by the resource manager.</description><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value>
</property><!-- 容器最大内存,默认8G,修改为2G -->
<property><description>The maximum allocation for every container request at the RM	in MBs. Memory requests higher than this will throw an	InvalidResourceRequestException.</description><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value>
</property><!-- 容器最小CPU核数,默认1个 -->
<property><description>The minimum allocation for every container request at the RM	in terms of virtual CPU cores. Requests lower than this will be set to the	value of this property. Additionally, a node manager that is configured to	have fewer virtual cores than this value will be shut down by the resource	manager.</description><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value>
</property><!-- 容器最大CPU核数,默认4个,修改为2个 -->
<property><description>The maximum allocation for every container request at the RM	in terms of virtual CPU cores. Requests higher than this will throw anInvalidResourceRequestException.</description><name>yarn.scheduler.maximum-allocation-vcores</name><value>2</value>
</property><!-- 虚拟内存检查,默认打开,修改为关闭 -->
<property><description>Whether virtual memory limits will be enforced forcontainers.</description><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value>
</property><!-- 虚拟内存和物理内存设置比例,默认2.1 -->
<property><description>Ratio between virtual memory to physical memory when	setting memory limits for containers. Container allocations are	expressed in terms of physical memory, and virtual memory usage	is allowed to exceed this allocation by this ratio.</description><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value>
</property>
1.2.2、capacity-scheduler.xml
<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. See accompanying LICENSE file.
-->
<configuration><property><name>yarn.scheduler.capacity.maximum-applications</name><value>10000</value><description>Maximum number of applications that can be pending and running.</description></property><property><name>yarn.scheduler.capacity.maximum-am-resource-percent</name><value>0.1</value><description>Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent runningapplications.</description></property><property><name>yarn.scheduler.capacity.resource-calculator</name><value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value><description>The ResourceCalculator implementation to be used to compare Resources in the scheduler.The default i.e. DefaultResourceCalculator only uses Memory whileDominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc.</description></property><!--添加队列--><property><name>yarn.scheduler.capacity.root.queues</name><value>default,hive</value><description>The queues at the this level (root is the root queue).</description></property><!--层级队列--><!--  <property><name>yarn.scheduler.capacity.root.hive.queues</name><value>hive1,hive2</value><description>The queues at the this level (root is the root queue).</description></property>--><!--默认队列百分比--><property><name>yarn.scheduler.capacity.root.default.capacity</name><value>40</value><description>Default queue target capacity.</description></property><!-- 指定hive队列的资源额定容量 --><property><name>yarn.scheduler.capacity.root.hive.capacity</name><value>60</value></property><property><name>yarn.scheduler.capacity.root.default.user-limit-factor</name><value>1</value><description>Default queue user limit a percentage from 0.0 to 1.0.</description></property><!-- 用户最多可以使用队列多少资源,1表示 --><property><name>yarn.scheduler.capacity.root.hive.user-limit-factor</name><value>1</value><description>Default queue user limit a percentage from 0.0 to 1.0.</description></property><property><name>yarn.scheduler.capacity.root.default.maximum-capacity</name><value>60</value><description>The maximum capacity of the default queue. </description></property><!--最大容量比--><property><name>yarn.scheduler.capacity.root.hive.maximum-capacity</name><value>80</value><description>The maximum capacity of the default queue. </description></property><property><name>yarn.scheduler.capacity.root.default.state</name><value>RUNNING</value><description>The state of the default queue. State can be one of RUNNING or STOPPED.</description></property><!--hive 队列状态--><property><name>yarn.scheduler.capacity.root.hive.state</name><value>RUNNING</value><description>The state of the default queue. State can be one of RUNNING or STOPPED.</description></property><property><name>yarn.scheduler.capacity.root.default.acl_submit_applications</name><value>*</value><description>The ACL of who can submit jobs to the default queue.</description></property><!--*表示用户--><property><name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name><value>*</value><description>The ACL of who can submit jobs to the default queue.</description></property><property><name>yarn.scheduler.capacity.root.default.acl_administer_queue</name><value>*</value><description>The ACL of who can administer jobs on the default queue.</description></property><property><name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name><value>*</value><description>The ACL of who can administer jobs on the default queue.</description></property><property><name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name><value>*</value><description>The ACL of who can submit applications with configured priority.For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]</description></property><!--提交带有优先级的参数--><property><name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name><value>*</value><description>The ACL of who can submit applications with configured priority.For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]</description></property><property><name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name><value>-1</value><description>Maximum lifetime of an application which is submitted to a queuein seconds. Any value less than or equal to zero will be considered asdisabled.This will be a hard time limit for all applications in thisqueue. If positive value is configured then any application submittedto this queue will be killed after exceeds the configured lifetime.User can also specify lifetime per application basis inapplication submission context. But user lifetime will beoverridden if it exceeds queue maximum lifetime. It is point-in-timeconfiguration.Note : Configuring too low value will result in killing applicationsooner. This feature is applicable only for leaf queue.</description></property><!--提交到队列的应用程序的最大生命周期(以秒为单位)--><property><name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name><value>-1</value><description>Maximum lifetime of an application which is submitted to a queuein seconds. Any value less than or equal to zero will be considered asdisabled.This will be a hard time limit for all applications in thisqueue. If positive value is configured then any application submittedto this queue will be killed after exceeds the configured lifetime.User can also specify lifetime per application basis inapplication submission context. But user lifetime will beoverridden if it exceeds queue maximum lifetime. It is point-in-timeconfiguration.Note : Configuring too low value will result in killing applicationsooner. This feature is applicable only for leaf queue.</description></property><property><name>yarn.scheduler.capacity.root.default.default-application-lifetime</name><value>-1</value><description>Default lifetime of an application which is submitted to a queuein seconds. Any value less than or equal to zero will be considered asdisabled.If the user has not submitted application with lifetime value then thisvalue will be taken. It is point-in-time configuration.Note : Default lifetime can't exceed maximum lifetime. This feature isapplicable only for leaf queue.</description></property><!--提交到队列的应用程序的默认生命周期--><property><name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name><value>-1</value><description>Default lifetime of an application which is submitted to a queuein seconds. Any value less than or equal to zero will be considered asdisabled.If the user has not submitted application with lifetime value then thisvalue will be taken. It is point-in-time configuration.Note : Default lifetime can't exceed maximum lifetime. This feature isapplicable only for leaf queue.</description></property><property><name>yarn.scheduler.capacity.node-locality-delay</name><value>40</value><description>Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers.When setting this parameter, the size of the cluster should be taken into account.We use 40 as the default value, which is approximately the number of nodes in one rack.Note, if this value is -1, the locality constraint in the container requestwill be ignored, which disables the delay scheduling.</description></property><property><name>yarn.scheduler.capacity.rack-locality-additional-delay</name><value>-1</value><description>Number of additional missed scheduling opportunities over the node-locality-delayones, after which the CapacityScheduler attempts to schedule off-switch containers,instead of rack-local ones.Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler willattempt rack-local assignments after 40 missed opportunities, and off-switch assignmentsafter 40+20=60 missed opportunities.When setting this parameter, the size of the cluster should be taken into account.We use -1 as the default value, which disables this feature. In this case, the numberof missed opportunities for assigning off-switch containers is calculated based onthe number of containers and unique locations specified in the resource request,as well as the size of the cluster.</description></property><property><name>yarn.scheduler.capacity.queue-mappings</name><value></value><description>A list of mappings that will be used to assign jobs to queuesThe syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*Typically this list will be used to map users to queues,for example, u:%user:%user maps all users to queues with the same nameas the user.</description></property><property><name>yarn.scheduler.capacity.queue-mappings-override.enable</name><value>false</value><description>If a queue mapping is present, will it override the value specifiedby the user? This can be used by administrators to place jobs in queuesthat are different than the one specified by the user.The default is false.</description></property><property><name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name><value>1</value><description>Controls the number of OFF_SWITCH assignments allowedduring a node's heartbeat. Increasing this value can improvescheduling rate for OFF_SWITCH containers. Lower values reduce"clumping" of applications on particular nodes. The default is 1.Legal values are 1-MAX_INT. This config is refreshable.</description></property><property><name>yarn.scheduler.capacity.application.fail-fast</name><value>false</value><description>Whether RM should fail during recovery if previous applications'queue is no longer valid.</description></property><property><name>yarn.scheduler.capacity.workflow-priority-mappings</name><value></value><description>A list of mappings that will be used to override application priority.The syntax for this list is[workflowId]:[full_queue_name]:[priority][,next mapping]*where an application submitted (or mapped to) queue "full_queue_name"and workflowId "workflowId" (as specified in application submissioncontext) will be given priority "priority".</description></property><property><name>yarn.scheduler.capacity.workflow-priority-mappings-override.enable</name><value>false</value><description>If a priority mapping is present, will it override the value specifiedby the user? This can be used by administrators to give applications apriority that is different than the one specified by the user.The default is false.</description></property>
</configuration>

 1.3、重启yarn、刷新队列 测试 向hive 队列提交数据

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount -D mapreduce.job.queuename=hive /input  /output11

1.4、优先级配置

yarn-site.xml

<property><name>yarn.cluster.max-application-priority</name><value>5</value>
</property>

重启 yarn 测试

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 5 2000000hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi  -D mapreduce.job.priority=5 5 2000000

2、Fair Scheduler(公平调度器)

2.1、特点

  • 分层队列- 支持队列层次结构,以确保在允许其他队列使用可用资源之前,在组织的子队列之间共享资源,从而提供更多的控制和可预测性。
  • 容量保证- 队列分配到网格容量的一小部分,即一定容量的资源可供队列使用。提交到队列的所有应用程序都可以访问分配给该队列的容量。管理员可以对分配给每个队列的容量配置软限制和可选的硬限制。
  • 安全性- 每个队列都有严格的 ACL,控制哪些用户可以向各个队列提交申请。
  • 弹性- 可以将空闲资源分配给超出其容量的任何队列。
  • 多租户- 提供全面的限制,以防止单个应用程序、用户和队列垄断队列或整个集群的资源,以确保集群不会不堪重负。

与容量调度器不同的是调度策略不同,容量调度器优先选择资源利用率低的队列,公平调度器会优先选择对资源缺额比例大的。队列设置资源分配的方式不同,容量调度器为 FIFO(先进先出)和DRF(Dominant Resource Fairness Policy)。公平调度器为FifoPolicy、FairSharePolicy(默认)和 DominantResourceFairnessPolicy 。

2.2、配置

2.2.1、yarn-site.xml
<?xml version="1.0"?>
<!--Licensed under the Apache License, Version 2.0 (the "License");you may not use this file except in compliance with the License.You may obtain a copy of the License athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law or agreed to in writing, softwaredistributed under the License is distributed on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.See the License for the specific language governing permissions andlimitations under the License. See accompanying LICENSE file.
-->
<configuration><!-- Site specific YARN configuration properties --><!-- 指定MR走shuffle --><property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property><!-- 指定ResourceManager的地址--><property><name>yarn.resourcemanager.hostname</name><value>hadoop2</value></property><!-- 环境变量的继承 --><property><name>yarn.nodemanager.env-whitelist</name><value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value></property><!-- 开启日志聚集功能 --><property><name>yarn.log-aggregation-enable</name><value>true</value></property><!-- 设置日志聚集服务器地址 --><property><name>yarn.log.server.url</name><value>http://hadoop102:19888/jobhistory/logs</value></property><!-- 设置日志保留时间为7天 --><property><name>yarn.log-aggregation.retain-seconds</name><value>604800</value></property><!-- 选择调度器,默认容量调度器 --><property><description>The class to use as the resource scheduler.</description><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value></property><!-- ResourceManager处理调度器请求的线程数量,默认50;如果提交的任务数大于50,可以增加该值,但是不能超过3台 * 4线程 = 12线程(去除其他应用程序实际不能超过8) --><property><description>Number of threads to handle scheduler interface.</description><name>yarn.resourcemanager.scheduler.client.thread-count</name><value>8</value></property><!-- 是否让yarn自动检测硬件进行配置,默认是false,如果该节点有很多其他应用程序,建议手动配置。如果该节点没有其他应用程序,可以采用自动 --><property><description>Enable auto-detection of node capabilities such asmemory and CPU.</description><name>yarn.nodemanager.resource.detect-hardware-capabilities</name><value>false</value></property><!-- 是否将虚拟核数当作CPU核数,默认是false,采用物理CPU核数 --><property><description>Flag to determine if logical processors(such ashyperthreads) should be counted as cores. Only applicable on Linuxwhen yarn.nodemanager.resource.cpu-vcores is set to -1 andyarn.nodemanager.resource.detect-hardware-capabilities is true.</description><name>yarn.nodemanager.resource.count-logical-processors-as-cores</name><value>false</value></property><!-- 虚拟核数和物理核数乘数,默认是1.0 --><property><description>Multiplier to determine how to convert phyiscal cores tovcores. This value is used if yarn.nodemanager.resource.cpu-vcoresis set to -1(which implies auto-calculate vcores) andyarn.nodemanager.resource.detect-hardware-capabilities is set to true. The	number of vcores will be calculated as	number of CPUs * multiplier.</description><name>yarn.nodemanager.resource.pcores-vcores-multiplier</name><value>1.0</value></property><!-- NodeManager使用内存数,默认8G,修改为4G内存 --><property><description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 andyarn.nodemanager.resource.detect-hardware-capabilities is true, it isautomatically calculated(in case of Windows and Linux).In other cases, the default is 8192MB.</description><name>yarn.nodemanager.resource.memory-mb</name><value>4096</value></property><!-- nodemanager的CPU核数,不按照硬件环境自动设定时默认是8个,修改为4个 --><property><description>Number of vcores that can be allocatedfor containers. This is used by the RM scheduler when allocatingresources for containers. This is not used to limit the number ofCPUs used by YARN containers. If it is set to -1 andyarn.nodemanager.resource.detect-hardware-capabilities is true, it isautomatically determined from the hardware in case of Windows and Linux.In other cases, number of vcores is 8 by default.</description><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value></property><!-- 容器最小内存,默认1G --><property><description>The minimum allocation for every container request at the RM	in MBs. Memory requests lower than this will be set to the value of this	property. Additionally, a node manager that is configured to have less memory	than this value will be shut down by the resource manager.</description><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property><!-- 容器最大内存,默认8G,修改为2G --><property><description>The maximum allocation for every container request at the RM	in MBs. Memory requests higher than this will throw an	InvalidResourceRequestException.</description><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value></property><!-- 容器最小CPU核数,默认1个 --><property><description>The minimum allocation for every container request at the RM	in terms of virtual CPU cores. Requests lower than this will be set to the	value of this property. Additionally, a node manager that is configured to	have fewer virtual cores than this value will be shut down by the resource	manager.</description><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property><!-- 容器最大CPU核数,默认4个,修改为2个 --><property><description>The maximum allocation for every container request at the RM	in terms of virtual CPU cores. Requests higher than this will throw anInvalidResourceRequestException.</description><name>yarn.scheduler.maximum-allocation-vcores</name><value>2</value></property><!-- 虚拟内存检查,默认打开,修改为关闭 --><property><description>Whether virtual memory limits will be enforced forcontainers.</description><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property><!-- 虚拟内存和物理内存设置比例,默认2.1 --><property><description>Ratio between virtual memory to physical memory when	setting memory limits for containers. Container allocations are	expressed in terms of physical memory, and virtual memory usage	is allowed to exceed this allocation by this ratio.</description><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value></property><property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value><description>指定公平调度器</description></property><property><name>yarn.scheduler.fair.allocation.file</name><value>/usr/local/hadoop-3.4.0/etc/hadoop/fair-scheduler.xml</value><description>指明公平调度器队列分配配置文件</description></property><property><name>yarn.scheduler.fair.preemption</name><value>false</value><description>禁止队列间资源抢占</description></property>
</configuration>
 2.2.2、fair-scheduler.xml
<?xml version="1.0"?>
<allocations><!-- 单个队列中Application Master占用资源的最大比例,取值0-1 ,企业一般配置0.1 --><queueMaxAMShareDefault>0.5</queueMaxAMShareDefault><!-- 单个队列最大资源的默认值 test default --><queueMaxResourcesDefault>4096mb,4vcores</queueMaxResourcesDefault><!-- 增加一个队列test --><queue name="test"><!-- 队列最小资源 --><minResources>2048mb,2vcores</minResources><!-- 队列最大资源 --><maxResources>4096mb,4vcores</maxResources><!-- 队列中最多同时运行的应用数,默认50,根据线程数配置 --><maxRunningApps>4</maxRunningApps><!-- 队列中Application Master占用资源的最大比例 --><maxAMShare>0.5</maxAMShare><!-- 该队列资源权重,默认值为1.0 --><weight>1.0</weight><!-- 队列内部的资源分配策略 --><schedulingPolicy>fair</schedulingPolicy></queue><!-- 增加一个队列demo --><queue name="demo" type="parent"><!-- 队列最小资源 --><minResources>2048mb,2vcores</minResources><!-- 队列最大资源 --><maxResources>4096mb,4vcores</maxResources><!-- 队列中最多同时运行的应用数,默认50,根据线程数配置 --><maxRunningApps>4</maxRunningApps><!-- 该队列资源权重,默认值为1.0 --><weight>1.0</weight><!-- 队列内部的资源分配策略 --><schedulingPolicy>fair</schedulingPolicy></queue><!-- 任务队列分配策略,可配置多层规则,从第一个规则开始匹配,直到匹配成功 --><queuePlacementPolicy><!-- 提交任务时指定队列,如未指定提交队列,则继续匹配下一个规则; false表示:如果指定队列不存在,不允许自动创建--><rule name="specified" create="false"/><!-- 提交到root.group.username队列,若root.group不存在,不允许自动创建;若root.group.user不存在,允许自动创建 --><rule name="nestedUserQueue" create="true"><rule name="primaryGroup" create="false"/></rule><!-- 最后一个规则必须为reject或者default。Reject表示拒绝创建提交失败,default表示把任务提交到default队列 --><rule name="reject" /></queuePlacementPolicy>
</allocations>

 2.3、测试

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi -Dmapreduce.job.queuename=root.test 1 1#不指定队列,默认会创建以当前用户名为名称的队列执行任务
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 1 1hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount -D mapreduce.job.queuename=root.test /input  /output11

四、Yarn的Tool接口案例

1、代码

package com.xiaojie.hadoop.mapreduce.yarntool.count;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;import java.io.IOException;public class WordCount implements Tool {private Configuration conf;@Overridepublic int run(String[] args) throws Exception {Job job = Job.getInstance(conf);job.setJarByClass(WordCountDriver.class);job.setMapperClass(WordCountMapper.class);job.setReducerClass(WordCountReducer.class);job.setMapOutputKeyClass(Text.class);job.setMapOutputValueClass(IntWritable.class);job.setOutputKeyClass(Text.class);job.setOutputValueClass(IntWritable.class);FileInputFormat.setInputPaths(job, new Path(args[0]));FileOutputFormat.setOutputPath(job, new Path(args[1]));return job.waitForCompletion(true) ? 0 : 1;}@Overridepublic void setConf(Configuration conf) {this.conf = conf;}@Overridepublic Configuration getConf() {return conf;}public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {private Text outK = new Text();private IntWritable outV = new IntWritable(1);@Overrideprotected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {String line = value.toString();String[] words = line.split(" ");for (String word : words) {outK.set(word);context.write(outK, outV);}}}public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {private IntWritable outV = new IntWritable();@Overrideprotected void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {int sum = 0;for (IntWritable value : values) {sum += value.get();}outV.set(sum);context.write(key, outV);}}
}
package com.xiaojie.hadoop.mapreduce.yarntool.count;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;import java.util.Arrays;public class WordCountDriver {private static Tool tool;public static void main(String[] args) throws Exception {// 1. 创建配置文件Configuration conf = new Configuration();// 2. 判断是否有tool接口switch (args[0]) {case "wordcount":tool = new WordCount();break;default:throw new RuntimeException(" No such tool: " + args[0]);}// 3. 用Tool执行程序// Arrays.copyOfRange 将老数组的元素放到新数组里面int run = ToolRunner.run(conf, tool, Arrays.copyOfRange(args, 1, args.length));System.exit(run);}
}

2、测试

yarn jar yarn-demo.jar com.xiaojie.hadoop.mapreduce.yarntool.count.WordCountDriver wordcount -Dmapreduce.job.queuename=root.test /input /output222

3、完整代码

spring-boot: Springboot整合redis、消息中间件等相关代码 - Gitee.com

相关文章:

大数据技术-Hadoop(四)Yarn的介绍与使用

目录 一、Yarn 基本结构 1、Yarn基本结构 2、Yarn的工作机制 二、Yarn常用的命令 三、调度器 1、Capacity Scheduler&#xff08;容量调度器&#xff09; 1.1、特点 1.2、配置 1.2.1、yarn-site.xml 1.2.2、capacity-scheduler.xml 1.3、重启yarn、刷新队列 测试 向hi…...

算法 class 004(选择,冒泡,插入)

选择排序&#xff1a; 刚进入 j 循环的样子 j 跳出循环后&#xff0c;b 指向最小值的坐标 然后交换 i 和 b 位置的 值 随后 i , b i , i j1; 开始新一轮的排序&#xff0c; void SelectAQort(int* arr,int size)//选择排序 {for (int i 0; i < size-1; i){ //i 的位置就是…...

linux---awk命令详细教程

awk是一种强大的编程语言&#xff0c;用于在Linux/Unix系统下对文本和数据进行处理。以下是对awk的详细教程&#xff1a; 一、awk简介 awk由Alfred Aho、Brian Kernighan和Peter Weinberger三人开发&#xff0c;其名称分别代表这三位作者姓氏的第一个字母。awk支持用户自定义…...

一个通用的居于 OAuth2的API集成方案

在现代 web 应用程序中&#xff0c;OAuth 协议是授权和认证的主流选择。为了与多个授权提供商进行无缝对接&#xff0c;我们需要一个易于扩展和维护的 OAuth 解决方案。本文将介绍如何构建一个灵活的、支持多提供商的 OAuth 系统&#xff0c;包括动态 API 调用、路径参数替换、…...

STM32配合可编程加密芯片SMEC88ST的防抄板加密方案设计

SMEC88ST SDK开发包下载 目前市场上很多嵌入式产品方案都是可以破解复制的&#xff0c;主要是因为方案主芯片不具备防破解的功能&#xff0c;这就导致开发者投入大量精力、财力开发的新产品一上市就被别人复制&#xff0c;到市场上的只能以价格竞争&#xff0c;最后工厂复制的产…...

QML学习(五) 做出第一个简单的应用程序

通过前面四篇对QML已经有了基本的了解&#xff0c;今天先尝试做出第一个单页面的桌面应用程序。 1.首先打开Qt,创建项目&#xff0c;选择“QtQuick Application - Empty” 空工程。 2.设置项目名称和项目代码存储路径 3.这里要注意选择你的编译器类型&#xff0c;以及输出的程…...

深入解析Android Framework中的android.location包:架构设计、设计模式与系统定制

深入解析Android Framework中的android.location包:架构设计、设计模式与系统定制 目录 引言android.location包概述核心类解析 LocationManagerLocationProviderLocationCriteriaGpsStatusGpsStatus.ListenerLocationListener位置服务的工作原理位置信息的获取与处理GPS状态…...

【C++11】类型分类、引用折叠、完美转发

目录 一、类型分类 二、引用折叠 三、完美转发 一、类型分类 C11以后&#xff0c;进一步对类型进行了划分&#xff0c;右值被划分纯右值(pure value&#xff0c;简称prvalue)和将亡值 (expiring value&#xff0c;简称xvalue)。 纯右值是指那些字面值常量或求值结果相当于…...

mongodb(6.0.15)安装注意事项,重装系统后数据恢复

window10系统 上周重装了系统&#xff0c;环境变量之类的都没有了。现在要恢复。 我电脑里之前的安装包没有删除&#xff08;虽然之前也没在C盘安装&#xff0c;但是找不到了&#xff0c;所以需要重新下载安装&#xff09;&#xff0c;长下图这样。这个不是最新版本&#xff0…...

union的实际使用

记录一下&#xff0c;免得忘记&#xff1a; 1、定义一个共用体变量 这里定义一个64位变量 i2creg_rev&#xff0c;然后通过共用体定义两个位变量bits和bits_reverse&#xff0c;通过bit可以访问指定位的值大小&#xff0c;不需要自己再左移右移转换。 bits_reverse是bits的对…...

EKF 自动匹配维度 MATLAB代码

该 M A T L A B MATLAB MATLAB代码实现了扩展卡尔曼滤波( E...

Oracle复合索引规则指南

在Oracle中可以创建组合索引&#xff0c;即同时包含两个或两个以上列的索引。在组合索引的使用方面&#xff0c;Oracle有以下特点&#xff1a; 1、 当使用基于规则的优化器&#xff08;RBO&#xff09;时&#xff0c;只有当组合索引的前导列出现在SQL语句的where子句中时&#…...

JS - Array Api

判断一个对象是否为数组 /* 语法&#xff1a; Array.isArray(object); 参数&#xff1a;object 必需&#xff0c;要测试的对象。返回值 如果 object 是数组&#xff0c;则为 true&#xff1b;否则为 false。 如果 object 参数不是对象&#xff0c;则返回 false。 */ 一、改…...

【JS】for-in 和 for-of遍历对象的区别

【介绍】 for-in 和 for-of 都是 JavaScript 中用于遍历数据结构的循环语句&#xff0c;但它们的工作原理和适用场景有所不同。特别是它们在遍历对象时的行为是不同的。 【区别】 for-in 遍历对象 for-in 是用于遍历对象的 可枚举属性的键名&#xff08;属性名&#xff09;…...

【每日学点鸿蒙知识】ets匿名类、获取控件坐标、Web显示iframe标签、软键盘导致上移、改变Text的背景色

1、HarmonyOS ets不支持匿名类吗&#xff1f; 不支持&#xff0c;需要显式标注对象字面量的类型&#xff0c;可以参考以下文档&#xff1a;https://developer.huawei.com/consumer/cn/doc/harmonyos-guides-V5/typescript-to-arkts-migration-guide-V5#%E9%9C%80%E8%A6%81%E6%…...

深度学习blog- 数学基础(全是数学)

矩阵‌&#xff1a;矩阵是一个二维数组&#xff0c;通常由行和列组成&#xff0c;每个元素可以通过行索引和列索引进行访问。 张量‌&#xff1a;张量是一个多维数组的抽象概念&#xff0c;可以具有任意数量的维度。除了标量&#xff08;0D张量&#xff09;、向量&#xff08;…...

最后100米配送

1. 项目概述 1.1 项目目标 集成无人机与电动车&#xff1a;设计并实现将无人机固定在电动车上&#xff0c;利用电动车的电源进行飞行&#xff0c;实现高楼内部从电动车位置到用户办公/居住地点的最后100米精准配送。低成本实现&#xff1a;通过利用电动车现有的电源和结构&am…...

Linux的进程替换以及基础IO

进程替换 上一篇草率的讲完了进程地址空间的组成结构和之间的关系&#xff0c;那么我们接下来了解一下程序的替换。 首先&#xff0c;在进程部分我们提过了&#xff0c;其实文件可以在运行时变成进程&#xff0c;而我们使用的Linux软件其实也是一个进程&#xff0c;所以进一步…...

《计算机网络A》单选题-复习题库

1. 计算机网络最突出的优点是&#xff08;D&#xff09; A、存储容量大B、将计算机技术与通信技术相结合C、集中计算D、资源共享 2. RIP 路由协议的最大跳数是&#xff08;C&#xff09; A、13B、14C、15D、16 3. 下面哪一个网络层次不属于 TCP/IP 体系模型&#xff08;D&a…...

闲谭Scala(2)--安装与环境配置

1. 概述 Java开发环境安装&#xff0c;需要两步&#xff0c;第一安装JDK&#xff0c;第二配置环境变量。 Scala的话&#xff0c;也是两步&#xff0c;第一安装Scale环境&#xff0c;第二配置环境变量。 需要注意的是&#xff0c;配置环境变量&#xff0c;主要是想让windows操…...

谷歌浏览器插件

项目中有时候会用到插件 sync-cookie-extension1.0.0&#xff1a;开发环境同步测试 cookie 至 localhost&#xff0c;便于本地请求服务携带 cookie 参考地址&#xff1a;https://juejin.cn/post/7139354571712757767 里面有源码下载下来&#xff0c;加在到扩展即可使用FeHelp…...

地震勘探——干扰波识别、井中地震时距曲线特点

目录 干扰波识别反射波地震勘探的干扰波 井中地震时距曲线特点 干扰波识别 有效波&#xff1a;可以用来解决所提出的地质任务的波&#xff1b;干扰波&#xff1a;所有妨碍辨认、追踪有效波的其他波。 地震勘探中&#xff0c;有效波和干扰波是相对的。例如&#xff0c;在反射波…...

论文解读:交大港大上海AI Lab开源论文 | 宇树机器人多姿态起立控制强化学习框架(二)

HoST框架核心实现方法详解 - 论文深度解读(第二部分) 《Learning Humanoid Standing-up Control across Diverse Postures》 系列文章: 论文深度解读 + 算法与代码分析(二) 作者机构: 上海AI Lab, 上海交通大学, 香港大学, 浙江大学, 香港中文大学 论文主题: 人形机器人…...

CVPR 2025 MIMO: 支持视觉指代和像素grounding 的医学视觉语言模型

CVPR 2025 | MIMO&#xff1a;支持视觉指代和像素对齐的医学视觉语言模型 论文信息 标题&#xff1a;MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output作者&#xff1a;Yanyuan Chen, Dexuan Xu, Yu Hu…...

【位运算】消失的两个数字(hard)

消失的两个数字&#xff08;hard&#xff09; 题⽬描述&#xff1a;解法&#xff08;位运算&#xff09;&#xff1a;Java 算法代码&#xff1a;更简便代码 题⽬链接&#xff1a;⾯试题 17.19. 消失的两个数字 题⽬描述&#xff1a; 给定⼀个数组&#xff0c;包含从 1 到 N 所有…...

大数据零基础学习day1之环境准备和大数据初步理解

学习大数据会使用到多台Linux服务器。 一、环境准备 1、VMware 基于VMware构建Linux虚拟机 是大数据从业者或者IT从业者的必备技能之一也是成本低廉的方案 所以VMware虚拟机方案是必须要学习的。 &#xff08;1&#xff09;设置网关 打开VMware虚拟机&#xff0c;点击编辑…...

Java - Mysql数据类型对应

Mysql数据类型java数据类型备注整型INT/INTEGERint / java.lang.Integer–BIGINTlong/java.lang.Long–––浮点型FLOATfloat/java.lang.FloatDOUBLEdouble/java.lang.Double–DECIMAL/NUMERICjava.math.BigDecimal字符串型CHARjava.lang.String固定长度字符串VARCHARjava.lang…...

uniapp微信小程序视频实时流+pc端预览方案

方案类型技术实现是否免费优点缺点适用场景延迟范围开发复杂度​WebSocket图片帧​定时拍照Base64传输✅ 完全免费无需服务器 纯前端实现高延迟高流量 帧率极低个人demo测试 超低频监控500ms-2s⭐⭐​RTMP推流​TRTC/即构SDK推流❌ 付费方案 &#xff08;部分有免费额度&#x…...

华为云Flexus+DeepSeek征文|DeepSeek-V3/R1 商用服务开通全流程与本地部署搭建

华为云FlexusDeepSeek征文&#xff5c;DeepSeek-V3/R1 商用服务开通全流程与本地部署搭建 前言 如今大模型其性能出色&#xff0c;华为云 ModelArts Studio_MaaS大模型即服务平台华为云内置了大模型&#xff0c;能助力我们轻松驾驭 DeepSeek-V3/R1&#xff0c;本文中将分享如何…...

管理学院权限管理系统开发总结

文章目录 &#x1f393; 管理学院权限管理系统开发总结 - 现代化Web应用实践之路&#x1f4dd; 项目概述&#x1f3d7;️ 技术架构设计后端技术栈前端技术栈 &#x1f4a1; 核心功能特性1. 用户管理模块2. 权限管理系统3. 统计报表功能4. 用户体验优化 &#x1f5c4;️ 数据库设…...