A Survey on Mixture of Experts (Part 2: System Design of Mixture of Experts)
A Survey on Mixture of Experts (Part 1: Algorithm Design of Mixture of Experts)
A Survey on Mixture of Experts
arxiv
github:A-Survey-on-Mixture-of-Experts-in-LLMs

5 System Design of Mixture of Experts
While Mixture of Experts (MoE) has been increasingly leveraged to enhance the capabilities of large language models, its adoption introduces new challenges to existing training and inference systems, due to the inherently sparse and dynamic nature of its computational workload. GShard [28] introduces expert parallelism, which implements parallel gating and expert computation by dispatching partitioned local tokens, with expert capacity serving as a load-balancing limit. Since then, expert parallelism has emerged as a fundamental strategy for the efficient scaling of MoE models. This approach can be viewed as an augmentation of data parallelism [197], [198], [199], where each expert in an MoE layer is assigned to a distinct device, while all non-expert layers are duplicated across devices. As depicted in Figure 8(a), the process flow of expert parallelism consists of the following sequential operations: gate routing, input encode, All-to-All dispatch, expert computation, All-to-All combine, and output decode. In general, the input size for general matrix multiply (GEMM) needs to be large enough to achieve the optimal utilization and throughput that the computing device requires. Therefore, input encode is employed to aggregate the input tokens of the same expert into a contiguous memory space, as determined by the token-expert mapping from gate routing. Subsequently, All-to-All dispatch sends the input tokens to their corresponding experts across the distributed devices. Following the localized computation by the experts, the inverse process, All-to-All combine and output decode, reinstates the original data layout according to the gating indices.
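The six-step flow above can be sketched in a single process, with the two All-to-All exchanges elided. The function below is a toy NumPy illustration (top-1 gating, square expert weights, hypothetical names), not the GShard implementation:

```python
import numpy as np

def moe_layer(tokens, gate_logits, experts, capacity):
    """Single-process sketch of the expert-parallel data flow:
    gate routing -> input encode -> (All-to-All dispatch elided) ->
    expert computation -> (All-to-All combine elided) -> output decode."""
    num_experts = gate_logits.shape[1]
    # Gate routing: pick the top-1 expert for each token.
    expert_ids = gate_logits.argmax(axis=1)
    output = np.zeros_like(tokens)  # assumes square expert weights
    for e in range(num_experts):
        # Input encode: gather this expert's tokens into one contiguous batch,
        # dropping overflow tokens beyond the expert-capacity limit.
        idx = np.nonzero(expert_ids == e)[0][:capacity]
        if idx.size == 0:
            continue
        # Expert computation: one batched GEMM per expert.
        out = tokens[idx] @ experts[e]
        # Output decode: scatter results back to the original token order.
        output[idx] = out
    return output
```

Input encode corresponds to the `tokens[idx]` gather, which hands each expert one contiguous batch suited to a single large GEMM; output decode is the final scatter back to the original layout.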

Furthermore, the synergy of expert parallelism [36], [132], [135], [200], [201] with other existing parallel strategies (tensor [202], [203], [204], pipeline [205], [206], [207], and sequence parallelism [208], [209], [210]) has been investigated to enhance the scalability and efficiency of MoE models in large-scale distributed environments. As shown in Figure 8, we illustrate several examples of hybrid parallelism, encompassing (b) data + expert + tensor parallelism [36], [66], [132], [135], [138], (c) data + expert + pipeline parallelism [132], [134], [138], and (d) expert + tensor parallelism [67]. It is imperative to recognize that the choice of distributed parallelism strategy involves a complex interplay among computation efficiency, communication overhead, and memory occupation, each potentially affected by the hardware configuration. Consequently, deployment strategies for practical applications necessitate nuanced trade-offs and bespoke designs tailored to specific use-case scenarios.
In the subsequent discussion, we delineate the challenges introduced by MoE models from computation, communication, and storage aspects, concurrently reviewing existing research addressing these issues. Table 4 shows an overview of the open-source MoE frameworks.

5.1 Computation
Although MoE is designed to scale model parameters efficiently without increasing computational demand, it encounters challenges pertaining to computational efficiency. One concern is the imbalance of computational load across distributed devices employing expert parallelism, which incurs significant synchronization overhead as the system awaits the processing completion of the most heavily loaded expert. Such issues are typically addressed through algorithmic strategies, such as optimized gating mechanisms and expert capacity adjustments, as discussed in Section 4.1. Besides, solutions like SE-MoE [133], Tutel [132], FlexMoE [137] and SmartMoE [138] have introduced dynamic expert placement strategies to distribute the workload as evenly as possible among devices. Additionally, FasterMoE [134] has implemented a novel dynamic shadowed expert strategy, replicating experts on multiple devices to mitigate severe load imbalance. These model-placement strategies impact both computation and communication efficiency.
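As an illustration of the placement idea, and not the actual algorithm of any of the systems above, the following sketch assigns experts to devices with the classic longest-processing-time greedy heuristic, using observed per-expert token counts as the load:

```python
def place_experts(expert_loads, num_devices):
    """Greedy load-balanced placement: assign the heaviest remaining
    expert to the currently least-loaded device (the LPT heuristic).
    A sketch of the balancing objective, not FlexMoE's or SmartMoE's
    actual placement algorithm."""
    devices = [[] for _ in range(num_devices)]
    totals = [0] * num_devices
    # Visit experts in decreasing order of observed token load.
    order = sorted(range(len(expert_loads)), key=lambda e: -expert_loads[e])
    for e in order:
        d = min(range(num_devices), key=lambda i: totals[i])
        devices[d].append(e)
        totals[d] += expert_loads[e]
    return devices, totals
```

In a real system the loads drift over time, so such a placement must be recomputed periodically and weighed against the cost of migrating expert parameters between devices.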
Another concern is that MoE introduces additional computational overhead through operations including gate routing, input encode, and output decode. Unlike expert computations, which mirror operations in dense models and benefit from extensive optimization on prevalent hardware such as GPUs, these MoE operations are characterized by redundant computation and memory movement, resulting in low efficiency on computing devices. Therefore, recent studies like DeepSpeed-MoE [66], FastMoE [131], HetuMoE [136] and Tutel [132] have focused on the development of tailored GPU kernels to enhance the efficiency of MoE operations.
In contexts where multiple experts are deployed on a single GPU device, MegaBlocks [139] reformulates MoE computation in terms of block-sparse operations, developing specialized block-sparse GPU kernels that efficiently handle the dynamic workloads without dropping tokens. Zheng et al. [141] propose PIT, a deep-learning compiler tailored to the dynamic sparsity of MoE, which can find feasible PIT rules for all the operators within a model and generate optimized GPU kernels for them. PIT employs a novel tiling mechanism based on the Permutation Invariant Transformation (PIT), a mathematically proven property, to transform multiple sparsely located micro-tiles into a GPU-efficient dense tile without changing the computation results, thereby achieving both high GPU utilization and low coverage waste. Despite these advancements, Tan et al. [140] highlight remaining optimization potential within current MoE frameworks such as MegaBlocks and PIT, which commence with an initial scatter-to-group data copy that increases the memory footprint and requires translating the MoE problem into a sparse matrix format. Although this translation contributes minimally to computation overhead, it limits the transparency and adaptability of extending MegaBlocks to modules beyond the FFN. To address these issues, Tan et al. [140] propose ScatterMoE, an MoE implementation designed to effectively minimize the memory footprint. ScatterMoE leverages ParallelLinear, a linear module capable of executing grouped matrix operations on scattered groups. This approach yields intermediate representations (e.g., the hidden states of an SMoE MLP) that are directly accessible as standard PyTorch tensors, allowing MoE methods to be easily extended to other types of expert modules.
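The grouped-operation idea behind ParallelLinear can be caricatured in NumPy: multiply each scattered group of token rows by its expert's weights and write the results straight into the output layout, without an intermediate grouped copy of the output. This is a sketch of the concept only, not ScatterMoE's fused kernel:

```python
import numpy as np

def grouped_linear(x, weights, expert_ids):
    """Grouped linear over scattered groups: each token row is multiplied
    by its assigned expert's weight matrix, and the result is written
    directly at the token's original position.  A NumPy stand-in for the
    fused-kernel idea, not ScatterMoE's actual ParallelLinear."""
    out = np.empty((x.shape[0], weights.shape[2]))
    for e in range(weights.shape[0]):
        rows = np.nonzero(expert_ids == e)[0]  # scattered group for expert e
        if rows.size:
            out[rows] = x[rows] @ weights[e]
    return out
```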
5.2 Communication
In expert parallelism, the four invocations of All-to-All communication during the forward and backward propagation phases of each MoE layer cause significant overhead, even emerging as the primary constraint on efficiency. The All-to-All communication paradigm encompasses both intra-node (via PCIe or pre-4th-generation NVLink) and inter-node (Ethernet, InfiniBand, or 4th-generation NVLink) communication channels. The efficiency of such communication is contingent upon a multitude of factors, including the heterogeneity of channel bandwidths, the network topology, and the collective communication algorithms. Moreover, the load imbalances intrinsic to MoE may exacerbate these inefficiencies by inducing synchronization delays.
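A back-of-envelope cost model makes this overhead concrete. Assuming uniform routing, each of the four All-to-All invocations per layer moves roughly a (1 - 1/D) fraction of the local token buffer over the network on D devices; all parameter values below are illustrative assumptions, not measurements:

```python
def all_to_all_time(tokens, hidden, bytes_per_elem, num_devices, bandwidth_gbps):
    """Rough time for one All-to-All: each device sends its local token
    buffer, minus the fraction that stays local, at the given bandwidth.
    Illustrative model only; real cost depends on topology, message
    sizes, and the collective algorithm."""
    local_bytes = tokens * hidden * bytes_per_elem
    sent = local_bytes * (1 - 1 / num_devices)
    return sent / (bandwidth_gbps * 1e9)

def moe_layer_comm_time(tokens, hidden, bytes_per_elem, num_devices, bw):
    # Four invocations per MoE layer: dispatch + combine, forward and backward.
    return 4 * all_to_all_time(tokens, hidden, bytes_per_elem, num_devices, bw)
```

Even this crude model shows why the All-to-All cost grows with hidden size and token count but is largely insensitive to the number of experts per device, and why it can dominate once expert GEMMs are fast.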
To optimize the use of high intra-node bandwidth and low inter-node bandwidth, DeepSpeed-MoE [66], HetuMoE [136] and ScheMoE [147] have introduced hierarchical All-to-All communication strategies that enhance intra-node processing and reduce inter-node data exchanges. Besides, FasterMoE [134], TA-MoE [143] and SE-MoE [133] have introduced topology-aware routing strategies aimed at mitigating cross-node expert selection, thereby reducing inter-node communication burdens. Additionally, ExFlow [142] exploits expert affinity, anticipating expert allocation across layers to maximize the retention of token processing within local GPU confines. The strategic allocation of experts to minimize network traffic and leverage high-bandwidth connections is a prevalent approach in distributed MoE systems [66], [67], [135]. Moreover, it is often integrated with the placement design of non-expert modules to optimize overall system performance.
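A simple message count shows why the hierarchical scheme helps: aggregating traffic inside each node first reduces cross-node exchanges from one per remote GPU pair to one per node pair. This is a counting sketch of the idea, not any system's exact protocol:

```python
def inter_node_messages(num_nodes, gpus_per_node, hierarchical):
    """Count cross-node point-to-point messages in one All-to-All.
    Naive: every GPU sends to every GPU on a different node.
    Hierarchical: traffic is first gathered inside each node, so only one
    (aggregated) message flows between each ordered pair of nodes.
    A counting sketch, not an actual collective implementation."""
    if hierarchical:
        return num_nodes * (num_nodes - 1)
    gpus = num_nodes * gpus_per_node
    # Each GPU sends to all GPUs outside its own node.
    return gpus * (gpus - gpus_per_node)
```

The aggregated messages are larger, but fewer and bigger transfers use the slow inter-node links far more efficiently than many small ones.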
Given that communication and computation can proceed concurrently, pipelining [205], [206], [207] is commonly employed to overlap their execution, thereby reducing the total time cost. This technique, integrated in systems such as Tutel [132], FasterMoE [134], PipeMoE [146] and MPipeMoE [144], orchestrates the overlap between All-to-All communication and expert computation. Notably, Lancet [145] underscores the inherent constraints of these pipelining methods, particularly the bounded duration for which expert computation and communication can overlap. To address this limitation, Lancet partitions non-MoE computations and integrates them into the pipeline during the forward pass, and strategically schedules gradient weight computations to augment the overlap in the backward pass. Punniyamurthy et al. [148] also emphasize the challenge posed by collective communications, which are often on the critical path, noting the difficulty of hiding their latency by overlapping kernel-granular communication and computation due to the absence of independent computation. Their solution fuses computation with the dependent collective communication by leveraging the GPU's massive parallelism and GPU-initiated communication.
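An idealized two-stage pipeline model illustrates the bounded benefit that Lancet points to: splitting the All-to-All and the expert GEMM into n chunks hides the shorter phase behind the longer one, but the longer phase itself remains exposed. The formula below is a textbook pipeline estimate under equal chunk sizes, not any system's scheduler:

```python
def pipelined_time(comm, comp, num_chunks):
    """Two-stage pipeline (All-to-All dispatch, then expert GEMM) over
    num_chunks equal chunks: total = a + b + (n - 1) * max(a, b), where
    a and b are per-chunk stage times.  Idealized model; it ignores
    kernel-launch overhead and uneven chunk sizes."""
    a, b = comm / num_chunks, comp / num_chunks
    return a + b + (num_chunks - 1) * max(a, b)
```

With comm = comp, many chunks drive the total toward max(comm, comp) rather than comm + comp, which is exactly why Lancet seeks extra independent work (non-MoE computation, gradient weight updates) to fill the still-exposed dominant phase.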
Aiming to break the inherent dependencies and thereby extend the overlap duration, ScMoE [110] restructures the MoE architecture to simultaneously process representations from preceding layers while engaging with current-layer representations. This decoupling of communication dependencies facilitates substantial, and in certain cases, complete overlapping between communication and computation. Snowflake Arctic [32] employs a similar design, utilizing a Dense-MoE hybrid transformer architecture to overlap communication with computation.
5.3 Storage
The ever-increasing parameters in MoE models exacerbate the constraints posed by memory capacity in compute devices, a challenge already pronounced in dense models. While expert parallelism offers a mitigation strategy through the distribution of experts across multiple devices, individual devices may still struggle to accommodate numerous experts, particularly in inference contexts where device capacity, such as that of edge devices (PCs, smartphones, IoT devices), is inherently more restricted.
Considering the hierarchical storage pyramid, solutions like SE-MoE [133], Pre-gated MoE [149], and EdgeMoE [150] selectively retain only essential non-expert parameters and the active expert parameters within the GPU's High-Bandwidth Memory (HBM), offloading inactive expert parameters to CPU memory or SSDs. These patterns incur additional overhead from data transfer across the storage hierarchy; they therefore integrate expert-selection forecasting and expert-parameter prefetching techniques to overlap parameter access with computation.
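A toy cost model, in the spirit of these prefetching designs but not their actual algorithms, shows why prediction accuracy matters: a correct forecast lets the next layer's expert fetch overlap with the current layer's computation, while a miss exposes the full transfer on the critical path:

```python
def layer_time(compute_ms, fetch_ms, predicted_correctly):
    """Per-layer latency with expert offloading and prefetching.
    If the experts needed by the next layer were forecast correctly,
    their transfer from CPU/SSD overlaps with computation; otherwise the
    fetch is serialized after it.  Hypothetical cost model only."""
    if predicted_correctly:
        return max(compute_ms, fetch_ms)
    return compute_ms + fetch_ms

def model_time(compute_ms, fetch_ms, num_layers, hit_rate):
    # Expected latency over many layers at a given prediction hit rate.
    hit = layer_time(compute_ms, fetch_ms, True)
    miss = layer_time(compute_ms, fetch_ms, False)
    return num_layers * (hit_rate * hit + (1 - hit_rate) * miss)
```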
In addition, MPipeMoE [144] introduces a strategy to reduce the memory overhead associated with activations and temporary buffers. This is achieved by sharing buffers across different tensor partitions, while leveraging recomputation/communication and CPU offloading to recover the requisite activations in the backward pass.