C#: A Trie Tree-Based Word Breaker Algorithm for the Dynamic Programming Word Break Problem, with Source Code

news 2026/5/31 8:59:56

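1 Word Segmentation

Word segmentation is the foundation of natural language processing: its accuracy directly determines the quality of the downstream part-of-speech tagging, syntactic parsing, word embeddings, and text analysis. English sentences separate words with spaces, so apart from certain multi-word expressions such as "how many" and "New York", segmentation is usually not a concern. When the input contains no spaces, however, a good segmentation algorithm is required.

The main simple segmentation algorithms are the following.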

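2 Forward Maximum Matching (FMM)

Scanning from left to right, take the longest run of consecutive characters that equals a word in the dictionary, extract it, and repeat the same procedure on the remainder. If the first character is not a prefix of any dictionary word, that character is emitted as a word on its own.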
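The forward procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the article's own code: the class name `ForwardMaxMatchSketch`, the `HashSet<string>` dictionary, and the `maxWordLength` parameter are assumptions introduced for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's listings.
public static class ForwardMaxMatchSketch
{
    // Greedy left-to-right segmentation: at each position take the longest
    // dictionary word that starts there, trying lengths from maxWordLength down to 1.
    public static List<string> Segment(string text, HashSet<string> dictionary, int maxWordLength)
    {
        var words = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            int longest = Math.Min(maxWordLength, text.Length - pos);
            int taken = 1; // fallback: an unmatched character becomes its own token
            for (int len = longest; len >= 1; len--)
            {
                if (dictionary.Contains(text.Substring(pos, len)))
                {
                    taken = len;
                    break;
                }
            }
            words.Add(text.Substring(pos, taken));
            pos += taken;
        }
        return words;
    }
}
```

With the dictionary { "i", "like", "ice", "cream", "icecream" } and a maximum word length of 8, the input "ilikeicecream" is split into "i", "like", "icecream": the greedy rule prefers "icecream" over "ice" followed by "cream".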

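3 Backward Maximum Matching (BMM)

The only difference from forward maximum matching is that the longest run of consecutive characters is taken from right to left.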
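A right-to-left variant might look like the sketch below. As before, the class name `BackwardMaxMatchSketch` and the `maxWordLength` parameter are assumptions made for illustration and do not come from the article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's listings.
public static class BackwardMaxMatchSketch
{
    // Greedy right-to-left segmentation: at each step take the longest
    // dictionary word that ends at the current position.
    public static List<string> Segment(string text, HashSet<string> dictionary, int maxWordLength)
    {
        var words = new List<string>();
        int end = text.Length;
        while (end > 0)
        {
            int longest = Math.Min(maxWordLength, end);
            int taken = 1; // fallback: an unmatched character becomes its own token
            for (int len = longest; len >= 1; len--)
            {
                if (dictionary.Contains(text.Substring(end - len, len)))
                {
                    taken = len;
                    break;
                }
            }
            words.Add(text.Substring(end - taken, taken));
            end -= taken;
        }
        words.Reverse(); // tokens were collected from the end, so restore reading order
        return words;
    }
}
```

The direction matters: with the dictionary { "now", "here", "no", "where" } and a maximum word length of 5, this sketch splits "nowhere" into "no", "where", whereas a left-to-right greedy scan over the same dictionary would produce "now", "here".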

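4 Bidirectional Maximum Matching

Bidirectional maximum matching runs both forward maximum matching (FMM) and backward maximum matching (BMM) on the same sentence and compares the two results in order to detect ambiguity. Ambiguity means that a sentence has more than one possible segmentation. In Chinese text, for roughly 90.0% of sentences the FMM and BMM segmentations coincide and are correct. For roughly 9.0% of sentences the two segmentations differ, but one of them is always correct (ambiguity detection succeeds). Only for fewer than 1.0% of sentences do FMM and BMM either agree on a wrong segmentation or disagree with both being wrong (ambiguity detection fails).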
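The text does not say how to pick a winner when the two segmentations disagree. One commonly used heuristic, shown here purely as an assumption and not as the article's method, prefers the result with fewer tokens, then the one with fewer single-character tokens, and otherwise falls back to the backward result. The class name `BidirectionalChoiceSketch` is hypothetical, and the two input lists are assumed to come from a forward and a backward matcher.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the article's listings.
public static class BidirectionalChoiceSketch
{
    // Compares a forward and a backward segmentation of the same sentence.
    // 'ambiguous' reports whether the two results differ.
    public static IReadOnlyList<string> Choose(
        IReadOnlyList<string> forward,
        IReadOnlyList<string> backward,
        out bool ambiguous)
    {
        ambiguous = !forward.SequenceEqual(backward);
        if (!ambiguous)
        {
            return forward; // both directions agree
        }

        // Assumed tie-breaking rule 1: fewer tokens wins.
        if (forward.Count != backward.Count)
        {
            return forward.Count < backward.Count ? forward : backward;
        }

        // Assumed tie-breaking rule 2: fewer single-character tokens wins;
        // on a full tie the backward result is returned.
        int forwardSingles = forward.Count(w => w.Length == 1);
        int backwardSingles = backward.Count(w => w.Length == 1);
        return forwardSingles < backwardSingles ? forward : backward;
    }
}
```

The `ambiguous` flag corresponds to the detection step in the statistics above: it only signals that the two directions disagree, and the tie-breaking rules are a separate design choice that can be replaced.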

This article presents a word break (Word Breaker) algorithm based on a trie (Trie Tree), together with its source code.

5 Node Definition

public class TrieNode
{
    public TrieNode[] children { get; set; } = new TrieNode[26];

    // isEndOfWord is true if the node represents
    // end of a word
    public bool isEndOfWord { get; set; } = false;

    public TrieNode()
    {
        isEndOfWord = false;
        for (int i = 0; i < 26; i++)
        {
            children[i] = null;
        }
    }
}

6 Dictionary-Based Word Break Algorithm

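using System;
using System.Text;

namespace Legalsoft.Truffer.Algorithm
{
    public static class Trie_Tree_Word_Breaker
    {
        // Inserts a word into the trie. Only lowercase letters a-z are supported.
        public static void Insert(TrieNode root, string key)
        {
            TrieNode pCrawl = root;

            for (int i = 0; i < key.Length; i++)
            {
                int index = key[i] - 'a';
                if (index < 0 || index >= 26)
                {
                    // Without this check an uppercase letter or digit would
                    // raise IndexOutOfRangeException on the children array.
                    throw new ArgumentException("Only lowercase letters a-z are supported.", nameof(key));
                }
                if (pCrawl.children[index] == null)
                {
                    pCrawl.children[index] = new TrieNode();
                }
                pCrawl = pCrawl.children[index];
            }

            // Mark the last node as the end of a complete word.
            pCrawl.isEndOfWord = true;
        }

        // Returns true only if key is stored in the trie as a complete word.
        public static bool Search(TrieNode root, string key)
        {
            TrieNode pCrawl = root;
            for (int i = 0; i < key.Length; i++)
            {
                int index = key[i] - 'a';
                // A character outside a-z cannot occur in any stored word.
                if (index < 0 || index >= 26 || pCrawl.children[index] == null)
                {
                    return false;
                }
                pCrawl = pCrawl.children[index];
            }
            return (pCrawl != null && pCrawl.isEndOfWord);
        }

        // Returns true if str can be split into a sequence of dictionary words.
        // This is plain recursion over every prefix, with no memoization, so
        // the running time can grow exponentially on adversarial inputs.
        public static bool Word_Break(string str, TrieNode root)
        {
            int size = str.Length;

            // The empty string is trivially segmentable.
            if (size == 0)
            {
                return true;
            }
            for (int i = 1; i <= size; i++)
            {
                // Accept the prefix str[0..i) only if the remainder can also be split.
                if (Search(root, str.Substring(0, i)) && Word_Break(str.Substring(i, size - i), root))
                {
                    return true;
                }
            }

            return false;
        }

        // Demo driver: builds the trie and returns one result per line,
        // each followed by "<br>" for display in HTML.
        public static string Drive()
        {
            string[] dictionary = {
                "mobile", "huawei",
                "sam", "sung", "ma",
                "mango", "icecream",
                "and", "go", "i", "like",
                "ice", "cream"
            };

            int n = dictionary.Length;
            TrieNode root = new TrieNode();

            // Construct trie
            for (int i = 0; i < n; i++)
            {
                Insert(root, dictionary[i]);
            }

            StringBuilder sb = new StringBuilder();
            sb.AppendLine(Word_Break("ilikehuawei", root) + "<br>");        // True
            sb.AppendLine(Word_Break("iiiiiiii", root) + "<br>");           // True
            sb.AppendLine(Word_Break("", root) + "<br>");                   // True
            sb.AppendLine(Word_Break("ilikelikeimangoiii", root) + "<br>"); // True
            sb.AppendLine(Word_Break("huaweiandmango", root) + "<br>");     // True
            sb.AppendLine(Word_Break("huaweiandmangok", root) + "<br>");    // False
            return sb.ToString();
        }
    }
}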
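The title frames word break as a dynamic programming problem, while the listing above solves it with plain recursion. A bottom-up variant might look like the following sketch. It is not the article's code: the names `DpTrieNode` and `TrieWordBreakDpSketch` are hypothetical and chosen so they do not clash with the classes above, and input is assumed to be lowercase a-z.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type, mirroring the article's TrieNode.
public class DpTrieNode
{
    public DpTrieNode[] Children { get; } = new DpTrieNode[26];
    public bool IsEndOfWord { get; set; }
}

// Hypothetical helper, not part of the article's listings.
public static class TrieWordBreakDpSketch
{
    // Builds a trie from lowercase a-z words.
    public static DpTrieNode Build(IEnumerable<string> words)
    {
        var root = new DpTrieNode();
        foreach (string word in words)
        {
            DpTrieNode node = root;
            foreach (char c in word)
            {
                int index = c - 'a';
                if (index < 0 || index >= 26)
                {
                    throw new ArgumentException("Only lowercase a-z is supported: " + word);
                }
                if (node.Children[index] == null)
                {
                    node.Children[index] = new DpTrieNode();
                }
                node = node.Children[index];
            }
            node.IsEndOfWord = true;
        }
        return root;
    }

    // canBreak[i] is true when the prefix text[0..i) can be split into dictionary words.
    public static bool CanBreak(string text, DpTrieNode root)
    {
        int n = text.Length;
        var canBreak = new bool[n + 1];
        canBreak[0] = true; // the empty prefix is trivially segmentable

        for (int start = 0; start < n; start++)
        {
            if (!canBreak[start])
            {
                continue; // no valid segmentation ends here, so no word may start here
            }

            // Walk the trie once from this start and mark every position where a word ends.
            DpTrieNode node = root;
            for (int end = start; end < n; end++)
            {
                int index = text[end] - 'a';
                if (index < 0 || index >= 26)
                {
                    break; // character outside a-z cannot match any word
                }
                node = node.Children[index];
                if (node == null)
                {
                    break; // no dictionary word continues with this character
                }
                if (node.IsEndOfWord)
                {
                    canBreak[end + 1] = true;
                }
            }
        }
        return canBreak[n];
    }
}
```

Each start position walks the trie at most once and stops as soon as no dictionary word can continue, so the work is bounded by the text length times the longest dictionary word. No substrings are allocated, and each position is processed once instead of being re-explored by every recursive branch.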

