当前位置：首页 > news >正文

python 不相交集简介（并查集算法）【Introduction to Disjoint Set (Union-Find Algorithm)】

news 2025/7/7 16:57:38

什么是不相交集数据结构？

如果两个集合没有任何共同元素，则它们被称为不相交集，集合的交集为空集。

存储不重叠或不相交元素子集的数据结构称为不相交集合数据结构。不相交集合数据结构支持以下操作：

1、将新集合添加到不相交集合中。

2、使用联合操作将不相交集合并为单个不相交集。

3、使用“查找”操作查找不相交集的代表。

4、检查两个集合是否不相交。

考虑这样一个情况，有许多人需要执行以下任务：

1、添加新的友谊关系，即一个人 x 成为另一个人 y 的朋友，即向集合中添加新元素。

2、判断个体x 是否是个体 y 的朋友（直接或间接朋友）

例子：

我们有 10 个人，比如 a、b、c、d、e、f、g、h、i、j

以下是需要添加的关系：
a <-> b
b <-> d
c <-> f
c <-> i
j <-> e
g <-> j

给定查询，例如 a 是否是 d 的朋友。我们基本上需要创建以下 4 个组，并在组项之间保持快速访问的连接：
G1 = {a, b, d}
G2 = {c, f, i}
G3 = {e, g, j}
G4 = {h}

判断 x 和 y 是否属于同一组，即判断 x 和 y 是否是直接/间接朋友。

根据个体所属的组别，将个体划分为不同的集合。此方法称为不相交集合并集，它维护不相交集合的集合，每个集合由其成员之一表示。

要回答上述问题，需要考虑两个关键点：

1、如何解析集合？最初，所有元素都属于不同的集合。在处理给定的关系后，我们选择一个成员作为代表。选择代表的方法有很多种，一种简单的方法是选择最大的索引。

2、检查两个人是否在同一组中？如果两个人的代表相同，那么他们就会成为朋友。

使用的数据结构包括：

数组：整数数组称为Parent[]。如果我们处理N 个项目，则数组的第 i 个元素代表第 i 个项目。更准确地说，Parent[] 数组的第 i 个元素是第 i 个项目的父级。这些关系创建一个或多个虚拟树。

树：它是一个不相交集。如果两个元素在同一棵树中，那么它们就在同一个不相交集。每棵树的根节点（或最顶端节点）称为集合的代表。每个集合始终有一个唯一的代表。识别代表的一个简单规则是，如果“i”是集合的代表，则Parent[i] = i。如果 i 不是其集合的代表，则可以通过沿树向上移动直到找到代表来找到它。

不相交集合数据结构上的操作：
查找
联合

1. 查找：
可以通过递归遍历父数组直到找到其自身的父节点来实现。

# Finds the representative of the set
# that i is an element of

def find(i):

# If i is the parent of itself
if (parent[i] == i):

# Then i is the representative of
# this set
return i
else:

# Else if i is not the parent of
# itself, then i is not the
# representative of his set. So we
# recursively call Find on its parent
return find(parent[i])

# The code is contributed by Nidhi goel

时间复杂度：这种方法效率低下，在最坏的情况下可能需要 O(n) 时间。

2. 联合：

它以两个元素作为输入，并使用查找操作找到它们的集合的代表，最后将其中一棵树（代表集合）放在另一棵树的根节点下。

# Unites the set that includes i
# and the set that includes j

def union(parent, rank, i, j):
# Find the representatives
# (or the root nodes) for the set
# that includes i
irep = find(parent, i)

# And do the same for the set
# that includes j
jrep = find(parent, j)

# Make the parent of i’s representative
# be j’s representative effectively
# moving all of i’s set into j’s set)

parent[irep] = jrep

时间复杂度：这种方法效率低下，在最坏的情况下可能导致长度为 O（n）的树。

优化（按等级/大小合并和路径压缩）：

效率在很大程度上取决于哪棵树连接到另一棵树。有两种方法可以实现。第一种是按等级联合，它将树的高度视为一个因素；第二种是按大小联合，它将树的大小视为一个因素，同时将一棵树连接到另一棵树。此方法与路径压缩一起提供了几乎恒定时间的复杂性。

路径压缩（对 Find() 的修改）：

它通过压缩树的高度来加速数据结构。这可以通过在Find操作中插入一个小的缓存机制来实现。查看代码了解更多详细信息：

# Finds the representative of the set that i
# is an element of.


def find(i):

# If i is the parent of itself
if Parent[i] == i:

# Then i is the representative
return i
else:

# Recursively find the representative.
result = find(Parent[i])

# We cache the result by moving i’s node
# directly under the representative of this
# set
Parent[i] = result

# And then we return the result
return result

# The code is contributed by Arushi Jindal.

时间复杂度：平均每次调用为 O(log n)。

按等级合并：

首先，我们需要一个新的整数数组，名为rank[] 。此数组的大小与父数组Parent[]相同。如果 i 代表一个集合，则rank[i]就是代表该集合的树的高度。现在回想一下，在 Union 操作中，将两棵树中的哪一棵移动到另一棵之下并不重要。现在我们要做的是最小化结果树的高度。如果我们要合并两棵树（或集合），我们将它们称为左和右，那么这一切都取决于左的等级和右的等级。

1、如果左边的等级小于右边的等级，那么最好将左边移到右边的下方，因为这不会改变右边的等级（而将右边移到左边的下方会增加高度）。同样，如果右边的等级小于左边的等级，那么我们应该将右边移到左边的下方。

2、如果等级相等，那么哪棵树位于另一棵树之下并不重要，但结果的等级始终比树的等级大一。

class DisjointSet:
def __init__(self, size):
self.parent = [i for i in range(size)]
self.rank = [0] * size

# Function to find the representative (or the root node) of a set
def find(self, i):
# If i is not the representative of its set, recursively find the representative
if self.parent[i] != i:
self.parent[i] = self.find(self.parent[i]) # Path compression
return self.parent[i]

# Unites the set that includes i and the set that includes j by rank
def union_by_rank(self, i, j):
# Find the representatives (or the root nodes) for the set that includes i and j
irep = self.find(i)
jrep = self.find(j)

# Elements are in the same set, no need to unite anything
if irep == jrep:
return

# Get the rank of i's tree
irank = self.rank[irep]

# Get the rank of j's tree
jrank = self.rank[jrep]

# If i's rank is less than j's rank
if irank < jrank:
# Move i under j
self.parent[irep] = jrep
# Else if j's rank is less than i's rank
elif jrank < irank:
# Move j under i
self.parent[jrep] = irep
# Else if their ranks are the same
else:
# Move i under j (doesn't matter which one goes where)
self.parent[irep] = jrep
# Increment the result tree's rank by 1
self.rank[jrep] += 1

def main(self):
# Example usage
size = 5
ds = DisjointSet(size)

# Perform some union operations
ds.union_by_rank(0, 1)
ds.union_by_rank(2, 3)
ds.union_by_rank(1, 3)

# Find the representative of each element
for i in range(size):
print(f"Element {i} belongs to the set with representative {ds.find(i)}")


# Creating an instance and calling the main method
ds = DisjointSet(size=5)
ds.main()

按大小合并：

同样，我们需要一个新的整数数组，名为size[] 。此数组的大小与父数组Parent[]相同。如果 i 代表一个集合，则size[i]是代表该集合的树中元素的数量。现在我们将两棵树（或集合）合并起来，我们将它们称为左树和右树，在这种情况下，一切都取决于左树（或集合）的大小和右树（或集合）的大小。

1、如果左边的尺寸小于右边的尺寸，那么最好将左边移到右边下方，并将右边的尺寸增加左边的尺寸。同样，如果右边的尺寸小于左边的尺寸，那么我们应该将右边移到左边下方，并将左边的尺寸增加右边的尺寸。

2、如果尺寸相等，那么哪棵树位于另一棵树下都没有关系。

# Python program for the above approach
class UnionFind:
def __init__(self, n):
# Initialize Parent array
self.Parent = list(range(n))

# Initialize Size array with 1s
self.Size = [1] * n

# Function to find the representative (or the root node) for the set that includes i
def find(self, i):
if self.Parent[i] != i:
# Path compression: Make the parent of i the root of the set
self.Parent[i] = self.find(self.Parent[i])
return self.Parent[i]

# Unites the set that includes i and the set that includes j by size
def unionBySize(self, i, j):
# Find the representatives (or the root nodes) for the set that includes i
irep = self.find(i)

# And do the same for the set that includes j
jrep = self.find(j)

# Elements are in the same set, no need to unite anything.
if irep == jrep:
return

# Get the size of i’s tree
isize = self.Size[irep]

# Get the size of j’s tree
jsize = self.Size[jrep]

# If i’s size is less than j’s size
if isize < jsize:
# Then move i under j
self.Parent[irep] = jrep

# Increment j's size by i's size
self.Size[jrep] += self.Size[irep]
# Else if j’s size is less than i’s size
else:
# Then move j under i
self.Parent[jrep] = irep

# Increment i's size by j's size
self.Size[irep] += self.Size[jrep]

# Example usage
n = 5
unionFind = UnionFind(n)

# Perform union operations
unionFind.unionBySize(0, 1)
unionFind.unionBySize(2, 3)
unionFind.unionBySize(0, 4)

# Print the representative of each element after unions
for i in range(n):
print("Element {}: Representative = {}".format(i, unionFind.find(i)))

# This code is contributed by Susobhan Akhuli

输出
元素 0：代表 = 0
元素 1：代表 = 0
元素 2：代表 = 2
元素 3：代表 = 2
元素 4：代表 = 0

时间复杂度：O（log n），无路径压缩。

下面是具有路径压缩和按等级合并的不相交集的完整实现。

# Python3 program to implement Disjoint Set Data
# Structure.

class DisjSet:
def __init__(self, n):
# Constructor to create and
# initialize sets of n items
self.rank = [1] * n
self.parent = [i for i in range(n)]


# Finds set of given item x
def find(self, x):

# Finds the representative of the set
# that x is an element of
if (self.parent[x] != x):

# if x is not the parent of itself
# Then x is not the representative of
# its set,
self.parent[x] = self.find(self.parent[x])

# so we recursively call Find on its parent
# and move i's node directly under the
# representative of this set

return self.parent[x]


# Do union of two sets represented
# by x and y.
def Union(self, x, y):

# Find current sets of x and y
xset = self.find(x)
yset = self.find(y)

# If they are already in same set
if xset == yset:
return

# Put smaller ranked item under
# bigger ranked item if ranks are
# different
if self.rank[xset] < self.rank[yset]:
self.parent[xset] = yset

elif self.rank[xset] > self.rank[yset]:
self.parent[yset] = xset

# If ranks are same, then move y under
# x (doesn't matter which one goes where)
# and increment rank of x's tree
else:
self.parent[yset] = xset
self.rank[xset] = self.rank[xset] + 1

# Driver code
obj = DisjSet(5)
obj.Union(0, 2)
obj.Union(4, 2)
obj.Union(3, 1)
if obj.find(4) == obj.find(0):
print('Yes')
else:
print('No')
if obj.find(1) == obj.find(0):
print('Yes')
else:
print('No')

# This code is contributed by ng24_7.

输出

Yes
No

时间复杂度：创建 n 个单项集的时间为 O(n)。两种技术（路径压缩和按等级/大小合并）的时间复杂度将达到接近常数时间。事实证明，最终的摊销时间复杂度为 O(α(n))，其中 α(n) 是逆阿克曼函数，其增长非常稳定（当 n<10 600 时，它甚至不会超过）。

空间复杂度： O（n），因为我们需要在不相交集数据结构中存储 n 个元素。

python 不相交集简介（并查集算法）【Introduction to Disjoint Set (Union-Find Algorithm)】

相关文章：

python 不相交集简介（并查集算法）【Introduction to Disjoint Set (Union-Find Algorithm)】

23种设计模式之工厂方法模式

Redis——事务

Redis非关系型数据库操作命令大全

基于SpringBoot+Vue+uniapp微信小程序的澡堂预订的微信小程序的详细设计和实现

Linux mips架构链接库函数调用plt表汇编代码分析

python 作业1

Apache 出现 “403 forbidden“ 排查方法

vue video播放m3u8监控视频

uniapp 获取签名证书 SHA1 自有证书签名打包

Open3d开发点云标注工具问题总结（二）

【FreeRTOS】

洛谷 P4995：跳跳！ ← 贪心算法

代理 IP 在 AI 爬虫中的关键应用

【Vercel】Vercel静态部署踩坑

【Spring】关于Spring中aware相关接口的作用

动态内存管理及RAII的简单应用

7、Vue2(一)

Chapter11

LLAMA2入门（一）-----预训练

调用支付宝接口响应40004 SYSTEM_ERROR问题排查

为什么需要建设工程项目管理？工程项目管理有哪些亮点功能？

React Native在HarmonyOS 5.0阅读类应用开发中的实践

Neo4j 集群管理：原理、技术与最佳实践深度解析

【RockeMQ】第2节｜RocketMQ快速实战以及核⼼概念详解（二）

大学生职业发展与就业创业指导教学评价

Unity | AmplifyShaderEditor插件基础（第七集：平面波动shader）

安宝特案例丨Vuzix AR智能眼镜集成专业软件，助力卢森堡医院药房转型，赢得辉瑞创新奖

宇树科技，改名了！

Kubernetes 网络模型深度解析：Pod IP 与 Service 的负载均衡机制，Service到底是什么？