
Yahboom microROS Car, Native Ubuntu Support Series: 4 - Hand Detection

news 2025/11/10 6:09:06

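I. Preparation

Before getting into hand detection, there are two things to prepare.

1 Make sure the car's camera can display an image

See: Yahboom microROS Car, Native Ubuntu Support Series: 2 - Camera Control (CSDN blog)

Start the image-transmission agent: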

docker run -it --rm -v /dev:/dev -v /dev/shm:/dev/shm --privileged --net=host microros/micro-ros-agent:humble udp4 --port 9999 -v4

2 Make sure the message interface is available

bohu@bohu-TM1701:~/yahboomcar/yahboomcar_ws$ ros2 interface show yahboomcar_msgs/msg/PointArray 
geometry_msgs/Point[] points
	float64 x
	float64 y
	float64 z

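II. Background

The following is excerpted from Yahboom's learning materials:

1. Introduction

MediaPipe is an open-source framework developed by Google for building machine-learning applications that process streaming data. It is a graph-based data-processing pipeline for working with many kinds of data sources, such as video, audio, sensor data, and any other time-series data. MediaPipe is cross-platform: it runs on embedded platforms (Raspberry Pi and similar), mobile devices (iOS and Android), workstations, and servers, and it supports GPU acceleration on mobile. It provides cross-platform, customizable ML solutions for live and streaming media. The core framework is implemented in C++, with support for languages such as Java and Objective-C. Its main concepts are packets (Packet), streams (Stream), calculators (Calculator), graphs (Graph), and subgraphs (Subgraph).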

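Features of MediaPipe:

End-to-end acceleration: built-in fast ML inference and processing, accelerated even on common hardware.
Build once, deploy anywhere: a unified solution that works across Android, iOS, desktop/cloud, web, and IoT.
Ready-to-use solutions: cutting-edge ML solutions that demonstrate the full power of the framework.
Free and open source: framework and solutions under Apache 2.0, fully extensible and customizable.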

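2. MediaPipe Hands

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 3D hand landmarks from a single frame.

After palm detection over the whole image, the hand landmark model performs precise keypoint localization of the 21 3D hand-joint coordinates inside the detected hand region by regression, that is, by direct coordinate prediction. The model learns a consistent internal hand pose representation and is robust even to partially visible hands and self-occlusion.

To obtain ground truth data, about 30K real-world images were manually annotated with 21 3D coordinates (the Z value is taken from the image depth map, where one exists for the corresponding coordinate). To better cover the possible hand poses and to provide additional supervision on the nature of hand geometry, high-quality synthetic hand models were also rendered over various backgrounds and mapped to the corresponding 3D coordinates.

Reading this gives a rough idea, but it is not very intuitive, and the material then jumps straight into code, which is not friendly to newcomers. So before connecting to the car's camera, let's first follow an example from the community and try hand detection with the laptop's built-in camera.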

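Some MediaPipe functions, from: Learning MediaPipe Hand Detection and Gesture Recognition (1) (CSDN blog)

Hands initialization parameters

static_image_mode: static image input mode, default False. Whether to treat the input images as a batch of unrelated static images.
max_num_hands: maximum number of hands to detect, default 2.
model_complexity: model complexity, default 1, allowed values 0/1. The larger the value, the more complex the model, the more accurate the detection, and the longer it takes.
min_detection_confidence: minimum detection confidence, default 0.5, range 0.0 ~ 1.0. The larger the value, the stricter the palm filtering and the harder it is to detect a hand; a smaller value makes false detections more likely.
min_tracking_confidence: minimum tracking confidence, default 0.5, range 0.0 ~ 1.0. The larger the value, the stricter the tracking filter and the easier it is to lose the hand; a smaller value makes false detections more likely.

The cvtColor method converts our frame from BGR to RGB. By default, OpenCV stores image colors in BGR order. We need to convert to RGB because that is the format mediapipe accepts.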
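
For a 3-channel image, the BGR to RGB conversion amounts to reversing the channel order of every pixel. A minimal pure-Python sketch of that idea follows; the helper name `bgr_to_rgb` is hypothetical, and nested lists stand in for the NumPy array that OpenCV actually works with:

```python
def bgr_to_rgb(image):
    """Return a copy of `image` with each pixel's channels reversed: (B, G, R) -> (R, G, B)."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1x2 stand-in "image": one pure-blue and one pure-red pixel, in BGR order.
bgr = [[(255, 0, 0), (0, 0, 255)]]
rgb = bgr_to_rgb(bgr)
print(rgb)  # [[(0, 0, 255), (255, 0, 0)]]
```

Applying the same reversal twice gives back the original image, which is why the test code can convert to RGB for detection and back to BGR for display.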

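process: detection

Input: an array in RGB format

Output: multi_hand_landmarks: the landmark coordinates of each hand.

multi_handedness: the handedness (left/right) of each hand.

Parsing multi_hand_landmarks: the returned coordinates are normalized relative to the image (x by the image width, y by the image height).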

landmark {
x: 0.280276567
y: 0.531350315
z: 0.00314787566
}
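
Because the coordinates are normalized, they have to be scaled by the frame size before they can be used as pixel positions. A minimal sketch, assuming a 640x480 frame; the helper `to_pixel` is hypothetical, and the clamping is a defensive assumption for landmarks that fall slightly outside the image:

```python
import math


def to_pixel(x_norm, y_norm, width, height):
    """Map normalized coordinates to integer pixel coordinates, clamped to the image."""
    px = min(max(math.floor(x_norm * width), 0), width - 1)
    py = min(max(math.floor(y_norm * height), 0), height - 1)
    return px, py


# The landmark printed above, on a 640x480 frame.
print(to_pixel(0.280276567, 0.531350315, 640, 480))  # (179, 255)
```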

Parsing multi_handedness returns: index, confidence score, and handedness label.

classification {
index: 0
score: 0.994448185
label: "Left"
}

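The draw_landmarks function

Draws the landmarks and the skeleton on an image. It takes 6 input parameters:

    image: a three-channel BGR numpy array;
    landmark_list: the normalized list of landmarks to draw on the image (landmark_pb2.NormalizedLandmarkList);
    connections: a list of landmark index pairs specifying how the landmarks are connected; default None, in which case nothing is drawn;
    landmark_drawing_spec: drawing settings for the landmarks; can be a DrawingSpec or a Mapping[int, DrawingSpec]; if None is passed, the landmarks are not drawn; default DrawingSpec(color=RED_COLOR);
    connection_drawing_spec: drawing settings for the skeleton lines; can be a DrawingSpec or a Mapping[int, DrawingSpec]; default DrawingSpec();
    is_drawing_landmarks: whether to draw the landmarks, default True.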
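
The `connections` argument is simply a collection of index pairs into the landmark list, and each pair becomes one line segment. A minimal sketch of that mapping; the helper `connection_segments`, the three landmarks, and the two connections are illustrative assumptions, not the real `HAND_CONNECTIONS` set:

```python
def connection_segments(landmarks, connections, width, height):
    """Turn (start_idx, end_idx) pairs into pixel-space line segments."""

    def pixel(idx):
        x, y = landmarks[idx]
        return int(x * width), int(y * height)

    return [(pixel(a), pixel(b)) for a, b in connections]


# Illustrative data only: three normalized (x, y) landmarks and two connections.
landmarks = [(0.5, 0.875), (0.375, 0.75), (0.25, 0.5)]
connections = [(0, 1), (1, 2)]
segments = connection_segments(landmarks, connections, 640, 480)
print(segments)  # [((320, 420), (240, 360)), ((240, 360), (160, 240))]
```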

Test code, from: 21 Hand Keypoint Detection + Gesture Recognition [MediaPipe] (CSDN blog)

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

# Hand model
hands = mp_hands.Hands(static_image_mode=False,
                       max_num_hands=2,
                       min_detection_confidence=0.75,
                       min_tracking_confidence=0.75)

cap = cv2.VideoCapture(0)  # Open the default camera
while True:
    ret, frame = cap.read()  # Read one frame
    if not ret:  # Stop if no frame could be read
        break
    # Convert the color format
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # The camera image is mirrored, so flip it horizontally;
    # skip this step if your camera is not mirrored
    frame = cv2.flip(frame, 1)
    # Run detection
    results = hands.process(frame)
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    if results.multi_handedness:
        for hand_label in results.multi_handedness:
            print(hand_label)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            print(f'hand_landmarks:{hand_landmarks}')
            # Visualize the landmarks
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    cv2.imshow('MediaPipe Hands', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()

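Result:

At this point you should have a basic understanding of hand detection.

Next, let's look at how the Yahboom car runs hand detection on the frames captured from its camera.

III. Hand detection code

01_HandDetector.py, in the src/yahboom_esp32_mediapipe/yahboom_esp32_mediapipe/ directory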

import rclpy
import time
import mediapipe as mp
import cv2 as cv
from rclpy.node import Node
from geometry_msgs.msg import Point
from cv_bridge import CvBridge
from sensor_msgs.msg import Image, CompressedImage
from rclpy.time import Time
import datetime
# import the custom message
from yahboomcar_msgs.msg import PointArray
import numpy as np

print("import done")


class HandDetector(Node):
    def __init__(self, name, mode=False, maxHands=2, detectorCon=0.5, trackCon=0.5):
        super().__init__(name)
        self.mpHand = mp.solutions.hands
        self.mpDraw = mp.solutions.drawing_utils
        # Hand detection model
        self.hands = self.mpHand.Hands(
            static_image_mode=mode,
            max_num_hands=maxHands,
            min_detection_confidence=detectorCon,
            min_tracking_confidence=trackCon)
        # Drawing style for landmarks and connections: line width, color
        self.lmDrawSpec = mp.solutions.drawing_utils.DrawingSpec(color=(0, 0, 255), thickness=-1, circle_radius=6)
        self.drawSpec = mp.solutions.drawing_utils.DrawingSpec(color=(0, 255, 0), thickness=2, circle_radius=2)
        # create a publisher
        self.pub_point = self.create_publisher(PointArray, '/mediapipe/points', 1000)

    # Hand detection
    def pubHandsPoint(self, frame, draw=True):
        pointArray = PointArray()
        img = np.copy(frame)
        # Convert the color format
        img_RGB = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
        # Run detection
        self.results = self.hands.process(img_RGB)
        if self.results.multi_hand_landmarks:
            # Process the landmarks of each detected hand
            for i in range(len(self.results.multi_hand_landmarks)):
                if draw:
                    # Draw the landmarks on the original frame
                    self.mpDraw.draw_landmarks(frame, self.results.multi_hand_landmarks[i], self.mpHand.HAND_CONNECTIONS, self.lmDrawSpec, self.drawSpec)
                self.mpDraw.draw_landmarks(img, self.results.multi_hand_landmarks[i], self.mpHand.HAND_CONNECTIONS, self.lmDrawSpec, self.drawSpec)
                for id, lm in enumerate(self.results.multi_hand_landmarks[i].landmark):
                    point = Point()
                    point.x, point.y, point.z = lm.x, lm.y, lm.z
                    pointArray.points.append(point)
        self.pub_point.publish(pointArray)  # Publish the landmarks topic
        return frame, img

    def frame_combine(self, frame, src):
        if len(frame.shape) == 3:
            frameH, frameW = frame.shape[:2]
            srcH, srcW = src.shape[:2]
            dst = np.zeros((max(frameH, srcH), frameW + srcW, 3), np.uint8)
            dst[:, :frameW] = frame[:, :]
            dst[:, frameW:] = src[:, :]
        else:
            src = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
            frameH, frameW = frame.shape[:2]
            imgH, imgW = src.shape[:2]
            dst = np.zeros((frameH, frameW + imgW), np.uint8)
            dst[:, :frameW] = frame[:, :]
            dst[:, frameW:] = src[:, :]
        return dst


class MY_Picture(Node):
    def __init__(self, name):
        super().__init__(name)
        self.bridge = CvBridge()
        # Receive the images sent by the esp32
        self.sub_img = self.create_subscription(
            CompressedImage, '/espRos/esp32camera', self.handleTopic, 1)
        self.last_stamp = None
        self.new_seconds = 0
        self.fps_seconds = 1
        self.hand_detector = HandDetector('hand_detector')

    # Image callback
    def handleTopic(self, msg):
        self.last_stamp = msg.header.stamp
        if self.last_stamp:
            total_secs = Time(nanoseconds=self.last_stamp.nanosec, seconds=self.last_stamp.sec).nanoseconds
            delta = datetime.timedelta(seconds=total_secs * 1e-9)
            seconds = delta.total_seconds()*100
            if self.new_seconds != 0:
                self.fps_seconds = seconds - self.new_seconds
            self.new_seconds = seconds  # Keep this value for the next frame

        start = time.time()
        frame = self.bridge.compressed_imgmsg_to_cv2(msg)
        frame = cv.resize(frame, (640, 480))
        cv.waitKey(10)
        # Call hand detection
        frame, img = self.hand_detector.pubHandsPoint(frame, draw=False)
        end = time.time()
        fps = 1/((end - start)+self.fps_seconds)
        text = "FPS : " + str(int(fps))
        cv.putText(frame, text, (20, 30), cv.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 1)
        # Display
        dist = self.hand_detector.frame_combine(frame, img)
        cv.imshow('dist', dist)
        # print(frame)
        cv.waitKey(10)


def main():
    print("start it")
    rclpy.init()
    esp_img = MY_Picture("My_Picture")
    try:
        rclpy.spin(esp_img)
    except KeyboardInterrupt:
        pass
    finally:
        esp_img.destroy_node()
        rclpy.shutdown()

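The code is a bit long, but it can be read as two parts; the main node is MY_Picture.

MY_Picture first receives the camera image, with logic similar to the previous article.

It then calls the hand detection method pubHandsPoint, which returns the original image and the image with the landmarks drawn on it.

Finally, the two images are displayed side by side.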
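
One detail matters for anyone subscribing to /mediapipe/points: pubHandsPoint appends the landmarks of every detected hand to the same points array, so two hands produce 42 points in a single message. A minimal sketch of splitting them back into per-hand groups, assuming 21 landmarks per hand; the helper `split_hands` is hypothetical, and plain tuples stand in for geometry_msgs/Point:

```python
LANDMARKS_PER_HAND = 21  # MediaPipe Hands reports 21 landmarks per hand


def split_hands(points, per_hand=LANDMARKS_PER_HAND):
    """Split a flat list of points into consecutive groups of `per_hand`."""
    if len(points) % per_hand != 0:
        raise ValueError("point count is not a multiple of per_hand")
    return [points[i:i + per_hand] for i in range(0, len(points), per_hand)]


# Stand-in data: 42 dummy (x, y, z) tuples, as if two hands were detected.
flat = [(i / 100, i / 100, 0.0) for i in range(42)]
hands = split_hands(flat)
print(len(hands), len(hands[0]))  # 2 21
```

An empty message (no hand in view) simply yields an empty list of hands.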

bohu@bohu-TM1701:~/yahboomcar/yahboomcar_ws$ ros2 run yahboom_esp32_mediapipe HandDetector 
import done
start it
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1737441363.353163   46594 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1737441363.357023   46653 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1~22.04.3), renderer: Mesa Intel(R) UHD Graphics 620 (KBL GT2)
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1737441363.415742   46637 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1737441363.449011   46633 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1737441363.466042   46634 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.

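Result:

I find the official output, which draws on a black background, less intuitive than drawing the landmarks on the original frame, so I changed it.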