
Real-Time Ball Position Detection Based on YOLO

news 2026/2/9 6:42:08


1. YOLO installation

Operating system: Ubuntu

Install CUDA and OpenCV.

git clone https://github.com/pjreddie/darknet.git

cd darknet

Edit the Makefile and set GPU=1 and OPENCV=1.

make
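The Makefile edit can also be scripted. The following is a minimal sketch, assuming the flags appear as plain NAME=value lines at the start of a line; the function name set_makefile_flags and the sample text are hypothetical and only illustrate the substitution.

```python
import re

def set_makefile_flags(makefile_text, settings):
    """Return makefile_text with each NAME=value line rewritten to the requested value."""
    for name, value in settings.items():
        # Match a whole line such as "GPU=0" and replace only its value.
        pattern = re.compile(r"^%s=.*$" % re.escape(name), flags=re.MULTILINE)
        makefile_text = pattern.sub("%s=%s" % (name, value), makefile_text)
    return makefile_text

# Hypothetical excerpt standing in for the flag lines of the Makefile.
sample = "GPU=0\nCUDNN=0\nOPENCV=0\n"
updated = set_makefile_flags(sample, {"GPU": 1, "OPENCV": 1})
print(updated)
```

To apply it to the real file, read Makefile into a string, pass it through the function, and write the result back, keeping a backup copy of the original.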

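2. Dataset processing

2.1 Creating the dataset

Place the ball in front of the camera and capture 500 images of it at different positions, as shown in Figure 2-1. Keep the images as sharp as possible, with no ghosting.

Figure 2-1 Creating the dataset

2.2 Numbering the image files sequentially

After capturing the 500 images, rename them sequentially from 000 to 499, as shown in Figure 2-2.

Figure 2-2 Sequentially numbered images

This can be done with the following Python program: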

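# coding: utf-8
import os

path = "./b/"

# Sort the names so the numbering is reproducible;
# os.listdir() returns entries in arbitrary order.
names = sorted(os.listdir(path))

# Rename every file in the folder to 000.jpg, 001.jpg, ...
for i, name in enumerate(names):
    oldname = os.path.join(path, name)
    newname = os.path.join(path, "%03d.jpg" % i)
    os.rename(oldname, newname)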

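Figure 2-3 The train.txt file

2.3 Generating the train.txt file

Create a folder named VOCdevkit and, inside it, three folders: Annotations, ImageSets, and JPEGImages. Put your prepared source images in JPEGImages. Inside ImageSets, create an empty folder named Main, then copy into it the text file that lists the names of the training or test images, i.e. train.txt, as shown in Figure 2-3. (The voc_label.py script in section 2.5 reads from VOCdevkit/BALL2007/Annotations and VOCdevkit/BALL2007/ImageSets/Main, so in practice these folders sit inside a BALL2007 subfolder.)

This can be done with the following Python program: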

# Write one image id per line: 000, 001, ..., 499
with open('train.txt', 'w') as f:
    for i in range(500):
        f.write("%03d\n" % i)

2.4 Generating the XML files

Annotate the images with the labelImg tool, which generates the XML files. GitHub: https://github.com/tzutalin/labelImg.

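Save the generated XML files in the Annotations folder. The format is as follows (values that did not survive in this listing are shown as ...):

<?xml version="1.0"?>
<annotation>
    <folder>JPEGImages</folder>
    <path>/home/byl/dl/yolo/darknet/scripts/VOCdevkit/BALL2007/JPEGImages/000.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>...</width>
        <height>...</height>
    </size>
    <object>
        <name>ball</name>
        <pose>Unspecified</pose>
        <difficult>...</difficult>
        <bndbox>
            <xmin>...</xmin>
            <ymin>...</ymin>
            <xmax>...</xmax>
            <ymax>...</ymax>
        </bndbox>
    </object>
</annotation>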

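2.5 Generating the txt label files

Place VOCdevkit in the scripts directory and run voc_label.py. This creates a labels folder under VOCdevkit (inside the BALL2007 subfolder, per the script) containing 000.txt through 499.txt. Each line has the format: 0 0.428125 0.0841666666667 0.03375 0.045, that is, the class index followed by the normalized x-center, y-center, width, and height of the box. The script also generates 2007_train.txt in the scripts folder, as shown in Figure 2-4.

Figure 2-4 The 2007_train.txt file

Modify voc_label.py as follows: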

import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

#sets=[('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')]

#classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

sets = [('2007','train')]
classes = ["ball"]

def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/BALL%s/Annotations/%s.xml'%(year, image_id))
    out_file = open('VOCdevkit/BALL%s/labels/%s.txt'%(year, image_id), 'w')
    tree=ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w,h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/BALL%s/labels/'%(year)):
        os.makedirs('VOCdevkit/BALL%s/labels/'%(year))
    image_ids = open('VOCdevkit/BALL%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/BALL%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()
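To make the label format concrete, here is a small worked example that repeats the convert function from the script above and applies it to one box. The 800x600 image size and the box coordinates are hypothetical values chosen for illustration; they are not taken from the author's annotation files.

```python
def convert(size, box):
    """Pixel box (xmin, xmax, ymin, ymax) -> normalized (x_center, y_center, width, height)."""
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

# Hypothetical values: an 800x600 image with the ball at
# xmin=330, xmax=357, ymin=38, ymax=65.
label = convert((800, 600), (330.0, 357.0, 38.0, 65.0))

# Class index 0 ("ball") followed by the four normalized values.
print("0 " + " ".join("%.6f" % v for v in label))
# prints: 0 0.428125 0.084167 0.033750 0.045000
```

All four values are fractions of the image width or height, so the labels stay valid if the images are resized.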

3. Training the neural network

3.1 Modifying tiny-yolo-voc.cfg

Modify cfg/tiny-yolo-voc.cfg as follows:

[convolutional]

size=1

stride=1

pad=1

# Changed: number of filters in the last convolutional layer, computed from your own class count: filters = num × (classes + coords + 1) = 5 × (1 + 4 + 1) = 30
filters=30

activation=linear

[region]

anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52

bias_match=1

# Number of classes; 1 in this example
classes=1

coords=4

num=5

softmax=1

jitter=.2

rescore=1

object_scale=5

noobject_scale=1

class_scale=1

coord_scale=1

absolute=1

thresh = .6

random=1
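The filters value of the last convolutional layer has to be recomputed whenever the number of classes changes. A minimal sketch of that arithmetic follows; the helper name last_layer_filters is hypothetical.

```python
def last_layer_filters(num, classes, coords=4):
    """filters = num * (classes + coords + 1), as in the [region] settings above."""
    return num * (classes + coords + 1)

# This article's setup: 5 anchors, 1 class (ball).
ball_filters = last_layer_filters(num=5, classes=1)
print(ball_filters)  # 30

# For comparison, the 20-class Pascal VOC setup with the same 5 anchors.
voc_filters = last_layer_filters(num=5, classes=20)
print(voc_filters)  # 125
```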

3.2 Modifying voc.names

Only one class, ball, is detected now. Delete the contents of data/voc.names and write ball.

3.3 Modifying voc.data

Modify voc.data as follows:

classes= 1

train = /home/byl/dl/yolo/darknet/scripts/2007_train.txt

#valid = /home/byl/dl/yolo/darknet/2007_test.txt

names = data/voc.names

backup = backup

Figure 3-1 Training the network

3.4 Training the network

In the darknet directory, run ./darknet detector train cfg/voc.data cfg/tiny-yolo-voc.cfg to start training, as shown in Figure 3-1.

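4. Final results

4.1 Final model

Detecting the target requires a model file, and the quality of the model directly determines the final detection results. The training log was analyzed to judge whether the model had been trained to its best, as shown in Table 4-1. The larger the AVG IOU value, the better the test results. The column headers 0.0, 0.1, ... in the table stand for AVG IOU 0.0, AVG IOU 0.1, and so on.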

Training iterations	0.0	0.1	0.2	0.3	0.7	0.8	0.9
470k-530k	1	6	150	1311	185100	127817	1874
530k-550k	1	2	35	362	61803	46824	880
550k-610k	0	7	85	825	178128	147453	3405
610k-630k	0	2	21	185	51183	45038	1149
630k-690k	1	7	82	657	193275	183137	5572
690k-710k	0	0	18	187	63576	64266	2234
1.18M-1.21M	0	0	6	84	70550	100660	7199
1.21M-1.26M	0	1	18	188	145083	209911	15962
1.26M-1.28M	0	0	7	54	42951	59009	4284
1.28M-1.33M	0	1	17	181	137936	207057	16287
1.33M-1.35M	0	0	6	74	47243	69910	5404

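Table 4-1 Training log analysis

The table shows that after about 1.2 million iterations the training results stabilize and no longer change much. Because of the large amount of data, the training log and the final trained model are stored on the TITAN X machine, in the 项目/01-小球位置实时检测 directory (Projects/01-Real-time ball position detection) under the home directory of its Ubuntu system.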
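A tally like the one in Table 4-1 can be produced by counting AVG IOU readings per 0.1-wide bin. The sketch below is an assumption about how such a count could be done, not the author's actual script: the article does not state the exact binning rule, the function name bin_avg_iou is hypothetical, and the readings are made-up sample values. Extracting the readings from the training log is left out because the log format is not shown in the article.

```python
from collections import Counter

def bin_avg_iou(values):
    """Count values per 0.1-wide bin, keyed by the bin's lower edge ("0.0" .. "0.9")."""
    counts = Counter()
    for v in values:
        index = min(int(v * 10), 9)  # a reading of exactly 1.0 goes into the top bin
        counts["%.1f" % (index / 10.0)] += 1
    return counts

# Hypothetical AVG IOU readings, not taken from the author's log.
sample = [0.05, 0.71, 0.78, 0.83, 1.0]
histogram = bin_avg_iou(sample)
for key in sorted(histogram):
    print(key, histogram[key])
```

Running the count separately for each iteration range and comparing the upper bins gives the kind of trend the table describes: as training stabilizes, most readings settle in the 0.7 to 0.9 columns.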

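4.2 Detection results

Figure 4-1 shows the detection results of the model trained for 1.35 million iterations over three weeks. The recorded video of the detection results, the training data files, the annotation files, and related material are all in the folder mentioned in the previous section.

Figure 4-1 Detection results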
