当前位置：首页 > news >正文

OpenCv + Qt5.12.2 文字识别

news 2025/10/18 20:04:41

OpenCv + Qt5.12.2 文字检测与文本识别

前言

好久没有进行一些相关的更新的了，去年一共更新了四篇，最近一直在做音视频相关的直播服务，又是重新学习积攒经验的一个过程。去年疫情也比较严重，等到解封，又一直很忙，最近又算有了一些时间，所以想着可以做一些更新了，又拿起了 OpenCV,做一些相关更新了。其实代码相关的工作，在上一篇 OpenCV-摄像头相关的完成之后已经做完了，只是一直没有写相关博客，这次先给做完。

简介

文本检测与文本识别都是基于原生OpenCV的扩张模块来实现的，基本流程是按照 OpenCV 文字检测与识别模块来实现的，只不过是我做了一些关于Ot与OpenCV的集成工作做成了项目。大致工作流程为：图片选择，功能选择，图片保存。

相关的文档我在内外网搜索后发现大致几篇一样的文档，来源不可考，大致都贴出来：

OpenCV 文字檢測與識別模塊 - 台部落 / OpenCV 文字检测与识别模块 - CSDN

OPENCV 文字检测与识别模块 - 灰信网

文档基本相同，CSDN与灰信网完全相同，台部落是资源路径不同，台部落是原始模型资源路径，CSDN与灰信网的路径相同是一个网盘。但是台部落与CSDN博主是同一个名字。那就是灰信网。

资源路径

编译相关的已经在前两篇文档已经描述过了，路径如下： OpenCv4.4.0+Qt5.12.2+OpenCv-Contrib-4.4.0。

那就描述一下本期需要用到的一些资源：

文字检测

资源文件描述如下： textDetector.hpp 文档中 37-39行。详细内容如下：

/** @brief TextDetectorCNN class provides the functionallity of text bounding box detection.This class is representing to find bounding boxes of text words given an input image.This class uses OpenCV dnn module to load pre-trained model described in @cite LiaoSBWL17.The original repository with the modified SSD Caffe version: https://github.com/MhLiao/TextBoxes.Model can be downloaded from [DropBox](https://www.dropbox.com/s/g8pjzv2de9gty8g/TextBoxes_icdar13.caffemodel?dl=0).Modified .prototxt file with the model description can be found in `opencv_contrib/modules/text/samples/textbox.prototxt`.*/

textbox.prototxt - 本地文档模块目录中，按照路径查找即可。

TextBoxes_icdar13.caffemodel - TextBoxes_icdar13.caffemodel

文字识别

所需要的资源如下：见相关网页描述： OpenCV.org, text_recognition_cnn.cpp,不过也只是贴出了相关路径而已，原始博客中提到的关于

    cout << "   Demo of text recognition CNN for text detection." << endl<< "   Max Jaderberg et al.: Reading Text in the Wild with Convolutional Neural Networks, IJCV 2015"<<endl<<endl<< "   Usage: " << progFname << " <output_file> <input_image>" << endl<< "   Caffe Model files (textbox.prototxt, TextBoxes_icdar13.caffemodel)"<<endl<< "     must be in the current directory. See the documentation of text::TextDetectorCNN class to get download links." << endl<< "   Obtaining text recognition Caffe Model files in linux shell:" << endl<< "   wget http://nicolaou.homouniversalis.org/assets/vgg_text/dictnet_vgg.caffemodel" << endl<< "   wget http://nicolaou.homouniversalis.org/assets/vgg_text/dictnet_vgg_deploy.prototxt" << endl<< "   wget http://nicolaou.homouniversalis.org/assets/vgg_text/dictnet_vgg_labels.txt" <<endl << endl;

代码

头文件：

#ifndef MAINWINDOW_H
#define MAINWINDOW_H#include <QMainWindow>
#include <iostream>
#include <fstream>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/text.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/features2d.hpp>class ParallelExtracCSER: public cv::ParallelLoopBody
{
private:std::vector<cv::Mat> &channels;std::vector<std::vector<cv::text::ERStat>> &regions;std::vector<cv::Ptr<cv::text::ERFilter>> erFiter_1;std::vector<cv::Ptr<cv::text::ERFilter>> erFiter_2;
public:ParallelExtracCSER(std::vector<cv::Mat> &_channels, std::vector<std::vector<cv::text::ERStat>> &_regions,std::vector<cv::Ptr<cv::text::ERFilter>> _erFiter_1, std::vector<cv::Ptr<cv::text::ERFilter>> _erFiter_2): channels(_channels), regions(_regions), erFiter_1(_erFiter_1), erFiter_2(_erFiter_2){}virtual void operator()( const cv::Range &r) const CV_OVERRIDE{for(int c = r.start; c < r.end; c++){erFiter_1[c]->run(channels[c], regions[c]);erFiter_2[c]->run(channels[c], regions[c]);}}ParallelExtracCSER & operator=(const ParallelExtracCSER &a);
};template  <class T>
class ParallelOCR: public cv::ParallelLoopBody
{
private:std::vector<cv::Mat> &detections;std::vector<std::string> &outputs;std::vector<std::vector<cv::Rect> > &boxes;std::vector<std::vector<std::string> > &words;std::vector<std::vector<float> > &confidences;std::vector<cv::Ptr<T> > &ocrs;
public:ParallelOCR(std::vector<cv::Mat> &_detections, std::vector<std::string> &_outputs, std::vector< std::vector<cv::Rect> > &_boxes,std::vector< std::vector<std::string> > &_words, std::vector< std::vector<float> > &_confidences,std::vector< cv::Ptr<T> > &_ocrs):detections(_detections),outputs(_outputs),boxes(_boxes),words(_words),confidences(_confidences),ocrs(_ocrs){}virtual void operator()(const cv::Range &r) const CV_OVERRIDE{for(int c=r.start; c < r.end; c++){ocrs[c%ocrs.size()]->run(detections[c], outputs[c], &boxes[c], &words[c], &confidences[c], cv::text::OCR_LEVEL_WORD);}}ParallelOCR & operator=(const ParallelOCR &a);
};namespace Ui {
class MainWindow;
}class MainWindow : public QMainWindow
{Q_OBJECTpublic:explicit MainWindow(QWidget *parent = nullptr);~MainWindow();private:Ui::MainWindow *ui;void WindowInit();std::string sourcePath;void showImage(cv::Mat &image);bool fileExists(const std::string &filename);void textboxDraw(cv::Mat src, std::vector<cv::Rect> &groups, std::vector<float> &probs, std::vector<int> &indexes);bool isRepetitive(const std::string &s);void erDraw(std::vector<cv::Mat> &channels, std::vector<std::vector<cv::text::ERStat>> &regions, std::vector<cv::Vec2i> group, cv::Mat segmentation);public slots:void slot_importImage();void slot_saveImage();void slot_textDetector();void slot_textRecognizer();
};#endif // MAINWINDOW_H

MainWindow类是主要的Ctrl模块，其他两个类 ParallelExtracCSER，ParallelOCR属于业务类了，主要功能模块实现相关的。

函数实现

槽函数

主要对应四个主要功能，图片导入，图片保存，文本检测，文本识别

1. slot_importImage()

void MainWindow::slot_importImage()
{QString imagePath = QFileDialog::getOpenFileName(this,"选择图片","./","*png *jpg *jpeg");QImage image;if(image.load(imagePath))qDebug() << "导入图片成功" << imagePath;sourcePath = QDir::toNativeSeparators(imagePath).toStdString();qDebug() << "图片路径:" << QDir::toNativeSeparators(imagePath);int imageWidth = image.width();int imageHeight = image.height();if(imageWidth > 640){imageHeight = (640*10 / imageWidth) * imageHeight /10;imageWidth = 640;}if(imageHeight > 480){imageWidth = (480*10 / imageHeight) * imageWidth /10;imageHeight = 480;}image = image.scaled(imageWidth, imageHeight, Qt::IgnoreAspectRatio, Qt::SmoothTransformation);this->resize(imageWidth*2+2,imageHeight);ui->label_source->setPixmap(QPixmap::fromImage(image));
}

2.slot_saveImage()

void MainWindow::slot_saveImage()
{if(currentActive.isEmpty() || sourcePath.empty()){qDebug() << "currentActive is " << currentActive.isEmpty() << " sourcePath: " << sourcePath.empty();return;}QString source_path_name = QString::fromStdString(sourcePath);size_t pos = sourcePath.find('.');if(pos == std::string::npos){qDebug() << QString::fromStdString(sourcePath) << " iamget format is error";return;}QStringList sourcePaths = source_path_name.split('.');QString saveName = sourcePaths.at(0) + "_" + currentActive + "." + sourcePaths.at(1);if(ui->label_result->pixmap()->save(saveName, sourcePaths.at(1).toStdString().c_str())){qDebug() << saveName << " save success.";}else{qDebug() << saveName << " save fail.";}
}

3.slot_textDetector()

void MainWindow::slot_textDetector()
{const std::string modelArch = "textbox.prototxt" ;const std::string moddelWeights = "TextBoxes_icdar13.caffemodel";if(!fileExists(modelArch) || !fileExists(moddelWeights)){qDebug() << "Model files not found in the current directory. Aborting!";return;}if(sourcePath.empty()){qDebug() << "图片路径无效，请检查图片是否存在！";return;}cv::Mat image = cv::imread(sourcePath, cv::IMREAD_COLOR);if(image.empty()){qDebug() << "image is empty" << sourcePath.c_str();return;}qDebug() << "Starting Text Box Demo";cv::Ptr<cv::text::TextDetectorCNN> textSpotter = cv::text::TextDetectorCNN::create(modelArch, moddelWeights);std::vector<cv::Rect> bbox;std::vector<float> outProbabillities;textSpotter->detect(image, bbox, outProbabillities);std::vector<int> indexes;cv::dnn::NMSBoxes(bbox, outProbabillities, 0.4f, 0.5f, indexes);cv::Mat imageCopy = image.clone();
//    float threshold = 0.5;
//    for(int i = 0; i < bbox.size(); i++)
//    {
//        if(outProbabillities[i] > threshold)
//        {
//            cv::Rect rect = bbox[i];
//            cv::rectangle(imageCopy,rect,cv::Scalar(255,0,0),2);
//        }
//    }textboxDraw(imageCopy, bbox, outProbabillities, indexes);showImage(imageCopy);imageCopy = image.clone();cv::Ptr<cv::text::OCRHolisticWordRecognizer> wordSpotter =cv::text::OCRHolisticWordRecognizer::create("dictnet_vgg_deploy.prototxt", "dictnet_vgg.caffemodel", "dictnet_vgg_labels.txt");for(size_t i = 0; i < indexes.size(); i++){cv::Mat wordImg;cv::cvtColor(image(bbox[indexes[i]]),wordImg, cv::COLOR_BGR2GRAY);std::string word;std::vector<float> confs;wordSpotter->run(wordImg, word, nullptr, nullptr, &confs);cv::Rect currrentBox = bbox[indexes[i]];rectangle(imageCopy, currrentBox, cv::Scalar( 0, 255, 255 ), 2, cv::LINE_AA);int baseLine = 0;cv::Size labelSize = cv::getTextSize(word, cv::FONT_HERSHEY_PLAIN, 1, 1, &baseLine);int yLeftBottom = std::max(currrentBox.y, labelSize.height);rectangle(imageCopy, cv::Point(currrentBox.x, yLeftBottom - labelSize.height),cv::Point(currrentBox.x +labelSize.width, yLeftBottom + baseLine), cv::Scalar( 255, 255, 255 ), cv::FILLED);putText(imageCopy, word, cv::Point(currrentBox.x , yLeftBottom), cv::FONT_HERSHEY_PLAIN, 1, cv::Scalar( 0,0,0 ), 1, cv::LINE_AA);}showImage(imageCopy);
}

4.slot_textRecognizer()


void MainWindow::slot_textRecognizer()
{if(sourcePath.empty()){qDebug() << "图片路径无效，请检查图片是否存在！";return;}cv::Mat image = cv::imread(sourcePath, cv::IMREAD_COLOR);if(image.empty()){qDebug() << "image is empty" << sourcePath.c_str();return;}bool downsize = false;int RegionType = 1;int GroupingAlgorithm = 0;int Recongnition = 0;cv::String regionTypeString[2] = {"ERStats","MSER"};cv::String GroupingAlgorithmsStr[2] = {"exhaustive_search", "multioriented"};cv::String recognitionsStr[2] = {"Tesseract", "NM_chain_features + KNN"};std::vector<cv::Mat> channels;std::vector<std::vector<cv::text::ERStat>> regions(2);cv::Mat gray,outImage;// Create ERFilter objects with the 1st and 2nd stage default classifiers// since er algorithm is not reentrant we need one filter for channelstd::vector< cv::Ptr<cv::text::ERFilter> > erFilters1;std::vector< cv::Ptr<cv::text::ERFilter> > erFilters2;if(!fileExists("trained_classifierNM1.xml") || !fileExists("trained_classifierNM2.xml")|| !fileExists("OCRHMM_transitions_table.xml") || !fileExists("OCRHMM_knn_model_data.xml.gz") || !fileExists("trained_classifier_erGrouping.xml")){qDebug() << " trained_classifierNM1.xml file not found!";return;}for(int i = 0; i<2; i++ ){cv::Ptr<cv::text::ERFilter> erFilter1 = createERFilterNM1(cv::text::loadClassifierNM1("trained_classifierNM1.xml"), 8, 0.00015f, 0.13f, 0.2f, true, 0.1f);cv::Ptr<cv::text::ERFilter> erFilter2 = createERFilterNM2(cv::text::loadClassifierNM2("trained_classifierNM2.xml"), 0.5);erFilters1.push_back(erFilter1);erFilters2.push_back(erFilter2);}int numOcrs = 10;std::vector<cv::Ptr<cv::text::OCRTesseract>> ocrs;for(int o = 0; o < numOcrs; o++){ocrs.push_back(cv::text::OCRTesseract::create());}cv::Mat transitionP;std::string filename = "OCRHMM_transitions_table.xml";cv::FileStorage fs(filename, cv::FileStorage::READ);fs["transition_probabilities"] >> transitionP;fs.release();cv::Mat emissionP = cv::Mat::eye(62, 62, CV_64FC1);std::string voc = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";std::vector< cv::Ptr<cv::text::OCRHMMDecoder>> decoders;for(int o = 0; o <numOcrs; o++){decoders.push_back(cv::text::OCRHMMDecoder::create(cv::text::loadOCRHMMClassifierNM("OCRHMM_knn_model_data.xml.gz"),voc, transitionP, emissionP));}double tAll = (double)cv::getTickCount();if(downsize)cv::resize(image,image,cv::Size(image.size().width,image.size().height),0,0,cv::INTER_LINEAR_EXACT);cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);channels.clear();channels.push_back(gray);channels.push_back(255 - gray);regions[0].clear();regions[1].clear();switch (RegionType) {case 0:cv::parallel_for_(cv::Range(0, (int)channels.size()), ParallelExtracCSER(channels, regions, erFilters1, erFilters2));break;case 1:{std::vector<std::vector<cv::Point>> contours;std::vector<cv::Rect> bboxes;cv::Ptr<cv::MSER> mesr = cv::MSER::create(21, (int)(0.00002*gray.cols*gray.rows), (int)(0.05*gray.cols * gray.rows), 1, 0.7);mesr->detectRegions(gray, contours, bboxes);if(contours.size() > 0)MSERsToERStats(gray, contours, regions);}break;}std::vector< std::vector<cv::Vec2i>> nmRegionGroups;std::vector<cv::Rect> nmBoxes;switch (GroupingAlgorithm) {case 0:cv::text::erGrouping(image, channels, regions, nmRegionGroups, nmBoxes, cv::text::ERGROUPING_ORIENTATION_HORIZ);break;case 1:cv::text::erGrouping(image, channels, regions, nmRegionGroups, nmBoxes, cv::text::ERGROUPING_ORIENTATION_ANY, "trained_classifier_erGrouping.xml", 0.5);break;}/*Text Recognition (OCR)*/int bottom_bar_height = outImage.rows/7 ;cv::copyMakeBorder(image, outImage, 0, bottom_bar_height, 0, 0, cv::BORDER_CONSTANT, cv::Scalar(150, 150, 150));float scale_font = (float)(bottom_bar_height /85.0);std::vector<std::string> words_detection;float min_confidence1 = 0.f, min_confidence2 = 0.f;if (Recongnition == 0){min_confidence1 = 51.f;min_confidence2 = 60.f;}std::vector<cv::Mat> detections;for (int i=0; i<(int)nmBoxes.size(); i++){rectangle(outImage, nmBoxes[i].tl(), nmBoxes[i].br(), cv::Scalar(255,255,0),3);cv::Mat group_img = cv::Mat::zeros(image.rows+2, image.cols+2, CV_8UC1);erDraw(channels, regions, nmRegionGroups[i], group_img);group_img(nmBoxes[i]).copyTo(group_img);copyMakeBorder(group_img,group_img,15,15,15,15,cv::BORDER_CONSTANT,cv::Scalar(0));detections.push_back(group_img);}std::vector<std::string> outputs((int)detections.size());std::vector< std::vector<cv::Rect> > boxes((int)detections.size());std::vector< std::vector<std::string> > words((int)detections.size());std::vector< std::vector<float> > confidences((int)detections.size());// parallel process detections in batches of ocrs.size() (== num_ocrs)for (int i=0; i<(int)detections.size(); i=i+(int)numOcrs){cv::Range r;if (i+(int)numOcrs <= (int)detections.size())r = cv::Range(i,i+(int)numOcrs);elser = cv::Range(i,(int)detections.size());switch(Recongnition){case 0: // TesseractqDebug() << "+++++";cv::parallel_for_(r, ParallelOCR<cv::text::OCRTesseract>(detections, outputs, boxes, words, confidences, ocrs));qDebug() << "---";break;case 1: // NM_chain_features + KNNcv::parallel_for_(r, ParallelOCR<cv::text::OCRHMMDecoder>(detections, outputs, boxes, words, confidences, decoders));break;}}for(auto &it : outputs){qDebug() << QString::fromStdString(it);}for (int i=0; i<(int)detections.size(); i++){outputs[i].erase(remove(outputs[i].begin(), outputs[i].end(), '\n'), outputs[i].end());//cout << "OCR output = \"" << outputs[i] << "\" length = " << outputs[i].size() << endl;if (outputs[i].size() < 3)continue;for (int j=0; j<(int)boxes[i].size(); j++){boxes[i][j].x += nmBoxes[i].x-15;boxes[i][j].y += nmBoxes[i].y-15;//cout << "  word = " << words[j] << "\t confidence = " << confidences[j] << endl;if ((words[i][j].size() < 2) || (confidences[i][j] < min_confidence1) ||((words[i][j].size()==2) && (words[i][j][0] == words[i][j][1])) ||((words[i][j].size()< 4) && (confidences[i][j] < min_confidence2)) ||isRepetitive(words[i][j]))continue;words_detection.push_back(words[i][j]);rectangle(outImage, boxes[i][j].tl(), boxes[i][j].br(), cv::Scalar(255,0,255),3);cv::Size word_size = getTextSize(words[i][j], cv::FONT_HERSHEY_SIMPLEX, (double)scale_font, (int)(3*scale_font), nullptr);cv::rectangle(outImage, boxes[i][j].tl()-cv::Point(3,word_size.height+3), boxes[i][j].tl()+cv::Point(word_size.width,0), cv::Scalar(255,0,255),-1);cv::putText(outImage, words[i][j], boxes[i][j].tl()-cv::Point(1,1), cv::FONT_HERSHEY_SIMPLEX, scale_font, cv::Scalar(255,255,255),(int)(3*scale_font));}}tAll = ((double)cv::getTickCount() - tAll)*1000/cv::getTickFrequency();int text_thickness = 1+(outImage.rows/500);std::string fps_info = cv::format("%2.1f Fps. %dx%d", (float)(1000 / tAll), image.cols, image.rows);cv::putText(outImage, fps_info, cv::Point( 10,outImage.rows-5 ), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);cv::putText(outImage, regionTypeString[RegionType], cv::Point((int)(outImage.cols*0.5), outImage.rows - (int)(bottom_bar_height/ 1.5)), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);cv::putText(outImage, GroupingAlgorithmsStr[GroupingAlgorithm], cv::Point((int)(outImage.cols*0.5),outImage.rows-((int)(bottom_bar_height /3)+4) ), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);cv::putText(outImage, regionTypeString[Recongnition], cv::Point((int)(outImage.cols*0.5),outImage.rows-5 ), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);showImage(outImage);
}

Ctrl函数

void MainWindow::WindowInit()
{//设置菜单QMenu* file = ui->menuBar->addMenu(QString("文件"));QAction* importImage = file->addAction(QString("选择图片"));QAction* saveImage = file->addAction(QString("保存"));QMenu* funtion = ui->menuBar->addMenu(QString("功能"));QAction* textDetector = funtion->addAction(QString("文字检测"));QAction* textRecognizer = funtion->addAction(QString("文字识别"));//绑定信号与槽函数connect(importImage,&QAction::triggered,this,&MainWindow::slot_importImage);connect(saveImage,&QAction::triggered,this,&MainWindow::slot_saveImage);connect(textDetector,&QAction::triggered,this,&MainWindow::slot_textDetector);connect(textRecognizer,&QAction::triggered,this,&MainWindow::slot_textRecognizer);
}

Qt图片显示函数

做了一个图片显示，附带缩放显示

void MainWindow::showImage(cv::Mat &image)
{cv::Mat outImage;cv::cvtColor(image, outImage, cv::COLOR_BGR2RGB);QImage qImage = QImage((const unsigned char*)(outImage.data),outImage.cols,outImage.rows,outImage.step,QImage::Format_RGB888);int imageWidth = qImage.width();int imageHeight = qImage.height();if(imageWidth > 640){imageHeight = (640*10 / imageWidth) * imageHeight /10;imageWidth = 640;}if(imageHeight > 480){imageWidth = (480*10 / imageHeight) * imageWidth /10;imageHeight = 480;}qImage = qImage.scaled(imageWidth, imageHeight, Qt::IgnoreAspectRatio, Qt::SmoothTransformation);ui->label_result->setPixmap(QPixmap::fromImage(qImage));
}

文字绘制

void MainWindow::textboxDraw(cv::Mat src, std::vector<cv::Rect>& groups, std::vector<float>& probs, std::vector<int>& indexes)
{for (size_t i = 0; i < indexes.size(); i++){if (src.type() == CV_8UC3){cv::Rect currrentBox = groups[indexes[i]];cv::rectangle(src, currrentBox, cv::Scalar( 0, 255, 255 ), 2, cv::LINE_AA);cv::String cvlabel = cv::format("%.2f", probs[indexes[i]]);qDebug() << "text box: " << currrentBox.size().width << " " <<currrentBox.size().height << " confidence: " << probs[indexes[i]] << "\n";int baseLine = 0;cv::Size labelSize = getTextSize(cvlabel, cv::FONT_HERSHEY_PLAIN, 1, 1, &baseLine);int yLeftBottom = std::max(currrentBox.y, labelSize.height);cv::rectangle(src, cv::Point(currrentBox.x, yLeftBottom - labelSize.height),cv::Point(currrentBox.x + labelSize.width, yLeftBottom + baseLine), cv::Scalar( 255, 255, 255 ), cv::FILLED);cv::putText(src, cvlabel, cv::Point(currrentBox.x, yLeftBottom), cv::FONT_HERSHEY_PLAIN, 1, cv::Scalar( 0,0,0 ), 1, cv::LINE_AA);}elsecv::rectangle(src, groups[i], cv::Scalar( 255 ), 3, 8 );}
}

## 源码

基本流程如上，相关的函数解释与释义都已经附上，更详细的说明解释，见上述博客内容，就不再做一边赘述了。

源码