Technical article: Yolov3 CPU inference performance comparison - Onnx, OpenCV, Darknet

2021-02-09 09:55

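Choosing the right inference framework for a real-time object detection application has become very challenging, especially when the model has to run on low-power devices. In this article you will learn how to choose the best inference detector for your needs and discover the large performance gain it can give you. Usually, when we deploy a model to a CPU or mobile device, we focus only on lightweight model architectures and neglect research into fast inference engines. While researching fast inference on CPU devices, I tested various frameworks that provide a stable Python API. Today we will focus on the Onnxruntime, OpenCV DNN and Darknet frameworks and measure them in terms of performance (running time) and accuracy.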

Onnxruntime

OpenCV DNN

Darknet

We will measure performance with two common object detection models:

Yolov3 architecture:
image_size = 480*480
classes = 98
BFLOPS = 87.892

Tiny-Yolov3_3layers architecture:
image_size = 1024*1024
classes = 98
BFLOPS = 46.448

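Both models were trained on custom data using AlexeyAB's Darknet framework. Now let's run inference with each of the detectors we want to test.

Darknet detector

Darknet is the official framework for training YOLO object detection models. It can also run inference on models in the *.weights file format, the same format that training produces. Inference can be run in two ways:

A varying number of images:
darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights -thresh 0.25

A single image:
darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights dog.png
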
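OpenCV DNN detector

OpenCV-DNN is an extension of the OpenCV library, which is widely used in computer vision. Darknet claims that OpenCV-DNN is "the fastest inference implementation of YOLOv4/v3 on CPU devices" because of its efficient C and C++ implementation. Thanks to its convenient Python API, Darknet weights can be loaded directly into OpenCV-DNN. Here is a code snippet for end-to-end (E2E) inference:

import cv2
import numpy as np

# Network size the model expects
network_size = (480, 480)
# Path to the Darknet cfg file
cfg_path = 'yolov3.cfg'
# Path to the Darknet weights
weights_path = 'yolov3.weights'

# Define the inference engine
net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
_layer_names = net.getLayerNames()
# getUnconnectedOutLayers() returns nested indices in older OpenCV releases
# and a flat array in newer ones; flatten() handles both cases
_output_layers = [_layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

# Read the image used as input
img_path = 'dog.png'
img = cv2.imread(img_path)
image_blob = cv2.dnn.blobFromImage(img, 1 / 255.0, network_size, swapRB=True, crop=False)
net.setInput(image_blob, "data")

# Run inference
layers_result = net.forward(_output_layers)
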
# Convert layers_result into bboxes, confidences and classes
def get_final_predictions(outputs, img, threshold, nms_threshold):
    height, width = img.shape[0], img.shape[1]
    boxes, confs, class_ids = [], [], []
    for output in outputs:
        for detect in output:
            # Each row is [cx, cy, w, h, objectness, class scores...]
            scores = detect[5:]
            class_id = np.argmax(scores)
            conf = scores[class_id]
            if conf > threshold:
                # Scale normalized coordinates to pixels
                center_x = int(detect[0] * width)
                center_y = int(detect[1] * height)
                w = int(detect[2] * width)
                h = int(detect[3] * height)
                # Convert the box center to its top-left corner
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confs.append(float(conf))
                class_ids.append(class_id)

    merge_boxes_ids = cv2.dnn.NMSBoxes(boxes, confs, threshold, nms_threshold)

    # Keep only the boxes that remain after NMS
    boxes = [boxes[int(i)] for i in merge_boxes_ids]
    confs = [confs[int(i)] for i in merge_boxes_ids]
    class_ids = [class_ids[int(i)] for i in merge_boxes_ids]
    return boxes, confs, class_ids

boxes, confs, class_ids = get_final_predictions(layers_result, img, 0.3, 0.3)

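The post-processing above relies on cv2.dnn.NMSBoxes to drop overlapping boxes. To make that step concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is a simplified illustration, not OpenCV's implementation; the helper names iou_xywh and greedy_nms are hypothetical and do not come from the article or from OpenCV.

```python
def iou_xywh(a, b):
    """Intersection over union of two [x, y, w, h] boxes (top-left origin)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, confs, score_threshold, nms_threshold):
    """Return indices of the boxes kept, highest confidence first.

    Hypothetical helper: boxes below score_threshold are ignored, and a box
    is kept only if its IoU with every already-kept box is at most
    nms_threshold.
    """
    order = sorted(
        (i for i, c in enumerate(confs) if c > score_threshold),
        key=lambda i: confs[i],
        reverse=True,
    )
    kept = []
    for i in order:
        if all(iou_xywh(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept
```

With two heavily overlapping boxes and one distant box, the sketch keeps the higher-scoring box of the overlapping pair plus the distant one.
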
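Onnxruntime detector

Onnxruntime is maintained by Microsoft and claims to speed up inference significantly thanks to its built-in optimizations and its unique ONNX weights file format. As the next image shows, it supports a variety of flavors and technologies. In our comparison we will use the Python CPU flavor.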

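The ONNX format defines a common set of operators (the building blocks of machine learning and deep learning models) and a common file format, enabling AI developers to use models with a variety of frameworks, tools, runtimes and compilers.

Converting Darknet weights > Onnx weights

To run inference with Onnxruntime, we have to convert the *.weights format to the *.onnx format. We will use a repository created specifically for converting the darknet *.weights format into *.pt (PyTorch) and *.onnx (ONNX format).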

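Clone the repo and install the requirements. Run converter.py with the cfg, weights and img_size parameters:
python converter.py yolov3.cfg yolov3.weights 1024 1024
A yolov3.onnx file will be created in the directory that contains yolov3.weights. Keep in mind that inference with the ONNX format loses about 0.1 mAP% of accuracy because of the conversion process. The converter imitates Darknet functionality in PyTorch but is not flawless. Feel free to create issues/PRs to support conversion of darknet architectures other than yolov3.

After successfully converting the model to the ONNX format, we can run inference with Onnxruntime. Here is a code snippet for E2E inference:

import onnxruntime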
import cv2
import numpy as np

# Converted ONNX weights
onnx_weights_path = 'yolov3.onnx'
# Network size the model expects
network_size = (480, 480)

# Declare the onnxruntime session
session = onnxruntime.InferenceSession(onnx_weights_path)
session.get_modelmeta()
input_name = session.get_inputs()[0].name
output_name_1 = session.get_outputs()[0].name
output_name_2 = session.get_outputs()[1].name

# Read the image
img_path = 'dog.png'
img = cv2.imread(img_path)
image_blob = cv2.dnn.blobFromImage(img, 1 / 255.0, network_size, swapRB=True, crop=False)

# Run inference
layers_result = session.run([output_name_1, output_name_2],
                            {input_name: image_blob})
# Join the two outputs so that each row is [cx, cy, w, h, class scores...]
layers_result = np.concatenate([layers_result[1], layers_result[0]], axis=1)

# Convert layers_result into bboxes, confidences and classes
def get_final_predictions(outputs, img, threshold, nms_threshold):
    height, width = img.shape[0], img.shape[1]
    boxes, confs, class_ids = [], [], []
    # Keep only the rows whose best class score exceeds the threshold
    matches = outputs[np.where(np.max(outputs[:, 4:], axis=1) > threshold)]
    for detect in matches:
        scores = detect[4:]
        class_id = np.argmax(scores)
        conf = scores[class_id]
        # Scale normalized coordinates to pixels
        center_x = int(detect[0] * width)
        center_y = int(detect[1] * height)
        w = int(detect[2] * width)
        h = int(detect[3] * height)
        # Convert the box center to its top-left corner
        x = int(center_x - w / 2)
        y = int(center_y - h / 2)
        boxes.append([x, y, w, h])
        confs.append(float(conf))
        class_ids.append(class_id)

    merge_boxes_ids = cv2.dnn.NMSBoxes(boxes, confs, threshold, nms_threshold)

    # Keep only the boxes that remain after NMS
    boxes = [boxes[int(i)] for i in merge_boxes_ids]
    confs = [confs[int(i)] for i in merge_boxes_ids]
    class_ids = [class_ids[int(i)] for i in merge_boxes_ids]
    return boxes, confs, class_ids

boxes, confs, class_ids = get_final_predictions(layers_result, img, 0.3, 0.3)

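Note that the ONNX snippet reads class scores from index 4 onward, whereas the OpenCV DNN snippet starts at index 5. The sketch below decodes a single row under the layout the ONNX snippet implies, [cx, cy, w, h, score_0, score_1, ...], with coordinates assumed to be normalized to [0, 1]. The function name decode_row is hypothetical, and plain Python lists stand in for NumPy rows so the example runs without any dependencies.

```python
def decode_row(row, img_width, img_height):
    """Decode one detection row laid out as [cx, cy, w, h, score_0, ...].

    Returns ([x, y, w, h] in pixels with a top-left origin, confidence,
    class id). Assumes normalized coordinates, as in the snippet above.
    """
    scores = row[4:]
    # Index of the highest class score (first one wins on ties, like argmax)
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    conf = float(scores[class_id])
    center_x = int(row[0] * img_width)
    center_y = int(row[1] * img_height)
    w = int(row[2] * img_width)
    h = int(row[3] * img_height)
    box = [int(center_x - w / 2), int(center_y - h / 2), w, h]
    return box, conf, class_id
```

For a Darknet-style row that also carries an objectness value at index 4, the same sketch would apply after slicing the scores from index 5 instead.
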
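Performance comparison

Congratulations, we have been through all the technical details, and you should now know enough to run inference with each of the detectors. Now let's address our main goal: the performance comparison. Performance was measured separately for each of the models above (Yolov3 and Tiny-Yolov3) on a PC CPU (Intel i7, 9th generation). For opencv and onnxruntime we measure only the execution time of the forward propagation, to isolate it from pre- and post-processing.

Profiled lines:

opencv
layers_result = net.forward(_output_layers)

Onnxruntime
layers_result = session.run([output_name_1, output_name_2], {input_name: image_blob})
layers_result = np.concatenate([layers_result[1], layers_result[0]], axis=1)

Darknet
darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights -thresh 0.25
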
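Timing only the forward call, as described above, can be sketched with the standard library. This is an illustrative harness, not the one used for the article's measurements: the names time_forward and summarize are hypothetical, and the untimed warm-up call is an added assumption rather than something the article describes.

```python
import statistics
import time


def time_forward(forward_fn, inputs, warmup=1):
    """Time forward_fn on each input and return the durations in seconds.

    Only the call to forward_fn is timed, so pre- and post-processing must
    happen outside of it. The first `warmup` inputs are run once untimed
    (an assumption of this sketch).
    """
    for item in inputs[:warmup]:
        forward_fn(item)
    durations = []
    for item in inputs:
        start = time.perf_counter()
        forward_fn(item)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Aggregate per-call durations into a small report."""
    return {
        "runs": len(durations),
        "mean_s": statistics.mean(durations),
        "median_s": statistics.median(durations),
    }
```

With the snippets from this article, forward_fn could be something like lambda blob: session.run([output_name_1, output_name_2], {input_name: blob}), called on a list of blobs prepared beforehand.
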
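Verdict

Yolov3

Yolov3 was tested on 400 unique images. The ONNX detector was the fastest way to run inference on our Yolov3 model. To be exact, it was 43% faster than opencv-dnn, which is considered one of the fastest detectors available.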
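A figure such as "43% faster" can be derived from mean forward-pass times in more than one way, and the article does not say which definition was used. The sketch below shows two common ones; both function names are hypothetical.

```python
def time_reduction_pct(baseline_s, candidate_s):
    """Percent less time the candidate needs compared with the baseline."""
    return (baseline_s - candidate_s) / baseline_s * 100.0


def throughput_gain_pct(baseline_s, candidate_s):
    """Percent more images per second the candidate processes."""
    return (baseline_s / candidate_s - 1.0) * 100.0
```

The two definitions diverge quickly: a candidate that halves the time per image shows a 50% time reduction but a 100% throughput gain, so it helps to state which one a reported percentage refers to.
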