Implementing the AlexNet Deep Learning Network with libtorch

“

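In the previous article we covered the structure and principles of the AlexNet neural network, which consists mainly of 5 convolutional layers, 3 pooling layers, 3 affine (fully connected) layers, and 1 softmax layer. In this article we implement that network with libtorch and use it to train on and classify the Cifar-10 dataset.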

”

Implementing the AlexNet Deep Learning Network with libtorch: AlexNet Network Structure and Principles

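The original AlexNet takes 3-channel 227*227 images as input, whereas Cifar-10 consists of 3-channel 32*32 images. Feeding a 3*32*32 image straight into the original network would require padding the borders with a large number of zeros to reach 227*227, which is both cumbersome and computationally wasteful. Cifar-10 also has only 10 classes, so the size of the output layer has to change as well. We therefore make small adjustments to the input, intermediate, and output layer sizes so that the network accepts 3*32*32 images directly, without heavy zero padding, and produces 10 probability values, one per class. The resized network structure is shown in the figure below: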

—

Training strategy

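When we previously trained the LeNet-5 network on Cifar-10, we fed one sample into the network at a time. Cifar-10 has 50000 training samples, so one epoch meant 50000 iterations and 50000 parameter updates. That is very time-consuming, and convergence is slow as well.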

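PS: A sample here means the data of one input image. For a single-channel 32*32 input, one single-channel 32*32 image is one sample; for a three-channel 32*32 input, one three-channel 32*32 image (3*32*32) is one sample.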

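To address this, batch training is used. Each iteration takes n samples from the training set (n > 1; n is usually called the batch size), runs forward propagation on each of them to obtain n loss values Yi (0 ≤ i < n), and takes their mean Y as the loss of that iteration. Y is then backpropagated to compute the gradients and update the network parameters. Batch training is illustrated in the figure below: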

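Batch training not only speeds up convergence, it also makes training move more steadily toward a lower loss. It suits parallel execution on a GPU as well: for example, with many GPU threads each computing the forward pass of one sample in parallel, training becomes noticeably faster.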

PS: The batch size should be neither too small nor too large; a value between 16 and 128 is usually a good choice.
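The averaging step of batch training described above can be sketched in plain C++. This is only an illustration under assumptions: the function names `sample_loss` and `batch_loss` are made up for this example, and the per-sample loss is taken to be softmax cross-entropy, which is the loss the training code later in this article uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax cross-entropy of one sample: -log(softmax(logits)[label]).
// The maximum logit is subtracted first for numerical stability.
double sample_loss(const std::vector<double>& logits, std::size_t label) {
    assert(label < logits.size());
    double max_logit = logits[0];
    for (double v : logits) max_logit = std::max(max_logit, v);
    double sum_exp = 0.0;
    for (double v : logits) sum_exp += std::exp(v - max_logit);
    return std::log(sum_exp) + max_logit - logits[label];
}

// Batch loss Y: the mean of the n per-sample losses Yi.
double batch_loss(const std::vector<std::vector<double>>& batch_logits,
                  const std::vector<std::size_t>& labels) {
    assert(!batch_logits.empty() && batch_logits.size() == labels.size());
    double total = 0.0;
    for (std::size_t i = 0; i < batch_logits.size(); ++i)
        total += sample_loss(batch_logits[i], labels[i]);
    return total / static_cast<double>(batch_logits.size());
}
```

For a network that outputs 10 equal scores, `sample_loss` evaluates to ln(10), roughly 2.30, which is the loss one would expect to see at the very start of training on 10 classes.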

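As described above, training is divided into multiple epochs, and each epoch into multiple batches. Taking the batches in the same order in every epoch is considered bad practice, because the network can pick up on the fixed ordering and training may converge poorly. We therefore shuffle the order of all samples before each epoch starts, as shown in the figure below: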

—

Network structure definition

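Following the network structure diagram at the beginning of this article, we define the network with libtorch and implement its forward propagation function: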

#include <torch/torch.h>

struct AlexNet : torch::nn::Module
{
    // arg_padding is the padding of conv1. It has to be 1 for the 256*6*6 input
    // size of fc1 to match a 32*32 image; this article always constructs AlexNet(1).
    AlexNet(int arg_padding = 0)
        // conv1: 3*64 kernels of size 3*3, stride 1, padding arg_padding
        : conv1(register_module("conv1", torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 64, { 3, 3 }).padding(arg_padding).stride({ 1, 1 }))))
        // conv2: 64*192 kernels of size 3*3, stride 1, padding 1
        , conv2(register_module("conv2", torch::nn::Conv2d(torch::nn::Conv2dOptions(64, 192, { 3, 3 }).padding(1).stride({ 1, 1 }))))
        // conv3: 192*384 kernels of size 3*3, stride 1, padding 1
        , conv3(register_module("conv3", torch::nn::Conv2d(torch::nn::Conv2dOptions(192, 384, { 3, 3 }).padding(1).stride({ 1, 1 }))))
        // conv4: 384*256 kernels of size 3*3, stride 1, padding 1
        , conv4(register_module("conv4", torch::nn::Conv2d(torch::nn::Conv2dOptions(384, 256, { 3, 3 }).padding(1).stride({ 1, 1 }))))
        // conv5: 256*256 kernels of size 3*3, stride 1, padding 1
        , conv5(register_module("conv5", torch::nn::Conv2d(torch::nn::Conv2dOptions(256, 256, { 3, 3 }).padding(1).stride({ 1, 1 }))))
        // fc1 affine layer, 256*6*6 -> 4096, with dropout 0.5
        , fc1(register_module("fc1", torch::nn::Linear(256 * 6 * 6, 4096)))
        // fc2 affine layer, 4096 -> 4096, with dropout 0.5
        , fc2(register_module("fc2", torch::nn::Linear(4096, 4096)))
        // fc3 affine layer, 4096 -> 10, with dropout 0.5
        , fc3(register_module("fc3", torch::nn::Linear(4096, 10)))
    {
    }

    ~AlexNet()
    {
    }

    // forward propagation function
    torch::Tensor forward(torch::Tensor input)
    {
        namespace F = torch::nn::functional;

        // conv1 -> relu -> pool1
        auto x = F::max_pool2d(F::relu(conv1(input)), F::MaxPool2dFuncOptions(3).stride({ 2, 2 })); // (32+1*2-3)/1+1=32 -> (32-3)/2+1=15
        // conv2 -> relu -> pool2
        x = F::max_pool2d(F::relu(conv2(x)), F::MaxPool2dFuncOptions(3).stride({ 2, 2 }));          // (15+1*2-3)/1+1=15 -> (15-3)/2+1=7
        // conv3 -> relu
        x = F::relu(conv3(x));                                                                      // (7+1*2-3)/1+1=7
        // conv4 -> relu
        x = F::relu(conv4(x));                                                                      // (7+1*2-3)/1+1=7
        // conv5 -> relu -> pool3
        x = F::max_pool2d(F::relu(conv5(x)), F::MaxPool2dFuncOptions(2).stride({ 1, 1 }));          // (7+1*2-3)/1+1=7 -> (7-2)/1+1=6

        // flatten to one dimension per sample
        x = x.view({ x.size(0), -1 });

        // Note: F::dropout defaults to training = true, so as written the
        // dropout below is also applied when forward() is used for testing.
        // dropout 0.5
        x = F::dropout(x, F::DropoutFuncOptions().p(0.5));
        // fc1 -> relu
        x = F::relu(fc1(x));
        // dropout 0.5
        x = F::dropout(x, F::DropoutFuncOptions().p(0.5));
        // fc2 -> relu
        x = F::relu(fc2(x));
        // dropout 0.5
        x = F::dropout(x, F::DropoutFuncOptions().p(0.5));
        // fc3 (no relu)
        x = fc3(x);

        // softmax is not computed here, because the cross-entropy loss used later already includes it
        return x;
    }

    torch::nn::Conv2d conv1;
    torch::nn::Conv2d conv2;
    torch::nn::Conv2d conv3;
    torch::nn::Conv2d conv4;
    torch::nn::Conv2d conv5;
    torch::nn::Linear fc1;
    torch::nn::Linear fc2;
    torch::nn::Linear fc3;
};
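The size comments in forward() can be checked with a few lines of plain C++. The sketch below is not part of the original program; `out_size` and `feature_side` are hypothetical helper names, and the layer list simply mirrors the kernel, stride, and padding values of the struct above.

```cpp
#include <cassert>

// Output side length of a convolution or pooling window:
// floor((in + 2*padding - kernel) / stride) + 1.
int out_size(int in, int kernel, int stride, int padding) {
    assert(stride > 0 && in + 2 * padding >= kernel);
    return (in + 2 * padding - kernel) / stride + 1;
}

// Side length of the feature map that reaches fc1 for a square input.
// conv1_padding plays the role of arg_padding in the AlexNet constructor.
int feature_side(int input_side, int conv1_padding) {
    int s = out_size(input_side, 3, 1, conv1_padding); // conv1
    s = out_size(s, 3, 2, 0);                          // pool1
    s = out_size(s, 3, 1, 1);                          // conv2
    s = out_size(s, 3, 2, 0);                          // pool2
    s = out_size(s, 3, 1, 1);                          // conv3
    s = out_size(s, 3, 1, 1);                          // conv4
    s = out_size(s, 3, 1, 1);                          // conv5
    s = out_size(s, 2, 1, 0);                          // pool3
    return s;
}
```

`feature_side(32, 1)` is 6, which gives the 256*6*6 = 9216 inputs that fc1 expects. With the constructor's default padding of 0, `feature_side(32, 0)` is 5, so the flattened size would be 256*5*5 = 6400 and would not match fc1; this is why the network is constructed as AlexNet(1).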

—

Preparing the training and test datasets

The Cifar-10 dataset consists of 6 files: 5 are used for training and 1 for testing and validation. We described the dataset in detail in an earlier article:

Implementing the LeNet-5 Convolutional Neural Network with libtorch (2): Cifar-10 Data Classification

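The 5 training files contain 5*10000 three-channel images in total. Reading all of them into memory for training takes a lot of memory (about 154 MB as raw bytes, and four times that once converted to 32-bit floats) and may exceed the available limit and crash the program. So we arrange to read only batch size image samples at a time, which means only batch size three-channel images are loaded into memory for each training step. Compared with loading everything, this uses far less memory.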

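First, we parse all three-channel images out of the 5 training files and, in the order file 1 --> file 2 --> file 3 --> file 4 --> file 5, save each parsed image as a tif file, named 0.tif through 49999.tif in sequence.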

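For the 50000 three-channel images 0.tif~49999.tif there are 50000 labels in the range 0~9. We store these labels, in the same order, in a label image named label.tif with 100 rows and 500 columns, so that the matching labels can be looked up when a batch is drawn at random. Following the principle that each image maps to one label, the file name (index) of a training image corresponds to a coordinate in label.tif as follows, where index is the sequence number of the training image, col is the number of columns of label.tif (500), and the division is integer division: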

y = index / col

x = index % col

For example, image 5001.tif corresponds to the following coordinate in label.tif:

y = 5001 / 500 = 10

x = 5001 % 500 = 1

So the pixel value at point (1, 10) in label.tif is the label of 5001.tif.
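The index-to-coordinate mapping can be written as a pair of small helper functions. This is a minimal sketch; `LabelCoord`, `label_coord`, and `label_index` are names invented for this example and do not appear in the original program.

```cpp
#include <cassert>

// Column (x) and row (y) of a label inside the label image.
struct LabelCoord {
    int x;
    int y;
};

// Maps an image index to its coordinate in a label image with `cols` columns.
LabelCoord label_coord(int index, int cols) {
    assert(index >= 0 && cols > 0);
    return LabelCoord{ index % cols, index / cols };
}

// Inverse mapping: recovers the image index from a coordinate.
int label_index(LabelCoord c, int cols) {
    return c.y * cols + c.x;
}
```

With 500 columns, `label_coord(5001, 500)` gives x = 1 and y = 10, and the last image index 49999 lands at x = 499, y = 99, the bottom-right corner of the 100-row label image.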

The code that parses the images and labels out of the Cifar-10 files is as follows:

#include <cstdio>
#include <cstdlib>
#include <opencv2/opencv.hpp>
using namespace cv;

#define CIFAT_10_OENFILE_DATANUM 10000
#define CIFAT_10_FILENUM 5
#define CIFAT_10_TOTAL_DATANUM (CIFAT_10_OENFILE_DATANUM*CIFAT_10_FILENUM)

// bin_path is the path of the cifar-10 files; the number in the file name must be replaced with %d,
// for example: "D:/cifar-10/data_batch_%d.bin"
void read_cifar_to_file(const char *bin_path)
{
    const int img_num = CIFAT_10_OENFILE_DATANUM;
    const int img_size = 3073;     // the first byte is the label
    const int img_size_1 = 1024;
    const int data_size = img_num * img_size;
    const int row = 32;
    const int col = 32;

    uchar *labels = (uchar *)malloc(CIFAT_10_TOTAL_DATANUM);
    uchar *cifar_data = (uchar *)malloc(data_size);
    if (labels == NULL || cifar_data == NULL)
    {
        printf("malloc failed\n");
        free(labels);
        free(cifar_data);
        return;
    }

    for (int i = 0; i < CIFAT_10_FILENUM; i++)   // 5 files in total
    {
        char str[200] = {0};
        sprintf(str, bin_path, i + 1);

        FILE *fp = fopen(str, "rb");
        if (fp == NULL)
        {
            printf("fopen file failed: %s\n", str);
            free(labels);
            free(cifar_data);
            return;
        }
        fread(cifar_data, 1, data_size, fp);     // read the whole file

        // each file holds 10000 three-channel images
        for (int j = 0; j < CIFAT_10_OENFILE_DATANUM; j++)
        {
            long int offset = j * img_size;
            long int offset0 = offset + 1;              // red
            long int offset1 = offset0 + img_size_1;    // green
            long int offset2 = offset1 + img_size_1;    // blue

            long int idx = i * CIFAT_10_OENFILE_DATANUM + j;
            labels[idx] = cifar_data[offset];           // store the label at index 0~49999 of the allocated array

            Mat img(row, col, CV_8UC3);
            for (int y = 0; y < row; y++)
            {
                for (int x = 0; x < col; x++)
                {
                    int index = y * col + x;
                    // assemble a BGR image
                    img.at<Vec3b>(y, x) = Vec3b(cifar_data[offset2 + index], cifar_data[offset1 + index], cifar_data[offset0 + index]);   // BGR
                }
            }

            // write the image to a tif file named by its index 0~49999
            char str1[200] = {0};
            sprintf(str1, "D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/prj/libtorch_test1_gpu2/libtorch_test/cifar-10/img/%d.tif", (int)idx);
            imwrite(str1, img);
        }

        fclose(fp);
    }

    // store the labels in a matrix of 100 rows and 500 columns, and save the matrix to a tif file
    Mat label_mat(100, 500, CV_8UC1, labels);
    imwrite("D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/prj/libtorch_test1_gpu2/libtorch_test/cifar-10/label/label.tif", label_mat);

    free(labels);
    free(cifar_data);
}

After running the program above, we get the 50000 three-channel images 0.tif~49999.tif and one label image, label.tif.
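The record layout that the parsing code relies on (one label byte followed by a 1024-byte red plane, a green plane, and a blue plane) can be isolated from OpenCV and file I/O. The following is only a sketch under that layout; `Sample`, `parse_record`, `kPlane`, and `kRecord` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPlane = 32 * 32;         // bytes in one colour plane
constexpr std::size_t kRecord = 1 + 3 * kPlane; // 3073 bytes per record

struct Sample {
    std::uint8_t label;            // class label, 0~9
    std::vector<std::uint8_t> bgr; // interleaved B, G, R per pixel, row by row
};

// Parses one 3073-byte record into a label and an interleaved BGR buffer,
// the same pixel order that the Vec3b assignment produces.
Sample parse_record(const std::uint8_t* record) {
    assert(record != nullptr);
    Sample s;
    s.label = record[0];
    s.bgr.resize(3 * kPlane);
    const std::uint8_t* red = record + 1;
    const std::uint8_t* green = red + kPlane;
    const std::uint8_t* blue = green + kPlane;
    for (std::size_t i = 0; i < kPlane; ++i) {
        s.bgr[3 * i + 0] = blue[i];
        s.bgr[3 * i + 1] = green[i];
        s.bgr[3 * i + 2] = red[i];
    }
    return s;
}
```

Record j of a file starts at byte offset j * kRecord, which corresponds to the `offset` variable in the parsing code.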

—

Fetching batch samples

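After the previous step we have the 50000 training images 0.tif~49999.tif and the label image label.tif. Next, we fetch batch samples in the order given by a shuffled copy of the indices 0~49999.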

Before each epoch starts, the read order is shuffled first:

vector<size_t> train_image_shuffle_set;       // array holding the order in which samples are read

train_image_shuffle_set.clear();              // clear the array
for (size_t i = 0; i < CIFAT_10_TOTAL_DATANUM; i++)
{
    train_image_shuffle_set.push_back(i);     // initialize the array with the values 0~49999
}

std::random_device rd;
std::mt19937_64 g(rd());
// use the random generator to shuffle the order in which 0~49999 are stored in the array
std::shuffle(train_image_shuffle_set.begin(), train_image_shuffle_set.end(), g);   // shuffle

Next, we take batch 0, batch 1, batch 2, and so on in the shuffled order. Assuming a batch size of 32, fetching the batches works as shown in the figure below:
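Taking batches from the shuffled order amounts to slicing the index array into consecutive runs of batch size elements. The sketch below illustrates this with the standard library only; `shuffled_indices`, `full_batches`, and `batch_begin` are hypothetical names, and the fixed seed is an assumption made so that the example is repeatable (the article's code seeds from std::random_device instead).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices 0..n-1 in a pseudo-random order.
std::vector<std::size_t> shuffled_indices(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937_64 gen(seed);
    std::shuffle(order.begin(), order.end(), gen);
    return order;
}

// Number of complete batches; leftover samples do not form a batch.
std::size_t full_batches(std::size_t n, std::size_t batch_size) {
    assert(batch_size > 0);
    return n / batch_size;
}

// Pointer to the first index of batch k inside the shuffled order.
const std::size_t* batch_begin(const std::vector<std::size_t>& order,
                               std::size_t k, std::size_t batch_size) {
    assert((k + 1) * batch_size <= order.size());
    return order.data() + k * batch_size;
}
```

With 50000 samples and a batch size of 32 there are 1562 full batches per epoch, and the remaining 16 indices of each shuffled order are left unused. Because the order is reshuffled every epoch, a different set of 16 samples is skipped each time.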

Then the index of each sample in the batch (the file name of that sample is index.tif) is used to fetch the corresponding label from label.tif:

y = index / 500

x = index % 500

label = label(x, y)

The code is as follows:

// bin_path is the path of the tif files; the number in the file name must be replaced with %d,
// for example: "cifar-10/img/%d.tif"
// shuffle_idx is the address of an element of the array train_image_shuffle_set:
// with batch_size=32, passing train_image_shuffle_set.data() uses the order stored in elements 0~31,
// passing &train_image_shuffle_set[32] uses the order stored in elements 32~63, and so on
void read_cifar_batch(const char *bin_path, Mat labels, size_t *shuffle_idx, int batch_size, vector<Mat> &img_list, vector<long long> &label_list)
{
    img_list.clear();
    label_list.clear();

    for (int i = 0; i < batch_size; i++)
    {
        char str[200] = {0};
        sprintf(str, bin_path, (int)shuffle_idx[i]);   // cast: the format string uses %d

        Mat img = imread(str, IMREAD_COLOR);   // read the tif file as BGR
        // convert BGR to RGB
        cvtColor(img, img, COLOR_BGR2RGB);

        img.convertTo(img, CV_32F, 1.0 / 255.0);
        img = (img - 0.5) / 0.5;               // map the pixel values to the range -1.0~1.0

        img_list.push_back(img.clone());       // store the image in the array

        // compute the coordinate of the matching label in label.tif
        int y = (int)(shuffle_idx[i] / labels.cols);
        int x = (int)(shuffle_idx[i] % labels.cols);
        // store the label in the array; note that it has to be cast to long long
        label_list.push_back((long long)labels.ptr<uchar>(y)[x]);
    }
}
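The normalization applied to each image, dividing by 255 and then computing (v - 0.5) / 0.5, maps a byte value from 0~255 to the range -1.0~1.0. A minimal per-pixel sketch follows; `normalize_pixel` and `denormalize_pixel` are illustrative names that are not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Maps a byte value 0~255 to the range -1.0~1.0, as the batch reader does per channel.
float normalize_pixel(unsigned char v) {
    return (static_cast<float>(v) / 255.0f - 0.5f) / 0.5f;
}

// Inverse mapping, handy when a normalized image needs to be displayed again.
unsigned char denormalize_pixel(float n) {
    assert(n >= -1.0f && n <= 1.0f);
    return static_cast<unsigned char>(std::lround((n * 0.5f + 0.5f) * 255.0f));
}
```

The value 0 maps to -1.0 and 255 maps to 1.0, so the inputs are centred around zero before they reach the first convolution.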

—

Training process

After the steps above we have a batch of batch size samples and the corresponding batch size labels in the range 0~9. Both first have to be converted to libtorch Tensors before training can begin:

// read the batch samples and labels into img_list and label_list
read_cifar_batch("cifar-10/img/%d.tif", label_mat, &train_image_shuffle_set[k*batch_size], batch_size, img_list, label_list);

// define a tensor of batch_size samples; it can be thought of as a batch_size*3*32*32 array
auto inputs = torch::ones({ batch_size, 3, 32, 32 });

// assign each sample of the batch to dimension 0 of the inputs tensor in turn
// note: from_blob does not reorder memory, so the interleaved Mat buffer is viewed as {C, H, W} here
for (int b = 0; b < batch_size; b++)
{
    inputs[b] = torch::from_blob(img_list[b].data, { img_list[b].channels(), img_list[b].rows, img_list[b].cols }, torch::kFloat).clone();
}

// a vector of a numeric type can be converted to a Tensor directly
torch::Tensor labels = torch::tensor(label_list);
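One detail that may be worth double-checking in the conversion above: OpenCV keeps a three-channel Mat interleaved (row, column, channel), while a 3*32*32 tensor is planar (channel, row, column), and from_blob only reinterprets the buffer. Whether this affects the results reported in this article has not been tested here. The sketch below shows an explicit reordering in plain C++ under that assumption; `hwc_to_chw` is a hypothetical helper. In libtorch, the corresponding approach would be to view the buffer as {rows, cols, channels} and then call permute({2, 0, 1}).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies an interleaved H x W x C buffer into planar C x H x W order.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              std::size_t h, std::size_t w, std::size_t c) {
    assert(hwc.size() == h * w * c);
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t ch = 0; ch < c; ++ch)
                chw[(ch * h + y) * w + x] = hwc[(y * w + x) * c + ch];
    return chw;
}
```

For a 1*2 image with pixels (1, 2, 3) and (4, 5, 6), the interleaved buffer {1, 2, 3, 4, 5, 6} becomes the planar buffer {1, 4, 2, 5, 3, 6}: each colour plane is now contiguous.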

The complete training code is as follows:

// device_type and train_image_shuffle_set are assumed to be defined elsewhere in the program
// (device_type selects the GPU; train_image_shuffle_set is the index array from the shuffle step).
void tran_alexnet_cifar_10_batch(void)
{
    AlexNet net1(1);          // define the AlexNet network struct
    net1.to(device_type);     // move the network to the GPU to speed things up

    // define the cross-entropy loss function
    auto criterion = torch::nn::CrossEntropyLoss();

    // train for 300 epochs
    int kNumberOfEpochs = 300;
    // learning rate
    double alpha = 0.001;
    int batch_size = 32;      // batch size

    vector<Mat> img_list;
    vector<long long> label_list;

    // read the tif file that stores the labels
    Mat label_mat = imread("cifar-10/label/label.tif", IMREAD_GRAYSCALE);

    // define the gradient descent optimizer, with momentum
    auto optimizer = torch::optim::SGD(net1.parameters(), torch::optim::SGDOptions(alpha).momentum(0.9));

    for (int epoch = 0; epoch < kNumberOfEpochs; epoch++)
    {
        printf("epoch:%d\n", epoch + 1);

        // shuffle the read order before batch training
        train_image_shuffle_set.clear();
        for (size_t i = 0; i < CIFAT_10_TOTAL_DATANUM; i++)
        {
            train_image_shuffle_set.push_back(i);
        }
        std::random_device rd;
        std::mt19937_64 g(rd());
        std::shuffle(train_image_shuffle_set.begin(), train_image_shuffle_set.end(), g);   // shuffle

        auto running_loss = 0.;
        // there are CIFAT_10_TOTAL_DATANUM/batch_size batches in total
        for (int k = 0; k < CIFAT_10_TOTAL_DATANUM / batch_size; k++)
        {
            // read the batch samples and labels in the shuffled order
            read_cifar_batch("cifar-10/img/%d.tif", label_mat, &train_image_shuffle_set[k*batch_size], batch_size, img_list, label_list);

            auto inputs = torch::ones({ batch_size, 3, 32, 32 });
            for (int b = 0; b < batch_size; b++)
            {
                inputs[b] = torch::from_blob(img_list[b].data, { img_list[b].channels(), img_list[b].rows, img_list[b].cols }, torch::kFloat).clone();
            }
            torch::Tensor labels = torch::tensor(label_list);

            // move the sample and label tensors from the CPU to the GPU, matching the GPU network
            inputs = inputs.to(device_type);
            labels = labels.to(device_type);

            auto outputs = net1.forward(inputs);        // forward propagation
            auto loss = criterion(outputs, labels);     // compute the cross-entropy loss

            optimizer.zero_grad();   // zero the gradients
            loss.backward();         // backpropagation
            optimizer.step();        // update the parameters

            running_loss += loss.item().toFloat();
            if ((k + 1) % 50 == 0)
            {
                printf("loss: %f\n", running_loss / 50);
                running_loss = 0.;
            }
        }
    }

    remove("mnist_cifar_10_alexnet.pt");
    printf("Finish training!\n");

    torch::serialize::OutputArchive archive;
    net1.save(archive);
    archive.save_to("mnist_cifar_10_alexnet.pt");   // save the trained model
    printf("Save the training result to mnist_cifar_10_alexnet.pt.\n");
}
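The optimizer above is SGD with a learning rate of 0.001 and a momentum of 0.9. As a rough illustration of what one optimizer.step() does per parameter, here is a sketch of the momentum update in the form PyTorch documents for SGD with zero dampening and no weight decay. `MomentumSGD` is a hypothetical stand-in written for this example, not the libtorch class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// SGD with momentum: v = momentum * v + grad, then param = param - lr * v.
struct MomentumSGD {
    double lr;
    double momentum;
    std::vector<double> velocity; // one running velocity per parameter

    MomentumSGD(double lr_, double momentum_, std::size_t n)
        : lr(lr_), momentum(momentum_), velocity(n, 0.0) {}

    // Applies one update step to params using the given gradients.
    void step(std::vector<double>& params, const std::vector<double>& grads) {
        assert(params.size() == velocity.size() && grads.size() == velocity.size());
        for (std::size_t i = 0; i < params.size(); ++i) {
            velocity[i] = momentum * velocity[i] + grads[i];
            params[i] -= lr * velocity[i];
        }
    }
};
```

With a constant gradient the velocity keeps growing toward grad / (1 - momentum), which is 10 times the gradient for a momentum of 0.9. That is why a small learning rate such as 0.001 can still make steady progress.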

During training the loss value fluctuates somewhat, but overall it trends downward:

—

Model validation

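The previous step produced mnist_cifar_10_alexnet.pt, the file that stores the trained model. To validate or actually use the model, we only need to load this file; there is no need to train again. We validate the trained model with the test_batch.bin file of the Cifar-10 dataset, which likewise contains 10000 three-channel images. We parse these images and their labels according to the file format described earlier, feed each image into the network, run forward propagation to get the prediction, and compare it with the actual label to see whether the prediction is correct.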
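The comparison step boils down to taking the index of the largest of the 10 output scores as the predicted class and counting how often it equals the true label. A minimal sketch in plain C++ follows; `argmax` and `pass_rate` are names chosen for this example, and the outputs are assumed to be already copied into ordinary vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest score; the first one wins if several are equal.
std::size_t argmax(const std::vector<float>& scores) {
    assert(!scores.empty());
    return static_cast<std::size_t>(
        std::distance(scores.begin(), std::max_element(scores.begin(), scores.end())));
}

// Fraction of samples whose predicted class equals the true label.
double pass_rate(const std::vector<std::vector<float>>& outputs,
                 const std::vector<std::size_t>& labels) {
    assert(!outputs.empty() && outputs.size() == labels.size());
    std::size_t passed = 0;
    for (std::size_t i = 0; i < outputs.size(); ++i)
        if (argmax(outputs[i]) == labels[i]) ++passed;
    return static_cast<double>(passed) / static_cast<double>(outputs.size());
}
```

Since softmax preserves the ordering of the scores, taking the argmax of the raw network outputs gives the same class as taking it after softmax, so the softmax can be skipped at prediction time too.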

The code that parses the test_batch.bin file is as follows:

// bin_path is the full path of the test_batch.bin file
void read_cifar_bin_rgb(const char *bin_path, vector<Mat> &img_list, vector<uchar> &label_list)
{
    const int img_num = 10000;
    const int img_size = 3073;     // the first byte is the label
    const int img_size_1 = 1024;
    const int data_size = img_num * img_size;
    const int row = 32;
    const int col = 32;

    uchar *cifar_data = (uchar *)malloc(data_size);
    if (cifar_data == NULL)
    {
        cout << "malloc failed" << endl;
        return;
    }

    FILE *fp = fopen(bin_path, "rb");
    if (fp == NULL)
    {
        cout << "fopen file failed" << endl;
        free(cifar_data);
        return;
    }

    fread(cifar_data, 1, data_size, fp);

    img_list.clear();
    label_list.clear();
    for (int i = 0; i < img_num; i++)
    {
        long int offset = i * img_size;
        long int offset0 = offset + 1;              // red
        long int offset1 = offset0 + img_size_1;    // green
        long int offset2 = offset1 + img_size_1;    // blue

        uchar label = cifar_data[offset];           // label

        Mat img(row, col, CV_8UC3);
        for (int y = 0; y < row; y++)
        {
            for (int x = 0; x < col; x++)
            {
                int idx = y * col + x;
                img.at<Vec3b>(y, x) = Vec3b(cifar_data[offset2 + idx], cifar_data[offset1 + idx], cifar_data[offset0 + idx]);   // BGR
            }
        }

        cvtColor(img, img, COLOR_BGR2RGB);
        img.convertTo(img, CV_32F, 1.0 / 255.0);    // 0.0~1.0
        img = (img - 0.5) / 0.5;                    // -1~1

        img_list.push_back(img.clone());            // float
        label_list.push_back(label);                // uchar
    }

    fclose(fp);
    free(cifar_data);
}

The prediction and validation code is as follows:

void test_alexnet_cifar_10(void)
{
    AlexNet net1(1);

    torch::serialize::InputArchive archive;
    archive.load_from("mnist_cifar_10_alexnet.pt");   // load the trained model from the file
    net1.load(archive);                               // load the trained model into the network
    net1.to(device_type);                             // move the network to the GPU to speed things up

    vector<Mat> test_img;
    vector<uchar> test_label;
    read_cifar_bin_rgb("D:/Program Files (x86)/Microsoft Visual Studio 14.0/prj/KNN_test/KNN_test/cifar-10-batches-bin/test_batch.bin", test_img, test_label);

    int total_test_items = 0, passed_test_items = 0;

    for (size_t i = 0; i < test_img.size(); i++)
    {
        // convert the sample and label to Tensors
        torch::Tensor inputs = torch::from_blob(test_img[i].data, { 1, test_img[i].channels(), test_img[i].rows, test_img[i].cols }, torch::kFloat);   // 1*3*32*32
        torch::Tensor labels = torch::tensor({ (long long)test_label[i] });

        // move the sample and label tensors from the CPU to the GPU, matching the GPU network
        inputs = inputs.to(device_type);
        labels = labels.to(device_type);

        // process the test data with the trained network, i.e. forward propagation
        auto outputs = net1.forward(inputs);

        // get the predicted value, 0 ~ 9
        auto predicted = (torch::max)(outputs, 1);

        // compare the prediction with the actual label and update the statistics
        if (labels[0].item<int>() == std::get<1>(predicted).item<int>())
            passed_test_items++;
        total_test_items++;

        printf("label: %d.\n", labels[0].item<int>());
        printf("predicted label: %d.\n", std::get<1>(predicted).item<int>());
    }

    printf("total_test_items=%d, passed_test_items=%d, pass rate=%f\n", total_test_items, passed_test_items, passed_test_items*1.0 / total_test_items);
}

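Running the code above to classify the 10000 test images with the trained model gives an accuracy of only 56.59%. Even the simpler LeNet-5 network can reach that, so this AlexNet build is something of a disappointment. No matter: next we will try adjusting the network and its parameters, for example adding extra layers such as batch norm or LRN layers, or normalizing the data by subtracting the mean, and see whether the accuracy improves. Onward!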