Recognizing a Hand and Counting Fingers with Python: An Algorithm Walkthrough

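Today we will try a fun experiment: using Python to count, in real time, the fingers held up in front of a webcam, and walking through the algorithms involved along the way.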

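Separating a foreground object from a cluttered background is a hard problem. The most obvious reason is the gap between how a human and a computer see the same image. A person can easily work out what an image contains, but to a computer an image is just a 3-dimensional matrix. This is why computer vision problems remain challenging. Look at the image below.

A person looking at the image above would pick out its different regions and label them, for example "sky", "person", "tree" and "grass". How can a computer do something similar? For our task, we first need to isolate the hand region and remove every unwanted part of the video sequence. Once the hand region is segmented, we count the fingers shown in the video sequence. So we proceed in two steps.

Step 1: Find and segment the hand region in the video sequence.

Step 2: Count the fingers in the hand region segmented from the video sequence.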

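▊ Step 1: Segmenting the hand region

The first step in hand gesture recognition is, naturally, to find the hand region by eliminating all the other unwanted parts of the video sequence. This may seem daunting at first, but don't worry: it is much easier with Python and OpenCV!

Note: A video sequence is simply a collection of frames (images) ordered in time.

Before getting into the details, let's look at how the hand region can be located.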

▶ Background subtraction

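First, we need an efficient way to separate the foreground from the background. To do this, we use the concept of a running average. We let the system look at a particular scene for 30 frames. During this period, we compute the running average over the current frame and the previous frames. In doing so, we are essentially telling the system:

"OK, robot! The video sequence you have been staring at (the running average of those 30 frames) is the background."

After figuring out the background, we bring in our hand, so that the system sees it as a new entry into the background, which makes it the foreground object. But how do we pick out this foreground on its own? The answer is background subtraction.

Look at the image below, which illustrates how background subtraction works.

After determining the background model with the running average, we take the current frame, which contains the foreground object (here, the hand) in addition to the background. We compute the absolute difference between the background model (accumulated over the calibration frames) and the current frame (which contains our hand) to obtain a difference image that holds the newly added foreground object, our hand. That is all there is to background subtraction.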
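As a rough illustration of the idea, here is a NumPy-only sketch of background subtraction on tiny synthetic grayscale arrays. The helper name `abs_difference` and the toy data are assumptions made for this example; the article's own code uses `cv2.absdiff()` for the same step.

```python
import numpy as np

def abs_difference(background, frame):
    """Per-pixel absolute difference of two uint8 grayscale images.

    Hypothetical helper: widens to int16 first so the subtraction
    cannot wrap around, then converts back to uint8.
    """
    diff = np.abs(background.astype(np.int16) - frame.astype(np.int16))
    return diff.astype(np.uint8)

# Toy 4x4 "background" with uniform intensity 100
background = np.full((4, 4), 100, dtype=np.uint8)

# Current frame: the same scene plus a brighter 2x2 "hand" in the middle
frame = background.copy()
frame[1:3, 1:3] = 180

diff = abs_difference(background, frame)
print(diff)  # nonzero only where the "hand" entered the scene
```

Only the four pixels covered by the toy "hand" differ from the background, which is exactly the property the later thresholding step relies on.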

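▶ Motion detection and thresholding

To detect the hand region in this difference image, we threshold the difference image so that only the hand region remains visible and all other unwanted regions are painted black. That is what motion detection is all about.

Note: Thresholding assigns each pixel one of two values (conceptually 0 and 1; in the code here, 0 and 255) depending on whether its intensity exceeds a given threshold level, so that only the object of interest is captured from the image.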
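To make the note concrete, the following is a minimal NumPy sketch of binary thresholding. The function name `binary_threshold` and the sample values are assumptions for illustration; it mirrors the "strictly greater than" rule that `cv2.THRESH_BINARY` is documented to use.

```python
import numpy as np

def binary_threshold(image, level, max_value=255):
    """Pixels strictly above `level` become `max_value`; all others become 0."""
    return np.where(image > level, max_value, 0).astype(np.uint8)

# A small difference image with values on both sides of the threshold
diff = np.array([[0, 10, 26],
                 [25, 80, 200]], dtype=np.uint8)

mask = binary_threshold(diff, level=25)
print(mask)
```

Note that a pixel sitting exactly at the threshold level (25 here) stays black; only values above it turn white.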

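▶ Contour extraction

After thresholding the difference image, we find the contours in the resulting image. The contour with the largest area is assumed to be our hand.

Note: A contour is the outline or boundary of an object in the image.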
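The "largest contour wins" rule can be sketched without OpenCV. In the example below, contours are plain lists of (x, y) vertices and the area is computed with the shoelace formula; `polygon_area` and the three toy shapes are hypothetical names and data, standing in for what `max(cnts, key=cv2.contourArea)` does in the article's code.

```python
import numpy as np

def polygon_area(points):
    """Area of a closed polygon given its vertices, via the shoelace formula."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

small_blob = [(0, 0), (2, 0), (2, 2), (0, 2)]          # 2x2 square
hand_like = [(10, 10), (40, 10), (40, 50), (10, 50)]   # 30x40 rectangle
noise = [(5, 5), (6, 5), (5, 6)]                       # tiny triangle

contours = [small_blob, hand_like, noise]

# Keep the contour with the largest enclosed area
largest = max(contours, key=polygon_area)
print(polygon_area(largest))
```

This also shows the weakness of the assumption: any blob larger than the hand would be selected instead.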

The code

# organize imports
import cv2
import imutils
import numpy as np

# global variables
bg = None

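First, we import all the packages we need and initialize the background model. If these packages are not installed on your computer, install them before continuing.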

#--------------------------------------------------
# To find the running average over the background
#--------------------------------------------------
def run_avg(image, aWeight):
    global bg
    # initialize the background with the first frame
    if bg is None:
        bg = image.copy().astype("float")
        return

    # compute weighted average, accumulate it and update the background
    cv2.accumulateWeighted(image, bg, aWeight)

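Next, we have a function that computes the running average between the background model and the current frame. It takes two arguments: the current frame and aWeight, the weight given to the current frame when it is blended into the average. If the background model is None (that is, on the first frame), it is initialized with the current frame. Otherwise, cv2.accumulateWeighted() updates the running average of the background model with the current frame, using the formula given below.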

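$$\mathrm{dst}(x, y) = (1 - a) \cdot \mathrm{dst}(x, y) + a \cdot \mathrm{src}(x, y)$$

src(x, y): source (input) image (1 or 3 channels, 8-bit or 32-bit floating point)

dst(x, y): destination (output) image (same number of channels as the source, 32-bit or 64-bit floating point)

a: weight of the source (input) image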
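The update rule can be tried out on small arrays. The sketch below is a NumPy re-implementation for illustration only: `update_background` is a hypothetical helper that returns a new array, whereas `cv2.accumulateWeighted()` updates the accumulator in place.

```python
import numpy as np

def update_background(bg, frame, a):
    """One running-average step: bg <- (1 - a) * bg + a * frame."""
    if bg is None:
        # First frame: use it directly as the initial background model
        return frame.astype(float)
    return (1.0 - a) * bg + a * frame.astype(float)

bg = None
# Three toy 2x2 frames with uniform intensities 100, 100 and 200
frames = [np.full((2, 2), v, dtype=np.uint8) for v in (100, 100, 200)]
for frame in frames:
    bg = update_background(bg, frame, a=0.5)

print(bg)  # every pixel ends at 0.5 * 100 + 0.5 * 200 = 150
```

With a = 0.5 the newest frame contributes half of the result, so a single bright frame pulls the model halfway toward its value.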

#---------------------------------------------
# To segment the region of hand in the image
#---------------------------------------------
def segment(image, threshold=25):
    global bg
    # find the absolute difference between background and current frame
    diff = cv2.absdiff(bg.astype("uint8"), image)

    # threshold the diff image so that we get the foreground
    thresholded = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)[1]

    # get the contours in the thresholded image; grab_contours() copes with
    # the different return signatures of OpenCV 2.4, 3.x and 4.x
    cnts = cv2.findContours(thresholded.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)

    # return None, if no contours detected
    if len(cnts) == 0:
        return None

    # based on contour area, get the maximum contour which is the hand
    segmented = max(cnts, key=cv2.contourArea)
    return (thresholded, segmented)

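Our next function segments the hand region from the video sequence. It takes two parameters: the current frame and the threshold used to binarize the difference image.

First, we use cv2.absdiff() to find the absolute difference between the background model and the current frame.

Next, we threshold the difference image to reveal only the hand region. Finally, we extract contours from the thresholded image and take the contour with the largest area (which is our hand).

We return the thresholded image together with the segmented contour as a tuple. The math behind thresholding is simple. If x(n) denotes the pixel intensity of the input image at a particular pixel coordinate, the threshold decides how that pixel is mapped when the image is converted to a binary image: with cv2.THRESH_BINARY, intensities above the threshold become the maximum value (255 here) and all others become 0.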
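For reference, binary thresholding can be written out as below. This is a sketch that assumes the documented behaviour of cv2.THRESH_BINARY with a maximum value of 255, as passed in segment(); the symbols d (difference image), T (threshold level) and t (thresholded image) are notation introduced here for illustration.

```latex
t(x, y) =
\begin{cases}
255 & \text{if } d(x, y) > T \\
0   & \text{otherwise}
\end{cases}
```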

#----------------
# MAIN FUNCTION
#----------------
if __name__ == "__main__":
    # initialize weight for running average
    aWeight = 0.5

    # get the reference to the webcam
    camera = cv2.VideoCapture(0)

    # region of interest (ROI) coordinates
    top, right, bottom, left = 10, 350, 225, 590

    # initialize num of frames
    num_frames = 0

    # keep looping, until interrupted
    while True:
        # get the current frame
        (grabbed, frame) = camera.read()

        # stop if no frame could be read from the camera
        if not grabbed:
            break

        # resize the frame
        frame = imutils.resize(frame, width=700)

        # flip the frame so that it is not the mirror view
        frame = cv2.flip(frame, 1)

        # clone the frame
        clone = frame.copy()

        # get the height and width of the frame
        (height, width) = frame.shape[:2]

        # get the ROI
        roi = frame[top:bottom, right:left]

        # convert the roi to grayscale and blur it
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (7, 7), 0)

        # to get the background, keep looking till a threshold is reached
        # so that our running average model gets calibrated
        if num_frames < 30:
            run_avg(gray, aWeight)
        else:
            # segment the hand region
            hand = segment(gray)

            # check whether hand region is segmented
            if hand is not None:
                # if yes, unpack the thresholded image and segmented region
                (thresholded, segmented) = hand

                # draw the segmented region (shifted from ROI to frame
                # coordinates) and display the thresholded image
                cv2.drawContours(clone, [segmented + (right, top)], -1, (0, 0, 255))
                cv2.imshow("Thresholded", thresholded)

        # draw the recognition zone
        cv2.rectangle(clone, (left, top), (right, bottom), (0, 255, 0), 2)

        # increment the number of frames
        num_frames += 1

        # display the frame with segmented hand
        cv2.imshow("Video Feed", clone)

        # observe the keypress by the user
        keypress = cv2.waitKey(1) & 0xFF

        # if the user pressed "q", then stop looping
        if keypress == ord("q"):
            break

    # free up memory
    camera.release()
    cv2.destroyAllWindows()

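The code above is the main part of our program. We initialize aWeight to 0.5. As the running-average equation shown earlier implies, setting a lower value for this weight makes the running average span a larger number of previous frames, and vice versa. We get a reference to the webcam with cv2.VideoCapture(0), which opens the default webcam on the computer.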
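One way to see why a lower weight looks further back is to unroll the update rule: a frame that is k steps old ends up with weight a(1 - a)^k. The short sketch below prints these weights for two values of a; `frame_weights` is a hypothetical helper, and the calculation ignores the special handling of the very first frame.

```python
def frame_weights(a, n_frames):
    """Weight each past frame carries in the running average.

    Index 0 is the newest frame; a frame k steps old has weight a * (1 - a) ** k.
    """
    return [a * (1.0 - a) ** k for k in range(n_frames)]

for a in (0.5, 0.1):
    weights = frame_weights(a, 5)
    print(a, [round(w, 4) for w in weights])
```

With a = 0.5 the weights halve at every step, so frames older than a handful of steps barely matter; with a = 0.1 they decay slowly, so many more past frames still contribute.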

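Instead of recognizing gestures from the whole video frame, we restrict the system to a smaller recognition zone (the region of interest, or ROI) in which it looks for the hand. To highlight this zone we use cv2.rectangle(), which we call with two opposite corners built from the top, right, bottom and left pixel coordinates.

To keep track of the frame count, we initialize a variable num_frames. We then start an infinite loop and read a frame from the webcam with camera.read(). Using the imutils library, we resize the input frame to a fixed width of 700 pixels while preserving its aspect ratio, and flip the frame to avoid a mirror view.

Next, we take out only the region of interest (the recognition zone) with simple NumPy slicing. We then convert this ROI to grayscale and apply a Gaussian blur to suppress the high-frequency components in the image. For the first 30 frames, we keep feeding the input frame to run_avg() to update the background model. Note that the camera must be kept still during this step; otherwise the whole algorithm fails.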
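The slicing step is easy to check on a blank array. The sketch below reuses the ROI coordinates from the main script; the frame height of 525 is an assumed value (the real height depends on the camera's aspect ratio), and the frame itself is a stand-in filled with zeros.

```python
import numpy as np

# Same ROI coordinates as the main script; note that `right` (350) is the
# smaller x coordinate and `left` (590) the larger one
top, right, bottom, left = 10, 350, 225, 590

# Stand-in for a resized BGR frame (height 525 is an assumed value)
frame = np.zeros((525, 700, 3), dtype=np.uint8)

# Rows are indexed first (top:bottom), then columns (right:left)
roi = frame[top:bottom, right:left]
print(roi.shape)  # (215, 240, 3)

# Slicing returns a view, so writing into the ROI also changes the frame
roi[:] = 255
print(frame[top, right], frame[0, 0])
```

Because the slice is a view rather than a copy, the main script draws on `clone` instead of `frame`, which keeps the pixels used for segmentation untouched.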

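Once the background model has been built, the current input frame is passed to segment(), which returns the thresholded image and the segmented contour. The contour coordinates are relative to the ROI, so they are shifted by (right, top) before the contour is drawn over the frame with cv2.drawContours(). The thresholded output is shown with cv2.imshow().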
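The expression `segmented + (right, top)` in the main script relies on NumPy broadcasting. Here is a small sketch with a made-up four-point contour in OpenCV's (N, 1, 2) layout; the contour values are assumptions for illustration.

```python
import numpy as np

top, right = 10, 350

# A contour in OpenCV's layout: shape (N, 1, 2), each point stored as (x, y)
segmented = np.array([[[0, 0]], [[20, 0]], [[20, 30]], [[0, 30]]], dtype=np.int32)

# Broadcasting adds (right, top) to every (x, y) pair, moving the contour
# from ROI coordinates to full-frame coordinates
shifted = segmented + (right, top)
print(shifted.reshape(-1, 2))
```

The offset is (right, top) rather than (top, right) because contour points are stored as (x, y), while the ROI slice is indexed as rows first.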

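Finally, we display the segmented hand region in the current frame and wait for a keypress to exit the program. Note that bg is kept as a global variable here, because run_avg() and segment() both need access to the same background model.

▶ Running the code

Copy all the code given above into a single file named segment.py.

Then open a terminal or command prompt and type python segment.py.

Note: Remember to keep the camera still while the background model is being built. After 5-6 seconds, put your hand in the recognition zone so that only your hand region is revealed. Below, you can see how the system effectively segments the hand region from the live video sequence.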

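▊ Step 2: Counting the fingers

Having segmented the hand region from the live video sequence, we now make the system count the fingers shown to the camera/webcam. We cannot use any ready-made template (provided by OpenCV) for this, because it is a genuinely challenging problem.

We obtain the segmented hand region by assuming that it is the largest contour (the one with the maximum area) in the frame. If you bring an object larger than your hand into the frame, this algorithm fails. So make sure your hand occupies most of the region in the frame.

We will use the segmented hand region stored in the variable hand. Remember that hand is a tuple holding thresholded (the thresholded image) and segmented (the segmented hand contour). We will use these two variables to count the fingers shown. How do we do that?

Fingers can be counted in several ways; this tutorial shows one of them. The figure below illustrates the method.

As the figure above shows, there are four intermediate steps for counting fingers, given a segmented hand region. Each step is shown with the corresponding output image (on the left) obtained after performing that step.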

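▶ Step one: Find the convex hull of the segmented hand region (which is a contour) and compute the most extreme points of the convex hull (extreme top, extreme bottom, extreme left, extreme right).

▶ Step two: Find the center of the palm using these extreme points of the convex hull.

▶ Step three: Using the palm center, construct a circle whose radius is based on the maximum Euclidean distance between the palm center and the extreme points (the code below uses 80% of that distance).

▶ Step four: Perform a bitwise AND between the thresholded hand image (the frame) and the circular ROI (the mask). This reveals the finger slices, which can then be used to count the fingers shown.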

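Below, we implement these steps in code.

Input: thresholded (the thresholded image) and segmented (the segmented hand region, i.e. the contour)

Output: count (the number of fingers).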

from sklearn.metrics import pairwise

#--------------------------------------------------------------
# To count the number of fingers in the segmented hand region
#--------------------------------------------------------------
def count(thresholded, segmented):
    # find the convex hull of the segmented hand region
    chull = cv2.convexHull(segmented)

    # find the most extreme points in the convex hull
    extreme_top = tuple(chull[chull[:, :, 1].argmin()][0])
    extreme_bottom = tuple(chull[chull[:, :, 1].argmax()][0])
    extreme_left = tuple(chull[chull[:, :, 0].argmin()][0])
    extreme_right = tuple(chull[chull[:, :, 0].argmax()][0])

    # find the center of the palm
    cX = int((extreme_left[0] + extreme_right[0]) / 2)
    cY = int((extreme_top[1] + extreme_bottom[1]) / 2)

    # find the maximum euclidean distance between the center of the palm
    # and the most extreme points of the convex hull
    distance = pairwise.euclidean_distances([(cX, cY)], Y=[extreme_left, extreme_right, extreme_top, extreme_bottom])[0]
    maximum_distance = distance[distance.argmax()]

    # calculate the radius of the circle with 80% of the max euclidean distance obtained
    radius = int(0.8 * maximum_distance)

    # find the circumference of the circle
    circumference = (2 * np.pi * radius)

    # take out the circular region of interest which has
    # the palm and the fingers
    circular_roi = np.zeros(thresholded.shape[:2], dtype="uint8")

    # draw the circular ROI (a 1-pixel-wide ring)
    cv2.circle(circular_roi, (cX, cY), radius, 255, 1)

    # take bit-wise AND between thresholded hand using the circular ROI as the mask
    # which gives the cuts obtained using mask on the thresholded hand image
    circular_roi = cv2.bitwise_and(thresholded, thresholded, mask=circular_roi)

    # compute the contours in the circular ROI; grab_contours() copes with
    # the different return signatures of OpenCV 2.4, 3.x and 4.x
    cnts = cv2.findContours(circular_roi.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnts = imutils.grab_contours(cnts)

    # initialize the finger count
    count = 0

    # loop through the contours found
    for c in cnts:
        # compute the bounding box of the contour
        (x, y, w, h) = cv2.boundingRect(c)

        # increment the count of fingers only if -
        # 1. The contour region is not the wrist (bottom area)
        # 2. The number of points along the contour does not exceed
        #    25% of the circumference of the circular ROI
        if ((cY + (cY * 0.25)) > (y + h)) and ((circumference * 0.25) > c.shape[0]):
            count += 1

    return count
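To build intuition for step four, the sketch below counts how many separate foreground runs a circle crosses in a binary image. It is a simplified stand-in, not the contour-based method used in count(): it samples the image along the circle and counts background-to-foreground transitions. The function name `count_runs_on_circle`, the sample count and the synthetic "hand" are all assumptions for illustration, and no wrist filtering is applied.

```python
import numpy as np

def count_runs_on_circle(binary, center, radius, samples=360):
    """Count separate foreground runs met while walking once around a circle.

    Samples the binary image at evenly spaced angles and counts the places
    where a foreground sample follows a background sample.
    """
    cx, cy = center
    angles = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    xs = np.round(cx + radius * np.cos(angles)).astype(int)
    ys = np.round(cy + radius * np.sin(angles)).astype(int)

    # Treat samples that fall outside the image as background
    h, w = binary.shape[:2]
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    values = np.zeros(samples, dtype=bool)
    values[inside] = binary[ys[inside], xs[inside]] > 0

    # A run starts where a sample is foreground and its predecessor is not;
    # np.roll makes the comparison wrap around the circle
    starts = values & ~np.roll(values, 1)
    return int(starts.sum())

# Synthetic "hand": a palm block with two finger-like bars above it
img = np.zeros((100, 100), dtype=np.uint8)
img[60:100, 30:70] = 255   # palm
img[10:60, 35:42] = 255    # finger 1
img[10:60, 55:62] = 255    # finger 2

print(count_runs_on_circle(img, center=(50, 70), radius=40))
```

In this toy image the circle passes above the palm and cuts through both bars, so each bar shows up as one run. A real hand also produces a run at the wrist, which is why count() adds the bounding-box and contour-length checks.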

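Each intermediate step requires some understanding of image processing fundamentals such as contours, bitwise AND, Euclidean distance and convex hull.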

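▶ Contours

The outline or boundary of an object of interest. A contour can be found easily with OpenCV's cv2.findContours() function. Be careful when unpacking the return value of this function, because it differs by version: OpenCV 3.x (for example 3.1.0) returns three values (image, contours, hierarchy), whereas OpenCV 2.4 and 4.x return two (contours, hierarchy). The imutils.grab_contours() helper picks out the contour list regardless of version.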
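A version-tolerant unpacking helper might look like the sketch below. `pick_contours` is a hypothetical name chosen for this example (imutils ships a similar helper called `grab_contours`), and the tuples are placeholders standing in for real `cv2.findContours()` results.

```python
def pick_contours(result):
    """Return the contour list from a cv2.findContours() result tuple.

    OpenCV 2.4 and 4.x return (contours, hierarchy); OpenCV 3.x returns
    (image, contours, hierarchy).
    """
    if len(result) == 2:
        return result[0]
    if len(result) == 3:
        return result[1]
    raise ValueError("unexpected findContours() return value")

# Placeholder tuples standing in for real findContours() results
opencv4_style = (["contour_a", "contour_b"], "hierarchy")
opencv3_style = ("image", ["contour_a", "contour_b"], "hierarchy")

print(pick_contours(opencv4_style) == pick_contours(opencv3_style))  # True
```

Selecting by tuple length keeps the calling code identical across OpenCV versions.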

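▶ Bitwise AND

Performs a bitwise logical AND between two objects. You can picture it as applying a mask and extracting only the regions of the image that lie under that mask. OpenCV provides the cv2.bitwise_and() function for this operation.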
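The masking effect can be reproduced with plain NumPy. The sketch below keeps image pixels only where the mask is nonzero, which is the behaviour you get from calling `cv2.bitwise_and(image, image, mask=mask)`; the helper name `masked_and` and the 3x3 arrays are assumptions for illustration.

```python
import numpy as np

def masked_and(image, mask):
    """Keep image pixels where mask is nonzero; set all others to 0."""
    return np.where(mask > 0, image, 0).astype(image.dtype)

image = np.array([[255, 255, 0],
                  [255, 0, 255],
                  [0, 255, 255]], dtype=np.uint8)

mask = np.array([[255, 0, 0],
                 [255, 255, 0],
                 [0, 0, 0]], dtype=np.uint8)

result = masked_and(image, mask)
print(result)
```

A pixel survives only if it is white in the image and also covered by the mask, which is how the thin circular mask slices the fingers in step four.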

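▶ Euclidean distance

This is the distance between two points (x1, y1) and (x2, y2), given by d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. Scikit-learn provides a function called pairwise.euclidean_distances() (in sklearn.metrics) that computes the Euclidean distances from one point to multiple points in a single line of code. After that, we take the largest of these distances using NumPy's argmax() function.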
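The same computation can be sketched with NumPy alone, which may help if scikit-learn is not installed. `distances_from` is a hypothetical helper, and the palm center and extreme points are made-up values; the last two lines follow the radius rule used in count().

```python
import numpy as np

def distances_from(center, points):
    """Euclidean distance from `center` to each point in `points`."""
    diffs = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    return np.sqrt((diffs ** 2).sum(axis=1))

center = (50, 60)
extremes = [(20, 60), (80, 60), (50, 10), (50, 100)]  # left, right, top, bottom

dist = distances_from(center, extremes)
maximum_distance = dist[dist.argmax()]

# Radius of the circular ROI: 80% of the largest distance
radius = int(0.8 * maximum_distance)
print(dist, maximum_distance, radius)
```

Here the top point is farthest from the center (50 pixels), so the circle gets a radius of 40 pixels.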

▶ Convex hull

You can think of a convex hull as a dynamic, stretchable envelope that wraps around the object of interest.
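For readers who want to see the envelope idea in code, here is a pure-Python sketch using Andrew's monotone chain algorithm. It is a swapped-in technique for illustration (the article's code calls `cv2.convexHull()`), and the point set is made up. The extreme points are then read off the hull in the same spirit as step one.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices, counter-clockwise for a y-up axis."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (2, 6)]
hull = convex_hull(points)

# Extreme points, using image coordinates where y grows downward
extreme_top = min(hull, key=lambda p: p[1])
extreme_bottom = max(hull, key=lambda p: p[1])
extreme_left = min(hull, key=lambda p: p[0])
extreme_right = max(hull, key=lambda p: p[0])
print(hull)
```

The interior points (2, 2) and (1, 3) are dropped, while the protruding point (2, 6) stays on the hull, much like a fingertip sticking out of the hand contour.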

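▶ Final result

You can download the entire code to perform hand gesture recognition here. Clone the repository with

git clone https://github.com/Gogul09/gesture-recognition.git

in a terminal/command prompt. Then go into the folder and type

python recognize.py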

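Note: Do not shake your webcam during the 30-frame calibration period. If it is shaken during the first 30 frames, the whole algorithm will not perform as expected.

After that, you can bring your hand into the bounding box and show gestures, and the finger count will be displayed accordingly. A demo of the entire pipeline is provided below.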
