How do you get the text out of a pptx? A step-by-step guide to batch-exporting text

Generating sample files

Writing the core code

Extending the features to strengthen the tool

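After a few days of learning python-pptx, I couldn't resist building something real, so today I wrote a tool that uses python-pptx to extract all the text from pptx files (text inside tables is not included). It does the following (I recorded the design process as a video and published it on the platform; follow me and you can find it on my homepage, or click the link above):

1. It extracts the text from every pptx file in a specified folder (note: pptx, not ppt, because the two file formats are different).

2. The extracted text is saved under the pptx file's name with a .txt extension, in the directory set as targetPath in the program. If that directory does not exist, it is created first.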
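The two behaviours above (choosing which files to process, and deciding where each txt file goes) can be sketched with the standard library alone. This is a minimal sketch, not the code from the article: the helper name plan_outputs is hypothetical, and skipping names that start with "~" assumes the usual "~$" lock files that PowerPoint leaves next to an open presentation.

```python
import os


def plan_outputs(source_dir, target_dir):
    """Return (pptx_path, txt_path) pairs for every .pptx file in source_dir.

    Hypothetical helper for illustration; it does not read the presentations.
    """
    # Create the output directory if it is missing (no error if it exists).
    os.makedirs(target_dir, exist_ok=True)
    pairs = []
    for name in sorted(os.listdir(source_dir)):
        # Keep .pptx only; the legacy .ppt format is a different file type.
        if not name.lower().endswith(".pptx"):
            continue
        # Skip temporary "~$..." lock files (assumed PowerPoint behaviour).
        if name.startswith("~"):
            continue
        stem, _ext = os.path.splitext(name)
        pairs.append((os.path.join(source_dir, name),
                      os.path.join(target_dir, stem + ".txt")))
    return pairs
```

Each returned pair names one input presentation and the txt file its text would be written to, so the extraction step only has to loop over the pairs.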

import os

from pptx import Presentation


class ExtractPPTXText:
    def __init__(self, params):
        self.errFlag = False
        self.msg = ""
        self.sourcePath = params["sourcePath"]
        if not os.path.exists(self.sourcePath):
            self.errFlag = True
            self.msg = "Source folder does not exist!"
        self.targetPath = params["targetPath"]
        if not os.path.exists(self.targetPath):
            os.makedirs(self.targetPath)
        self.run()

    def run(self):
        if self.errFlag:
            print(self.msg)
            return
        for file in os.listdir(self.sourcePath):
            # Only process .pptx files (not .ppt)
            if not file.lower().endswith(".pptx"):
                continue
            # Skip temporary files whose names start with "~"
            if file.startswith("~"):
                continue
            # Pass the file to getText, which returns a list of text strings
            para_list = self.getText(os.path.join(self.sourcePath, file))
            # Split the file name from its extension
            fileName, expandName = os.path.splitext(file)
            # Build the new file name
            newFileName = os.path.join(self.targetPath, fileName + ".txt")
            # Save the text list to the txt file
            self.saveText(para_list, newFileName)

    # Return the list of text strings found in a pptx file
    def getText(self, file):
        para_list = []
        # Open the file that was passed in, not a hard-coded "test.pptx"
        prs = Presentation(file)
        for slide in prs.slides:
            for shape in slide.shapes:
                # Pictures, tables and similar shapes have no text frame
                if not shape.has_text_frame:
                    continue
                for para in shape.text_frame.paragraphs:
                    text = para.text.strip()
                    if text:
                        para_list.append(text)
        return para_list

    # Save the text
    def saveText(self, para_list, file):
        # utf-8 is used here instead of the original gbk so that
        # characters outside GBK do not raise an encoding error
        with open(file, "w", encoding="utf-8") as txt_file:
            txt_file.write("\n".join(para_list))
        print("{} saved!".format(file))

    def __str__(self):
        return self.msg


if __name__ == "__main__":
    params = {
        # Directory containing the pptx files to extract from
        "sourcePath": r"K:\伍德春原創影片\自動化\2020-11-10",
        # Directory where the extracted txt files are saved
        "targetPath": r"K:\伍德春原創影片\自動化\2020-11-10\txt",
    }
    newobj = ExtractPPTXText(params)

It is great that Toutiao accounts can insert code blocks; they display code much better. Thumbs up.
