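Hi everyone, this is Lao Zhang from Ai 學習.
I have deployed and tested the recent OCR large models locally, and I also wrote a single API that talks to three of them: DeepSeek-OCR, PaddleOCR, and HunyuanOCR.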
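Many readers have asked which one to pick.
Why would a grown-up still be choosing? Obviously we take all of them.
I used FastAPI to put together a simple OCR model comparison tool. Given the same prompt plus an image or PDF, it uses a Python thread pool to call the DeepSeek, Paddle, and Hunyuan model APIs in parallel, then shows the results side by side.
The frontend is plain HTML + CSS + JS. Because it is meant for intranet deployment, it does not depend on any CDN.
Usage is simple: upload an image or PDF, then enter a prompt. If you have no special requirements, the default prompt is fine.
Then click Run OCR Comparison.
All three models respond quickly. A lightweight Markdown parser is built in and renders the results automatically.
You can also switch to the raw recognized Markdown, which supports one-click copy.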
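To give a feel for what a "lightweight Markdown parser" involves, here is a minimal sketch. It is written in Python only for illustration; the tool itself renders in the browser with its own JavaScript, which is not shown in this post. The function name `render_markdown` and the supported subset (headings, bullet lists, bold, inline code) are assumptions of this sketch, not the author's implementation.

```python
import html
import re


def render_markdown(text: str) -> str:
    """Convert a small Markdown subset to HTML (illustrative only)."""
    out = []          # finished HTML fragments, one per output line
    in_list = False   # True while a <ul> is open

    def inline(s: str) -> str:
        # Escape first so OCR output cannot inject raw HTML into the page.
        s = html.escape(s)
        s = re.sub(r"`([^`]+)`", r"<code>\1</code>", s)
        s = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", s)
        return s

    for line in text.splitlines():
        stripped = line.strip()
        heading = re.match(r"(#{1,6})\s+(.*)", stripped)
        item = re.match(r"[-*]\s+(.*)", stripped)
        # Any non-list line (including a blank one) closes an open list.
        if in_list and not item:
            out.append("</ul>")
            in_list = False
        if not stripped:
            continue
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(item.group(1))}</li>")
        else:
            out.append(f"<p>{inline(stripped)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

Escaping before applying the inline rules is the main design choice here: model output is untrusted text, so it should never reach the page as raw HTML. Tables and formulas, which OCR models often emit, would need extra rules that this sketch leaves out.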
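My models are deployed locally and run on an intranet, so I have not bothered with an online deployment. If you are interested, give it a try: replace the OCR model API endpoints with official or third-party APIs, and with minor code changes the tool can be deployed online.
The core code is as follows (the full code is close to 600 lines, most of it HTML):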
#!/usr/bin/env python3
"""
OCR Comparison Web App - polished version, no external CDN dependencies
"""
import os
import re
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

import requests
import uvicorn
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import HTMLResponse

app = FastAPI(title="OCR Comparison")

# --- Configuration ---
# Local inference endpoints, one per OCR model.
MODELS = {
    "DeepSeek-OCR": "http://localhost:8002/models/v1//deepseek-ocr/inference",
    "PaddleOCR": "http://localhost:8003/models/v1/PaddleOCR/inference",
    "HunyuanOCR": "http://localhost:8004/models/v1/HunyuanOCR/inference",
}


def call_api(model_name, api_url, file_path, prompt):
    """Call a single OCR API and return its result (or an error string)."""
    print(f"[INFO] Calling {model_name}: {api_url}")
    try:
        # Send the file as multipart form data together with the prompt.
        with open(file_path, 'rb') as f:
            resp = requests.post(
                api_url,
                files={'file': (os.path.basename(file_path), f)},
                data={'prompt': prompt},
                timeout=300
            )
        print(f"[INFO] {model_name} status: {resp.status_code}")
        if resp.status_code == 200:
            data = resp.json()
            # Fall back to the whole payload if there is no "result" field.
            result = data.get("result", str(data))
            print(f"[INFO] {model_name} result length: {len(result)}")
            return result
        return f"HTTP Error: {resp.status_code}"
    except Exception as e:
        print(f"[ERROR] {model_name}: {e}")
        return f"Error: {e}"


HTML_PAGE = """
omitted
"""


@app.get("/", response_class=HTMLResponse)
async def index():
    return HTML_PAGE


@app.post("/api/compare")
async def compare(
    file: UploadFile = File(...),
    prompt: str = Form("Convert the document to markdown.")
):
    print(f"\n{'='*60}")
    print(f"[INFO] Received request: {file.filename}")
    print(f"[INFO] Prompt: {prompt[:50]}...")
    print(f"{'='*60}")
    temp_dir = tempfile.mkdtemp()
    # Keep only the base name so a crafted filename cannot escape temp_dir.
    safe_name = os.path.basename(file.filename or "upload")
    temp_path = os.path.join(temp_dir, safe_name)
    try:
        with open(temp_path, "wb") as f:
            content = await file.read()
            f.write(content)
        print(f"[INFO] Saved to: {temp_path}, size: {len(content)} bytes")
        # Call the three APIs in parallel
        results = {}
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = {
                "deepseek": executor.submit(call_api, "DeepSeek-OCR", MODELS["DeepSeek-OCR"], temp_path, prompt),
                "paddle": executor.submit(call_api, "PaddleOCR", MODELS["PaddleOCR"], temp_path, prompt),
                "hunyuan": executor.submit(call_api, "HunyuanOCR", MODELS["HunyuanOCR"], temp_path, prompt),
            }
            # Collect each result; the timeout is slightly longer than the HTTP timeout.
            for name, future in futures.items():
                try:
                    result = future.result(timeout=310)
                    results[name] = result
                    print(f"[INFO] {name} done, length: {len(result)}")
                except Exception as e:
                    results[name] = f"Error: {e}"
                    print(f"[ERROR] {name}: {e}")
        print("[INFO] All done. Returning results.")
        print(f"[DEBUG] Results keys: {list(results.keys())}")
        return results
    finally:
        # Always remove the temporary upload, even if a call failed.
        shutil.rmtree(temp_dir, ignore_errors=True)


if __name__ == "__main__":
    print("\n" + "="*60)
    print("OCR Comparison Server")
    print("URL: http://0.0.0.0:8080")
    print("="*60 + "\n")
    uvicorn.run(app, host="0.0.0.0", port=8080)
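
The suggestion to swap the local endpoints for official or third-party APIs could start with something like the sketch below, which lets environment variables override the URLs in `MODELS` without editing the code. The `OCR_URL_*` variable names, `env_key`, `load_models`, and the `ocr.example.com` URL are all hypothetical and not part of the original tool. A hosted API will probably also differ in authentication and in request and response format, which this sketch does not cover.

```python
import os

# Defaults copied from the MODELS dict in the listing above.
DEFAULT_MODELS = {
    "DeepSeek-OCR": "http://localhost:8002/models/v1//deepseek-ocr/inference",
    "PaddleOCR": "http://localhost:8003/models/v1/PaddleOCR/inference",
    "HunyuanOCR": "http://localhost:8004/models/v1/HunyuanOCR/inference",
}


def env_key(model_name: str) -> str:
    """Map a model name to a (hypothetical) variable name, e.g. OCR_URL_PADDLEOCR."""
    cleaned = "".join(c if c.isalnum() else "_" for c in model_name.upper())
    return "OCR_URL_" + cleaned


def load_models(env=None) -> dict:
    """Return the endpoint table, letting environment variables override defaults."""
    env = os.environ if env is None else env
    return {name: env.get(env_key(name), url) for name, url in DEFAULT_MODELS.items()}


if __name__ == "__main__":
    # Example: point PaddleOCR at a placeholder remote endpoint, keep the others local.
    demo_env = {"OCR_URL_PADDLEOCR": "https://ocr.example.com/v1/paddle"}
    for name, url in load_models(demo_env).items():
        print(f"{name}: {url}")
```

With this in place, `MODELS = load_models()` would replace the hard-coded dict, and the rest of the listing could stay unchanged as long as the remote service accepts the same multipart `file` and `prompt` fields.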