openscreen/docs/engineering/paddleocr-local-service.md
2026-05-28 12:25:23 +07:00

PaddleOCR Local Service

OpenScreen calls OCR through a local HTTP service. The default endpoint is:

http://127.0.0.1:8866/ocr

The app sends either imageBase64 or path to identify the image, plus optional language and profile fields, and expects a JSON response containing OCR blocks:

{
  "blocks": [
    {
      "text": "Settings",
      "confidence": 0.97,
      "box": { "x": 120, "y": 80, "width": 90, "height": 24 }
    }
  ]
}
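
As a rough illustration of the request and response shapes described above, here is a minimal Python sketch. The field names (imageBase64, path, language, profile, blocks, box) come from this document; the helper names build_ocr_request, post_ocr, and parse_blocks are hypothetical, and the sketch assumes the service accepts a JSON body over HTTP POST and that exactly one of imageBase64 or path should be sent.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Default endpoint as stated in this document.
DEFAULT_OCR_URL = "http://127.0.0.1:8866/ocr"


def build_ocr_request(
    image_bytes: Optional[bytes] = None,
    path: Optional[str] = None,
    language: Optional[str] = None,
    profile: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body carrying either imageBase64 or path (assumed exclusive)."""
    if (image_bytes is None) == (path is None):
        raise ValueError("provide exactly one of image_bytes or path")
    body: Dict[str, Any] = {}
    if image_bytes is not None:
        body["imageBase64"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        body["path"] = path
    # language and profile are optional, so they are only sent when given.
    if language is not None:
        body["language"] = language
    if profile is not None:
        body["profile"] = profile
    return body


def post_ocr(body: Dict[str, Any], url: str = DEFAULT_OCR_URL) -> str:
    """Send the body as JSON via POST (assumption) and return the raw response text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout because the first request may download and load models.
    with urllib.request.urlopen(request, timeout=300) as response:
        return response.read().decode("utf-8")


def parse_blocks(response_text: str) -> List[Dict[str, Any]]:
    """Flatten each OCR block into text, confidence, and an (x, y, width, height) tuple."""
    data = json.loads(response_text)
    blocks = []
    for block in data.get("blocks", []):
        box = block["box"]
        blocks.append({
            "text": block["text"],
            "confidence": float(block["confidence"]),
            "box": (box["x"], box["y"], box["width"], box["height"]),
        })
    return blocks
```

With the service running, parse_blocks(post_ocr(build_ocr_request(path="screenshot.png"))) would return the recognized blocks; the file name here is only a placeholder.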

Install

Use a separate virtual environment because PaddleOCR and PaddlePaddle are large dependencies.

python -m venv .venv-ocr
.\.venv-ocr\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r tools\ocr\requirements.txt

If paddle is still missing after installing paddleocr, install the CPU PaddlePaddle wheel that matches your Python and OS from the official PaddlePaddle install guide.
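
To check whether both packages are importable from the activated environment before starting the service, a small Python sketch like the following can help. It only looks up the import names paddle and paddleocr without loading them; the helper names is_installed and missing_ocr_packages are hypothetical and not part of OpenScreen.

```python
import importlib.util
from typing import List


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_ocr_packages() -> List[str]:
    """List which of the two OCR dependencies cannot be found in this environment."""
    return [name for name in ("paddleocr", "paddle") if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_ocr_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("paddleocr and paddle are both importable.")
```

If paddle shows up as missing, install the CPU wheel as described above and run the check again.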

Run

.\.venv-ocr\Scripts\Activate.ps1
$env:PADDLEOCR_DEVICE="cpu"
$env:OPENSCREEN_OCR_PROFILE="vietnamese"
npm run ocr:paddle

Keep this terminal open while using the Guide OCR step in OpenScreen.

Verify

Invoke-WebRequest http://127.0.0.1:8866/health -UseBasicParsing

Expected response from a healthy environment:

{
  "ok": true,
  "paddleocrInstalled": true,
  "paddleInstalled": true,
  "engineReady": false,
  "defaultLanguage": "vi,en",
  "defaultProfile": "vietnamese"
}

engineReady is false on a freshly started service and becomes true after the first OCR request. That first request can be slow because PaddleOCR downloads and loads its models.
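
The health payload can also be checked programmatically. The sketch below, in Python, separates installation problems from the warm-up state, since engineReady being false is normal before the first OCR request. The field names come from the sample response above; the function names find_health_problems and is_warmed_up are hypothetical.

```python
from typing import Any, Dict, List


def find_health_problems(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the install looks healthy."""
    problems = []
    if not payload.get("ok"):
        problems.append("service did not report ok")
    if not payload.get("paddleocrInstalled"):
        problems.append("paddleocr is not installed")
    if not payload.get("paddleInstalled"):
        problems.append("paddle is not installed")
    return problems


def is_warmed_up(payload: Dict[str, Any]) -> bool:
    """True once the engine has been loaded by a first OCR request."""
    return bool(payload.get("engineReady"))


if __name__ == "__main__":
    # Sample payload copied from the expected /health response in this document.
    sample = {
        "ok": True,
        "paddleocrInstalled": True,
        "paddleInstalled": True,
        "engineReady": False,
        "defaultLanguage": "vi,en",
        "defaultProfile": "vietnamese",
    }
    print(find_health_problems(sample), is_warmed_up(sample))
```

Treating engineReady separately avoids flagging a freshly started but correctly installed service as broken.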

Configuration

  • PADDLEOCR_DEVICE: cpu, gpu:0, or another PaddleOCR device string.
  • OPENSCREEN_OCR_PROFILE: fast, vietnamese, or hybrid. The default vietnamese profile upscales and sharpens focused UI screenshots before OCR.
  • OPENSCREEN_GUIDE_OCR_LANGUAGE: defaults to vi,en.
  • PADDLEOCR_LANG: optional hard override. Leave unset for the app profile/language settings to work.
  • PADDLEOCR_VERSION: defaults to PP-OCRv5.
  • PADDLEOCR_USE_MOBILE: defaults to 1; set to 0 to use the default/server models.
  • PADDLEOCR_REC_MODEL: optional recognizer model override. The bundled profile uses latin_PP-OCRv5_mobile_rec, which supports Vietnamese Latin-script text.
  • OPENSCREEN_GUIDE_OCR_URL: OpenScreen OCR endpoint override; defaults to http://127.0.0.1:8866.
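
To make the interplay of these settings concrete, here is a hedged Python sketch of how the documented defaults and the PADDLEOCR_LANG override could be resolved. It is not the service's actual code: the function names are hypothetical, and the precedence (PADDLEOCR_LANG first, then the language sent by the app, then OPENSCREEN_GUIDE_OCR_LANGUAGE) and the rule that only the value 0 disables mobile models are assumptions read from the list above.

```python
import os
from typing import Mapping, Optional

# Profile names and defaults as listed in this document.
KNOWN_PROFILES = ("fast", "vietnamese", "hybrid")
DEFAULT_PROFILE = "vietnamese"
DEFAULT_LANGUAGE = "vi,en"


def resolve_profile(env: Mapping[str, str]) -> str:
    """Pick the OCR profile, falling back to the documented default."""
    profile = env.get("OPENSCREEN_OCR_PROFILE", DEFAULT_PROFILE)
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown OCR profile: {profile!r}")
    return profile


def resolve_language(env: Mapping[str, str], requested: Optional[str] = None) -> str:
    """Assumed precedence: hard override, then per-request language, then default."""
    override = env.get("PADDLEOCR_LANG")
    if override:
        return override
    if requested:
        return requested
    return env.get("OPENSCREEN_GUIDE_OCR_LANGUAGE", DEFAULT_LANGUAGE)


def use_mobile_models(env: Mapping[str, str]) -> bool:
    """Mobile models are on by default; only "0" switches them off (assumption)."""
    return env.get("PADDLEOCR_USE_MOBILE", "1") != "0"


if __name__ == "__main__":
    print(resolve_profile(os.environ))
    print(resolve_language(os.environ))
    print(use_mobile_models(os.environ))
```

The sketch shows why PADDLEOCR_LANG should normally stay unset: once it has a value, the language chosen by the app is never consulted.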