Add auto guide generation with bundled OCR

This commit is contained in:
huanld
2026-05-28 07:07:30 +07:00
parent 8117d4826f
commit 07e006dc7f
61 changed files with 8734 additions and 193 deletions
+11
View File
@@ -20,6 +20,10 @@ dist-ssr
/electron/native/screencapturekit/.build/
/electron/native/screencapturekit/.swiftpm/
/electron/native/bin/
/tools/ocr/build/
/tools/ocr/dist/
/tools/ocr/models/**/.gitattributes
/tools/ocr/models/**/README.md
# Native macOS generated files
DerivedData/
@@ -40,6 +44,8 @@ xcuserdata/
release/**
*.kiro/
.claude/
__pycache__/
*.py[cod]
# npx electron-builder --mac --win
# Playwright
@@ -63,3 +69,8 @@ result-*
#others
**/*.import
# Local agent/tooling state
/.agent/
/.serena/
/.venv-ocr-build/
+935
View File
@@ -0,0 +1,935 @@
# Quy trình triển khai Auto User Guide Generation
Mục tiêu của tính năng này là biến OpenScreen từ công cụ quay màn hình thành công cụ tự tạo tài liệu hướng dẫn sử dụng phần mềm. Người dùng bật Guide Mode, quay thao tác như bình thường, hệ thống ghi lại thời điểm click hoặc hotkey, trích ảnh từ video sau khi quay xong, chạy OCR local để đọc chữ trên giao diện, sau đó dùng AI tạo bản nháp hướng dẫn từng bước.
Tài liệu này được viết để có thể bắt đầu coding ngay: có kiến trúc, schema, file cần thêm/sửa, thứ tự task, tiêu chí test và định nghĩa MVP.
## Trạng Thái MVP Hiện Tại
- Đã có Guide Mode trong HUD, ghi click/marker vào `.guide.json`.
- Đã có GuidePanel trong editor để chạy: prepare events, capture snapshots, OCR, generate draft, export Markdown/HTML.
- Đã có local deterministic draft để test không cần DeepSeek key.
- DeepSeek được gọi khi chọn provider `DeepSeek` và có `DEEPSEEK_API_KEY`.
- OCR local mặc định gọi `OPENSCREEN_GUIDE_OCR_URL` hoặc `http://127.0.0.1:8866/ocr`.
- Verification hiện tại: targeted guide tests pass, `npm test` pass, `npm run build-vite` pass, `npm run i18n:check` pass.
## Mục Tiêu Sản Phẩm
Flow người dùng:
1. Bật Guide Mode.
2. Quay màn hình phần mềm cần hướng dẫn.
3. Trong lúc quay, hệ thống tự ghi timestamp các click chuột.
4. Người dùng có thể bấm một hotkey/nút marker nếu muốn đánh dấu bước thủ công.
5. Sau khi dừng quay, hệ thống trích ảnh màn hình từ video tại các timestamp đó.
6. OCR local đọc text trên ảnh giao diện.
7. Hệ thống map vị trí click tới text/control gần nhất.
8. AI Agent tạo tài liệu dạng từng bước.
9. Người dùng review, sửa nội dung, export Markdown/HTML.
Ví dụ output:
```md
# Hướng dẫn xuất báo cáo
## Bước 1: Mở phần cài đặt
Nhấn nút **Settings** ở thanh điều hướng bên trái.
## Bước 2: Chọn Export
Trong màn hình Settings, chọn **Export report**.
```
## Phạm Vi MVP
MVP cần làm:
- Bật/tắt Guide Mode trước khi quay.
- Tận dụng recorder hiện tại, không viết recorder mới.
- Tận dụng `.cursor.json` hiện tại để lấy click timestamp.
- Thêm marker bằng hotkey hoặc nút trên HUD.
- Tạo sidecar `.guide.json` riêng cho guide.
- Trích screenshot sau khi quay xong, từ video đã lưu.
- OCR local bằng PaddleOCR service.
- Tạo step candidate từ click position + OCR blocks.
- Gọi DeepSeek bằng text metadata, không gửi ảnh mặc định.
- Có panel review trong editor.
- Export Markdown và HTML.
Không làm trong MVP:
- Không chụp screenshot realtime trong lúc quay nếu chưa có benchmark cần thiết.
- Không gửi raw screenshot lên cloud AI mặc định.
- Không sửa schema `.cursor.json` nếu không bắt buộc.
- Không build full UI automation engine.
- Không làm PDF/DOCX ngay.
- Không bundle OCR runtime vào app packaged ngay.
## Code Hiện Có Cần Tận Dụng
Các điểm đã có trong codebase:
- Recording orchestration: `src/hooks/useScreenRecorder.ts`
- Launch/HUD UI: `src/components/launch/LaunchWindow.tsx`
- Source selection: `src/components/launch/SourceSelector.tsx`
- Editor chính: `src/components/video-editor/VideoEditor.tsx`
- Project/session persistence: `src/components/video-editor/projectPersistence.ts`
- Cursor contracts: `src/native/contracts.ts`
- Hook đọc cursor data: `src/native/hooks/useCursorRecordingData.ts`
- IPC main handlers: `electron/ipc/handlers.ts`
- Native bridge: `electron/ipc/nativeBridge.ts`
- Cursor service: `electron/native-bridge/services/cursorService.ts`
- Windows cursor recording: `electron/native-bridge/cursor/recording/windowsNativeRecordingSession.ts`
- macOS cursor recording: `electron/native-bridge/cursor/recording/macNativeCursorRecordingSession.ts`
- Frame/export primitives: `src/lib/exporter/frameRenderer.ts`
Nhận định kỹ thuật:
- Windows/macOS native cursor recording đã có dữ liệu click.
- Cursor sample hiện có thể có `interactionType: "click" | "mouseup" | "move"`.
- Editor hiện đã dùng click timestamp để render hiệu ứng click.
- Vì schema cursor đang được nhiều nơi dùng, MVP nên tạo `.guide.json` riêng thay vì mở rộng `.cursor.json`.
## Kiến Trúc Tổng Thể
```mermaid
flowchart TD
A["User bật Guide Mode"] --> B["Quay video bằng recorder hiện tại"]
B --> C["Cursor recorder ghi click timestamp"]
B --> D["Hotkey/HUD marker ghi manual event"]
C --> E["Dừng quay"]
D --> E
E --> F["Guide assembler tạo .guide.json"]
F --> G["Snapshot extractor seek video và xuất PNG"]
G --> H["PaddleOCR local đọc text + bounding boxes"]
H --> I["Target mapper map click tới OCR text/control"]
I --> J["DeepSeek/local LLM viết draft guide"]
J --> K["GuidePanel cho user review/sửa"]
K --> L["Export Markdown/HTML"]
```
Quyết định chính:
- Realtime recording chỉ ghi event/timestamp, không xử lý OCR/AI.
- Screenshot được trích từ video sau khi quay, tránh ảnh hưởng performance recorder.
- OCR chạy local-first.
- DeepSeek chỉ nhận text metadata trừ khi user opt-in gửi ảnh.
- Guide data nằm cạnh recording artifact.
## File Cần Thêm
```text
src/guide/
contracts.ts
eventBuilder.ts
targetMapper.ts
promptBuilder.ts
generatedGuideSchema.ts
snapshot/
extractGuideSnapshots.ts
export/
markdownExporter.ts
htmlExporter.ts
__tests__/
eventBuilder.test.ts
targetMapper.test.ts
promptBuilder.test.ts
markdownExporter.test.ts
src/components/video-editor/guide/
GuidePanel.tsx
GuideStepList.tsx
GuideStepEditor.tsx
GuideSnapshotPreview.tsx
electron/guide/
guideStore.ts
guidePaths.ts
guideIpc.ts
ocr/
paddleOcrClient.ts
ai/
deepseekGuideClient.ts
```
File hiện có khả năng phải sửa:
- `src/hooks/useScreenRecorder.ts`
- `src/components/launch/LaunchWindow.tsx`
- `src/components/video-editor/VideoEditor.tsx`
- `electron/ipc/handlers.ts`
- `electron/preload.ts`
- file khai báo type cho `window.electronAPI`
- `package.json` nếu thêm script test hoặc dependency nhỏ
## Artifact Đầu Ra
Với video `recording-123.mp4`, hệ thống tạo:
```text
recording-123.mp4
recording-123.cursor.json
recording-123.guide.json
recording-123-guide/
step-001.png
step-002.png
ocr.json
guide.md
guide.html
```
Quy tắc:
- `.cursor.json` vẫn là dữ liệu cursor gốc.
- `.guide.json` là source of truth cho guide workflow.
- Folder `recording-123-guide/` chứa file phát sinh từ guide.
- `guide.md``guide.html` có thể được tạo lại từ `.guide.json`.
## Contract Chính
Tạo `src/guide/contracts.ts`.
```ts
export type GuideEventKind = "click" | "hotkey" | "manual";
export type GuideEventSource =
| "cursor-recording"
| "guide-hotkey"
| "review-ui";
export interface GuideEvent {
id: string;
recordingId: string;
kind: GuideEventKind;
source: GuideEventSource;
timeMs: number;
x?: number;
y?: number;
normalizedX?: number;
normalizedY?: number;
button?: "left" | "right" | "middle" | "unknown";
label?: string;
screenshotOffsetMs?: number;
createdAt: string;
}
export interface GuideSnapshot {
id: string;
eventId: string;
timeMs: number;
offsetMs: number;
path: string;
width: number;
height: number;
}
export interface OcrBlock {
id: string;
snapshotId: string;
text: string;
confidence: number;
box: {
x: number;
y: number;
width: number;
height: number;
};
}
export interface GuideStepCandidate {
id: string;
eventId: string;
snapshotId?: string;
timeMs: number;
action: "click" | "choose" | "type" | "wait" | "manual";
targetText?: string;
targetRole?: "button" | "menu" | "tab" | "field" | "link" | "unknown";
nearbyText: string[];
confidence: number;
}
export interface GeneratedGuideStep {
id: string;
order: number;
title: string;
instruction: string;
screenshotPath?: string;
sourceCandidateId?: string;
}
export interface GeneratedGuide {
title: string;
summary?: string;
steps: GeneratedGuideStep[];
}
export interface GuideSession {
schemaVersion: 1;
recordingId: string;
videoPath: string;
cursorPath?: string;
guidePath: string;
outputDir: string;
status:
| "recording"
| "events-ready"
| "snapshots-ready"
| "ocr-ready"
| "draft-ready"
| "reviewed";
events: GuideEvent[];
snapshots: GuideSnapshot[];
ocrBlocks: OcrBlock[];
candidates: GuideStepCandidate[];
generatedGuide?: GeneratedGuide;
createdAt: string;
updatedAt: string;
}
```
Quy tắc dữ liệu:
- `timeMs` luôn tính theo timeline video cuối cùng.
- `x/y` là tọa độ pixel nếu có.
- `normalizedX/Y` dùng để chống lệch khi video scale.
- `screenshotOffsetMs` mặc định `500`, nghĩa là lấy ảnh sau click 0.5 giây để bắt trạng thái UI sau thao tác.
- AI output chỉ là draft, user edit mới là nội dung cuối.
## IPC Cần Thêm
MVP dùng app-level Electron IPC, không cần đưa vào native bridge vì đây là workflow cấp ứng dụng.
Preload API đề xuất:
```ts
window.electronAPI.guide = {
startSession(recordingId: string): Promise<GuideSession>;
addMarker(input: AddGuideMarkerInput): Promise<GuideEvent>;
finalizeEvents(input: FinalizeGuideEventsInput): Promise<GuideSession>;
writeSnapshot(input: WriteGuideSnapshotInput): Promise<GuideSnapshot>;
runOcr(input: RunGuideOcrInput): Promise<GuideSession>;
generateDraft(input: GenerateGuideDraftInput): Promise<GuideSession>;
saveGuide(input: SaveGuideInput): Promise<GuideSession>;
exportMarkdown(input: ExportGuideInput): Promise<{ path: string }>;
exportHtml(input: ExportGuideInput): Promise<{ path: string }>;
};
```
Input types:
```ts
export interface AddGuideMarkerInput {
recordingId: string;
timeMs: number;
kind: "hotkey" | "manual";
label?: string;
}
export interface FinalizeGuideEventsInput {
recordingId: string;
videoPath: string;
cursorPath?: string;
}
export interface WriteGuideSnapshotInput {
recordingId: string;
eventId: string;
timeMs: number;
offsetMs: number;
pngBytes: ArrayBuffer;
width: number;
height: number;
}
export interface RunGuideOcrInput {
recordingId: string;
snapshotIds?: string[];
}
export interface GenerateGuideDraftInput {
recordingId: string;
language: "vi" | "en";
provider: "deepseek" | "local";
}
export interface SaveGuideInput {
recordingId: string;
generatedGuide: GeneratedGuide;
}
export interface ExportGuideInput {
recordingId: string;
}
```
## Phase 1: Contracts, Store, IPC
Mục tiêu: tạo khung lưu trữ `.guide.json` mà chưa đụng recorder.
Task coding:
1. Tạo `src/guide/contracts.ts`.
2. Tạo `electron/guide/guidePaths.ts`.
3. Tạo `electron/guide/guideStore.ts`.
4. Tạo `electron/guide/guideIpc.ts`.
5. Register guide IPC trong `electron/ipc/handlers.ts`.
6. Expose API trong `electron/preload.ts`.
7. Bổ sung type cho `window.electronAPI.guide`.
Yêu cầu kỹ thuật:
- Ghi file atomically: write temp file rồi rename.
- Validate `schemaVersion`.
- Không throw raw error ra renderer, trả error code ổn định.
- Không yêu cầu AI/OCR trong phase này.
Acceptance:
- Tạo được guide session fake bằng IPC.
- Đọc/ghi `.guide.json` round-trip không mất dữ liệu.
- Input thiếu `recordingId` hoặc `videoPath` bị reject rõ ràng.
Test:
- `guideStore` tạo path đúng.
- `guideStore` đọc file lỗi schema và trả error.
- IPC handler reject input thiếu field.
## Phase 2: Build Event Từ Cursor Click
Mục tiêu: lấy click event từ `.cursor.json` hiện tại.
Task coding:
1. Tạo `src/guide/eventBuilder.ts`.
2. Thêm hàm `buildGuideEventsFromCursor`.
3. Lọc sample có `interactionType === "click"`.
4. Convert sang `GuideEvent`.
5. De-duplicate click trong cửa sổ `250ms`.
6. Sort theo `timeMs`.
7. Merge với marker thủ công nếu có.
Pseudo-code:
```ts
export function buildGuideEventsFromCursor(input: {
recordingId: string;
samples: CursorRecordingSample[];
videoWidth?: number;
videoHeight?: number;
}): GuideEvent[] {
const events = input.samples
.filter((sample) => sample.interactionType === "click")
.map((sample) => ({
id: createGuideEventId(input.recordingId, sample.timeMs),
recordingId: input.recordingId,
kind: "click" as const,
source: "cursor-recording" as const,
timeMs: sample.timeMs,
x: sample.cx,
y: sample.cy,
normalizedX: normalize(sample.cx, input.videoWidth),
normalizedY: normalize(sample.cy, input.videoHeight),
button: "left" as const,
screenshotOffsetMs: 500,
createdAt: new Date().toISOString(),
}));
return sortGuideEvents(dedupeGuideEvents(events));
}
```
Acceptance:
- 5 click samples tạo 5 guide events.
- `move``mouseup` không tạo step.
- Double click hoặc click bounce không tạo quá nhiều step nếu nằm trong dedupe window.
- Không có cursor click thì vẫn dùng được manual marker.
Test:
- convert click sample.
- bỏ qua move/mouseup.
- dedupe theo thời gian.
- sort đúng thứ tự.
- xử lý sample thiếu tọa độ.
## Phase 3: Guide Mode UI Và Manual Marker
Mục tiêu: user bật được Guide Mode và đánh dấu bước thủ công.
Task coding:
1. Thêm Guide Mode toggle trong `LaunchWindow.tsx`.
2. Truyền trạng thái guide vào flow recording trong `useScreenRecorder.ts`.
3. Khi start recording và Guide Mode on, gọi `guide.startSession(recordingId)`.
4. Thêm nút marker trong HUD.
5. Thêm global hotkey ở Electron main, ví dụ `CommandOrControl+Shift+G`.
6. Khi bấm marker/hotkey, gọi `guide.addMarker`.
7. Khi stop recording, gọi `guide.finalizeEvents`.
Lưu ý:
- Global hotkey phải nằm ở Electron main vì app đang được quay có thể đang focus.
- Nếu register hotkey fail, UI vẫn dùng nút marker.
- Không làm thay đổi behavior khi Guide Mode off.
Acceptance:
- Guide Mode off: quay/sửa/export vẫn như cũ.
- Guide Mode on: stop recording tạo `.guide.json`.
- Hotkey tạo event đúng timestamp.
- Cancel recording không để lại guide artifact rác.
## Phase 4: Snapshot Extraction
Mục tiêu: trích ảnh PNG cho từng event sau khi quay xong.
Quyết định MVP:
- Không chụp realtime trong lúc quay.
- Dùng video đã lưu, seek tới timestamp cần lấy.
- Thực hiện trong renderer/editor bằng hidden `<video>` + `<canvas>`.
- Persist PNG qua IPC.
Task coding:
1. Tạo `src/guide/snapshot/extractGuideSnapshots.ts`.
2. Nhận `GuideSession` + `videoPath`.
3. Với mỗi event, lấy timestamp `event.timeMs + screenshotOffsetMs`.
4. Clamp timestamp vào duration video.
5. Seek hidden video tới timestamp.
6. Draw frame vào canvas.
7. Convert canvas thành PNG bytes.
8. Gọi `guide.writeSnapshot`.
9. Update `.guide.json` với danh sách snapshots.
Acceptance:
- Mỗi event có một ảnh `step-xxx.png`.
- Nếu một event fail snapshot, các event khác vẫn chạy.
- Ảnh được lưu trong `recording-123-guide/`.
- UI báo lỗi recoverable, không crash editor.
Test:
- clamp timestamp.
- tên file đúng thứ tự.
- handle seek timeout.
- không abort toàn bộ batch khi một frame lỗi.
## Phase 5: OCR Local
Mục tiêu: đọc text trên giao diện phần mềm từ screenshot.
Khuyến nghị:
- Dùng PaddleOCR làm OCR chính.
- Tesseract chỉ nên là fallback đơn giản.
- VLM local như Gemma 3 4B/MiniCPM/Qwen-VL chỉ dùng cho trường hợp icon/no-text khó, không dùng làm OCR chính.
Kiến trúc MVP:
- Chạy PaddleOCR như local HTTP service tại `127.0.0.1:8866`.
- Electron main gọi OCR service.
- Renderer không gọi OCR trực tiếp.
API local OCR đề xuất:
```http
GET /health
POST /ocr
Content-Type: application/json
{
"imagePath": "D:\\Code\\OpenScreen\\recording-123-guide\\step-001.png",
"language": "vi,en"
}
```
Response:
```json
{
"blocks": [
{
"text": "Settings",
"confidence": 0.97,
"box": { "x": 120, "y": 80, "width": 90, "height": 24 }
}
]
}
```
Task coding:
1. Tạo `electron/guide/ocr/paddleOcrClient.ts`.
2. Thêm health check.
3. Gọi OCR cho từng snapshot.
4. Convert output sang `OcrBlock`.
5. Ghi `ocrBlocks` vào `.guide.json`.
6. Ghi bản tổng hợp vào `recording-123-guide/ocr.json`.
Config đề xuất:
```ts
export interface GuideOcrConfig {
provider: "paddleocr";
baseUrl: string; // default http://127.0.0.1:8866
language: string; // default vi,en
timeoutMs: number; // default 30000
}
```
Acceptance:
- OCR service offline thì UI báo lỗi rõ ràng.
- OCR fail không xóa snapshots.
- OCR result có text, confidence, bounding box.
- Guide vẫn export thủ công được nếu OCR không chạy.
## Phase 6: Target Mapper
Mục tiêu: xác định user đã click vào nút/menu/field nào dựa trên tọa độ click và OCR.
Task coding:
1. Tạo `src/guide/targetMapper.ts`.
2. Với mỗi `GuideEvent`, lấy snapshot tương ứng.
3. Lấy OCR blocks của snapshot đó.
4. Score từng OCR block.
5. Chọn target tốt nhất.
6. Sinh `GuideStepCandidate`.
Scoring đề xuất:
- `+100` nếu click nằm trong OCR box.
- Điểm cao hơn nếu box center gần click hơn.
- Cộng điểm nếu text ngắn, giống label button/menu.
- Trừ điểm nếu confidence thấp.
- Nếu không có block đủ tốt, để `targetRole: "unknown"`.
Role heuristic:
- `button`: click vào/near text dạng action label.
- `menu`: text nằm trong danh sách dọc.
- `tab`: text nằm trong hàng ngang gần đầu giao diện.
- `field`: click vào vùng giống input.
- `unknown`: không đủ tự tin.
Acceptance:
- Click trực tiếp vào nút text map đúng target text.
- Click gần label map được OCR block gần nhất.
- Click vùng icon/no-text tạo candidate confidence thấp để user review.
Test:
- click inside box.
- nearest box.
- low-confidence penalty.
- no OCR fallback.
## Phase 7: AI Draft Generation
Mục tiêu: tạo bản nháp hướng dẫn từ candidate metadata.
Provider MVP:
- DeepSeek API cho cloud text generation.
- Local LLM có thể thêm sau qua cùng prompt contract.
- Không gửi ảnh lên DeepSeek mặc định.
Task coding:
1. Tạo `src/guide/promptBuilder.ts`.
2. Tạo `electron/guide/ai/deepseekGuideClient.ts`.
3. Đọc API key ở Electron main qua env/config.
4. Build prompt từ candidates + OCR nearby text.
5. Yêu cầu output JSON.
6. Validate output.
7. Ghi `generatedGuide` vào `.guide.json`.
Env:
```powershell
$env:DEEPSEEK_API_KEY="..."
$env:DEEPSEEK_BASE_URL="https://api.deepseek.com"
$env:DEEPSEEK_MODEL="deepseek-v4-flash"
```
Prompt input:
```json
{
"language": "vi",
"softwareContext": {
"recordingName": "recording-123",
"userGoal": "Tạo báo cáo"
},
"steps": [
{
"order": 1,
"eventKind": "click",
"targetText": "Settings",
"targetRole": "button",
"nearbyText": ["Home", "Settings", "Account"],
"confidence": 0.91
}
]
}
```
Expected AI output:
```json
{
"title": "Hướng dẫn thao tác",
"summary": "Tài liệu này mô tả các bước thực hiện thao tác đã ghi hình.",
"steps": [
{
"order": 1,
"title": "Mở phần cài đặt",
"instruction": "Nhấn nút Settings ở thanh điều hướng bên trái."
}
]
}
```
Acceptance:
- Thiếu API key thì UI báo lỗi rõ ràng.
- AI trả invalid JSON thì reject và cho retry.
- Output được validate trước khi lưu.
- Có thể generate tiếng Việt.
Test:
- promptBuilder không đưa raw image vào prompt.
- parser reject JSON sai schema.
- DeepSeek client handle timeout/401/rate limit.
## Phase 8: GuidePanel Review UI
Mục tiêu: người dùng sửa được guide trước khi export.
Task coding:
1. Tạo `src/components/video-editor/guide/GuidePanel.tsx`.
2. Tạo `GuideStepList.tsx`.
3. Tạo `GuideStepEditor.tsx`.
4. Tạo `GuideSnapshotPreview.tsx`.
5. Mount panel trong `VideoEditor.tsx`.
6. Load `.guide.json` khi mở video có guide sidecar.
7. Thêm action:
- Generate snapshots.
- Run OCR.
- Generate AI draft.
- Save edits.
- Export Markdown.
- Export HTML.
UX:
- AI output là draft, không khóa nội dung.
- User sửa title/instruction từng step.
- User xóa step nhiễu.
- User merge step sau nếu cần.
- Confidence hiển thị nhỏ, không làm UI rối.
- Guide fail không ảnh hưởng video editing.
Acceptance:
- User sửa step và save được.
- User xóa step được.
- Regenerate cần confirm nếu đang có manual edits.
- Export dùng nội dung đã sửa, không dùng lại AI raw output.
## Phase 9: Export Markdown/HTML
Mục tiêu: tạo tài liệu dùng được ngay.
Task coding:
1. Tạo `src/guide/export/markdownExporter.ts`.
2. Tạo `src/guide/export/htmlExporter.ts`.
3. Gọi exporter từ Electron IPC.
4. Ghi file vào `recording-123-guide/guide.md`.
5. Ghi file vào `recording-123-guide/guide.html`.
6. Dùng relative screenshot link.
Markdown format:
```md
# Hướng dẫn thao tác
Tài liệu này mô tả các bước thực hiện thao tác đã ghi hình.
## Bước 1: Mở phần cài đặt
Nhấn nút **Settings** ở thanh điều hướng bên trái.
![Bước 1](step-001.png)
```
Acceptance:
- Markdown mở được và thấy ảnh local.
- HTML mở được bằng browser.
- Export vẫn chạy nếu guide được viết thủ công, không cần AI.
## Thứ Tự Coding Ngay
Nên làm theo thứ tự này để giảm rủi ro:
1. Tạo `src/guide/contracts.ts`.
2. Tạo `electron/guide/guidePaths.ts`.
3. Tạo `electron/guide/guideStore.ts`.
4. Tạo `electron/guide/guideIpc.ts`.
5. Expose `window.electronAPI.guide`.
6. Viết unit test cho guide store.
7. Tạo `src/guide/eventBuilder.ts`.
8. Viết unit test convert cursor samples sang guide events.
9. Thêm Guide Mode toggle vào launch UI.
10. Gọi `startSession` khi bắt đầu quay.
11. Gọi `finalizeEvents` khi dừng quay.
12. Tạo snapshot extractor trong renderer.
13. Tạo `paddleOcrClient`.
14. Tạo `targetMapper`.
15. Tạo `promptBuilder`.
16. Tạo `deepseekGuideClient`.
17. Tạo `GuidePanel`.
18. Tạo Markdown/HTML exporters.
19. Chạy lint/test/build.
20. Test thủ công flow đầy đủ.
Chia PR đề xuất:
- PR 1: contracts, store, IPC, unit tests.
- PR 2: cursor-click event builder, Guide Mode toggle, manual marker.
- PR 3: snapshot extraction và GuidePanel shell.
- PR 4: PaddleOCR integration và target mapping.
- PR 5: DeepSeek generation, review UI, Markdown/HTML export.
## Error Codes
Dùng error code ổn định để UI xử lý:
```ts
export type GuideErrorCode =
| "guide-session-not-found"
| "guide-invalid-schema"
| "guide-video-load-failed"
| "guide-snapshot-failed"
| "guide-ocr-unavailable"
| "guide-ocr-failed"
| "guide-ai-key-missing"
| "guide-ai-request-failed"
| "guide-ai-invalid-output"
| "guide-export-failed";
```
Quy tắc:
- IPC không throw raw provider error ra renderer.
- OCR fail là recoverable.
- AI fail là recoverable.
- Export fail phải giữ nguyên `.guide.json`.
## Local Development
Baseline:
```powershell
npm install
npm run lint
npm test
npm run build-vite
```
OCR service dev:
```powershell
python -m venv .venv-ocr
.venv-ocr\Scripts\Activate.ps1
pip install paddleocr fastapi uvicorn
uvicorn local_ocr_service:app --host 127.0.0.1 --port 8866
```
DeepSeek env:
```powershell
$env:DEEPSEEK_API_KEY="..."
$env:DEEPSEEK_BASE_URL="https://api.deepseek.com"
$env:DEEPSEEK_MODEL="deepseek-v4-flash"
```
Không commit API key.
## Testing Matrix
Unit tests:
- `eventBuilder`: cursor sample -> guide events.
- `targetMapper`: OCR blocks -> step candidates.
- `promptBuilder`: candidates -> AI prompt.
- `markdownExporter`: generated guide -> Markdown.
- `htmlExporter`: generated guide -> HTML.
Renderer/browser tests:
- snapshot extractor seek video fixture.
- GuidePanel edit/delete/save step.
Manual integration:
1. Quay với Guide Mode off, xác nhận behavior cũ không đổi.
2. Quay với Guide Mode on và 3 click.
3. Kiểm tra `.guide.json` có 3 click events.
4. Generate snapshots.
5. Run OCR local.
6. Generate Vietnamese draft.
7. Sửa một step.
8. Export Markdown.
9. Mở Markdown/HTML xem ảnh local.
10. Tắt OCR service và test lỗi recoverable.
11. Xóa DeepSeek key và test lỗi recoverable.
Lệnh trước khi merge:
```powershell
npm run lint
npm test
npm run build-vite
```
Nếu phase không sửa native recorder thì chưa cần chạy native helper tests.
## Definition Of Done Cho MVP
MVP được xem là xong khi:
- Guide Mode bật/tắt được.
- Guide Mode off không ảnh hưởng recording hiện tại.
- Click events lấy được từ cursor telemetry hiện có.
- Hotkey/HUD marker tạo event thủ công.
- `.guide.json` được tạo cạnh recording.
- Snapshot PNG được trích từ final video.
- PaddleOCR local đọc được text và bounding boxes.
- Target mapper tạo step candidates.
- DeepSeek tạo được draft tiếng Việt từ text metadata.
- User review/sửa/xóa step được.
- Export Markdown/HTML dùng nội dung đã review.
- Lint/test/build pass.
## Nâng Cấp Sau MVP
- Export PDF/DOCX.
- Bundle local OCR runtime vào packaged app.
- Thêm local VLM fallback cho icon-only control.
- Cho phép user opt-in gửi crop ảnh lên remote vision model.
- Merge/dedupe step thông minh hơn cho double click/menu navigation.
- Dùng transcript giọng nói làm ngữ cảnh thêm.
- Template theo từng loại phần mềm.
- Computer vision detect UI element ngoài OCR.
@@ -0,0 +1,73 @@
# PaddleOCR Local Service
OpenScreen calls OCR through a local HTTP service. The default endpoint is:
```text
http://127.0.0.1:8866/ocr
```
The app sends either `imageBase64` or `path` and expects OCR blocks:
```json
{
"blocks": [
{
"text": "Settings",
"confidence": 0.97,
"box": { "x": 120, "y": 80, "width": 90, "height": 24 }
}
]
}
```
## Install
Use a separate virtual environment because PaddleOCR and PaddlePaddle are large dependencies.
```powershell
python -m venv .venv-ocr
.\.venv-ocr\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r tools\ocr\requirements.txt
```
If `paddle` is still missing after installing `paddleocr`, install the CPU PaddlePaddle wheel that matches your Python and OS from the official PaddlePaddle install guide.
## Run
```powershell
.\.venv-ocr\Scripts\Activate.ps1
$env:PADDLEOCR_DEVICE="cpu"
$env:PADDLEOCR_LANG="latin"
npm run ocr:paddle
```
Keep this terminal open while using the Guide OCR step in OpenScreen.
## Verify
```powershell
Invoke-WebRequest http://127.0.0.1:8866/health -UseBasicParsing
```
Expected healthy environment:
```json
{
"ok": true,
"paddleocrInstalled": true,
"paddleInstalled": true,
"engineReady": false,
"defaultLanguage": "latin"
}
```
`engineReady` becomes `true` after the first OCR request. The first request can be slow because PaddleOCR downloads and loads models.
## Configuration
- `PADDLEOCR_DEVICE`: `cpu`, `gpu:0`, or another PaddleOCR device string.
- `PADDLEOCR_LANG`: defaults to `latin`; this is preferred for Vietnamese UI text because it uses a Latin-script recognition model.
- `PADDLEOCR_VERSION`: defaults to `PP-OCRv5`.
- `PADDLEOCR_USE_MOBILE`: defaults to `1`; set to `0` to use the default/server models.
- `OPENSCREEN_GUIDE_OCR_URL`: OpenScreen OCR endpoint override; defaults to `http://127.0.0.1:8866`.
+31 -16
View File
@@ -3,12 +3,15 @@
"$schema": "https://raw.githubusercontent.com/electron-userland/electron-builder/master/packages/app-builder-lib/scheme.json",
"appId": "com.siddharthvaddem.openscreen",
"asar": true,
// .node binaries can't be dlopen'd from inside an asar — must live unpacked.
// .node binaries cannot be loaded from inside an asar; keep them unpacked.
"asarUnpack": [
"**/*.node"
],
"productName": "Openscreen",
"npmRebuild": true,
"productName": "Openscreen",
"toolsets": {
"winCodeSign": "1.1.0"
},
"npmRebuild": true,
"buildDependenciesFromSource": true,
"compression": "normal",
"directories": {
@@ -71,19 +74,31 @@
"artifactName": "${productName}-Linux-${version}.${ext}",
"category": "AudioVideo"
},
"win": {
"target": [
"nsis"
],
"icon": "icons/icons/win/icon.ico",
"extraResources": [
{
"from": "electron/native/bin",
"to": "electron/native/bin",
"filter": ["win32-*/*"]
}
]
},
"win": {
"target": [
"nsis"
],
"icon": "icons/icons/win/icon.ico",
"signAndEditExecutable": false,
"signExts": ["!.exe"],
"extraResources": [
{
"from": "electron/native/bin",
"to": "electron/native/bin",
"filter": ["win32-*/*"]
},
{
"from": "tools/ocr/dist/openscreen-ocr-service",
"to": "ocr-service",
"filter": ["**/*"]
},
{
"from": "tools/ocr/models/paddlex",
"to": "ocr-models/paddlex",
"filter": ["**/*"]
}
]
},
"nsis": {
"oneClick": false,
"allowToChangeInstallationDirectory": true
+88
View File
@@ -27,6 +27,94 @@ interface Window {
invokeNativeBridge: <TData = unknown>(
request: import("../src/native/contracts").NativeBridgeRequest,
) => Promise<import("../src/native/contracts").NativeBridgeResponse<TData>>;
guide: {
startSession: (
recordingId: import("../src/guide/contracts").GuideRecordingIdInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
readSession: (
recordingId: import("../src/guide/contracts").GuideRecordingIdInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
addMarker: (input: import("../src/guide/contracts").AddGuideMarkerInput) => Promise<
import("../src/guide/contracts").GuideIpcResult<{
session: import("../src/guide/contracts").GuideSession;
event: import("../src/guide/contracts").GuideEvent;
}>
>;
finalizeEvents: (
input: import("../src/guide/contracts").FinalizeGuideEventsInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
writeSnapshot: (
input: import("../src/guide/contracts").WriteGuideSnapshotInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
runOcr: (
input: import("../src/guide/contracts").RunGuideOcrInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
generateDraft: (
input: import("../src/guide/contracts").GenerateGuideDraftInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
getAiSettings: () => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideAiSettings
>
>;
saveAiSettings: (
input: import("../src/guide/contracts").SaveGuideAiSettingsInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideAiSettings
>
>;
saveGuide: (
input: import("../src/guide/contracts").SaveGuideInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").GuideSession
>
>;
exportMarkdown: (
input: import("../src/guide/contracts").ExportGuideInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").ExportGuideResult
>
>;
exportHtml: (
input: import("../src/guide/contracts").ExportGuideInput,
) => Promise<
import("../src/guide/contracts").GuideIpcResult<
import("../src/guide/contracts").ExportGuideResult
>
>;
discardSession: (input: import("../src/guide/contracts").DiscardGuideSessionInput) => Promise<
import("../src/guide/contracts").GuideIpcResult<{
discarded: true;
}>
>;
};
getSources: (opts: Electron.SourcesOptions) => Promise<ProcessedDesktopSource[]>;
switchToEditor: () => Promise<void>;
switchToHud: () => Promise<void>;
+181
View File
@@ -0,0 +1,181 @@
import type {
GeneratedGuide,
GuideLanguage,
GuideSession,
GuideStepCandidate,
} from "../../../src/guide/contracts";
import { buildGuideDraftPrompt } from "../../../src/guide/promptBuilder";
import type { DeepSeekGuideConfigProvider } from "./deepseekSettingsStore";
export interface GuideDraftClient {
generate(input: {
session: GuideSession;
candidates: GuideStepCandidate[];
language: GuideLanguage;
}): Promise<GeneratedGuide>;
}
export class DeepSeekGuideClientError extends Error {
constructor(
readonly code: "guide-ai-key-missing" | "guide-ai-request-failed" | "guide-ai-invalid-output",
message: string,
readonly retryable = false,
) {
super(message);
this.name = "DeepSeekGuideClientError";
}
}
interface DeepSeekChatResponse {
choices?: Array<{
message?: {
content?: string;
};
}>;
}
export class DeepSeekGuideClient implements GuideDraftClient {
constructor(
private readonly configProvider?: DeepSeekGuideConfigProvider,
private readonly fallbackApiKey = process.env.DEEPSEEK_API_KEY,
private readonly fallbackBaseUrl = process.env.DEEPSEEK_BASE_URL ?? "https://api.deepseek.com",
private readonly fallbackModel = process.env.DEEPSEEK_MODEL ?? "deepseek-chat",
) {}
async generate(input: {
session: GuideSession;
candidates: GuideStepCandidate[];
language: GuideLanguage;
}): Promise<GeneratedGuide> {
const config = await this.resolveConfig();
if (!config.apiKey) {
throw new DeepSeekGuideClientError(
"guide-ai-key-missing",
"DeepSeek API key is not configured.",
);
}
let response: Response;
try {
response = await fetch(`${config.baseUrl.replace(/\/$/, "")}/chat/completions`, {
method: "POST",
headers: {
"content-type": "application/json",
authorization: `Bearer ${config.apiKey}`,
},
body: JSON.stringify({
model: config.model,
temperature: 0.2,
response_format: { type: "json_object" },
messages: [
{
role: "system",
content:
"You convert UI interaction telemetry into concise software user-guide steps.",
},
{
role: "user",
content: buildGuideDraftPrompt(input),
},
],
}),
});
} catch (error) {
throw new DeepSeekGuideClientError(
"guide-ai-request-failed",
`DeepSeek request failed: ${error instanceof Error ? error.message : String(error)}`,
true,
);
}
if (!response.ok) {
throw new DeepSeekGuideClientError(
"guide-ai-request-failed",
`DeepSeek returned HTTP ${response.status}.`,
true,
);
}
const payload = (await response.json()) as DeepSeekChatResponse;
const content = payload.choices?.[0]?.message?.content;
if (!content) {
throw new DeepSeekGuideClientError(
"guide-ai-invalid-output",
"DeepSeek returned an empty response.",
);
}
return parseGeneratedGuide(content);
}
private async resolveConfig(): Promise<{ apiKey?: string; baseUrl: string; model: string }> {
if (this.configProvider) {
return await this.configProvider.getDeepSeekConfig();
}
return {
apiKey: this.fallbackApiKey,
baseUrl: this.fallbackBaseUrl,
model: this.fallbackModel,
};
}
}
function parseGeneratedGuide(content: string): GeneratedGuide {
try {
const parsed = JSON.parse(stripCodeFence(content)) as unknown;
const normalized = normalizeGeneratedGuide(parsed);
if (!normalized) {
throw new Error("Unexpected guide JSON shape.");
}
return normalized;
} catch (error) {
throw new DeepSeekGuideClientError(
"guide-ai-invalid-output",
`DeepSeek response is not valid guide JSON: ${error instanceof Error ? error.message : String(error)}`,
);
}
}
function stripCodeFence(content: string): string {
return content
.replace(/^```(?:json)?\s*/i, "")
.replace(/\s*```$/i, "")
.trim();
}
function normalizeGeneratedGuide(value: unknown): GeneratedGuide | null {
if (!value || typeof value !== "object") {
return null;
}
const guide = value as Partial<GeneratedGuide>;
if (typeof guide.title !== "string" || !Array.isArray(guide.steps)) {
return null;
}
const steps = guide.steps
.map((step, index) => {
if (!step || typeof step !== "object") {
return null;
}
const raw = step as Partial<GeneratedGuide["steps"][number]>;
if (typeof raw.title !== "string" || typeof raw.instruction !== "string") {
return null;
}
const order =
typeof raw.order === "number" && Number.isFinite(raw.order) ? raw.order : index + 1;
return {
id: typeof raw.id === "string" && raw.id.trim() ? raw.id : `guide-step-${order}`,
order,
title: raw.title,
instruction: raw.instruction,
...(typeof raw.screenshotPath === "string" ? { screenshotPath: raw.screenshotPath } : {}),
...(typeof raw.sourceCandidateId === "string"
? { sourceCandidateId: raw.sourceCandidateId }
: {}),
};
})
.filter((step): step is GeneratedGuide["steps"][number] => step !== null);
return {
title: guide.title,
summary: typeof guide.summary === "string" ? guide.summary : undefined,
steps,
};
}
+157
View File
@@ -0,0 +1,157 @@
import fs from "node:fs/promises";
import path from "node:path";
import type { GuideAiSettings, SaveGuideAiSettingsInput } from "../../../src/guide/contracts";
export interface DeepSeekGuideConfig {
apiKey?: string;
baseUrl: string;
model: string;
}
export interface DeepSeekGuideConfigProvider {
getDeepSeekConfig(): Promise<DeepSeekGuideConfig>;
}
interface PersistedGuideAiSettings {
schemaVersion: 1;
deepseek?: {
apiKeyEnvName?: string;
baseUrl?: string;
model?: string;
updatedAt?: string;
};
}
const DEFAULT_DEEPSEEK_API_KEY_ENV_NAME = "DEEPSEEK_API_KEY";
const DEFAULT_DEEPSEEK_BASE_URL = "https://api.deepseek.com";
const DEFAULT_DEEPSEEK_MODEL = "deepseek-chat";
export class DeepSeekSettingsStore implements DeepSeekGuideConfigProvider {
constructor(private readonly filePath: string) {}
async getStatus(): Promise<GuideAiSettings> {
const raw = await this.readSettings();
const apiKeyEnvName = normalizeEnvName(raw?.deepseek?.apiKeyEnvName);
const activeApiKey = process.env[apiKeyEnvName];
return {
deepseek: {
hasApiKey: Boolean(activeApiKey),
apiKeyEnvName,
baseUrl: normalizeBaseUrl(raw?.deepseek?.baseUrl ?? process.env.DEEPSEEK_BASE_URL),
model: normalizeModel(raw?.deepseek?.model ?? process.env.DEEPSEEK_MODEL),
storage: activeApiKey ? "environment" : "none",
encryptionAvailable: false,
updatedAt: raw?.deepseek?.updatedAt,
},
};
}
async save(input: SaveGuideAiSettingsInput): Promise<GuideAiSettings> {
const current = (await this.readSettings()) ?? { schemaVersion: 1 };
const currentDeepSeek = current.deepseek ?? {};
const nextDeepSeek = {
...currentDeepSeek,
baseUrl: normalizeBaseUrl(input.baseUrl ?? currentDeepSeek.baseUrl),
model: normalizeModel(input.model ?? currentDeepSeek.model),
updatedAt: new Date().toISOString(),
};
if (input.clearDeepseekApiKeyEnvName) {
delete nextDeepSeek.apiKeyEnvName;
} else if (input.deepseekApiKeyEnvName !== undefined) {
nextDeepSeek.apiKeyEnvName = normalizeEnvName(input.deepseekApiKeyEnvName);
}
await this.writeSettings({
schemaVersion: 1,
deepseek: nextDeepSeek,
});
return await this.getStatus();
}
async getDeepSeekConfig(): Promise<DeepSeekGuideConfig> {
const raw = await this.readSettings();
const apiKeyEnvName = normalizeEnvName(raw?.deepseek?.apiKeyEnvName);
return {
apiKey: process.env[apiKeyEnvName],
baseUrl: normalizeBaseUrl(raw?.deepseek?.baseUrl ?? process.env.DEEPSEEK_BASE_URL),
model: normalizeModel(raw?.deepseek?.model ?? process.env.DEEPSEEK_MODEL),
};
}
private async readSettings(): Promise<PersistedGuideAiSettings | null> {
try {
const content = await fs.readFile(this.filePath, "utf-8");
const parsed = JSON.parse(content) as unknown;
const normalized = normalizePersistedSettings(parsed);
if (normalized && hasLegacyStoredSecret(parsed)) {
await this.writeSettings(normalized);
}
return normalized;
} catch {
return null;
}
}
private async writeSettings(settings: PersistedGuideAiSettings): Promise<void> {
await fs.mkdir(path.dirname(this.filePath), { recursive: true });
const tempPath = `${this.filePath}.${process.pid}.${Date.now()}.tmp`;
await fs.writeFile(tempPath, JSON.stringify(settings, null, 2), "utf-8");
await fs.rename(tempPath, this.filePath);
}
}
function hasLegacyStoredSecret(input: unknown): boolean {
return (
typeof input === "object" &&
input !== null &&
typeof (input as { deepseek?: { apiKey?: unknown } }).deepseek?.apiKey === "object"
);
}
function normalizePersistedSettings(input: unknown): PersistedGuideAiSettings | null {
if (!input || typeof input !== "object") {
return null;
}
const raw = input as Partial<PersistedGuideAiSettings>;
if (raw.schemaVersion !== 1) {
return null;
}
return {
schemaVersion: 1,
deepseek: {
apiKeyEnvName: normalizeEnvName(raw.deepseek?.apiKeyEnvName),
baseUrl: raw.deepseek?.baseUrl,
model: raw.deepseek?.model,
updatedAt: raw.deepseek?.updatedAt,
},
};
}
function normalizeEnvName(value: string | undefined): string {
const normalized = value?.trim();
if (!normalized) {
return DEFAULT_DEEPSEEK_API_KEY_ENV_NAME;
}
return /^[A-Za-z_][A-Za-z0-9_]*$/.test(normalized)
? normalized
: DEFAULT_DEEPSEEK_API_KEY_ENV_NAME;
}
function normalizeBaseUrl(value: string | undefined): string {
const candidate = value?.trim() || DEFAULT_DEEPSEEK_BASE_URL;
try {
const url = new URL(candidate);
if (url.protocol !== "https:" && url.protocol !== "http:") {
return DEFAULT_DEEPSEEK_BASE_URL;
}
return url.toString().replace(/\/$/, "");
} catch {
return DEFAULT_DEEPSEEK_BASE_URL;
}
}
function normalizeModel(value: string | undefined): string {
return value?.trim() || DEFAULT_DEEPSEEK_MODEL;
}
+152
View File
@@ -0,0 +1,152 @@
import type { IpcMain } from "electron";
import type {
AddGuideMarkerInput,
DiscardGuideSessionInput,
ExportGuideInput,
ExportGuideResult,
FinalizeGuideEventsInput,
GenerateGuideDraftInput,
GuideAiSettings,
GuideEvent,
GuideIpcResult,
GuideSession,
RunGuideOcrInput,
SaveGuideAiSettingsInput,
SaveGuideInput,
WriteGuideSnapshotInput,
} from "../../src/guide/contracts";
import type { DeepSeekSettingsStore } from "./ai/deepseekSettingsStore";
import { GuideStore, GuideStoreError } from "./guideStore";
export function registerGuideIpcHandlers(
ipcMain: IpcMain,
store: GuideStore,
aiSettingsStore?: DeepSeekSettingsStore,
): void {
ipcMain.handle(
"guide:start-session",
async (_, recordingId): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.startSession(recordingId));
},
);
ipcMain.handle(
"guide:read-session",
async (_, recordingId): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.readSession(recordingId));
},
);
ipcMain.handle(
"guide:add-marker",
async (
_,
input: AddGuideMarkerInput,
): Promise<GuideIpcResult<{ session: GuideSession; event: GuideEvent }>> => {
return await toGuideResult(() => store.addMarker(input));
},
);
ipcMain.handle(
"guide:finalize-events",
async (_, input: FinalizeGuideEventsInput): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.finalizeEvents(input));
},
);
ipcMain.handle(
"guide:write-snapshot",
async (_, input: WriteGuideSnapshotInput): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.writeSnapshot(input));
},
);
ipcMain.handle(
"guide:run-ocr",
async (_, input: RunGuideOcrInput): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.runOcr(input));
},
);
ipcMain.handle(
"guide:generate-draft",
async (_, input: GenerateGuideDraftInput): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.generateDraft(input));
},
);
ipcMain.handle("guide:get-ai-settings", async (): Promise<GuideIpcResult<GuideAiSettings>> => {
return await toGuideResult(() => requireAiSettingsStore(aiSettingsStore).getStatus());
});
ipcMain.handle(
"guide:save-ai-settings",
async (_, input: SaveGuideAiSettingsInput): Promise<GuideIpcResult<GuideAiSettings>> => {
return await toGuideResult(() => requireAiSettingsStore(aiSettingsStore).save(input));
},
);
ipcMain.handle(
"guide:save-guide",
async (_, input: SaveGuideInput): Promise<GuideIpcResult<GuideSession>> => {
return await toGuideResult(() => store.saveGuide(input));
},
);
ipcMain.handle(
"guide:export-markdown",
async (_, input: ExportGuideInput): Promise<GuideIpcResult<ExportGuideResult>> => {
return await toGuideResult(() => store.exportMarkdown(input));
},
);
ipcMain.handle(
"guide:export-html",
async (_, input: ExportGuideInput): Promise<GuideIpcResult<ExportGuideResult>> => {
return await toGuideResult(() => store.exportHtml(input));
},
);
ipcMain.handle(
"guide:discard-session",
async (_, input: DiscardGuideSessionInput): Promise<GuideIpcResult<{ discarded: true }>> => {
return await toGuideResult(async () => {
await store.discardSession(input);
return { discarded: true };
});
},
);
}
function requireAiSettingsStore(store: DeepSeekSettingsStore | undefined): DeepSeekSettingsStore {
if (!store) {
throw new GuideStoreError("guide-internal-error", "Guide AI settings store is unavailable.");
}
return store;
}
async function toGuideResult<TData>(action: () => Promise<TData>): Promise<GuideIpcResult<TData>> {
try {
return {
success: true,
data: await action(),
};
} catch (error) {
if (error instanceof GuideStoreError) {
return {
success: false,
code: error.code,
error: error.message,
retryable: error.retryable,
};
}
console.error("Guide IPC failed:", error);
return {
success: false,
code: "guide-internal-error",
error: error instanceof Error ? error.message : String(error),
retryable: false,
};
}
}
+57
View File
@@ -0,0 +1,57 @@
import path from "node:path";
import type { GuideRecordingIdInput } from "../../src/guide/contracts";
export const GUIDE_SESSION_SUFFIX = ".guide.json";
export const GUIDE_OUTPUT_DIR_SUFFIX = "-guide";
export interface GuidePaths {
recordingId: string;
baseName: string;
baseDir: string;
guidePath: string;
outputDir: string;
}
export function normalizeGuideRecordingId(recordingId: GuideRecordingIdInput): string | null {
if (typeof recordingId === "number") {
return Number.isFinite(recordingId) ? String(Math.trunc(recordingId)) : null;
}
if (typeof recordingId !== "string") {
return null;
}
const trimmed = recordingId.trim();
return trimmed.length > 0 ? trimmed : null;
}
export function resolveGuidePaths(input: {
recordingsDir: string;
recordingId: GuideRecordingIdInput;
videoPath?: string | null;
}): GuidePaths | null {
const recordingId = normalizeGuideRecordingId(input.recordingId);
if (!recordingId) {
return null;
}
const normalizedVideoPath =
typeof input.videoPath === "string" && input.videoPath.trim()
? path.resolve(input.videoPath.trim())
: null;
const parsedVideoPath = normalizedVideoPath ? path.parse(normalizedVideoPath) : null;
const baseName = parsedVideoPath?.name ?? defaultGuideBaseName(recordingId);
const baseDir = parsedVideoPath?.dir ?? path.resolve(input.recordingsDir);
return {
recordingId,
baseName,
baseDir,
guidePath: path.join(baseDir, `${baseName}${GUIDE_SESSION_SUFFIX}`),
outputDir: path.join(baseDir, `${baseName}${GUIDE_OUTPUT_DIR_SUFFIX}`),
};
}
function defaultGuideBaseName(recordingId: string): string {
return recordingId.startsWith("recording-") ? recordingId : `recording-${recordingId}`;
}
+233
View File
@@ -0,0 +1,233 @@
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { GuideStore, GuideStoreError } from "./guideStore";
let recordingsDir = "";
beforeEach(async () => {
recordingsDir = await fs.mkdtemp(path.join(os.tmpdir(), "openscreen-guide-"));
});
afterEach(async () => {
if (recordingsDir) {
await fs.rm(recordingsDir, { recursive: true, force: true });
}
});
describe("GuideStore", () => {
it("creates and reads an empty guide session", async () => {
const store = new GuideStore(recordingsDir);
const session = await store.startSession(123);
const readSession = await store.readSession(123);
expect(session.recordingId).toBe("123");
expect(session.status).toBe("recording");
expect(session.guidePath).toBe(path.join(recordingsDir, "recording-123.guide.json"));
expect(readSession).toEqual(session);
await expect(fs.stat(session.outputDir)).resolves.toMatchObject({
isDirectory: expect.any(Function),
});
});
it("adds marker events in timeline order", async () => {
const store = new GuideStore(recordingsDir);
await store.startSession(456);
await store.addMarker({ recordingId: 456, kind: "manual", timeMs: 2000, label: "Later" });
const result = await store.addMarker({
recordingId: 456,
kind: "hotkey",
timeMs: 500,
label: "First",
});
expect(result.event.kind).toBe("hotkey");
expect(result.session.events.map((event) => event.timeMs)).toEqual([500, 2000]);
expect(result.session.events[0]?.source).toBe("guide-hotkey");
expect(result.session.events[1]?.source).toBe("review-ui");
});
it("finalizes a session against the saved video path", async () => {
const store = new GuideStore(recordingsDir);
await store.startSession(789);
const videoPath = path.join(recordingsDir, "recording-789.mp4");
await fs.writeFile(videoPath, "");
const session = await store.finalizeEvents({ recordingId: 789, videoPath });
expect(session.status).toBe("events-ready");
expect(session.videoPath).toBe(videoPath);
expect(session.guidePath).toBe(path.join(recordingsDir, "recording-789.guide.json"));
});
it("adds cursor click events when finalizing a session", async () => {
const store = new GuideStore(recordingsDir);
await store.startSession(790);
await store.addMarker({ recordingId: 790, kind: "manual", timeMs: 250, label: "Manual" });
const videoPath = path.join(recordingsDir, "recording-790.mp4");
await fs.writeFile(videoPath, "");
await fs.writeFile(
`${videoPath}.cursor.json`,
JSON.stringify({
version: 2,
provider: "native",
assets: [],
samples: [
{ timeMs: 100, cx: 0.2, cy: 0.3, interactionType: "move" },
{ timeMs: 200, cx: 0.4, cy: 0.5, interactionType: "click" },
{ timeMs: 225, cx: 0.401, cy: 0.501, interactionType: "click" },
],
}),
"utf-8",
);
const session = await store.finalizeEvents({ recordingId: 790, videoPath });
expect(session.cursorPath).toBe(`${videoPath}.cursor.json`);
expect(session.events.map((event) => event.kind)).toEqual(["click", "manual"]);
expect(session.events[0]).toMatchObject({
timeMs: 200,
normalizedX: 0.4,
normalizedY: 0.5,
});
});
it("rejects guide artifacts outside the recordings directory", async () => {
const store = new GuideStore(recordingsDir);
await store.startSession(321);
const outsideVideoPath = path.join(path.dirname(recordingsDir), "outside.mp4");
await expect(
store.finalizeEvents({ recordingId: 321, videoPath: outsideVideoPath }),
).rejects.toMatchObject({
code: "guide-invalid-input",
});
});
it("rejects invalid guide session schema", async () => {
const store = new GuideStore(recordingsDir);
await fs.writeFile(
path.join(recordingsDir, "recording-bad.guide.json"),
JSON.stringify({ schemaVersion: 999 }),
"utf-8",
);
await expect(store.readSession("bad")).rejects.toBeInstanceOf(GuideStoreError);
await expect(store.readSession("bad")).rejects.toMatchObject({
code: "guide-invalid-schema",
});
});
it("saves a reviewed generated guide", async () => {
const store = new GuideStore(recordingsDir);
await store.startSession(654);
const session = await store.saveGuide({
recordingId: 654,
generatedGuide: {
title: "Huong dan thao tac",
steps: [
{
id: "step-1",
order: 1,
title: "Mo cai dat",
instruction: "Nhan nut Settings.",
},
],
},
});
expect(session.status).toBe("reviewed");
expect(session.generatedGuide?.steps).toHaveLength(1);
});
it("writes snapshots and builds candidates without OCR", async () => {
const store = new GuideStore(recordingsDir);
await store.startSession(112);
await store.addMarker({ recordingId: 112, kind: "manual", timeMs: 500, label: "Save" });
const videoPath = path.join(recordingsDir, "recording-112.mp4");
await fs.writeFile(videoPath, "");
const eventsSession = await store.finalizeEvents({ recordingId: 112, videoPath });
const session = await store.writeSnapshot({
recordingId: 112,
eventId: eventsSession.events[0]?.id ?? "",
timeMs: 1000,
offsetMs: 500,
width: 800,
height: 600,
pngBytes: new Uint8Array([137, 80, 78, 71]).buffer,
});
expect(session.status).toBe("snapshots-ready");
expect(session.snapshots).toHaveLength(1);
expect(session.candidates[0]).toMatchObject({ targetText: "Save" });
await expect(fs.readFile(session.snapshots[0]?.path ?? "")).resolves.toEqual(
Buffer.from([137, 80, 78, 71]),
);
});
it("runs OCR, generates a local draft, and exports files", async () => {
const store = new GuideStore(recordingsDir, {
ocrClient: {
recognize: async (snapshot) => [
{
id: `ocr-${snapshot.id}-1`,
snapshotId: snapshot.id,
text: "Save",
confidence: 0.95,
box: { x: 0.45, y: 0.45, width: 0.15, height: 0.08 },
},
],
},
});
await store.startSession(113);
const videoPath = path.join(recordingsDir, "recording-113.mp4");
await fs.writeFile(videoPath, "");
await fs.writeFile(
`${videoPath}.cursor.json`,
JSON.stringify({
samples: [{ timeMs: 200, cx: 0.5, cy: 0.5, interactionType: "click" }],
}),
"utf-8",
);
const eventsSession = await store.finalizeEvents({ recordingId: 113, videoPath });
await store.writeSnapshot({
recordingId: 113,
eventId: eventsSession.events[0]?.id ?? "",
timeMs: 700,
offsetMs: 500,
width: 800,
height: 600,
pngBytes: new Uint8Array([1, 2, 3]).buffer,
});
const ocrSession = await store.runOcr({ recordingId: 113 });
const draftSession = await store.generateDraft({
recordingId: 113,
language: "en",
provider: "local",
});
const markdown = await store.exportMarkdown({ recordingId: 113 });
const html = await store.exportHtml({ recordingId: 113 });
expect(ocrSession.candidates[0]).toMatchObject({ targetText: "Save" });
expect(draftSession.generatedGuide?.steps[0]?.instruction).toBe('Click "Save".');
await expect(fs.readFile(markdown.path, "utf-8")).resolves.toContain("# User guide");
await expect(fs.readFile(html.path, "utf-8")).resolves.toContain("<!doctype html>");
});
it("discards a guide session and output directory", async () => {
const store = new GuideStore(recordingsDir);
const session = await store.startSession(111);
await fs.writeFile(path.join(session.outputDir, "step-001.png"), "");
await store.discardSession({ recordingId: 111 });
await expect(fs.stat(session.guidePath)).rejects.toMatchObject({ code: "ENOENT" });
await expect(fs.stat(session.outputDir)).rejects.toMatchObject({ code: "ENOENT" });
});
});
+824
View File
@@ -0,0 +1,824 @@
import { randomUUID } from "node:crypto";
import fs from "node:fs/promises";
import path from "node:path";
import {
type AddGuideMarkerInput,
type DiscardGuideSessionInput,
type ExportGuideInput,
type ExportGuideResult,
type FinalizeGuideEventsInput,
type GeneratedGuide,
type GeneratedGuideStep,
type GenerateGuideDraftInput,
GUIDE_SCHEMA_VERSION,
type GuideErrorCode,
type GuideEvent,
type GuideEventKind,
type GuideEventSource,
type GuideSession,
type GuideSessionStatus,
type GuideSnapshot,
type GuideStepCandidate,
type OcrBlock,
type RunGuideOcrInput,
type SaveGuideInput,
type WriteGuideSnapshotInput,
} from "../../src/guide/contracts";
import { buildGuideEventsFromCursor, mergeGuideEvents } from "../../src/guide/eventBuilder";
import { exportGuideToHtml, exportGuideToMarkdown } from "../../src/guide/exporters";
import { buildLocalGuideDraft } from "../../src/guide/promptBuilder";
import { buildGuideStepCandidates } from "../../src/guide/targetMapper";
import type { CursorRecordingSample } from "../../src/native/contracts";
import {
DeepSeekGuideClient,
DeepSeekGuideClientError,
type GuideDraftClient,
} from "./ai/deepseekGuideClient";
import type { DeepSeekGuideConfigProvider } from "./ai/deepseekSettingsStore";
import { type GuidePaths, normalizeGuideRecordingId, resolveGuidePaths } from "./guidePaths";
import { createFocusedOcrSnapshot, remapFocusedOcrBlocks } from "./ocr/focusedOcrSnapshot";
import { DefaultGuideOcrClient, type GuideOcrClient } from "./ocr/paddleOcrClient";
const VALID_SESSION_STATUSES = new Set<GuideSessionStatus>([
"recording",
"events-ready",
"snapshots-ready",
"ocr-ready",
"draft-ready",
"reviewed",
]);
const VALID_EVENT_KINDS = new Set<GuideEventKind>(["click", "hotkey", "manual"]);
const VALID_EVENT_SOURCES = new Set<GuideEventSource>([
"cursor-recording",
"guide-hotkey",
"review-ui",
]);
export class GuideStoreError extends Error {
constructor(
readonly code: GuideErrorCode,
message: string,
readonly retryable = false,
) {
super(message);
this.name = "GuideStoreError";
}
}
export interface GuideStoreDependencies {
ocrClient?: GuideOcrClient;
draftClient?: GuideDraftClient;
deepSeekConfigProvider?: DeepSeekGuideConfigProvider;
focusOcrSnapshots?: boolean;
}
export class GuideStore {
constructor(
private readonly recordingsDir: string,
private readonly dependencies: GuideStoreDependencies = {},
) {}
async startSession(recordingIdInput: AddGuideMarkerInput["recordingId"]): Promise<GuideSession> {
const paths = this.requireGuidePaths(recordingIdInput);
const now = new Date().toISOString();
const session: GuideSession = {
schemaVersion: GUIDE_SCHEMA_VERSION,
recordingId: paths.recordingId,
videoPath: "",
guidePath: paths.guidePath,
outputDir: paths.outputDir,
status: "recording",
events: [],
snapshots: [],
ocrBlocks: [],
candidates: [],
createdAt: now,
updatedAt: now,
};
await this.writeSession(session);
return session;
}
async readSession(recordingIdInput: AddGuideMarkerInput["recordingId"]): Promise<GuideSession> {
const paths = this.requireGuidePaths(recordingIdInput);
return await this.readSessionAtPath(paths.guidePath);
}
async addMarker(
input: AddGuideMarkerInput,
): Promise<{ session: GuideSession; event: GuideEvent }> {
const recordingId = normalizeGuideRecordingId(input.recordingId);
if (!recordingId) {
throw new GuideStoreError("guide-invalid-input", "Guide marker is missing recordingId.");
}
if (input.kind !== "hotkey" && input.kind !== "manual") {
throw new GuideStoreError("guide-invalid-input", "Guide marker kind is invalid.");
}
if (!Number.isFinite(input.timeMs) || input.timeMs < 0) {
throw new GuideStoreError("guide-invalid-input", "Guide marker timeMs must be non-negative.");
}
const session = await this.readSession(recordingId);
const event: GuideEvent = {
id: `guide-event-${randomUUID()}`,
recordingId,
kind: input.kind,
source: input.kind === "hotkey" ? "guide-hotkey" : "review-ui",
timeMs: Math.max(0, input.timeMs),
label: normalizeOptionalString(input.label),
screenshotOffsetMs: 500,
createdAt: new Date().toISOString(),
};
const updatedSession = touchSession({
...session,
events: sortGuideEvents([...session.events, event]),
});
await this.writeSession(updatedSession);
return { session: updatedSession, event };
}
async finalizeEvents(input: FinalizeGuideEventsInput): Promise<GuideSession> {
const recordingId = normalizeGuideRecordingId(input.recordingId);
if (!recordingId) {
throw new GuideStoreError(
"guide-invalid-input",
"Guide finalization is missing recordingId.",
);
}
if (typeof input.videoPath !== "string" || input.videoPath.trim().length === 0) {
throw new GuideStoreError("guide-invalid-input", "Guide finalization is missing videoPath.");
}
const videoPath = path.resolve(input.videoPath);
const currentSession = await this.readSession(recordingId);
const nextPaths = this.requireGuidePaths(recordingId, videoPath);
const cursorPath = await this.resolveCursorPath(videoPath, input.cursorPath);
const cursorEvents = cursorPath
? await this.readCursorGuideEvents(recordingId, cursorPath)
: [];
const manualEvents = currentSession.events.filter(
(event) => event.source !== "cursor-recording",
);
const updatedSession = touchSession({
...currentSession,
videoPath,
cursorPath,
guidePath: nextPaths.guidePath,
outputDir: nextPaths.outputDir,
status: "events-ready",
events: mergeGuideEvents([...cursorEvents, ...manualEvents]),
});
await this.writeSession(updatedSession);
if (path.resolve(currentSession.guidePath) !== path.resolve(updatedSession.guidePath)) {
await fs.unlink(currentSession.guidePath).catch(() => undefined);
}
return updatedSession;
}
async writeSnapshot(input: WriteGuideSnapshotInput): Promise<GuideSession> {
const recordingId = normalizeGuideRecordingId(input.recordingId);
if (!recordingId) {
throw new GuideStoreError("guide-invalid-input", "Snapshot write is missing recordingId.");
}
if (!input.eventId || !Number.isFinite(input.timeMs) || input.timeMs < 0) {
throw new GuideStoreError("guide-invalid-input", "Snapshot metadata is invalid.");
}
if (!input.pngBytes || input.pngBytes.byteLength === 0) {
throw new GuideStoreError("guide-invalid-input", "Snapshot PNG data is empty.");
}
if (
!Number.isFinite(input.width) ||
input.width <= 0 ||
!Number.isFinite(input.height) ||
input.height <= 0
) {
throw new GuideStoreError("guide-invalid-input", "Snapshot dimensions are invalid.");
}
const session = await this.readSession(recordingId);
const eventIndex = session.events.findIndex((event) => event.id === input.eventId);
if (eventIndex === -1) {
throw new GuideStoreError("guide-invalid-input", "Snapshot event does not exist.");
}
this.assertGuidePathIsAllowed(session.outputDir);
await fs.mkdir(session.outputDir, { recursive: true });
const fileName = `step-${String(eventIndex + 1).padStart(3, "0")}.png`;
const snapshotPath = path.join(session.outputDir, fileName);
this.assertGuidePathIsAllowed(snapshotPath);
await fs.writeFile(snapshotPath, Buffer.from(new Uint8Array(input.pngBytes)));
const snapshot: GuideSnapshot = {
id: `snapshot-${input.eventId}`,
eventId: input.eventId,
timeMs: Math.max(0, input.timeMs),
offsetMs: input.offsetMs,
path: snapshotPath,
width: Math.round(input.width),
height: Math.round(input.height),
};
const updatedSnapshots = [
...session.snapshots.filter((existing) => existing.eventId !== input.eventId),
snapshot,
].sort((left, right) => left.timeMs - right.timeMs);
const updatedSession = touchSession({
...session,
status: "snapshots-ready",
snapshots: updatedSnapshots,
ocrBlocks: session.ocrBlocks.filter((block) => block.snapshotId !== snapshot.id),
candidates: buildGuideStepCandidates({
...session,
snapshots: updatedSnapshots,
ocrBlocks: session.ocrBlocks.filter((block) => block.snapshotId !== snapshot.id),
}),
generatedGuide: undefined,
});
await this.writeSession(updatedSession);
return updatedSession;
}
async runOcr(input: RunGuideOcrInput): Promise<GuideSession> {
const session = await this.readSession(input.recordingId);
const requestedIds = new Set(input.snapshotIds ?? []);
const snapshots =
requestedIds.size > 0
? session.snapshots.filter((snapshot) => requestedIds.has(snapshot.id))
: session.snapshots;
if (snapshots.length === 0) {
throw new GuideStoreError("guide-invalid-input", "No guide snapshots are available for OCR.");
}
const ocrClient = this.dependencies.ocrClient ?? new DefaultGuideOcrClient();
const shouldFocusOcrSnapshots =
this.dependencies.focusOcrSnapshots ?? this.dependencies.ocrClient === undefined;
const eventsById = new Map(session.events.map((event) => [event.id, event]));
const blocks: OcrBlock[] = [];
try {
for (const snapshot of snapshots) {
const focusedSnapshot = shouldFocusOcrSnapshots
? await createFocusedOcrSnapshot({
snapshot,
event: eventsById.get(snapshot.eventId),
outputDir: session.outputDir,
})
: { snapshot };
const recognizedBlocks = await ocrClient.recognize(focusedSnapshot.snapshot);
blocks.push(...remapFocusedOcrBlocks(recognizedBlocks, focusedSnapshot.transform));
}
} catch (error) {
throw new GuideStoreError(
"guide-ocr-unavailable",
error instanceof Error ? error.message : "OCR failed.",
true,
);
}
const snapshotIds = new Set(snapshots.map((snapshot) => snapshot.id));
const updatedOcrBlocks = [
...session.ocrBlocks.filter((block) => !snapshotIds.has(block.snapshotId)),
...blocks,
];
const draftSession = {
...session,
ocrBlocks: updatedOcrBlocks,
};
const updatedSession = touchSession({
...draftSession,
status: "ocr-ready",
candidates: buildGuideStepCandidates(draftSession),
generatedGuide: undefined,
});
await this.writeSession(updatedSession);
return updatedSession;
}
async generateDraft(input: GenerateGuideDraftInput): Promise<GuideSession> {
const session = await this.readSession(input.recordingId);
const candidates =
session.candidates.length > 0 ? session.candidates : buildGuideStepCandidates(session);
if (candidates.length === 0) {
throw new GuideStoreError(
"guide-invalid-input",
"No guide events are available for drafting.",
);
}
let generatedGuide: GeneratedGuide;
if (input.provider === "local") {
generatedGuide = buildLocalGuideDraft(session, candidates, input.language);
} else {
const draftClient =
this.dependencies.draftClient ??
new DeepSeekGuideClient(this.dependencies.deepSeekConfigProvider);
try {
generatedGuide = await draftClient.generate({
session,
candidates,
language: input.language,
});
} catch (error) {
if (error instanceof DeepSeekGuideClientError) {
throw new GuideStoreError(error.code, error.message, error.retryable);
}
throw new GuideStoreError(
"guide-ai-request-failed",
error instanceof Error ? error.message : "Guide draft generation failed.",
true,
);
}
}
const updatedSession = touchSession({
...session,
candidates,
generatedGuide: normalizeGeneratedGuide(generatedGuide) ?? generatedGuide,
status: "draft-ready",
});
await this.writeSession(updatedSession);
return updatedSession;
}
async saveGuide(input: SaveGuideInput): Promise<GuideSession> {
const session = await this.readSession(input.recordingId);
const generatedGuide = normalizeGeneratedGuide(input.generatedGuide);
if (!generatedGuide) {
throw new GuideStoreError("guide-invalid-input", "Generated guide shape is invalid.");
}
const updatedSession = touchSession({
...session,
generatedGuide,
status: "reviewed",
});
await this.writeSession(updatedSession);
return updatedSession;
}
async exportMarkdown(input: ExportGuideInput): Promise<ExportGuideResult> {
const session = await this.readSession(input.recordingId);
return await this.writeGuideExport(session, "guide.md", () => exportGuideToMarkdown(session));
}
async exportHtml(input: ExportGuideInput): Promise<ExportGuideResult> {
const session = await this.readSession(input.recordingId);
return await this.writeGuideExport(session, "guide.html", () => exportGuideToHtml(session));
}
async discardSession(input: DiscardGuideSessionInput): Promise<void> {
const paths = this.requireGuidePaths(input.recordingId);
const session = await this.readSession(input.recordingId).catch(() => null);
const guidePath = session?.guidePath ?? paths.guidePath;
const outputDir = session?.outputDir ?? paths.outputDir;
this.assertGuidePathIsAllowed(guidePath);
this.assertGuidePathIsAllowed(outputDir);
await fs.unlink(guidePath).catch(() => undefined);
await fs.rm(outputDir, { recursive: true, force: true });
}
private async writeGuideExport(
session: GuideSession,
fileName: string,
renderContent: () => string,
): Promise<ExportGuideResult> {
if (!session.generatedGuide) {
throw new GuideStoreError("guide-invalid-input", "Generate a guide draft before exporting.");
}
const exportPath = path.join(session.outputDir, fileName);
this.assertGuidePathIsAllowed(exportPath);
try {
await fs.mkdir(session.outputDir, { recursive: true });
await fs.writeFile(exportPath, renderContent(), "utf-8");
} catch (error) {
throw new GuideStoreError(
"guide-export-failed",
error instanceof Error ? error.message : "Guide export failed.",
true,
);
}
return { path: exportPath, session };
}
async writeSession(session: GuideSession): Promise<void> {
const normalized = normalizeGuideSession(session);
if (!normalized) {
throw new GuideStoreError("guide-invalid-schema", "Guide session schema is invalid.");
}
this.assertGuidePathIsAllowed(normalized.guidePath);
this.assertGuidePathIsAllowed(normalized.outputDir);
await fs.mkdir(path.dirname(normalized.guidePath), { recursive: true });
await fs.mkdir(normalized.outputDir, { recursive: true });
await atomicWriteJson(normalized.guidePath, normalized);
}
private async readSessionAtPath(guidePath: string): Promise<GuideSession> {
this.assertGuidePathIsAllowed(guidePath);
try {
const content = await fs.readFile(guidePath, "utf-8");
const session = normalizeGuideSession(JSON.parse(content));
if (!session) {
throw new GuideStoreError("guide-invalid-schema", "Guide session schema is invalid.");
}
return session;
} catch (error) {
if (error instanceof GuideStoreError) {
throw error;
}
const nodeError = error as NodeJS.ErrnoException;
if (nodeError.code === "ENOENT") {
throw new GuideStoreError("guide-session-not-found", "Guide session was not found.");
}
throw error;
}
}
private requireGuidePaths(
recordingIdInput: AddGuideMarkerInput["recordingId"],
videoPath?: string | null,
): GuidePaths {
const paths = resolveGuidePaths({
recordingsDir: this.recordingsDir,
recordingId: recordingIdInput,
videoPath,
});
if (!paths) {
throw new GuideStoreError("guide-invalid-input", "Guide recordingId is invalid.");
}
this.assertGuidePathIsAllowed(paths.guidePath);
this.assertGuidePathIsAllowed(paths.outputDir);
return paths;
}
private assertGuidePathIsAllowed(targetPath: string): void {
if (this.isPathAllowed(targetPath)) {
return;
}
throw new GuideStoreError(
"guide-invalid-input",
"Guide artifacts must be stored inside the recordings directory.",
);
}
private async resolveCursorPath(
videoPath: string,
explicitCursorPath?: string,
): Promise<string | undefined> {
const candidates = [
normalizeOptionalString(explicitCursorPath),
`${videoPath}.cursor.json`,
].filter((candidate): candidate is string => Boolean(candidate));
for (const candidate of candidates) {
const resolvedCandidate = path.resolve(candidate);
if (!this.isPathAllowed(resolvedCandidate)) {
continue;
}
try {
await fs.access(resolvedCandidate);
return resolvedCandidate;
} catch {
// Cursor telemetry is optional for guide sessions.
}
}
return undefined;
}
private async readCursorGuideEvents(
recordingId: string,
cursorPath: string,
): Promise<GuideEvent[]> {
try {
const content = await fs.readFile(cursorPath, "utf-8");
const parsed = JSON.parse(content) as unknown;
const rawSamples =
isRecord(parsed) && Array.isArray(parsed.samples) ? parsed.samples : parsed;
const samples = Array.isArray(rawSamples)
? rawSamples
.map(normalizeCursorSampleForGuide)
.filter((sample): sample is CursorRecordingSample => sample !== null)
: [];
return buildGuideEventsFromCursor({ recordingId, samples });
} catch (error) {
console.warn("Failed to read cursor telemetry for guide events:", error);
return [];
}
}
private isPathAllowed(targetPath: string): boolean {
const resolvedTarget = path.resolve(targetPath);
const resolvedRecordingsDir = path.resolve(this.recordingsDir);
const relative = path.relative(resolvedRecordingsDir, resolvedTarget);
return relative === "" || (!relative.startsWith("..") && !path.isAbsolute(relative));
}
}
function touchSession(session: GuideSession): GuideSession {
return {
...session,
updatedAt: new Date().toISOString(),
};
}
function sortGuideEvents(events: GuideEvent[]): GuideEvent[] {
return [...events].sort((left, right) => left.timeMs - right.timeMs);
}
function normalizeCursorSampleForGuide(input: unknown): CursorRecordingSample | null {
if (!isRecord(input)) {
return null;
}
const interactionType =
input.interactionType === "click" ||
input.interactionType === "mouseup" ||
input.interactionType === "move"
? input.interactionType
: "move";
const timeMs = normalizeNonNegativeNumber(input.timeMs);
const cx = normalizeOptionalNumber(input.cx);
const cy = normalizeOptionalNumber(input.cy);
if (timeMs === null || cx === undefined || cy === undefined) {
return null;
}
return {
timeMs,
cx,
cy,
interactionType,
};
}
async function atomicWriteJson(filePath: string, value: unknown): Promise<void> {
const tempPath = `${filePath}.${process.pid}.${Date.now()}.tmp`;
await fs.writeFile(tempPath, JSON.stringify(value, null, 2), "utf-8");
await fs.rename(tempPath, filePath);
}
function normalizeGuideSession(input: unknown): GuideSession | null {
if (!isRecord(input) || input.schemaVersion !== GUIDE_SCHEMA_VERSION) {
return null;
}
const recordingId = normalizeString(input.recordingId);
const videoPath = normalizeString(input.videoPath);
const guidePath = normalizeString(input.guidePath);
const outputDir = normalizeString(input.outputDir);
const status = normalizeSessionStatus(input.status);
const createdAt = normalizeString(input.createdAt);
const updatedAt = normalizeString(input.updatedAt);
if (
!recordingId ||
videoPath === null ||
!guidePath ||
!outputDir ||
!status ||
!createdAt ||
!updatedAt
) {
return null;
}
const generatedGuide =
input.generatedGuide === undefined ? undefined : normalizeGeneratedGuide(input.generatedGuide);
if (generatedGuide === null) {
return null;
}
return {
schemaVersion: GUIDE_SCHEMA_VERSION,
recordingId,
videoPath,
cursorPath: normalizeOptionalString(input.cursorPath),
guidePath,
outputDir,
status,
events: normalizeArray(input.events, normalizeGuideEvent),
snapshots: normalizeArray(input.snapshots, normalizeGuideSnapshot),
ocrBlocks: normalizeArray(input.ocrBlocks, normalizeOcrBlock),
candidates: normalizeArray(input.candidates, normalizeGuideStepCandidate),
generatedGuide,
createdAt,
updatedAt,
};
}
function normalizeGuideEvent(input: unknown): GuideEvent | null {
if (!isRecord(input)) {
return null;
}
const id = normalizeString(input.id);
const recordingId = normalizeString(input.recordingId);
const kind = VALID_EVENT_KINDS.has(input.kind as GuideEventKind)
? (input.kind as GuideEventKind)
: null;
const source = VALID_EVENT_SOURCES.has(input.source as GuideEventSource)
? (input.source as GuideEventSource)
: null;
const timeMs = normalizeNonNegativeNumber(input.timeMs);
const createdAt = normalizeString(input.createdAt);
if (!id || !recordingId || !kind || !source || timeMs === null || !createdAt) {
return null;
}
return {
id,
recordingId,
kind,
source,
timeMs,
x: normalizeOptionalNumber(input.x),
y: normalizeOptionalNumber(input.y),
normalizedX: normalizeOptionalNumber(input.normalizedX),
normalizedY: normalizeOptionalNumber(input.normalizedY),
button:
input.button === "left" ||
input.button === "right" ||
input.button === "middle" ||
input.button === "unknown"
? input.button
: undefined,
label: normalizeOptionalString(input.label),
screenshotOffsetMs: normalizeOptionalNumber(input.screenshotOffsetMs),
createdAt,
};
}
function normalizeGuideSnapshot(input: unknown): GuideSnapshot | null {
if (!isRecord(input)) {
return null;
}
const id = normalizeString(input.id);
const eventId = normalizeString(input.eventId);
const pathValue = normalizeString(input.path);
const timeMs = normalizeNonNegativeNumber(input.timeMs);
const offsetMs = normalizeOptionalNumber(input.offsetMs);
const width = normalizePositiveInteger(input.width);
const height = normalizePositiveInteger(input.height);
if (
!id ||
!eventId ||
!pathValue ||
timeMs === null ||
offsetMs === undefined ||
width === null ||
height === null
) {
return null;
}
return { id, eventId, timeMs, offsetMs, path: pathValue, width, height };
}
function normalizeOcrBlock(input: unknown): OcrBlock | null {
if (!isRecord(input) || !isRecord(input.box)) {
return null;
}
const id = normalizeString(input.id);
const snapshotId = normalizeString(input.snapshotId);
const text = normalizeString(input.text);
const confidence = normalizeOptionalNumber(input.confidence);
const x = normalizeOptionalNumber(input.box.x);
const y = normalizeOptionalNumber(input.box.y);
const width = normalizeOptionalNumber(input.box.width);
const height = normalizeOptionalNumber(input.box.height);
if (
!id ||
!snapshotId ||
text === null ||
confidence === undefined ||
x === undefined ||
y === undefined ||
width === undefined ||
height === undefined
) {
return null;
}
return { id, snapshotId, text, confidence, box: { x, y, width, height } };
}
function normalizeGuideStepCandidate(input: unknown): GuideStepCandidate | null {
if (!isRecord(input)) {
return null;
}
const id = normalizeString(input.id);
const eventId = normalizeString(input.eventId);
const timeMs = normalizeNonNegativeNumber(input.timeMs);
const confidence = normalizeOptionalNumber(input.confidence);
const nearbyText = Array.isArray(input.nearbyText)
? input.nearbyText.map(normalizeString).filter((text): text is string => text !== null)
: [];
if (!id || !eventId || timeMs === null || confidence === undefined) {
return null;
}
return {
id,
eventId,
snapshotId: normalizeOptionalString(input.snapshotId),
timeMs,
action:
input.action === "click" ||
input.action === "choose" ||
input.action === "type" ||
input.action === "wait" ||
input.action === "manual"
? input.action
: "manual",
targetText: normalizeOptionalString(input.targetText),
targetRole:
input.targetRole === "button" ||
input.targetRole === "menu" ||
input.targetRole === "tab" ||
input.targetRole === "field" ||
input.targetRole === "link" ||
input.targetRole === "unknown"
? input.targetRole
: undefined,
nearbyText,
confidence,
};
}
function normalizeGeneratedGuide(input: unknown): GeneratedGuide | null {
if (!isRecord(input)) {
return null;
}
const title = normalizeString(input.title);
if (!title || !Array.isArray(input.steps)) {
return null;
}
const steps = input.steps
.map((step): GeneratedGuideStep | null => {
if (!isRecord(step)) {
return null;
}
const id = normalizeString(step.id);
const order = normalizePositiveInteger(step.order);
const stepTitle = normalizeString(step.title);
const instruction = normalizeString(step.instruction);
if (!id || order === null || !stepTitle || !instruction) {
return null;
}
return {
id,
order,
title: stepTitle,
instruction,
screenshotPath: normalizeOptionalString(step.screenshotPath),
sourceCandidateId: normalizeOptionalString(step.sourceCandidateId),
};
})
.filter((step): step is GeneratedGuide["steps"][number] => step !== null);
return {
title,
summary: normalizeOptionalString(input.summary),
steps,
};
}
function normalizeArray<T>(input: unknown, normalize: (value: unknown) => T | null): T[] {
return Array.isArray(input)
? input.map((value) => normalize(value)).filter((value): value is T => value !== null)
: [];
}
function normalizeSessionStatus(value: unknown): GuideSessionStatus | null {
return VALID_SESSION_STATUSES.has(value as GuideSessionStatus)
? (value as GuideSessionStatus)
: null;
}
function normalizeString(value: unknown): string | null {
return typeof value === "string" ? value : null;
}
function normalizeOptionalString(value: unknown): string | undefined {
const text = normalizeString(value);
return text === null || text.length === 0 ? undefined : text;
}
function normalizeNonNegativeNumber(value: unknown): number | null {
return typeof value === "number" && Number.isFinite(value) && value >= 0 ? value : null;
}
function normalizeOptionalNumber(value: unknown): number | undefined {
return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}
function normalizePositiveInteger(value: unknown): number | null {
return typeof value === "number" && Number.isFinite(value) && value > 0
? Math.round(value)
: null;
}
function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null;
}
+232
View File
@@ -0,0 +1,232 @@
import { type ChildProcessWithoutNullStreams, spawn } from "node:child_process";
import fs from "node:fs/promises";
import path from "node:path";
import { app } from "electron";
const DEFAULT_OCR_BASE_URL = "http://127.0.0.1:8866";
const DEFAULT_OCR_PORT = "8866";
const SERVICE_EXE_NAME = "openscreen-ocr-service.exe";
const HEALTH_TIMEOUT_MS = 1000;
const STARTUP_TIMEOUT_MS = 90000;
const PADDLEX_MODEL_NAMES = ["PP-OCRv5_mobile_det", "latin_PP-OCRv5_mobile_rec"];
let ocrProcess: ChildProcessWithoutNullStreams | null = null;
let startupPromise: Promise<void> | null = null;
let quitHookRegistered = false;
export async function ensureBundledOcrServiceRunning(
baseUrl = DEFAULT_OCR_BASE_URL,
): Promise<void> {
if (!shouldManageOcrService(baseUrl)) {
return;
}
if (await isOcrServiceHealthy(baseUrl, HEALTH_TIMEOUT_MS)) {
return;
}
const executablePath = await findBundledOcrServiceExecutable();
if (!executablePath) {
return;
}
if (!startupPromise) {
startupPromise = startAndWaitForOcrService(executablePath, baseUrl).finally(() => {
startupPromise = null;
});
}
await startupPromise;
}
function shouldManageOcrService(baseUrl: string): boolean {
try {
const url = new URL(baseUrl);
const hostname = url.hostname.toLowerCase();
return (
(url.protocol === "http:" || url.protocol === "https:") &&
(hostname === "127.0.0.1" || hostname === "localhost") &&
(url.port === "" || url.port === DEFAULT_OCR_PORT)
);
} catch {
return false;
}
}
async function findBundledOcrServiceExecutable(): Promise<string | null> {
const candidates = [
process.env.OPENSCREEN_GUIDE_OCR_EXE,
path.join(process.resourcesPath, "ocr-service", SERVICE_EXE_NAME),
path.join(process.resourcesPath, "ocr-service", "openscreen-ocr-service", SERVICE_EXE_NAME),
path.resolve(process.cwd(), "tools", "ocr", "dist", "openscreen-ocr-service", SERVICE_EXE_NAME),
].filter(
(candidate): candidate is string => typeof candidate === "string" && candidate.length > 0,
);
for (const candidate of candidates) {
try {
const stats = await fs.stat(candidate);
if (stats.isFile()) {
return candidate;
}
} catch {
// Try the next candidate.
}
}
return null;
}
async function startAndWaitForOcrService(executablePath: string, baseUrl: string): Promise<void> {
const runtimePaths = await prepareOcrRuntimePaths();
if (!ocrProcess || ocrProcess.exitCode !== null || ocrProcess.killed) {
startOcrServiceProcess(executablePath, runtimePaths);
}
await waitForOcrServiceHealth(baseUrl, STARTUP_TIMEOUT_MS);
}
async function prepareOcrRuntimePaths(): Promise<{
modelCachePath: string;
paddlexCachePath: string;
}> {
const modelCachePath = path.join(app.getPath("userData"), "ocr-models");
const paddlexCachePath = path.join(modelCachePath, "paddlex");
await seedBundledPaddlexModels(paddlexCachePath);
return { modelCachePath, paddlexCachePath };
}
async function seedBundledPaddlexModels(destinationCachePath: string): Promise<void> {
const sourceCachePath = await findBundledPaddlexModelCache();
if (!sourceCachePath) {
return;
}
const sourceOfficialModels = path.join(sourceCachePath, "official_models");
const destinationOfficialModels = path.join(destinationCachePath, "official_models");
await fs.mkdir(destinationOfficialModels, { recursive: true });
for (const modelName of PADDLEX_MODEL_NAMES) {
const sourceModelPath = path.join(sourceOfficialModels, modelName);
const destinationModelPath = path.join(destinationOfficialModels, modelName);
if (!(await pathExists(sourceModelPath)) || (await pathExists(destinationModelPath))) {
continue;
}
await fs.cp(sourceModelPath, destinationModelPath, {
recursive: true,
errorOnExist: false,
force: false,
});
}
}
async function findBundledPaddlexModelCache(): Promise<string | null> {
const candidates = [
path.join(process.resourcesPath, "ocr-models", "paddlex"),
path.resolve(process.cwd(), "tools", "ocr", "models", "paddlex"),
];
for (const candidate of candidates) {
try {
const stats = await fs.stat(candidate);
if (stats.isDirectory()) {
return candidate;
}
} catch {
// Try the next candidate.
}
}
return null;
}
async function pathExists(value: string): Promise<boolean> {
try {
await fs.access(value);
return true;
} catch {
return false;
}
}
function startOcrServiceProcess(
executablePath: string,
runtimePaths: { modelCachePath: string; paddlexCachePath: string },
): void {
registerQuitHook();
ocrProcess = spawn(executablePath, [], {
cwd: path.dirname(executablePath),
env: {
...process.env,
OPENSCREEN_OCR_HOST: "127.0.0.1",
OPENSCREEN_OCR_PORT: DEFAULT_OCR_PORT,
PADDLEOCR_DEVICE: process.env.PADDLEOCR_DEVICE ?? "cpu",
PADDLEOCR_ENABLE_MKLDNN: process.env.PADDLEOCR_ENABLE_MKLDNN ?? "0",
PADDLEOCR_LANG: process.env.PADDLEOCR_LANG ?? "latin",
PADDLEOCR_USE_MOBILE: process.env.PADDLEOCR_USE_MOBILE ?? "1",
PADDLE_PDX_ENABLE_MKLDNN_BYDEFAULT: process.env.PADDLE_PDX_ENABLE_MKLDNN_BYDEFAULT ?? "False",
PADDLE_PDX_CACHE_HOME: process.env.PADDLE_PDX_CACHE_HOME ?? runtimePaths.paddlexCachePath,
PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK:
process.env.PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK ?? "True",
PADDLE_HOME: process.env.PADDLE_HOME ?? path.join(runtimePaths.modelCachePath, "paddle"),
PADDLEOCR_HOME:
process.env.PADDLEOCR_HOME ?? path.join(runtimePaths.modelCachePath, "paddleocr"),
PYTHONUTF8: "1",
},
windowsHide: true,
});
ocrProcess.stdout.on("data", (chunk) => {
console.info(`[guide-ocr-service] ${chunk.toString().trim()}`);
});
ocrProcess.stderr.on("data", (chunk) => {
console.warn(`[guide-ocr-service] ${chunk.toString().trim()}`);
});
ocrProcess.on("exit", (code, signal) => {
console.info("[guide-ocr-service] exited", { code, signal });
ocrProcess = null;
});
}
function registerQuitHook(): void {
if (quitHookRegistered) {
return;
}
quitHookRegistered = true;
app.once("before-quit", () => {
const processToStop = ocrProcess;
ocrProcess = null;
processToStop?.kill();
});
}
async function waitForOcrServiceHealth(baseUrl: string, timeoutMs: number): Promise<void> {
const startedAt = Date.now();
let lastError: unknown;
while (Date.now() - startedAt < timeoutMs) {
if (await isOcrServiceHealthy(baseUrl, HEALTH_TIMEOUT_MS)) {
return;
}
if (ocrProcess?.exitCode !== null && ocrProcess?.exitCode !== undefined) {
throw new Error(`Bundled OCR service exited with code ${ocrProcess.exitCode}.`);
}
await sleep(750);
}
if (lastError instanceof Error) {
throw lastError;
}
throw new Error("Timed out waiting for bundled OCR service to start.");
}
async function isOcrServiceHealthy(baseUrl: string, timeoutMs: number): Promise<boolean> {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
try {
const response = await fetch(`${baseUrl.replace(/\/$/, "")}/health`, {
signal: controller.signal,
});
return response.ok;
} catch {
return false;
} finally {
clearTimeout(timeoutId);
}
}
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
@@ -0,0 +1,33 @@
import { describe, expect, it } from "vitest";
import type { OcrBlock } from "../../../src/guide/contracts";
import { remapFocusedOcrBlocks } from "./focusedOcrSnapshot";
describe("remapFocusedOcrBlocks", () => {
it("maps boxes from a focused crop back to the original snapshot coordinates", () => {
const blocks: OcrBlock[] = [
{
id: "ocr-1",
snapshotId: "snapshot-1",
text: "Settings",
confidence: 0.9,
box: { x: 0.25, y: 0.5, width: 0.2, height: 0.1 },
},
];
const remapped = remapFocusedOcrBlocks(blocks, {
cropX: 320,
cropY: 180,
cropWidth: 640,
cropHeight: 360,
originalWidth: 1280,
originalHeight: 720,
});
expect(remapped[0]?.box).toEqual({
x: 0.375,
y: 0.5,
width: 0.1,
height: 0.05,
});
});
});
+225
View File
@@ -0,0 +1,225 @@
import { execFile } from "node:child_process";
import fs from "node:fs/promises";
import path from "node:path";
import { promisify } from "node:util";
import type { GuideEvent, GuideSnapshot, OcrBlock } from "../../../src/guide/contracts";
const execFileAsync = promisify(execFile);
interface FocusTransform {
cropX: number;
cropY: number;
cropWidth: number;
cropHeight: number;
originalWidth: number;
originalHeight: number;
}
export interface FocusedOcrSnapshot {
snapshot: GuideSnapshot;
transform?: FocusTransform;
}
export async function createFocusedOcrSnapshot(input: {
snapshot: GuideSnapshot;
event?: GuideEvent;
outputDir: string;
}): Promise<FocusedOcrSnapshot> {
if (process.platform !== "win32") {
return { snapshot: input.snapshot };
}
const click = getEventPoint(input.event, input.snapshot);
if (!click) {
return { snapshot: input.snapshot };
}
const crop = calculateFocusCrop(input.snapshot, click);
if (
!crop ||
(crop.cropWidth === input.snapshot.width && crop.cropHeight === input.snapshot.height)
) {
return { snapshot: input.snapshot };
}
const focusDir = path.join(input.outputDir, "ocr-focus");
await fs.mkdir(focusDir, { recursive: true });
const focusPath = path.join(focusDir, `${path.parse(input.snapshot.path).name}-focus.png`);
const zoom = 2;
const focusedSnapshot: GuideSnapshot = {
...input.snapshot,
path: focusPath,
width: crop.cropWidth * zoom,
height: crop.cropHeight * zoom,
};
try {
await writeFocusedPng({
sourcePath: input.snapshot.path,
outputPath: focusPath,
cropX: crop.cropX,
cropY: crop.cropY,
cropWidth: crop.cropWidth,
cropHeight: crop.cropHeight,
outputWidth: focusedSnapshot.width,
outputHeight: focusedSnapshot.height,
});
return { snapshot: focusedSnapshot, transform: crop };
} catch {
return { snapshot: input.snapshot };
}
}
export function remapFocusedOcrBlocks(
blocks: OcrBlock[],
transform: FocusedOcrSnapshot["transform"],
): OcrBlock[] {
if (!transform) {
return blocks;
}
return blocks.map((block) => ({
...block,
box: {
x: clamp01((transform.cropX + block.box.x * transform.cropWidth) / transform.originalWidth),
y: clamp01((transform.cropY + block.box.y * transform.cropHeight) / transform.originalHeight),
width: clamp01((block.box.width * transform.cropWidth) / transform.originalWidth),
height: clamp01((block.box.height * transform.cropHeight) / transform.originalHeight),
},
}));
}
function getEventPoint(
event: GuideEvent | undefined,
snapshot: GuideSnapshot,
): { x: number; y: number } | null {
if (!event) {
return null;
}
if (isNormalizedNumber(event.normalizedX) && isNormalizedNumber(event.normalizedY)) {
return { x: event.normalizedX, y: event.normalizedY };
}
if (isNormalizedNumber(event.x) && isNormalizedNumber(event.y)) {
return { x: event.x, y: event.y };
}
if (
typeof event.x === "number" &&
typeof event.y === "number" &&
event.x >= 0 &&
event.y >= 0 &&
event.x <= snapshot.width &&
event.y <= snapshot.height
) {
return { x: clamp01(event.x / snapshot.width), y: clamp01(event.y / snapshot.height) };
}
return null;
}
function calculateFocusCrop(
snapshot: GuideSnapshot,
click: { x: number; y: number },
): FocusTransform | null {
if (snapshot.width <= 0 || snapshot.height <= 0) {
return null;
}
const cropWidth = clampInteger(
Math.round(snapshot.width * 0.42),
Math.min(360, snapshot.width),
Math.min(720, snapshot.width),
);
const cropHeight = clampInteger(
Math.round(snapshot.height * 0.42),
Math.min(240, snapshot.height),
Math.min(520, snapshot.height),
);
const clickX = Math.round(clamp01(click.x) * snapshot.width);
const clickY = Math.round(clamp01(click.y) * snapshot.height);
return {
cropX: clampInteger(Math.round(clickX - cropWidth / 2), 0, snapshot.width - cropWidth),
cropY: clampInteger(Math.round(clickY - cropHeight / 2), 0, snapshot.height - cropHeight),
cropWidth,
cropHeight,
originalWidth: snapshot.width,
originalHeight: snapshot.height,
};
}
async function writeFocusedPng(input: {
sourcePath: string;
outputPath: string;
cropX: number;
cropY: number;
cropWidth: number;
cropHeight: number;
outputWidth: number;
outputHeight: number;
}): Promise<void> {
const script = buildCropScript(input);
const encodedCommand = Buffer.from(script, "utf16le").toString("base64");
await execFileAsync(
"powershell.exe",
["-NoProfile", "-ExecutionPolicy", "Bypass", "-EncodedCommand", encodedCommand],
{
timeout: 30000,
maxBuffer: 1024 * 1024,
windowsHide: true,
},
);
}
function buildCropScript(input: {
sourcePath: string;
outputPath: string;
cropX: number;
cropY: number;
cropWidth: number;
cropHeight: number;
outputWidth: number;
outputHeight: number;
}): string {
const sourcePathBase64 = Buffer.from(input.sourcePath, "utf8").toString("base64");
const outputPathBase64 = Buffer.from(input.outputPath, "utf8").toString("base64");
return `
$ErrorActionPreference = "Stop"
$sourcePath = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String("${sourcePathBase64}"))
$outputPath = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String("${outputPathBase64}"))
Add-Type -AssemblyName System.Drawing
$source = [System.Drawing.Image]::FromFile($sourcePath)
$target = [System.Drawing.Bitmap]::new(${input.outputWidth}, ${input.outputHeight})
$graphics = [System.Drawing.Graphics]::FromImage($target)
try {
$graphics.Clear([System.Drawing.Color]::White)
$graphics.InterpolationMode = [System.Drawing.Drawing2D.InterpolationMode]::HighQualityBicubic
$graphics.SmoothingMode = [System.Drawing.Drawing2D.SmoothingMode]::HighQuality
$graphics.PixelOffsetMode = [System.Drawing.Drawing2D.PixelOffsetMode]::HighQuality
$sourceRect = [System.Drawing.Rectangle]::new(${input.cropX}, ${input.cropY}, ${input.cropWidth}, ${input.cropHeight})
$targetRect = [System.Drawing.Rectangle]::new(0, 0, ${input.outputWidth}, ${input.outputHeight})
$graphics.DrawImage($source, $targetRect, $sourceRect, [System.Drawing.GraphicsUnit]::Pixel)
$target.Save($outputPath, [System.Drawing.Imaging.ImageFormat]::Png)
} finally {
$graphics.Dispose()
$target.Dispose()
$source.Dispose()
}
`;
}
function isNormalizedNumber(value: unknown): value is number {
return typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 1;
}
function clampInteger(value: number, min: number, max: number): number {
if (max < min) {
return min;
}
return Math.round(Math.min(max, Math.max(min, value)));
}
function clamp01(value: number): number {
if (!Number.isFinite(value)) {
return 0;
}
return Math.min(1, Math.max(0, value));
}
+110
View File
@@ -0,0 +1,110 @@
import { describe, expect, it } from "vitest";
import type { GuideSnapshot, OcrBlock } from "../../../src/guide/contracts";
import {
DefaultGuideOcrClient,
normalizeOcrResponse,
parseWindowsOcrPayload,
} from "./paddleOcrClient";
const snapshot: GuideSnapshot = {
id: "snapshot-1",
eventId: "event-1",
timeMs: 1000,
offsetMs: 500,
path: "/tmp/step-001.png",
width: 1000,
height: 800,
};
describe("normalizeOcrResponse", () => {
it("normalizes pixel boxes into guide OCR blocks", () => {
const blocks = normalizeOcrResponse(
{
blocks: [
{
text: "Save",
confidence: 92,
box: { x: 400, y: 320, width: 120, height: 40 },
},
],
},
snapshot,
);
expect(blocks).toEqual([
{
id: "ocr-snapshot-1-1",
snapshotId: "snapshot-1",
text: "Save",
confidence: 0.92,
box: { x: 0.4, y: 0.4, width: 0.12, height: 0.05 },
},
]);
});
it("normalizes polygon responses", () => {
const blocks = normalizeOcrResponse(
[
{
text: "Next",
score: 0.8,
bbox: [
[100, 200],
[300, 200],
[300, 260],
[100, 260],
],
},
],
snapshot,
);
expect(blocks[0]).toMatchObject({
text: "Next",
confidence: 0.8,
box: { x: 0.1, y: 0.25, width: 0.2, height: 0.075 },
});
});
});
describe("DefaultGuideOcrClient", () => {
it("falls back when the HTTP OCR service is unavailable", async () => {
const fallbackBlock: OcrBlock = {
id: "ocr-snapshot-1-1",
snapshotId: "snapshot-1",
text: "Save",
confidence: 0.75,
box: { x: 0.1, y: 0.2, width: 0.3, height: 0.4 },
};
const client = new DefaultGuideOcrClient(
{
recognize: async () => {
throw new Error("HTTP down");
},
},
{
recognize: async () => [fallbackBlock],
},
);
await expect(client.recognize(snapshot)).resolves.toEqual([fallbackBlock]);
});
});
describe("parseWindowsOcrPayload", () => {
it("recovers from raw control characters in OCR text", () => {
const payload = parseWindowsOcrPayload(
'{"blocks":[{"text":"Save\u0001now","confidence":0.75,"box":{"x":1,"y":2,"width":3,"height":4}}]}',
);
expect(payload).toEqual({
blocks: [
{
text: "Save now",
confidence: 0.75,
box: { x: 1, y: 2, width: 3, height: 4 },
},
],
});
});
});
+372
View File
@@ -0,0 +1,372 @@
import { execFile } from "node:child_process";
import fs from "node:fs/promises";
import { promisify } from "node:util";
import type { GuideSnapshot, OcrBlock } from "../../../src/guide/contracts";
import { ensureBundledOcrServiceRunning } from "./bundledOcrService";
const execFileAsync = promisify(execFile);
export interface GuideOcrClient {
recognize(snapshot: GuideSnapshot): Promise<OcrBlock[]>;
}
interface PaddleOcrResponseBlock {
text?: unknown;
confidence?: unknown;
score?: unknown;
box?: unknown;
bbox?: unknown;
}
export class PaddleOcrHttpClient implements GuideOcrClient {
constructor(
private readonly baseUrl = process.env.OPENSCREEN_GUIDE_OCR_URL ?? "http://127.0.0.1:8866",
private readonly language = process.env.OPENSCREEN_GUIDE_OCR_LANGUAGE ?? "vi,en",
) {}
async recognize(snapshot: GuideSnapshot): Promise<OcrBlock[]> {
await ensureBundledOcrServiceRunning(this.baseUrl);
const imageBase64 = await fs.readFile(snapshot.path, "base64");
let response: Response;
try {
response = await fetch(`${this.baseUrl.replace(/\/$/, "")}/ocr`, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify({
imageBase64,
path: snapshot.path,
language: this.language,
}),
});
} catch (error) {
throw new Error(
`OCR service is unavailable: ${error instanceof Error ? error.message : String(error)}`,
);
}
if (!response.ok) {
throw new Error(`OCR service returned HTTP ${response.status}.`);
}
const payload = (await response.json()) as unknown;
return normalizeOcrResponse(payload, snapshot);
}
}
export class WindowsOcrClient implements GuideOcrClient {
constructor(private readonly language = process.env.OPENSCREEN_GUIDE_OCR_LANGUAGE ?? "vi,en") {}
async recognize(snapshot: GuideSnapshot): Promise<OcrBlock[]> {
if (process.platform !== "win32") {
throw new Error("Windows OCR fallback is only available on Windows.");
}
const script = buildWindowsOcrScript(snapshot.path, this.language);
const encodedCommand = Buffer.from(script, "utf16le").toString("base64");
let stdout: string;
try {
const result = await execFileAsync(
"powershell.exe",
["-NoProfile", "-ExecutionPolicy", "Bypass", "-EncodedCommand", encodedCommand],
{
maxBuffer: 8 * 1024 * 1024,
timeout: 30000,
windowsHide: true,
},
);
stdout = result.stdout;
} catch (error) {
throw new Error(
`Windows OCR failed: ${error instanceof Error ? error.message : String(error)}`,
);
}
let payload: unknown;
try {
payload = parseWindowsOcrPayload(stdout);
} catch (error) {
throw new Error(
`Windows OCR returned invalid JSON: ${
error instanceof Error ? error.message : String(error)
}`,
);
}
return normalizeOcrResponse(payload, snapshot);
}
}
export class DefaultGuideOcrClient implements GuideOcrClient {
constructor(
private readonly httpClient = new PaddleOcrHttpClient(),
private readonly windowsClient = new WindowsOcrClient(),
) {}
async recognize(snapshot: GuideSnapshot): Promise<OcrBlock[]> {
try {
return await this.httpClient.recognize(snapshot);
} catch (httpError) {
try {
return await this.windowsClient.recognize(snapshot);
} catch (fallbackError) {
throw new Error(
[
httpError instanceof Error ? httpError.message : String(httpError),
fallbackError instanceof Error ? fallbackError.message : String(fallbackError),
].join(" "),
);
}
}
}
}
export function parseWindowsOcrPayload(stdout: string): unknown {
const normalized = stdout.replace(/^\uFEFF/, "").trim();
try {
return JSON.parse(normalized);
} catch {
return JSON.parse(replaceRawJsonControlCharacters(normalized));
}
}
function replaceRawJsonControlCharacters(value: string): string {
let result = "";
for (const character of value) {
const code = character.charCodeAt(0);
result += code < 32 || code === 127 ? " " : character;
}
return result;
}
export function normalizeOcrResponse(payload: unknown, snapshot: GuideSnapshot): OcrBlock[] {
const rawBlocks = extractRawBlocks(payload);
return rawBlocks
.map((raw, index) => normalizeBlock(raw, snapshot, index))
.filter((block): block is OcrBlock => block !== null);
}
function extractRawBlocks(payload: unknown): PaddleOcrResponseBlock[] {
if (Array.isArray(payload)) {
return payload as PaddleOcrResponseBlock[];
}
if (isRecord(payload)) {
if (Array.isArray(payload.blocks)) {
return payload.blocks as PaddleOcrResponseBlock[];
}
if (Array.isArray(payload.results)) {
return payload.results as PaddleOcrResponseBlock[];
}
if (Array.isArray(payload.data)) {
return payload.data as PaddleOcrResponseBlock[];
}
}
return [];
}
function normalizeBlock(
raw: PaddleOcrResponseBlock,
snapshot: GuideSnapshot,
index: number,
): OcrBlock | null {
if (!isRecord(raw)) {
return null;
}
const text = typeof raw.text === "string" ? raw.text.trim() : "";
if (!text) {
return null;
}
const confidence = normalizeConfidence(raw.confidence ?? raw.score);
const box = normalizeBox(raw.box ?? raw.bbox, snapshot);
if (!box) {
return null;
}
return {
id: `ocr-${snapshot.id}-${index + 1}`,
snapshotId: snapshot.id,
text,
confidence,
box,
};
}
function normalizeConfidence(value: unknown): number {
if (typeof value !== "number" || !Number.isFinite(value)) {
return 0.5;
}
return value > 1 ? clamp01(value / 100) : clamp01(value);
}
function normalizeBox(
value: unknown,
snapshot: GuideSnapshot,
): { x: number; y: number; width: number; height: number } | null {
if (Array.isArray(value)) {
return normalizeArrayBox(value, snapshot);
}
if (!isRecord(value)) {
return null;
}
const x = normalizeNumber(value.x);
const y = normalizeNumber(value.y);
const width = normalizeNumber(value.width ?? value.w);
const height = normalizeNumber(value.height ?? value.h);
if (x === null || y === null || width === null || height === null) {
return null;
}
return normalizeBoxDimensions({ x, y, width, height }, snapshot);
}
function normalizeArrayBox(
value: unknown[],
snapshot: GuideSnapshot,
): { x: number; y: number; width: number; height: number } | null {
const numbers = value.flat(2).filter((item): item is number => typeof item === "number");
if (numbers.length >= 8) {
const xs = [numbers[0], numbers[2], numbers[4], numbers[6]];
const ys = [numbers[1], numbers[3], numbers[5], numbers[7]];
const minX = Math.min(...xs);
const maxX = Math.max(...xs);
const minY = Math.min(...ys);
const maxY = Math.max(...ys);
return normalizeBoxDimensions(
{ x: minX, y: minY, width: maxX - minX, height: maxY - minY },
snapshot,
);
}
if (numbers.length >= 4) {
return normalizeBoxDimensions(
{ x: numbers[0] ?? 0, y: numbers[1] ?? 0, width: numbers[2] ?? 0, height: numbers[3] ?? 0 },
snapshot,
);
}
return null;
}
function normalizeBoxDimensions(
box: { x: number; y: number; width: number; height: number },
snapshot: GuideSnapshot,
): { x: number; y: number; width: number; height: number } {
const usesPixels =
box.x > 1 ||
box.y > 1 ||
box.width > 1 ||
box.height > 1 ||
box.x + box.width > 1 ||
box.y + box.height > 1;
const scaleX = usesPixels ? snapshot.width : 1;
const scaleY = usesPixels ? snapshot.height : 1;
return {
x: clamp01(box.x / scaleX),
y: clamp01(box.y / scaleY),
width: clamp01(box.width / scaleX),
height: clamp01(box.height / scaleY),
};
}
function normalizeNumber(value: unknown): number | null {
return typeof value === "number" && Number.isFinite(value) ? value : null;
}
function clamp01(value: number): number {
if (!Number.isFinite(value)) {
return 0;
}
return Math.min(1, Math.max(0, value));
}
function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null;
}
function buildWindowsOcrScript(imagePath: string, language: string): string {
const imagePathBase64 = Buffer.from(imagePath, "utf8").toString("base64");
const languageBase64 = Buffer.from(language, "utf8").toString("base64");
return `
$ErrorActionPreference = "Stop"
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)
$OutputEncoding = [System.Text.UTF8Encoding]::new($false)
$imagePath = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String("${imagePathBase64}"))
$languageSetting = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String("${languageBase64}"))
Add-Type -AssemblyName System.Runtime.WindowsRuntime
[void][Windows.Storage.StorageFile, Windows.Storage, ContentType=WindowsRuntime]
[void][Windows.Storage.FileAccessMode, Windows.Storage, ContentType=WindowsRuntime]
[void][Windows.Graphics.Imaging.BitmapDecoder, Windows.Graphics.Imaging, ContentType=WindowsRuntime]
[void][Windows.Graphics.Imaging.SoftwareBitmap, Windows.Graphics.Imaging, ContentType=WindowsRuntime]
[void][Windows.Media.Ocr.OcrEngine, Windows.Foundation, ContentType=WindowsRuntime]
[void][Windows.Globalization.Language, Windows.Globalization, ContentType=WindowsRuntime]
$asTaskGeneric = ([System.WindowsRuntimeSystemExtensions].GetMethods() | Where-Object {
$_.Name -eq "AsTask" -and $_.IsGenericMethodDefinition -and $_.GetParameters().Count -eq 1
})[0]
function Await-WinRt($operation, [Type]$resultType) {
$asTask = $asTaskGeneric.MakeGenericMethod($resultType)
$task = $asTask.Invoke($null, @($operation))
$task.Wait()
return $task.Result
}
function New-OcrEngine($languageSetting) {
$languageTags = @()
foreach ($item in $languageSetting.Split(",")) {
$tag = $item.Trim()
if ($tag -eq "vi") { $tag = "vi-VN" }
if ($tag -eq "en") { $tag = "en-US" }
if ($tag.Length -gt 0) { $languageTags += $tag }
}
foreach ($tag in $languageTags) {
try {
$language = [Windows.Globalization.Language]::new($tag)
$engine = [Windows.Media.Ocr.OcrEngine]::TryCreateFromLanguage($language)
if ($null -ne $engine) { return $engine }
} catch {}
}
$profileEngine = [Windows.Media.Ocr.OcrEngine]::TryCreateFromUserProfileLanguages()
if ($null -ne $profileEngine) { return $profileEngine }
return [Windows.Media.Ocr.OcrEngine]::TryCreateFromLanguage([Windows.Globalization.Language]::new("en-US"))
}
function Normalize-OcrText($value) {
if ($null -eq $value) { return "" }
$text = [string]$value
$text = [System.Text.RegularExpressions.Regex]::Replace($text, "[\\x00-\\x1F\\x7F]", " ")
return $text.Trim()
}
$file = Await-WinRt ([Windows.Storage.StorageFile]::GetFileFromPathAsync($imagePath)) ([Windows.Storage.StorageFile])
$stream = Await-WinRt ($file.OpenAsync([Windows.Storage.FileAccessMode]::Read)) ([Windows.Storage.Streams.IRandomAccessStream])
$decoder = Await-WinRt ([Windows.Graphics.Imaging.BitmapDecoder]::CreateAsync($stream)) ([Windows.Graphics.Imaging.BitmapDecoder])
$bitmap = Await-WinRt ($decoder.GetSoftwareBitmapAsync()) ([Windows.Graphics.Imaging.SoftwareBitmap])
$engine = New-OcrEngine $languageSetting
if ($null -eq $engine) { throw "No Windows OCR engine is available." }
$result = Await-WinRt ($engine.RecognizeAsync($bitmap)) ([Windows.Media.Ocr.OcrResult])
$blocks = @()
$index = 0
foreach ($line in $result.Lines) {
foreach ($word in $line.Words) {
$rect = $word.BoundingRect
$text = Normalize-OcrText $word.Text
if ($text.Length -gt 0) {
$index += 1
$blocks += [PSCustomObject]@{
text = $text
confidence = 0.75
box = [PSCustomObject]@{
x = [double]$rect.X
y = [double]$rect.Y
width = [double]$rect.Width
height = [double]$rect.Height
}
}
}
}
}
[PSCustomObject]@{ blocks = $blocks } | ConvertTo-Json -Depth 6 -Compress
`;
}
+11
View File
@@ -35,6 +35,9 @@ import type {
ProjectFileResult,
ProjectPathResult,
} from "../../src/native/contracts";
import { DeepSeekSettingsStore } from "../guide/ai/deepseekSettingsStore";
import { registerGuideIpcHandlers } from "../guide/guideIpc";
import { GuideStore } from "../guide/guideStore";
import { mainT } from "../i18n";
import { RECORDINGS_DIR } from "../main";
import { createCursorRecordingSession } from "../native-bridge/cursor/recording/factory";
@@ -2172,6 +2175,14 @@ export function registerIpcHandlers(
// never buffers the full video in memory (the #616 fix).
const recordingStreams = new RecordingStreamRegistry();
registerRecordingStreamHandlers(ipcMain, recordingStreams, resolveRecordingOutputPath);
const guideAiSettingsStore = new DeepSeekSettingsStore(
path.join(app.getPath("userData"), "guide-ai-settings.json"),
);
registerGuideIpcHandlers(
ipcMain,
new GuideStore(RECORDINGS_DIR, { deepSeekConfigProvider: guideAiSettingsStore }),
guideAiSettingsStore,
);
ipcMain.handle("store-recorded-session", async (_, payload: StoreRecordedSessionInput) => {
try {
+2 -2
View File
@@ -632,8 +632,8 @@ int main(int argc, char* argv[]) {
(webcamOutputFrameIndex * 10'000'000ULL) / std::max(1, webcamCapture.fps()));
if (!webcamEncoder.writeBgraFrame(webcamFrame, webcamTimestampHns)) {
encodeFailed = true;
stopRequested = true;
cv.notify_all();
control.stopRequested = true;
control.cv.notify_all();
return;
}
lastWrittenWebcamSequence = latestWebcamSequence;
+52
View File
@@ -1,4 +1,15 @@
import { contextBridge, ipcRenderer } from "electron";
import type {
AddGuideMarkerInput,
DiscardGuideSessionInput,
ExportGuideInput,
FinalizeGuideEventsInput,
GenerateGuideDraftInput,
RunGuideOcrInput,
SaveGuideAiSettingsInput,
SaveGuideInput,
WriteGuideSnapshotInput,
} from "../src/guide/contracts";
import type { NativeMacRecordingRequest } from "../src/lib/nativeMacRecording";
import type { NativeWindowsRecordingRequest } from "../src/lib/nativeWindowsRecording";
import type { RecordingSession, StoreRecordedSessionInput } from "../src/lib/recordingSession";
@@ -16,6 +27,47 @@ contextBridge.exposeInMainWorld("electronAPI", {
invokeNativeBridge: <TData>(request: NativeBridgeRequest) => {
return ipcRenderer.invoke(NATIVE_BRIDGE_CHANNEL, request) as Promise<TData>;
},
guide: {
startSession: (recordingId: string | number) => {
return ipcRenderer.invoke("guide:start-session", recordingId);
},
readSession: (recordingId: string | number) => {
return ipcRenderer.invoke("guide:read-session", recordingId);
},
addMarker: (input: AddGuideMarkerInput) => {
return ipcRenderer.invoke("guide:add-marker", input);
},
finalizeEvents: (input: FinalizeGuideEventsInput) => {
return ipcRenderer.invoke("guide:finalize-events", input);
},
writeSnapshot: (input: WriteGuideSnapshotInput) => {
return ipcRenderer.invoke("guide:write-snapshot", input);
},
runOcr: (input: RunGuideOcrInput) => {
return ipcRenderer.invoke("guide:run-ocr", input);
},
generateDraft: (input: GenerateGuideDraftInput) => {
return ipcRenderer.invoke("guide:generate-draft", input);
},
getAiSettings: () => {
return ipcRenderer.invoke("guide:get-ai-settings");
},
saveAiSettings: (input: SaveGuideAiSettingsInput) => {
return ipcRenderer.invoke("guide:save-ai-settings", input);
},
saveGuide: (input: SaveGuideInput) => {
return ipcRenderer.invoke("guide:save-guide", input);
},
exportMarkdown: (input: ExportGuideInput) => {
return ipcRenderer.invoke("guide:export-markdown", input);
},
exportHtml: (input: ExportGuideInput) => {
return ipcRenderer.invoke("guide:export-html", input);
},
discardSession: (input: DiscardGuideSessionInput) => {
return ipcRenderer.invoke("guide:discard-session", input);
},
},
hudOverlayHide: () => {
ipcRenderer.send("hud-overlay-hide");
},
+6 -4
View File
@@ -14,17 +14,18 @@
},
"scripts": {
"dev": "vite",
"build": "tsc && vite build && electron-builder",
"build": "tsc && vite build && electron-builder --config electron-builder.json5",
"lint": "biome check .",
"lint:fix": "biome check --write .",
"format": "biome format --write .",
"i18n:check": "node scripts/i18n-check.mjs",
"preview": "vite preview",
"build:native:mac": "node scripts/build-macos-screencapturekit-helper.mjs",
"build:mac": "npm run build:native:mac && tsc && vite build && electron-builder --mac",
"build:mac": "npm run build:native:mac && tsc && vite build && electron-builder --mac --config electron-builder.json5",
"build:native:win": "node scripts/build-windows-wgc-helper.mjs",
"build:win": "npm run build:native:win && tsc && vite build && electron-builder --win --config.npmRebuild=false",
"build:linux": "tsc && vite build && electron-builder --linux AppImage deb pacman --config.npmRebuild=false",
"build:ocr:win": "node scripts/build-windows-ocr-service.mjs",
"build:win": "npm run build:native:win && npm run build:ocr:win && tsc && vite build && electron-builder --win --config electron-builder.json5 --config.npmRebuild=false",
"build:linux": "tsc && vite build && electron-builder --linux AppImage deb pacman --config electron-builder.json5 --config.npmRebuild=false",
"test": "vitest --run",
"test:watch": "vitest",
"test:cursor-native:win": "node scripts/test-windows-native-cursor.mjs",
@@ -35,6 +36,7 @@
"test:wgc-mixed-audio:win": "node scripts/test-windows-wgc-helper.mjs --system-audio --microphone",
"test:wgc-webcam:win": "node scripts/test-windows-wgc-helper.mjs --webcam",
"test:wgc-full:win": "node scripts/test-windows-wgc-helper.mjs --webcam --system-audio --microphone",
"ocr:paddle": "python -m uvicorn tools.ocr.paddle_ocr_service:app --host 127.0.0.1 --port 8866",
"capture:openscreen-preview": "node scripts/capture-openscreen-preview.mjs",
"inspect:cursor-click-bounce": "node scripts/inspect-native-cursor-click-bounce.mjs",
"build-vite": "tsc && vite build",
+163
View File
@@ -0,0 +1,163 @@
import { execFileSync } from "node:child_process";
import fs from "node:fs";
import path from "node:path";
import process from "node:process";
import { fileURLToPath } from "node:url";
const ROOT = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..");
const OCR_DIR = path.join(ROOT, "tools", "ocr");
const VENV_DIR = path.join(ROOT, ".venv-ocr-build");
const VENV_PYTHON = path.join(VENV_DIR, "Scripts", "python.exe");
const DIST_DIR = path.join(OCR_DIR, "dist");
const WORK_DIR = path.join(OCR_DIR, "build");
const MODEL_CACHE_DIR = path.join(OCR_DIR, "models", "paddlex");
const ENTRYPOINT = path.join(OCR_DIR, "openscreen_ocr_service_entry.py");
const OUTPUT_DIR = path.join(DIST_DIR, "openscreen-ocr-service");
const OUTPUT_EXE = path.join(OUTPUT_DIR, "openscreen-ocr-service.exe");
const REQUIRED_MODEL_NAMES = ["PP-OCRv5_mobile_det", "latin_PP-OCRv5_mobile_rec"];
if (process.platform !== "win32") {
console.log("Skipping Windows OCR service build on non-Windows host.");
process.exit(0);
}
function run(command, args, options = {}) {
console.log(`> ${command} ${args.join(" ")}`);
execFileSync(command, args, {
cwd: ROOT,
stdio: "inherit",
...options,
});
}
function ensureVenv() {
if (fs.existsSync(VENV_PYTHON)) {
return;
}
run(process.env.PYTHON ?? "python", ["-m", "venv", VENV_DIR]);
}
function installDependencies() {
run(VENV_PYTHON, ["-m", "pip", "install", "--upgrade", "pip"]);
run(VENV_PYTHON, ["-m", "pip", "install", "-r", path.join(OCR_DIR, "requirements.txt")]);
run(VENV_PYTHON, ["-m", "pip", "install", "pyinstaller>=6.0"]);
}
function prepareModelCache() {
const officialModelsDir = path.join(MODEL_CACHE_DIR, "official_models");
const hasRequiredModels = REQUIRED_MODEL_NAMES.every((modelName) =>
fs.existsSync(path.join(officialModelsDir, modelName)),
);
if (hasRequiredModels) {
return;
}
fs.mkdirSync(officialModelsDir, { recursive: true });
run(
VENV_PYTHON,
[
"-c",
[
"import sys",
`sys.path.insert(0, ${JSON.stringify(OCR_DIR)})`,
"from paddle_ocr_service import _create_engine",
"_create_engine('latin')",
].join("; "),
],
{
env: {
...process.env,
PADDLE_PDX_CACHE_HOME: MODEL_CACHE_DIR,
PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK: "True",
PADDLE_PDX_ENABLE_MKLDNN_BYDEFAULT: "False",
PADDLEOCR_DEVICE: "cpu",
PADDLEOCR_ENABLE_MKLDNN: "0",
PADDLEOCR_LANG: "latin",
PADDLEOCR_USE_MOBILE: "1",
PYTHONUTF8: "1",
},
},
);
}
function buildService() {
fs.rmSync(OUTPUT_DIR, { recursive: true, force: true });
fs.mkdirSync(DIST_DIR, { recursive: true });
fs.mkdirSync(WORK_DIR, { recursive: true });
run(VENV_PYTHON, [
"-m",
"PyInstaller",
"--noconfirm",
"--clean",
"--onedir",
"--name",
"openscreen-ocr-service",
"--distpath",
DIST_DIR,
"--workpath",
WORK_DIR,
"--specpath",
WORK_DIR,
"--paths",
OCR_DIR,
"--collect-all",
"paddleocr",
"--collect-all",
"paddle",
"--collect-all",
"paddlex",
"--collect-all",
"cv2",
"--collect-all",
"shapely",
"--collect-all",
"pyclipper",
"--collect-all",
"pypdfium2",
"--collect-all",
"bidi",
"--copy-metadata",
"paddleocr",
"--copy-metadata",
"paddlex",
"--copy-metadata",
"paddlepaddle",
"--copy-metadata",
"opencv-contrib-python",
"--copy-metadata",
"shapely",
"--copy-metadata",
"pyclipper",
"--copy-metadata",
"pypdfium2",
"--copy-metadata",
"python-bidi",
"--hidden-import",
"uvicorn.logging",
"--hidden-import",
"uvicorn.loops",
"--hidden-import",
"uvicorn.loops.auto",
"--hidden-import",
"uvicorn.protocols",
"--hidden-import",
"uvicorn.protocols.http",
"--hidden-import",
"uvicorn.protocols.http.auto",
"--hidden-import",
"uvicorn.lifespan",
"--hidden-import",
"uvicorn.lifespan.on",
ENTRYPOINT,
]);
if (!fs.existsSync(OUTPUT_EXE)) {
throw new Error(`OCR service build did not produce ${OUTPUT_EXE}`);
}
console.log(`Built OCR service: ${OUTPUT_EXE}`);
}
ensureVenv();
installDependencies();
prepareModelCache();
buildService();
+20 -5
View File
@@ -18,6 +18,7 @@ export default function App() {
const [windowType, setWindowType] = useState(
() => new URLSearchParams(window.location.search).get("windowType") || "",
);
const hasElectronBridge = Boolean(window.electronAPI);
useEffect(() => {
const type = new URLSearchParams(window.location.search).get("windowType") || "";
@@ -71,11 +72,7 @@ export default function App() {
</ShortcutsProvider>
);
default:
return (
<div className="w-full h-full bg-background text-foreground">
<h1>Openscreen</h1>
</div>
);
return hasElectronBridge ? <LaunchWindow /> : <BrowserDevFallback />;
}
})();
@@ -86,3 +83,21 @@ export default function App() {
</TooltipProvider>
);
}
function BrowserDevFallback() {
return (
<div className="flex h-screen w-screen items-center justify-center bg-[#08090b] px-6 text-slate-100">
<div className="w-full max-w-[520px] rounded-lg border border-white/10 bg-white/[0.035] p-5 shadow-2xl">
<h1 className="mb-2 text-xl font-semibold tracking-normal">OpenScreen desktop app</h1>
<p className="mb-4 text-sm leading-6 text-slate-300">
This localhost page is only the Vite renderer. Recording, file access, guide generation,
and export require the Electron window because those actions use the preload bridge.
</p>
<div className="rounded-md border border-white/10 bg-black/30 px-3 py-2 text-xs text-slate-300">
Use the separate Electron window titled <span className="text-slate-100">openscreen</span>
.
</div>
</div>
</div>
);
}
+38 -1
View File
@@ -1,4 +1,4 @@
import { Check, ChevronDown, Languages } from "lucide-react";
import { BookOpen, Check, ChevronDown, Flag, Languages } from "lucide-react";
import { useCallback, useEffect, useRef, useState } from "react";
import { createPortal } from "react-dom";
import { BsPauseCircle, BsPlayCircle, BsRecordCircle } from "react-icons/bs";
@@ -99,6 +99,9 @@ export function LaunchWindow() {
toggleRecording,
togglePaused,
canPauseRecording,
guideModeEnabled,
setGuideModeEnabled,
addGuideMarker,
restartRecording,
cancelRecording,
microphoneEnabled,
@@ -692,6 +695,29 @@ export function LaunchWindow() {
)}
</button>
)}
<Tooltip
content={guideModeEnabled ? t("guide.disableGuideMode") : t("guide.enableGuideMode")}
>
<button
data-testid="launch-guide-mode-button"
className={`${hudIconBtnClasses} relative ${
guideModeEnabled
? "bg-blue-500/20 text-blue-100 ring-1 ring-blue-300/60 shadow-[0_0_14px_rgba(59,130,246,0.38),inset_0_1px_0_rgba(255,255,255,0.18)] hover:bg-blue-500/25"
: "bg-transparent opacity-60 hover:opacity-100"
}`}
onClick={() => setGuideModeEnabled(!guideModeEnabled)}
disabled={recording}
aria-pressed={guideModeEnabled}
>
<BookOpen
size={ICON_SIZE}
className={guideModeEnabled ? "text-blue-200" : "text-white/40"}
/>
{guideModeEnabled && (
<span className="absolute right-1 top-1 h-1.5 w-1.5 rounded-full bg-blue-200 shadow-[0_0_8px_rgba(147,197,253,0.9)]" />
)}
</button>
</Tooltip>
</div>
{/* Record/Stop group */}
@@ -724,6 +750,17 @@ export function LaunchWindow() {
{recording && (
<div className={`flex items-center gap-0.5 ${styles.electronNoDrag}`}>
{guideModeEnabled && (
<Tooltip content={t("guide.addMarker")}>
<button
data-testid="launch-guide-marker-button"
className={hudAuxIconBtnClasses}
onClick={addGuideMarker}
>
<Flag size={ICON_SIZE} className="text-blue-300" />
</button>
</Tooltip>
)}
{canPauseRecording && (
<Tooltip
content={paused ? t("tooltips.resumeRecording") : t("tooltips.pauseRecording")}
+178 -160
View File
@@ -55,6 +55,7 @@ import {
DEFAULT_GIF_SETTINGS,
DEFAULT_SOURCE_DIMENSIONS,
} from "./editorDefaults";
import { GuidePanel } from "./guide/GuidePanel";
import PlaybackControls from "./PlaybackControls";
import {
createProjectData,
@@ -255,6 +256,7 @@ export default function VideoEditor() {
const [nativePlatform, setNativePlatform] = useState<NativePlatform | null>(null);
const [recordingCursorCaptureMode, setRecordingCursorCaptureMode] =
useState<CursorCaptureMode | null>(null);
const [guideRecordingId, setGuideRecordingId] = useState<number | null>(null);
const videoPlaybackRef = useRef<VideoPlaybackRef>(null);
@@ -349,6 +351,7 @@ export default function VideoEditor() {
setWebcamVideoSourcePath(webcamSourcePath);
setWebcamVideoPath(webcamSourcePath ? toFileUrl(webcamSourcePath) : null);
setRecordingCursorCaptureMode(projectCursorCaptureMode);
setGuideRecordingId(null);
setCurrentProjectPath(path ?? null);
pushState({
@@ -496,6 +499,7 @@ export default function VideoEditor() {
setWebcamVideoSourcePath(webcamSourcePath);
setWebcamVideoPath(webcamSourcePath ? toFileUrl(webcamSourcePath) : null);
setRecordingCursorCaptureMode(session.cursorCaptureMode ?? null);
setGuideRecordingId(session.createdAt);
setCurrentProjectPath(null);
setLastSavedSnapshot(
createProjectSnapshot(
@@ -517,6 +521,7 @@ export default function VideoEditor() {
setVideoSourcePath(result.path);
setVideoPath(toFileUrl(result.path));
setRecordingCursorCaptureMode(null);
setGuideRecordingId(null);
setCurrentProjectPath(null);
setLastSavedSnapshot(
createProjectSnapshot({ screenVideoPath: result.path }, INITIAL_EDITOR_STATE),
@@ -2150,166 +2155,179 @@ export default function VideoEditor() {
</div>
</div>
<div className="editor-settings-rail min-w-0 h-full">
<SettingsPanel
selected={wallpaper}
onWallpaperChange={(w) => pushState({ wallpaper: w })}
selectedZoomDepth={
selectedZoomId ? zoomRegions.find((z) => z.id === selectedZoomId)?.depth : null
}
onZoomDepthChange={(depth) => selectedZoomId && handleZoomDepthChange(depth)}
selectedZoomCustomScale={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.customScale ?? null)
: null
}
onZoomCustomScaleChange={handleZoomCustomScaleChange}
onZoomCustomScaleCommit={handleZoomCustomScaleCommit}
onZoomPreviewStart={() => setIsPreviewingZoom(true)}
onZoomPreviewEnd={() => setIsPreviewingZoom(false)}
selectedZoomFocusMode={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.focusMode ?? "manual")
: null
}
onZoomFocusModeChange={(mode) =>
selectedZoomId && handleZoomFocusModeChange(mode)
}
selectedZoomFocus={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.focus ?? null)
: null
}
onZoomFocusCoordinateChange={(focus) =>
selectedZoomId && handleZoomFocusChange(selectedZoomId, focus)
}
onZoomFocusCoordinateCommit={commitState}
hasCursorTelemetry={cursorTelemetry.length > 0}
selectedZoomId={selectedZoomId}
onZoomDelete={handleZoomDelete}
selectedZoomRotationPreset={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.rotationPreset ?? null)
: null
}
onZoomRotationPresetChange={handleZoomRotationPresetChange}
selectedTrimId={selectedTrimId}
onTrimDelete={handleTrimDelete}
shadowIntensity={shadowIntensity}
onShadowChange={(v) => updateState({ shadowIntensity: v })}
onShadowCommit={commitState}
showBlur={showBlur}
onBlurChange={(v) => pushState({ showBlur: v })}
motionBlurAmount={motionBlurAmount}
onMotionBlurChange={(v) => updateState({ motionBlurAmount: v })}
onMotionBlurCommit={commitState}
borderRadius={borderRadius}
onBorderRadiusChange={(v) => updateState({ borderRadius: v })}
onBorderRadiusCommit={commitState}
padding={padding}
onPaddingChange={(v) => updateState({ padding: v })}
onPaddingCommit={commitState}
cropRegion={cropRegion}
onCropChange={(r) => pushState({ cropRegion: r })}
aspectRatio={aspectRatio}
hasWebcam={Boolean(webcamVideoPath)}
webcamLayoutPreset={webcamLayoutPreset}
onWebcamLayoutPresetChange={(preset) =>
pushState({
webcamLayoutPreset: preset,
webcamPosition: preset === "picture-in-picture" ? webcamPosition : null,
})
}
webcamMaskShape={webcamMaskShape}
onWebcamMaskShapeChange={(shape) => pushState({ webcamMaskShape: shape })}
webcamSizePreset={webcamSizePreset}
onWebcamSizePresetChange={(v) => updateState({ webcamSizePreset: v })}
onWebcamSizePresetCommit={commitState}
videoElement={videoPlaybackRef.current?.video || null}
exportQuality={exportQuality}
onExportQualityChange={setExportQuality}
exportFormat={exportFormat}
onExportFormatChange={setExportFormat}
gifFrameRate={gifFrameRate}
onGifFrameRateChange={setGifFrameRate}
gifLoop={gifLoop}
onGifLoopChange={setGifLoop}
gifSizePreset={gifSizePreset}
onGifSizePresetChange={setGifSizePreset}
gifOutputDimensions={calculateOutputDimensions(
calculateEffectiveSourceDimensions(
videoPlaybackRef.current?.video?.videoWidth ||
DEFAULT_SOURCE_DIMENSIONS.width,
videoPlaybackRef.current?.video?.videoHeight ||
DEFAULT_SOURCE_DIMENSIONS.height,
cropRegion,
).width,
calculateEffectiveSourceDimensions(
videoPlaybackRef.current?.video?.videoWidth ||
DEFAULT_SOURCE_DIMENSIONS.width,
videoPlaybackRef.current?.video?.videoHeight ||
DEFAULT_SOURCE_DIMENSIONS.height,
cropRegion,
).height,
gifSizePreset,
GIF_SIZE_PRESETS,
aspectRatio === "native"
? getNativeAspectRatioValue(
videoPlaybackRef.current?.video?.videoWidth ||
DEFAULT_SOURCE_DIMENSIONS.width,
videoPlaybackRef.current?.video?.videoHeight ||
DEFAULT_SOURCE_DIMENSIONS.height,
cropRegion,
)
: getAspectRatioValue(aspectRatio),
)}
onExport={handleOpenExportDialog}
onExportPanelOpen={() => {
setSelectedZoomId(null);
setSelectedTrimId(null);
setSelectedSpeedId(null);
}}
selectedAnnotationId={selectedAnnotationId}
annotationRegions={annotationOnlyRegions}
onAnnotationContentChange={handleAnnotationContentChange}
onAnnotationTypeChange={handleAnnotationTypeChange}
onAnnotationStyleChange={handleAnnotationStyleChange}
onAnnotationFigureDataChange={handleAnnotationFigureDataChange}
onAnnotationDuplicate={handleAnnotationDuplicate}
onAnnotationDelete={handleAnnotationDelete}
selectedBlurId={selectedBlurId}
blurRegions={blurRegions}
onBlurDataChange={handleBlurDataPanelChange}
onBlurDataCommit={commitState}
onBlurDelete={handleAnnotationDelete}
selectedSpeedId={selectedSpeedId}
selectedSpeedValue={
selectedSpeedId
? (speedRegions.find((r) => r.id === selectedSpeedId)?.speed ?? null)
: null
}
onSpeedChange={handleSpeedChange}
onSpeedDelete={handleSpeedDelete}
unsavedExport={unsavedExport}
onSaveUnsavedExport={handleSaveUnsavedExport}
onSaveDiagnostic={handleSaveDiagnostic}
showCursor={showCursor}
onShowCursorChange={setShowCursor}
cursorSize={cursorSize}
onCursorSizeChange={setCursorSize}
cursorSmoothing={cursorSmoothing}
onCursorSmoothingChange={setCursorSmoothing}
cursorMotionBlur={cursorMotionBlur}
onCursorMotionBlurChange={setCursorMotionBlur}
cursorClickBounce={cursorClickBounce}
onCursorClickBounceChange={setCursorClickBounce}
cursorClipToBounds={cursorClipToBounds}
onCursorClipToBoundsChange={setCursorClipToBounds}
hasCursorData={
cursorTelemetry.length > 0 || hasNativeCursorRecordingData(cursorRecordingData)
}
showCursorSettings={showCursorSettings}
/>
<div className="editor-settings-rail flex min-w-0 h-full flex-col gap-3">
{guideRecordingId && (
<GuidePanel
recordingId={guideRecordingId}
videoPath={videoPath}
videoSourcePath={videoSourcePath}
currentTimeMs={currentTime * 1000}
/>
)}
<div className="min-h-0 flex-1 overflow-hidden">
<SettingsPanel
selected={wallpaper}
onWallpaperChange={(w) => pushState({ wallpaper: w })}
selectedZoomDepth={
selectedZoomId
? zoomRegions.find((z) => z.id === selectedZoomId)?.depth
: null
}
onZoomDepthChange={(depth) => selectedZoomId && handleZoomDepthChange(depth)}
selectedZoomCustomScale={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.customScale ?? null)
: null
}
onZoomCustomScaleChange={handleZoomCustomScaleChange}
onZoomCustomScaleCommit={handleZoomCustomScaleCommit}
onZoomPreviewStart={() => setIsPreviewingZoom(true)}
onZoomPreviewEnd={() => setIsPreviewingZoom(false)}
selectedZoomFocusMode={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.focusMode ?? "manual")
: null
}
onZoomFocusModeChange={(mode) =>
selectedZoomId && handleZoomFocusModeChange(mode)
}
selectedZoomFocus={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.focus ?? null)
: null
}
onZoomFocusCoordinateChange={(focus) =>
selectedZoomId && handleZoomFocusChange(selectedZoomId, focus)
}
onZoomFocusCoordinateCommit={commitState}
hasCursorTelemetry={cursorTelemetry.length > 0}
selectedZoomId={selectedZoomId}
onZoomDelete={handleZoomDelete}
selectedZoomRotationPreset={
selectedZoomId
? (zoomRegions.find((z) => z.id === selectedZoomId)?.rotationPreset ?? null)
: null
}
onZoomRotationPresetChange={handleZoomRotationPresetChange}
selectedTrimId={selectedTrimId}
onTrimDelete={handleTrimDelete}
shadowIntensity={shadowIntensity}
onShadowChange={(v) => updateState({ shadowIntensity: v })}
onShadowCommit={commitState}
showBlur={showBlur}
onBlurChange={(v) => pushState({ showBlur: v })}
motionBlurAmount={motionBlurAmount}
onMotionBlurChange={(v) => updateState({ motionBlurAmount: v })}
onMotionBlurCommit={commitState}
borderRadius={borderRadius}
onBorderRadiusChange={(v) => updateState({ borderRadius: v })}
onBorderRadiusCommit={commitState}
padding={padding}
onPaddingChange={(v) => updateState({ padding: v })}
onPaddingCommit={commitState}
cropRegion={cropRegion}
onCropChange={(r) => pushState({ cropRegion: r })}
aspectRatio={aspectRatio}
hasWebcam={Boolean(webcamVideoPath)}
webcamLayoutPreset={webcamLayoutPreset}
onWebcamLayoutPresetChange={(preset) =>
pushState({
webcamLayoutPreset: preset,
webcamPosition: preset === "picture-in-picture" ? webcamPosition : null,
})
}
webcamMaskShape={webcamMaskShape}
onWebcamMaskShapeChange={(shape) => pushState({ webcamMaskShape: shape })}
webcamSizePreset={webcamSizePreset}
onWebcamSizePresetChange={(v) => updateState({ webcamSizePreset: v })}
onWebcamSizePresetCommit={commitState}
videoElement={videoPlaybackRef.current?.video || null}
exportQuality={exportQuality}
onExportQualityChange={setExportQuality}
exportFormat={exportFormat}
onExportFormatChange={setExportFormat}
gifFrameRate={gifFrameRate}
onGifFrameRateChange={setGifFrameRate}
gifLoop={gifLoop}
onGifLoopChange={setGifLoop}
gifSizePreset={gifSizePreset}
onGifSizePresetChange={setGifSizePreset}
gifOutputDimensions={calculateOutputDimensions(
calculateEffectiveSourceDimensions(
videoPlaybackRef.current?.video?.videoWidth ||
DEFAULT_SOURCE_DIMENSIONS.width,
videoPlaybackRef.current?.video?.videoHeight ||
DEFAULT_SOURCE_DIMENSIONS.height,
cropRegion,
).width,
calculateEffectiveSourceDimensions(
videoPlaybackRef.current?.video?.videoWidth ||
DEFAULT_SOURCE_DIMENSIONS.width,
videoPlaybackRef.current?.video?.videoHeight ||
DEFAULT_SOURCE_DIMENSIONS.height,
cropRegion,
).height,
gifSizePreset,
GIF_SIZE_PRESETS,
aspectRatio === "native"
? getNativeAspectRatioValue(
videoPlaybackRef.current?.video?.videoWidth ||
DEFAULT_SOURCE_DIMENSIONS.width,
videoPlaybackRef.current?.video?.videoHeight ||
DEFAULT_SOURCE_DIMENSIONS.height,
cropRegion,
)
: getAspectRatioValue(aspectRatio),
)}
onExport={handleOpenExportDialog}
onExportPanelOpen={() => {
setSelectedZoomId(null);
setSelectedTrimId(null);
setSelectedSpeedId(null);
}}
selectedAnnotationId={selectedAnnotationId}
annotationRegions={annotationOnlyRegions}
onAnnotationContentChange={handleAnnotationContentChange}
onAnnotationTypeChange={handleAnnotationTypeChange}
onAnnotationStyleChange={handleAnnotationStyleChange}
onAnnotationFigureDataChange={handleAnnotationFigureDataChange}
onAnnotationDuplicate={handleAnnotationDuplicate}
onAnnotationDelete={handleAnnotationDelete}
selectedBlurId={selectedBlurId}
blurRegions={blurRegions}
onBlurDataChange={handleBlurDataPanelChange}
onBlurDataCommit={commitState}
onBlurDelete={handleAnnotationDelete}
selectedSpeedId={selectedSpeedId}
selectedSpeedValue={
selectedSpeedId
? (speedRegions.find((r) => r.id === selectedSpeedId)?.speed ?? null)
: null
}
onSpeedChange={handleSpeedChange}
onSpeedDelete={handleSpeedDelete}
unsavedExport={unsavedExport}
onSaveUnsavedExport={handleSaveUnsavedExport}
onSaveDiagnostic={handleSaveDiagnostic}
showCursor={showCursor}
onShowCursorChange={setShowCursor}
cursorSize={cursorSize}
onCursorSizeChange={setCursorSize}
cursorSmoothing={cursorSmoothing}
onCursorSmoothingChange={setCursorSmoothing}
cursorMotionBlur={cursorMotionBlur}
onCursorMotionBlurChange={setCursorMotionBlur}
cursorClickBounce={cursorClickBounce}
onCursorClickBounceChange={setCursorClickBounce}
cursorClipToBounds={cursorClipToBounds}
onCursorClipToBoundsChange={setCursorClipToBounds}
hasCursorData={
cursorTelemetry.length > 0 ||
hasNativeCursorRecordingData(cursorRecordingData)
}
showCursorSettings={showCursorSettings}
/>
</div>
</div>
</div>
</Panel>
@@ -0,0 +1,551 @@
import { KeyRound, ListChecks, RefreshCw, Save, Trash2, Wand2 } from "lucide-react";
import { useCallback, useEffect, useMemo, useState } from "react";
import { toast } from "sonner";
import { Button } from "@/components/ui/button";
import { useI18n } from "@/contexts/I18nContext";
import type {
GuideAiProvider,
GuideAiSettings,
GuideLanguage,
GuideSession,
} from "@/guide/contracts";
import { captureGuideSnapshots } from "@/guide/snapshot/extractGuideSnapshots";
interface GuidePanelProps {
recordingId: number | string | null;
videoPath: string | null;
videoSourcePath: string | null;
currentTimeMs: number;
}
type BusyAction = "load" | "generate";
const COPY = {
en: {
title: "Guide",
noRecording: "Record with Guide Mode to create a step guide.",
noSession: "No guide session yet.",
generateGuide: "Generate guide",
generating: "Generating...",
prepare: "Prepare",
snapshots: "Snapshots",
ocr: "OCR",
draft: "Draft",
deepseek: "DeepSeek",
local: "Local",
exportMd: "MD",
exportHtml: "HTML",
events: "events",
shots: "shots",
text: "text",
steps: "steps",
captureStep: "Capture step",
captureLabel: "Manual capture",
settings: "Settings",
apiKey: "API key env",
apiKeyPlaceholder: "DEEPSEEK_API_KEY",
baseUrl: "Base URL",
model: "Model",
saveSettings: "Save",
clearKey: "Reset env",
keySaved: "DeepSeek settings saved.",
keyMissing: "Set a DeepSeek API key environment variable before generating with DeepSeek.",
keyConfigured: "Env ready",
keyNotConfigured: "Env value missing",
ready: "Guide artifacts are ready.",
noEvents: "No click events were captured for this guide.",
ocrUnavailable: "Local OCR service is unavailable. You can still create a local draft.",
exported: "Guide exported",
},
vi: {
title: "Hướng dẫn",
noRecording: "Hãy quay bằng Guide Mode để tạo hướng dẫn từng bước.",
noSession: "Chưa có guide session.",
generateGuide: "Tạo hướng dẫn",
generating: "Đang tạo...",
prepare: "Chuẩn bị",
snapshots: "Ảnh",
ocr: "OCR",
draft: "Draft",
deepseek: "DeepSeek",
local: "Local",
exportMd: "MD",
exportHtml: "HTML",
events: "events",
shots: "ảnh",
text: "text",
steps: "steps",
captureStep: "Chụp bước",
captureLabel: "Chụp thủ công",
settings: "Cài đặt",
apiKey: "API key env",
apiKeyPlaceholder: "DEEPSEEK_API_KEY",
baseUrl: "Base URL",
model: "Model",
saveSettings: "Lưu",
clearKey: "Reset env",
keySaved: "Đã lưu cài đặt DeepSeek.",
keyMissing: "Hãy set biến môi trường DeepSeek API key trước khi tạo draft bằng DeepSeek.",
keyConfigured: "Env ready",
keyNotConfigured: "Chưa thấy giá trị env",
ready: "Đã sẵn sàng tài liệu hướng dẫn.",
noEvents: "Chưa ghi nhận click event nào cho guide này.",
ocrUnavailable: "OCR local chưa chạy. Vẫn có thể tạo draft local.",
exported: "Đã export hướng dẫn",
},
} as const;
export function GuidePanel({ recordingId, videoPath, videoSourcePath }: GuidePanelProps) {
const { locale } = useI18n();
const copy = useMemo(() => (locale.startsWith("vi") ? COPY.vi : COPY.en), [locale]);
const guideLanguage: GuideLanguage = locale.startsWith("vi") ? "vi" : "en";
const [session, setSession] = useState<GuideSession | null>(null);
const [provider, setProvider] = useState<GuideAiProvider>("local");
const [busyAction, setBusyAction] = useState<BusyAction | null>(null);
const [aiSettings, setAiSettings] = useState<GuideAiSettings | null>(null);
const [settingsOpen, setSettingsOpen] = useState(false);
const [settingsBusy, setSettingsBusy] = useState(false);
const [deepSeekApiKeyEnvName, setDeepSeekApiKeyEnvName] = useState("DEEPSEEK_API_KEY");
const [deepSeekBaseUrl, setDeepSeekBaseUrl] = useState("https://api.deepseek.com");
const [deepSeekModel, setDeepSeekModel] = useState("deepseek-chat");
const [message, setMessage] = useState<string | null>(null);
const isBusy = busyAction !== null;
const canUseGuide = Boolean(recordingId && videoSourcePath && window.electronAPI?.guide);
const generatedSteps = session?.generatedGuide?.steps ?? [];
const statusLabel = useMemo(() => {
if (!session) {
return copy.noSession;
}
return [
`${session.events.length} ${copy.events}`,
`${session.snapshots.length} ${copy.shots}`,
`${session.ocrBlocks.length} ${copy.text}`,
`${generatedSteps.length} ${copy.steps}`,
].join(" / ");
}, [copy, generatedSteps.length, session]);
const loadAiSettings = useCallback(async () => {
if (!window.electronAPI?.guide?.getAiSettings) {
return;
}
const result = await window.electronAPI.guide.getAiSettings();
if (!result.success) {
setMessage(result.error);
return;
}
setAiSettings(result.data);
setDeepSeekBaseUrl(result.data.deepseek.baseUrl);
setDeepSeekModel(result.data.deepseek.model);
setDeepSeekApiKeyEnvName(result.data.deepseek.apiKeyEnvName);
}, []);
useEffect(() => {
void loadAiSettings();
}, [loadAiSettings]);
const loadSession = useCallback(async () => {
if (!recordingId || !window.electronAPI?.guide) {
setSession(null);
setBusyAction(null);
return;
}
setBusyAction("load");
const result = await window.electronAPI.guide.readSession(recordingId);
setBusyAction(null);
if (result.success) {
setSession(result.data);
setMessage(null);
return;
}
if (result.code === "guide-session-not-found") {
setSession(null);
setMessage(null);
return;
}
setMessage(result.error);
}, [recordingId]);
useEffect(() => {
let cancelled = false;
async function load() {
if (!recordingId || !window.electronAPI?.guide) {
setSession(null);
setBusyAction(null);
return;
}
setBusyAction("load");
const result = await window.electronAPI.guide.readSession(recordingId);
if (cancelled) {
return;
}
setBusyAction(null);
if (result.success) {
setSession(result.data);
setMessage(null);
} else if (result.code === "guide-session-not-found") {
setSession(null);
setMessage(null);
} else {
setMessage(result.error);
}
}
load();
return () => {
cancelled = true;
};
}, [recordingId]);
const ensureEventsSession = useCallback(async (): Promise<GuideSession> => {
if (!recordingId || !videoSourcePath) {
throw new Error(copy.noRecording);
}
let current = session;
if (!current) {
const startResult = await window.electronAPI.guide.startSession(recordingId);
if (!startResult.success) {
throw new Error(startResult.error);
}
current = startResult.data;
}
if (current.status === "recording" || current.videoPath !== videoSourcePath) {
const finalizeResult = await window.electronAPI.guide.finalizeEvents({
recordingId,
videoPath: videoSourcePath,
});
if (!finalizeResult.success) {
throw new Error(finalizeResult.error);
}
current = finalizeResult.data;
}
setSession(current);
return current;
}, [copy.noRecording, recordingId, session, videoSourcePath]);
const runAction = useCallback(
async (action: BusyAction, task: () => Promise<void>) => {
if (!canUseGuide) {
setMessage(copy.noRecording);
return;
}
setBusyAction(action);
setMessage(null);
try {
await task();
} catch (error) {
const text = error instanceof Error ? error.message : String(error);
setMessage(text);
toast.error(text);
} finally {
setBusyAction(null);
}
},
[canUseGuide, copy.noRecording],
);
const handleProviderChange = useCallback(
(nextProvider: GuideAiProvider) => {
setProvider(nextProvider);
if (nextProvider === "deepseek" && !aiSettings?.deepseek.hasApiKey) {
setSettingsOpen(true);
setMessage(copy.keyMissing);
}
},
[aiSettings?.deepseek.hasApiKey, copy.keyMissing],
);
const handleSaveAiSettings = useCallback(async () => {
if (!window.electronAPI?.guide?.saveAiSettings) {
return;
}
setSettingsBusy(true);
setMessage(null);
try {
const result = await window.electronAPI.guide.saveAiSettings({
deepseekApiKeyEnvName: deepSeekApiKeyEnvName,
baseUrl: deepSeekBaseUrl,
model: deepSeekModel,
});
if (!result.success) {
throw new Error(result.error);
}
setAiSettings(result.data);
setDeepSeekApiKeyEnvName(result.data.deepseek.apiKeyEnvName);
setDeepSeekBaseUrl(result.data.deepseek.baseUrl);
setDeepSeekModel(result.data.deepseek.model);
toast.success(copy.keySaved);
} catch (error) {
const text = error instanceof Error ? error.message : String(error);
setMessage(text);
toast.error(text);
} finally {
setSettingsBusy(false);
}
}, [copy.keySaved, deepSeekApiKeyEnvName, deepSeekBaseUrl, deepSeekModel]);
const handleClearDeepSeekKey = useCallback(async () => {
if (!window.electronAPI?.guide?.saveAiSettings) {
return;
}
setSettingsBusy(true);
setMessage(null);
try {
const result = await window.electronAPI.guide.saveAiSettings({
clearDeepseekApiKeyEnvName: true,
baseUrl: deepSeekBaseUrl,
model: deepSeekModel,
});
if (!result.success) {
throw new Error(result.error);
}
setAiSettings(result.data);
setDeepSeekApiKeyEnvName(result.data.deepseek.apiKeyEnvName);
toast.success(copy.keySaved);
} catch (error) {
const text = error instanceof Error ? error.message : String(error);
setMessage(text);
toast.error(text);
} finally {
setSettingsBusy(false);
}
}, [copy.keySaved, deepSeekBaseUrl, deepSeekModel]);
const handleGenerateGuide = useCallback(() => {
void runAction("generate", async () => {
if (provider === "deepseek" && !aiSettings?.deepseek.hasApiKey) {
setSettingsOpen(true);
throw new Error(copy.keyMissing);
}
if (!videoPath) {
throw new Error("Video URL is not available.");
}
let current = await ensureEventsSession();
if (current.events.length === 0) {
throw new Error(copy.noEvents);
}
if (current.snapshots.length < current.events.length) {
current = await captureGuideSnapshots({
session: current,
videoUrl: videoPath,
maxWidth: 1280,
});
setSession(current);
}
if (current.ocrBlocks.length === 0 && current.snapshots.length > 0) {
const ocrResult = await window.electronAPI.guide.runOcr({
recordingId: current.recordingId,
});
if (!ocrResult.success) {
if (ocrResult.code === "guide-ocr-unavailable") {
toast.warning(copy.ocrUnavailable);
}
throw new Error(ocrResult.error);
}
current = ocrResult.data;
setSession(current);
}
const result = await window.electronAPI.guide.generateDraft({
recordingId: current.recordingId,
language: guideLanguage,
provider,
});
if (!result.success) {
throw new Error(result.error);
}
const markdownResult = await window.electronAPI.guide.exportMarkdown({
recordingId: current.recordingId,
});
if (!markdownResult.success) {
throw new Error(markdownResult.error);
}
const htmlResult = await window.electronAPI.guide.exportHtml({
recordingId: current.recordingId,
});
if (!htmlResult.success) {
throw new Error(htmlResult.error);
}
const revealResult = await window.electronAPI.revealInFolder(htmlResult.data.path);
if (!revealResult.success) {
toast.warning(revealResult.error ?? "Unable to open guide folder.");
}
setSession(htmlResult.data.session);
toast.success(copy.exported, {
description: `${markdownResult.data.path}\n${htmlResult.data.path}`,
});
});
}, [
aiSettings?.deepseek.hasApiKey,
copy.exported,
copy.keyMissing,
copy.noEvents,
copy.ocrUnavailable,
ensureEventsSession,
guideLanguage,
provider,
runAction,
videoPath,
]);
return (
<section className="editor-inspector-shell flex max-h-[320px] min-h-[246px] shrink-0 flex-col overflow-hidden rounded-[18px] border border-white/[0.075] bg-[#090a0c]">
<div className="flex items-center justify-between border-b border-white/[0.07] px-3 py-2">
<div className="flex min-w-0 items-center gap-2">
<ListChecks className="h-4 w-4 shrink-0 text-[#34B27B]" />
<span className="truncate text-sm font-semibold text-slate-100">{copy.title}</span>
</div>
<button
type="button"
title="Reload guide session"
disabled={isBusy || !recordingId}
onClick={loadSession}
className="flex h-7 w-7 items-center justify-center rounded-lg border border-transparent text-slate-500 transition-all hover:border-white/10 hover:bg-white/[0.06] hover:text-slate-200 disabled:cursor-not-allowed disabled:opacity-40"
>
<RefreshCw className={`h-3.5 w-3.5 ${busyAction === "load" ? "animate-spin" : ""}`} />
</button>
</div>
<div className="min-h-0 flex-1 overflow-y-auto p-3 custom-scrollbar">
<p className="mb-2 text-[11px] leading-4 text-slate-400">
{canUseGuide ? statusLabel : copy.noRecording}
</p>
{message && <p className="mb-2 text-[11px] leading-4 text-amber-300">{message}</p>}
<div className="mb-2 flex items-center gap-1.5">
<select
value={provider}
onChange={(event) => handleProviderChange(event.target.value as GuideAiProvider)}
className="h-8 flex-1 rounded-md border border-white/[0.08] bg-white/[0.04] px-2 text-[11px] font-medium text-slate-200 outline-none"
disabled={isBusy}
>
<option value="local">{copy.local}</option>
<option value="deepseek">{copy.deepseek}</option>
</select>
<Button
type="button"
size="sm"
title={copy.settings}
disabled={isBusy}
onClick={() => setSettingsOpen((current) => !current)}
className="h-8 rounded-md border border-white/[0.08] bg-white/[0.04] px-2 text-[11px] text-slate-200 hover:bg-white/[0.08]"
>
<KeyRound className="h-3.5 w-3.5" />
</Button>
</div>
<button
type="button"
disabled={!canUseGuide || isBusy}
onClick={handleGenerateGuide}
className="mb-2 flex h-10 w-full items-center justify-center gap-2 rounded-md border border-[#34B27B]/35 bg-[#34B27B]/20 px-3 text-sm font-semibold text-[#B9F5D2] transition-all hover:border-[#34B27B]/55 hover:bg-[#34B27B]/28 disabled:cursor-not-allowed disabled:opacity-40"
>
<Wand2
className={`h-4 w-4 shrink-0 ${busyAction === "generate" ? "animate-pulse" : ""}`}
/>
<span className="truncate">
{busyAction === "generate" ? copy.generating : copy.generateGuide}
</span>
</button>
{settingsOpen && (
<div className="mb-2 space-y-2 rounded-md border border-white/[0.07] bg-white/[0.035] p-2">
<div className="flex items-center justify-between gap-2">
<div className="min-w-0">
<div className="truncate text-[11px] font-semibold text-slate-100">
{copy.deepseek} {copy.settings}
</div>
<div className="truncate text-[10px] text-slate-500">
{aiSettings?.deepseek.hasApiKey
? `${copy.keyConfigured}: ${aiSettings.deepseek.apiKeyEnvName}`
: `${copy.keyNotConfigured}: ${
aiSettings?.deepseek.apiKeyEnvName ?? "DEEPSEEK_API_KEY"
}`}
</div>
</div>
<span className="shrink-0 rounded border border-white/[0.08] px-1.5 py-0.5 text-[10px] text-slate-500">
{aiSettings?.deepseek.storage ?? "none"}
</span>
</div>
<label className="block text-[10px] font-medium text-slate-400">
{copy.apiKey}
<input
type="text"
value={deepSeekApiKeyEnvName}
onChange={(event) => setDeepSeekApiKeyEnvName(event.target.value)}
placeholder={copy.apiKeyPlaceholder}
disabled={settingsBusy}
className="mt-1 h-8 w-full rounded-md border border-white/[0.08] bg-black/20 px-2 text-[11px] text-slate-100 outline-none placeholder:text-slate-600"
/>
</label>
<div className="grid grid-cols-2 gap-1.5">
<label className="block min-w-0 text-[10px] font-medium text-slate-400">
{copy.baseUrl}
<input
type="text"
value={deepSeekBaseUrl}
onChange={(event) => setDeepSeekBaseUrl(event.target.value)}
disabled={settingsBusy}
className="mt-1 h-8 w-full rounded-md border border-white/[0.08] bg-black/20 px-2 text-[11px] text-slate-100 outline-none"
/>
</label>
<label className="block min-w-0 text-[10px] font-medium text-slate-400">
{copy.model}
<input
type="text"
value={deepSeekModel}
onChange={(event) => setDeepSeekModel(event.target.value)}
disabled={settingsBusy}
className="mt-1 h-8 w-full rounded-md border border-white/[0.08] bg-black/20 px-2 text-[11px] text-slate-100 outline-none"
/>
</label>
</div>
<div className="flex items-center gap-1.5">
<Button
type="button"
size="sm"
disabled={settingsBusy}
onClick={handleSaveAiSettings}
className="h-8 rounded-md border border-white/[0.08] bg-[#34B27B]/15 px-2 text-[11px] text-[#9BE7BF] hover:bg-[#34B27B]/25"
>
<Save className="h-3.5 w-3.5" />
{copy.saveSettings}
</Button>
<Button
type="button"
size="sm"
disabled={settingsBusy || !aiSettings?.deepseek.hasApiKey}
onClick={handleClearDeepSeekKey}
className="h-8 rounded-md border border-white/[0.08] bg-white/[0.04] px-2 text-[11px] text-slate-200 hover:bg-white/[0.08]"
>
<Trash2 className="h-3.5 w-3.5" />
{copy.clearKey}
</Button>
</div>
</div>
)}
<div className="space-y-1.5">
{generatedSteps.slice(0, 4).map((step) => (
<div
key={step.id}
className="rounded-md border border-white/[0.06] bg-white/[0.035] px-2 py-1.5"
>
<div className="truncate text-[11px] font-semibold text-slate-100">
{step.order}. {step.title}
</div>
<p className="line-clamp-2 text-[10px] leading-4 text-slate-400">
{step.instruction}
</p>
</div>
))}
</div>
</div>
</section>
);
}
+2 -2
View File
@@ -41,7 +41,7 @@ export function ShortcutsProvider({ children }: { children: ReactNode }) {
});
window.electronAPI
.getShortcuts?.()
?.getShortcuts?.()
.then((saved) => {
if (saved) {
setShortcuts(mergeWithDefaults(saved as Partial<ShortcutsConfig>));
@@ -54,7 +54,7 @@ export function ShortcutsProvider({ children }: { children: ReactNode }) {
const persistShortcuts = useCallback(
async (config?: ShortcutsConfig) => {
await window.electronAPI.saveShortcuts?.(config ?? shortcuts);
await window.electronAPI?.saveShortcuts?.(config ?? shortcuts);
},
[shortcuts],
);
+202
View File
@@ -0,0 +1,202 @@
export const GUIDE_SCHEMA_VERSION = 1;
export type GuideRecordingIdInput = string | number;
export type GuideEventKind = "click" | "hotkey" | "manual";
export type GuideEventSource = "cursor-recording" | "guide-hotkey" | "review-ui";
export type GuideEventButton = "left" | "right" | "middle" | "unknown";
export type GuideAction = "click" | "choose" | "type" | "wait" | "manual";
export type GuideTargetRole = "button" | "menu" | "tab" | "field" | "link" | "unknown";
export type GuideLanguage = "vi" | "en";
export type GuideAiProvider = "deepseek" | "local";
export type GuideSecretStorage = "environment" | "none";
export type GuideSessionStatus =
| "recording"
| "events-ready"
| "snapshots-ready"
| "ocr-ready"
| "draft-ready"
| "reviewed";
export type GuideErrorCode =
| "guide-session-not-found"
| "guide-invalid-input"
| "guide-invalid-schema"
| "guide-video-load-failed"
| "guide-snapshot-failed"
| "guide-ocr-unavailable"
| "guide-ocr-failed"
| "guide-ai-key-missing"
| "guide-ai-request-failed"
| "guide-ai-invalid-output"
| "guide-export-failed"
| "guide-internal-error";
export interface GuideEvent {
id: string;
recordingId: string;
kind: GuideEventKind;
source: GuideEventSource;
timeMs: number;
x?: number;
y?: number;
normalizedX?: number;
normalizedY?: number;
button?: GuideEventButton;
label?: string;
screenshotOffsetMs?: number;
createdAt: string;
}
export interface GuideSnapshot {
id: string;
eventId: string;
timeMs: number;
offsetMs: number;
path: string;
width: number;
height: number;
}
export interface OcrBlock {
id: string;
snapshotId: string;
text: string;
confidence: number;
box: {
x: number;
y: number;
width: number;
height: number;
};
}
export interface GuideStepCandidate {
id: string;
eventId: string;
snapshotId?: string;
timeMs: number;
action: GuideAction;
targetText?: string;
targetRole?: GuideTargetRole;
nearbyText: string[];
confidence: number;
}
export interface GeneratedGuideStep {
id: string;
order: number;
title: string;
instruction: string;
screenshotPath?: string;
sourceCandidateId?: string;
}
export interface GeneratedGuide {
title: string;
summary?: string;
steps: GeneratedGuideStep[];
}
export interface GuideSession {
schemaVersion: typeof GUIDE_SCHEMA_VERSION;
recordingId: string;
videoPath: string;
cursorPath?: string;
guidePath: string;
outputDir: string;
status: GuideSessionStatus;
events: GuideEvent[];
snapshots: GuideSnapshot[];
ocrBlocks: OcrBlock[];
candidates: GuideStepCandidate[];
generatedGuide?: GeneratedGuide;
createdAt: string;
updatedAt: string;
}
export interface AddGuideMarkerInput {
recordingId: GuideRecordingIdInput;
timeMs: number;
kind: "hotkey" | "manual";
label?: string;
}
export interface FinalizeGuideEventsInput {
recordingId: GuideRecordingIdInput;
videoPath: string;
cursorPath?: string;
}
export interface WriteGuideSnapshotInput {
recordingId: GuideRecordingIdInput;
eventId: string;
timeMs: number;
offsetMs: number;
pngBytes: ArrayBuffer;
width: number;
height: number;
}
export interface RunGuideOcrInput {
recordingId: GuideRecordingIdInput;
snapshotIds?: string[];
}
export interface GenerateGuideDraftInput {
recordingId: GuideRecordingIdInput;
language: GuideLanguage;
provider: GuideAiProvider;
}
export interface GuideAiSettings {
deepseek: {
hasApiKey: boolean;
apiKeyEnvName: string;
baseUrl: string;
model: string;
storage: GuideSecretStorage;
encryptionAvailable: boolean;
updatedAt?: string;
};
}
export interface SaveGuideAiSettingsInput {
deepseekApiKeyEnvName?: string;
clearDeepseekApiKeyEnvName?: boolean;
baseUrl?: string;
model?: string;
}
export interface SaveGuideInput {
recordingId: GuideRecordingIdInput;
generatedGuide: GeneratedGuide;
}
export interface DiscardGuideSessionInput {
recordingId: GuideRecordingIdInput;
}
export interface ExportGuideInput {
recordingId: GuideRecordingIdInput;
}
export interface ExportGuideResult {
path: string;
session: GuideSession;
}
export interface GuideIpcSuccess<TData> {
success: true;
data: TData;
message?: string;
}
export interface GuideIpcFailure {
success: false;
code: GuideErrorCode;
error: string;
retryable?: boolean;
}
export type GuideIpcResult<TData> = GuideIpcSuccess<TData> | GuideIpcFailure;
+110
View File
@@ -0,0 +1,110 @@
import { describe, expect, it } from "vitest";
import { buildGuideEventsFromCursor, mergeGuideEvents, sortGuideEvents } from "./eventBuilder";
describe("buildGuideEventsFromCursor", () => {
it("converts cursor click samples into guide events", () => {
const events = buildGuideEventsFromCursor({
recordingId: 123,
nowIso: "2026-05-27T00:00:00.000Z",
samples: [
{ timeMs: 10, cx: 0.2, cy: 0.3, interactionType: "move" },
{ timeMs: 20, cx: 0.4, cy: 0.5, interactionType: "click" },
{ timeMs: 30, cx: 0.4, cy: 0.5, interactionType: "mouseup" },
],
});
expect(events).toHaveLength(1);
expect(events[0]).toMatchObject({
recordingId: "123",
kind: "click",
source: "cursor-recording",
timeMs: 20,
normalizedX: 0.4,
normalizedY: 0.5,
screenshotOffsetMs: 500,
});
});
it("deduplicates click bounce samples close in time and position", () => {
const events = buildGuideEventsFromCursor({
recordingId: "abc",
samples: [
{ timeMs: 1000, cx: 0.5, cy: 0.5, interactionType: "click" },
{ timeMs: 1050, cx: 0.501, cy: 0.501, interactionType: "click" },
{ timeMs: 1500, cx: 0.5, cy: 0.5, interactionType: "click" },
],
});
expect(events.map((event) => event.timeMs)).toEqual([1000, 1500]);
});
it("keeps close-timed clicks when positions are meaningfully different", () => {
const events = buildGuideEventsFromCursor({
recordingId: "abc",
samples: [
{ timeMs: 1000, cx: 0.2, cy: 0.2, interactionType: "click" },
{ timeMs: 1050, cx: 0.8, cy: 0.8, interactionType: "click" },
],
});
expect(events).toHaveLength(2);
});
it("sorts guide events by timestamp", () => {
expect(
sortGuideEvents([
{
id: "2",
recordingId: "r",
kind: "manual",
source: "review-ui",
timeMs: 200,
createdAt: "now",
},
{
id: "1",
recordingId: "r",
kind: "manual",
source: "review-ui",
timeMs: 100,
createdAt: "now",
},
]).map((event) => event.id),
).toEqual(["1", "2"]);
});
it("merges cursor and manual events without dropping manual markers", () => {
const events = mergeGuideEvents([
{
id: "manual",
recordingId: "r",
kind: "manual",
source: "review-ui",
timeMs: 120,
createdAt: "now",
},
{
id: "click-1",
recordingId: "r",
kind: "click",
source: "cursor-recording",
timeMs: 100,
normalizedX: 0.5,
normalizedY: 0.5,
createdAt: "now",
},
{
id: "click-2",
recordingId: "r",
kind: "click",
source: "cursor-recording",
timeMs: 150,
normalizedX: 0.5,
normalizedY: 0.5,
createdAt: "now",
},
]);
expect(events.map((event) => event.id)).toEqual(["click-1", "manual"]);
});
});
+137
View File
@@ -0,0 +1,137 @@
import type { CursorRecordingSample } from "@/native/contracts";
import { type GuideEvent, type GuideRecordingIdInput } from "./contracts";
export const DEFAULT_GUIDE_SCREENSHOT_OFFSET_MS = 500;
export const DEFAULT_GUIDE_CLICK_DEDUPE_WINDOW_MS = 250;
export const DEFAULT_GUIDE_CLICK_DEDUPE_RADIUS = 0.004;
export interface BuildGuideEventsFromCursorInput {
recordingId: GuideRecordingIdInput;
samples: CursorRecordingSample[];
screenshotOffsetMs?: number;
dedupeWindowMs?: number;
dedupeRadius?: number;
nowIso?: string;
}
export function buildGuideEventsFromCursor(input: BuildGuideEventsFromCursorInput): GuideEvent[] {
const recordingId = normalizeRecordingId(input.recordingId);
if (!recordingId) {
return [];
}
const screenshotOffsetMs = Number.isFinite(input.screenshotOffsetMs)
? Math.max(0, input.screenshotOffsetMs ?? DEFAULT_GUIDE_SCREENSHOT_OFFSET_MS)
: DEFAULT_GUIDE_SCREENSHOT_OFFSET_MS;
const createdAt = input.nowIso ?? new Date().toISOString();
const clickEvents = input.samples
.filter((sample) => sample.interactionType === "click")
.map((sample, index): GuideEvent => {
const timeMs = Number.isFinite(sample.timeMs) ? Math.max(0, sample.timeMs) : 0;
return {
id: createGuideEventId(recordingId, timeMs, index),
recordingId,
kind: "click",
source: "cursor-recording",
timeMs,
normalizedX: clamp01(sample.cx),
normalizedY: clamp01(sample.cy),
button: "left",
screenshotOffsetMs,
createdAt,
};
});
return dedupeGuideClickEvents(sortGuideEvents(clickEvents), {
windowMs: input.dedupeWindowMs ?? DEFAULT_GUIDE_CLICK_DEDUPE_WINDOW_MS,
radius: input.dedupeRadius ?? DEFAULT_GUIDE_CLICK_DEDUPE_RADIUS,
});
}
export function mergeGuideEvents(
events: GuideEvent[],
options: {
dedupeWindowMs?: number;
dedupeRadius?: number;
} = {},
): GuideEvent[] {
const cursorEvents = events.filter((event) => event.source === "cursor-recording");
const nonCursorEvents = events.filter((event) => event.source !== "cursor-recording");
return sortGuideEvents([
...dedupeGuideClickEvents(sortGuideEvents(cursorEvents), {
windowMs: options.dedupeWindowMs ?? DEFAULT_GUIDE_CLICK_DEDUPE_WINDOW_MS,
radius: options.dedupeRadius ?? DEFAULT_GUIDE_CLICK_DEDUPE_RADIUS,
}),
...nonCursorEvents,
]);
}
export function sortGuideEvents(events: GuideEvent[]): GuideEvent[] {
return [...events].sort((left, right) => left.timeMs - right.timeMs);
}
export function createGuideEventId(recordingId: string, timeMs: number, index: number): string {
return `guide-click-${recordingId}-${Math.round(timeMs)}-${index}`;
}
function dedupeGuideClickEvents(
events: GuideEvent[],
options: { windowMs: number; radius: number },
): GuideEvent[] {
const deduped: GuideEvent[] = [];
for (const event of events) {
const previous = deduped.at(-1);
if (previous && isDuplicateClick(previous, event, options)) {
continue;
}
deduped.push(event);
}
return deduped;
}
function isDuplicateClick(
previous: GuideEvent,
next: GuideEvent,
options: { windowMs: number; radius: number },
): boolean {
if (previous.source !== "cursor-recording" || next.source !== "cursor-recording") {
return false;
}
if (next.timeMs - previous.timeMs > options.windowMs) {
return false;
}
const previousX = previous.normalizedX;
const previousY = previous.normalizedY;
const nextX = next.normalizedX;
const nextY = next.normalizedY;
if (
previousX === undefined ||
previousY === undefined ||
nextX === undefined ||
nextY === undefined
) {
return true;
}
const distance = Math.hypot(nextX - previousX, nextY - previousY);
return distance <= options.radius;
}
function normalizeRecordingId(recordingId: GuideRecordingIdInput): string | null {
if (typeof recordingId === "number") {
return Number.isFinite(recordingId) ? String(Math.trunc(recordingId)) : null;
}
if (typeof recordingId !== "string") {
return null;
}
const trimmed = recordingId.trim();
return trimmed.length > 0 ? trimmed : null;
}
function clamp01(value: number): number | undefined {
if (!Number.isFinite(value)) {
return undefined;
}
return Math.min(1, Math.max(0, value));
}
+86
View File
@@ -0,0 +1,86 @@
import { describe, expect, it } from "vitest";
import { GUIDE_SCHEMA_VERSION, type GuideSession } from "./contracts";
import { exportGuideToHtml, exportGuideToMarkdown } from "./exporters";
const session: GuideSession = {
schemaVersion: GUIDE_SCHEMA_VERSION,
recordingId: "rec-1",
videoPath: "/tmp/recording.mp4",
guidePath: "/tmp/recording.guide.json",
outputDir: "/tmp/recording-guide",
status: "draft-ready",
events: [
{
id: "event-1",
recordingId: "rec-1",
kind: "click",
source: "cursor-recording",
timeMs: 1000,
normalizedX: 0.25,
normalizedY: 0.75,
button: "left",
createdAt: "now",
},
],
snapshots: [
{
id: "snapshot-1",
eventId: "event-1",
timeMs: 1500,
offsetMs: 500,
path: "/tmp/recording-guide/step-001.png",
width: 1280,
height: 720,
},
],
ocrBlocks: [],
candidates: [
{
id: "candidate-1",
eventId: "event-1",
snapshotId: "snapshot-1",
timeMs: 1000,
action: "click",
targetText: "Settings",
targetRole: "button",
nearbyText: ["Settings"],
confidence: 0.9,
},
],
generatedGuide: {
title: "User guide",
summary: "A generated guide.",
steps: [
{
id: "guide-step-1",
order: 1,
title: "Open Settings",
instruction: "Click Settings.",
screenshotPath: "/tmp/recording-guide/step-001.png",
sourceCandidateId: "candidate-1",
},
],
},
createdAt: "now",
updatedAt: "now",
};
describe("guide exporters", () => {
it("exports markdown with relative screenshot references", () => {
const markdown = exportGuideToMarkdown(session);
expect(markdown).toContain("# User guide");
expect(markdown).toContain("## 1. Open Settings");
expect(markdown).toContain("](step-001.png)");
});
it("exports escaped HTML", () => {
const html = exportGuideToHtml(session);
expect(html).toContain("<!doctype html>");
expect(html).toContain("<h1>User guide</h1>");
expect(html).toContain('src="step-001.png"');
expect(html).toContain("click-marker");
expect(html).toContain("left: 25.00%; top: 75.00%;");
});
});
+141
View File
@@ -0,0 +1,141 @@
import path from "node:path";
import type { GeneratedGuideStep, GuideSession } from "./contracts";
export function exportGuideToMarkdown(session: GuideSession): string {
const guide = requireGeneratedGuide(session);
const lines = [`# ${guide.title}`, ""];
if (guide.summary) {
lines.push(guide.summary, "");
}
for (const step of guide.steps) {
lines.push(`## ${step.order}. ${step.title}`, "", step.instruction, "");
if (step.screenshotPath) {
lines.push(`![${escapeMarkdownAlt(step.title)}](${path.basename(step.screenshotPath)})`, "");
}
}
return `${lines.join("\n").trimEnd()}\n`;
}
export function exportGuideToHtml(session: GuideSession): string {
const guide = requireGeneratedGuide(session);
const steps = guide.steps.map((step) => renderStepHtml(step, session)).join("\n");
const summary = guide.summary ? `<p class="summary">${escapeHtml(guide.summary)}</p>` : "";
return `<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>${escapeHtml(guide.title)}</title>
<style>
body { font-family: Arial, sans-serif; line-height: 1.55; margin: 32px; color: #111827; }
main { max-width: 880px; margin: 0 auto; }
h1 { font-size: 28px; margin: 0 0 8px; }
.summary { color: #4b5563; margin: 0 0 28px; }
.step { border-top: 1px solid #e5e7eb; padding: 22px 0; }
.step h2 { font-size: 18px; margin: 0 0 8px; }
.step p { margin: 0 0 12px; }
.shot { display: inline-block; position: relative; max-width: 100%; margin: 0; }
img { display: block; max-width: 100%; border: 1px solid #e5e7eb; border-radius: 6px; }
.click-marker { position: absolute; width: 26px; height: 26px; border: 3px solid #ef4444; border-radius: 999px; box-shadow: 0 0 0 4px rgba(239, 68, 68, 0.18), 0 2px 8px rgba(17, 24, 39, 0.28); transform: translate(-50%, -50%); pointer-events: none; }
.click-marker::after { content: ""; position: absolute; left: 50%; top: 50%; width: 6px; height: 6px; border-radius: 999px; background: #ef4444; transform: translate(-50%, -50%); }
</style>
</head>
<body>
<main>
<h1>${escapeHtml(guide.title)}</h1>
${summary}
${steps}
</main>
</body>
</html>
`;
}
function renderStepHtml(step: GeneratedGuideStep, session: GuideSession): string {
const clickPoint = resolveStepClickPoint(step, session);
const marker = clickPoint
? `<span class="click-marker" style="left: ${formatPercent(clickPoint.x)}%; top: ${formatPercent(clickPoint.y)}%;" aria-label="Click position"></span>`
: "";
const image = step.screenshotPath
? `<figure class="shot"><img src="${escapeHtml(path.basename(step.screenshotPath))}" alt="${escapeHtml(step.title)}">${marker}</figure>`
: "";
return `<section class="step">
<h2>${step.order}. ${escapeHtml(step.title)}</h2>
<p>${escapeHtml(step.instruction)}</p>
${image}
</section>`;
}
function requireGeneratedGuide(session: GuideSession) {
if (!session.generatedGuide) {
throw new Error("Guide session does not have a generated guide.");
}
return session.generatedGuide;
}
function escapeMarkdownAlt(value: string): string {
return value.replace(/[[\]]/g, "");
}
function escapeHtml(value: string): string {
return value
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;")
.replace(/'/g, "&#39;");
}
function resolveStepClickPoint(
step: GeneratedGuideStep,
session: GuideSession,
): { x: number; y: number } | null {
const candidate = step.sourceCandidateId
? session.candidates.find((item) => item.id === step.sourceCandidateId)
: undefined;
const eventId = candidate?.eventId;
const event = eventId ? session.events.find((item) => item.id === eventId) : undefined;
if (!event || event.kind !== "click") {
return null;
}
if (isNormalizedNumber(event.normalizedX) && isNormalizedNumber(event.normalizedY)) {
return { x: clamp01(event.normalizedX), y: clamp01(event.normalizedY) };
}
const screenshotFileName = step.screenshotPath ? path.basename(step.screenshotPath) : undefined;
const snapshot =
(candidate?.snapshotId
? session.snapshots.find((item) => item.id === candidate.snapshotId)
: undefined) ??
(screenshotFileName
? session.snapshots.find((item) => path.basename(item.path) === screenshotFileName)
: undefined);
if (
!snapshot ||
typeof event.x !== "number" ||
typeof event.y !== "number" ||
snapshot.width <= 0 ||
snapshot.height <= 0
) {
return null;
}
return {
x: clamp01(event.x / snapshot.width),
y: clamp01(event.y / snapshot.height),
};
}
function formatPercent(value: number): string {
return (clamp01(value) * 100).toFixed(2);
}
function isNormalizedNumber(value: unknown): value is number {
return typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 1;
}
function clamp01(value: number): number {
return Math.min(1, Math.max(0, value));
}
+65
View File
@@ -0,0 +1,65 @@
import { describe, expect, it } from "vitest";
import { GUIDE_SCHEMA_VERSION, type GuideSession, type GuideStepCandidate } from "./contracts";
import { buildGuideDraftPrompt, buildLocalGuideDraft } from "./promptBuilder";
const session: GuideSession = {
schemaVersion: GUIDE_SCHEMA_VERSION,
recordingId: "rec-1",
videoPath: "/tmp/recording.mp4",
guidePath: "/tmp/recording.guide.json",
outputDir: "/tmp/recording-guide",
status: "ocr-ready",
events: [],
snapshots: [
{
id: "snapshot-1",
eventId: "event-1",
timeMs: 1500,
offsetMs: 500,
path: "/tmp/recording-guide/step-001.png",
width: 1280,
height: 720,
},
],
ocrBlocks: [],
candidates: [],
createdAt: "now",
updatedAt: "now",
};
const candidates: GuideStepCandidate[] = [
{
id: "candidate-1",
eventId: "event-1",
snapshotId: "snapshot-1",
timeMs: 1000,
action: "click",
targetText: "Save",
targetRole: "button",
nearbyText: ["Save"],
confidence: 0.9,
},
];
describe("guide draft helpers", () => {
it("builds a strict JSON prompt for AI providers", () => {
const prompt = buildGuideDraftPrompt({ session, candidates, language: "en" });
expect(prompt).toContain("Return JSON only");
expect(prompt).toContain('"targetText": "Save"');
expect(prompt).toContain('"id":"guide-step-1"');
});
it("builds a deterministic local guide draft", () => {
const guide = buildLocalGuideDraft(session, candidates, "en");
expect(guide.title).toBe("User guide");
expect(guide.steps[0]).toMatchObject({
id: "guide-step-1",
order: 1,
title: "Step 1: Save",
instruction: 'Click "Save".',
screenshotPath: "/tmp/recording-guide/step-001.png",
});
});
});
+109
View File
@@ -0,0 +1,109 @@
import type {
GeneratedGuide,
GeneratedGuideStep,
GuideLanguage,
GuideSession,
GuideStepCandidate,
} from "./contracts";
export interface GuidePromptInput {
session: GuideSession;
candidates: GuideStepCandidate[];
language: GuideLanguage;
}
export function buildGuideDraftPrompt(input: GuidePromptInput): string {
const languageLabel = input.language === "vi" ? "Vietnamese" : "English";
const candidatesJson = JSON.stringify(
input.candidates.map((candidate, index) => ({
order: index + 1,
timeMs: Math.round(candidate.timeMs),
action: candidate.action,
targetText: candidate.targetText,
targetRole: candidate.targetRole,
nearbyText: candidate.nearbyText,
confidence: candidate.confidence,
})),
null,
2,
);
return [
"You write software user guides from recorded UI interactions.",
`Write the guide in ${languageLabel}.`,
"Return JSON only with this shape:",
'{"title":"...","summary":"...","steps":[{"id":"guide-step-1","order":1,"title":"...","instruction":"...","sourceCandidateId":"..."}]}',
"Rules:",
"- Use short, explicit step instructions.",
"- Prefer visible target text from OCR when it is available.",
"- Do not invent buttons or screens that are not in the candidates.",
"- If a target is unclear, describe the action by screen position or timestamp.",
"",
"Candidates:",
candidatesJson,
].join("\n");
}
export function buildLocalGuideDraft(
session: GuideSession,
candidates: GuideStepCandidate[],
language: GuideLanguage,
): GeneratedGuide {
const sortedCandidates = [...candidates].sort((left, right) => left.timeMs - right.timeMs);
const steps = sortedCandidates.map((candidate, index): GeneratedGuideStep => {
const order = index + 1;
return {
id: `guide-step-${order}`,
order,
title: buildStepTitle(candidate, order, language),
instruction: buildInstruction(candidate, language),
screenshotPath: session.snapshots.find((snapshot) => snapshot.eventId === candidate.eventId)
?.path,
sourceCandidateId: candidate.id,
};
});
return {
title: language === "vi" ? "Hướng dẫn thao tác" : "User guide",
summary:
language === "vi"
? "Tài liệu được tạo từ các thao tác đã ghi lại trên màn hình."
: "Generated from recorded screen interactions.",
steps,
};
}
function buildStepTitle(
candidate: GuideStepCandidate,
order: number,
language: GuideLanguage,
): string {
if (candidate.targetText) {
return language === "vi"
? `Bước ${order}: ${candidate.targetText}`
: `Step ${order}: ${candidate.targetText}`;
}
return language === "vi" ? `Bước ${order}` : `Step ${order}`;
}
function buildInstruction(candidate: GuideStepCandidate, language: GuideLanguage): string {
const target = candidate.targetText;
if (language === "vi") {
if (target) {
return `${candidate.action === "click" ? "Nhấn" : "Thực hiện thao tác"} vào "${target}".`;
}
return `Thực hiện thao tác tại mốc ${formatTimestamp(candidate.timeMs)}.`;
}
if (target) {
return `${candidate.action === "click" ? "Click" : "Use"} "${target}".`;
}
return `Perform the action at ${formatTimestamp(candidate.timeMs)}.`;
}
function formatTimestamp(timeMs: number): string {
const totalSeconds = Math.max(0, Math.round(timeMs / 1000));
const minutes = Math.floor(totalSeconds / 60);
const seconds = totalSeconds % 60;
return `${minutes}:${String(seconds).padStart(2, "0")}`;
}
+145
View File
@@ -0,0 +1,145 @@
import type { GuideEvent, GuideSession } from "../contracts";
export interface CaptureGuideSnapshotsInput {
session: GuideSession;
videoUrl: string;
maxWidth?: number;
}
export async function captureGuideSnapshots(
input: CaptureGuideSnapshotsInput,
): Promise<GuideSession> {
const events = [...input.session.events].sort((left, right) => left.timeMs - right.timeMs);
if (events.length === 0) {
return input.session;
}
const video = document.createElement("video");
video.preload = "auto";
video.muted = true;
video.src = input.videoUrl;
video.playsInline = true;
try {
await waitForLoadedMetadata(video);
const canvas = document.createElement("canvas");
const context = canvas.getContext("2d");
if (!context) {
throw new Error("Canvas 2D context is unavailable.");
}
const sourceWidth = video.videoWidth || 1280;
const sourceHeight = video.videoHeight || 720;
const scale = input.maxWidth && sourceWidth > input.maxWidth ? input.maxWidth / sourceWidth : 1;
canvas.width = Math.max(1, Math.round(sourceWidth * scale));
canvas.height = Math.max(1, Math.round(sourceHeight * scale));
let latestSession = input.session;
for (const event of events) {
const offsetMs = event.screenshotOffsetMs ?? 500;
const timeMs = getSnapshotTimeMs(event, offsetMs, video.duration);
await seekVideo(video, timeMs / 1000);
context.drawImage(video, 0, 0, canvas.width, canvas.height);
const pngBytes = await canvasToPngBytes(canvas);
const result = await window.electronAPI.guide.writeSnapshot({
recordingId: input.session.recordingId,
eventId: event.id,
timeMs,
offsetMs,
pngBytes,
width: canvas.width,
height: canvas.height,
});
if (!result.success) {
throw new Error(result.error);
}
latestSession = result.data;
}
return latestSession;
} finally {
video.removeAttribute("src");
video.load();
}
}
function getSnapshotTimeMs(event: GuideEvent, offsetMs: number, durationSeconds: number): number {
const durationMs = Number.isFinite(durationSeconds)
? durationSeconds * 1000
: Number.POSITIVE_INFINITY;
return Math.max(0, Math.min(durationMs, event.timeMs + Math.max(0, offsetMs)));
}
function waitForLoadedMetadata(video: HTMLVideoElement): Promise<void> {
if (video.readyState >= HTMLMediaElement.HAVE_METADATA) {
return Promise.resolve();
}
return waitForVideoEvent(video, "loadedmetadata", "Unable to load video metadata.");
}
function seekVideo(video: HTMLVideoElement, timeSeconds: number): Promise<void> {
return new Promise((resolve, reject) => {
const cleanup = () => {
window.clearTimeout(timeoutId);
video.removeEventListener("seeked", handleSeeked);
video.removeEventListener("error", handleError);
};
const handleSeeked = () => {
cleanup();
resolve();
};
const handleError = () => {
cleanup();
reject(new Error("Unable to seek video for guide snapshot."));
};
const timeoutId = window.setTimeout(() => {
cleanup();
reject(new Error("Timed out while seeking video for guide snapshot."));
}, 8000);
video.addEventListener("seeked", handleSeeked, { once: true });
video.addEventListener("error", handleError, { once: true });
video.currentTime = Math.max(0, timeSeconds);
});
}
function waitForVideoEvent(
video: HTMLVideoElement,
eventName: keyof HTMLMediaElementEventMap,
errorMessage: string,
): Promise<void> {
return new Promise((resolve, reject) => {
const cleanup = () => {
window.clearTimeout(timeoutId);
video.removeEventListener(eventName, handleReady);
video.removeEventListener("error", handleError);
};
const handleReady = () => {
cleanup();
resolve();
};
const handleError = () => {
cleanup();
reject(new Error(errorMessage));
};
const timeoutId = window.setTimeout(() => {
cleanup();
reject(new Error(errorMessage));
}, 8000);
video.addEventListener(eventName, handleReady, { once: true });
video.addEventListener("error", handleError, { once: true });
video.load();
});
}
function canvasToPngBytes(canvas: HTMLCanvasElement): Promise<ArrayBuffer> {
return new Promise((resolve, reject) => {
canvas.toBlob((blob) => {
if (!blob) {
reject(new Error("Unable to encode guide snapshot PNG."));
return;
}
blob.arrayBuffer().then(resolve, reject);
}, "image/png");
});
}
+139
View File
@@ -0,0 +1,139 @@
import { describe, expect, it } from "vitest";
import { GUIDE_SCHEMA_VERSION, type GuideSession } from "./contracts";
import { buildGuideStepCandidates } from "./targetMapper";
function createSession(): GuideSession {
return {
schemaVersion: GUIDE_SCHEMA_VERSION,
recordingId: "rec-1",
videoPath: "/tmp/recording.mp4",
guidePath: "/tmp/recording.guide.json",
outputDir: "/tmp/recording-guide",
status: "ocr-ready",
events: [
{
id: "event-1",
recordingId: "rec-1",
kind: "click",
source: "cursor-recording",
timeMs: 1000,
normalizedX: 0.5,
normalizedY: 0.5,
createdAt: "now",
},
],
snapshots: [
{
id: "snapshot-1",
eventId: "event-1",
timeMs: 1200,
offsetMs: 200,
path: "/tmp/recording-guide/step-001.png",
width: 1000,
height: 800,
},
],
ocrBlocks: [
{
id: "ocr-1",
snapshotId: "snapshot-1",
text: "Save",
confidence: 0.94,
box: { x: 0.44, y: 0.46, width: 0.14, height: 0.07 },
},
{
id: "ocr-2",
snapshotId: "snapshot-1",
text: "Settings",
confidence: 0.9,
box: { x: 0.1, y: 0.1, width: 0.18, height: 0.06 },
},
],
candidates: [],
createdAt: "now",
updatedAt: "now",
};
}
describe("buildGuideStepCandidates", () => {
it("selects OCR text near the recorded click", () => {
const candidates = buildGuideStepCandidates(createSession());
expect(candidates).toHaveLength(1);
expect(candidates[0]).toMatchObject({
eventId: "event-1",
action: "click",
targetText: "Save",
targetRole: "button",
snapshotId: "snapshot-1",
});
expect(candidates[0]?.nearbyText).toEqual(["Save", "Settings"]);
expect(candidates[0]?.confidence).toBeGreaterThan(0.8);
});
it("uses manual labels when OCR is not available", () => {
const session = createSession();
session.events[0] = {
...session.events[0],
kind: "manual",
source: "review-ui",
label: "Open report",
};
session.ocrBlocks = [];
const candidates = buildGuideStepCandidates(session);
expect(candidates[0]).toMatchObject({
action: "manual",
targetText: "Open report",
confidence: 0.75,
});
});
it("prefers a nearby line phrase over a single OCR word", () => {
const session = createSession();
session.events[0] = {
...session.events[0],
normalizedX: 0.49,
normalizedY: 0.31,
};
session.ocrBlocks = [
{
id: "ocr-1",
snapshotId: "snapshot-1",
text: "Cho",
confidence: 0.8,
box: { x: 0.36, y: 0.3, width: 0.035, height: 0.02 },
},
{
id: "ocr-2",
snapshotId: "snapshot-1",
text: "phép",
confidence: 0.8,
box: { x: 0.4, y: 0.3, width: 0.04, height: 0.02 },
},
{
id: "ocr-3",
snapshotId: "snapshot-1",
text: "điều",
confidence: 0.8,
box: { x: 0.445, y: 0.3, width: 0.04, height: 0.02 },
},
{
id: "ocr-4",
snapshotId: "snapshot-1",
text: "khiển",
confidence: 0.8,
box: { x: 0.49, y: 0.3, width: 0.045, height: 0.02 },
},
];
const candidates = buildGuideStepCandidates(session);
expect(candidates[0]).toMatchObject({
targetText: "Cho phép điều khiển",
targetRole: "unknown",
});
expect(candidates[0]?.nearbyText[0]).toBe("Cho phép điều khiển");
});
});
+354
View File
@@ -0,0 +1,354 @@
import type {
GuideAction,
GuideEvent,
GuideSession,
GuideStepCandidate,
GuideTargetRole,
OcrBlock,
} from "./contracts";
const DEFAULT_MAX_NEARBY_TEXT = 5;
const DEFAULT_CLICK_RADIUS = 0.18;
const TARGET_SCORE_THRESHOLD = 0.32;
interface TextRegion {
text: string;
confidence: number;
box: OcrBlock["box"];
}
export interface BuildGuideStepCandidatesOptions {
maxNearbyText?: number;
clickRadius?: number;
}
export function buildGuideStepCandidates(
session: GuideSession,
options: BuildGuideStepCandidatesOptions = {},
): GuideStepCandidate[] {
const maxNearbyText = Math.max(1, options.maxNearbyText ?? DEFAULT_MAX_NEARBY_TEXT);
const clickRadius = Math.max(0.01, options.clickRadius ?? DEFAULT_CLICK_RADIUS);
const snapshotsByEventId = new Map(
session.snapshots.map((snapshot) => [snapshot.eventId, snapshot]),
);
const ocrBlocksBySnapshotId = groupOcrBlocksBySnapshot(session.ocrBlocks);
return [...session.events]
.sort((left, right) => left.timeMs - right.timeMs)
.map((event): GuideStepCandidate => {
const snapshot = snapshotsByEventId.get(event.id);
const blocks = snapshot ? (ocrBlocksBySnapshotId.get(snapshot.id) ?? []) : [];
const rankedRegions = rankTextRegionsForEvent(event, blocks, clickRadius);
const targetRegion = rankedRegions.find(
({ score }) => score >= TARGET_SCORE_THRESHOLD,
)?.region;
const nearbyText = uniqueText(rankedRegions.map(({ region }) => region.text)).slice(
0,
maxNearbyText,
);
const label = normalizeText(event.label);
const targetText = label ?? normalizeText(targetRegion?.text);
return {
id: `candidate-${event.id}`,
eventId: event.id,
snapshotId: snapshot?.id,
timeMs: event.timeMs,
action: inferAction(event),
targetText,
targetRole: inferTargetRole(targetText),
nearbyText,
confidence: calculateCandidateConfidence(event, targetRegion, rankedRegions[0]?.score),
};
});
}
function groupOcrBlocksBySnapshot(ocrBlocks: OcrBlock[]): Map<string, OcrBlock[]> {
const grouped = new Map<string, OcrBlock[]>();
for (const block of ocrBlocks) {
const existing = grouped.get(block.snapshotId) ?? [];
existing.push(block);
grouped.set(block.snapshotId, existing);
}
return grouped;
}
function rankTextRegionsForEvent(
event: GuideEvent,
blocks: OcrBlock[],
clickRadius: number,
): Array<{ region: TextRegion; score: number }> {
const click = getEventPoint(event);
return buildTextRegions(blocks)
.map((region) => ({ region, score: scoreRegion(region, click, clickRadius) }))
.filter(({ region }) => normalizeText(region.text) !== undefined)
.sort(
(left, right) =>
right.score - left.score ||
right.region.confidence - left.region.confidence ||
right.region.text.length - left.region.text.length,
);
}
function buildTextRegions(blocks: OcrBlock[]): TextRegion[] {
const wordRegions = blocks
.map((block): TextRegion | null => {
const text = normalizeText(block.text);
if (!text || !isUsefulOcrText(text)) {
return null;
}
return {
text,
confidence: block.confidence,
box: block.box,
};
})
.filter((region): region is TextRegion => region !== null);
const phraseRegions = buildPhraseRegions(wordRegions);
return [...phraseRegions, ...wordRegions];
}
function buildPhraseRegions(regions: TextRegion[]): TextRegion[] {
const sorted = [...regions].sort((left, right) => regionCenterY(left) - regionCenterY(right));
const lines: TextRegion[][] = [];
for (const region of sorted) {
const centerY = regionCenterY(region);
const line = lines.find(
(candidate) =>
Math.abs(regionCenterY(candidate[0]) - centerY) <=
Math.max(0.012, averageHeight(candidate) * 0.9),
);
if (line) {
line.push(region);
} else {
lines.push([region]);
}
}
const phrases: TextRegion[] = [];
for (const line of lines) {
const segments = splitLineIntoSegments(line.sort((left, right) => left.box.x - right.box.x));
for (const segment of segments) {
if (segment.length < 2) {
continue;
}
const phrase = mergeRegions(segment);
if (phrase.text.length >= 3) {
phrases.push(phrase);
}
}
}
return phrases;
}
function splitLineIntoSegments(line: TextRegion[]): TextRegion[][] {
const segments: TextRegion[][] = [];
let current: TextRegion[] = [];
for (const region of line) {
const previous = current.at(-1);
if (
previous &&
region.box.x - (previous.box.x + previous.box.width) >
Math.max(0.025, averageHeight(current) * 3)
) {
segments.push(current);
current = [];
}
current.push(region);
}
if (current.length > 0) {
segments.push(current);
}
return segments;
}
function mergeRegions(regions: TextRegion[]): TextRegion {
const minX = Math.min(...regions.map((region) => region.box.x));
const minY = Math.min(...regions.map((region) => region.box.y));
const maxX = Math.max(...regions.map((region) => region.box.x + region.box.width));
const maxY = Math.max(...regions.map((region) => region.box.y + region.box.height));
return {
text: regions.map((region) => region.text).join(" "),
confidence:
regions.reduce((total, region) => total + clamp01(region.confidence), 0) / regions.length,
box: {
x: minX,
y: minY,
width: maxX - minX,
height: maxY - minY,
},
};
}
function scoreRegion(
region: TextRegion,
click: { x: number; y: number } | null,
clickRadius: number,
): number {
if (!click) {
return clamp01(region.confidence);
}
const centerX = region.box.x + region.box.width / 2;
const centerY = region.box.y + region.box.height / 2;
const distance = Math.hypot(centerX - click.x, centerY - click.y);
const proximity = clamp01(1 - distance / clickRadius);
const contains = pointInsideExpandedBox(click, region, 0.025) ? 0.35 : 0;
return clamp01(
proximity * 0.35 +
clamp01(region.confidence) * 0.2 +
contains +
calculateTextQuality(region.text) * 0.2,
);
}
function getEventPoint(event: GuideEvent): { x: number; y: number } | null {
if (event.normalizedX !== undefined && event.normalizedY !== undefined) {
return { x: clamp01(event.normalizedX), y: clamp01(event.normalizedY) };
}
if (
event.x !== undefined &&
event.y !== undefined &&
event.x >= 0 &&
event.x <= 1 &&
event.y >= 0 &&
event.y <= 1
) {
return { x: event.x, y: event.y };
}
return null;
}
function pointInsideExpandedBox(
point: { x: number; y: number },
region: Pick<TextRegion, "box">,
padding: number,
): boolean {
return (
point.x >= region.box.x - padding &&
point.x <= region.box.x + region.box.width + padding &&
point.y >= region.box.y - padding &&
point.y <= region.box.y + region.box.height + padding
);
}
function inferAction(event: GuideEvent): GuideAction {
if (event.kind === "click") {
return "click";
}
return "manual";
}
function inferTargetRole(text: string | undefined): GuideTargetRole | undefined {
if (!text) {
return undefined;
}
const normalized = text.toLowerCase();
if (
/\b(ok|save|create|next|continue|login|submit|cancel|done|apply|open|start|finish)\b/.test(
normalized,
) ||
/(lưu|tiếp|đăng nhập|hủy|áp dụng|mở|bắt đầu|hoàn tất)/i.test(text)
) {
return "button";
}
if (/\b(menu|file|edit|view|settings)\b/.test(normalized) || /(menu|cài đặt)/i.test(text)) {
return "menu";
}
if (/\b(tab)\b/.test(normalized) || /(thẻ|tab)/i.test(text)) {
return "tab";
}
if (/\b(search|name|email|password|input|field)\b/.test(normalized)) {
return "field";
}
return "unknown";
}
function calculateCandidateConfidence(
event: GuideEvent,
targetRegion: TextRegion | undefined,
score: number | undefined,
): number {
if (targetRegion) {
return roundConfidence(
0.45 + clamp01(targetRegion.confidence) * 0.25 + clamp01(score ?? 0) * 0.3,
);
}
if (event.label) {
return 0.75;
}
if (getEventPoint(event)) {
return 0.45;
}
return 0.3;
}
function uniqueText(values: Array<string | undefined>): string[] {
const seen = new Set<string>();
const result: string[] = [];
for (const value of values) {
const normalized = normalizeText(value);
if (!normalized) {
continue;
}
const key = normalized.toLowerCase();
if (seen.has(key)) {
continue;
}
seen.add(key);
result.push(normalized);
}
return result;
}
function normalizeText(value: string | undefined): string | undefined {
const text = value?.replace(/\s+/g, " ").trim();
return text ? text : undefined;
}
function isUsefulOcrText(text: string): boolean {
if (!/[A-Za-z0-9À-ỹ]/.test(text)) {
return false;
}
if (text.length === 1) {
return false;
}
return true;
}
function calculateTextQuality(text: string): number {
let score = 0.35;
if (text.includes(" ")) {
score += 0.5;
}
if (text.length >= 4) {
score += 0.25;
}
if (/[�]/.test(text)) {
score -= 0.25;
}
if (/^[\W_]+$/.test(text)) {
score -= 0.35;
}
return clamp01(score);
}
function regionCenterY(region: TextRegion): number {
return region.box.y + region.box.height / 2;
}
function averageHeight(regions: TextRegion[]): number {
return regions.reduce((total, region) => total + region.box.height, 0) / regions.length;
}
function roundConfidence(value: number): number {
return Math.round(clamp01(value) * 100) / 100;
}
function clamp01(value: number): number {
if (!Number.isFinite(value)) {
return 0;
}
return Math.min(1, Math.max(0, value));
}
+113 -3
View File
@@ -54,6 +54,9 @@ type UseScreenRecorderReturn = {
toggleRecording: () => void;
togglePaused: () => void;
canPauseRecording: boolean;
guideModeEnabled: boolean;
setGuideModeEnabled: (enabled: boolean) => void;
addGuideMarker: () => void;
restartRecording: () => void;
cancelRecording: () => void;
microphoneEnabled: boolean;
@@ -100,6 +103,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
const [systemAudioEnabled, setSystemAudioEnabled] = useState(false);
const [webcamEnabled, setWebcamEnabledState] = useState(false);
const [cursorCaptureMode, setCursorCaptureMode] = useState<CursorCaptureMode>("editable-overlay");
const [guideModeEnabled, setGuideModeEnabledState] = useState(false);
const screenRecorder = useRef<RecorderHandle | null>(null);
const webcamRecorder = useRef<RecorderHandle | null>(null);
const nativeWindowsRecording = useRef<NativeWindowsRecordingHandle | null>(null);
@@ -117,6 +121,8 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
const discardRecordingId = useRef<number | null>(null);
const restarting = useRef(false);
const countdownRunId = useRef(0);
const activeGuideRecordingId = useRef<number | null>(null);
const guideModeEnabledRef = useRef(false);
const [countdownActive, setCountdownActive] = useState(false);
const webcamReady = useRef(false);
const webcamAcquireId = useRef(0);
@@ -134,6 +140,89 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
return accumulatedDurationMs.current + segmentDuration;
}, []);
const setGuideModeEnabled = useCallback(
(enabled: boolean) => {
if (recording) {
return;
}
guideModeEnabledRef.current = enabled;
setGuideModeEnabledState(enabled);
},
[recording],
);
const startGuideSession = useCallback(async (activeRecordingId: number) => {
if (!guideModeEnabledRef.current || !window.electronAPI?.guide) {
return;
}
const result = await window.electronAPI.guide.startSession(activeRecordingId);
if (!result.success) {
console.warn("Failed to start guide session:", result.error);
return;
}
activeGuideRecordingId.current = activeRecordingId;
}, []);
const finalizeGuideSession = useCallback(
async (activeRecordingId: number, videoPath?: string | null) => {
if (
activeGuideRecordingId.current !== activeRecordingId ||
!videoPath ||
!window.electronAPI?.guide
) {
return;
}
const result = await window.electronAPI.guide.finalizeEvents({
recordingId: activeRecordingId,
videoPath,
});
if (!result.success) {
console.warn("Failed to finalize guide session:", result.error);
}
activeGuideRecordingId.current = null;
},
[],
);
const discardGuideSession = useCallback(async (activeRecordingId: number) => {
if (!window.electronAPI?.guide) {
return;
}
const result = await window.electronAPI.guide.discardSession({
recordingId: activeRecordingId,
});
if (!result.success) {
console.warn("Failed to discard guide session:", result.error);
}
if (activeGuideRecordingId.current === activeRecordingId) {
activeGuideRecordingId.current = null;
}
}, []);
const addGuideMarker = useCallback(() => {
const activeRecordingId = activeGuideRecordingId.current;
if (!recording || activeRecordingId === null || !window.electronAPI?.guide) {
return;
}
void window.electronAPI.guide
.addMarker({
recordingId: activeRecordingId,
kind: "manual",
timeMs: getRecordingDurationMs(),
label: "Manual marker",
})
.then((result) => {
if (!result.success) {
console.warn("Failed to add guide marker:", result.error);
}
});
}, [getRecordingDurationMs, recording]);
const selectMimeType = () => {
// H.264 first: hardware-accelerated on all modern devices, gives sharp
// real-time output. AV1/VP9 are great for distribution but too
@@ -337,6 +426,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
const screenBlob = await activeScreenRecorder.recordedBlobPromise;
if (discardRecordingId.current === activeRecordingId) {
window.electronAPI?.discardCursorTelemetry(activeRecordingId);
await discardGuideSession(activeRecordingId);
return;
}
// When streaming succeeded the blob is empty — the data is already on disk.
@@ -393,6 +483,10 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
} else if (result.path) {
await window.electronAPI.setCurrentVideoPath(result.path);
}
await finalizeGuideSession(
activeRecordingId,
result.session?.screenVideoPath ?? result.path,
);
await window.electronAPI.switchToEditor();
} catch (error) {
@@ -417,7 +511,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
}
})();
},
[cursorCaptureMode, teardownMedia],
[cursorCaptureMode, discardGuideSession, finalizeGuideSession, teardownMedia],
);
const finalizeNativeWindowsRecording = useCallback(
@@ -456,6 +550,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
try {
const result = await window.electronAPI.stopNativeWindowsRecording(discard);
if (discard || result.discarded) {
await discardGuideSession(activeNativeRecording.recordingId);
clearNativeRecordingState();
return true;
}
@@ -501,6 +596,10 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
} else if (result.path) {
await window.electronAPI.setCurrentVideoPath(result.path);
}
await finalizeGuideSession(
activeNativeRecording.recordingId,
storedSession?.screenVideoPath ?? result.path,
);
await window.electronAPI.switchToEditor();
return true;
@@ -517,7 +616,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
}
}
},
[cursorCaptureMode, getRecordingDurationMs],
[cursorCaptureMode, discardGuideSession, finalizeGuideSession, getRecordingDurationMs],
);
const finalizeNativeMacRecording = useCallback(
@@ -570,6 +669,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
const result = await window.electronAPI.stopNativeMacRecording(discard);
const webcamAsset = await webcamAssetPromise;
if (discard || result.discarded) {
await discardGuideSession(activeNativeRecording.recordingId);
clearNativeRecordingState();
return true;
}
@@ -601,6 +701,10 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
} else if (result.path) {
await window.electronAPI.setCurrentVideoPath(result.path);
}
await finalizeGuideSession(
activeNativeRecording.recordingId,
result.session?.screenVideoPath ?? result.path,
);
await window.electronAPI.switchToEditor();
return true;
@@ -617,7 +721,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
}
}
},
[cursorCaptureMode, getRecordingDurationMs],
[cursorCaptureMode, discardGuideSession, finalizeGuideSession, getRecordingDurationMs],
);
const stopRecording = useRef(() => {
@@ -885,6 +989,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
accumulatedDurationMs.current = 0;
segmentStartedAt.current = Date.now();
allowAutoFinalize.current = true;
await startGuideSession(result.recordingId);
setRecording(true);
setPaused(false);
setElapsedSeconds(0);
@@ -1029,6 +1134,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
accumulatedDurationMs.current = 0;
segmentStartedAt.current = Date.now();
allowAutoFinalize.current = true;
await startGuideSession(result.recordingId);
setRecording(true);
setPaused(false);
setElapsedSeconds(0);
@@ -1368,6 +1474,7 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
accumulatedDurationMs.current = 0;
segmentStartedAt.current = Date.now();
allowAutoFinalize.current = true;
await startGuideSession(activeRecordingId);
setRecording(true);
setPaused(false);
setElapsedSeconds(0);
@@ -1664,6 +1771,9 @@ export function useScreenRecorder(): UseScreenRecorderReturn {
toggleRecording,
togglePaused,
canPauseRecording,
guideModeEnabled,
setGuideModeEnabled,
addGuideMarker,
restartRecording,
cancelRecording,
microphoneEnabled,
+5
View File
@@ -43,5 +43,10 @@
"cursor": {
"useEditableCursor": "استخدام مؤشر قابل للتحرير",
"useSystemCursor": "استخدام مؤشر النظام"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
}
}
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "Use editable cursor",
"useSystemCursor": "Use system cursor"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "Loading sources...",
"screens": "Screens ({{count}})",
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "Usar cursor editable",
"useSystemCursor": "Usar cursor del sistema"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "Cargando fuentes...",
"screens": "Pantallas ({{count}})",
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "Utiliser le curseur éditable",
"useSystemCursor": "Utiliser le curseur système"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "Chargement des sources...",
"screens": "Écrans ({{count}})",
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "Usa cursore modificabile",
"useSystemCursor": "Usa cursore di sistema"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "Caricamento sorgenti...",
"screens": "Schermi ({{count}})",
+1
View File
@@ -1,5 +1,6 @@
{
"zoom": {
"previewHold": "Tieni premuto per vedere l'effetto zoom",
"level": "Livello zoom",
"customScale": "Zoom personalizzato",
"selectRegion": "Seleziona una regione zoom da regolare",
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "編集可能なカーソルを使う",
"useSystemCursor": "システムカーソルを使う"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "ソースを読み込み中...",
"screens": "画面 ({{count}})",
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "편집 가능한 커서 사용",
"useSystemCursor": "시스템 커서 사용"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "소스 불러오는 중...",
"screens": "화면 ({{count}}개)",
+5
View File
@@ -43,5 +43,10 @@
"cursor": {
"useEditableCursor": "Использовать редактируемый курсор",
"useSystemCursor": "Использовать системный курсор"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
}
}
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "Düzenlenebilir imleci kullan",
"useSystemCursor": "Sistem imlecini kullan"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "Kaynaklar yükleniyor...",
"screens": "Ekranlar ({{count}})",
+5
View File
@@ -43,5 +43,10 @@
"cursor": {
"useEditableCursor": "Dùng con trỏ có thể chỉnh sửa",
"useSystemCursor": "Dùng con trỏ hệ thống"
},
"guide": {
"enableGuideMode": "Bật chế độ tạo hướng dẫn",
"disableGuideMode": "Tắt chế độ tạo hướng dẫn",
"addMarker": "Thêm mốc hướng dẫn"
}
}
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "使用可编辑光标",
"useSystemCursor": "使用系统光标"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "正在加载源...",
"screens": "屏幕 ({{count}})",
+5
View File
@@ -28,6 +28,11 @@
"useEditableCursor": "使用可編輯游標",
"useSystemCursor": "使用系統游標"
},
"guide": {
"enableGuideMode": "Enable guide mode",
"disableGuideMode": "Disable guide mode",
"addMarker": "Add guide marker"
},
"sourceSelector": {
"loading": "正在載入來源...",
"screens": "螢幕 ({{count}})",
@@ -0,0 +1,111 @@
{
"Global": {
"model_name": "PP-OCRv5_mobile_det"
},
"Hpi": {
"backend_configs": {
"paddle_infer": {
"trt_dynamic_shapes": {
"x": [
[
1,
3,
32,
32
],
[
1,
3,
736,
736
],
[
1,
3,
4000,
4000
]
]
}
},
"tensorrt": {
"dynamic_shapes": {
"x": [
[
1,
3,
32,
32
],
[
1,
3,
736,
736
],
[
1,
3,
4000,
4000
]
]
}
}
}
},
"PreProcess": {
"transform_ops": [
{
"DecodeImage": {
"channel_first": false,
"img_mode": "BGR"
}
},
{
"DetLabelEncode": null
},
{
"DetResizeForTest": {
"resize_long": 960
}
},
{
"NormalizeImage": {
"mean": [
0.485,
0.456,
0.406
],
"order": "hwc",
"scale": "1./255.",
"std": [
0.229,
0.224,
0.225
]
}
},
{
"ToCHWImage": null
},
{
"KeepKeys": {
"keep_keys": [
"image",
"shape",
"polys",
"ignore_tags"
]
}
}
]
},
"PostProcess": {
"name": "DBPostProcess",
"thresh": 0.3,
"box_thresh": 0.6,
"max_candidates": 1000,
"unclip_ratio": 1.5
}
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,53 @@
Global:
model_name: PP-OCRv5_mobile_det
Hpi:
backend_configs:
paddle_infer:
trt_dynamic_shapes: &id001
x:
- - 1
- 3
- 32
- 32
- - 1
- 3
- 736
- 736
- - 1
- 3
- 4000
- 4000
tensorrt:
dynamic_shapes: *id001
PreProcess:
transform_ops:
- DecodeImage:
channel_first: false
img_mode: BGR
- DetLabelEncode: null
- DetResizeForTest:
resize_long: 960
- NormalizeImage:
mean:
- 0.485
- 0.456
- 0.406
order: hwc
scale: 1./255.
std:
- 0.229
- 0.224
- 0.225
- ToCHWImage: null
- KeepKeys:
keep_keys:
- image
- shape
- polys
- ignore_tags
PostProcess:
name: DBPostProcess
thresh: 0.3
box_thresh: 0.6
max_candidates: 1000
unclip_ratio: 1.5
@@ -0,0 +1,935 @@
{
"Global": {
"model_name": "latin_PP-OCRv5_mobile_rec"
},
"Hpi": {
"backend_configs": {
"paddle_infer": {
"trt_dynamic_shapes": {
"x": [
[
1,
3,
48,
160
],
[
1,
3,
48,
320
],
[
8,
3,
48,
3200
]
]
}
},
"tensorrt": {
"dynamic_shapes": {
"x": [
[
1,
3,
48,
160
],
[
1,
3,
48,
320
],
[
8,
3,
48,
3200
]
]
}
}
}
},
"PreProcess": {
"transform_ops": [
{
"DecodeImage": {
"channel_first": false,
"img_mode": "BGR"
}
},
{
"MultiLabelEncode": {
"gtc_encode": "NRTRLabelEncode",
"max_text_length": 1000
}
},
{
"RecResizeImg": {
"eval_mode": true,
"image_shape": [
3,
48,
320
]
}
},
{
"KeepKeys": {
"keep_keys": [
"image",
"label_ctc",
"label_gtc",
"length",
"valid_ratio"
]
}
}
]
},
"PostProcess": {
"name": "CTCLabelDecode",
"character_dict": [
"0",
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W",
"X",
"Y",
"Z",
"a",
"b",
"c",
"d",
"e",
"f",
"g",
"h",
"i",
"j",
"k",
"l",
"m",
"n",
"o",
"p",
"q",
"r",
"s",
"t",
"u",
"v",
"w",
"x",
"y",
"z",
"À",
"Á",
"Â",
"Ã",
"Ä",
"Å",
"Æ",
"Ç",
"È",
"É",
"Ê",
"Ë",
"Ì",
"Í",
"Î",
"Ï",
"Ð",
"Ñ",
"Ò",
"Ó",
"Ô",
"Õ",
"Ö",
"×",
"Ø",
"Ù",
"Ú",
"Û",
"Ü",
"Ý",
"Þ",
"ß",
"à",
"á",
"â",
"ã",
"ä",
"å",
"æ",
"ç",
"è",
"é",
"ê",
"ë",
"ì",
"í",
"î",
"ï",
"ð",
"ñ",
"ò",
"ó",
"ô",
"õ",
"ö",
"÷",
"ø",
"ù",
"ú",
"û",
"ü",
"ý",
"þ",
"ÿ",
"Ā",
"ā",
"Ă",
"ă",
"Ą",
"ą",
"Ć",
"ć",
"Ĉ",
"ĉ",
"Ċ",
"ċ",
"Č",
"č",
"Ď",
"ď",
"Đ",
"đ",
"Ē",
"ē",
"Ĕ",
"ĕ",
"Ė",
"ė",
"Ę",
"ę",
"Ě",
"ě",
"Ĝ",
"ĝ",
"Ğ",
"ğ",
"Ġ",
"ġ",
"Ģ",
"ģ",
"Ĥ",
"ĥ",
"Ħ",
"ħ",
"Ĩ",
"ĩ",
"Ī",
"ī",
"Ĭ",
"ĭ",
"Į",
"į",
"İ",
"ı",
"IJ",
"ij",
"Ĵ",
"ĵ",
"Ķ",
"ķ",
"ĸ",
"Ĺ",
"ĺ",
"Ļ",
"ļ",
"Ľ",
"ľ",
"Ŀ",
"ŀ",
"Ł",
"ł",
"Ń",
"ń",
"Ņ",
"ņ",
"Ň",
"ň",
"ʼn",
"Ŋ",
"ŋ",
"Ō",
"ō",
"Ŏ",
"ŏ",
"Ő",
"ő",
"Œ",
"œ",
"Ŕ",
"ŕ",
"Ŗ",
"ŗ",
"Ř",
"ř",
"Ś",
"ś",
"Ŝ",
"ŝ",
"Ş",
"ş",
"Š",
"š",
"Ţ",
"ţ",
"Ť",
"ť",
"Ŧ",
"ŧ",
"Ũ",
"ũ",
"Ū",
"ū",
"Ŭ",
"ŭ",
"Ů",
"ů",
"Ű",
"ű",
"Ų",
"ų",
"Ŵ",
"ŵ",
"Ŷ",
"ŷ",
"Ÿ",
"Ź",
"ź",
"Ż",
"ż",
"Ž",
"ž",
"ſ",
"ƀ",
"Ɓ",
"Ƃ",
"ƃ",
"Ƅ",
"ƅ",
"Ɔ",
"Ƈ",
"ƈ",
"Ɖ",
"Ɗ",
"Ƌ",
"ƌ",
"ƍ",
"Ǝ",
"Ə",
"Ɛ",
"Ƒ",
"ƒ",
"Ɠ",
"Ɣ",
"ƕ",
"Ɩ",
"Ɨ",
"Ƙ",
"ƙ",
"ƚ",
"ƛ",
"Ɯ",
"Ɲ",
"ƞ",
"Ɵ",
"Ơ",
"ơ",
"Ƣ",
"ƣ",
"Ƥ",
"ƥ",
"Ʀ",
"Ƨ",
"ƨ",
"Ʃ",
"ƪ",
"ƫ",
"Ƭ",
"ƭ",
"Ʈ",
"Ư",
"ư",
"Ʊ",
"Ʋ",
"Ƴ",
"ƴ",
"Ƶ",
"ƶ",
"Ʒ",
"Ƹ",
"ƹ",
"ƺ",
"ƻ",
"Ƽ",
"ƽ",
"ƾ",
"ƿ",
"ǀ",
"ǁ",
"ǂ",
"ǃ",
"DŽ",
"Dž",
"dž",
"LJ",
"Lj",
"lj",
"NJ",
"Nj",
"nj",
"Ǎ",
"ǎ",
"Ǐ",
"ǐ",
"Ǒ",
"ǒ",
"Ǔ",
"ǔ",
"Ǖ",
"ǖ",
"Ǘ",
"ǘ",
"Ǚ",
"ǚ",
"Ǜ",
"ǜ",
"ǝ",
"Ǟ",
"ǟ",
"Ǡ",
"ǡ",
"Ǣ",
"ǣ",
"Ǥ",
"ǥ",
"Ǧ",
"ǧ",
"Ǩ",
"ǩ",
"Ǫ",
"ǫ",
"Ǭ",
"ǭ",
"Ǯ",
"ǯ",
"ǰ",
"DZ",
"Dz",
"dz",
"Ǵ",
"ǵ",
"Ƕ",
"Ƿ",
"Ǹ",
"ǹ",
"Ǻ",
"ǻ",
"Ǽ",
"ǽ",
"Ǿ",
"ǿ",
"Ȁ",
"ȁ",
"Ȃ",
"ȃ",
"Ȅ",
"ȅ",
"Ȇ",
"ȇ",
"Ȉ",
"ȉ",
"Ȋ",
"ȋ",
"Ȍ",
"ȍ",
"Ȏ",
"ȏ",
"Ȑ",
"ȑ",
"Ȓ",
"ȓ",
"Ȕ",
"ȕ",
"Ȗ",
"ȗ",
"Ș",
"ș",
"Ț",
"ț",
"Ȝ",
"ȝ",
"Ȟ",
"ȟ",
"Ƞ",
"ȡ",
"Ȣ",
"ȣ",
"Ȥ",
"ȥ",
"Ȧ",
"ȧ",
"Ȩ",
"ȩ",
"Ȫ",
"ȫ",
"Ȭ",
"ȭ",
"Ȯ",
"ȯ",
"Ȱ",
"ȱ",
"Ȳ",
"ȳ",
"ȴ",
"ȵ",
"ȶ",
"ȷ",
"ȸ",
"ȹ",
"Ⱥ",
"Ȼ",
"ȼ",
"Ƚ",
"Ⱦ",
"ȿ",
"ɀ",
"Ɂ",
"ɂ",
"Ƀ",
"Ʉ",
"Ʌ",
"Ɇ",
"ɇ",
"Ɉ",
"ɉ",
"Ɋ",
"ɋ",
"Ɍ",
"ɍ",
"Ɏ",
"ɏ",
"!",
"\"",
"#",
"$",
"%",
"&",
"'",
"(",
")",
"*",
"+",
",",
"-",
".",
"/",
":",
";",
"<",
"=",
">",
"?",
"@",
"[",
"\\",
"]",
"_",
"`",
"{",
"|",
"}",
"^",
"~",
"©",
"®",
"℉",
"№",
"Ω",
"",
"™",
"∆",
"✓",
"✔",
"✗",
"✘",
"✕",
"☑",
"☒",
"●",
"▪",
"▫",
"◼",
"▶",
"◀",
"⬆",
"¤",
"¦",
"§",
"¨",
"ª",
"«",
"¬",
"¯",
"°",
"²",
"³",
"´",
"µ",
"¶",
"¸",
"¹",
"º",
"»",
"¼",
"½",
"¾",
"¿",
"×",
"",
"",
"",
"—",
"―",
"‖",
"‗",
"",
"",
"",
"",
"“",
"”",
"„",
"‟",
"†",
"‡",
"‣",
"",
"…",
"‧",
"‰",
"‴",
"",
"‶",
"‷",
"‸",
"",
"",
"※",
"‼",
"‽",
"‾",
"",
"₤",
"₡",
"₹",
"₽",
"₴",
"₿",
"¢",
"€",
"£",
"¥",
"",
"Ⅱ",
"Ⅲ",
"Ⅳ",
"",
"Ⅵ",
"Ⅶ",
"Ⅷ",
"Ⅸ",
"",
"Ⅺ",
"Ⅻ",
"",
"ⅱ",
"ⅲ",
"ⅳ",
"",
"ⅵ",
"ⅶ",
"ⅷ",
"ⅸ",
"",
"ⅺ",
"ⅻ",
"➀",
"➁",
"➂",
"➃",
"➄",
"➅",
"➆",
"➇",
"➈",
"➉",
"➊",
"➋",
"➌",
"➍",
"➎",
"➏",
"➐",
"➑",
"➒",
"➓",
"❶",
"❷",
"❸",
"❹",
"❺",
"❻",
"❼",
"❽",
"❾",
"❿",
"①",
"②",
"③",
"④",
"⑤",
"⑥",
"⑦",
"⑧",
"⑨",
"⑩",
"↑",
"→",
"↓",
"↕",
"←",
"↔",
"⇒",
"⇐",
"⇔",
"∀",
"∃",
"∄",
"∴",
"∵",
"∝",
"∞",
"∩",
"",
"∂",
"∫",
"∬",
"∭",
"∮",
"∯",
"∰",
"∑",
"∏",
"√",
"∛",
"∜",
"∱",
"∲",
"∳",
"",
"∷",
"",
"",
"",
"≈",
"≠",
"≡",
"≤",
"≥",
"⊂",
"⊃",
"⊥",
"⊾",
"⊿",
"□",
"∥",
"∋",
"ƒ",
"",
"″",
"À",
"Á",
"Â",
"Ã",
"Ä",
"Å",
"Æ",
"Ç",
"È",
"É",
"Ê",
"Ë",
"Ì",
"Í",
"Î",
"Ï",
"Ð",
"Ñ",
"Ò",
"Ó",
"Ô",
"Õ",
"Ö",
"Ø",
"Ù",
"Ú",
"Û",
"Ü",
"Ý",
"Þ",
"à",
"á",
"â",
"ã",
"ä",
"å",
"æ",
"ç",
"è",
"é",
"ê",
"ë",
"ì",
"í",
"î",
"ï",
"ð",
"ñ",
"ò",
"ó",
"ô",
"õ",
"ö",
"ø",
"ù",
"ú",
"û",
"ü",
"ý",
"þ",
"ÿ",
"Α",
"Β",
"Γ",
"Δ",
"Ε",
"Ζ",
"Η",
"Θ",
"Ι",
"Κ",
"Λ",
"Μ",
"Ν",
"Ξ",
"Ο",
"Π",
"Ρ",
"Σ",
"Τ",
"Υ",
"Φ",
"Χ",
"Ψ",
"Ω",
"α",
"β",
"γ",
"δ",
"ε",
"ζ",
"η",
"θ",
"ι",
"κ",
"λ",
"μ",
"ν",
"ξ",
"ο",
"π",
"ρ",
"σ",
"ς",
"τ",
"υ",
"φ",
"χ",
"ψ",
"ω",
"Å",
"ℏ",
"⌀",
"",
"⍵",
"𝑢",
"𝜓",
"",
"‥",
"︽",
"﹥",
"•",
"÷",
"",
"∙",
"⋅",
"·",
"±",
"∓",
"∟",
"∠",
"∡",
"∢",
"℧",
"☺"
]
}
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,881 @@
Global:
model_name: latin_PP-OCRv5_mobile_rec
Hpi:
backend_configs:
paddle_infer:
trt_dynamic_shapes: &id001
x:
- - 1
- 3
- 48
- 160
- - 1
- 3
- 48
- 320
- - 8
- 3
- 48
- 3200
tensorrt:
dynamic_shapes: *id001
PreProcess:
transform_ops:
- DecodeImage:
channel_first: false
img_mode: BGR
- MultiLabelEncode:
gtc_encode: NRTRLabelEncode
max_text_length: 1000
- RecResizeImg:
eval_mode: true
image_shape:
- 3
- 48
- 320
- KeepKeys:
keep_keys:
- image
- label_ctc
- label_gtc
- length
- valid_ratio
PostProcess:
name: CTCLabelDecode
character_dict:
- '0'
- '1'
- '2'
- '3'
- '4'
- '5'
- '6'
- '7'
- '8'
- '9'
- A
- B
- C
- D
- E
- F
- G
- H
- I
- J
- K
- L
- M
- N
- O
- P
- Q
- R
- S
- T
- U
- V
- W
- X
- Y
- Z
- a
- b
- c
- d
- e
- f
- g
- h
- i
- j
- k
- l
- m
- n
- o
- p
- q
- r
- s
- t
- u
- v
- w
- x
- y
- z
- À
- Á
- Â
- Ã
- Ä
- Å
- Æ
- Ç
- È
- É
- Ê
- Ë
- Ì
- Í
- Î
- Ï
- Ð
- Ñ
- Ò
- Ó
- Ô
- Õ
- Ö
- ×
- Ø
- Ù
- Ú
- Û
- Ü
- Ý
- Þ
- ß
- à
- á
- â
- ã
- ä
- å
- æ
- ç
- è
- é
- ê
- ë
- ì
- í
- î
- ï
- ð
- ñ
- ò
- ó
- ô
- õ
- ö
- ÷
- ø
- ù
- ú
- û
- ü
- ý
- þ
- ÿ
- Ā
- ā
- Ă
- ă
- Ą
- ą
- Ć
- ć
- Ĉ
- ĉ
- Ċ
- ċ
- Č
- č
- Ď
- ď
- Đ
- đ
- Ē
- ē
- Ĕ
- ĕ
- Ė
- ė
- Ę
- ę
- Ě
- ě
- Ĝ
- ĝ
- Ğ
- ğ
- Ġ
- ġ
- Ģ
- ģ
- Ĥ
- ĥ
- Ħ
- ħ
- Ĩ
- ĩ
- Ī
- ī
- Ĭ
- ĭ
- Į
- į
- İ
- ı
- IJ
- ij
- Ĵ
- ĵ
- Ķ
- ķ
- ĸ
- Ĺ
- ĺ
- Ļ
- ļ
- Ľ
- ľ
- Ŀ
- ŀ
- Ł
- ł
- Ń
- ń
- Ņ
- ņ
- Ň
- ň
- ʼn
- Ŋ
- ŋ
- Ō
- ō
- Ŏ
- ŏ
- Ő
- ő
- Œ
- œ
- Ŕ
- ŕ
- Ŗ
- ŗ
- Ř
- ř
- Ś
- ś
- Ŝ
- ŝ
- Ş
- ş
- Š
- š
- Ţ
- ţ
- Ť
- ť
- Ŧ
- ŧ
- Ũ
- ũ
- Ū
- ū
- Ŭ
- ŭ
- Ů
- ů
- Ű
- ű
- Ų
- ų
- Ŵ
- ŵ
- Ŷ
- ŷ
- Ÿ
- Ź
- ź
- Ż
- ż
- Ž
- ž
- ſ
- ƀ
- Ɓ
- Ƃ
- ƃ
- Ƅ
- ƅ
- Ɔ
- Ƈ
- ƈ
- Ɖ
- Ɗ
- Ƌ
- ƌ
- ƍ
- Ǝ
- Ə
- Ɛ
- Ƒ
- ƒ
- Ɠ
- Ɣ
- ƕ
- Ɩ
- Ɨ
- Ƙ
- ƙ
- ƚ
- ƛ
- Ɯ
- Ɲ
- ƞ
- Ɵ
- Ơ
- ơ
- Ƣ
- ƣ
- Ƥ
- ƥ
- Ʀ
- Ƨ
- ƨ
- Ʃ
- ƪ
- ƫ
- Ƭ
- ƭ
- Ʈ
- Ư
- ư
- Ʊ
- Ʋ
- Ƴ
- ƴ
- Ƶ
- ƶ
- Ʒ
- Ƹ
- ƹ
- ƺ
- ƻ
- Ƽ
- ƽ
- ƾ
- ƿ
- ǀ
- ǁ
- ǂ
- ǃ
- DŽ
- Dž
- dž
- LJ
- Lj
- lj
- NJ
- Nj
- nj
- Ǎ
- ǎ
- Ǐ
- ǐ
- Ǒ
- ǒ
- Ǔ
- ǔ
- Ǖ
- ǖ
- Ǘ
- ǘ
- Ǚ
- ǚ
- Ǜ
- ǜ
- ǝ
- Ǟ
- ǟ
- Ǡ
- ǡ
- Ǣ
- ǣ
- Ǥ
- ǥ
- Ǧ
- ǧ
- Ǩ
- ǩ
- Ǫ
- ǫ
- Ǭ
- ǭ
- Ǯ
- ǯ
- ǰ
- DZ
- Dz
- dz
- Ǵ
- ǵ
- Ƕ
- Ƿ
- Ǹ
- ǹ
- Ǻ
- ǻ
- Ǽ
- ǽ
- Ǿ
- ǿ
- Ȁ
- ȁ
- Ȃ
- ȃ
- Ȅ
- ȅ
- Ȇ
- ȇ
- Ȉ
- ȉ
- Ȋ
- ȋ
- Ȍ
- ȍ
- Ȏ
- ȏ
- Ȑ
- ȑ
- Ȓ
- ȓ
- Ȕ
- ȕ
- Ȗ
- ȗ
- Ș
- ș
- Ț
- ț
- Ȝ
- ȝ
- Ȟ
- ȟ
- Ƞ
- ȡ
- Ȣ
- ȣ
- Ȥ
- ȥ
- Ȧ
- ȧ
- Ȩ
- ȩ
- Ȫ
- ȫ
- Ȭ
- ȭ
- Ȯ
- ȯ
- Ȱ
- ȱ
- Ȳ
- ȳ
- ȴ
- ȵ
- ȶ
- ȷ
- ȸ
- ȹ
- Ⱥ
- Ȼ
- ȼ
- Ƚ
- Ⱦ
- ȿ
- ɀ
- Ɂ
- ɂ
- Ƀ
- Ʉ
- Ʌ
- Ɇ
- ɇ
- Ɉ
- ɉ
- Ɋ
- ɋ
- Ɍ
- ɍ
- Ɏ
- ɏ
- '!'
- '"'
- '#'
- $
- '%'
- '&'
- ''''
- (
- )
- '*'
- +
- ','
- '-'
- .
- /
- ':'
- ;
- <
- '='
- '>'
- '?'
- '@'
- '['
- \
- ']'
- _
- '`'
- '{'
- '|'
- '}'
- ^
- '~'
- ©
- ®
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ¤
- ¦
- §
- ¨
- ª
- «
- ¬
- ¯
- °
- ²
- ³
- ´
- µ
-
- ¸
- ¹
- º
- »
- ¼
- ½
- ¾
- ¿
- ×
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ¢
-
- £
- ¥
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ƒ
-
-
- À
- Á
- Â
- Ã
- Ä
- Å
- Æ
- Ç
- È
- É
- Ê
- Ë
- Ì
- Í
- Î
- Ï
- Ð
- Ñ
- Ò
- Ó
- Ô
- Õ
- Ö
- Ø
- Ù
- Ú
- Û
- Ü
- Ý
- Þ
- à
- á
- â
- ã
- ä
- å
- æ
- ç
- è
- é
- ê
- ë
- ì
- í
- î
- ï
- ð
- ñ
- ò
- ó
- ô
- õ
- ö
- ø
- ù
- ú
- û
- ü
- ý
- þ
- ÿ
- Α
- Β
- Γ
- Δ
- Ε
- Ζ
- Η
- Θ
- Ι
- Κ
- Λ
- Μ
- Ν
- Ξ
- Ο
- Π
- Ρ
- Σ
- Τ
- Υ
- Φ
- Χ
- Ψ
- Ω
- α
- β
- γ
- δ
- ε
- ζ
- η
- θ
- ι
- κ
- λ
- μ
- ν
- ξ
- ο
- π
- ρ
- σ
- ς
- τ
- υ
- φ
- χ
- ψ
- ω
-
-
-
-
-
- 𝑢
- 𝜓
-
-
-
-
-
- ÷
-
-
-
- ·
- ±
-
-
-
-
-
-
-
+17
View File
@@ -0,0 +1,17 @@
from __future__ import annotations
import os
import uvicorn
from paddle_ocr_service import app
def main() -> None:
host = os.getenv("OPENSCREEN_OCR_HOST", "127.0.0.1")
port = int(os.getenv("OPENSCREEN_OCR_PORT", "8866"))
uvicorn.run(app, host=host, port=port, log_level=os.getenv("OPENSCREEN_OCR_LOG_LEVEL", "warning"))
if __name__ == "__main__":
main()
+326
View File
@@ -0,0 +1,326 @@
from __future__ import annotations
import base64
import importlib.util
import os
import sys
import tempfile
from pathlib import Path
from threading import Lock
from typing import Any
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from starlette.concurrency import run_in_threadpool
app = FastAPI(title="OpenScreen PaddleOCR service")
_engines: dict[str, Any] = {}
_engine_lock = Lock()
class OcrRequest(BaseModel):
imageBase64: str | None = None
path: str | None = None
imagePath: str | None = None
language: str | None = None
@app.get("/health")
def health() -> dict[str, Any]:
return {
"ok": True,
"paddleocrInstalled": importlib.util.find_spec("paddleocr") is not None,
"paddleInstalled": importlib.util.find_spec("paddle") is not None,
"engineReady": bool(_engines),
"defaultLanguage": os.getenv("PADDLEOCR_LANG", "latin"),
}
@app.post("/ocr")
async def ocr(request: OcrRequest) -> dict[str, Any]:
image_path, should_delete = _resolve_image_path(request)
try:
engine = _get_engine(request.language)
blocks = await run_in_threadpool(_recognize_blocks, engine, image_path)
return {"blocks": blocks}
finally:
if should_delete:
Path(image_path).unlink(missing_ok=True)
def _resolve_image_path(request: OcrRequest) -> tuple[str, bool]:
path_value = request.path or request.imagePath
if path_value:
path = Path(path_value)
if not path.exists():
raise HTTPException(status_code=400, detail=f"Image path does not exist: {path}")
return str(path), False
if not request.imageBase64:
raise HTTPException(status_code=400, detail="Request must include imageBase64 or path.")
try:
image_bytes = base64.b64decode(request.imageBase64, validate=True)
except Exception as error:
raise HTTPException(status_code=400, detail="imageBase64 is invalid.") from error
handle = tempfile.NamedTemporaryFile(prefix="openscreen-ocr-", suffix=".png", delete=False)
try:
handle.write(image_bytes)
finally:
handle.close()
return handle.name, True
def _get_engine(language: str | None) -> Any:
paddle_lang = _resolve_paddle_language(language)
cache_key = f"{paddle_lang}|{os.getenv('PADDLEOCR_DEVICE', 'cpu')}"
with _engine_lock:
if cache_key not in _engines:
_engines[cache_key] = _create_engine(paddle_lang)
return _engines[cache_key]
def _create_engine(paddle_lang: str) -> Any:
try:
_patch_paddlex_frozen_ocr_extra_gate()
from paddleocr import PaddleOCR
except ImportError as error:
raise HTTPException(
status_code=503,
detail=(
"PaddleOCR is not installed. Run: "
"python -m pip install -r tools/ocr/requirements.txt"
),
) from error
device = os.getenv("PADDLEOCR_DEVICE", "cpu")
ocr_version = os.getenv("PADDLEOCR_VERSION", "PP-OCRv5")
modern_kwargs: dict[str, Any] = {
"lang": paddle_lang,
"ocr_version": ocr_version,
"device": device,
"enable_mkldnn": os.getenv("PADDLEOCR_ENABLE_MKLDNN", "0") == "1",
"use_doc_orientation_classify": False,
"use_doc_unwarping": False,
"use_textline_orientation": False,
}
if os.getenv("PADDLEOCR_USE_MOBILE", "1") != "0":
modern_kwargs.update(
{
"text_detection_model_name": "PP-OCRv5_mobile_det",
"text_recognition_model_name": _mobile_recognition_model(paddle_lang),
}
)
try:
return PaddleOCR(**modern_kwargs)
except TypeError:
legacy_lang = "en" if paddle_lang == "latin" else paddle_lang
return PaddleOCR(lang=legacy_lang, use_angle_cls=False, show_log=False)
def _patch_paddlex_frozen_ocr_extra_gate() -> None:
if not getattr(sys, "frozen", False):
return
try:
import paddlex.utils.deps as deps
except Exception:
return
if getattr(deps, "_openscreen_ocr_extra_patch", False):
return
original_is_extra_available = deps.is_extra_available
original_require_extra = deps.require_extra
def is_extra_available(extra: str) -> bool:
if extra in {"ocr", "ocr-core"}:
return True
return original_is_extra_available(extra)
def require_extra(extra: str, *, obj_name: str | None = None, alt: str | None = None) -> None:
if extra in {"ocr", "ocr-core"} or alt in {"ocr", "ocr-core"}:
return
original_require_extra(extra, obj_name=obj_name, alt=alt)
deps.is_extra_available = is_extra_available
deps.require_extra = require_extra
deps._openscreen_ocr_extra_patch = True
def _resolve_paddle_language(language: str | None) -> str:
explicit = os.getenv("PADDLEOCR_LANG")
if explicit:
return explicit
language_value = (language or "vi,en").lower()
if "vi" in language_value or "latin" in language_value:
return "latin"
if "en" in language_value:
return "en"
return language_value.split(",")[0].strip() or "latin"
def _mobile_recognition_model(paddle_lang: str) -> str:
if paddle_lang == "en":
return "en_PP-OCRv5_mobile_rec"
if paddle_lang == "latin":
return "latin_PP-OCRv5_mobile_rec"
return "PP-OCRv5_mobile_rec"
def _recognize_blocks(engine: Any, image_path: str) -> list[dict[str, Any]]:
if hasattr(engine, "predict"):
result = engine.predict(image_path)
blocks = _blocks_from_v3_result(result)
if blocks:
return blocks
result = engine.ocr(image_path, cls=False)
return _blocks_from_legacy_result(result)
def _blocks_from_v3_result(result: Any) -> list[dict[str, Any]]:
blocks: list[dict[str, Any]] = []
for item in _as_list(result):
data = _result_to_dict(item)
if not data:
continue
texts = _as_list(_first_present(data, ("rec_texts", "texts")))
scores = _as_list(_first_present(data, ("rec_scores", "scores")))
boxes = _as_list(_first_present(data, ("rec_boxes", "rec_polys", "dt_polys")))
for index, text_value in enumerate(texts):
text = str(text_value).strip()
if not text:
continue
box = _box_to_rect(boxes[index] if index < len(boxes) else None)
if not box:
continue
blocks.append(
{
"text": text,
"confidence": _score_to_float(scores[index] if index < len(scores) else None),
"box": box,
}
)
return blocks
def _first_present(data: dict[str, Any], keys: tuple[str, ...]) -> Any:
for key in keys:
if key in data and data[key] is not None:
return data[key]
return None
def _blocks_from_legacy_result(result: Any) -> list[dict[str, Any]]:
blocks: list[dict[str, Any]] = []
_collect_legacy_blocks(result, blocks)
return blocks
def _collect_legacy_blocks(value: Any, blocks: list[dict[str, Any]]) -> None:
if not isinstance(value, (list, tuple)):
return
if len(value) >= 2 and _looks_like_box(value[0]):
rec = value[1]
if isinstance(rec, (list, tuple)) and rec:
text = str(rec[0]).strip()
if text:
box = _box_to_rect(value[0])
if box:
blocks.append(
{
"text": text,
"confidence": _score_to_float(rec[1] if len(rec) > 1 else None),
"box": box,
}
)
return
for item in value:
_collect_legacy_blocks(item, blocks)
def _result_to_dict(item: Any) -> dict[str, Any]:
if isinstance(item, dict):
data = item
elif hasattr(item, "res") and isinstance(item.res, dict):
data = item.res
elif hasattr(item, "to_dict"):
data = item.to_dict()
elif hasattr(item, "json") and isinstance(item.json, dict):
data = item.json
elif hasattr(item, "__dict__"):
data = dict(item.__dict__)
else:
return {}
nested = data.get("res")
return nested if isinstance(nested, dict) else data
def _as_list(value: Any) -> list[Any]:
if value is None:
return []
if hasattr(value, "tolist"):
return value.tolist()
if isinstance(value, list):
return value
if isinstance(value, tuple):
return list(value)
return [value]
def _looks_like_box(value: Any) -> bool:
box = _as_list(value)
if len(box) == 4 and all(_is_number(item) for item in box):
return True
return bool(box) and all(isinstance(point, (list, tuple)) for point in box)
def _box_to_rect(value: Any) -> dict[str, float] | None:
if value is None:
return None
box = _as_list(value)
if len(box) == 4 and all(_is_number(item) for item in box):
left, top, right, bottom = [float(item) for item in box]
return _rect(left, top, right, bottom)
points = [_as_list(point) for point in box]
coordinates = [
(float(point[0]), float(point[1]))
for point in points
if len(point) >= 2 and _is_number(point[0]) and _is_number(point[1])
]
if not coordinates:
return None
xs = [point[0] for point in coordinates]
ys = [point[1] for point in coordinates]
return _rect(min(xs), min(ys), max(xs), max(ys))
def _rect(left: float, top: float, right: float, bottom: float) -> dict[str, float] | None:
width = max(0.0, right - left)
height = max(0.0, bottom - top)
if width == 0 or height == 0:
return None
return {"x": left, "y": top, "width": width, "height": height}
def _score_to_float(value: Any) -> float:
try:
score = float(value)
except (TypeError, ValueError):
return 0.5
return max(0.0, min(1.0, score / 100 if score > 1 else score))
def _is_number(value: Any) -> bool:
return isinstance(value, (int, float)) and not isinstance(value, bool)
+10
View File
@@ -0,0 +1,10 @@
fastapi>=0.115
uvicorn[standard]>=0.30
pillow>=10
paddlepaddle>=3.0
paddlex[ocr]>=3.5,<3.6
paddleocr>=3.0
pyclipper>=1.3
pypdfium2>=4
python-bidi>=0.6
shapely>=2.0