Initial OpenScreen import
CI / Lint (push) Has been cancelled
CI / Type Check (push) Has been cancelled
CI / Test (push) Has been cancelled
CI / Build (push) Has been cancelled
Bump Nix package on release / bump (release) Has been cancelled
Update Homebrew Cask / update-cask (release) Has been cancelled

This commit is contained in:
huanld
2026-05-29 08:31:04 +07:00
commit 1073b0c214
439 changed files with 79711 additions and 0 deletions
+935
View File
@@ -0,0 +1,935 @@
# Quy trình triển khai Auto User Guide Generation
Mục tiêu của tính năng này là biến OpenScreen từ công cụ quay màn hình thành công cụ tự tạo tài liệu hướng dẫn sử dụng phần mềm. Người dùng bật Guide Mode, quay thao tác như bình thường, hệ thống ghi lại thời điểm click hoặc hotkey, trích ảnh từ video sau khi quay xong, chạy OCR local để đọc chữ trên giao diện, sau đó dùng AI tạo bản nháp hướng dẫn từng bước.
Tài liệu này được viết để có thể bắt đầu coding ngay: có kiến trúc, schema, file cần thêm/sửa, thứ tự task, tiêu chí test và định nghĩa MVP.
## Trạng Thái MVP Hiện Tại
- Đã có Guide Mode trong HUD, ghi click/marker vào `.guide.json`.
- Đã có GuidePanel trong editor để chạy: prepare events, capture snapshots, OCR, generate draft, export Markdown/HTML.
- Đã có local deterministic draft để test không cần DeepSeek key.
- DeepSeek được gọi khi chọn provider `DeepSeek` và có `DEEPSEEK_API_KEY`.
- OCR local mặc định gọi `OPENSCREEN_GUIDE_OCR_URL` hoặc `http://127.0.0.1:8866/ocr`.
- Verification hiện tại: targeted guide tests pass, `npm test` pass, `npm run build-vite` pass, `npm run i18n:check` pass.
## Mục Tiêu Sản Phẩm
Flow người dùng:
1. Bật Guide Mode.
2. Quay màn hình phần mềm cần hướng dẫn.
3. Trong lúc quay, hệ thống tự ghi timestamp các click chuột.
4. Người dùng có thể bấm một hotkey/nút marker nếu muốn đánh dấu bước thủ công.
5. Sau khi dừng quay, hệ thống trích ảnh màn hình từ video tại các timestamp đó.
6. OCR local đọc text trên ảnh giao diện.
7. Hệ thống map vị trí click tới text/control gần nhất.
8. AI Agent tạo tài liệu dạng từng bước.
9. Người dùng review, sửa nội dung, export Markdown/HTML.
Ví dụ output:
```md
# Hướng dẫn xuất báo cáo
## Bước 1: Mở phần cài đặt
Nhấn nút **Settings** ở thanh điều hướng bên trái.
## Bước 2: Chọn Export
Trong màn hình Settings, chọn **Export report**.
```
## Phạm Vi MVP
MVP cần làm:
- Bật/tắt Guide Mode trước khi quay.
- Tận dụng recorder hiện tại, không viết recorder mới.
- Tận dụng `.cursor.json` hiện tại để lấy click timestamp.
- Thêm marker bằng hotkey hoặc nút trên HUD.
- Tạo sidecar `.guide.json` riêng cho guide.
- Trích screenshot sau khi quay xong, từ video đã lưu.
- OCR local bằng PaddleOCR service.
- Tạo step candidate từ click position + OCR blocks.
- Gọi DeepSeek bằng text metadata, không gửi ảnh mặc định.
- Có panel review trong editor.
- Export Markdown và HTML.
Không làm trong MVP:
- Không chụp screenshot realtime trong lúc quay nếu chưa có benchmark cần thiết.
- Không gửi raw screenshot lên cloud AI mặc định.
- Không sửa schema `.cursor.json` nếu không bắt buộc.
- Không build full UI automation engine.
- Không làm PDF/DOCX ngay.
- Không bundle OCR runtime vào app packaged ngay.
## Code Hiện Có Cần Tận Dụng
Các điểm đã có trong codebase:
- Recording orchestration: `src/hooks/useScreenRecorder.ts`
- Launch/HUD UI: `src/components/launch/LaunchWindow.tsx`
- Source selection: `src/components/launch/SourceSelector.tsx`
- Editor chính: `src/components/video-editor/VideoEditor.tsx`
- Project/session persistence: `src/components/video-editor/projectPersistence.ts`
- Cursor contracts: `src/native/contracts.ts`
- Hook đọc cursor data: `src/native/hooks/useCursorRecordingData.ts`
- IPC main handlers: `electron/ipc/handlers.ts`
- Native bridge: `electron/ipc/nativeBridge.ts`
- Cursor service: `electron/native-bridge/services/cursorService.ts`
- Windows cursor recording: `electron/native-bridge/cursor/recording/windowsNativeRecordingSession.ts`
- macOS cursor recording: `electron/native-bridge/cursor/recording/macNativeCursorRecordingSession.ts`
- Frame/export primitives: `src/lib/exporter/frameRenderer.ts`
Nhận định kỹ thuật:
- Windows/macOS native cursor recording đã có dữ liệu click.
- Cursor sample hiện có thể có `interactionType: "click" | "mouseup" | "move"`.
- Editor hiện đã dùng click timestamp để render hiệu ứng click.
- Vì schema cursor đang được nhiều nơi dùng, MVP nên tạo `.guide.json` riêng thay vì mở rộng `.cursor.json`.
## Kiến Trúc Tổng Thể
```mermaid
flowchart TD
A["User bật Guide Mode"] --> B["Quay video bằng recorder hiện tại"]
B --> C["Cursor recorder ghi click timestamp"]
B --> D["Hotkey/HUD marker ghi manual event"]
C --> E["Dừng quay"]
D --> E
E --> F["Guide assembler tạo .guide.json"]
F --> G["Snapshot extractor seek video và xuất PNG"]
G --> H["PaddleOCR local đọc text + bounding boxes"]
H --> I["Target mapper map click tới OCR text/control"]
I --> J["DeepSeek/local LLM viết draft guide"]
J --> K["GuidePanel cho user review/sửa"]
K --> L["Export Markdown/HTML"]
```
Quyết định chính:
- Realtime recording chỉ ghi event/timestamp, không xử lý OCR/AI.
- Screenshot được trích từ video sau khi quay, tránh ảnh hưởng performance recorder.
- OCR chạy local-first.
- DeepSeek chỉ nhận text metadata trừ khi user opt-in gửi ảnh.
- Guide data nằm cạnh recording artifact.
## File Cần Thêm
```text
src/guide/
contracts.ts
eventBuilder.ts
targetMapper.ts
promptBuilder.ts
generatedGuideSchema.ts
snapshot/
extractGuideSnapshots.ts
export/
markdownExporter.ts
htmlExporter.ts
__tests__/
eventBuilder.test.ts
targetMapper.test.ts
promptBuilder.test.ts
markdownExporter.test.ts
src/components/video-editor/guide/
GuidePanel.tsx
GuideStepList.tsx
GuideStepEditor.tsx
GuideSnapshotPreview.tsx
electron/guide/
guideStore.ts
guidePaths.ts
guideIpc.ts
ocr/
paddleOcrClient.ts
ai/
deepseekGuideClient.ts
```
File hiện có khả năng phải sửa:
- `src/hooks/useScreenRecorder.ts`
- `src/components/launch/LaunchWindow.tsx`
- `src/components/video-editor/VideoEditor.tsx`
- `electron/ipc/handlers.ts`
- `electron/preload.ts`
- file khai báo type cho `window.electronAPI`
- `package.json` nếu thêm script test hoặc dependency nhỏ
## Artifact Đầu Ra
Với video `recording-123.mp4`, hệ thống tạo:
```text
recording-123.mp4
recording-123.cursor.json
recording-123.guide.json
recording-123-guide/
step-001.png
step-002.png
ocr.json
guide.md
guide.html
```
Quy tắc:
- `.cursor.json` vẫn là dữ liệu cursor gốc.
- `.guide.json` là source of truth cho guide workflow.
- Folder `recording-123-guide/` chứa file phát sinh từ guide.
- `guide.md``guide.html` có thể được tạo lại từ `.guide.json`.
## Contract Chính
Tạo `src/guide/contracts.ts`.
```ts
export type GuideEventKind = "click" | "hotkey" | "manual";
export type GuideEventSource =
| "cursor-recording"
| "guide-hotkey"
| "review-ui";
export interface GuideEvent {
id: string;
recordingId: string;
kind: GuideEventKind;
source: GuideEventSource;
timeMs: number;
x?: number;
y?: number;
normalizedX?: number;
normalizedY?: number;
button?: "left" | "right" | "middle" | "unknown";
label?: string;
screenshotOffsetMs?: number;
createdAt: string;
}
export interface GuideSnapshot {
id: string;
eventId: string;
timeMs: number;
offsetMs: number;
path: string;
width: number;
height: number;
}
export interface OcrBlock {
id: string;
snapshotId: string;
text: string;
confidence: number;
box: {
x: number;
y: number;
width: number;
height: number;
};
}
export interface GuideStepCandidate {
id: string;
eventId: string;
snapshotId?: string;
timeMs: number;
action: "click" | "choose" | "type" | "wait" | "manual";
targetText?: string;
targetRole?: "button" | "menu" | "tab" | "field" | "link" | "unknown";
nearbyText: string[];
confidence: number;
}
export interface GeneratedGuideStep {
id: string;
order: number;
title: string;
instruction: string;
screenshotPath?: string;
sourceCandidateId?: string;
}
export interface GeneratedGuide {
title: string;
summary?: string;
steps: GeneratedGuideStep[];
}
export interface GuideSession {
schemaVersion: 1;
recordingId: string;
videoPath: string;
cursorPath?: string;
guidePath: string;
outputDir: string;
status:
| "recording"
| "events-ready"
| "snapshots-ready"
| "ocr-ready"
| "draft-ready"
| "reviewed";
events: GuideEvent[];
snapshots: GuideSnapshot[];
ocrBlocks: OcrBlock[];
candidates: GuideStepCandidate[];
generatedGuide?: GeneratedGuide;
createdAt: string;
updatedAt: string;
}
```
Quy tắc dữ liệu:
- `timeMs` luôn tính theo timeline video cuối cùng.
- `x/y` là tọa độ pixel nếu có.
- `normalizedX/Y` dùng để chống lệch khi video scale.
- `screenshotOffsetMs` mặc định `500`, nghĩa là lấy ảnh sau click 0.5 giây để bắt trạng thái UI sau thao tác.
- AI output chỉ là draft, user edit mới là nội dung cuối.
## IPC Cần Thêm
MVP dùng app-level Electron IPC, không cần đưa vào native bridge vì đây là workflow cấp ứng dụng.
Preload API đề xuất:
```ts
window.electronAPI.guide = {
startSession(recordingId: string): Promise<GuideSession>;
addMarker(input: AddGuideMarkerInput): Promise<GuideEvent>;
finalizeEvents(input: FinalizeGuideEventsInput): Promise<GuideSession>;
writeSnapshot(input: WriteGuideSnapshotInput): Promise<GuideSnapshot>;
runOcr(input: RunGuideOcrInput): Promise<GuideSession>;
generateDraft(input: GenerateGuideDraftInput): Promise<GuideSession>;
saveGuide(input: SaveGuideInput): Promise<GuideSession>;
exportMarkdown(input: ExportGuideInput): Promise<{ path: string }>;
exportHtml(input: ExportGuideInput): Promise<{ path: string }>;
};
```
Input types:
```ts
export interface AddGuideMarkerInput {
recordingId: string;
timeMs: number;
kind: "hotkey" | "manual";
label?: string;
}
export interface FinalizeGuideEventsInput {
recordingId: string;
videoPath: string;
cursorPath?: string;
}
export interface WriteGuideSnapshotInput {
recordingId: string;
eventId: string;
timeMs: number;
offsetMs: number;
pngBytes: ArrayBuffer;
width: number;
height: number;
}
export interface RunGuideOcrInput {
recordingId: string;
snapshotIds?: string[];
}
export interface GenerateGuideDraftInput {
recordingId: string;
language: "vi" | "en";
provider: "deepseek" | "local";
}
export interface SaveGuideInput {
recordingId: string;
generatedGuide: GeneratedGuide;
}
export interface ExportGuideInput {
recordingId: string;
}
```
## Phase 1: Contracts, Store, IPC
Mục tiêu: tạo khung lưu trữ `.guide.json` mà chưa đụng recorder.
Task coding:
1. Tạo `src/guide/contracts.ts`.
2. Tạo `electron/guide/guidePaths.ts`.
3. Tạo `electron/guide/guideStore.ts`.
4. Tạo `electron/guide/guideIpc.ts`.
5. Register guide IPC trong `electron/ipc/handlers.ts`.
6. Expose API trong `electron/preload.ts`.
7. Bổ sung type cho `window.electronAPI.guide`.
Yêu cầu kỹ thuật:
- Ghi file atomically: write temp file rồi rename.
- Validate `schemaVersion`.
- Không throw raw error ra renderer, trả error code ổn định.
- Không yêu cầu AI/OCR trong phase này.
Acceptance:
- Tạo được guide session fake bằng IPC.
- Đọc/ghi `.guide.json` round-trip không mất dữ liệu.
- Input thiếu `recordingId` hoặc `videoPath` bị reject rõ ràng.
Test:
- `guideStore` tạo path đúng.
- `guideStore` đọc file lỗi schema và trả error.
- IPC handler reject input thiếu field.
## Phase 2: Build Event Từ Cursor Click
Mục tiêu: lấy click event từ `.cursor.json` hiện tại.
Task coding:
1. Tạo `src/guide/eventBuilder.ts`.
2. Thêm hàm `buildGuideEventsFromCursor`.
3. Lọc sample có `interactionType === "click"`.
4. Convert sang `GuideEvent`.
5. De-duplicate click trong cửa sổ `250ms`.
6. Sort theo `timeMs`.
7. Merge với marker thủ công nếu có.
Pseudo-code:
```ts
export function buildGuideEventsFromCursor(input: {
recordingId: string;
samples: CursorRecordingSample[];
videoWidth?: number;
videoHeight?: number;
}): GuideEvent[] {
const events = input.samples
.filter((sample) => sample.interactionType === "click")
.map((sample) => ({
id: createGuideEventId(input.recordingId, sample.timeMs),
recordingId: input.recordingId,
kind: "click" as const,
source: "cursor-recording" as const,
timeMs: sample.timeMs,
x: sample.cx,
y: sample.cy,
normalizedX: normalize(sample.cx, input.videoWidth),
normalizedY: normalize(sample.cy, input.videoHeight),
button: "left" as const,
screenshotOffsetMs: 500,
createdAt: new Date().toISOString(),
}));
return sortGuideEvents(dedupeGuideEvents(events));
}
```
Acceptance:
- 5 click samples tạo 5 guide events.
- `move``mouseup` không tạo step.
- Double click hoặc click bounce không tạo quá nhiều step nếu nằm trong dedupe window.
- Không có cursor click thì vẫn dùng được manual marker.
Test:
- convert click sample.
- bỏ qua move/mouseup.
- dedupe theo thời gian.
- sort đúng thứ tự.
- xử lý sample thiếu tọa độ.
## Phase 3: Guide Mode UI Và Manual Marker
Mục tiêu: user bật được Guide Mode và đánh dấu bước thủ công.
Task coding:
1. Thêm Guide Mode toggle trong `LaunchWindow.tsx`.
2. Truyền trạng thái guide vào flow recording trong `useScreenRecorder.ts`.
3. Khi start recording và Guide Mode on, gọi `guide.startSession(recordingId)`.
4. Thêm nút marker trong HUD.
5. Thêm global hotkey ở Electron main, ví dụ `CommandOrControl+Shift+G`.
6. Khi bấm marker/hotkey, gọi `guide.addMarker`.
7. Khi stop recording, gọi `guide.finalizeEvents`.
Lưu ý:
- Global hotkey phải nằm ở Electron main vì app đang được quay có thể đang focus.
- Nếu register hotkey fail, UI vẫn dùng nút marker.
- Không làm thay đổi behavior khi Guide Mode off.
Acceptance:
- Guide Mode off: quay/sửa/export vẫn như cũ.
- Guide Mode on: stop recording tạo `.guide.json`.
- Hotkey tạo event đúng timestamp.
- Cancel recording không để lại guide artifact rác.
## Phase 4: Snapshot Extraction
Mục tiêu: trích ảnh PNG cho từng event sau khi quay xong.
Quyết định MVP:
- Không chụp realtime trong lúc quay.
- Dùng video đã lưu, seek tới timestamp cần lấy.
- Thực hiện trong renderer/editor bằng hidden `<video>` + `<canvas>`.
- Persist PNG qua IPC.
Task coding:
1. Tạo `src/guide/snapshot/extractGuideSnapshots.ts`.
2. Nhận `GuideSession` + `videoPath`.
3. Với mỗi event, lấy timestamp `event.timeMs + screenshotOffsetMs`.
4. Clamp timestamp vào duration video.
5. Seek hidden video tới timestamp.
6. Draw frame vào canvas.
7. Convert canvas thành PNG bytes.
8. Gọi `guide.writeSnapshot`.
9. Update `.guide.json` với danh sách snapshots.
Acceptance:
- Mỗi event có một ảnh `step-xxx.png`.
- Nếu một event fail snapshot, các event khác vẫn chạy.
- Ảnh được lưu trong `recording-123-guide/`.
- UI báo lỗi recoverable, không crash editor.
Test:
- clamp timestamp.
- tên file đúng thứ tự.
- handle seek timeout.
- không abort toàn bộ batch khi một frame lỗi.
## Phase 5: OCR Local
Mục tiêu: đọc text trên giao diện phần mềm từ screenshot.
Khuyến nghị:
- Dùng PaddleOCR làm OCR chính.
- Tesseract chỉ nên là fallback đơn giản.
- VLM local như Gemma 3 4B/MiniCPM/Qwen-VL chỉ dùng cho trường hợp icon/no-text khó, không dùng làm OCR chính.
Kiến trúc MVP:
- Chạy PaddleOCR như local HTTP service tại `127.0.0.1:8866`.
- Electron main gọi OCR service.
- Renderer không gọi OCR trực tiếp.
API local OCR đề xuất:
```http
GET /health
POST /ocr
Content-Type: application/json
{
"imagePath": "D:\\Code\\OpenScreen\\recording-123-guide\\step-001.png",
"language": "vi,en"
}
```
Response:
```json
{
"blocks": [
{
"text": "Settings",
"confidence": 0.97,
"box": { "x": 120, "y": 80, "width": 90, "height": 24 }
}
]
}
```
Task coding:
1. Tạo `electron/guide/ocr/paddleOcrClient.ts`.
2. Thêm health check.
3. Gọi OCR cho từng snapshot.
4. Convert output sang `OcrBlock`.
5. Ghi `ocrBlocks` vào `.guide.json`.
6. Ghi bản tổng hợp vào `recording-123-guide/ocr.json`.
Config đề xuất:
```ts
export interface GuideOcrConfig {
provider: "paddleocr";
baseUrl: string; // default http://127.0.0.1:8866
language: string; // default vi,en
timeoutMs: number; // default 30000
}
```
Acceptance:
- OCR service offline thì UI báo lỗi rõ ràng.
- OCR fail không xóa snapshots.
- OCR result có text, confidence, bounding box.
- Guide vẫn export thủ công được nếu OCR không chạy.
## Phase 6: Target Mapper
Mục tiêu: xác định user đã click vào nút/menu/field nào dựa trên tọa độ click và OCR.
Task coding:
1. Tạo `src/guide/targetMapper.ts`.
2. Với mỗi `GuideEvent`, lấy snapshot tương ứng.
3. Lấy OCR blocks của snapshot đó.
4. Score từng OCR block.
5. Chọn target tốt nhất.
6. Sinh `GuideStepCandidate`.
Scoring đề xuất:
- `+100` nếu click nằm trong OCR box.
- Điểm cao hơn nếu box center gần click hơn.
- Cộng điểm nếu text ngắn, giống label button/menu.
- Trừ điểm nếu confidence thấp.
- Nếu không có block đủ tốt, để `targetRole: "unknown"`.
Role heuristic:
- `button`: click vào/near text dạng action label.
- `menu`: text nằm trong danh sách dọc.
- `tab`: text nằm trong hàng ngang gần đầu giao diện.
- `field`: click vào vùng giống input.
- `unknown`: không đủ tự tin.
Acceptance:
- Click trực tiếp vào nút text map đúng target text.
- Click gần label map được OCR block gần nhất.
- Click vùng icon/no-text tạo candidate confidence thấp để user review.
Test:
- click inside box.
- nearest box.
- low-confidence penalty.
- no OCR fallback.
## Phase 7: AI Draft Generation
Mục tiêu: tạo bản nháp hướng dẫn từ candidate metadata.
Provider MVP:
- DeepSeek API cho cloud text generation.
- Local LLM có thể thêm sau qua cùng prompt contract.
- Không gửi ảnh lên DeepSeek mặc định.
Task coding:
1. Tạo `src/guide/promptBuilder.ts`.
2. Tạo `electron/guide/ai/deepseekGuideClient.ts`.
3. Đọc API key ở Electron main qua env/config.
4. Build prompt từ candidates + OCR nearby text.
5. Yêu cầu output JSON.
6. Validate output.
7. Ghi `generatedGuide` vào `.guide.json`.
Env:
```powershell
$env:DEEPSEEK_API_KEY="..."
$env:DEEPSEEK_BASE_URL="https://api.deepseek.com"
$env:DEEPSEEK_MODEL="deepseek-v4-flash"
```
Prompt input:
```json
{
"language": "vi",
"softwareContext": {
"recordingName": "recording-123",
"userGoal": "Tạo báo cáo"
},
"steps": [
{
"order": 1,
"eventKind": "click",
"targetText": "Settings",
"targetRole": "button",
"nearbyText": ["Home", "Settings", "Account"],
"confidence": 0.91
}
]
}
```
Expected AI output:
```json
{
"title": "Hướng dẫn thao tác",
"summary": "Tài liệu này mô tả các bước thực hiện thao tác đã ghi hình.",
"steps": [
{
"order": 1,
"title": "Mở phần cài đặt",
"instruction": "Nhấn nút Settings ở thanh điều hướng bên trái."
}
]
}
```
Acceptance:
- Thiếu API key thì UI báo lỗi rõ ràng.
- AI trả invalid JSON thì reject và cho retry.
- Output được validate trước khi lưu.
- Có thể generate tiếng Việt.
Test:
- promptBuilder không đưa raw image vào prompt.
- parser reject JSON sai schema.
- DeepSeek client handle timeout/401/rate limit.
## Phase 8: GuidePanel Review UI
Mục tiêu: người dùng sửa được guide trước khi export.
Task coding:
1. Tạo `src/components/video-editor/guide/GuidePanel.tsx`.
2. Tạo `GuideStepList.tsx`.
3. Tạo `GuideStepEditor.tsx`.
4. Tạo `GuideSnapshotPreview.tsx`.
5. Mount panel trong `VideoEditor.tsx`.
6. Load `.guide.json` khi mở video có guide sidecar.
7. Thêm action:
- Generate snapshots.
- Run OCR.
- Generate AI draft.
- Save edits.
- Export Markdown.
- Export HTML.
UX:
- AI output là draft, không khóa nội dung.
- User sửa title/instruction từng step.
- User xóa step nhiễu.
- User merge step sau nếu cần.
- Confidence hiển thị nhỏ, không làm UI rối.
- Guide fail không ảnh hưởng video editing.
Acceptance:
- User sửa step và save được.
- User xóa step được.
- Regenerate cần confirm nếu đang có manual edits.
- Export dùng nội dung đã sửa, không dùng lại AI raw output.
## Phase 9: Export Markdown/HTML
Mục tiêu: tạo tài liệu dùng được ngay.
Task coding:
1. Tạo `src/guide/export/markdownExporter.ts`.
2. Tạo `src/guide/export/htmlExporter.ts`.
3. Gọi exporter từ Electron IPC.
4. Ghi file vào `recording-123-guide/guide.md`.
5. Ghi file vào `recording-123-guide/guide.html`.
6. Dùng relative screenshot link.
Markdown format:
```md
# Hướng dẫn thao tác
Tài liệu này mô tả các bước thực hiện thao tác đã ghi hình.
## Bước 1: Mở phần cài đặt
Nhấn nút **Settings** ở thanh điều hướng bên trái.
![Bước 1](step-001.png)
```
Acceptance:
- Markdown mở được và thấy ảnh local.
- HTML mở được bằng browser.
- Export vẫn chạy nếu guide được viết thủ công, không cần AI.
## Thứ Tự Coding Ngay
Nên làm theo thứ tự này để giảm rủi ro:
1. Tạo `src/guide/contracts.ts`.
2. Tạo `electron/guide/guidePaths.ts`.
3. Tạo `electron/guide/guideStore.ts`.
4. Tạo `electron/guide/guideIpc.ts`.
5. Expose `window.electronAPI.guide`.
6. Viết unit test cho guide store.
7. Tạo `src/guide/eventBuilder.ts`.
8. Viết unit test convert cursor samples sang guide events.
9. Thêm Guide Mode toggle vào launch UI.
10. Gọi `startSession` khi bắt đầu quay.
11. Gọi `finalizeEvents` khi dừng quay.
12. Tạo snapshot extractor trong renderer.
13. Tạo `paddleOcrClient`.
14. Tạo `targetMapper`.
15. Tạo `promptBuilder`.
16. Tạo `deepseekGuideClient`.
17. Tạo `GuidePanel`.
18. Tạo Markdown/HTML exporters.
19. Chạy lint/test/build.
20. Test thủ công flow đầy đủ.
Chia PR đề xuất:
- PR 1: contracts, store, IPC, unit tests.
- PR 2: cursor-click event builder, Guide Mode toggle, manual marker.
- PR 3: snapshot extraction và GuidePanel shell.
- PR 4: PaddleOCR integration và target mapping.
- PR 5: DeepSeek generation, review UI, Markdown/HTML export.
## Error Codes
Dùng error code ổn định để UI xử lý:
```ts
export type GuideErrorCode =
| "guide-session-not-found"
| "guide-invalid-schema"
| "guide-video-load-failed"
| "guide-snapshot-failed"
| "guide-ocr-unavailable"
| "guide-ocr-failed"
| "guide-ai-key-missing"
| "guide-ai-request-failed"
| "guide-ai-invalid-output"
| "guide-export-failed";
```
Quy tắc:
- IPC không throw raw provider error ra renderer.
- OCR fail là recoverable.
- AI fail là recoverable.
- Export fail phải giữ nguyên `.guide.json`.
## Local Development
Baseline:
```powershell
npm install
npm run lint
npm test
npm run build-vite
```
OCR service dev:
```powershell
python -m venv .venv-ocr
.venv-ocr\Scripts\Activate.ps1
pip install paddleocr fastapi uvicorn
uvicorn local_ocr_service:app --host 127.0.0.1 --port 8866
```
DeepSeek env:
```powershell
$env:DEEPSEEK_API_KEY="..."
$env:DEEPSEEK_BASE_URL="https://api.deepseek.com"
$env:DEEPSEEK_MODEL="deepseek-v4-flash"
```
Không commit API key.
## Testing Matrix
Unit tests:
- `eventBuilder`: cursor sample -> guide events.
- `targetMapper`: OCR blocks -> step candidates.
- `promptBuilder`: candidates -> AI prompt.
- `markdownExporter`: generated guide -> Markdown.
- `htmlExporter`: generated guide -> HTML.
Renderer/browser tests:
- snapshot extractor seek video fixture.
- GuidePanel edit/delete/save step.
Manual integration:
1. Quay với Guide Mode off, xác nhận behavior cũ không đổi.
2. Quay với Guide Mode on và 3 click.
3. Kiểm tra `.guide.json` có 3 click events.
4. Generate snapshots.
5. Run OCR local.
6. Generate Vietnamese draft.
7. Sửa một step.
8. Export Markdown.
9. Mở Markdown/HTML xem ảnh local.
10. Tắt OCR service và test lỗi recoverable.
11. Xóa DeepSeek key và test lỗi recoverable.
Lệnh trước khi merge:
```powershell
npm run lint
npm test
npm run build-vite
```
Nếu phase không sửa native recorder thì chưa cần chạy native helper tests.
## Definition Of Done Cho MVP
MVP được xem là xong khi:
- Guide Mode bật/tắt được.
- Guide Mode off không ảnh hưởng recording hiện tại.
- Click events lấy được từ cursor telemetry hiện có.
- Hotkey/HUD marker tạo event thủ công.
- `.guide.json` được tạo cạnh recording.
- Snapshot PNG được trích từ final video.
- PaddleOCR local đọc được text và bounding boxes.
- Target mapper tạo step candidates.
- DeepSeek tạo được draft tiếng Việt từ text metadata.
- User review/sửa/xóa step được.
- Export Markdown/HTML dùng nội dung đã review.
- Lint/test/build pass.
## Nâng Cấp Sau MVP
- Export PDF/DOCX.
- Bundle local OCR runtime vào packaged app.
- Thêm local VLM fallback cho icon-only control.
- Cho phép user opt-in gửi crop ảnh lên remote vision model.
- Merge/dedupe step thông minh hơn cho double click/menu navigation.
- Dùng transcript giọng nói làm ngữ cảnh thêm.
- Template theo từng loại phần mềm.
- Computer vision detect UI element ngoài OCR.
@@ -0,0 +1,210 @@
# macOS Native Recorder Roadmap
OpenScreen's macOS recorder should follow the same architecture boundaries as the Windows native recorder: Electron owns session orchestration and persistence, while a platform-native helper owns capture, timing, encoding, and platform-specific permissions.
This work is intentionally scoped as a macOS-only port. Windows native capture remains owned by the WGC helper, and Linux remains on the existing Electron path.
## Goals
- Capture displays and windows through ScreenCaptureKit.
- Exclude the real system cursor during capture when using the editable OpenScreen cursor overlay.
- Preserve the current high-quality cursor overlay path in preview and export.
- Capture macOS system audio through ScreenCaptureKit on supported macOS versions.
- Capture microphone audio through the same native timing domain where the OS supports it, or through an explicit companion path until it can be moved into the helper.
- Mix system audio and microphone audio into the primary MP4 without renderer-side track assembly.
- Capture webcam video natively and compose it into the helper-owned MP4 during the native-recording migration.
- Keep screen video, audio, webcam, and cursor aligned to one native timing origin.
- Package per-architecture helper binaries with macOS builds.
## Non-Goals
- Replacing the editor/export pipeline.
- Changing Windows native capture behavior.
- Adding Linux native capture.
- Shipping a silent fallback from native macOS capture to Electron capture when the user explicitly requested a native-only feature.
## Architecture
The renderer keeps the existing recording controls. On macOS, `useScreenRecorder` should eventually send a complete recording request to Electron instead of assembling display, audio, microphone, webcam, and cursor streams in the browser.
Electron owns the native recording session:
- resolves the selected display/window source;
- resolves output paths;
- starts cursor telemetry capture when editable cursor mode is selected;
- starts the ScreenCaptureKit helper process;
- sends pause/resume/stop/cancel commands;
- writes `RecordingSession` manifests;
- reports explicit errors when a macOS-native capability is unavailable.
The helper owns macOS media capture:
- ScreenCaptureKit display/window frames;
- ScreenCaptureKit system audio where supported;
- microphone capture or helper-owned companion audio capture;
- webcam capture and initial picture-in-picture composition;
- AVFoundation/VideoToolbox encoding and muxing;
- stream timestamp normalization.
## Helper Contract V1
The helper receives a single JSON argument:
```json
{
"schemaVersion": 1,
"recordingId": 1234567890,
"source": {
"type": "display",
"sourceId": "screen:0:0",
"displayId": 1,
"windowId": null,
"bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
},
"video": {
"fps": 60,
"width": 1920,
"height": 1080,
"bitrate": 18000000,
"hideSystemCursor": true
},
"audio": {
"system": { "enabled": true },
"microphone": {
"enabled": true,
"deviceId": "default",
"deviceName": "MacBook Pro Microphone",
"gain": 1.4
}
},
"webcam": {
"enabled": true,
"deviceId": "default",
"deviceName": "FaceTime HD Camera",
"width": 1280,
"height": 720,
"fps": 30
},
"cursor": {
"mode": "editable-overlay"
},
"outputs": {
"screenPath": "/Users/me/Library/Application Support/openscreen/recordings/recording-123.mp4",
"manifestPath": "/Users/me/Library/Application Support/openscreen/recordings/recording-123.session.json"
}
}
```
The helper emits newline-delimited JSON events to stdout:
```json
{ "event": "ready", "schemaVersion": 1 }
{ "event": "recording-started", "timestampMs": 1234567890 }
{ "event": "warning", "code": "microphone-unavailable", "message": "..." }
{ "event": "recording-stopped", "screenPath": "..." }
{ "event": "error", "code": "screen-permission-denied", "message": "..." }
```
## Implementation Phases
Current PR status: macOS screen/window capture routes through the ScreenCaptureKit helper when it is available so editable-cursor recordings can hide the system cursor. The helper now writes ScreenCaptureKit system audio into the primary MP4 and attempts runtime-gated native microphone capture on macOS versions that expose ScreenCaptureKit microphone output. Webcam capture is currently an Electron-recorded sidecar attached to the same recording session; native AVFoundation webcam composition remains the target end state.
### 1. Native Session Boundary
- Add a structured macOS native recording request type.
- Add a macOS helper resolver and build script placeholders.
- Keep the helper contract process-based, matching the Windows helper boundary.
- Do not route production macOS recording through this helper until the helper is available and validated.
Acceptance:
- TypeScript build passes.
- The macOS helper path and request contract are documented and testable without affecting Windows/Linux behavior.
### 2. ScreenCaptureKit Display Capture
- Implement a Swift helper using ScreenCaptureKit.
- Select display captures by `displayId`.
- Encode H.264 MP4 through AVFoundation/VideoToolbox.
- Set `showsCursor = false` when editable cursor overlay mode is selected.
Acceptance:
- Display-only recording produces a valid MP4.
- The real cursor is not baked into editable-cursor recordings.
### 3. ScreenCaptureKit Window Capture
- Resolve Electron `window:*` selections to ScreenCaptureKit window ids.
- Capture `SCContentFilter(desktopIndependentWindow:)`.
- Handle closed/minimized/protected windows with explicit errors.
- Keep window selection and capture source resolution in Electron/main, not the renderer.
Acceptance:
- Capturing a normal app window works with cursor/audio/webcam disabled.
- Unsupported windows return clear native errors.
### 4. System Audio
- Enable ScreenCaptureKit system audio on supported macOS versions.
- Keep audio format and timing owned by the helper.
- Encode or mux AAC audio into the primary MP4.
Acceptance:
- System-audio-only recordings produce a valid AAC track.
- Unsupported macOS versions return an explicit capability error.
### 5. Microphone
- Resolve the selected microphone device from the renderer-provided browser `deviceId` and user-visible label.
- Capture microphone audio in the helper timing domain.
- Apply OpenScreen microphone gain policy.
- Mix system and microphone audio before final AAC output.
Acceptance:
- Mic-only and mic-plus-system recordings produce a valid, balanced AAC track.
- Device selection honors the selected microphone, not only the default device.
### 6. Webcam Composition
- Capture the selected camera natively through AVFoundation.
- Match browser device id first where possible, then user-visible label.
- Compose an initial picture-in-picture overlay into the primary MP4.
- Hide webcam output until the first usable frame to avoid black startup flashes.
Acceptance:
- Native display/window recordings can include webcam without returning to Electron capture.
- Selected camera is honored.
### 7. Runtime Controls
- Add pause/resume commands to the helper.
- Add cancel command that removes partial outputs.
- Keep restart as stop-discard-start until the helper exposes a native restart operation.
Acceptance:
- Pause/resume keeps output duration coherent.
- Cancel leaves no stale media/session files.
### 8. Test Pipeline
- `npm run build:native:mac`: builds Swift helper binaries on macOS.
- `npm run test:sck-helper:mac`: display-only helper smoke test.
- `npm run test:sck-window:mac`: window capture smoke test.
- `npm run test:sck-audio:mac`: system audio smoke test when supported.
- `npm run test:sck-mic:mac`: microphone smoke test.
- `npm run test:sck-webcam:mac`: webcam smoke test when a webcam is available.
- Packaging check: confirms helpers are available under `electron/native/bin/darwin-${arch}` in packaged builds.
## SSOT Rules
- `src/lib/nativeMacRecording.ts` is the renderer/main TypeScript request contract.
- This document is the feature-level contract and phase checklist.
- The Swift helper owns ScreenCaptureKit/AVFoundation media timing.
- Electron owns output paths, session manifests, and selected source/device resolution.
- Renderer code must use existing hooks/client APIs and should not bind directly to helper process details.
@@ -0,0 +1,77 @@
# PaddleOCR Local Service
OpenScreen calls OCR through a local HTTP service. The default endpoint is:
```text
http://127.0.0.1:8866/ocr
```
The app sends either `imageBase64` or `path`, plus optional `language` and `profile`, and expects OCR blocks:
```json
{
"blocks": [
{
"text": "Settings",
"confidence": 0.97,
"box": { "x": 120, "y": 80, "width": 90, "height": 24 }
}
]
}
```
## Install
Use a separate virtual environment because PaddleOCR and PaddlePaddle are large dependencies.
```powershell
python -m venv .venv-ocr
.\.venv-ocr\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r tools\ocr\requirements.txt
```
If `paddle` is still missing after installing `paddleocr`, install the CPU PaddlePaddle wheel that matches your Python and OS from the official PaddlePaddle install guide.
## Run
```powershell
.\.venv-ocr\Scripts\Activate.ps1
$env:PADDLEOCR_DEVICE="cpu"
$env:OPENSCREEN_OCR_PROFILE="vietnamese"
npm run ocr:paddle
```
Keep this terminal open while using the Guide OCR step in OpenScreen.
## Verify
```powershell
Invoke-WebRequest http://127.0.0.1:8866/health -UseBasicParsing
```
Expected healthy environment:
```json
{
"ok": true,
"paddleocrInstalled": true,
"paddleInstalled": true,
"engineReady": false,
"defaultLanguage": "vi,en",
"defaultProfile": "vietnamese"
}
```
`engineReady` becomes `true` after the first OCR request. The first request can be slow because PaddleOCR downloads and loads models.
## Configuration
- `PADDLEOCR_DEVICE`: `cpu`, `gpu:0`, or another PaddleOCR device string.
- `OPENSCREEN_OCR_PROFILE`: `fast`, `vietnamese`, or `hybrid`. The default `vietnamese` profile upscales and sharpens focused UI screenshots before OCR.
- `OPENSCREEN_GUIDE_OCR_LANGUAGE`: defaults to `vi,en`.
- `PADDLEOCR_LANG`: optional hard override. Leave unset for the app profile/language settings to work.
- `PADDLEOCR_VERSION`: defaults to `PP-OCRv5`.
- `PADDLEOCR_USE_MOBILE`: defaults to `1`; set to `0` to use the default/server models.
- `PADDLEOCR_REC_MODEL`: optional recognizer model override. The bundled profile uses `latin_PP-OCRv5_mobile_rec`, which supports Vietnamese Latin-script text.
- `OPENSCREEN_GUIDE_OCR_URL`: OpenScreen OCR endpoint override; defaults to `http://127.0.0.1:8866`.
@@ -0,0 +1,248 @@
# Windows Native Recorder Roadmap
OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to `getDisplayMedia` / `MediaRecorder`.
## Goals
- Capture displays and windows through Windows Graphics Capture (WGC).
- Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay.
- Capture system audio through WASAPI loopback.
- Capture microphone audio through WASAPI.
- Mix system audio and microphone audio into the primary screen recording.
- Capture webcam video natively and compose it into the Windows helper MP4 during the native-recording migration.
- Keep preview/export aligned because screen video, audio, webcam, and cursor share one native timing origin.
- Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target.
- Package the native helper with the Windows app.
## Non-Goals
- Replacing the editor/export pipeline.
- Replacing the editor/export pipeline. A later pass can reintroduce a separate editable native `webcamVideoPath`; the current Windows-native milestone prioritizes a helper-owned multi-flux MP4 with deterministic screen/audio/mic/webcam sync.
- Adding a native fallback for macOS or Linux in this branch.
## Target Architecture
The renderer keeps the existing recording controls. On Windows, `useScreenRecorder` sends a complete recording request to Electron and does not assemble Windows `MediaStream` tracks with `MediaRecorder`.
Electron owns the native recording session:
- resolves the selected source;
- resolves output paths;
- starts cursor sampling;
- starts the helper process;
- sends pause/resume/stop/cancel commands;
- writes `RecordingSession` manifests;
- reports explicit errors when a Windows-native capability is unavailable.
The helper owns Windows media capture:
- WGC screen/window frames;
- WASAPI system loopback;
- WASAPI microphone input;
- Media Foundation webcam capture;
- DirectShow webcam fallback for virtual cameras not visible to Media Foundation;
- Media Foundation encoding/muxing;
- stream timestamp normalization.
## Helper Contract V2
The helper receives a single JSON argument:
```json
{
"schemaVersion": 2,
"recordingId": 1234567890,
"source": {
"type": "display",
"sourceId": "screen:0:0",
"displayId": 123,
"windowHandle": null,
"bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
},
"video": {
"fps": 60,
"width": 1920,
"height": 1080,
"bitrate": 18000000
},
"audio": {
"system": { "enabled": true },
"microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 }
},
"webcam": {
"enabled": true,
"deviceId": "default",
"deviceName": "Camera (NVIDIA Broadcast)",
"width": 1280,
"height": 720,
"fps": 30,
"bitrate": 18000000
},
"outputs": {
"screenPath": "C:\\Users\\me\\recording-123.mp4",
"manifestPath": "C:\\Users\\me\\recording-123.session.json"
}
}
```
The helper emits newline-delimited JSON events to stdout:
```json
{ "event": "ready", "schemaVersion": 2 }
{ "event": "recording-started", "timestampMs": 1234567890 }
{ "event": "warning", "code": "audio-device-unavailable", "message": "..." }
{ "event": "recording-stopped", "screenPath": "..." }
{ "event": "error", "code": "unsupported-window-source", "message": "..." }
```
During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working.
## Implementation Phases
### 1. Native Session Boundary
- Add a structured Windows native recording request type.
- Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper.
- On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error.
- Keep Electron fallback only for non-Windows and optional developer diagnostics.
Acceptance:
- Display-only recording still works.
- Enabling an unsupported native feature returns an explicit native error instead of recording through Electron.
### 2. WASAPI System Audio
Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into `MFEncoder`, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work.
- Add `WasapiLoopbackCapture`.
- Capture the default render endpoint in shared loopback mode.
- Keep `WasapiLoopbackCapture` responsible only for device activation, packet capture, and packet timestamps.
- Keep `MFEncoder` responsible for all Media Foundation stream definitions and muxing.
- Feed the endpoint mix format into `MFEncoder` as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (`PCM` or `Float`).
- Encode the primary screen MP4 with H.264 video and AAC audio through one `IMFSinkWriter`.
- Timestamp audio from the captured frame count in 100ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew.
- Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code.
Acceptance:
- Screen MP4 has an AAC audio track when system audio is enabled.
- A 5-minute recording has audio/video duration drift below one frame.
SSOT rules for this phase:
- `src/lib/nativeWindowsRecording.ts` is the renderer/main TypeScript request contract.
- `docs/engineering/windows-native-recorder-roadmap.md` is the feature-level contract and phase checklist.
- `WgcSession::captureWidth()/captureHeight()` is the encoded screen frame size until a dedicated native scaling stage exists.
- `WasapiLoopbackCapture::inputFormat()` is the runtime audio format source used by `MFEncoder`.
- The renderer passes both the browser webcam `deviceId` and selected display label as `deviceName`; `electron/native/wgc-capture/src/webcam_capture.*` is the only place that maps those values to Media Foundation devices.
- Electron resolves the selected label to a DirectShow filter CLSID once and passes it as `webcamDirectShowClsid`; the helper must not independently guess among DirectShow filters.
- No duplicated hard-coded audio format assumptions in `main.cpp`.
### 3. WASAPI Microphone
Status: initial implementation in progress. The helper can open the default WASAPI capture endpoint, apply the OpenScreen microphone gain, encode mic-only audio, and mix system loopback plus microphone through a single queued `AudioMixer` timeline when both endpoints expose the same runtime format. Audio endpoints are warmed before WGC starts, the mixer drops pre-roll and begins its paced timeline on the first encoded video frame, then cuts queued tail audio on stop so the MP4 does not drift past the video. Browser `deviceId` to MMDevice id mapping, resampling between mismatched endpoint formats, and drift correction remain follow-up hardening work.
- Add microphone device enumeration and stable device-id mapping.
- Capture selected/default microphone through WASAPI.
- Apply OpenScreen's current mic gain policy.
- Mix microphone and system audio before AAC encoding.
Acceptance:
- Mic-only, system-only, and mixed audio recordings produce a valid AAC track.
- Device unplug/permission failure produces an explicit error or warning.
### 4. Webcam Capture
- Add Media Foundation webcam source reader.
- Select requested dimensions/fps or the nearest format accepted by Media Foundation.
- Convert webcam samples to BGRA and compose them into the primary helper MP4 as an initial bottom-right picture-in-picture overlay.
- Ignore black webcam warmup frames and keep the overlay hidden until the first visible frame is available, so virtual cameras do not flash a black picture-in-picture rectangle at recording start.
- Keep the helper process as the SSOT for screen/window, WASAPI system audio, microphone, webcam, and mux timing.
- Match the requested webcam through Media Foundation friendly names first, then browser device ids/symbolic links, so UI selection remains stable across Chromium and Windows native device namespaces.
- Use the Electron-resolved DirectShow CLSID when the selected virtual camera, for example NVIDIA Broadcast, is registered for DirectShow but absent from Media Foundation enumeration.
- Later: promote the same webcam capture source to a separate editable native `webcamVideoPath` if product requirements need post-recording layout edits.
Acceptance:
- Native display/window recordings can include webcam without returning to Electron capture.
- `npm run test:wgc-webcam:win` validates the helper path when a webcam is available and skips explicitly when no webcam device exists.
- Combined webcam + system audio + microphone produces one MP4 with H.264 video and AAC audio.
### 5. Native Window Capture
Status: initial implementation in progress. Electron parses the `window:<HWND>:...` desktop source id through the shared native Windows recording contract and passes `windowHandle` to the helper. The helper resolves the `HWND`, validates it with `IsWindow`, and creates the WGC item with `CreateForWindow(HWND)`. Resize/minimize/move hardening and protected-window diagnostics remain follow-up work.
- Resolve Electron `window:*` selections to an `HWND`.
- Use WGC `CreateForWindow(HWND)`.
- Handle window close, minimize, resize, DPI scaling, and monitor moves.
- Return clear errors for unsupported protected windows.
Acceptance:
- Capturing a normal app window works with cursor/audio/mic/webcam.
- Window resize and movement do not corrupt the recording.
### 6. Runtime Controls
- Add pause/resume commands to the helper.
- Add cancel command that removes partial screen/webcam outputs.
- Keep restart as stop-discard-start from Electron until the helper supports a native restart event.
Acceptance:
- Pause/resume keeps preview duration coherent.
- Cancel leaves no stale media/session/cursor files.
### 7. Test Pipeline
- `npm run test:wgc-helper:win`: display-only helper smoke test.
- `npm run test:wgc-audio:win`: validates AAC track presence and duration.
- `npm run test:wgc-window:win`: captures a fixture window by HWND.
- `npm run test:wgc-webcam:win`: validates webcam output when a webcam is available, otherwise skips explicitly.
- Packaging check: confirms the helper is in `app.asar.unpacked`.
- Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio.
- `npm run test:wgc-mic:win`: validates default-microphone capture writes an AAC track when an input endpoint is available.
- `npm run test:wgc-mixed-audio:win`: validates system loopback plus microphone writes one mixed AAC track when endpoint formats are compatible.
## Backlog
### Native Cursor Click Bounce Is Not Visibly Applied
Status: open. Do not treat Windows native cursor `Click Bounce` as shipped.
Problem:
- The cursor settings UI exposes `Size`, `Smoothing`, `Motion Blur`, and `Click Bounce`.
- On Windows native cursor recordings, `Size`, `Smoothing`, and `Motion Blur` are visibly applied in preview/export.
- `Click Bounce` still has no visible effect in manual packaged-app testing, even after adding click-related sample metadata.
What has already been tried:
- Added `interactionType: "click" | "mouseup" | "move"` to native cursor samples.
- Added polling-based left-button state through `GetAsyncKeyState`.
- Added the `GetAsyncKeyState` low-bit path to catch quick clicks between samples.
- Added a PowerShell/C# `WH_MOUSE_LL` mouse hook experiment and launched the sampler through a temporary `.ps1` file to avoid Windows command-line length limits.
- Updated `npm run test:cursor-native:win` so the diagnostic can observe a synthetic short click and emit `clickSampleCount`.
Current diagnosis:
- The diagnostic can observe synthetic click events, but this has not translated into a visible `Click Bounce` effect in the real packaged app.
- The test currently proves that some click metadata can be recorded, not that the full OpenScreen record -> preview -> export path displays a bounce at the expected time.
- The current native implementation may be animating from metadata that is not present in the real recording session, may be using the wrong timestamp origin, or may be applying a scale change too subtle to notice on the DOM/native cursor path.
Next investigation when resumed:
- Inspect the actual `.cursor.json`/session sidecar generated by a packaged-app manual recording and confirm whether real clicks produce `interactionType: "click"` at the right `timeMs`.
- Add a targeted end-to-end fixture that records a known click, loads the generated project, and asserts the preview/export cursor scale changes across adjacent frames.
- Compare the native DOM cursor path against the older `PixiCursorOverlay` click visual state and decide whether native cursor bounce should be a scale-only animation, an additional click ring, or a short explicit keyframe animation independent of sample cadence.
- If event capture remains unreliable in the PowerShell sampler, move click events into a small native cursor helper instead of PowerShell/C# script injection.
## Ship Criteria
- Windows display capture works with cursor, system audio, microphone, and webcam.
- Windows window capture works with cursor, system audio, microphone, and webcam.
- Preview and export show no cursor position drift.
- Preview and export show no measurable audio/video/webcam drift.
- Windows production builds do not depend on Electron capture fallback.
@@ -0,0 +1,84 @@
# Windows Private Trust Signing
OpenScreen supports Microsoft Trusted Signing private trust profiles for Windows
builds. Secrets and signing resource names are read from environment variables;
no certificate, client secret, or API key should be committed.
For a local signing machine, copy `.env.signing.example` to
`.env.signing.local` and fill in values there. `.env.signing.local` is ignored
by Git. Explicit shell environment variables override values in that local file.
## Required Azure Resource Variables
Set these values for the Trusted Signing account and certificate profile:
```powershell
$env:AZURE_TRUSTED_SIGNING_ENDPOINT = "https://<region>.codesigning.azure.net/"
$env:AZURE_TRUSTED_SIGNING_ACCOUNT_NAME = "<trusted-signing-account-name>"
$env:AZURE_TRUSTED_SIGNING_CERTIFICATE_PROFILE_NAME = "<private-trust-profile-name>"
$env:AZURE_TRUSTED_SIGNING_PUBLISHER_NAME = "<certificate-common-name>"
```
`AZURE_TRUSTED_SIGNING_CERTIFICATE_PROFILE_NAME` must point to a certificate
profile created with the `PrivateTrust` profile type.
## Required Azure Auth Variables
Electron Builder uses Azure environment credentials. Set the tenant and client:
```powershell
$env:AZURE_TENANT_ID = "<tenant-id>"
$env:AZURE_CLIENT_ID = "<app-registration-client-id>"
```
Then set one authentication mode. Service principal secret is the simplest for
local signing:
```powershell
$env:AZURE_CLIENT_SECRET = "<client-secret>"
```
Certificate auth is also supported:
```powershell
$env:AZURE_CLIENT_CERTIFICATE_PATH = "C:\secure\signing-auth.pfx"
$env:AZURE_CLIENT_CERTIFICATE_PASSWORD = "<pfx-password>"
```
## Sign Existing Installer
This signs the installer already built at
`release/<version>/Openscreen Setup <version>.exe`:
```powershell
npm run sign:win:private-trust
```
To sign a specific file:
```powershell
npm run sign:win:private-trust -- --file "D:\Code\OpenScreen\release\1.4.0\Openscreen Setup 1.4.0.exe"
```
## Build And Sign
This signs the packaged app executable, bundled OCR service executable, and NSIS
installer during the Windows build:
```powershell
npm run build:win:private-trust
```
The regular `npm run build:win` remains unsigned for local development builds.
## Verification
After signing:
```powershell
Get-AuthenticodeSignature "release\1.4.0\Openscreen Setup 1.4.0.exe" | Format-List
```
Private trust signatures are valid only on machines that trust the private trust
certificate chain/publisher. For public downloads that must be trusted on any
Windows machine, use a public trust certificate profile instead.