feat: add native Windows recorder helper

EtienneLescot
2026-05-05 16:07:07 +02:00
parent d21e5eb34c
commit 062cf2a87c
27 changed files with 2873 additions and 139 deletions
@@ -0,0 +1,202 @@
# Windows Native Recorder Roadmap
OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to `getDisplayMedia` / `MediaRecorder`.
## Goals
- Capture displays and windows through Windows Graphics Capture (WGC).
- Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay.
- Capture system audio through WASAPI loopback.
- Capture microphone audio through WASAPI.
- Mix system audio and microphone audio into the primary screen recording.
- Capture webcam video natively and keep it as a separate editable OpenScreen media stream.
- Keep preview and export aligned by having screen video, audio, webcam, and cursor share one native timing origin.
- Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target.
- Package the native helper with the Windows app.
## Non-Goals
- Replacing the editor/export pipeline.
- Flattening webcam into the screen recording. The editor currently treats webcam as editable picture-in-picture media, so the native recorder should preserve a separate `webcamVideoPath`.
- Adding a native fallback for macOS or Linux in this branch.
## Target Architecture
The renderer keeps the existing recording controls. On Windows, `useScreenRecorder` sends a complete recording request to Electron and does not assemble Windows `MediaStream` tracks with `MediaRecorder`.
Electron owns the native recording session:
- resolves the selected source;
- resolves output paths;
- starts cursor sampling;
- starts the helper process;
- sends pause/resume/stop/cancel commands;
- writes `RecordingSession` manifests;
- reports explicit errors when a Windows-native capability is unavailable.
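As a minimal sketch of the "resolves output paths" step, the function below derives the three files for one recording. The file-name patterns follow the example paths used in this roadmap; `resolveNativeOutputPaths` and `NativeOutputPaths` are hypothetical names, not existing OpenScreen APIs.

```typescript
// Hypothetical helper: derives the output paths for one recording id.
interface NativeOutputPaths {
  screenPath: string;
  webcamPath: string;
  manifestPath: string;
}

function resolveNativeOutputPaths(outputDir: string, recordingId: number): NativeOutputPaths {
  // Drop trailing separators so the join below never doubles them.
  const dir = outputDir.replace(/[\\\/]+$/, "");
  const base = `${dir}\\recording-${recordingId}`;
  return {
    screenPath: `${base}.mp4`,
    webcamPath: `${base}-webcam.mp4`,
    manifestPath: `${base}.session.json`,
  };
}
```

Keeping all three names derived from one base makes it easier for cancel to find every partial file that belongs to a session.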
The helper owns Windows media capture:
- WGC screen/window frames;
- WASAPI system loopback;
- WASAPI microphone input;
- Media Foundation webcam capture;
- Media Foundation encoding/muxing;
- stream timestamp normalization.
## Helper Contract V2
The helper receives a single JSON argument:
```json
{
  "schemaVersion": 2,
  "recordingId": 1234567890,
  "source": {
    "type": "display",
    "sourceId": "screen:0:0",
    "displayId": 123,
    "windowHandle": null,
    "bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
  },
  "video": {
    "fps": 60,
    "width": 1920,
    "height": 1080,
    "bitrate": 18000000
  },
  "audio": {
    "system": { "enabled": true },
    "microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 }
  },
  "webcam": {
    "enabled": true,
    "deviceId": "default",
    "width": 1280,
    "height": 720,
    "fps": 30,
    "bitrate": 18000000
  },
  "outputs": {
    "screenPath": "C:\\Users\\me\\recording-123.mp4",
    "webcamPath": "C:\\Users\\me\\recording-123-webcam.mp4",
    "manifestPath": "C:\\Users\\me\\recording-123.session.json"
  }
}
```
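On the TypeScript side, the request above could be typed roughly as follows. This is a sketch inferred from the JSON example only: the interface name, the `"window"` source type, and the nullability of `displayId` are assumptions, and the actual contract in `src/lib/nativeWindowsRecording.ts` may differ.

```typescript
// Hypothetical shape mirroring the JSON example; the real contract may differ.
interface NativeRecordingRequest {
  schemaVersion: 2;
  recordingId: number;
  source: {
    type: "display" | "window"; // "window" is assumed from the roadmap phases
    sourceId: string;
    displayId: number | null;
    windowHandle: number | null;
    bounds: { x: number; y: number; width: number; height: number };
  };
  video: { fps: number; width: number; height: number; bitrate: number };
  audio: {
    system: { enabled: boolean };
    microphone: { enabled: boolean; deviceId: string; gain: number };
  };
  webcam: {
    enabled: boolean;
    deviceId: string;
    width: number;
    height: number;
    fps: number;
    bitrate: number;
  };
  outputs: { screenPath: string; webcamPath: string; manifestPath: string };
}

// The helper takes the whole request as one JSON string argument.
function buildHelperArgs(request: NativeRecordingRequest): string[] {
  return [JSON.stringify(request)];
}
```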
The helper emits newline-delimited JSON events to stdout:
```json
{ "event": "ready", "schemaVersion": 2 }
{ "event": "recording-started", "timestampMs": 1234567890 }
{ "event": "warning", "code": "audio-device-unavailable", "message": "..." }
{ "event": "recording-stopped", "screenPath": "...", "webcamPath": "..." }
{ "event": "error", "code": "unsupported-window-source", "message": "..." }
```
During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working.
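One way Electron might consume this stream is a small line buffer that parses each complete stdout line as JSON and keeps anything else as a legacy textual message. The sketch below is illustrative; `HelperEventParser` and the `HelperEvent` shape are hypothetical names.

```typescript
type HelperEvent =
  | { kind: "json"; event: string; payload: Record<string, unknown> }
  | { kind: "legacy"; text: string };

// Classifies one complete line as a JSON event or a legacy textual message.
function parseLine(text: string): HelperEvent {
  try {
    const parsed: unknown = JSON.parse(text);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { event?: unknown }).event === "string"
    ) {
      const payload = parsed as Record<string, unknown>;
      return { kind: "json", event: payload.event as string, payload };
    }
  } catch {
    // Not JSON: fall through and treat the line as a legacy message.
  }
  return { kind: "legacy", text };
}

// Buffers stdout chunks and returns one event per complete line.
class HelperEventParser {
  private buffer = "";

  push(chunk: string): HelperEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split(/\r?\n/);
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const events: HelperEvent[] = [];
    for (const line of lines) {
      const text = line.trim();
      if (text !== "") events.push(parseLine(text));
    }
    return events;
  }
}
```

Buffering matters because a stdout chunk can end in the middle of a line, so an event may arrive split across two reads.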
## Implementation Phases
### 1. Native Session Boundary
- Add a structured Windows native recording request type.
- Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper.
- On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error.
- Keep the Electron fallback only for non-Windows platforms and optional developer diagnostics.
Acceptance:
- Display-only recording still works.
- Enabling an unsupported native feature returns an explicit native error instead of recording through Electron.
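The no-silent-fallback rule can be sketched as a pure decision function. Everything named here is hypothetical, including the feature strings, the error codes, and the `diagnosticElectronCapture` flag; the point is only that a missing capability on Windows yields an error result rather than the Electron backend.

```typescript
type BackendDecision =
  | { ok: true; backend: "windows-native" | "electron" }
  | { ok: false; code: string; message: string };

interface BackendContext {
  platform: string; // e.g. process.platform
  helperAvailable: boolean;
  requestedFeatures: string[]; // hypothetical names such as "webcam"
  supportedFeatures: string[];
  diagnosticElectronCapture: boolean; // explicit developer opt-in only
}

function selectCaptureBackend(ctx: BackendContext): BackendDecision {
  if (ctx.platform !== "win32") return { ok: true, backend: "electron" };
  if (ctx.diagnosticElectronCapture) return { ok: true, backend: "electron" };
  if (!ctx.helperAvailable) {
    return { ok: false, code: "helper-unavailable", message: "Native helper executable not found." };
  }
  const missing = ctx.requestedFeatures.filter(
    (feature) => ctx.supportedFeatures.indexOf(feature) === -1,
  );
  if (missing.length > 0) {
    return {
      ok: false,
      code: "unsupported-native-feature",
      message: `Not supported by the native helper yet: ${missing.join(", ")}`,
    };
  }
  return { ok: true, backend: "windows-native" };
}
```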
### 2. WASAPI System Audio
Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into `MFEncoder`, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work.
- Add `WasapiLoopbackCapture`.
- Capture the default render endpoint in shared loopback mode.
- Keep `WasapiLoopbackCapture` responsible only for device activation, packet capture, and packet timestamps.
- Keep `MFEncoder` responsible for all Media Foundation stream definitions and muxing.
- Feed the endpoint mix format into `MFEncoder` as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (`PCM` or `Float`).
- Encode the primary screen MP4 with H.264 video and AAC audio through one `IMFSinkWriter`.
- Derive audio timestamps from the cumulative captured frame count, converted to 100 ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew.
- Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code.
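The frame-count timestamping can be illustrated with a little arithmetic. The helper itself is native code, so the TypeScript below is only a worked sketch of the conversion, with hypothetical names. Each timestamp comes from the running frame total rather than from summed packet durations, so per-packet rounding does not accumulate.

```typescript
const HNS_PER_SECOND = 10_000_000; // Media Foundation time base: 100 ns units

// Converts a cumulative frame count to 100 ns units. Whole seconds and the
// remainder are converted separately so the product stays small.
function framesToHns(totalFrames: number, sampleRate: number): number {
  if (!Number.isInteger(totalFrames) || totalFrames < 0) throw new RangeError("totalFrames");
  if (!Number.isInteger(sampleRate) || sampleRate <= 0) throw new RangeError("sampleRate");
  const wholeSeconds = Math.floor(totalFrames / sampleRate);
  const remainderFrames = totalFrames % sampleRate;
  return (
    wholeSeconds * HNS_PER_SECOND +
    Math.floor((remainderFrames * HNS_PER_SECOND) / sampleRate)
  );
}

// Assigns a timestamp and duration to each captured packet.
class AudioTimeline {
  private framesWritten = 0;
  private readonly sampleRate: number;

  constructor(sampleRate: number) {
    this.sampleRate = sampleRate;
  }

  nextPacket(frameCount: number): { timestampHns: number; durationHns: number } {
    const timestampHns = framesToHns(this.framesWritten, this.sampleRate);
    this.framesWritten += frameCount;
    const endHns = framesToHns(this.framesWritten, this.sampleRate);
    return { timestampHns, durationHns: endHns - timestampHns };
  }
}
```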
Acceptance:
- Screen MP4 has an AAC audio track when system audio is enabled.
- A 5-minute recording has audio/video duration drift below one video frame.
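A test could express the drift criterion as below, taking the frame to be one video frame at the configured `fps`. The function names are hypothetical and the durations are assumed to come from probing the finished MP4.

```typescript
// Duration of one video frame in milliseconds.
function frameDurationMs(fps: number): number {
  return 1000 / fps;
}

// True when audio and video track durations differ by less than one frame.
function driftWithinOneFrame(
  videoDurationMs: number,
  audioDurationMs: number,
  fps: number,
): boolean {
  return Math.abs(videoDurationMs - audioDurationMs) < frameDurationMs(fps);
}
```

At 60 fps the budget is about 16.7 ms, so a 5-minute recording passes with 10 ms of drift and fails with 20 ms.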
Single source of truth (SSOT) rules for this phase:
- `src/lib/nativeWindowsRecording.ts` is the renderer/main TypeScript request contract.
- `docs/engineering/windows-native-recorder-roadmap.md` is the feature-level contract and phase checklist.
- `WgcSession::captureWidth()/captureHeight()` is the encoded screen frame size until a dedicated native scaling stage exists.
- `WasapiLoopbackCapture::inputFormat()` is the runtime audio format source used by `MFEncoder`.
- No duplicated hard-coded audio format assumptions in `main.cpp`.
### 3. WASAPI Microphone
- Add microphone device enumeration and stable device-id mapping.
- Capture selected/default microphone through WASAPI.
- Apply OpenScreen's current mic gain policy.
- Mix microphone and system audio before AAC encoding.
Acceptance:
- Mic-only, system-only, and mixed audio recordings produce a valid AAC track.
- Device unplug/permission failure produces an explicit error or warning.
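The mixing step before AAC encoding might look like the sketch below. It assumes both inputs are already float samples in [-1, 1] with the same sample rate and channel layout, and it uses plain gain plus hard clipping as a placeholder, since OpenScreen's actual mic gain policy is not described here. `mixAudio` is a hypothetical name and the real mixer would live in the native helper.

```typescript
// Mixes system and microphone samples; the shorter input is padded with silence.
function mixAudio(
  system: Float32Array,
  microphone: Float32Array,
  micGain: number,
): Float32Array {
  const length = Math.max(system.length, microphone.length);
  const mixed = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const systemSample = system[i] ?? 0;
    const micSample = (microphone[i] ?? 0) * micGain;
    // Hard clip to the valid float range (placeholder for a real limiter).
    mixed[i] = Math.max(-1, Math.min(1, systemSample + micSample));
  }
  return mixed;
}
```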
### 4. Webcam Capture
- Add Media Foundation webcam source reader.
- Select 1280x720 at 30 fps, or the nearest supported format.
- Encode webcam to `recording-<id>-webcam.mp4`.
- Synchronize webcam timestamps to the native session clock.
- Store `webcamVideoPath` in the OpenScreen session manifest.
Acceptance:
- Editor loads the native screen recording and the native webcam recording.
- Webcam layout controls behave the same as today.
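"Nearest supported format" needs a distance measure, which this roadmap does not define. The sketch below uses one arbitrary choice, relative pixel-count difference plus relative frame-rate difference, to show the selection logic; the names and the metric are assumptions.

```typescript
interface WebcamFormat {
  width: number;
  height: number;
  fps: number;
}

const PREFERRED_WEBCAM_FORMAT: WebcamFormat = { width: 1280, height: 720, fps: 30 };

// Illustrative distance: relative pixel-count error plus relative fps error.
function formatDistance(candidate: WebcamFormat, target: WebcamFormat): number {
  const targetPixels = target.width * target.height;
  const pixelError = Math.abs(candidate.width * candidate.height - targetPixels) / targetPixels;
  const fpsError = Math.abs(candidate.fps - target.fps) / target.fps;
  return pixelError + fpsError;
}

// Returns the closest supported format, or null when the device reports none.
function selectWebcamFormat(
  supported: WebcamFormat[],
  target: WebcamFormat = PREFERRED_WEBCAM_FORMAT,
): WebcamFormat | null {
  let best: WebcamFormat | null = null;
  let bestDistance = Infinity;
  for (const candidate of supported) {
    const distance = formatDistance(candidate, target);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```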
### 5. Native Window Capture
- Resolve Electron `window:*` selections to an `HWND`.
- Use WGC `CreateForWindow(HWND)`.
- Handle window close, minimize, resize, DPI scaling, and monitor moves.
- Return clear errors for unsupported protected windows.
Acceptance:
- Capturing a normal app window works with cursor/audio/mic/webcam.
- Window resize and movement do not corrupt the recording.
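Resolving a `window:*` selection could start from the source id string. The sketch below assumes ids of the form `window:<id>:<n>` and that on Windows `<id>` is the decimal `HWND` value; that assumption should be verified against the Electron version in use before relying on it. `parseWindowHandle` is a hypothetical name.

```typescript
// Assumption: on Windows the middle segment of "window:<id>:<n>" is the HWND.
function parseWindowHandle(sourceId: string): number | null {
  const match = /^window:(\d+):\d+$/.exec(sourceId);
  if (match === null) return null;
  const handle = Number(match[1]);
  // Reject zero and values that cannot be represented exactly.
  return Number.isSafeInteger(handle) && handle > 0 ? handle : null;
}
```

Returning `null` for anything that does not parse lets the caller report an explicit error instead of passing a bad handle to `CreateForWindow`.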
### 6. Runtime Controls
- Add pause/resume commands to the helper.
- Add cancel command that removes partial screen/webcam outputs.
- Keep restart as stop-discard-start from Electron until the helper supports a native restart event.
Acceptance:
- Pause/resume keeps preview duration coherent.
- Cancel leaves no stale media/session/cursor files.
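Keeping the preview duration coherent across pause/resume comes down to counting only active time. A minimal sketch, with hypothetical names and caller-supplied timestamps in milliseconds:

```typescript
// Accumulates recorded time while excluding paused intervals.
class RecordingClock {
  private activeMs = 0;
  private resumedAtMs: number | null = null;

  start(nowMs: number): void {
    this.activeMs = 0;
    this.resumedAtMs = nowMs;
  }

  pause(nowMs: number): void {
    if (this.resumedAtMs === null) return; // already paused
    this.activeMs += nowMs - this.resumedAtMs;
    this.resumedAtMs = null;
  }

  resume(nowMs: number): void {
    if (this.resumedAtMs !== null) return; // already running
    this.resumedAtMs = nowMs;
  }

  elapsedMs(nowMs: number): number {
    const running = this.resumedAtMs === null ? 0 : nowMs - this.resumedAtMs;
    return this.activeMs + running;
  }
}
```

For example, starting at 0, pausing at 1000, resuming at 3000, and reading at 3500 gives 1500 ms of recorded time.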
### 7. Test Pipeline
- `npm run test:wgc-helper:win`: display-only helper smoke test.
- `npm run test:wgc-audio:win`: validates AAC track presence and duration.
- `npm run test:wgc-window:win`: captures a fixture window by HWND.
- `npm run test:wgc-webcam:win`: validates webcam output when a webcam is available, otherwise skips explicitly.
- Packaging check: confirms the helper is in `app.asar.unpacked`.
- Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio.
## Ship Criteria
- Windows display capture works with cursor, system audio, microphone, and webcam.
- Windows window capture works with cursor, system audio, microphone, and webcam.
- Preview and export show no cursor position drift.
- Preview and export show no measurable audio/video/webcam drift.
- Windows production builds do not depend on Electron capture fallback.
@@ -83,3 +83,37 @@ Together, the scripts make it quick to inspect:
- whether the real OpenScreen preview renders the same cursor behavior as the diagnostic pipeline
They are not a full substitute for an end-to-end manual recording pass. Before shipping cursor changes, also test a real capture session and export from the packaged app.
## Native Windows capture backend
The app now routes Windows recordings through an external WGC helper instead of Electron `getDisplayMedia`. This is meant to remove the coordinate and clock split that made the reconstructed cursor drift in the preview/export path.
Current native availability rules:
- Windows 10 build 19041 or newer
- a helper executable is available
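These two rules can be checked with a pure function. The sketch assumes the OS version arrives as a `major.minor.build` string such as `10.0.19045`, which is what Node's `os.release()` typically reports on Windows; the function name and parameters are hypothetical.

```typescript
const MIN_WINDOWS_BUILD = 19041;

// True when the platform, OS build, and helper presence all allow native capture.
function isNativeCaptureAvailable(
  platform: string,
  osRelease: string,
  helperExists: boolean,
): boolean {
  if (platform !== "win32" || !helperExists) return false;
  const parts = osRelease.split(".").map((part) => Number(part));
  const major = parts[0] ?? NaN;
  const build = parts[2] ?? NaN;
  if (Number.isNaN(major) || Number.isNaN(build)) return false;
  if (major !== 10) return major > 10;
  return build >= MIN_WINDOWS_BUILD;
}
```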
The helper currently implements display video capture and system audio loopback. Window capture, microphone audio, and webcam capture are part of the native recorder roadmap and fail explicitly instead of silently falling back to Electron capture on Windows.
Build OpenScreen's helper locally:
```powershell
npm run build:native:win
```
Smoke-test the helper directly:
```powershell
npm run test:wgc-helper:win
npm run test:wgc-audio:win
```
For local diagnostics with another compatible helper, point OpenScreen at that executable:
```powershell
$env:OPENSCREEN_WGC_CAPTURE_EXE = "C:\path\to\wgc-capture.exe"
npm run build-vite
npm run dev
```
The helper receives one JSON config argument, emits JSON lifecycle events, prints the legacy `Recording started` marker, accepts `stop` on stdin, and prints `Recording stopped. Output path: <path>`. See `electron/native/README.md` for the exact contract and build output paths.
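For the legacy markers, a line classifier along these lines is one possibility. It assumes each marker occupies a whole stdout line exactly as quoted above; `parseLegacyLine` and the `LegacyMessage` type are hypothetical.

```typescript
type LegacyMessage =
  | { kind: "started" }
  | { kind: "stopped"; outputPath: string }
  | { kind: "other"; text: string };

const STOPPED_PREFIX = "Recording stopped. Output path: ";

// Classifies one stdout line against the legacy textual markers.
function parseLegacyLine(line: string): LegacyMessage {
  const text = line.trim();
  if (text === "Recording started") return { kind: "started" };
  if (text.startsWith(STOPPED_PREFIX)) {
    return { kind: "stopped", outputPath: text.slice(STOPPED_PREFIX.length).trim() };
  }
  return { kind: "other", text };
}
```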