feat: add native Windows recorder helper
This commit is contained in:
@@ -0,0 +1,202 @@
|
||||
# Windows Native Recorder Roadmap
|
||||
|
||||
OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to `getDisplayMedia` / `MediaRecorder`.
|
||||
|
||||
## Goals
|
||||
|
||||
- Capture displays and windows through Windows Graphics Capture (WGC).
|
||||
- Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay.
|
||||
- Capture system audio through WASAPI loopback.
|
||||
- Capture microphone audio through WASAPI.
|
||||
- Mix system audio and microphone audio into the primary screen recording.
|
||||
- Capture webcam video natively and keep it as a separate editable OpenScreen media stream.
|
||||
- Keep preview/export aligned because screen video, audio, webcam, and cursor share one native timing origin.
|
||||
- Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target.
|
||||
- Package the native helper with the Windows app.
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- Replacing the editor/export pipeline.
|
||||
- Flattening webcam into the screen recording. The editor currently treats webcam as editable picture-in-picture media, so the native recorder should preserve a separate `webcamVideoPath`.
|
||||
- Adding a native fallback for macOS or Linux in this branch.
|
||||
|
||||
## Target Architecture
|
||||
|
||||
The renderer keeps the existing recording controls. On Windows, `useScreenRecorder` sends a complete recording request to Electron and does not assemble Windows `MediaStream` tracks with `MediaRecorder`.
|
||||
|
||||
Electron owns the native recording session:
|
||||
|
||||
- resolves the selected source;
|
||||
- resolves output paths;
|
||||
- starts cursor sampling;
|
||||
- starts the helper process;
|
||||
- sends pause/resume/stop/cancel commands;
|
||||
- writes `RecordingSession` manifests;
|
||||
- reports explicit errors when a Windows-native capability is unavailable.
|
||||
|
||||
The helper owns Windows media capture:
|
||||
|
||||
- WGC screen/window frames;
|
||||
- WASAPI system loopback;
|
||||
- WASAPI microphone input;
|
||||
- Media Foundation webcam capture;
|
||||
- Media Foundation encoding/muxing;
|
||||
- stream timestamp normalization.
|
||||
|
||||
## Helper Contract V2
|
||||
|
||||
The helper receives a single JSON argument:
|
||||
|
||||
```json
|
||||
{
|
||||
"schemaVersion": 2,
|
||||
"recordingId": 1234567890,
|
||||
"source": {
|
||||
"type": "display",
|
||||
"sourceId": "screen:0:0",
|
||||
"displayId": 123,
|
||||
"windowHandle": null,
|
||||
"bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
|
||||
},
|
||||
"video": {
|
||||
"fps": 60,
|
||||
"width": 1920,
|
||||
"height": 1080,
|
||||
"bitrate": 18000000
|
||||
},
|
||||
"audio": {
|
||||
"system": { "enabled": true },
|
||||
"microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 }
|
||||
},
|
||||
"webcam": {
|
||||
"enabled": true,
|
||||
"deviceId": "default",
|
||||
"width": 1280,
|
||||
"height": 720,
|
||||
"fps": 30,
|
||||
"bitrate": 18000000
|
||||
},
|
||||
"outputs": {
|
||||
"screenPath": "C:\\Users\\me\\recording-123.mp4",
|
||||
"webcamPath": "C:\\Users\\me\\recording-123-webcam.mp4",
|
||||
"manifestPath": "C:\\Users\\me\\recording-123.session.json"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The helper emits newline-delimited JSON events to stdout:
|
||||
|
||||
```json
|
||||
{ "event": "ready", "schemaVersion": 2 }
|
||||
{ "event": "recording-started", "timestampMs": 1234567890 }
|
||||
{ "event": "warning", "code": "audio-device-unavailable", "message": "..." }
|
||||
{ "event": "recording-stopped", "screenPath": "...", "webcamPath": "..." }
|
||||
{ "event": "error", "code": "unsupported-window-source", "message": "..." }
|
||||
```
|
||||
|
||||
During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working.
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### 1. Native Session Boundary
|
||||
|
||||
- Add a structured Windows native recording request type.
|
||||
- Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper.
|
||||
- On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error.
|
||||
- Keep Electron fallback only for non-Windows and optional developer diagnostics.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Display-only recording still works.
|
||||
- Enabling an unsupported native feature returns an explicit native error instead of recording through Electron.
|
||||
|
||||
### 2. WASAPI System Audio
|
||||
|
||||
Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into `MFEncoder`, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work.
|
||||
|
||||
- Add `WasapiLoopbackCapture`.
|
||||
- Capture the default render endpoint in shared loopback mode.
|
||||
- Keep `WasapiLoopbackCapture` responsible only for device activation, packet capture, and packet timestamps.
|
||||
- Keep `MFEncoder` responsible for all Media Foundation stream definitions and muxing.
|
||||
- Feed the endpoint mix format into `MFEncoder` as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (`PCM` or `Float`).
|
||||
- Encode the primary screen MP4 with H.264 video and AAC audio through one `IMFSinkWriter`.
|
||||
- Timestamp audio from the captured frame count in 100ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew.
|
||||
- Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Screen MP4 has an AAC audio track when system audio is enabled.
|
||||
- A 5-minute recording has audio/video duration drift below one frame.
|
||||
|
||||
SSOT rules for this phase:
|
||||
|
||||
- `src/lib/nativeWindowsRecording.ts` is the renderer/main TypeScript request contract.
|
||||
- `docs/engineering/windows-native-recorder-roadmap.md` is the feature-level contract and phase checklist.
|
||||
- `WgcSession::captureWidth()/captureHeight()` is the encoded screen frame size until a dedicated native scaling stage exists.
|
||||
- `WasapiLoopbackCapture::inputFormat()` is the runtime audio format source used by `MFEncoder`.
|
||||
- No duplicated hard-coded audio format assumptions in `main.cpp`.
|
||||
|
||||
### 3. WASAPI Microphone
|
||||
|
||||
- Add microphone device enumeration and stable device-id mapping.
|
||||
- Capture selected/default microphone through WASAPI.
|
||||
- Apply OpenScreen's current mic gain policy.
|
||||
- Mix microphone and system audio before AAC encoding.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Mic-only, system-only, and mixed audio recordings produce a valid AAC track.
|
||||
- Device unplug/permission failure produces an explicit error or warning.
|
||||
|
||||
### 4. Webcam Capture
|
||||
|
||||
- Add Media Foundation webcam source reader.
|
||||
- Select 1280x720/30fps or nearest supported format.
|
||||
- Encode webcam to `recording-<id>-webcam.mp4`.
|
||||
- Synchronize webcam timestamps to the native session clock.
|
||||
- Store `webcamVideoPath` in the OpenScreen session manifest.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Editor loads the native screen recording and the native webcam recording.
|
||||
- Webcam layout controls behave the same as today.
|
||||
|
||||
### 5. Native Window Capture
|
||||
|
||||
- Resolve Electron `window:*` selections to an `HWND`.
|
||||
- Use WGC `CreateForWindow(HWND)`.
|
||||
- Handle window close, minimize, resize, DPI scaling, and monitor moves.
|
||||
- Return clear errors for unsupported protected windows.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Capturing a normal app window works with cursor/audio/mic/webcam.
|
||||
- Window resize and movement do not corrupt the recording.
|
||||
|
||||
### 6. Runtime Controls
|
||||
|
||||
- Add pause/resume commands to the helper.
|
||||
- Add cancel command that removes partial screen/webcam outputs.
|
||||
- Keep restart as stop-discard-start from Electron until the helper supports a native restart event.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Pause/resume keeps preview duration coherent.
|
||||
- Cancel leaves no stale media/session/cursor files.
|
||||
|
||||
### 7. Test Pipeline
|
||||
|
||||
- `npm run test:wgc-helper:win`: display-only helper smoke test.
|
||||
- `npm run test:wgc-audio:win`: validates AAC track presence and duration.
|
||||
- `npm run test:wgc-window:win`: captures a fixture window by HWND.
|
||||
- `npm run test:wgc-webcam:win`: validates webcam output when a webcam is available, otherwise skips explicitly.
|
||||
- Packaging check: confirms the helper is in `app.asar.unpacked`.
|
||||
- Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio.
|
||||
|
||||
## Ship Criteria
|
||||
|
||||
- Windows display capture works with cursor, system audio, microphone, and webcam.
|
||||
- Windows window capture works with cursor, system audio, microphone, and webcam.
|
||||
- Preview and export show no cursor position drift.
|
||||
- Preview and export show no measurable audio/video/webcam drift.
|
||||
- Windows production builds do not depend on Electron capture fallback.
|
||||
Reference in New Issue
Block a user