249 lines
14 KiB
Markdown
249 lines
14 KiB
Markdown
# Windows Native Recorder Roadmap
|
|
|
|
OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to `getDisplayMedia` / `MediaRecorder`.
|
|
|
|
## Goals
|
|
|
|
- Capture displays and windows through Windows Graphics Capture (WGC).
|
|
- Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay.
|
|
- Capture system audio through WASAPI loopback.
|
|
- Capture microphone audio through WASAPI.
|
|
- Mix system audio and microphone audio into the primary screen recording.
|
|
- Capture webcam video natively and compose it into the Windows helper MP4 during the native-recording migration.
|
|
- Keep preview/export aligned because screen video, audio, webcam, and cursor share one native timing origin.
|
|
- Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target.
|
|
- Package the native helper with the Windows app.
|
|
|
|
## Non-Goals
|
|
|
|
- Replacing the editor/export pipeline.
|
|
- Replacing the editor/export pipeline. A later pass can reintroduce a separate editable native `webcamVideoPath`; the current Windows-native milestone prioritizes a helper-owned multi-flux MP4 with deterministic screen/audio/mic/webcam sync.
|
|
- Adding a native fallback for macOS or Linux in this branch.
|
|
|
|
## Target Architecture
|
|
|
|
The renderer keeps the existing recording controls. On Windows, `useScreenRecorder` sends a complete recording request to Electron and does not assemble Windows `MediaStream` tracks with `MediaRecorder`.
|
|
|
|
Electron owns the native recording session:
|
|
|
|
- resolves the selected source;
|
|
- resolves output paths;
|
|
- starts cursor sampling;
|
|
- starts the helper process;
|
|
- sends pause/resume/stop/cancel commands;
|
|
- writes `RecordingSession` manifests;
|
|
- reports explicit errors when a Windows-native capability is unavailable.
|
|
|
|
The helper owns Windows media capture:
|
|
|
|
- WGC screen/window frames;
|
|
- WASAPI system loopback;
|
|
- WASAPI microphone input;
|
|
- Media Foundation webcam capture;
|
|
- DirectShow webcam fallback for virtual cameras not visible to Media Foundation;
|
|
- Media Foundation encoding/muxing;
|
|
- stream timestamp normalization.
|
|
|
|
## Helper Contract V2
|
|
|
|
The helper receives a single JSON argument:
|
|
|
|
```json
|
|
{
|
|
"schemaVersion": 2,
|
|
"recordingId": 1234567890,
|
|
"source": {
|
|
"type": "display",
|
|
"sourceId": "screen:0:0",
|
|
"displayId": 123,
|
|
"windowHandle": null,
|
|
"bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
|
|
},
|
|
"video": {
|
|
"fps": 60,
|
|
"width": 1920,
|
|
"height": 1080,
|
|
"bitrate": 18000000
|
|
},
|
|
"audio": {
|
|
"system": { "enabled": true },
|
|
"microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 }
|
|
},
|
|
"webcam": {
|
|
"enabled": true,
|
|
"deviceId": "default",
|
|
"deviceName": "Camera (NVIDIA Broadcast)",
|
|
"width": 1280,
|
|
"height": 720,
|
|
"fps": 30,
|
|
"bitrate": 18000000
|
|
},
|
|
"outputs": {
|
|
"screenPath": "C:\\Users\\me\\recording-123.mp4",
|
|
"manifestPath": "C:\\Users\\me\\recording-123.session.json"
|
|
}
|
|
}
|
|
```
|
|
|
|
The helper emits newline-delimited JSON events to stdout:
|
|
|
|
```json
|
|
{ "event": "ready", "schemaVersion": 2 }
|
|
{ "event": "recording-started", "timestampMs": 1234567890 }
|
|
{ "event": "warning", "code": "audio-device-unavailable", "message": "..." }
|
|
{ "event": "recording-stopped", "screenPath": "..." }
|
|
{ "event": "error", "code": "unsupported-window-source", "message": "..." }
|
|
```
|
|
|
|
During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working.
|
|
|
|
## Implementation Phases
|
|
|
|
### 1. Native Session Boundary
|
|
|
|
- Add a structured Windows native recording request type.
|
|
- Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper.
|
|
- On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error.
|
|
- Keep Electron fallback only for non-Windows and optional developer diagnostics.
|
|
|
|
Acceptance:
|
|
|
|
- Display-only recording still works.
|
|
- Enabling an unsupported native feature returns an explicit native error instead of recording through Electron.
|
|
|
|
### 2. WASAPI System Audio
|
|
|
|
Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into `MFEncoder`, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work.
|
|
|
|
- Add `WasapiLoopbackCapture`.
|
|
- Capture the default render endpoint in shared loopback mode.
|
|
- Keep `WasapiLoopbackCapture` responsible only for device activation, packet capture, and packet timestamps.
|
|
- Keep `MFEncoder` responsible for all Media Foundation stream definitions and muxing.
|
|
- Feed the endpoint mix format into `MFEncoder` as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (`PCM` or `Float`).
|
|
- Encode the primary screen MP4 with H.264 video and AAC audio through one `IMFSinkWriter`.
|
|
- Timestamp audio from the captured frame count in 100ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew.
|
|
- Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code.
|
|
|
|
Acceptance:
|
|
|
|
- Screen MP4 has an AAC audio track when system audio is enabled.
|
|
- A 5-minute recording has audio/video duration drift below one frame.
|
|
|
|
SSOT rules for this phase:
|
|
|
|
- `src/lib/nativeWindowsRecording.ts` is the renderer/main TypeScript request contract.
|
|
- `docs/engineering/windows-native-recorder-roadmap.md` is the feature-level contract and phase checklist.
|
|
- `WgcSession::captureWidth()/captureHeight()` is the encoded screen frame size until a dedicated native scaling stage exists.
|
|
- `WasapiLoopbackCapture::inputFormat()` is the runtime audio format source used by `MFEncoder`.
|
|
- The renderer passes both the browser webcam `deviceId` and selected display label as `deviceName`; `electron/native/wgc-capture/src/webcam_capture.*` is the only place that maps those values to Media Foundation devices.
|
|
- Electron resolves the selected label to a DirectShow filter CLSID once and passes it as `webcamDirectShowClsid`; the helper must not independently guess among DirectShow filters.
|
|
- No duplicated hard-coded audio format assumptions in `main.cpp`.
|
|
|
|
### 3. WASAPI Microphone
|
|
|
|
Status: initial implementation in progress. The helper can open the default WASAPI capture endpoint, apply the OpenScreen microphone gain, encode mic-only audio, and mix system loopback plus microphone through a single queued `AudioMixer` timeline when both endpoints expose the same runtime format. Audio endpoints are warmed before WGC starts, the mixer drops pre-roll and begins its paced timeline on the first encoded video frame, then cuts queued tail audio on stop so the MP4 does not drift past the video. Browser `deviceId` to MMDevice id mapping, resampling between mismatched endpoint formats, and drift correction remain follow-up hardening work.
|
|
|
|
- Add microphone device enumeration and stable device-id mapping.
|
|
- Capture selected/default microphone through WASAPI.
|
|
- Apply OpenScreen's current mic gain policy.
|
|
- Mix microphone and system audio before AAC encoding.
|
|
|
|
Acceptance:
|
|
|
|
- Mic-only, system-only, and mixed audio recordings produce a valid AAC track.
|
|
- Device unplug/permission failure produces an explicit error or warning.
|
|
|
|
### 4. Webcam Capture
|
|
|
|
- Add Media Foundation webcam source reader.
|
|
- Select requested dimensions/fps or the nearest format accepted by Media Foundation.
|
|
- Convert webcam samples to BGRA and compose them into the primary helper MP4 as an initial bottom-right picture-in-picture overlay.
|
|
- Ignore black webcam warmup frames and keep the overlay hidden until the first visible frame is available, so virtual cameras do not flash a black picture-in-picture rectangle at recording start.
|
|
- Keep the helper process as the SSOT for screen/window, WASAPI system audio, microphone, webcam, and mux timing.
|
|
- Match the requested webcam through Media Foundation friendly names first, then browser device ids/symbolic links, so UI selection remains stable across Chromium and Windows native device namespaces.
|
|
- Use the Electron-resolved DirectShow CLSID when the selected virtual camera, for example NVIDIA Broadcast, is registered for DirectShow but absent from Media Foundation enumeration.
|
|
- Later: promote the same webcam capture source to a separate editable native `webcamVideoPath` if product requirements need post-recording layout edits.
|
|
|
|
Acceptance:
|
|
|
|
- Native display/window recordings can include webcam without returning to Electron capture.
|
|
- `npm run test:wgc-webcam:win` validates the helper path when a webcam is available and skips explicitly when no webcam device exists.
|
|
- Combined webcam + system audio + microphone produces one MP4 with H.264 video and AAC audio.
|
|
|
|
### 5. Native Window Capture
|
|
|
|
Status: initial implementation in progress. Electron parses the `window:<HWND>:...` desktop source id through the shared native Windows recording contract and passes `windowHandle` to the helper. The helper resolves the `HWND`, validates it with `IsWindow`, and creates the WGC item with `CreateForWindow(HWND)`. Resize/minimize/move hardening and protected-window diagnostics remain follow-up work.
|
|
|
|
- Resolve Electron `window:*` selections to an `HWND`.
|
|
- Use WGC `CreateForWindow(HWND)`.
|
|
- Handle window close, minimize, resize, DPI scaling, and monitor moves.
|
|
- Return clear errors for unsupported protected windows.
|
|
|
|
Acceptance:
|
|
|
|
- Capturing a normal app window works with cursor/audio/mic/webcam.
|
|
- Window resize and movement do not corrupt the recording.
|
|
|
|
### 6. Runtime Controls
|
|
|
|
- Add pause/resume commands to the helper.
|
|
- Add cancel command that removes partial screen/webcam outputs.
|
|
- Keep restart as stop-discard-start from Electron until the helper supports a native restart event.
|
|
|
|
Acceptance:
|
|
|
|
- Pause/resume keeps preview duration coherent.
|
|
- Cancel leaves no stale media/session/cursor files.
|
|
|
|
### 7. Test Pipeline
|
|
|
|
- `npm run test:wgc-helper:win`: display-only helper smoke test.
|
|
- `npm run test:wgc-audio:win`: validates AAC track presence and duration.
|
|
- `npm run test:wgc-window:win`: captures a fixture window by HWND.
|
|
- `npm run test:wgc-webcam:win`: validates webcam output when a webcam is available, otherwise skips explicitly.
|
|
- Packaging check: confirms the helper is in `app.asar.unpacked`.
|
|
- Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio.
|
|
- `npm run test:wgc-mic:win`: validates default-microphone capture writes an AAC track when an input endpoint is available.
|
|
- `npm run test:wgc-mixed-audio:win`: validates system loopback plus microphone writes one mixed AAC track when endpoint formats are compatible.
|
|
|
|
## Backlog
|
|
|
|
### Native Cursor Click Bounce Is Not Visibly Applied
|
|
|
|
Status: open. Do not treat Windows native cursor `Click Bounce` as shipped.
|
|
|
|
Problem:
|
|
|
|
- The cursor settings UI exposes `Size`, `Smoothing`, `Motion Blur`, and `Click Bounce`.
|
|
- On Windows native cursor recordings, `Size`, `Smoothing`, and `Motion Blur` are visibly applied in preview/export.
|
|
- `Click Bounce` still has no visible effect in manual packaged-app testing, even after adding click-related sample metadata.
|
|
|
|
What has already been tried:
|
|
|
|
- Added `interactionType: "click" | "mouseup" | "move"` to native cursor samples.
|
|
- Added polling-based left-button state through `GetAsyncKeyState`.
|
|
- Added the `GetAsyncKeyState` low-bit path to catch quick clicks between samples.
|
|
- Added a PowerShell/C# `WH_MOUSE_LL` mouse hook experiment and launched the sampler through a temporary `.ps1` file to avoid Windows command-line length limits.
|
|
- Updated `npm run test:cursor-native:win` so the diagnostic can observe a synthetic short click and emit `clickSampleCount`.
|
|
|
|
Current diagnosis:
|
|
|
|
- The diagnostic can observe synthetic click events, but this has not translated into a visible `Click Bounce` effect in the real packaged app.
|
|
- The test currently proves that some click metadata can be recorded, not that the full OpenScreen record -> preview -> export path displays a bounce at the expected time.
|
|
- The current native implementation may be animating from metadata that is not present in the real recording session, may be using the wrong timestamp origin, or may be applying a scale change too subtle to notice on the DOM/native cursor path.
|
|
|
|
Next investigation when resumed:
|
|
|
|
- Inspect the actual `.cursor.json`/session sidecar generated by a packaged-app manual recording and confirm whether real clicks produce `interactionType: "click"` at the right `timeMs`.
|
|
- Add a targeted end-to-end fixture that records a known click, loads the generated project, and asserts the preview/export cursor scale changes across adjacent frames.
|
|
- Compare the native DOM cursor path against the older `PixiCursorOverlay` click visual state and decide whether native cursor bounce should be a scale-only animation, an additional click ring, or a short explicit keyframe animation independent of sample cadence.
|
|
- If event capture remains unreliable in the PowerShell sampler, move click events into a small native cursor helper instead of PowerShell/C# script injection.
|
|
|
|
## Ship Criteria
|
|
|
|
- Windows display capture works with cursor, system audio, microphone, and webcam.
|
|
- Windows window capture works with cursor, system audio, microphone, and webcam.
|
|
- Preview and export show no cursor position drift.
|
|
- Preview and export show no measurable audio/video/webcam drift.
|
|
- Windows production builds do not depend on Electron capture fallback.
|