Windows Native Recorder Roadmap
OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to getDisplayMedia / MediaRecorder.
Goals
- Capture displays and windows through Windows Graphics Capture (WGC).
- Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay.
- Capture system audio through WASAPI loopback.
- Capture microphone audio through WASAPI.
- Mix system audio and microphone audio into the primary screen recording.
- Capture webcam video natively and compose it into the Windows helper MP4 during the native-recording migration.
- Keep preview/export aligned because screen video, audio, webcam, and cursor share one native timing origin.
- Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target.
- Package the native helper with the Windows app.
Non-Goals
- Replacing the editor/export pipeline. A later pass can reintroduce a separate editable native webcamVideoPath; the current Windows-native milestone prioritizes a helper-owned multi-stream MP4 with deterministic screen/audio/mic/webcam sync.
- Adding a native fallback for macOS or Linux in this branch.
Target Architecture
The renderer keeps the existing recording controls. On Windows, useScreenRecorder sends a complete recording request to Electron and does not assemble Windows MediaStream tracks with MediaRecorder.
Electron owns the native recording session:
- resolves the selected source;
- resolves output paths;
- starts cursor sampling;
- starts the helper process;
- sends pause/resume/stop/cancel commands;
- writes RecordingSession manifests;
- reports explicit errors when a Windows-native capability is unavailable.
The helper owns Windows media capture:
- WGC screen/window frames;
- WASAPI system loopback;
- WASAPI microphone input;
- Media Foundation webcam capture;
- DirectShow webcam fallback for virtual cameras not visible to Media Foundation;
- Media Foundation encoding/muxing;
- stream timestamp normalization.
Helper Contract V2
The helper receives a single JSON argument:
{
"schemaVersion": 2,
"recordingId": 1234567890,
"source": {
"type": "display",
"sourceId": "screen:0:0",
"displayId": 123,
"windowHandle": null,
"bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
},
"video": {
"fps": 60,
"width": 1920,
"height": 1080,
"bitrate": 18000000
},
"audio": {
"system": { "enabled": true },
"microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 }
},
"webcam": {
"enabled": true,
"deviceId": "default",
"deviceName": "Camera (NVIDIA Broadcast)",
"width": 1280,
"height": 720,
"fps": 30,
"bitrate": 18000000
},
"outputs": {
"screenPath": "C:\\Users\\me\\recording-123.mp4",
"manifestPath": "C:\\Users\\me\\recording-123.session.json"
}
}
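The request above could be mirrored on the TypeScript side roughly as follows. This is a sketch under assumptions, not the contents of src/lib/nativeWindowsRecording.ts: the names NativeRecordingRequestV2, validateRequest, and toHelperArgument are hypothetical, the "window" source type is inferred from the window-capture phase, and the validation rules are illustrative rather than documented helper behavior.

```typescript
// Hypothetical mirror of the V2 helper request; field names follow the JSON example.
interface NativeRecordingRequestV2 {
  schemaVersion: number; // expected to be 2
  recordingId: number;
  source: {
    type: "display" | "window"; // "window" is assumed from the window-capture phase
    sourceId: string;
    displayId: number | null;
    windowHandle: number | null;
    bounds: { x: number; y: number; width: number; height: number };
  };
  video: { fps: number; width: number; height: number; bitrate: number };
  audio: {
    system: { enabled: boolean };
    microphone: { enabled: boolean; deviceId: string; gain: number };
  };
  webcam: {
    enabled: boolean;
    deviceId: string;
    deviceName: string;
    width: number;
    height: number;
    fps: number;
    bitrate: number;
  };
  outputs: { screenPath: string; manifestPath: string };
}

// Returns a list of problems; an empty list means the request looks usable.
// These checks are assumptions made for illustration.
function validateRequest(req: NativeRecordingRequestV2): string[] {
  const problems: string[] = [];
  if (req.schemaVersion !== 2) problems.push("schemaVersion must be 2");
  if (req.source.type === "window" && req.source.windowHandle === null) {
    problems.push("window sources need a windowHandle");
  }
  if (req.video.fps <= 0 || req.video.width <= 0 || req.video.height <= 0) {
    problems.push("video fps/width/height must be positive");
  }
  if (req.outputs.screenPath.length === 0) problems.push("screenPath is required");
  return problems;
}

// Serializes the request into the single JSON argument the helper receives.
function toHelperArgument(req: NativeRecordingRequestV2): string {
  return JSON.stringify(req);
}
```

Validating before the helper is spawned lets Electron report a malformed request as an explicit error instead of relying on the helper to reject it.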
The helper emits newline-delimited JSON events to stdout:
{ "event": "ready", "schemaVersion": 2 }
{ "event": "recording-started", "timestampMs": 1234567890 }
{ "event": "warning", "code": "audio-device-unavailable", "message": "..." }
{ "event": "recording-stopped", "screenPath": "..." }
{ "event": "error", "code": "unsupported-window-source", "message": "..." }
During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working.
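One way Electron might consume this stream is sketched below. The event names and fields come from the examples above; the HelperEvent union, the HelperEventParser class, and the "legacy-text" wrapper for non-JSON lines are hypothetical, and field-level validation is left out.

```typescript
// Event names and fields follow the examples above; the union is a sketch.
type HelperEvent =
  | { event: "ready"; schemaVersion: number }
  | { event: "recording-started"; timestampMs: number }
  | { event: "warning"; code: string; message: string }
  | { event: "recording-stopped"; screenPath: string }
  | { event: "error"; code: string; message: string };

// Assumed wrapper for lines produced by the older textual protocol.
type ParsedLine = HelperEvent | { event: "legacy-text"; text: string };

const KNOWN_EVENTS = ["ready", "recording-started", "warning", "recording-stopped", "error"];

function parseLine(line: string): ParsedLine {
  try {
    const value: unknown = JSON.parse(line);
    if (typeof value === "object" && value !== null) {
      const name = (value as { event?: unknown }).event;
      if (typeof name === "string" && KNOWN_EVENTS.indexOf(name) !== -1) {
        return value as HelperEvent; // per-field checks are omitted in this sketch
      }
    }
  } catch {
    // Not JSON: fall through and treat the line as a legacy textual message.
  }
  return { event: "legacy-text", text: line };
}

// Buffers stdout chunks and returns one parsed entry per complete line.
class HelperEventParser {
  private pending = "";

  push(chunk: string): ParsedLine[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line, or "" when the chunk ended with "\n".
    this.pending = lines.pop() ?? "";
    const out: ParsedLine[] = [];
    for (const raw of lines) {
      const line = raw.trim(); // also strips "\r" from CRLF output
      if (line.length > 0) out.push(parseLine(line));
    }
    return out;
  }
}
```

Buffering matters because a stdout chunk can end in the middle of a JSON object; only complete lines are parsed, and the remainder waits for the next chunk.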
Implementation Phases
1. Native Session Boundary
- Add a structured Windows native recording request type.
- Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper.
- On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error.
- Keep Electron fallback only for non-Windows and optional developer diagnostics.
Acceptance:
- Display-only recording still works.
- Enabling an unsupported native feature returns an explicit native error instead of recording through Electron.
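The no-silent-fallback rule could be expressed as a single decision function, sketched here. All names (selectBackend, NativeRecordingError, the feature labels) and the error codes "helper-unavailable" and "unsupported-native-feature" are hypothetical; only the policy itself comes from the phase description.

```typescript
type CaptureBackend = "windows-native" | "electron";
type NativeFeature = "system-audio" | "microphone" | "webcam" | "window-capture";

// Hypothetical error type carrying a machine-readable code for the UI.
class NativeRecordingError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
    this.name = "NativeRecordingError";
  }
}

interface BackendInputs {
  platform: string; // for example the value of process.platform
  helperAvailable: boolean;
  requested: NativeFeature[];
  supported: NativeFeature[];
  developerDiagnostics?: boolean; // explicit opt-in, never a default
}

function selectBackend(input: BackendInputs): CaptureBackend {
  // Non-Windows platforms keep the existing Electron capture path.
  if (input.platform !== "win32") return "electron";
  // Electron capture on Windows is allowed only as an explicit diagnostic.
  if (input.developerDiagnostics === true) return "electron";
  if (!input.helperAvailable) {
    throw new NativeRecordingError("helper-unavailable", "Windows native recorder helper was not found.");
  }
  const missing = input.requested.filter((f) => input.supported.indexOf(f) === -1);
  if (missing.length > 0) {
    throw new NativeRecordingError(
      "unsupported-native-feature",
      "Native recorder does not support: " + missing.join(", "),
    );
  }
  return "windows-native";
}
```

Throwing instead of returning "electron" on Windows keeps the failure visible, which is the behavior the acceptance criteria ask for.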
2. WASAPI System Audio
Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into MFEncoder, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work.
- Add WasapiLoopbackCapture.
- Capture the default render endpoint in shared loopback mode.
- Keep WasapiLoopbackCapture responsible only for device activation, packet capture, and packet timestamps.
- Keep MFEncoder responsible for all Media Foundation stream definitions and muxing.
- Feed the endpoint mix format into MFEncoder as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (PCM or Float).
- Encode the primary screen MP4 with H.264 video and AAC audio through one IMFSinkWriter.
- Timestamp audio from the captured frame count in 100ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew.
- Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code.
Acceptance:
- Screen MP4 has an AAC audio track when system audio is enabled.
- A 5-minute recording has audio/video duration drift below one frame.
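The timestamp arithmetic and the drift criterion can be illustrated with a short sketch. The helper itself is native code; this TypeScript version only shows the math, and the function names are hypothetical. At 60 fps one frame lasts about 16.7 ms, so "below one frame" means audio and video durations must agree to within that margin.

```typescript
const HNS_PER_SECOND = 10000000; // number of 100 ns units in one second

// Converts a running count of captured audio frames to a 100 ns timestamp.
// Whole seconds and the remainder are handled separately so every
// intermediate value stays a small exact integer.
function audioFramesToHns(frameCount: number, sampleRate: number): number {
  const wholeSeconds = Math.floor(frameCount / sampleRate);
  const remainderFrames = frameCount % sampleRate;
  return (
    wholeSeconds * HNS_PER_SECOND +
    Math.round((remainderFrames * HNS_PER_SECOND) / sampleRate)
  );
}

// Duration of a single video frame in 100 ns units.
function videoFrameDurationHns(fps: number): number {
  return Math.round(HNS_PER_SECOND / fps);
}

// True when audio and video durations differ by less than one video frame.
function driftWithinOneFrame(audioHns: number, videoHns: number, fps: number): boolean {
  return Math.abs(audioHns - videoHns) < videoFrameDurationHns(fps);
}
```

Deriving timestamps from the cumulative frame count, rather than summing per-packet durations, avoids accumulating rounding error over a long recording.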
SSOT rules for this phase:
- src/lib/nativeWindowsRecording.ts is the renderer/main TypeScript request contract.
- docs/engineering/windows-native-recorder-roadmap.md is the feature-level contract and phase checklist.
- WgcSession::captureWidth()/captureHeight() is the encoded screen frame size until a dedicated native scaling stage exists.
- WasapiLoopbackCapture::inputFormat() is the runtime audio format source used by MFEncoder.
- The renderer passes both the browser webcam deviceId and the selected display label (as deviceName); electron/native/wgc-capture/src/webcam_capture.* is the only place that maps those values to Media Foundation devices.
- Electron resolves the selected label to a DirectShow filter CLSID once and passes it as webcamDirectShowClsid; the helper must not independently guess among DirectShow filters.
- No duplicated hard-coded audio format assumptions in main.cpp.
3. WASAPI Microphone
Status: initial implementation in progress. The helper can open the default WASAPI capture endpoint, apply the OpenScreen microphone gain, encode mic-only audio, and mix system loopback plus microphone through a single queued AudioMixer timeline when both endpoints expose the same runtime format. Audio endpoints are warmed before WGC starts. The mixer drops pre-roll, begins its paced timeline on the first encoded video frame, and cuts queued tail audio on stop so the MP4 audio does not run past the video. Browser deviceId to MMDevice id mapping, resampling between mismatched endpoint formats, and drift correction remain follow-up hardening work.
- Add microphone device enumeration and stable device-id mapping.
- Capture selected/default microphone through WASAPI.
- Apply OpenScreen's current mic gain policy.
- Mix microphone and system audio before AAC encoding.
Acceptance:
- Mic-only, system-only, and mixed audio recordings produce a valid AAC track.
- Device unplug/permission failure produces an explicit error or warning.
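The mixing step, for the case where both endpoints already share one runtime format, can be sketched as below. This is an illustration, not the AudioMixer implementation: it assumes 32-bit float samples with identical sample rate and channel layout, and both the linear gain and the hard clamp to [-1, 1] are assumptions about the gain policy.

```typescript
// Mixes system and microphone buffers sample by sample.
// Assumes both inputs are float32 PCM in the same format (no resampling here).
function mixWithMicGain(
  system: Float32Array,
  mic: Float32Array,
  micGain: number,
): Float32Array {
  const length = Math.max(system.length, mic.length);
  const mixed = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    // Pad whichever input is shorter with silence.
    const s = i < system.length ? system[i] : 0;
    const m = i < mic.length ? mic[i] * micGain : 0;
    // Hard clamp keeps the sum inside the valid float PCM range.
    mixed[i] = Math.max(-1, Math.min(1, s + m));
  }
  return mixed;
}
```

A production mixer would more likely use a limiter than a hard clamp, since clamping distorts loud passages; the clamp is used here only to keep the sketch short.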
4. Webcam Capture
- Add Media Foundation webcam source reader.
- Select requested dimensions/fps or the nearest format accepted by Media Foundation.
- Convert webcam samples to BGRA and compose them into the primary helper MP4 as an initial bottom-right picture-in-picture overlay.
- Ignore black webcam warmup frames and keep the overlay hidden until the first visible frame is available, so virtual cameras do not flash a black picture-in-picture rectangle at recording start.
- Keep the helper process as the SSOT for screen/window, WASAPI system audio, microphone, webcam, and mux timing.
- Match the requested webcam through Media Foundation friendly names first, then browser device ids/symbolic links, so UI selection remains stable across Chromium and Windows native device namespaces.
- Use the Electron-resolved DirectShow CLSID when the selected virtual camera, for example NVIDIA Broadcast, is registered for DirectShow but absent from Media Foundation enumeration.
- Later: promote the same webcam capture source to a separate editable native webcamVideoPath if product requirements need post-recording layout edits.
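The warmup gating and bottom-right placement described in this list might look like the following sketch. The real logic lives in the native helper; the names, the luma threshold of 8, the sampling stride, the 25% overlay width, and the 24 px margin are all illustrative assumptions.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Average brightness (0..255) of a BGRA frame, sampling every `stride`-th pixel.
function meanLuma(bgra: Uint8Array, stride: number = 16): number {
  let sum = 0;
  let count = 0;
  for (let i = 0; i + 3 < bgra.length; i += 4 * stride) {
    const b = bgra[i];
    const g = bgra[i + 1];
    const r = bgra[i + 2];
    sum += 0.114 * b + 0.587 * g + 0.299 * r; // Rec. 601 luma weights
    count++;
  }
  return count === 0 ? 0 : sum / count;
}

// Stays hidden during black warmup frames, then latches to visible.
class WebcamOverlayGate {
  private visible = false;

  constructor(private readonly lumaThreshold: number = 8) {}

  // Returns true when the overlay should be composed for this frame.
  accept(bgra: Uint8Array): boolean {
    if (!this.visible && meanLuma(bgra) > this.lumaThreshold) {
      this.visible = true;
    }
    return this.visible;
  }
}

// Bottom-right picture-in-picture rectangle that preserves the webcam aspect ratio.
function pipRect(
  screenW: number,
  screenH: number,
  camW: number,
  camH: number,
  widthFraction: number = 0.25,
  margin: number = 24,
): Rect {
  const width = Math.round(screenW * widthFraction);
  const height = Math.round((width * camH) / camW);
  return { x: screenW - width - margin, y: screenH - height - margin, width, height };
}
```

The gate latches so that a genuinely dark scene later in the recording does not hide the overlay again; only the startup frames are suppressed.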
Acceptance:
- Native display/window recordings can include webcam without returning to Electron capture.
- npm run test:wgc-webcam:win validates the helper path when a webcam is available and skips explicitly when no webcam device exists.
- Combined webcam + system audio + microphone produces one MP4 with H.264 video and AAC audio.
5. Native Window Capture
Status: initial implementation in progress. Electron parses the window:<HWND>:... desktop source id through the shared native Windows recording contract and passes windowHandle to the helper. The helper resolves the HWND, validates it with IsWindow, and creates the WGC item with CreateForWindow(HWND). Resize/minimize/move hardening and protected-window diagnostics remain follow-up work.
- Resolve Electron window:* selections to an HWND.
- Use WGC CreateForWindow(HWND).
- Handle window close, minimize, resize, DPI scaling, and monitor moves.
- Return clear errors for unsupported protected windows.
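Extracting the handle from a window:<HWND>:... source id could be done along these lines. The function name is hypothetical, and the sketch assumes the handle is written in decimal; the helper still validates the result with IsWindow, so this only rejects ids that are obviously malformed.

```typescript
// Extracts the HWND from a source id shaped like "window:<HWND>:<suffix>".
// Assumes a decimal handle; returns null for anything that does not match.
function parseWindowHandle(sourceId: string): number | null {
  const match = /^window:(\d+):/.exec(sourceId);
  if (match === null) return null;
  const handle = Number(match[1]);
  // Reject zero and values too large to represent exactly.
  return Number.isSafeInteger(handle) && handle > 0 ? handle : null;
}
```

Returning null rather than throwing lets the caller decide whether to surface an explicit error such as the unsupported-window-source event.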
Acceptance:
- Capturing a normal app window works with cursor/audio/mic/webcam.
- Window resize and movement do not corrupt the recording.
6. Runtime Controls
- Add pause/resume commands to the helper.
- Add cancel command that removes partial screen/webcam outputs.
- Keep restart as stop-discard-start from Electron until the helper supports a native restart event.
Acceptance:
- Pause/resume keeps preview duration coherent.
- Cancel leaves no stale media/session/cursor files.
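Keeping the preview duration coherent across pause/resume comes down to counting only the time spent recording. The sketch below assumes Electron keeps an ordered log of control events on a single clock; the ClockEvent shape and function name are hypothetical and do not describe the helper's command format.

```typescript
type ClockEvent = { type: "start" | "pause" | "resume" | "stop"; atMs: number };

// Sums the time spent recording, skipping paused intervals.
// Events are assumed to be in chronological order.
function activeDurationMs(events: ClockEvent[]): number {
  let total = 0;
  let runningSince: number | null = null;
  for (const e of events) {
    if ((e.type === "start" || e.type === "resume") && runningSince === null) {
      runningSince = e.atMs; // a recording interval opens
    } else if ((e.type === "pause" || e.type === "stop") && runningSince !== null) {
      total += e.atMs - runningSince; // the interval closes
      runningSince = null;
    }
    // Repeated pause or resume events are ignored.
  }
  return total;
}
```

For example, a session that starts at 0 ms, pauses at 1000 ms, resumes at 4000 ms, and stops at 6000 ms has an active duration of 3000 ms, which is what the preview should report.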
7. Test Pipeline
- npm run test:wgc-helper:win: display-only helper smoke test.
- npm run test:wgc-audio:win: validates AAC track presence and duration.
- npm run test:wgc-window:win: captures a fixture window by HWND.
- npm run test:wgc-webcam:win: validates webcam output when a webcam is available, otherwise skips explicitly.
- Packaging check: confirms the helper is in app.asar.unpacked.
- Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio.
- npm run test:wgc-mic:win: validates default-microphone capture writes an AAC track when an input endpoint is available.
- npm run test:wgc-mixed-audio:win: validates system loopback plus microphone writes one mixed AAC track when endpoint formats are compatible.
Backlog
Native Cursor Click Bounce Is Not Visibly Applied
Status: open. Do not treat Windows native cursor Click Bounce as shipped.
Problem:
- The cursor settings UI exposes Size, Smoothing, Motion Blur, and Click Bounce.
- On Windows native cursor recordings, Size, Smoothing, and Motion Blur are visibly applied in preview/export.
- Click Bounce still has no visible effect in manual packaged-app testing, even after adding click-related sample metadata.
What has already been tried:
- Added interactionType: "click" | "mouseup" | "move" to native cursor samples.
- Added polling-based left-button state through GetAsyncKeyState.
- Added the GetAsyncKeyState low-bit path to catch quick clicks between samples.
- Added a PowerShell/C# WH_MOUSE_LL mouse hook experiment and launched the sampler through a temporary .ps1 file to avoid Windows command-line length limits.
- Updated npm run test:cursor-native:win so the diagnostic can observe a synthetic short click and emit clickSampleCount.
Current diagnosis:
- The diagnostic can observe synthetic click events, but this has not translated into a visible Click Bounce effect in the real packaged app.
- The test currently proves that some click metadata can be recorded, not that the full OpenScreen record -> preview -> export path displays a bounce at the expected time.
- The current native implementation may be animating from metadata that is not present in the real recording session, may be using the wrong timestamp origin, or may be applying a scale change too subtle to notice on the DOM/native cursor path.
Next investigation when resumed:
- Inspect the actual .cursor.json/session sidecar generated by a packaged-app manual recording and confirm whether real clicks produce interactionType: "click" at the right timeMs.
- Add a targeted end-to-end fixture that records a known click, loads the generated project, and asserts the preview/export cursor scale changes across adjacent frames.
- Compare the native DOM cursor path against the older PixiCursorOverlay click visual state and decide whether native cursor bounce should be a scale-only animation, an additional click ring, or a short explicit keyframe animation independent of sample cadence.
- If event capture remains unreliable in the PowerShell sampler, move click events into a small native cursor helper instead of PowerShell/C# script injection.
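The first investigation step, checking whether real clicks reach the sidecar at plausible times, could start from a small summary like the one sketched here. Only interactionType and timeMs are taken from this document; the rest of the sample shape, the overall layout of the .cursor.json file, and the function name are assumptions, so the samples array is expected to be extracted by the caller.

```typescript
interface CursorSample {
  timeMs: number;
  interactionType: "click" | "mouseup" | "move";
  // Real samples presumably carry position and other fields, omitted here.
}

interface ClickSummary {
  clickCount: number;
  clickTimesMs: number[];
  clicksOutOfRange: number; // clicks before 0 or after the recording ends
  timestampsMonotonic: boolean;
}

// Summarizes click samples so a wrong timestamp origin or missing
// click metadata shows up without opening the preview.
function summarizeClicks(samples: CursorSample[], recordingDurationMs: number): ClickSummary {
  const clickTimesMs = samples
    .filter((s) => s.interactionType === "click")
    .map((s) => s.timeMs);
  const clicksOutOfRange = clickTimesMs.filter(
    (t) => t < 0 || t > recordingDurationMs,
  ).length;
  let timestampsMonotonic = true;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i].timeMs < samples[i - 1].timeMs) {
      timestampsMonotonic = false;
      break;
    }
  }
  return { clickCount: clickTimesMs.length, clickTimesMs, clicksOutOfRange, timestampsMonotonic };
}
```

A clickCount of zero after a manual recording with known clicks would point at capture, while a nonzero clicksOutOfRange would point at the timestamp origin rather than the animation code.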
Ship Criteria
- Windows display capture works with cursor, system audio, microphone, and webcam.
- Windows window capture works with cursor, system audio, microphone, and webcam.
- Preview and export show no cursor position drift.
- Preview and export show no measurable audio/video/webcam drift.
- Windows production builds do not depend on Electron capture fallback.