Windows Native Recorder Roadmap
OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to getDisplayMedia / MediaRecorder.
Goals
- Capture displays and windows through Windows Graphics Capture (WGC).
- Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay.
- Capture system audio through WASAPI loopback.
- Capture microphone audio through WASAPI.
- Mix system audio and microphone audio into the primary screen recording.
- Capture webcam video natively and keep it as a separate editable OpenScreen media stream.
- Keep preview/export aligned because screen video, audio, webcam, and cursor share one native timing origin.
- Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target.
- Package the native helper with the Windows app.
Non-Goals
- Replacing the editor/export pipeline.
- Flattening webcam into the screen recording. The editor currently treats webcam as editable picture-in-picture media, so the native recorder should preserve a separate webcamVideoPath.
- Adding a native fallback for macOS or Linux in this branch.
Target Architecture
The renderer keeps the existing recording controls. On Windows, useScreenRecorder sends a complete recording request to Electron and does not assemble Windows MediaStream tracks with MediaRecorder.
Electron owns the native recording session:
- resolves the selected source;
- resolves output paths;
- starts cursor sampling;
- starts the helper process;
- sends pause/resume/stop/cancel commands;
- writes RecordingSession manifests;
- reports explicit errors when a Windows-native capability is unavailable.
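The output-path step can be sketched as a pure function. This is a minimal sketch, assuming the file naming shown in the Helper Contract V2 example (recording-<id>.mp4, recording-<id>-webcam.mp4, recording-<id>.session.json). The names deriveNativeOutputPaths and NativeOutputPaths are hypothetical, and returning null for a disabled webcam is an assumption.

```typescript
// Hypothetical shape; mirrors the "outputs" object in the Helper Contract V2 example.
interface NativeOutputPaths {
  screenPath: string;
  webcamPath: string | null; // assumption: null when webcam capture is disabled
  manifestPath: string;
}

// Derives every output path from one directory and one recording id, so the
// screen, webcam, and manifest files always share the same base name.
function deriveNativeOutputPaths(
  outputDir: string,
  recordingId: number,
  webcamEnabled: boolean,
): NativeOutputPaths {
  // Strip trailing separators so the join never produces a doubled separator.
  const dir = outputDir.replace(/[\\\/]+$/, "");
  const base = `${dir}\\recording-${recordingId}`;
  return {
    screenPath: `${base}.mp4`,
    webcamPath: webcamEnabled ? `${base}-webcam.mp4` : null,
    manifestPath: `${base}.session.json`,
  };
}
```

Deriving all three paths in one place keeps cancel and cleanup logic simple, because the full set of files for a session is known before the helper starts.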
The helper owns Windows media capture:
- WGC screen/window frames;
- WASAPI system loopback;
- WASAPI microphone input;
- Media Foundation webcam capture;
- Media Foundation encoding/muxing;
- stream timestamp normalization.
Helper Contract V2
The helper receives a single JSON argument:
{
  "schemaVersion": 2,
  "recordingId": 1234567890,
  "source": {
    "type": "display",
    "sourceId": "screen:0:0",
    "displayId": 123,
    "windowHandle": null,
    "bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 }
  },
  "video": {
    "fps": 60,
    "width": 1920,
    "height": 1080,
    "bitrate": 18000000
  },
  "audio": {
    "system": { "enabled": true },
    "microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 }
  },
  "webcam": {
    "enabled": true,
    "deviceId": "default",
    "width": 1280,
    "height": 720,
    "fps": 30,
    "bitrate": 18000000
  },
  "outputs": {
    "screenPath": "C:\\Users\\me\\recording-123.mp4",
    "webcamPath": "C:\\Users\\me\\recording-123-webcam.mp4",
    "manifestPath": "C:\\Users\\me\\recording-123.session.json"
  }
}
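The request above could be typed and sanity-checked on the Electron side before the helper is spawned. The following is a sketch only: the real contract lives in src/lib/nativeWindowsRecording.ts and may differ. The type and function names are hypothetical, and the "window" source type is an assumption based on the Native Window Capture phase.

```typescript
// Hypothetical types mirroring the JSON example; the real contract may differ.
type NativeSourceType = "display" | "window"; // "window" is assumed from phase 5

interface NativeRecordingRequestV2 {
  schemaVersion: 2;
  recordingId: number;
  source: {
    type: NativeSourceType;
    sourceId: string;
    displayId: number | null;
    windowHandle: number | null;
    bounds: { x: number; y: number; width: number; height: number };
  };
  video: { fps: number; width: number; height: number; bitrate: number };
  audio: {
    system: { enabled: boolean };
    microphone: { enabled: boolean; deviceId: string; gain: number };
  };
  webcam: {
    enabled: boolean;
    deviceId: string;
    width: number;
    height: number;
    fps: number;
    bitrate: number;
  };
  outputs: { screenPath: string; webcamPath: string | null; manifestPath: string };
}

// Returns a list of problems; an empty list means the request looks usable.
function validateNativeRequest(req: NativeRecordingRequestV2): string[] {
  const problems: string[] = [];
  if (req.source.type === "window" && req.source.windowHandle === null) {
    problems.push("window source requires source.windowHandle");
  }
  if (req.source.type === "display" && req.source.displayId === null) {
    problems.push("display source requires source.displayId");
  }
  if (req.video.fps <= 0 || req.video.width <= 0 || req.video.height <= 0) {
    problems.push("video fps, width, and height must be positive");
  }
  if (req.webcam.enabled && !req.outputs.webcamPath) {
    problems.push("webcam is enabled but outputs.webcamPath is missing");
  }
  return problems;
}
```

Rejecting a malformed request before launch lets Electron report an explicit error instead of relying on the helper to fail partway through startup.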
The helper emits newline-delimited JSON events to stdout:
{ "event": "ready", "schemaVersion": 2 }
{ "event": "recording-started", "timestampMs": 1234567890 }
{ "event": "warning", "code": "audio-device-unavailable", "message": "..." }
{ "event": "recording-stopped", "screenPath": "...", "webcamPath": "..." }
{ "event": "error", "code": "unsupported-window-source", "message": "..." }
During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working.
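One way to consume this stream is a small line buffer that classifies each complete stdout line as either a JSON event or a legacy textual message. This is a sketch under assumptions: HelperStdoutParser, classifyHelperLine, and the HelperLine type are hypothetical names, and stdout chunks are assumed to arrive as already-decoded strings.

```typescript
// Hypothetical types; only the "event" field is required of a JSON event here.
interface HelperJsonEvent {
  event: string;
  [key: string]: unknown;
}

type HelperLine =
  | { kind: "json"; value: HelperJsonEvent }
  | { kind: "text"; value: string };

// A line is a JSON event only if it parses to an object with a string "event".
function classifyHelperLine(line: string): HelperLine {
  try {
    const parsed: unknown = JSON.parse(line);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { event?: unknown }).event === "string"
    ) {
      return { kind: "json", value: parsed as HelperJsonEvent };
    }
  } catch {
    // Not JSON: fall through and treat it as a legacy textual message.
  }
  return { kind: "text", value: line };
}

class HelperStdoutParser {
  private pending = "";

  // Accepts an arbitrary stdout chunk and returns every line it completes.
  push(chunk: string): HelperLine[] {
    this.pending += chunk;
    const lines = this.pending.split(/\r?\n/);
    // The last element is an incomplete line (or ""); keep it for the next chunk.
    this.pending = lines.pop() ?? "";
    return lines.filter((l) => l.trim() !== "").map((l) => classifyHelperLine(l));
  }
}
```

Buffering matters because a process pipe can split one JSON event across two chunks; parsing each chunk directly would misreport such events as text.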
Implementation Phases
1. Native Session Boundary
- Add a structured Windows native recording request type.
- Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper.
- On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error.
- Keep Electron fallback only for non-Windows and optional developer diagnostics.
Acceptance:
- Display-only recording still works.
- Enabling an unsupported native feature returns an explicit native error instead of recording through Electron.
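The no-silent-fallback rule can be expressed as a single decision function. This is an illustrative sketch: chooseCaptureBackend, the feature names, and the error codes "helper-unavailable" and "unsupported-native-feature" are hypothetical and not taken from the codebase.

```typescript
type CaptureBackendDecision =
  | { backend: "native" }
  | { backend: "electron" }
  | { backend: "error"; code: string; message: string };

interface BackendInputs {
  platform: string; // e.g. the value of process.platform
  helperAvailable: boolean;
  requestedFeatures: string[]; // hypothetical names such as "webcam"
  supportedFeatures: string[];
  developerDiagnostics: boolean; // explicit opt-in to Electron capture on Windows
}

function chooseCaptureBackend(input: BackendInputs): CaptureBackendDecision {
  // Non-Windows platforms keep the existing Electron capture path.
  if (input.platform !== "win32") return { backend: "electron" };
  // Electron capture on Windows is reachable only through an explicit opt-in.
  if (input.developerDiagnostics) return { backend: "electron" };
  if (!input.helperAvailable) {
    return {
      backend: "error",
      code: "helper-unavailable",
      message: "The Windows native recording helper could not be found.",
    };
  }
  const missing = input.requestedFeatures.filter(
    (feature) => input.supportedFeatures.indexOf(feature) === -1,
  );
  if (missing.length > 0) {
    return {
      backend: "error",
      code: "unsupported-native-feature",
      message: `Not supported by the native helper: ${missing.join(", ")}`,
    };
  }
  return { backend: "native" };
}
```

The key property is that no Windows branch returns "electron" unless diagnostics were requested, so a missing helper always surfaces as an error.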
2. WASAPI System Audio
Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into MFEncoder, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work.
- Add WasapiLoopbackCapture.
- Capture the default render endpoint in shared loopback mode.
- Keep WasapiLoopbackCapture responsible only for device activation, packet capture, and packet timestamps.
- Keep MFEncoder responsible for all Media Foundation stream definitions and muxing.
- Feed the endpoint mix format into MFEncoder as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (PCM or Float).
- Encode the primary screen MP4 with H.264 video and AAC audio through one IMFSinkWriter.
- Timestamp audio from the captured frame count in 100ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew.
- Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code.
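The frame-count timestamp rule is plain arithmetic: a running frame total divided by the sample rate, expressed in 100ns units. The helper would do this in native code; the sketch below only illustrates the calculation, and the names framesToHns and packetTiming are hypothetical.

```typescript
// Media Foundation sample times are expressed in 100 ns units.
const HNS_PER_SECOND = 10000000;

// Converts a running count of captured audio frames to a 100 ns timestamp.
// Exact while totalFrames * 10^7 stays below 2^53 (roughly 5 hours at 48 kHz).
function framesToHns(totalFrames: number, sampleRate: number): number {
  return Math.round((totalFrames * HNS_PER_SECOND) / sampleRate);
}

// Derives a packet's timestamp and duration from cumulative frame counts.
// Using the running total, rather than summing rounded per-packet durations,
// keeps rounding error from accumulating over a long recording.
function packetTiming(
  framesBeforePacket: number,
  framesInPacket: number,
  sampleRate: number,
): { timestampHns: number; durationHns: number } {
  const start = framesToHns(framesBeforePacket, sampleRate);
  const end = framesToHns(framesBeforePacket + framesInPacket, sampleRate);
  return { timestampHns: start, durationHns: end - start };
}
```

For example, 48000 frames at 48 kHz map to 10000000 units, which is exactly one second.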
Acceptance:
- Screen MP4 has an AAC audio track when system audio is enabled.
- A 5-minute recording has audio/video duration drift below one video frame (for example, about 16.7 ms at 60 fps).
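The drift criterion can be checked mechanically once a test has the two track durations. This sketch assumes the durations are read from the finished MP4's track metadata by some other tool; frameDurationMs and isDriftWithinOneFrame are hypothetical names.

```typescript
// Duration of one video frame in milliseconds.
function frameDurationMs(fps: number): number {
  return 1000 / fps;
}

// True when the audio and video track durations differ by less than one frame.
function isDriftWithinOneFrame(
  videoDurationMs: number,
  audioDurationMs: number,
  fps: number,
): boolean {
  return Math.abs(videoDurationMs - audioDurationMs) < frameDurationMs(fps);
}
```

At 60 fps the tolerance is about 16.7 ms, so a 10 ms difference passes and a 20 ms difference fails.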
SSOT rules for this phase:
- src/lib/nativeWindowsRecording.ts is the renderer/main TypeScript request contract.
- docs/engineering/windows-native-recorder-roadmap.md is the feature-level contract and phase checklist.
- WgcSession::captureWidth() / captureHeight() is the encoded screen frame size until a dedicated native scaling stage exists.
- WasapiLoopbackCapture::inputFormat() is the runtime audio format source used by MFEncoder.
- No duplicated hard-coded audio format assumptions in main.cpp.
3. WASAPI Microphone
- Add microphone device enumeration and stable device-id mapping.
- Capture selected/default microphone through WASAPI.
- Apply OpenScreen's current mic gain policy.
- Mix microphone and system audio before AAC encoding.
Acceptance:
- Mic-only, system-only, and mixed audio recordings produce a valid AAC track.
- Device unplug/permission failure produces an explicit error or warning.
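The mixing step reduces to a per-sample sum with the microphone gain applied first. The helper would implement this natively; the sketch below only shows the arithmetic, and it assumes both inputs are float PCM already converted to the same sample rate and channel layout. The name mixFloatPcm and the hard clamp are assumptions, not OpenScreen's actual gain policy.

```typescript
// Mixes system and microphone float PCM into one buffer.
// The shorter input is treated as silence past its end.
function mixFloatPcm(
  system: Float32Array,
  mic: Float32Array,
  micGain: number,
): Float32Array {
  const sampleCount = Math.max(system.length, mic.length);
  const mixed = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    const s = i < system.length ? system[i] : 0;
    const m = i < mic.length ? mic[i] * micGain : 0;
    // Hard clamp to the valid float PCM range to avoid wrap-around distortion.
    mixed[i] = Math.max(-1, Math.min(1, s + m));
  }
  return mixed;
}
```

A hard clamp is the simplest safe choice; a production mixer might prefer a soft limiter so loud overlapping sources do not clip audibly.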
4. Webcam Capture
- Add Media Foundation webcam source reader.
- Select 1280x720/30fps or nearest supported format.
- Encode webcam to recording-<id>-webcam.mp4.
- Synchronize webcam timestamps to the native session clock.
- Store webcamVideoPath in the OpenScreen session manifest.
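Selecting "1280x720/30fps or nearest" needs a definition of nearest. One possible definition, shown here as a sketch, scores each supported format by its relative difference in pixel count plus its relative difference in frame rate. The equal weighting and the names WebcamFormat, formatDistance, and pickWebcamFormat are assumptions.

```typescript
interface WebcamFormat {
  width: number;
  height: number;
  fps: number;
}

// Lower is closer. Both terms are relative, so neither dominates by unit.
function formatDistance(candidate: WebcamFormat, target: WebcamFormat): number {
  const targetPixels = target.width * target.height;
  const pixelDiff =
    Math.abs(candidate.width * candidate.height - targetPixels) / targetPixels;
  const fpsDiff = Math.abs(candidate.fps - target.fps) / target.fps;
  return pixelDiff + fpsDiff;
}

// Returns the supported format closest to the target, or null if none exist.
function pickWebcamFormat(
  supported: WebcamFormat[],
  target: WebcamFormat,
): WebcamFormat | null {
  let best: WebcamFormat | null = null;
  let bestScore = Infinity;
  for (const candidate of supported) {
    const score = formatDistance(candidate, target);
    if (score < bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

An exact match scores zero and always wins. With this weighting, a device offering only 640x480 and 1920x1080 at 30 fps would get 640x480, because its pixel count is relatively closer to 1280x720.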
Acceptance:
- Editor loads the native screen recording and the native webcam recording.
- Webcam layout controls behave the same as today.
5. Native Window Capture
- Resolve Electron window:* selections to an HWND.
- Use WGC CreateForWindow(HWND).
- Handle window close, minimize, resize, DPI scaling, and monitor moves.
- Return clear errors for unsupported protected windows.
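Resolving a window:* selection could start with parsing the Electron source id. Electron documents window source ids as window:XX:YY, where XX is the window id or handle. Treating XX as a decimal HWND on Windows is an assumption that should be verified against the Electron version in use; parseWindowHandle is a hypothetical name.

```typescript
// Extracts the numeric window handle from an Electron source id.
// Returns null for screen sources, malformed ids, and a zero handle.
function parseWindowHandle(sourceId: string): number | null {
  const match = /^window:(\d+):\d+$/.exec(sourceId);
  if (match === null) return null;
  const handle = parseInt(match[1], 10);
  // Zero is never a valid HWND.
  return handle > 0 ? handle : null;
}
```

The parsed value would populate source.windowHandle in the helper request; the helper should still validate the handle natively, because the window can close between selection and capture.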
Acceptance:
- Capturing a normal app window works with cursor/audio/mic/webcam.
- Window resize and movement do not corrupt the recording.
6. Runtime Controls
- Add pause/resume commands to the helper.
- Add cancel command that removes partial screen/webcam outputs.
- Keep restart as stop-discard-start from Electron until the helper supports a native restart event.
Acceptance:
- Pause/resume keeps preview duration coherent.
- Cancel leaves no stale media/session/cursor files.
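Keeping preview duration coherent across pause/resume means counting only the time spent actively recording. A minimal sketch of that bookkeeping follows; ActiveDurationTracker is a hypothetical name, and timestamps are assumed to come from a single monotonic clock.

```typescript
// Tracks recorded (non-paused) time so the reported duration excludes pauses.
class ActiveDurationTracker {
  private accumulatedMs = 0;
  private resumedAtMs: number | null = null;
  private started = false;

  start(nowMs: number): void {
    this.accumulatedMs = 0;
    this.resumedAtMs = nowMs;
    this.started = true;
  }

  pause(nowMs: number): void {
    if (this.resumedAtMs === null) return; // not started or already paused
    this.accumulatedMs += nowMs - this.resumedAtMs;
    this.resumedAtMs = null;
  }

  resume(nowMs: number): void {
    if (!this.started || this.resumedAtMs !== null) return; // nothing to resume
    this.resumedAtMs = nowMs;
  }

  // Total active recording time as of nowMs.
  activeMs(nowMs: number): number {
    const running = this.resumedAtMs === null ? 0 : nowMs - this.resumedAtMs;
    return this.accumulatedMs + running;
  }
}
```

For a session started at 1000 ms, paused at 3000 ms, and resumed at 8000 ms, activeMs(9000) is 3000: two seconds before the pause plus one second after it.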
7. Test Pipeline
- npm run test:wgc-helper:win: display-only helper smoke test.
- npm run test:wgc-audio:win: validates AAC track presence and duration.
- npm run test:wgc-window:win: captures a fixture window by HWND.
- npm run test:wgc-webcam:win: validates webcam output when a webcam is available, otherwise skips explicitly.
- Packaging check: confirms the helper is in app.asar.unpacked.
- Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio.
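The packaging check could be backed by two small path helpers. This is a sketch: toUnpackedPath and isInUnpackedDir are hypothetical names, and the example path and helper file name are illustrative only.

```typescript
// Rewrites a path inside app.asar to its app.asar.unpacked counterpart.
// A path that already points into app.asar.unpacked is returned unchanged,
// because the pattern requires a separator directly after "app.asar".
function toUnpackedPath(filePath: string): string {
  return filePath.replace(/([\\\/])app\.asar([\\\/])/, "$1app.asar.unpacked$2");
}

// True when the path has an app.asar.unpacked directory segment.
function isInUnpackedDir(filePath: string): boolean {
  return /[\\\/]app\.asar\.unpacked[\\\/]/.test(filePath);
}
```

A packaging test could resolve the helper path the same way the app does, assert isInUnpackedDir on it, and then confirm the file exists on disk.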
Ship Criteria
- Windows display capture works with cursor, system audio, microphone, and webcam.
- Windows window capture works with cursor, system audio, microphone, and webcam.
- Preview and export show no cursor position drift.
- Preview and export show no measurable audio/video/webcam drift.
- Windows production builds do not depend on Electron capture fallback.