# Windows Native Recorder Roadmap OpenScreen's Windows recorder should be owned by one native backend. Electron capture can remain available for non-Windows platforms and temporary developer diagnostics, but Windows production recording should not silently fall back to `getDisplayMedia` / `MediaRecorder`. ## Goals - Capture displays and windows through Windows Graphics Capture (WGC). - Render the native Windows cursor as OpenScreen's high-quality scalable cursor overlay. - Capture system audio through WASAPI loopback. - Capture microphone audio through WASAPI. - Mix system audio and microphone audio into the primary screen recording. - Capture webcam video natively and compose it into the Windows helper MP4 during the native-recording migration. - Keep preview/export aligned because screen video, audio, webcam, and cursor share one native timing origin. - Keep exported MP4s Windows-friendly: H.264 video plus AAC audio. Opus-in-MP4 is not an acceptable Windows export target. - Package the native helper with the Windows app. ## Non-Goals - Replacing the editor/export pipeline. - Replacing the editor/export pipeline. A later pass can reintroduce a separate editable native `webcamVideoPath`; the current Windows-native milestone prioritizes a helper-owned multi-flux MP4 with deterministic screen/audio/mic/webcam sync. - Adding a native fallback for macOS or Linux in this branch. ## Target Architecture The renderer keeps the existing recording controls. On Windows, `useScreenRecorder` sends a complete recording request to Electron and does not assemble Windows `MediaStream` tracks with `MediaRecorder`. Electron owns the native recording session: - resolves the selected source; - resolves output paths; - starts cursor sampling; - starts the helper process; - sends pause/resume/stop/cancel commands; - writes `RecordingSession` manifests; - reports explicit errors when a Windows-native capability is unavailable. The helper owns Windows media capture: - WGC screen/window frames; - WASAPI system loopback; - WASAPI microphone input; - Media Foundation webcam capture; - DirectShow webcam fallback for virtual cameras not visible to Media Foundation; - Media Foundation encoding/muxing; - stream timestamp normalization. ## Helper Contract V2 The helper receives a single JSON argument: ```json { "schemaVersion": 2, "recordingId": 1234567890, "source": { "type": "display", "sourceId": "screen:0:0", "displayId": 123, "windowHandle": null, "bounds": { "x": 0, "y": 0, "width": 1920, "height": 1080 } }, "video": { "fps": 60, "width": 1920, "height": 1080, "bitrate": 18000000 }, "audio": { "system": { "enabled": true }, "microphone": { "enabled": true, "deviceId": "default", "gain": 1.4 } }, "webcam": { "enabled": true, "deviceId": "default", "deviceName": "Camera (NVIDIA Broadcast)", "width": 1280, "height": 720, "fps": 30, "bitrate": 18000000 }, "outputs": { "screenPath": "C:\\Users\\me\\recording-123.mp4", "manifestPath": "C:\\Users\\me\\recording-123.session.json" } } ``` The helper emits newline-delimited JSON events to stdout: ```json { "event": "ready", "schemaVersion": 2 } { "event": "recording-started", "timestampMs": 1234567890 } { "event": "warning", "code": "audio-device-unavailable", "message": "..." } { "event": "recording-stopped", "screenPath": "..." } { "event": "error", "code": "unsupported-window-source", "message": "..." } ``` During migration, Electron also accepts the current textual helper messages so existing display-only smoke tests keep working. ## Implementation Phases ### 1. Native Session Boundary - Add a structured Windows native recording request type. - Pass source kind, audio flags, microphone device, webcam flags, and output paths into the helper. - On Windows, do not silently fall back to Electron capture. If the helper is unavailable or a native feature is missing, show a clear error. - Keep Electron fallback only for non-Windows and optional developer diagnostics. Acceptance: - Display-only recording still works. - Enabling an unsupported native feature returns an explicit native error instead of recording through Electron. ### 2. WASAPI System Audio Status: initial implementation landed. The helper captures the default render endpoint with WASAPI loopback, passes the runtime mix format into `MFEncoder`, and muxes AAC audio into the primary MP4. Long-run drift correction and explicit silence insertion remain follow-up hardening work. - Add `WasapiLoopbackCapture`. - Capture the default render endpoint in shared loopback mode. - Keep `WasapiLoopbackCapture` responsible only for device activation, packet capture, and packet timestamps. - Keep `MFEncoder` responsible for all Media Foundation stream definitions and muxing. - Feed the endpoint mix format into `MFEncoder` as the single source of truth for audio stream shape: sample rate, channel count, bits per sample, block alignment, average bytes/sec, and subtype (`PCM` or `Float`). - Encode the primary screen MP4 with H.264 video and AAC audio through one `IMFSinkWriter`. - Timestamp audio from the captured frame count in 100ns units. The first implementation uses the WASAPI packet timeline; later drift correction will add explicit silence or resampling if long recordings show measurable clock skew. - Treat microphone mixing as a later phase. System loopback must land first without introducing renderer-side audio code. Acceptance: - Screen MP4 has an AAC audio track when system audio is enabled. - A 5-minute recording has audio/video duration drift below one frame. SSOT rules for this phase: - `src/lib/nativeWindowsRecording.ts` is the renderer/main TypeScript request contract. - `docs/engineering/windows-native-recorder-roadmap.md` is the feature-level contract and phase checklist. - `WgcSession::captureWidth()/captureHeight()` is the encoded screen frame size until a dedicated native scaling stage exists. - `WasapiLoopbackCapture::inputFormat()` is the runtime audio format source used by `MFEncoder`. - The renderer passes both the browser webcam `deviceId` and selected display label as `deviceName`; `electron/native/wgc-capture/src/webcam_capture.*` is the only place that maps those values to Media Foundation devices. - Electron resolves the selected label to a DirectShow filter CLSID once and passes it as `webcamDirectShowClsid`; the helper must not independently guess among DirectShow filters. - No duplicated hard-coded audio format assumptions in `main.cpp`. ### 3. WASAPI Microphone Status: initial implementation in progress. The helper can open the default WASAPI capture endpoint, apply the OpenScreen microphone gain, encode mic-only audio, and mix system loopback plus microphone through a single queued `AudioMixer` timeline when both endpoints expose the same runtime format. Audio endpoints are warmed before WGC starts, the mixer drops pre-roll and begins its paced timeline on the first encoded video frame, then cuts queued tail audio on stop so the MP4 does not drift past the video. Browser `deviceId` to MMDevice id mapping, resampling between mismatched endpoint formats, and drift correction remain follow-up hardening work. - Add microphone device enumeration and stable device-id mapping. - Capture selected/default microphone through WASAPI. - Apply OpenScreen's current mic gain policy. - Mix microphone and system audio before AAC encoding. Acceptance: - Mic-only, system-only, and mixed audio recordings produce a valid AAC track. - Device unplug/permission failure produces an explicit error or warning. ### 4. Webcam Capture - Add Media Foundation webcam source reader. - Select requested dimensions/fps or the nearest format accepted by Media Foundation. - Convert webcam samples to BGRA and compose them into the primary helper MP4 as an initial bottom-right picture-in-picture overlay. - Ignore black webcam warmup frames and keep the overlay hidden until the first visible frame is available, so virtual cameras do not flash a black picture-in-picture rectangle at recording start. - Keep the helper process as the SSOT for screen/window, WASAPI system audio, microphone, webcam, and mux timing. - Match the requested webcam through Media Foundation friendly names first, then browser device ids/symbolic links, so UI selection remains stable across Chromium and Windows native device namespaces. - Use the Electron-resolved DirectShow CLSID when the selected virtual camera, for example NVIDIA Broadcast, is registered for DirectShow but absent from Media Foundation enumeration. - Later: promote the same webcam capture source to a separate editable native `webcamVideoPath` if product requirements need post-recording layout edits. Acceptance: - Native display/window recordings can include webcam without returning to Electron capture. - `npm run test:wgc-webcam:win` validates the helper path when a webcam is available and skips explicitly when no webcam device exists. - Combined webcam + system audio + microphone produces one MP4 with H.264 video and AAC audio. ### 5. Native Window Capture Status: initial implementation in progress. Electron parses the `window::...` desktop source id through the shared native Windows recording contract and passes `windowHandle` to the helper. The helper resolves the `HWND`, validates it with `IsWindow`, and creates the WGC item with `CreateForWindow(HWND)`. Resize/minimize/move hardening and protected-window diagnostics remain follow-up work. - Resolve Electron `window:*` selections to an `HWND`. - Use WGC `CreateForWindow(HWND)`. - Handle window close, minimize, resize, DPI scaling, and monitor moves. - Return clear errors for unsupported protected windows. Acceptance: - Capturing a normal app window works with cursor/audio/mic/webcam. - Window resize and movement do not corrupt the recording. ### 6. Runtime Controls - Add pause/resume commands to the helper. - Add cancel command that removes partial screen/webcam outputs. - Keep restart as stop-discard-start from Electron until the helper supports a native restart event. Acceptance: - Pause/resume keeps preview duration coherent. - Cancel leaves no stale media/session/cursor files. ### 7. Test Pipeline - `npm run test:wgc-helper:win`: display-only helper smoke test. - `npm run test:wgc-audio:win`: validates AAC track presence and duration. - `npm run test:wgc-window:win`: captures a fixture window by HWND. - `npm run test:wgc-webcam:win`: validates webcam output when a webcam is available, otherwise skips explicitly. - Packaging check: confirms the helper is in `app.asar.unpacked`. - Export check: exported MP4s generated from native recordings keep an AAC audio track when the source has audio. - `npm run test:wgc-mic:win`: validates default-microphone capture writes an AAC track when an input endpoint is available. - `npm run test:wgc-mixed-audio:win`: validates system loopback plus microphone writes one mixed AAC track when endpoint formats are compatible. ## Backlog ### Native Cursor Click Bounce Is Not Visibly Applied Status: open. Do not treat Windows native cursor `Click Bounce` as shipped. Problem: - The cursor settings UI exposes `Size`, `Smoothing`, `Motion Blur`, and `Click Bounce`. - On Windows native cursor recordings, `Size`, `Smoothing`, and `Motion Blur` are visibly applied in preview/export. - `Click Bounce` still has no visible effect in manual packaged-app testing, even after adding click-related sample metadata. What has already been tried: - Added `interactionType: "click" | "mouseup" | "move"` to native cursor samples. - Added polling-based left-button state through `GetAsyncKeyState`. - Added the `GetAsyncKeyState` low-bit path to catch quick clicks between samples. - Added a PowerShell/C# `WH_MOUSE_LL` mouse hook experiment and launched the sampler through a temporary `.ps1` file to avoid Windows command-line length limits. - Updated `npm run test:cursor-native:win` so the diagnostic can observe a synthetic short click and emit `clickSampleCount`. Current diagnosis: - The diagnostic can observe synthetic click events, but this has not translated into a visible `Click Bounce` effect in the real packaged app. - The test currently proves that some click metadata can be recorded, not that the full OpenScreen record -> preview -> export path displays a bounce at the expected time. - The current native implementation may be animating from metadata that is not present in the real recording session, may be using the wrong timestamp origin, or may be applying a scale change too subtle to notice on the DOM/native cursor path. Next investigation when resumed: - Inspect the actual `.cursor.json`/session sidecar generated by a packaged-app manual recording and confirm whether real clicks produce `interactionType: "click"` at the right `timeMs`. - Add a targeted end-to-end fixture that records a known click, loads the generated project, and asserts the preview/export cursor scale changes across adjacent frames. - Compare the native DOM cursor path against the older `PixiCursorOverlay` click visual state and decide whether native cursor bounce should be a scale-only animation, an additional click ring, or a short explicit keyframe animation independent of sample cadence. - If event capture remains unreliable in the PowerShell sampler, move click events into a small native cursor helper instead of PowerShell/C# script injection. ## Ship Criteria - Windows display capture works with cursor, system audio, microphone, and webcam. - Windows window capture works with cursor, system audio, microphone, and webcam. - Preview and export show no cursor position drift. - Preview and export show no measurable audio/video/webcam drift. - Windows production builds do not depend on Electron capture fallback.