
Windowed mode, vsync stutter, DWM (Vista+)



Nicholas Steel
10-24-2014, 07:41 PM
This topic discusses how software developers can eliminate most causes of stutter in windowed programs under Windows Vista and newer while an Aero desktop theme is in use (so the compositor is handling vsync), basically making vsync as stable and stutter-free as it is under Windows XP and older!

Below is my understanding of why windowed programs under Windows Vista and newer tend to suffer from erroneous/excessive stutter while using an Aero desktop theme (compared to running the software under Windows XP and older). Included is an explanation of a simple change that can be made to existing vsync code to avoid the problems. The change should be "fairly easy" to implement (I'm not a programmer, so I don't know how easy it actually is).

-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
Windows Vista and newer Windows operating systems use a compositor (the Desktop Window Manager, DWM) which handles vsync for the desktop and for programs that aren't running in exclusive fullscreen mode. Under Windows Vista and 7 the compositor only handles vsync while an Aero desktop theme is in use; under Windows 8 it always handles vsync, regardless of desktop theme.

There is no surefire way to detect exactly when vblank will occur, because the display/GPU isn't necessarily required to generate a vblank interrupt (and might not generate one reliably even if it did). So programs (including the compositor?) will generally poll for vblank.

Programs present video frames to the video hardware during vblank; once vblank has elapsed, the new video frame is displayed. With the compositor in Windows Vista and later, the program will still detect vblank and only present frames during vblank, because it still thinks it is presenting video frames to the video hardware, when in reality the video frame is being silently redirected to the compositor.

Frames sent to the compositor by any running programs are pooled by the compositor. When the compositor detects vblank, it presents all the pooled video frames to the video hardware during vblank, and when vblank elapses the new video frame for each program is displayed. Any video frame sent after the compositor has stopped gathering frames for the upcoming vblank period, and before the subsequent vblank period completes, may not be shown (see 1a and 1b below).

Problems that can occur with this system:

1a) Programs detect vblank by polling for it and only present new video frames to the hardware during vblank. It is possible for the program to present a new video frame after the compositor has finished collecting video frames in preparation for the upcoming vblank, and thus the new video frame from the program will not be displayed (assuming the compositor stops accepting new frames once it has done its thing and begins waiting for vblank). This causes the previous video frame to be displayed for twice as long. (This behaviour might only happen if there is only one back buffer and the program failed to present a new frame.)

1b) Alternatively, the new video frame will be accepted by the compositor, but because it's delayed until the next composition occurs, it could be overwritten by a newer video frame before that composition happens. (So you end up with a skipped frame instead of displaying the previous one again.)

2) In addition to problem 1a, the program's vsync implementation runs the risk of simply failing to detect vblank, which results in no new video frame being presented to the hardware (silently re-routed to the compositor), and thus the previous video frame is displayed for twice as long.

3) There is also a chance for the program to fail to detect vblank (problem 2) right after failing to present a frame in time for the compositor (problem 1a), resulting in a video frame that was first displayed 2 vblanks prior being displayed for a 3rd time in a row!

As you can see, this polling setup is not ideal when the compositor is present. There are multiple problems that can cause a new video frame to fail to be displayed, so that a previous video frame is shown for longer than intended, and those problems can combine so that an old video frame stays on screen for a very long time! Thanks Mr. Compositor, you're doing a great job (sarcasm)!
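For anyone who wants to see the effect in numbers, here is a toy model of the problems listed above. It is only a sketch under my own assumptions: a 60 Hz refresh, a compositor that keeps just the newest frame it has received by each composition deadline, and made-up names (compose, count_glitches). It is not how DWM is actually implemented.

```python
from bisect import bisect_right

REFRESH_US = 16_667  # assumed: one 60 Hz refresh period, in microseconds


def compose(present_times, deadlines):
    """Return the index of the frame shown at each composition deadline.

    present_times must be sorted ascending. Assumed model: the compositor
    shows the newest frame presented at or before each deadline, so an
    older pending frame is overwritten by a newer one.
    """
    shown = []
    for deadline in deadlines:
        newest = bisect_right(present_times, deadline) - 1
        shown.append(newest if newest >= 0 else None)
    return shown


def count_glitches(shown):
    """Return (number of repeated compositions, list of frames never shown)."""
    visible = [frame for frame in shown if frame is not None]
    duplicates = sum(1 for a, b in zip(visible, visible[1:]) if a == b)
    skipped = sorted(set(range(visible[-1] + 1)) - set(visible)) if visible else []
    return duplicates, skipped


# Five frames, each presented 1,667 us before the next deadline, except
# frame 2, which arrives 3,000 us late and misses its composition pass.
presents = [n * REFRESH_US + 15_000 for n in range(5)]
presents[2] += 3_000
deadlines = [k * REFRESH_US for k in range(1, 6)]

shown = compose(presents, deadlines)
print(shown)                  # [0, 1, 1, 3, 4]
print(count_glitches(shown))  # (1, [2])
```

In this run, one late present makes frame 1 stay on screen for two refreshes and frame 2 never appears at all.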

The solution? Present new video frames immediately, and after presenting a new video frame, immediately call a command that blocks until the compositor has completed its task. This ensures that at most 1 new video frame is presented to the compositor between each vblank period. While you're at it, you can scrap trying to detect vblank, because that is the compositor's job now.

Now a video frame will only be displayed for longer than it should be if the compositor misses vblank. (Don't ditch vblank detection if you intend to offer a fullscreen exclusive display mode as an option to the user.)

Windows XP has no compositor, so the commands needed to work with it aren't available there, and you will need to retain the old method of vsync detection as a fallback for those users. Another words, the changes required will be minor additions to your existing code and won't affect backwards compatibility with Windows XP, so why WOULDN'T you handle stuff this way under Vista and newer?
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------

What do you think, devs? Would you like to try to incorporate the needed changes? I believe the command you need to call immediately after presenting a video frame is DwmFlush()

Windows XP doesn't have the command, so you'll need the old polling method as a fallback. That way, vsync support under Vista and newer will finally be on par with vsync support under Windows XP and older, even while using an Aero desktop theme!

This change will also mean I can finally run Zelda Classic in a windowed mode with an Aero theme and not suffer from erroneous stutter that isn't caused by vsyncing at the wrong refresh rate for the software.

Edit:
In addition to the (edited) above, here is an update from the guy explaining it to me:


I made some comments on your post (in red) above. It might be good to give an example of the change - you're basically moving to:
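if (DWM.isEnabled()) {
    present();         // hand the frame to the compositor right away
    DWM.DwmFlush();    // block until the compositor has finished its pass
} else {
    waitForVBlank();   // no compositor: poll for vblank as before
    present();
}
... where you might want to cache the 'isEnabled()' state (though it can change over time, so beware).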
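To make that pseudocode a bit more concrete, here is a minimal sketch in Python using ctypes. DwmIsCompositionEnabled and DwmFlush are real functions in dwmapi.dll; everything else (the function names, the present and wait_for_vblank callbacks, the composited override) is hypothetical glue for illustration. A real program would more likely make these calls from C or C++.

```python
import ctypes
import sys


def dwm_composition_enabled():
    """Return True if the DWM compositor is active; False elsewhere or on error."""
    if sys.platform != "win32":
        return False
    try:
        enabled = ctypes.c_int(0)  # BOOL out-parameter
        hr = ctypes.windll.dwmapi.DwmIsCompositionEnabled(ctypes.byref(enabled))
    except (OSError, AttributeError):
        return False  # dwmapi.dll is missing, e.g. on Windows XP
    return hr == 0 and bool(enabled.value)  # 0 is S_OK


def dwm_flush():
    """Block until the compositor finishes its pass; return True on success."""
    if sys.platform != "win32":
        return False
    try:
        return ctypes.windll.dwmapi.DwmFlush() == 0
    except (OSError, AttributeError):
        return False


def show_frame(present, wait_for_vblank, composited=None):
    """Present one frame, choosing the path as in the pseudocode above.

    present and wait_for_vblank are hypothetical callbacks supplied by the
    renderer. composited overrides detection, which is handy for testing.
    """
    if composited is None:
        composited = dwm_composition_enabled()
    if composited:
        present()    # hand the frame to the compositor immediately
        dwm_flush()  # then wait for the compositor to finish
        return "dwm"
    wait_for_vblank()  # legacy path: poll for vblank, then present
    present()
    return "vblank"
```

This sketch asks DwmIsCompositionEnabled on every frame rather than caching the answer, which sidesteps the state changing while the program runs, at the cost of one extra call per frame.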

Another problem is that if you use the audio driver to synchronize your emulator, as higan does, this will block execution for at least 10ms at a time. Those are the increments in which Windows' audio service drains the buffers (except in exclusive-mode WASAPI, where you can set it yourself, down to 3ms or so). So if you disable video synchronization or move it off the main thread, you'll get video frames once every 10 or 20ms, depending on whether emulation progressed far enough to make one available before waiting on the audio driver for a buffer to become available.
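The "10 or 20ms" figure can be checked with a little arithmetic. The sketch below assumes a 60 Hz emulated frame rate (16,667 us per frame) and assumes each finished frame is only released at the next 10ms audio block boundary. The function name and the model are my own simplification, not higan's actual code.

```python
def frame_delivery_intervals(frame_count, frame_us=16_667, audio_block_us=10_000):
    """Gaps between video frames when each waits for the next audio block boundary."""
    deliveries = []
    for n in range(1, frame_count + 1):
        ready = n * frame_us                        # emulation finishes frame n
        blocks = -(-ready // audio_block_us)        # ceiling division
        deliveries.append(blocks * audio_block_us)  # released at that boundary
    return [later - earlier for earlier, later in zip(deliveries, deliveries[1:])]


print(frame_delivery_intervals(6))  # [20000, 20000, 10000, 20000, 20000]
```

So instead of a steady 16.7ms cadence, the frames come out in an uneven mix of 20ms and 10ms gaps.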

One way to address that is to systematically insert smaller pauses (e.g. Sleep(1) with timeBeginPeriod set to 1ms), so you don't feed the audio driver in 10ms bursts. Of course you have to be careful that you don't underrun your audio buffers while doing this. My memory is a little hazy on how I implemented this in my 'vsync driver' build, but I'm sure other programmers can figure it out.
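One possible reading of that idea, as a sketch only: poll the audio buffer and sleep about 1ms between polls, instead of letting the driver block for a whole 10ms. timeBeginPeriod and timeEndPeriod are real winmm.dll functions. The buffer_has_room and write_chunk callbacks and both function names are hypothetical, and this is not the 'vsync driver' build mentioned above.

```python
import contextlib
import ctypes
import sys
import time


@contextlib.contextmanager
def timer_resolution(ms=1):
    """Request a finer Windows timer resolution for the duration of the block."""
    active = False
    if sys.platform == "win32":
        # timeBeginPeriod returns 0 (TIMERR_NOERROR) on success.
        active = ctypes.windll.winmm.timeBeginPeriod(ms) == 0
    try:
        yield
    finally:
        if active:
            ctypes.windll.winmm.timeEndPeriod(ms)  # always undo the request


def feed_audio(buffer_has_room, write_chunk, chunks, pause_s=0.001):
    """Write chunks, sleeping briefly while the buffer is full.

    Returns the number of short pauses taken. Both callbacks are
    hypothetical stand-ins for whatever the audio backend provides.
    """
    pauses = 0
    for chunk in chunks:
        while not buffer_has_room():
            time.sleep(pause_s)  # short nap instead of a long driver wait
            pauses += 1
        write_chunk(chunk)
    return pauses
```

Usage would be something like `with timer_resolution(1): feed_audio(...)`. The chunks need to be small enough, and the buffer deep enough, that the sleeps never let the buffer run dry.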

Saffith
10-24-2014, 08:10 PM
It's DwmFlush(), and we already use it. It's disabled by default because it causes performance issues on some systems. The option is use_dwm_flush in ag.cfg.

Nicholas Steel
10-24-2014, 08:22 PM
Okay. As an aside, did you think my explanation was well written?

Saffith
10-28-2014, 01:18 PM
I suppose, but I'm not much of a writing critic. That "another words" is bugging me, though. :p

Nicholas Steel
10-29-2014, 01:46 PM
Well there is always this, but it's afaik a little too easy to read:


There is no surefire way to detect exactly when VBlank will occur because Windows doesn't expose a VBlank interrupt, and displays/GPUs aren't necessarily required to generate one anyway (in addition, 'current scanline' information as given by e.g. IDirect3DDevice9::GetRasterStatus may not be accurate). As a result, programs generally poll for VBlank or rely on Direct3D/OpenGL to do it for them.

Programs present video frames during VBlank to avoid tearing, since the monitor will happily switch to the new frame mid-draw. With the compositor in Windows Vista and later versions of Windows, these programs will still detect VBlank and only present frames during it, as they think they are presenting video frames directly, when in reality the video frames are feeding into the compositor first. Frames sent to the compositor (from any running programs on the PC) will be queued up by the compositor, and merged together to be swapped/copied into place during VBlank.

Problems that can occur with this system:

1) A program polling for VBlank may miss composition. This will cause the frame to be queued up for the next composition, meaning the previous frame will be shown twice as long.

2) Worse, the next frame may not miss composition, and end up overwriting the previously queued up frame - so you end up with a duplicate frame followed by a skipped frame.

3) The program's VSync implementation may naturally fail to detect VBlank (which has only a short duration), causing it to wait until the next VBlank and risk problems 1 and/or 2.

4) These problems may even combine to generate a 'perfect storm' of duplicate and/or missed frames.

As you can see, this polling setup is not ideal, and far worse when a compositor is present. There are multiple problems that can cause a new video frame to fail to be displayed, causing a previous frame to be displayed for longer than intended and potentially skipping the new frame altogether!

The solution is to work with the compositor instead of against it. Present a new video frame immediately and afterward, call a command that will block until the compositor has completed its task (DwmFlush). This will ensure that at most 1 new video frame is presented to the compositor between each VBlank period. As long as the compositor is active, you also won't have to worry about polling for VBlank yourself anymore.
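Here is a toy timeline of that present-then-block loop, to show why at most one frame lands in each composition pass. The assumptions are mine: a 60 Hz refresh, a blocking call that returns right after the first composition pass following the present, and a made-up function name. Real DwmFlush timing may differ.

```python
REFRESH_US = 16_667  # assumed: one 60 Hz refresh period, in microseconds


def flush_paced_presents(render_times_us, refresh_us=REFRESH_US):
    """Return (present_time, composition_pass) per frame for a present-then-block loop."""
    now = 0
    schedule = []
    for render_us in render_times_us:
        now += render_us                    # time spent producing the frame
        pass_index = now // refresh_us + 1  # first composition pass after the present
        schedule.append((now, pass_index))
        now = pass_index * refresh_us       # the blocking call returns after that pass
    return schedule


schedule = flush_paced_presents([4_000, 9_000, 20_000, 5_000])
print([pass_index for _, pass_index in schedule])  # [1, 2, 4, 5]
```

Every frame gets its own pass, so nothing is overwritten. The third frame takes 20ms to render, so pass 3 has to repeat the previous frame, but no frame is skipped.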

Of course, since Windows XP doesn't have a compositor and not all users on Vista and 7 run with the compositor enabled, you will need to retain the old method of VSync detection as a fallback. But the changes required will be minor additions to your existing code, so why not handle things this way under Vista and newer?
