Using dynamic VHDx for IO intensive VMs may not be a good idea

For the hasty reader:
Using dynamic VHDx for IO-intensive workloads generates high CPU usage in the management OS, but only on the first three cores. Using Sysinternals Process Explorer, we found exactly three threads in the “System” process (ID 4), all labelled “vmbusr.sys”, that are the root of the CPU usage. We looked into RSS, VMQ and other things. In the end, the huge load went away when we changed from dynamic to fixed VHDx.

The longer story:
During the testing phase of our Private Cloud environment we also ran IO tests using SQLIO in up to 30 VMs on the hypervisor machines. The hypervisors talk SMB3 to the storage environment. We ran tests with 8k IOs and 40-80k IOs. Every time, as soon as the VMs started doing the heavy SQLIO-based IO, cores 0 to 2 in the management OS were pegged at full load:
[Screenshot 20140626-01: cores 0 to 2 fully loaded in the management OS]
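The pattern we saw (a few cores pegged, the rest almost idle) is easy to check for once you have per-core utilisation numbers, however you collect them. Below is a minimal Python sketch of such a check. The function name `hot_cores`, the thresholds and the sample values are all made up for illustration; they are not measurements from our hosts.

```python
from statistics import median

def hot_cores(per_core_load, ratio=3.0, floor=50.0):
    """Return the indices of cores that look 'hot'.

    A core counts as hot when its load (in percent) is at least `floor`
    and more than `ratio` times the median load of all cores.
    Both thresholds are arbitrary illustration values.
    """
    baseline = median(per_core_load)
    return [
        index
        for index, load in enumerate(per_core_load)
        if load >= floor and load > ratio * baseline
    ]

# Hypothetical sample: 16 cores, the first three pegged, the rest nearly idle.
sample = [98.0, 97.0, 99.0] + [4.0] * 13
print(hot_cores(sample))
```

Comparing against the median rather than the mean keeps the baseline from being dragged up by the hot cores themselves. A host where all cores are equally busy returns an empty list, which is what we would expect from a healthy load distribution.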
Looking at the process tree in Process Explorer, we found that the System process (ID 4) showed this load in its Threads tab:
[Screenshot 20140626-02: Threads tab of the System process]

That didn’t help much, as there is very little information on the web about the Hyper-V VMBus. One source is the architecture description on MSDN, which says:

VMBus – Channel-based communication mechanism used for inter-partition communication and device enumeration on systems with multiple active virtualized partitions. The VMBus is installed with Hyper-V Integration Services.

Another is Kristian Nese’s blog post, written for Hyper-V in W2K8, but the basics should still hold.

Not very enlightening (pun intended) for our case: why should the VMs doing the SQLIO workload talk to each other? Maybe device enumeration? We tried to attack the problem from various angles: varying the size of the SQLIO blocks, tuning SMB network interface parameters, and changing VMQ settings (although these VMs weren’t doing guest network traffic). In a quiet moment my colleague Christoph tried running SQLIO with a bunch of VMs that were slightly different from the others. Tada: normal CPU load distribution across all cores! The difference was easily found: the VHDs were of fixed size. We still have to find out whether there is a number of VMs per host below which the strange behaviour does not show up.

The bad news: this already happens with only 5 VMs. We have not done a full comparison test, but the high VMBus load also seems to limit the IO a VM with dynamic VHDx can do.
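For anyone who wants to try the same switch from dynamic to fixed: one way to do it is the Hyper-V PowerShell cmdlet `Convert-VHD` with `-VHDType Fixed`. The Python sketch below only builds the command line and does not run it. The helper name `convert_to_fixed_command` and the UNC paths are hypothetical, and this is not necessarily how we converted our own disks. As far as we know, the conversion is an offline operation, so the disk must not be attached to a running VM when it starts.

```python
def convert_to_fixed_command(src, dst):
    """Build (but do not run) a PowerShell call that converts a VHDX
    file to a fixed-size copy using the Hyper-V cmdlet Convert-VHD."""

    def quote(path):
        # PowerShell single-quoted strings escape ' by doubling it.
        return "'" + path.replace("'", "''") + "'"

    script = "Convert-VHD -Path {} -DestinationPath {} -VHDType Fixed".format(
        quote(src), quote(dst)
    )
    return ["powershell.exe", "-NoProfile", "-Command", script]

# Hypothetical paths on an SMB3 share; replace with your own.
command = convert_to_fixed_command(
    r"\\fs01\vms\sql01.vhdx",
    r"\\fs01\vms\sql01-fixed.vhdx",
)
print(command[3])
```

The returned list can be handed to `subprocess.run(command, check=True)` on the Hyper-V host. Since the conversion writes a new file, the VM still has to be pointed at the fixed copy afterwards.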

Any helpful comments, hints or tricks are highly welcome.

Cheers,
Peter.

Edit 2014-06-30: Post from a guy with a similar issue