Solving a problem with the “Get-VM” cmdlet

Environment:

  • Domain “nice.domain”
  • script server “server1.nice.domain”
  • Hyper-V hypervisor “hyperv1.nice.domain”
  • all Windows Server 2012 R2
  • script user “nice\script”
  • script user is member of “Hyper-V Administrators” group on hyperv1.
  • script user has the “Log on as a batch job” right on server1

Scenario 1:
Running

    Get-VM -ComputerName hyperv1

in a PowerShell console window on server1 (which was started with “run as different user”) will show a nice list of VMs on the hypervisor.

Scenario 2:

  • Running the same Get-VM command in a scheduled task script that runs as the script user on server1 will return NOTHING. No output, and also no exception thrown. (A sketch of the task setup follows below.)
  • Other Hyper-V cmdlets seem to work in the same scheduled task.
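For reference, the task is registered roughly like this (script path and task name are made up here; the mechanism is what matters):

    # define what to run: PowerShell executing the inventory script
    $action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
        -Argument '-NoProfile -File C:\Scripts\vm-inventory.ps1'
    $trigger = New-ScheduledTaskTrigger -Daily -At 3am
    # register the task to run as the script user, without elevation
    Register-ScheduledTask -TaskName 'VM Inventory' -Action $action -Trigger $trigger `
        -User 'nice\script' -Password (Read-Host 'password') -RunLevel Limited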

Observations:

  • Making the script user a member of the “Administrators” group on server1 works, but that’s not good security.
  • Adding the script user to the “Administrators” group on hyperv1 does not help.
  • It’s obviously a problem on server1: something stops Get-VM from working in a scheduled task.

Question: which rights or security settings are missing in Scenario 2?

If you have any ideas or even the solution, please comment below or tweet at me (@RicochetPeter). Thx in advance 🙂

Private Cloud Project – Ch02 – The Design Concept

This is a follow-up to my first post in this series, Private Cloud Project – Ch01 – The Mission.

As the mission was now set, I had to find resources on how to move forward with the components that would make up our private cloud:

  • Hyper-V cluster
  • File Server cluster
  • Scale-Out File Server
  • Storage Spaces

When I started gathering information about these components, especially those related to storage, I first had to learn the nomenclature used in this area. “Storage Spaces”, e.g., introduced with Windows Server 2012, is fairly new to the tech world, and usable resources beyond the “marketing stuff” are hard to find. To this date (September 2014) only a few people seem to be using Storage Spaces, or to be actively writing about it, although it’s a great technology from Microsoft. A few management features that we all know from “standard” RAID controllers like the HP Smart Array and DELL PERC are still missing. For example, you do not see the rebuild status of a drive that has failed and been replaced.

Also pretty hard to find was a hardware vendor whose whole stack of Server->Controller->Shelf->Drives would be suitable for Storage Spaces and also supported by Microsoft. After fiddling with Intel shelves connected to some DELL servers, and talking to DELL representatives and a well-known local MVP, we settled on a design consisting purely of DELL equipment. A few of the certifications were still to be completed by DELL, but as that process was obviously already advancing, we decided to go that way.

The result should look like this:

[Diagram: Microsoft private cloud stack]

The components of the environment are:

  1. The Hypervisor machines, based on Microsoft Hyper-V
  2. A central storage, based on the Microsoft Windows Server 2012 R2 “Scale-Out File Server” (SOFS) role
     • SOFS provides the storage
  3. Microsoft System Center Virtual Machine Manager (VMM)
     • VM creation and management, workload deployment and distribution
  4. Microsoft System Center Data Protection Manager (DPM)
     • Backup
  5. Microsoft System Center Orchestrator (ORC)
     • Process automation of creation, deployment and monitoring
  6. Microsoft System Center App Controller (APC)
     • Self-service platform for internal customers

     As a more modern and more flexible alternative to using ORC+APC, Microsoft has released the free “Azure Pack”, a port of their commercial Azure platform software adapted for use with on-premises private cloud environments.

  7. Azure Pack (AzP)
     • Self-service platform for admins and internal customers

Hypervisor Nodes

Hypervisors are the work horses of the Private Cloud environment. They host the virtual machines by virtualizing their physical hardware resources for the guest operating systems. To be able to host a significant number of virtual machines, the hypervisors need powerful processors, a large amount of memory and very fast network interfaces. The hypervisor hardware setup consists of:

  • DELL PowerEdge R620 (1U)
  • 2x Intel Xeon E5-2650 v2
  • 16x 16 GB DIMMs (256 GB)
  • NDC with 4x I350 1 Gbit Ethernet ports
  • 2x X520 SFP+ LP PCIe dual 10 Gbit network cards
  • 2x 300 GB 10k SAS drives for the system RAID1

Hypervisors have

  • Microsoft Windows Server 2012 R2 Datacenter installed
  • The Hyper-V role activated
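Getting a node to that state is essentially one cmdlet; a minimal sketch, assuming the OS is already installed:

    # install the Hyper-V role plus its management tools, then reboot
    Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart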

Scale Out File Server (SOFS)

The SOFS replaces traditional enterprise storage systems. The SOFS will contain the virtual disk container files (VHDs) in which the VMs hosted on the Hypervisor nodes store their data. The storage is made available to the Hypervisors through SMB 3.0, the well-known protocol for accessing file shares in the Windows world, which has been optimised for serving applications like MSSQL or – as in our case – VMs.
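To illustrate what that means in practice, here is a sketch of creating a VM directly on such a share (VM name, sizes and the share path are examples; the share itself is described further below):

    # a generation 2 VM whose configuration files and VHDX live on the SOFS share
    New-VM -Name 'testvm' -Path '\\SOFS\vmshare' -Generation 2 `
        -MemoryStartupBytes 2GB `
        -NewVHDPath '\\SOFS\vmshare\testvm\disk0.vhdx' -NewVHDSizeBytes 60GB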

The SOFS is basically built on DELL PowerEdge R720xd servers, but contains some extras. These are:

  • Three DELL SAS Controllers (without RAID functionality)
  • An Intel X520 10Gbit network card
  • JBOD Hard Disk Shelves
  • SAS SSD Drives for storing “hot data”
  • SAS Hard Disks
  • 64GB of RAM

The SOFS will be built as a cluster, with building blocks of initially two physical servers per cluster. Both servers will be attached to three JBOD shelves, each of which exports all of its hard drives to both servers. The SAS devices are managed by Windows Storage Spaces and exported to the Hypervisors through the SOFS role as special “file shares”, e.g. “\\SOFS\vmshare”.
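As a rough sketch of that storage layer (pool and disk names are invented, and the mirror resiliency setting is just one plausible choice):

    # pool all poolable JBOD disks into one Storage Spaces pool
    $disks  = Get-PhysicalDisk -CanPool $true
    $subsys = Get-StorageSubSystem -FriendlyName '*Storage Spaces*'
    New-StoragePool -FriendlyName 'Pool1' `
        -StorageSubSystemFriendlyName $subsys.FriendlyName -PhysicalDisks $disks

    # carve a mirrored virtual disk out of the pool
    New-VirtualDisk -StoragePoolFriendlyName 'Pool1' -FriendlyName 'vDisk1' `
        -ResiliencySettingName Mirror -UseMaximumSize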

The SOFS nodes will have

  • Microsoft Windows Server 2012 R2 Standard installed
  • The Failover Clustering feature installed
  • The SOFS cluster role activated
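Condensed into PowerShell, the node setup looks roughly like this (cluster, node and share names are invented):

    # file server and clustering bits, on both nodes
    Install-WindowsFeature -Name FS-FileServer, Failover-Clustering -IncludeManagementTools

    # form the cluster and activate the Scale-Out File Server role
    New-Cluster -Name 'SOFSCLU' -Node 'sofs1','sofs2'
    Add-ClusterScaleOutFileServerRole -Name 'SOFS'

    # publish a folder on the clustered storage as a continuously available share
    New-SmbShare -Name 'vmshare' -Path 'C:\ClusterStorage\Volume1\Shares\vmshare' `
        -FullAccess 'nice\hyperv1$' -ContinuouslyAvailable $true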

Switch Fabric

Our network department pointed us to some really fancy but also pretty expensive network hardware for setting up our environment. The 10 gigabit connections for the building block we were building would be provided by 2 Cisco N5K-C5596UP-FA switches. The 1 gigabit connections would be provided by 2 Cisco N2K-C2248TP-1GE, so-called “fabric extenders”, connected to the 10 Gbit switches. These are switches with only minimal logic of their own; the “real work” is done by the 10 Gbit switches.

Here’s how the hypervisors and the file servers are connected:

[Diagram: hypervisor network connections]

We decided against converging the network connections on the hypervisors, as we would lose other functionality, like RSS. Instead we would use the first two onboard NICs (1 Gbit) to build the “Management Team”, then have a team of two of the 10 Gbit ports – each one from a different 10 Gbit NIC – for VM traffic, and the remaining two 10 Gbit ports for storage traffic to the file servers via SMB3.
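In PowerShell terms, the hypervisor side of this looks roughly as follows (the NIC names are examples):

    # management team from the first two onboard 1 Gbit ports
    New-NetLbfoTeam -Name 'ManagementTeam' -TeamMembers 'NIC1','NIC2' -TeamingMode Lacp

    # VM team from one port of each 10 Gbit card
    New-NetLbfoTeam -Name 'VMTeam' -TeamMembers 'SLOT2-Port1','SLOT3-Port1' -TeamingMode Lacp

    # the remaining two 10 Gbit ports stay unteamed; SMB Multichannel
    # spreads the storage traffic across them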

[Diagram: file server network connections]

We used the same scheme for the file servers, just no VM team here.

All connections are distributed among the switches for redundancy. The teams are all LACP teams, therefore the switches need to be in a virtual chassis mode, so that LACP teams can be spread across the two physical units. Don’t ask for details, though, I do not know them 🙂

The rack space we intended to use would be set up like this:

[Diagram: planned rack layout]

So much for part 2 of the series.
In the next part: The management cluster (for the System Center machines).

Using dynamic VHDx for IO-intensive VMs may not be a good idea

For the hasty reader:
Using dynamic VHDx for IO-intensive workloads will generate high CPU usage in the management OS, but only on the first three cores. Using Sysinternals Process Explorer we found that exactly three threads in the “System” process (PID 4), attributed to “vmbusr.sys”, are the root of the CPU usage. We researched RSS, VMQ and other things. Basically, the huge load went away when we changed from dynamic to fixed VHDx.

The longer story:
During the testing phase of our Private Cloud environment we also ran IO tests using SQLIO in up to 30 VMs on the hypervisor machines. The hypervisors talk SMB3 to the storage environment. We ran tests with 8 KB IOs and with 40–80 KB IOs. We always noticed that, as soon as the VMs started the heavy SQLIO-based IO, cores 0 to 2 in the management OS came under full fire:

[Screenshot: full CPU load on cores 0 to 2 only]

Looking at the process tree in Process Explorer we found that the System process (which has PID 4) was showing this load in the Threads tab:

[Screenshot: Threads tab of the System process in Process Explorer]

That didn’t really help much, as you will find very little information on the web about the Hyper-V VMBus system. One of the few sources is the architectural description at MSDN. It says:

VMBus – Channel-based communication mechanism used for inter-partition communication and device enumeration on systems with multiple active virtualized partitions. The VMBus is installed with Hyper-V Integration Services.

Another one is Kristian Nese’s blog post, written for Hyper-V in W2K8, but the basics should still hold true.

Not very enlightening (pun intended) for our case; why should the VMs doing the SQLIO workload talk to each other…? Maybe device enumeration…? We tried to assess the problem from various angles, playing with the size of the SQLIO blocks, tuning SMB network interface parameters and VMQ settings (although these VMs weren’t doing guest network traffic). In a calm minute my colleague Christoph tried the SQLIO run with a bunch of VMs that were slightly different from the others. Tada: normal CPU load distribution among all cores! The difference was easily found: their VHDs were of fixed size. We have yet to find out whether there is a limit on the number of VMs per host below which the strange behaviour does not show.
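The conversion itself is straightforward, if slow for large disks; a sketch, with example paths and the VM shut down:

    # convert the dynamic VHDX into a new fixed-size file...
    Convert-VHD -Path 'D:\VMs\testvm\disk0.vhdx' `
        -DestinationPath 'D:\VMs\testvm\disk0-fixed.vhdx' -VHDType Fixed

    # ...then point the VM's disk at the new file
    Set-VMHardDiskDrive -VMName 'testvm' -ControllerType SCSI `
        -ControllerNumber 0 -ControllerLocation 0 `
        -Path 'D:\VMs\testvm\disk0-fixed.vhdx'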

The bad news: the high vmbusr load already shows up with only 5 VMs. We have not done a full comparison test, but it also seems to impose a limit on the IO a VM with a dynamic VHDx can do.

Any helpful comments, hints or tricks are highly welcome.

Cheers,
Peter.

Edit 2014-06-30: Post from a guy with a similar issue