For work the other week I was able to go back to Houston for a few days to work face to face with the team. My main reason for going was to set up a Microsoft Hyper-V cluster capable of handling the load of the various testing initiatives we have going on. I had several servers to work with in our testing lab, all fine from a CPU perspective, but lacking enough RAM to really make this work. After we ordered more RAM, I started planning it all out. I was going to work with a bunch of technologies I hadn't really messed with before: Starwind Virtual SAN, iSCSI, network aggregation, and Microsoft Scale-Out File Servers. Two of the servers were already in use for other purposes, so I would need to tread carefully around them so as not to disrupt ongoing tests, and another two were already being used for a small Hyper-V cluster.
I wasted no time setting up the first thing I wanted to test: network aggregation. I plugged in three of the 1Gb NICs on my server, configured a port channel on the Cisco switch, and configured LACP teaming on the server. It worked perfectly the first time, transferring data at roughly the combined 3Gb speed it should. This early success had me feeling pretty confident.
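For reference, the server side of that LACP team boils down to a single cmdlet. This is a sketch rather than my exact commands; the team and NIC names are made up, and you'd check your own adapter names with Get-NetAdapter first.

```powershell
# Create an LACP team from three physical NICs (names are hypothetical;
# list yours with Get-NetAdapter). The switch ports must be configured
# as a matching port channel, e.g. "channel-group 1 mode active" on Cisco IOS.
New-NetLbfoTeam -Name "LabTeam" `
    -TeamMembers "NIC1","NIC2","NIC3" `
    -TeamingMode Lacp `
    -LoadBalancingAlgorithm Dynamic
```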
The next thing I discovered is that Starwind Virtual SAN works pretty well. I had no problems carving out an iSCSI disk and presenting it to my small Hyper-V cluster. Another easy victory: with a little fiddling, the Hyper-V cluster recognized it as usable storage for a cluster shared volume. This brought me to another thing I learned: iSCSI can be a pain to set up. To get your storage to pass the Microsoft cluster validation, each server needs multiple paths, network or otherwise, to the storage, so if one fails, you have a backup. Getting multipath to work was a bitch. The "discover multipaths" tool that Windows Server provides needs a reboot each time you try it, and it doesn't always work. Of course, it didn't work for me.
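If I were doing it again, I'd skip the GUI entirely and enable MPIO from PowerShell, letting the Microsoft DSM claim iSCSI devices automatically. A sketch of that approach:

```powershell
# Install the MPIO feature and have the Microsoft DSM automatically
# claim iSCSI-attached disks, instead of the "discover multipaths" GUI.
Install-WindowsFeature -Name Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI
# A reboot is still needed after installing the feature.
```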
In the end, I manually started multiple connections to the iSCSI targets. The validation tests passed, and all looked well...until I moved a VM to the storage and benchmarked it. Pathetically slow transfer speeds are what I found, a mere 10MB/s or so. Off to Google. After much searching, I found the answer and confirmed it: iSCSI and network teaming do not play nice under Windows. The iSCSI initiator was passing over my fast team and using the crappy 100Mb "backup" network that was plugged into the server. Okay, fine: turn off teaming and disable the backup network for now. Better speeds, but still not great.
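Manually building the sessions looks roughly like this. The addresses are hypothetical: assume the target listens on two subnets and the initiator has one NIC on each, so each session pins a specific initiator/target portal pair.

```powershell
# Register the target portals (all addresses are illustrative).
New-IscsiTargetPortal -TargetPortalAddress 10.0.1.10
New-IscsiTargetPortal -TargetPortalAddress 10.0.2.10
$iqn = (Get-IscsiTarget).NodeAddress

# One persistent, MPIO-enabled session per path.
Connect-IscsiTarget -NodeAddress $iqn -TargetPortalAddress 10.0.1.10 `
    -InitiatorPortalAddress 10.0.1.21 -IsMultipathEnabled $true -IsPersistent $true
Connect-IscsiTarget -NodeAddress $iqn -TargetPortalAddress 10.0.2.10 `
    -InitiatorPortalAddress 10.0.2.21 -IsMultipathEnabled $true -IsPersistent $true
```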
Fast forward a bit, and I was able to get the three servers I had earmarked for my Hyper-V cluster set up and running well. I ran into some minor problems creating the cluster object, which were fixed by delegating computer object creation to the cluster, but after that it was fine. I moved my VMs onto the iSCSI CSV that the cluster was using and got to work on configuring a more highly available storage solution. Performance still wasn't very good on the storage end: benchmarks showed less than 1Gb of throughput when, with three 1Gb NICs using MPIO, I should have been getting...more.
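The cluster build itself is just a validation pass and a New-Cluster call. Node names and the IP here are placeholders; the delegation fix amounts to giving the cluster name object "Create Computer objects" rights on its OU in Active Directory.

```powershell
# Validate and create the cluster (node names and IP are hypothetical).
Test-Cluster -Node HV01,HV02,HV03
New-Cluster -Name HVCL01 -Node HV01,HV02,HV03 -StaticAddress 10.0.0.50

# If creating clustered computer objects fails later, grant the cluster's
# computer account (HVCL01$) "Create Computer objects" on its OU in AD.
```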
Each of my storage servers has two RAID5 arrays, both with more than 3TB of capacity. On one of these arrays, I used Starwind to create a replicated virtual disk. 10Gb NICs in the servers, directly connected, ensure that a fast network is available for speedy replication. Once that is created, you can connect to it over iSCSI from both machines, which creates a shared storage object that can be used as a CSV. Cluster validation passed, and I set up a Scale-Out File Server on the new cluster with an application file share. Once it was running, I started migrating VMs off of the CSV and onto the SMB3 share.
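The SOFS role and the application share can be sketched like this. The share name, path, and computer accounts are illustrative, and the NTFS permissions on the folder have to grant the same Hyper-V host accounts access.

```powershell
# Add the Scale-Out File Server role to the storage cluster, then publish
# a continuously available share for Hyper-V (names/accounts are hypothetical).
Add-ClusterScaleOutFileServerRole -Name SOFS01
New-SmbShare -Name VMs -Path C:\ClusterStorage\Volume1\VMs `
    -FullAccess "CONTOSO\HV01$","CONTOSO\HV02$","CONTOSO\HV03$" `
    -ContinuouslyAvailable $true
```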
Once everything was migrated to the SMB3 share, it worked quite well. Disk transfers were a bit slow, but overall things worked well...until I tried a live migration. While quick migration went fine, live migrations would inch along until they failed with a fairly unhelpful error. After much troubleshooting and talking with Microsoft, we discovered that the delegation for CIFS and the Microsoft Virtual System Migration Service had been incorrectly applied by the script I used. After correcting that small issue, live migration worked perfectly and ran at an appropriate speed for the network.
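When live migration authenticates with Kerberos, each host's computer account needs constrained delegation to every other host for the "cifs" and "Microsoft Virtual System Migration Service" services. A sketch of applying that by hand, with hypothetical host and domain names; it has to be repeated for every host pair, in both directions:

```powershell
Import-Module ActiveDirectory

# Allow HV01 to delegate to HV02 for SMB and VM migration
# (host and domain names are hypothetical).
$services = "cifs/HV02.contoso.local",
            "Microsoft Virtual System Migration Service/HV02.contoso.local"
Set-ADComputer -Identity HV01 -Add @{ 'msDS-AllowedToDelegateTo' = $services }
```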
After several months of running this cobbled-together mess, I am still surprised by how stable it is. Other than mediocre storage performance, I haven't really had any additional problems with it.