In normal day-to-day operations, vMotion can be used when multiple VMs on the same host are in contention for the same resource (which ultimately causes poor performance across all the VMs). With vMotion, you can migrate any VMs facing contention to another ESXi host that has more of the contended resource available. For example, when two VMs contend with each other for CPU resources, you can eliminate the contention by using vMotion to move one VM to an ESXi host with more available CPU resources.
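If you prefer to script such a move rather than drive it through the vSphere Web Client, a minimal sketch using pyVmomi (the open source Python SDK for the vSphere API) might look like the following. The vCenter address, credentials, VM name, and host name are placeholders, and error handling is omitted.

```python
# Minimal pyVmomi sketch: live-migrate (vMotion) a VM to a less-contended host.
# The vCenter address, credentials, and object names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()     # lab use only; verify certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Walk the inventory for the first managed object of the given type and name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

vm = find_by_name(vim.VirtualMachine, "app01")                # VM suffering CPU contention
target = find_by_name(vim.HostSystem, "esxi02.example.com")   # host with spare CPU

# MigrateVM_Task performs the vMotion; the VM keeps running throughout.
task = vm.MigrateVM_Task(host=target,
                         priority=vim.VirtualMachine.MovePriority.defaultPriority)
print("vMotion started:", task.info.key)
Disconnect(si)
```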
vMotion moves the execution of a VM, relocating the CPU and memory footprint between physical servers but leaving the storage untouched. Storage vMotion builds on the idea and principle of vMotion: you can leave the CPU and memory footprint untouched on a physical server but migrate a VM’s storage while the VM is still running.
Deploying vSphere in your environment generally means that lots of shared storage – Fibre Channel or iSCSI SAN or NFS – is needed. What happens when you need to migrate from an older storage array to a newer storage array? What kind of downtime would be required? Or what about a situation where you need to rebalance utilization of the array, either from a capacity or performance perspective?
With the ability to move storage for a running VM between datastores, Storage vMotion lets you address all of these situations without downtime. This feature ensures that outgrowing datastores or moving to a new SAN does not force an outage for the affected VMs and provides you with yet another tool to increase your flexibility in responding to changing business needs.
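A Storage vMotion can be scripted in much the same way. The sketch below reuses the connection and find_by_name() helper from the earlier vMotion example; the VM and datastore names are placeholders.

```python
# Minimal pyVmomi sketch: Storage vMotion a running VM's disks to another datastore.
# Assumes the connection (si/content) and find_by_name() helper from the earlier
# vMotion sketch; the VM and datastore names are placeholders.
from pyVmomi import vim

vm = find_by_name(vim.VirtualMachine, "app01")
new_ds = find_by_name(vim.Datastore, "new-san-datastore01")

# A RelocateSpec that specifies only a datastore moves the VM's storage while the
# VM keeps running; its compute placement (CPU and memory) is left untouched.
spec = vim.vm.RelocateSpec(datastore=new_ds)
task = vm.RelocateVM_Task(spec=spec,
                          priority=vim.VirtualMachine.MovePriority.defaultPriority)
print("Storage vMotion started:", task.info.key)
```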
vSphere Distributed Resource Scheduler
vMotion is a manual operation, meaning that you must initiate the vMotion operation. What if VMware vSphere could perform vMotion operations automatically? That is the basic idea behind vSphere Distributed Resource Scheduler (DRS). If you think that vMotion sounds exciting, your anticipation will only grow after learning about DRS. DRS, simply put, leverages vMotion to provide automatic distribution of resource utilization across multiple ESXi hosts that are configured in a cluster.
Given the prevalence of Microsoft Windows Server in today’s datacenters, the use of the term cluster often draws IT professionals into thoughts of Microsoft Windows Server clusters. Windows Server clusters are often active-passive or active-active-passive clusters. However, ESXi clusters are fundamentally different, operating in an active-active mode to aggregate and combine resources into a shared pool. Although the underlying concept of aggregating physical hardware to serve a common goal is the same, the technology, configuration, and feature sets are quite different between VMware ESXi clusters and Windows Server clusters.
Aggregate Capacity and Single Host Capacity
Although I say that a DRS cluster is an implicit aggregation of CPU and memory capacity, it’s important to keep in mind that a VM is limited to using the CPU and RAM of a single physical host at any given time. If you have two ESXi servers with 32 GB of RAM each in a DRS cluster, the cluster will correctly report 64 GB of aggregate RAM available, but any given VM will not be able to use more than approximately 32 GB of RAM at a time.
An ESXi cluster is an implicit aggregation of the CPU power and memory of all hosts involved in the cluster. After two or more hosts have been assigned to a cluster, they work in unison to provide CPU and memory to the VMs assigned to the cluster (keeping in mind that any given VM can only use resources from one host; see the sidebar “Aggregate Capacity and Single Host Capacity”). The goal of DRS is twofold:
• At startup, DRS attempts to place each VM on the host that is best suited to run that VM at that time.
• Once a VM is running, DRS seeks to provide that VM with the required hardware resources while minimizing the amount of contention for those resources in an effort to maintain balanced utilization levels.
The first part of DRS is often referred to as intelligent placement. DRS can automate the placement of each VM as it is powered on within a cluster, placing it on the host in the cluster that it deems to be best suited to run that VM at that moment.
DRS isn’t limited to operating only at VM startup, though. DRS also manages the VM’s location while it is running. For example, let’s say three servers have been configured in an ESXi cluster with DRS enabled. When one of those servers begins to experience heavy contention for CPU resources, DRS detects that the cluster is imbalanced in its resource usage and uses an internal algorithm to determine which VM(s) should be moved in order to create the least imbalanced cluster. For every VM, DRS will simulate a migration to each host and the results will be compared. The migrations that create the least imbalanced cluster will be recommended or automatically performed, depending on the DRS configuration.
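DRS’s real algorithm is proprietary and weighs far more factors than any short example can show, but the simulate-and-compare loop just described can be illustrated with a toy sketch. Here, cluster imbalance is modeled simply as the standard deviation of per-host CPU demand, every possible single-VM move is simulated, and the host and VM figures are made up for illustration.

```python
# Toy illustration of the simulate-and-compare loop described above. Real DRS uses
# a proprietary cost/benefit algorithm over CPU, memory, and more; here cluster
# "imbalance" is simply the standard deviation of per-host CPU demand (MHz).
from statistics import pstdev

hosts = {                      # hypothetical placement: host -> {VM name: CPU demand (MHz)}
    "esxi01": {"web01": 4000, "db01": 6000, "app01": 5000},
    "esxi02": {"web02": 2000},
    "esxi03": {"app02": 3000},
}

def imbalance(placement):
    return pstdev(sum(vms.values()) for vms in placement.values())

best = (imbalance(hosts), None)                  # (imbalance score, proposed move)
for src, vms in hosts.items():
    for vm, demand in vms.items():
        for dst in hosts:
            if dst == src:
                continue
            # Simulate moving this one VM and measure the resulting imbalance.
            trial = {h: dict(v) for h, v in hosts.items()}
            del trial[src][vm]
            trial[dst][vm] = demand
            score = imbalance(trial)
            if score < best[0]:
                best = (score, (vm, src, dst))

print("Recommended migration:", best[1], "-> new imbalance:", round(best[0], 1))
```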
DRS performs these on-the-fly migrations without any downtime or loss of network connectivity to the VMs by leveraging vMotion, the live migration functionality I described earlier. This makes DRS extremely powerful because it allows clusters of ESXi hosts to dynamically rebalance their resource utilization based on the changing demands of the VMs running on that cluster.
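Whether those recommendations are merely suggested or applied automatically is a cluster-level setting. The following pyVmomi sketch, which reuses the connection and find_by_name() helper from the earlier examples, shows one way to switch a cluster (here hypothetically named Production-Cluster) to fully automated DRS.

```python
# Minimal pyVmomi sketch: enable DRS on a cluster in fully automated mode so that
# recommended migrations are applied without operator intervention. Assumes the
# connection and find_by_name() helper from the earlier sketches; the cluster
# name is a placeholder.
from pyVmomi import vim

cluster = find_by_name(vim.ClusterComputeResource, "Production-Cluster")

drs = vim.cluster.DrsConfigInfo(
    enabled=True,
    defaultVmBehavior=vim.cluster.DrsConfigInfo.DrsBehavior.fullyAutomated,
    vmotionRate=3,                     # migration threshold, rated 1-5
)
spec = vim.cluster.ConfigSpecEx(drsConfig=drs)
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```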
Fewer Bigger Servers or More Smaller Servers?
Recall from Table 1.2 that VMware ESXi supports servers with up to 320 logical CPU cores and up to 6 TB of RAM. With vSphere DRS, though, you can combine multiple smaller servers for the purpose of managing aggregate capacity. This means that bigger, more powerful servers might not be better servers for virtualization projects. These larger servers, in general, are significantly more expensive than smaller servers, and using a greater number of smaller servers (often referred to as “scaling out”) may provide greater flexibility than a smaller number of larger servers (often referred to as “scaling up”). The key thing to remember is that a bigger server isn’t necessarily a better server.
vSphere Storage DRS
vSphere Storage DRS takes the idea of vSphere DRS and applies it to storage. Just as vSphere DRS helps to balance CPU and memory utilization across a cluster of ESXi hosts, Storage DRS helps balance storage capacity and storage performance across a cluster of datastores using mechanisms that echo those used by vSphere DRS.
Earlier I described vSphere DRS’s feature called intelligent placement, which automates the placement of new VMs based on resource usage within an ESXi cluster. In the same fashion, Storage DRS has an intelligent placement function that automates the placement of VM virtual disks based on storage utilization. Storage DRS does this through the use of datastore clusters. When you create a new VM, you simply point it to a datastore cluster, and Storage DRS automatically places the VM’s virtual disks on an appropriate datastore within that datastore cluster.
Likewise, just as vSphere DRS uses vMotion to balance resource utilization dynamically, Storage DRS uses Storage vMotion to rebalance storage utilization based on capacity and/or latency thresholds. Because Storage vMotion operations are typically much more resource intensive than vMotion operations, vSphere provides extensive controls over the thresholds, timing, and other guidelines that will trigger a Storage DRS automatic migration via Storage vMotion.
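The sketch below shows one way to read those thresholds from a datastore cluster with pyVmomi. It reuses the connection and find_by_name() helper from the earlier examples, the datastore cluster name is a placeholder, and the property paths reflect my reading of the vSphere API’s StorageDrsConfigInfo structure.

```python
# Minimal pyVmomi sketch: inspect the Storage DRS settings and thresholds on a
# datastore cluster (StoragePod). Assumes the connection and find_by_name() helper
# from the earlier sketches; the datastore cluster name is a placeholder.
from pyVmomi import vim

pod = find_by_name(vim.StoragePod, "Gold-Datastore-Cluster")
cfg = pod.podStorageDrsEntry.storageDrsConfig.podConfig

print("Storage DRS enabled:       ", cfg.enabled)
print("I/O load balancing:        ", cfg.ioLoadBalanceEnabled)
print("Space threshold (% used):  ", cfg.spaceLoadBalanceConfig.spaceUtilizationThreshold)
print("I/O latency threshold (ms):", cfg.ioLoadBalanceConfig.ioLatencyThreshold)

# The datastores being balanced are simply the pod's children.
for ds in pod.childEntity:
    free_gb = ds.summary.freeSpace / (1024 ** 3)
    print(f"  {ds.name}: {free_gb:.0f} GB free")
```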
Storage I/O Control and Network I/O Control
VMware vSphere has always had extensive controls for modifying or controlling the allocation of CPU and memory resources to VMs. What vSphere didn’t have prior to the release of vSphere 4.1 was a way to apply the same sort of extensive controls to storage I/O and network I/O. Storage I/O Control and Network I/O Control address that shortcoming.
Storage I/O Control (SIOC) allows you to assign relative priority to storage I/O as well as assign storage I/O limits to VMs. These settings are enforced cluster-wide; when an ESXi host detects storage congestion through an increase of latency beyond a user-configured threshold, it will apply the settings configured for that VM. The result is that you can help the VMs that need priority access to storage resources get more of the resources they need. In vSphere 4.1, Storage I/O Control applied only to VMFS storage; vSphere 5 extended that functionality to NFS datastores.
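As a concrete illustration, the following pyVmomi sketch (again reusing the earlier connection and find_by_name() helper, with a hypothetical VM named db01) assigns custom storage I/O shares and an IOPS limit to a VM’s first virtual disk; these are the per-VM values SIOC enforces when the congestion threshold is crossed.

```python
# Minimal pyVmomi sketch: give one VM's first virtual disk custom storage I/O
# shares and cap it at 1,000 IOPS -- the per-VM settings SIOC enforces under
# contention. Assumes the connection and find_by_name() helper from the earlier
# sketches; the VM name and values are placeholders.
from pyVmomi import vim

vm = find_by_name(vim.VirtualMachine, "db01")
disk = next(d for d in vm.config.hardware.device
            if isinstance(d, vim.vm.device.VirtualDisk))

disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(
    shares=vim.SharesInfo(level=vim.SharesInfo.Level.custom, shares=2000),
    limit=1000,                          # IOPS cap; -1 means unlimited
)
change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=disk)
task = vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```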
The same goes for Network I/O Control (NIOC), which provides you with more granular controls over how VMs use the network bandwidth provided by the physical NICs. As the widespread adoption of 10 Gigabit Ethernet (10GbE) continues, Network I/O Control provides you with a way to more reliably ensure that network bandwidth is allocated appropriately when multiple types of traffic compete for the same physical NICs.
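Network I/O Control is enabled per vSphere Distributed Switch; a minimal pyVmomi sketch, with a placeholder switch name and assuming the connection and helper from the earlier examples, might look like this.

```python
# Minimal pyVmomi sketch: turn on Network I/O Control for a vSphere Distributed
# Switch so bandwidth shares and limits can then be applied. Assumes the
# connection and find_by_name() helper from the earlier sketches; the switch
# name is a placeholder.
from pyVmomi import vim

dvs = find_by_name(vim.DistributedVirtualSwitch, "dvSwitch-Prod")
dvs.EnableNetworkResourceManagement(enable=True)
```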