
Tuesday, January 19, 2010

The case of the ghost LUN 0


While upgrading an ESX 3.5 U4 host to vSphere ESXi 4.0 U1 I noticed some very strange behaviour.
In my environment the upgrade task requires reinstalling ESXi from scratch and then replicating the previous configuration with a custom-made PowerShell script.
The ESXi install phase, normally very fast, took a huge amount of time. That forced me to reinstall the server once more so I could watch the logs carefully.
Here is what I found:

CLUE #1
On the installation LUN selection screen, from which you choose the LUN that will hold the hypervisor, a "strange" empty DISK 0 with a size of 0 bytes appears (see figure 1-1).

figure 1-1


CLUE #2
Pressing ALT-F12 on the server console to switch to the VMkernel log screen reveals a huge number of the following warning messages:

Jan 18 10:19:44 vmkernel: 44:22:15:55.304 cpu3:5453)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x410007063440) to NMP device "mpx.vmhba2:C0:T2:L0" failed on physical path "vmhba2:C0:T2:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
Jan 18 10:19:44 vmkernel: 44:22:15:55.304 cpu3:5453)WARNING: NMP: nmp_DeviceRetryCommand: Device "mpx.vmhba2:C0:T2:L0": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Jan 18 10:19:45 vmkernel: 44:22:15:56.134 cpu6:4363)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "mpx.vmhba2:C0:T2:L0" - issuing command 0x410007063440
Jan 18 10:19:45 vmkernel: 44:22:15:56.134 cpu3:41608)WARNING: NMP: nmp_CompleteRetryForPath: Retry command 0x12 (0x410007063440) to NMP device "mpx.vmhba2:C0:T2:L0" failed on physical path "vmhba2:C0:T2:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
Jan 18 10:19:45 vmkernel: 44:22:15:56.134 cpu3:41608)WARNING: NMP: nmp_CompleteRetryForPath: Logical device "mpx.vmhba2:C0:T2:L0": awaiting fast path state update before retrying failed command again...
Jan 18 10:19:46 vmkernel: 44:22:15:57.134 cpu5:4363)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "mpx.vmhba2:C0:T2:L0" - issuing command 0x410007063440
Jan 18 10:19:46 vmkernel: 44:22:15:57.134 cpu3:41608)WARNING: NMP: nmp_CompleteRetryForPath: Retry command 0x12 (0x410007063440) to NMP device "mpx.vmhba2:C0:T2:L0" failed on physical path "vmhba2:C0:T2:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x2

You don't need to be a VMkernel storage engineer to correlate cause and effect.
The new VMware storage architecture (PSA) behaves differently from ESX 3.5. During the initial storage scan it finds a "virtual" disk 0 device exposed by my storage virtualization appliance (FalconStor NSS) and mapped to ESX as LUN 0, and it tries to handle it like any other "real" SAN device.
This generates a lot of errors and retries, slowing down the boot phase, and then the VMkernel every time you rescan a storage path.
The output of the following esxcli command confirms the suspicion:

# esxcli --server $HOST --username $USER --password $PASSWD nmp device list

mpx.vmhba3:C0:T0:L0
  Device Display Name: Local VMware Disk (mpx.vmhba3:C0:T0:L0)
  Storage Array Type: VMW_SATP_LOCAL
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba3:C0:T0:L0;current=vmhba3:C0:T0:L0}
  Working Paths: vmhba3:C0:T0:L0


eui.000b080080002001
  Device Display Name: Pillar Fibre Channel Disk (eui.000b080080002001)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba2:C0:T3:L63;current=vmhba2:C0:T3:L63}
  Working Paths: vmhba2:C0:T3:L63


eui.000b08008a002000
  Device Display Name: Pillar Fibre Channel Disk (eui.000b08008a002000)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba2:C0:T1:L60;current=vmhba2:C0:T1:L60}
  Working Paths: vmhba2:C0:T1:L60


mpx.vmhba2:C0:T2:L0
  Device Display Name: FALCON Fibre Channel Disk (mpx.vmhba2:C0:T2:L0)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba2:C0:T2:L0;current=vmhba2:C0:T2:L0}
  Working Paths: vmhba2:C0:T2:L0


mpx.vmhba0:C0:T0:L0
  Device Display Name: Local Optiarc CD-ROM (mpx.vmhba0:C0:T0:L0)
  Storage Array Type: VMW_SATP_LOCAL
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba0:C0:T0:L0;current=vmhba0:C0:T0:L0}
  Working Paths: vmhba0:C0:T0:L0


naa.6000d77800005acc528d69135fbc1c44
  Device Display Name: FALCON Fibre Channel Disk (naa.6000d77800005acc528d69135fbc1c44)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_RR
  Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
  Working Paths: vmhba1:C0:T2:L68, vmhba2:C0:T2:L68


naa.6000d77800008c5576716bd63f8f9901
  Device Display Name: FALCON Fibre Channel Disk (naa.6000d77800008c5576716bd63f8f9901)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_RR
  Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
  Working Paths: vmhba1:C0:T2:L3, vmhba2:C0:T2:L3

Looking carefully through the output, you should notice that mpx.vmhba2 and mpx.vmhba3 refer to a runtime name that is somewhat different from the more traditional naa. and eui. identifiers shown for the other paths (for a clear explanation of VMware disk identifiers see the Identifying disks when working with VMware ESX KB article).
I don't know why FalconStor IPStor NSS is exposing those fake LUNs (I'll open an SR); it is probably related to the fact that, as an internal standard, I don't map any LUN number 0 to my ESX servers. Mapping a LUN 0 would certainly hide the issue.
Anyway, I've found another workaround.
The following commands add two new claim rules that MASK (hide) all the fake LUN 0 paths using the usual esxcli command line:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule add -P MASK_PATH -r 109 -t location -A vmhba2 -C 0 -T 2 -L 0
# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule add -P MASK_PATH -r 110 -t location -A vmhba3 -C 0 -T 0 -L 0

To check the result, run the corestorage claimrule list command:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule list

Rule  Class   Type      Plugin    Matches
----  -----   ----      ------    -------
0     runtime transport NMP       transport=usb
1     runtime transport NMP       transport=sata
2     runtime transport NMP       transport=ide
3     runtime transport NMP       transport=block
4     runtime transport NMP       transport=unknown
101   runtime vendor    MASK_PATH vendor=DELL model=Universal Xport
101   file    vendor    MASK_PATH vendor=DELL model=Universal Xport
109   runtime location  MASK_PATH adapter=vmhba1 channel=0 target=0 lun=0
109   file    location  MASK_PATH adapter=vmhba1 channel=0 target=0 lun=0
110   runtime location  MASK_PATH adapter=vmhba2 channel=0 target=0 lun=0
110   file    location  MASK_PATH adapter=vmhba2 channel=0 target=0 lun=0
65535 runtime vendor    NMP       vendor=* model=*

Be sure to specify:

  • the correct (new) rule number (starting from 102 will be fine)
  • the correct location (the vmhba number followed by the Channel (C), Target (T), and LUN (L) corresponding to the fake path)

and then reboot the ESX host.
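
If you prefer to avoid the reboot, the same result can usually be obtained online by loading the new rules and letting MASK_PATH reclaim the fake paths. This is only a sketch of the standard esxcli claim rule workflow on ESX/ESXi 4.x, reusing the locations from the example above; double check the vmhba/channel/target/LUN values against your own environment, and fall back to a reboot if anything looks wrong.

Load the new claim rules into the VMkernel:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule load

Unclaim the fake paths so that MASK_PATH can take them over:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claiming unclaim -t location -A vmhba2 -C 0 -T 2 -L 0
# esxcli --server $HOST --username $USER --password $PASSWD corestorage claiming unclaim -t location -A vmhba3 -C 0 -T 0 -L 0

Run the loaded claim rules so the masking takes effect:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule run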

Monday, January 11, 2010

Why VMware PSA is helping me "to save the day" (part 1)

With the arrival of vSphere 4.0 I have the opportunity to rethink my approach to storage multipathing in the VMware world. To justify my post title, I need to share some storage background.

The following concepts and descriptions come from the well-written vSphere Fibre Channel SAN Configuration Guide, which is a "must read".


Background

Storage System Types

Storage disk systems can be either active-active or active-passive.
ESX/ESXi supports the following types of storage systems:

  • An active-active storage system, which allows access to the LUNs simultaneously through all the storage ports that are available without significant performance degradation. All the paths are active at all times, unless a path fails.

  • An active-passive storage system, in which one port is actively providing access to a given LUN. The other ports act as backup for the LUN and can be actively providing access to other LUN I/O. I/O can be successfully sent only to an active port for a given LUN. If access through the primary storage port fails, one of the secondary ports or storage processors becomes active, either automatically or through administrator intervention.

ALUA, Asymmetric logical unit access

ALUA is a relatively new multipathing technology for asymmetrical arrays. If the array is ALUA compliant and the host multipathing layer is ALUA aware, then virtually no additional configuration is required for proper path management by the host. An asymmetrical array is one that provides different levels of access per port. For example, on a typical asymmetrical array with 2 controllers it may be that a particular LUN's paths to controller-0 port-0 are active and optimized, while that LUN's paths to controller-1 port-0 are active non-optimized. The multipathing layer should then use the paths to controller-0 port-0 as the primary paths and the paths to controller-1 port-0 as the secondary (failover) paths. The Pillar AXIOM 500 and 600 are examples of ALUA arrays. A NetApp FAS3020 with Data ONTAP 7.2.x is another example of an ALUA-compliant array.
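
A quick way to see how your host is actually treating a given array is to ask the NMP which sub-plug-ins have claimed one of its devices. A minimal sketch, assuming the usual vCLI connection placeholders and a $DEVICE variable holding the naa. or eui. identifier of a LUN on the array (the -d device filter should be available in the 4.x esxcli):

# esxcli --server $HOST --username $USER --password $PASSWD nmp device list -d $DEVICE

The Storage Array Type and Path Selection Policy fields in the output tell you which SATP and PSP are handling that LUN, and therefore whether the host is treating the array as ALUA capable or as a plain active-active device.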

Understanding Multipathing and Failover

To maintain a constant connection between an ESX/ESXi host and its storage, ESX/ESXi supports multipathing. Multipathing lets you use more than one physical path to transfer data between the host and an external storage device.
In case of a failure of any element in the SAN network, such as an adapter, switch, or cable, ESX/ESXi can switch to another physical path, which does not use the failed component. This process of path switching to avoid failed components is known as path failover.
In addition to path failover, multipathing provides load balancing. Load balancing is the process of distributing I/O loads across multiple physical paths. Load balancing reduces or removes potential bottlenecks.
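
To see the redundant physical paths a host really has, you can dump the NMP path view. A minimal sketch, assuming the usual vCLI connection placeholders; on 4.x builds the output can normally also be restricted to a single device with the -d option:

# esxcli --server $HOST --username $USER --password $PASSWD nmp path list

Each entry should report the path runtime name (vmhbaX:C0:TY:LZ), the device it leads to, and its current state, which is exactly the information the failover and load balancing logic described below works on.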

Host-Based Failover with Fibre Channel

To support multipathing, your host typically has two or more HBAs available. This configuration supplements the SAN multipathing configuration that generally provides one or more switches in the SAN fabric and one or more storage processors on the storage array device itself.
In Figure 1-1, multiple physical paths connect each server with the storage device. For example, if HBA1 or the link between HBA1 and the FC switch fails, HBA2 takes over and provides the connection between the server and the switch. The process of one HBA taking over for another is called HBA failover.

Figure 1-1. Multipathing and Failover


Similarly, if SP1 fails or the links between SP1 and the switches break, SP2 takes over and provides the connection between the switch and the storage device. This process is called SP failover. VMware ESX/ESXi supports HBA and SP failover with its multipathing capability.

Managing Multiple Paths inside VMware PSA (Pluggable Storage Architecture)

To manage storage multipathing, ESX/ESXi uses a special VMkernel layer, the Pluggable Storage Architecture (PSA). The PSA is an open modular framework that coordinates the simultaneous operation of multiple multipathing plug-ins (MPPs).
The VMkernel multipathing plug-in that ESX/ESXi provides by default is the VMware Native Multipathing Plug-In (NMP).
The NMP is an extensible module that manages sub-plug-ins. There are two types of NMP sub-plug-ins: Storage Array Type Plug-Ins (SATPs) and Path Selection Plug-Ins (PSPs). SATPs and PSPs can be built in and provided by VMware, or can be provided by a third party.
If more multipathing functionality is required, a third party can also provide an MPP to run in addition to, or as a replacement for, the default NMP.
When coordinating the VMware NMP and any installed third-party MPPs, the PSA performs the following tasks:

  • Loads and unloads multipathing plug-ins.
  • Hides virtual machine specifics from a particular plug-in.
  • Routes I/O requests for a specific logical device to the MPP managing that device.
  • Handles I/O queuing to the logical devices.
  • Implements logical device bandwidth sharing between virtual machines.
  • Handles I/O queueing to the physical storage HBAs.
  • Handles physical path discovery and removal.
  • Provides logical device and physical path I/O statistics.

Figure 1-2. Pluggable Storage Architecture
vSphere 4's Pluggable Storage Architecture allows third-party developers to replace ESX's storage I/O stack
The multipathing modules perform the following operations:

  • Manage physical path claiming and unclaiming.
  • Manage creation, registration, and deregistration of logical devices.
  • Associate physical paths with logical devices.
  • Process I/O requests to logical devices:
      • Select an optimal physical path for the request.
      • Depending on a storage device, perform specific actions necessary to handle path failures and I/O command retries.
  • Support management tasks, such as abort or reset of logical devices.

VMware Multipathing Module

By default, ESX/ESXi provides an extensible multipathing module called the Native Multipathing Plug-In (NMP).
Generally, the VMware NMP supports all storage arrays listed on the VMware storage HCL and provides a default path selection algorithm based on the array type. The NMP associates a set of physical paths with a specific storage device, or LUN. The specific details of handling path failover for a given storage array are delegated to a Storage Array Type Plug-In (SATP). The specific details for determining which physical path is used to issue an I/O request to a storage device are handled by a Path Selection Plug-In (PSP). SATPs and PSPs are sub-plug-ins within the NMP module.

VMware SATPs

Storage Array Type Plug-Ins (SATPs) run in conjunction with the VMware NMP and are responsible for array-specific operations.
ESX/ESXi offers an SATP for every type of array that VMware supports. These SATPs include an active/active SATP and an active/passive SATP for non-specified storage arrays, and the local SATP for direct-attached storage.
Each SATP accommodates special characteristics of a certain class of storage arrays and can perform the array-specific operations required to detect path state and to activate an inactive path. As a result, the NMP module can work with multiple storage arrays without having to be aware of the storage device specifics. After the NMP determines which SATP to call for a specific storage device and associates the SATP with the physical paths for that storage device, the SATP implements the tasks that include the following:

  • Monitors health of each physical path.
  • Reports changes in the state of each physical path.
  • Performs array-specific actions necessary for storage failover. For example, for active/passive devices, it can activate passive paths.
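
To see which SATPs are installed on a host, and which default PSP each one is paired with, you can query the NMP directly. A minimal sketch, assuming the usual vCLI connection placeholders:

# esxcli --server $HOST --username $USER --password $PASSWD nmp satp list

The output should contain one row per SATP (VMW_SATP_LOCAL, VMW_SATP_DEFAULT_AA, VMW_SATP_ALUA and so on) together with its default path selection policy and a short description.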

VMware PSPs

Path Selection Plug-Ins (PSPs) run in conjunction with the VMware NMP and are responsible for choosing a physical path for I/O requests.
The VMware NMP assigns a default PSP for every logical device based on the SATP associated with the physical paths for that device. You can override the default PSP.
By default, the VMware NMP supports the following PSPs:

Most Recently Used (MRU)
Selects the path the ESX/ESXi host used most recently to access the given device. If this path becomes unavailable, the host switches to an alternative path and continues to use the new path while it is available.

Fixed
Uses the designated preferred path, if it has been configured. Otherwise, it uses the first working path discovered at system boot time. If the host cannot use the preferred path, it selects a random alternative available path. The host automatically reverts back to the preferred path as soon as that path becomes available.

Round Robin (RR)
Uses a path selection algorithm that rotates through all available paths, enabling load balancing across the paths.
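
When the default PSP picked for a device is not the one you want, it can be overridden per device. A minimal sketch, assuming the usual vCLI connection placeholders and a $DEVICE variable holding the naa. identifier of the LUN to change:

# esxcli --server $HOST --username $USER --password $PASSWD nmp device setpolicy --device $DEVICE --psp VMW_PSP_RR

A subsequent nmp device list for that device should then report VMW_PSP_RR as its Path Selection Policy.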

VMware NMP Flow of I/O
When a virtual machine issues an I/O request to a storage device managed by the NMP, the following process takes place.

  1. The NMP calls the PSP assigned to this storage device.
  2. The PSP selects an appropriate physical path on which to issue the I/O.
  3. If the I/O operation is successful, the NMP reports its completion.
  4. If the I/O operation reports an error, the NMP calls an appropriate SATP.
  5. The SATP interprets the I/O command errors and, when appropriate, activates inactive paths.
  6. The PSP is called to select a new path on which to issue the I/O.

I hope this helps provide the initial background needed to understand why the PSA architecture is an incredible step forward for customers, like me, who are using ALUA storage arrays.
In my next post I'll share my experience moving from manually configured FIXED PSP paths to automatically balanced ROUND ROBIN paths.
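
As a teaser, that change can be made either per device, as sketched in the previous section, or per SATP. The following is only a sketch, assuming the usual vCLI connection placeholders; it changes the default PSP for every device claimed by the given SATP, so check your array vendor's recommendations before applying it:

# esxcli --server $HOST --username $USER --password $PASSWD nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR

Newly claimed devices should pick up Round Robin automatically, while devices that are already claimed typically keep their current policy until they are reclaimed or the host is rebooted.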