Tuesday, January 19, 2010

The case of the ghost LUN 0


Upgrading an ESX 3.5 U4 host to vSphere ESXi 4.0 U1, I noticed a very strange behaviour.
In my environment the upgrade procedure requires reinstalling ESXi from scratch and then replicating the previous configuration with a custom-made PowerShell script.
The ESXi install phase, normally very fast, took a huge amount of time. That forced me to reinstall the server once more, this time watching the logs carefully.
Here is what I found:

CLUE #1
On the installation LUN selection screen, where you choose the LUN that will hold the hypervisor, a "strange" empty DISK 0 with a size of 0 bytes appears (see figure 1-1).

figure 1-1: the empty 0-byte DISK 0 on the LUN selection screen


CLUE #2
Pressing ALT-F12 on the server console to switch to the VMkernel log screen reveals a flood of warning messages like the following:

Jan 18 10:19:44 vmkernel: 44:22:15:55.304 cpu3:5453)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x410007063440) to NMP device "mpx.vmhba2:C0:T2:L0" failed on physical path "vmhba2:C0:T2:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
Jan 18 10:19:44 vmkernel: 44:22:15:55.304 cpu3:5453)WARNING: NMP: nmp_DeviceRetryCommand: Device "mpx.vmhba2:C0:T2:L0": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Jan 18 10:19:45 vmkernel: 44:22:15:56.134 cpu6:4363)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "mpx.vmhba2:C0:T2:L0" - issuing command 0x410007063440
Jan 18 10:19:45 vmkernel: 44:22:15:56.134 cpu3:41608)WARNING: NMP: nmp_CompleteRetryForPath: Retry command 0x12 (0x410007063440) to NMP device "mpx.vmhba2:C0:T2:L0" failed on physical path "vmhba2:C0:T2:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.
Jan 18 10:19:45 vmkernel: 44:22:15:56.134 cpu3:41608)WARNING: NMP: nmp_CompleteRetryForPath: Logical device "mpx.vmhba2:C0:T2:L0": awaiting fast path state update before retrying failed command again...
Jan 18 10:19:46 vmkernel: 44:22:15:57.134 cpu5:4363)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "mpx.vmhba2:C0:T2:L0" - issuing command 0x410007063440
Jan 18 10:19:46 vmkernel: 44:22:15:57.134 cpu3:41608)WARNING: NMP: nmp_CompleteRetryForPath: Retry command 0x12 (0x410007063440) to NMP device "mpx.vmhba2:C0:T2:L0" failed on physical path "vmhba2:C0:T2:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0.

You don't need to be a VMkernel storage engineer to correlate cause and effect: command 0x12 is a SCSI INQUIRY, and the sense data 0x5 0x25 0x0 decodes to ILLEGAL REQUEST / LOGICAL UNIT NOT SUPPORTED, so every INQUIRY sent to that device is rejected and endlessly retried.
The new VMware storage architecture, the Pluggable Storage Architecture (PSA), behaves differently from ESX 3.5. During the initial storage scan it finds a "virtual" DISK 0 device, exposed by my storage virtualization appliance (FalconStor NSS) and mapped to ESX as LUN 0, and it insists on handling it like any other "real" SAN device.
This generates a lot of errors and retries, slowing down both the boot phase and the VMkernel every time a storage path is rescanned.
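In fact, you can retrigger the flood at will just by rescanning the affected adapter. From the vSphere CLI, assuming it is installed on your management station, something like:

# vicfg-rescan --server $HOST --username $USER --password $PASSWD vmhba2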
The output of the following esxcli command confirms the suspicion:

# esxcli --server $HOST --username $USER --password $PASSWD nmp device list

mpx.vmhba3:C0:T0:L0
  Device Display Name: Local VMware Disk (mpx.vmhba3:C0:T0:L0)
  Storage Array Type: VMW_SATP_LOCAL
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba3:C0:T0:L0;current=vmhba3:C0:T0:L0}
  Working Paths: vmhba3:C0:T0:L0


eui.000b080080002001
  Device Display Name: Pillar Fibre Channel Disk (eui.000b080080002001)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba2:C0:T3:L63;current=vmhba2:C0:T3:L63}
  Working Paths: vmhba2:C0:T3:L63


eui.000b08008a002000
  Device Display Name: Pillar Fibre Channel Disk (eui.000b08008a002000)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba2:C0:T1:L60;current=vmhba2:C0:T1:L60}
  Working Paths: vmhba2:C0:T1:L60


mpx.vmhba2:C0:T2:L0
  Device Display Name: FALCON Fibre Channel Disk (mpx.vmhba2:C0:T2:L0)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba2:C0:T2:L0;current=vmhba2:C0:T2:L0}
  Working Paths: vmhba2:C0:T2:L0


mpx.vmhba0:C0:T0:L0
  Device Display Name: Local Optiarc CD-ROM (mpx.vmhba0:C0:T0:L0)
  Storage Array Type: VMW_SATP_LOCAL
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_FIXED
  Path Selection Policy Device Config: {preferred=vmhba0:C0:T0:L0;current=vmhba0:C0:T0:L0}
  Working Paths: vmhba0:C0:T0:L0


naa.6000d77800005acc528d69135fbc1c44
  Device Display Name: FALCON Fibre Channel Disk (naa.6000d77800005acc528d69135fbc1c44)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_RR
  Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
  Working Paths: vmhba1:C0:T2:L68, vmhba2:C0:T2:L68


naa.6000d77800008c5576716bd63f8f9901
  Device Display Name: FALCON Fibre Channel Disk (naa.6000d77800008c5576716bd63f8f9901)
  Storage Array Type: VMW_SATP_DEFAULT_AA
  Storage Array Type Device Config:
  Path Selection Policy: VMW_PSP_RR
  Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
  Working Paths: vmhba1:C0:T2:L3, vmhba2:C0:T2:L3

Watching carefully through the output, you should notice that mpx.vmhba2 and mpx.vmhba3 carry a runtime-name-based mpx. identifier, which the VMkernel assigns to devices that do not report a unique ID of their own, quite different from the more traditional naa. and eui. identifiers shown for the other devices (for a clear picture of VMware disk identifiers, see the Identifying disks when working with VMware ESX KB article).
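A quick way to single out those devices, assuming the remote esxcli runs on a Linux or vMA station where grep is available, is to filter the device list for the mpx. prefix:

# esxcli --server $HOST --username $USER --password $PASSWD nmp device list | grep "^mpx\."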
I don't know why FalconStor IPStor NSS exposes those fake LUNs (I'll open an SR); it is probably related to the fact that, as an internal standard, I never map any LUN number 0 to my ESX servers. Mapping a real LUN 0 would certainly hide the issue.
Anyway, I've found another workaround.
The following commands add two new claim rules that MASK (hide) all the fake LUN 0 paths, using the usual esxcli command line:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule add -P MASK_PATH -r 109 -t location -A vmhba2 -C 0 -T 2 -L 0
# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule add -P MASK_PATH -r 110 -t location -A vmhba3 -C 0 -T 0 -L 0

To check the result, run the corestorage claimrule list command:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule list

Rule  Class   Type      Plugin    Matches
----  -----   ----      ------    -------
0     runtime transport NMP       transport=usb
1     runtime transport NMP       transport=sata
2     runtime transport NMP       transport=ide
3     runtime transport NMP       transport=block
4     runtime transport NMP       transport=unknown
101   runtime vendor    MASK_PATH vendor=DELL model=Universal Xport
101   file    vendor    MASK_PATH vendor=DELL model=Universal Xport
109   runtime location  MASK_PATH adapter=vmhba1 channel=0 target=0 lun=0
109   file    location  MASK_PATH adapter=vmhba1 channel=0 target=0 lun=0
110   runtime location  MASK_PATH adapter=vmhba2 channel=0 target=0 lun=0
110   file    location  MASK_PATH adapter=vmhba2 channel=0 target=0 lun=0
65535 runtime vendor    NMP       vendor=* model=*

Note that each rule appears twice: the file class is the definition persisted in the configuration, while the runtime class is the copy currently loaded in the VMkernel.
Be sure to specify:
- the correct (new) rule number (starting from 102 is fine, as long as it does not clash with an existing rule);
- the correct location (the vmhba number followed by the Channel (C), Target (T) and LUN (L) of the fake path);
and then reboot the ESX host.
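If you prefer not to reboot, the new rules can also be applied at runtime. This is a sketch of the standard PSA masking sequence (load the rules, unclaim the fake paths, then run the claim rules); I took the reboot route myself, so treat it as untested here:

# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule load
# esxcli --server $HOST --username $USER --password $PASSWD corestorage claiming unclaim -t location -A vmhba2 -C 0 -T 2 -L 0
# esxcli --server $HOST --username $USER --password $PASSWD corestorage claiming unclaim -t location -A vmhba3 -C 0 -T 0 -L 0
# esxcli --server $HOST --username $USER --password $PASSWD corestorage claimrule run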

1 comment:

  1. I had the same problem with FalconStor NSS 6.15. My ESX 4.1 hosts disconnected from vCenter and it was not possible to connect to them.
    I created a small disk and allocated it to the ESX host as LUN 0. After that the host became responsive and the "fake" LUN 0s disappeared.
