Monday, April 25, 2011

Nexus VPC: Auto recovery vs. peer-config-check-bypass

It's critical to understand how your vPC will behave in different failure scenarios. In my own effort to understand this behavior thoroughly, I will explain two vPC domain commands: auto-recovery and peer-config-check-bypass.

Before talking about auto-recovery vs. peer-config-check-bypass, we need to understand what happens before two Nexus switches can bring up a vPC. First, the two switches in the same vPC domain exchange configuration parameters (a consistency check) to ensure both switches have compatible vPC configurations before they enable their vPC topology and begin forwarding traffic.

Any check that fails with a "type 1" inconsistency behaves like this: when *graceful consistency check* is on (the default), the primary switch keeps the vPC up while the secondary brings it down. When graceful consistency check is disabled, both peer switches suspend the VLANs on the vPC ports.
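As a rough sketch of how this is toggled, graceful consistency check is controlled under the vPC domain. The domain ID 10 below is only an example value, and command availability depends on your NX-OS release, so check it against the command reference for your platform:

```
switch(config)# vpc domain 10
switch(config-vpc-domain)# no graceful consistency-check
switch(config-vpc-domain)# graceful consistency-check
```

The first command turns the feature off, so both peers suspend on a type 1 mismatch; the second restores the default.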

These commands show the results of the consistency check:

* show vpc consistency-parameters global
* show vpc consistency-parameters interface port-channel 200
* show vpc brief

Full Cisco Paper on this: http://www.cisco.com/en/US/docs/switches/datacenter/nexus5000/sw/operations/n5k_vpc_ops.html

Scenario 1: The vPC check message is sent via the vPC peer link, and does not function if the vPC peer link is down. In this situation, the secondary vPC switch suspends all of its vPC member ports, while the primary vPC switch's member ports remain forwarding.

Scenario 2: The vPC check message is sent via the vPC peer link, and does not function if the vPC peer link is down. If one of the vPC member ports on the vPC primary switch flaps while the vPC peer link is down, the port remains down because the vPC consistency check cannot complete.

Hence, enter the feature: auto-recovery.

I have not seen this work live, but according to Cisco, in Cisco NX-OS Release 5.0(2)N2 the auto-recovery feature brings up the vPC member ports (after a member port flap on the primary vPC switch) when one peer is down. Cisco outlines some scenarios and what auto-recovery does in each:

* If both switches reload, and only one switch boots up, auto-recovery allows that switch to assume the role of the primary switch. The vPC links come up after a configurable period of time if the vPC peer-link and the peer-keepalive fail to become operational within that time.

 [NOTE: I am not sure what Cisco means here by saying that the auto-recovery feature "allows the switch to assume the role of the primary switch", as the role of the vPC primary is clearly determined by the 'role priority' command.]

* If both switches reload, both switches come up, the vPC peer link comes up, but the peer-keepalive does not come up, both peer switches keep the vPC links down.

* When you disable vPCs on a secondary vPC switch because of a peer-link failure and then the primary vPC switch fails, the secondary switch reenables the vPCs. In this scenario, the vPC waits for three consecutive keepalive failures before recovering the vPC links.
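On the role question in the note above: my understanding is that the configured role priority and the operational role are two separate things, which may explain Cisco's wording. Treat that as an assumption to verify, not something stated in the Cisco paper. A minimal sketch for setting the priority (the lower value is preferred as primary) and then checking the operational role and keepalive state; domain ID 10 and priority 100 are example values:

```
switch(config)# vpc domain 10
switch(config-vpc-domain)# role priority 100
switch(config-vpc-domain)# end
switch# show vpc role
switch# show vpc peer-keepalive
```

Running the two show commands on both peers before and after a failure test is probably the quickest way to see which switch is actually acting as primary.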

This example shows how to enable the auto-recovery feature and to set the reload delay period:

switch(config)# vpc domain 10
switch(config-vpc-domain)# auto-recovery ?
   <CR>
   reload-delay  Duration to wait after reload to recover vPCs
switch(config-vpc-domain)# auto-recovery reload-delay ?
   <240-3600>  Time-out for restoring vPC links (in seconds)
switch(config-vpc-domain)# auto-recovery reload-delay 240

Okay, that's 'auto-recovery'. Now let's look at peer-config-check-bypass.

The URL referenced above makes no mention of peer-config-check-bypass, although the two commands appear to do pretty much the same thing, except that this one applies solely to the primary vPC switch.

With this command, Cisco says that we can modify the default vPC behavior described above when a peer link is down on the vPC primary switch. [NOTE: Cisco does state that this command causes no behavior change on the secondary switch, whereas with auto-recovery there is mention of a behavior change on the secondary switch.] Specifically, Cisco states: "This command allows newly configured vPCs and existing vPCs that have flapped to be brought up when a peer link is down and the vPC switch role has been determined to be primary."
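A minimal sketch of enabling it, assuming the same example domain ID 10 used in the auto-recovery example; the command is entered under the vPC domain:

```
switch(config)# vpc domain 10
switch(config-vpc-domain)# peer-config-check-bypass
```

Afterwards, `show running-config vpc` should list the command under the domain, though the exact output will vary by release.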

Here's Cisco's paper on it:

http://www.cisco.com/en/US/docs/switches/datacenter/nexus5000/sw/layer2/421_n2_1/Cisco_n5k_layer2_config_gd_rel_421_N2_1_chapter8.html

In our config here, we are using peer-config-check-bypass in production. Please add your thoughts on these two commands!
