ACI Rogue/Coop Exception List

Overview

ACI provides the ‘Rogue/COOP Exception List’ feature to mitigate the impact of rogue endpoint control and COOP endpoint dampening on legitimate, frequent endpoint movement, such as during server initialization or firewall failover.

Some use cases are:

  1. Firewall failover:
    • During firewall failover, frequent MAC address movement may cause rogue endpoint control to mark the MAC address as rogue.
    • This makes the firewall failover take longer.
    • Adding the MAC addresses of the firewalls to the rogue/COOP exception list resolves the failover issue.
  2. VxRail cluster discovery process:
    • Frequent moves of the VxRail Manager VM during initial cluster discovery trigger rogue endpoint control and/or COOP endpoint dampening.
    • This causes the initial cluster setup/discovery to fail.
    • Adding the MAC address of the VxRail Manager VM to the rogue/COOP exception list allows the cluster discovery process to succeed.

The Rogue/COOP Exception List feature is used to mitigate the kinds of issues listed above and other similar scenarios.

The following sections cover the overview and configuration of:

  1. Rogue Endpoint Control
  2. COOP Endpoint Dampening
  3. Rogue/COOP Exception List

1. Rogue Endpoint Control

Rogue endpoint control, introduced in release 3.2(1l) and disabled by default, detects a frequently moving endpoint and pauses learning for it, to avoid impact on forwarding and on resources such as CPU. Once the feature is enabled, an endpoint is marked as rogue if it moves between ports more than ‘Rogue EP Detection Multiplication Factor’ (default 4) times within the ‘Rogue EP Detection Interval’ (default 60 seconds). For an endpoint marked as rogue, learning pauses for a configured hold time (default 1800 seconds). During the hold time the endpoint stays static. After the hold time, normal endpoint learning resumes.
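The detection rule above can be sketched as a sliding-window counter. This is a simplified model for illustration only, not Cisco's implementation: the class and method names are hypothetical, and the only values taken from the text are the defaults (factor 4, interval 60 seconds, hold time 1800 seconds).

```python
from collections import deque


class RogueDetector:
    """Toy per-leaf model of rogue endpoint detection (hypothetical names)."""

    def __init__(self, factor=4, interval=60.0, hold=1800.0):
        self.factor = factor        # Rogue EP Detection Multiplication Factor
        self.interval = interval    # Rogue EP Detection Interval, in seconds
        self.hold = hold            # hold time, in seconds
        self.moves = {}             # mac -> deque of recent move timestamps
        self.rogue_until = {}       # mac -> time at which learning resumes

    def is_rogue(self, mac, now):
        """Return True while the endpoint is still inside its hold time."""
        return now < self.rogue_until.get(mac, 0.0)

    def record_move(self, mac, now):
        """Record a port move at time `now`; return True if the endpoint is rogue."""
        if self.is_rogue(mac, now):
            return True  # learning is paused, so the move is ignored
        window = self.moves.setdefault(mac, deque())
        window.append(now)
        # Drop moves that fall outside the detection interval.
        while window and now - window[0] > self.interval:
            window.popleft()
        # "More than factor moves within the interval" marks the endpoint rogue.
        if len(window) > self.factor:
            self.rogue_until[mac] = now + self.hold
            window.clear()
            return True
        return False
```

With the defaults, four moves inside 60 seconds leave the endpoint in the normal state, and the fifth move marks it rogue until the hold time expires.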

Sample output (some information is greyed out) under System -> History:

Figure 1. Sample output for a rogue endpoint

The feature is enabled globally, but each leaf switch tracks movement and takes action individually. An endpoint being rogue on one leaf switch does not pause learning on other leaf switches: if the endpoint moves to a different switch and behaves properly, learning on that switch proceeds normally.

Figure 2. Rogue EP control

There is an option to clear the rogue endpoint declaration manually if there is a need to reduce the potential downtime impact on the rogue endpoint.

Figure 3. Clear rogue endpoint

2. COOP Endpoint Dampening

The COOP Endpoint Dampening feature was introduced in APIC Release 4.2(3) and it’s enabled by default.

For release 5.0(1) and later the feature can only be disabled using API calls.

The API Call Body:
<polUni>
    <infraInfra>
       <infraSetPol disableEpDampening="true"></infraSetPol>
    </infraInfra>
</polUni>
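As a rough sketch, the body above could be built and wrapped in a POST request with the Python standard library. The XML payload matches the one shown above; the URL path (/api/mo/uni.xml), the APIC-cookie session header, the host name, and the function names are assumptions for illustration and should be checked against the APIC REST API documentation for your release. The request is only constructed here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_dampening_payload():
    """Build the XML body that disables COOP endpoint dampening."""
    root = ET.Element("polUni")
    infra = ET.SubElement(root, "infraInfra")
    ET.SubElement(infra, "infraSetPol", disableEpDampening="true")
    return ET.tostring(root)  # bytes


def build_request(apic_host, token):
    """Wrap the payload in a POST request (URL path and cookie are assumptions)."""
    url = f"https://{apic_host}/api/mo/uni.xml"  # assumed endpoint
    headers = {
        "Content-Type": "application/xml",
        "Cookie": f"APIC-cookie={token}",  # assumed session cookie format
    }
    return urllib.request.Request(
        url, data=build_dampening_payload(), headers=headers, method="POST"
    )


# Hypothetical host and token; pass the request to urllib.request.urlopen to send it.
request = build_request("apic.example.com", "TOKEN")
```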

COOP Endpoint Dampening is used by spine nodes to mitigate the impact of an unreasonable number of endpoint updates. When a spine node identifies a dampened endpoint, it raises a fault and notifies all leaf nodes to ignore updates from that endpoint.

COOP Endpoint Dampening is penalty-based. The penalty accumulates per endpoint IP address, and when it reaches 4000 the status changes from ‘Normal’ to ‘Critical’. If the penalty stays in the ‘Critical’ state for five minutes, or exceeds the freeze threshold of 10000, the endpoint status changes to the freeze (dampened) state.
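The state transitions described above can be summarized in a small function. This is an illustrative sketch only: the function and constant names are hypothetical, and how the penalty itself accumulates or decays is not modeled. Only the thresholds (4000, 10000, five minutes) come from the text.

```python
CRITICAL_THRESHOLD = 4000        # penalty at which the status becomes Critical
FREEZE_THRESHOLD = 10000         # penalty above which the endpoint is frozen
CRITICAL_HOLD_SECONDS = 5 * 60   # time in Critical after which the endpoint is frozen


def dampening_state(penalty, seconds_in_critical=0.0):
    """Classify an endpoint IP as Normal, Critical, or Freeze (hypothetical helper)."""
    if penalty > FREEZE_THRESHOLD:
        return "Freeze"
    if penalty >= CRITICAL_THRESHOLD:
        # Staying Critical for five minutes also leads to the freeze state.
        if seconds_in_critical >= CRITICAL_HOLD_SECONDS:
            return "Freeze"
        return "Critical"
    return "Normal"
```

For example, a penalty of 4500 held for 300 seconds is classified as Freeze, even though it never exceeds 10000.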

Click here to learn more about the Coop endpoint dampening penalty

Figure 4. Sample output for dampened endpoint

There is an option to clear the dampened endpoint declaration manually if there is a need to reduce the potential downtime impact on the dampened endpoint.

Figure 5. Clearing dampened endpoints

3. Rogue/COOP Exception List

Starting with ACI release 5.2(3), a new feature lessens the impact of rogue endpoint control and COOP endpoint dampening on frequently moving endpoints. This feature is called ‘Rogue/COOP Exception List’ and is configured under the BD -> Advanced/Troubleshooting section. MAC addresses in the exception list are marked as rogue only if more than 3000 movements occur within 10 minutes.

This feature applies to L2 BDs only: if routing is enabled, the ACI leaf detects the IP movement and marks the endpoint as rogue based on its IP address. Disabling data-plane learning can be used to disable IP address-based rogue EP control.

As of this writing, the maximum size of the ‘Rogue/COOP Exception List’ is 100 MAC addresses globally (fabric-wide).
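To make the effect of the exception list concrete, the sketch below picks the rogue detection limits for a MAC address and enforces the 100-entry cap. It is a hypothetical helper, not an APIC API: the function names and the MAC normalization are assumptions, and the default limits assume the rogue endpoint control defaults described earlier (4 moves, 60 seconds).

```python
import re

MAX_EXCEPTION_MACS = 100        # fabric-wide limit stated above
DEFAULT_LIMITS = (4, 60)        # (moves, seconds), rogue endpoint control defaults
EXCEPTION_LIMITS = (3000, 600)  # (moves, seconds), for MACs in the exception list

_MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators; reject malformed input."""
    mac = mac.strip().lower().replace("-", ":")
    if not _MAC_RE.fullmatch(mac):
        raise ValueError(f"invalid MAC address: {mac!r}")
    return mac


def build_exception_list(macs):
    """Return a set of normalized MACs, enforcing the fabric-wide maximum."""
    result = {normalize_mac(m) for m in macs}
    if len(result) > MAX_EXCEPTION_MACS:
        raise ValueError(f"exception list exceeds {MAX_EXCEPTION_MACS} MAC addresses")
    return result


def rogue_limits(mac, exception_list):
    """Return the (moves, seconds) limits that apply to this MAC address."""
    if normalize_mac(mac) in exception_list:
        return EXCEPTION_LIMITS
    return DEFAULT_LIMITS
```

A firewall MAC added to the list is then judged against 3000 moves in 600 seconds, while any other MAC keeps the default limits.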

Figure 6. Rogue/coop exception list

https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-739989.html
