Trustsec and the use of hundreds of SGTs in one ISE Matrix

A. Abstract/Introduction

Trustsec is a Cisco Systems technology suite. A subset of it covers Scalable Group Tags (SGTs): their assignment, transport and enforcement at line rate in Cisco equipment.
Trustsec uses the term “domain” for a network in which the edge of the domain does ingress tagging and egress enforcement, while the central part of the domain transports IP packets in Ethernet frames carrying the source SGT.

The devices that can enforce a policy based on SGTs are routers, switches and firewalls.
This document is focused on the Cisco Catalyst 3k and 9k series.

SGT assignment

An SGT is assigned to a packet at the ingress switch, per VRF.
There are several ways to assign SGTs. Two methods are used in this scenario: dynamic assignment via 802.1x and Cisco ISE, and static assignment via IP-SGT maps for IP subnets distributed by Cisco ISE.
An SGT assigned via 802.1x and a subnet SGT assigned via static mapping are both an “IP-SGT map”; the only difference is the netmask (a single host address versus a subnet prefix).
When an SGT is assigned to an IP subnet at the edge, two things are achieved.
First, every packet entering this edge with a source IP address in the subnet gets the SGT attached when it is forwarded towards the center of the Trustsec domain; this is the source SGT while the packet is in transit.
Second, the edge registers that this IP address is local to the switch and associated with this SGT, so traffic destined for this address forms a source SGT (sSGT)/destination SGT (dSGT) pair.
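The point that a host binding and a subnet binding are the same kind of IP-SGT map can be illustrated with a small lookup table. The following Python is a minimal sketch, not Cisco's implementation: it assumes the switch resolves an address by longest-prefix match and ignores any priority between different binding sources; the addresses and tag values are made up.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-SGT bindings on one edge switch (illustrative values).
# A host-length prefix is what an 802.1x-authenticated host produces; a
# shorter prefix is a statically mapped subnet. Only the netmask differs.
IP_SGT_MAP = {
    ipaddress.ip_network("10.1.1.0/24"): 4,   # static subnet binding
    ipaddress.ip_network("10.1.1.25/32"): 2,  # dynamic 802.1x host binding
    ipaddress.ip_network("10.20.0.0/16"): 3,  # static server subnet binding
}


def lookup_sgt(ip: str) -> Optional[int]:
    """Return the SGT of the most specific binding that contains `ip`.

    The same table serves both roles described above: tagging traffic from
    a local address (source SGT) and resolving the destination SGT for
    traffic towards a local address. Simplified: longest-prefix match only.
    """
    addr = ipaddress.ip_address(ip)
    matches = [net for net in IP_SGT_MAP if addr in net]
    if not matches:
        return None  # address not covered by any binding in this table
    best = max(matches, key=lambda net: net.prefixlen)
    return IP_SGT_MAP[best]


print(lookup_sgt("10.1.1.25"))  # 2, the host binding wins over the /24
print(lookup_sgt("10.1.1.30"))  # 4
print(lookup_sgt("10.20.5.5"))  # 3
print(lookup_sgt("192.0.2.1"))  # None
```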

Trustsec policy

The Trustsec policy for the local switch is a subset of the full ISE matrix.
ISE holds an SGT source/destination matrix, and for each source/destination SGT pair an access-list is referenced.
As soon as the switch has registered an SGT as local, the switch will request the policy associated with this destination SGT from ISE, including the necessary role-based access-list.
Functionality related to Cisco ISE and the switches, such as communication protocols, is out of scope for this document and is assumed to be correct and in working order.
The matrix in ISE is not “stateful”.
An SGT pair like SGT2/SGT3 has two directions, SGT2 -> SGT3 and SGT3 -> SGT2, and they do not have to be the same.
The pair sSGT 2 -> dSGT 3 can be “Permit IP” while the reverse, sSGT 3 -> dSGT 2, can be “Permit tcp eq 443”, as shown in the following example (rows are source SGTs, columns are destination SGTs).

| Source \ Destination | 1 | 2 | 3 |
|---|---|---|---|
| 1 | Permit IP | Permit IP | Deny IP |
| 2 | Permit IP | Permit IP | Permit IP |
| 3 | Deny IP | Permit tcp eq 443 | Permit IP |

The ISE matrix has a default access-list which is used by the switch if no specific access-list is present in the matrix for a given source/destination SGT pair.
Usually the default access-list is either “00-permit IP” or “00-deny IP”.
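The directional matrix and the default fallback can be modelled as a dictionary keyed by (source SGT, destination SGT). This Python sketch encodes the example matrix above under the assumption that the matrix default is “00-permit IP”, so every “Permit IP” cell is simply left at the default; the data structure is illustrative, not how ISE stores the matrix.

```python
# The example matrix, rows = source SGT, columns = destination SGT.
# Assumption for this sketch: the matrix default is "00-permit IP", so
# only cells that differ from it are stored explicitly.
DEFAULT_ACL = "00-permit IP"

MATRIX = {
    # (source SGT, destination SGT): specific access-list
    (1, 3): "Deny IP",
    (3, 1): "Deny IP",
    (3, 2): "Permit tcp eq 443",
}


def policy_for(src_sgt: int, dst_sgt: int) -> str:
    """ACL for one direction of an SGT pair, falling back to the default."""
    return MATRIX.get((src_sgt, dst_sgt), DEFAULT_ACL)


# The matrix is not stateful: each direction is looked up on its own.
print(policy_for(2, 3))  # 00-permit IP (left at the default)
print(policy_for(3, 2))  # Permit tcp eq 443
```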

SGT enforcement

When an IP packet arrives at a switch from the Trustsec domain, it carries a source SGT. This SGT is the sSGT of the sSGT/dSGT pair. The switch obtains the destination SGT by looking up the destination IP address in its local IP-to-SGT mapping table.
The switch now has both a source SGT and a destination SGT and can enforce policy based on the SGT pair.
For each source/destination SGT pair, the switch programs its TCAM with a policy line enforced by a role-based access-list.
The policy in the switch is global and is enforced on every outgoing interface, unless the interface is configured with the non-default interface command “no cts role-based enforcement”.
With SGT enforcement as in this example, problems do not occur until the switches reach their TCAM resource limits.
If a switch downloads a policy with 10 different SGTs in the matrix, and every sSGT/dSGT combination must be enforced by a specific access-list, it requires 10 x 10 = 100 lines of TCAM.
A TCAM line is required each time the access-list referenced by the source/destination SGT pair in the matrix cell is different from the default access-list for the entire matrix.
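The rule above can be sketched as a count of matrix cells whose access-list is not the matrix default. The Python below is a simplified model with hypothetical ACL names and an assumed default name; it also shows that a specific access-list which only permits IP is still not the default access-list, and therefore still costs a line.

```python
from itertools import product

DEFAULT_ACL = "00-permit IP"  # name of the matrix-wide default (assumed)


def policy_tcam_lines(matrix, sgts, default_acl=DEFAULT_ACL):
    """Count sSGT/dSGT cells that reference an ACL other than the default.

    `matrix` maps (source SGT, destination SGT) -> ACL name; a missing key
    means the cell is left at the matrix default and needs no TCAM line.
    """
    return sum(
        1
        for src, dst in product(sgts, repeat=2)
        if matrix.get((src, dst), default_acl) != default_acl
    )


sgts = range(1, 11)  # the 10-SGT example from the text

# Every cell references its own specific ACL: one policy line per cell.
every_cell_specific = {(s, d): f"ACL_{s}_{d}" for s, d in product(sgts, repeat=2)}

# A specific ACL that merely permits IP is a different ACL than the
# default, so it costs a line just like any other specific ACL.
custom_permit = {(s, d): "Permit_IP" for s, d in product(sgts, repeat=2)}

print(policy_tcam_lines(every_cell_specific, sgts))  # 100
print(policy_tcam_lines(custom_permit, sgts))        # 100
print(policy_tcam_lines({}, sgts))                   # 0
```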

TCAM optimization

The Catalyst switches optimize TCAM usage in several ways:

  • Only policies for known local (destination) SGTs are programmed in TCAM.
  • Only policies that use a specific, non-default access-list are programmed in TCAM.
  • A policy programmed in TCAM holds only a pointer to an access-list stored in a separate TCAM space, not the complete access-list, so each policy uses only one line of policy TCAM.
  • If two policies use the exact same access-list, the access-list is programmed in TCAM only once.
  • Only access-lists that are in use by a policy are downloaded and programmed into TCAM.
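The optimizations listed above can be combined into one small model of what a single switch programs. This is a hedged Python sketch with hypothetical ACL names and local SGTs, assuming a matrix-wide default named “00-permit IP”: of the 100 cells in a 10-SGT matrix, a switch with two local SGTs programs 20 policy lines, and because only two distinct access-lists are referenced, only two access-lists.

```python
from itertools import product

DEFAULT_ACL = "00-permit IP"  # matrix-wide default (assumed name)


def program_switch(matrix, local_sgts, all_sgts, default_acl=DEFAULT_ACL):
    """Return (policy TCAM lines, set of ACLs to program) for one switch.

    Simplified model of the optimizations listed above:
    - only destination SGTs that are local to the switch are considered;
    - a cell left at the default ACL needs no policy line;
    - every other cell needs one policy line pointing at its ACL;
    - each distinct ACL is downloaded and programmed only once.
    """
    policy_lines = 0
    acls = set()
    for dst in local_sgts:
        for src in all_sgts:
            acl = matrix.get((src, dst), default_acl)
            if acl != default_acl:
                policy_lines += 1
                acls.add(acl)
    return policy_lines, acls


all_sgts = range(1, 11)
# Every cell uses one of only two specific ACLs (hypothetical names).
matrix = {
    (s, d): ("ACL_WEB" if s % 2 else "ACL_DENY")
    for s, d in product(all_sgts, repeat=2)
}
lines, acls = program_switch(matrix, local_sgts=[3, 7], all_sgts=all_sgts)
print(lines, len(acls))  # 20 2
```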

The example uses a small number of access-lists across the entire matrix, so there is no risk of overloading the access-list TCAM space on the switches.
This document discusses a scenario where the number of SGTs, and thus the size of the matrix, would exceed the TCAM limits of the switches unless a careful design strategy is chosen.

B. Problem Statement

As described in the Background section, after the design review the network will need to support a total of 630 SGTs in one ISE Trustsec matrix.
When the review started, approximately 135 SGTs were present in the matrix, some dynamically assigned and some statically assigned.
At this time the matrix had been populated without regard to how the switches program it into TCAM. For example, instead of using the default access-list, some cells were populated with a specific access-list of “permit IP”.
A specific access-list in the matrix requires one line of TCAM in every switch that needs this policy. Had the default policy been used instead, no TCAM entry would be consumed, because the switch does not use TCAM for policy lines that use the default access-list.
In the network four traffic patterns were identified as important.

  • User to server
  • Server to user
  • Server to server
  • User to user

These can be translated into SGTs, where dynamically assigned SGTs represent users and statically assigned SGTs represent servers and IP subnets.
The matrix was analyzed to find the percentage of cells populated with specific access-lists requiring TCAM allocation, evaluating specific “Permit IP” entries as if they were the default access-list.
The matrix was also logically converted from the current “default permit” to “default deny” (also known as whitelist or allow-list) and evaluated again.
The resulting population percentages (the share of cells that would require a specific access-list) are:

| Traffic flow | Population with default “deny IP” (%) | Population with default “permit IP” (%) |
|---|---|---|
| Static to Static | 33 | 71 |
| Static to Dynamic | 49 | 57 |
| Dynamic to Static | 33 | 71 |
| Dynamic to Dynamic | 33 | 71 |
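Reading each column as the population that results when that access-list is the matrix default, the percentages can be reproduced with a short calculation. In this Python sketch the cell mix is hypothetical (the real cell counts are not given): 67 % deny, 29 % permit and 4 % other specific access-lists is simply one mix consistent with the static-to-static row, and explicit “permit IP” cells are treated as permit, as described above.

```python
def population(cells, default):
    """Percentage of cells that need a specific ACL when `default` is the
    matrix default.

    `cells` holds the effective action of every sSGT/dSGT cell in one
    traffic-flow block. An explicit "permit IP" counts as permit, i.e. as
    if it could be folded into a "permit IP" default.
    """
    specific = sum(1 for action in cells if action != default)
    return round(100 * specific / len(cells))


# Hypothetical 100-cell block, chosen only to match the static-to-static row.
cells = ["deny IP"] * 67 + ["permit IP"] * 29 + ["ACL_WEB"] * 4

print(population(cells, "permit IP"))  # 71
print(population(cells, "deny IP"))    # 33
```

Cells that reference some other specific access-list count under both defaults, which is why the two columns of a row can add up to more than 100 %.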


Looking at the diagram, the traffic flows between the groups in the network can be optimized to reduce the TCAM footprint in the enforcing switches.
The enforcing switches are the SDA fabric edges (A), the traditional Trustsec access edges (B) and the backbone switches (C), which enforce on behalf of (D) and (E).
Switches that only transport SGTs, like the VRF fusion device, do not program policy into TCAM and are of no concern.

Access switch analysis

The access switches (A) and (B) are where users and other devices are located, and this is where dynamic SGTs are assigned.
When a user connects to the access switch, an SGT is assigned to all traffic originating from this user.
The SGT assigned to the user is considered local: it is the destination SGT for all traffic destined for the user, and the switch must evaluate it.
As described in the Background section, each department has 10 SGTs available for its own use, so an access switch should at a minimum be able to sustain this load.
Analysis of the (A) and (B) access switches showed up to 13 active local SGTs in the current network.
An access switch would then need space for at least 13 local SGTs. With a policy population of ~80% (dynamic to dynamic), this requires 13 x 630 x 0.80 = 6552 TCAM lines, which disqualifies the Catalyst 3650 with its maximum of 4000 TCAM entries.

Backbone switch analysis

On the backbone switch (C), all 110 static SGTs need to be programmed into TCAM, as this switch holds the static mappings for all servers and all other IP subnets.
The backbone switch (C) would then need space for 110 SGTs with a population of ~70% (static to dynamic), requiring 110 x 630 x 0.70 = 48510 TCAM lines, which disqualifies every Catalyst switch (the 9606R has a maximum of 32000).
The 110 static SGTs programmed in the backbone switch would probably have only an 80% impact on the core TCAM, as server-to-server communication does not cross the core router unless the servers are in different VRFs, but this still amounts to ~38800 entries (48510 x 0.80).
In both cases, a simple TCAM estimation shows that TCAM usage would exceed the switches' hardware limits and cause unpredictable results unless the design is changed.
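Both estimates follow the same rough rule, local SGTs x SGTs in the matrix x population, and can be checked against the model limits from the table in the Background section. A minimal Python sketch; like the text, it ignores the access-list TCAM space and any other TCAM consumers.

```python
# Policy TCAM per model, taken from the table in the Background section.
TCAM_LIMIT = {
    "Catalyst 3650": 4000,
    "Catalyst 3850": 4000,
    "Catalyst 9300": 8000,
    "Catalyst 9500H": 16000,
    "Catalyst 9600": 32000,
}


def estimate_lines(local_sgts, matrix_sgts, population):
    """Rough estimate: local SGTs x SGTs in the matrix x population fraction."""
    return round(local_sgts * matrix_sgts * population)


def fits(model, lines):
    """True if the estimated policy lines stay within the model's limit."""
    return lines <= TCAM_LIMIT[model]


access = estimate_lines(13, 630, 0.80)     # access switch, dynamic to dynamic
backbone = estimate_lines(110, 630, 0.70)  # backbone, static to dynamic
backbone_core = round(backbone * 0.80)     # ~80 % impact on the core TCAM

print(access, fits("Catalyst 3650", access))                # 6552 False
print(backbone, fits("Catalyst 9600", backbone))            # 48510 False
print(backbone_core, fits("Catalyst 9600", backbone_core))  # 38808 False
```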

C. Background

The customer has approx. 50 departments, a datacenter and a large legacy network where the internet also resides.
The requirement for the network is that 10 SGTs are available for each department, and that all known IP traffic is matched by IP-SGT maps so that all traffic can be enforced by the SGT-based policy.
The network of this scenario implements both SDA with micro/macro segmentation and traditional non-SDA Trustsec.
The following diagram shows the customer network at project start before any optimization.

Current SGT usage

The matrix currently holds approximately 90 dynamic user SGTs and 45 statically assigned IP-SGT maps, a total of 135 SGTs, with a default policy of “00-Permit IP”.
With only a small number of departments in the Trustsec part of the network, the matrix already holds 135 SGTs, and it grows each time a department is migrated to Trustsec and 802.1x.
The Trustsec network is macro-segmented into three VRFs, Trust, Managed and Public, which are joined at the VRF fusion device.
Each department has users in both Trust and Managed; the Public VRF is shared across the entire campus, including the Nexus DC part of the network.

Future SGT usage

The estimated future SGT requirement is:

  • 50 departments, each with 5 SGTs in the Trust VRF and 5 SGTs in the Managed VRF, in total 500 dynamically assigned SGTs.
  • A Public VRF with a total of 20 dynamically assigned SGTs.
  • 110 SGTs for static mapping of server IP subnets in the datacenter across the VRFs: 75 for Trust, 25 for Managed and 10 for Public.

Legacy and Internet are attached to the Trust VRF, and the SGTs needed for IP subnets in this part of the network are taken from Trust’s static mapping allocation.
The total future matrix will then have 630 SGTs.
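A short Python check of the arithmetic, using only the allocations listed above (the variable names are illustrative):

```python
DEPARTMENTS = 50

dynamic = {
    "Trust": DEPARTMENTS * 5,    # 5 SGTs per department in Trust
    "Managed": DEPARTMENTS * 5,  # 5 SGTs per department in Managed
    "Public": 20,                # shared Public VRF
}
# Static server/subnet SGTs; Legacy and Internet subnets come out of Trust's 75.
static = {"Trust": 75, "Managed": 25, "Public": 10}

total = sum(dynamic.values()) + sum(static.values())
print(sum(dynamic.values()), sum(static.values()), total)  # 520 110 630
```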
The Catalyst models and their available TCAM for SGT policy are:

| Switch model | Available TCAM for SGT policy (entries) |
|---|---|
| Catalyst 3650 | 4000 |
| Catalyst 3850 | 4000 |
| Catalyst 9300 | 8000 |
| Catalyst 9500H | 16000 |
| Catalyst 9600 | 32000 |

D. Solution

In both the core and the access layer the solution is the same: minimize the number of local SGTs enforced on each switch, and minimize the number of non-default access-lists in the matrix.

Backbone solution

The first part of the solution is to migrate all access networks to SDA to keep dynamic-to-dynamic traffic away from the core switch.
The next part is to move all networks that do not require line-rate forwarding behind a firewall, so that these IP subnets can be identified by one SGT that signifies “already inspected by a superior security device”.
The same applies to macro segmentation of VRF traffic across the firewall, but this does not reduce TCAM usage, because the core switch has to download the entire policy for an SGT each time it holds an sgt-map for that SGT.
The TCAM requirement therefore remains the number of sgt-maps across all VRFs times the number of SGTs in the matrix times the population percentage.
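The rule stated above can be written as one formula. The symbols are introduced here for illustration: N_local is the number of local SGTs on the switch (for the core switch, the sgt-maps across all VRFs), N_matrix the number of SGTs in the matrix and p the population fraction. Assuming each of n subnets that do not need line rate previously had its own SGT, collapsing them into one “already inspected” SGT reduces N_local by n - 1, while macro segmentation across the firewall leaves N_local unchanged.

```latex
\begin{aligned}
L_{\mathrm{policy}} &\approx N_{\mathrm{local}} \times N_{\mathrm{matrix}} \times p
  && \text{e.g. } 110 \times 630 \times 0.70 = 48\,510 \\
N'_{\mathrm{local}} &= N_{\mathrm{local}} - n + 1
  && n \text{ firewalled subnets mapped to one SGT}
\end{aligned}
```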

Finally, align the configuration in the non-SDA Trustsec part of the network: configure “no cts role-based enforcement” on internal Trustsec interlink interfaces, and make sure SGT caching is disabled on all switches (it is disabled by default).

Access layer solution

In the access layer the current maximum of local SGTs is 13.
An administrative maximum of 20 local SGTs has been decided to allow for dynamic changes.
If the 20-SGT limit is exceeded, the users on the switch have to be rearranged or, for a switch stack, the stack is split into two stacks, each with fewer local SGTs than the limit of 20.

ISE matrix solution

The resource exhaustion in the switches is not solved by the optimizations in the core and in the access layer alone. As explained in the Problem Statement section, further optimizations are needed to lower the number of policy lines needed in each switch type.
To solve the TCAM usage problem in the core and access switches with the fewest resources, the population of the ISE matrix must be reduced.
This can be achieved by changing the matrix from default “permit IP” to default “deny IP”.
In the access layer, default “deny IP” reduces the matrix population from ~80% to ~30%, making it possible for a Catalyst 3650 to have 20 local SGTs: 20 x 630 x 0.30 = 3780 TCAM lines, which is within the limit of 4000 lines.
In the core layer, default “deny IP” reduces the matrix population from ~70% to ~30%, making it possible for a Catalyst 9600 to have 110 local SGTs: 110 x 630 x 0.30 = 20790 TCAM lines, which is within the limit of 32000. The actual number would be lower, because server-to-server communication inside a VRF is not enforced in the core switch; these cells can be left at the default in the matrix and require no TCAM entries.
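The before and after figures for the two layers can be compared with the same rough rule. A minimal Python sketch; the ~80 % and ~70 % default-permit populations and the ~30 % default-deny population are the approximate values from the text, not measured data.

```python
def estimate_lines(local_sgts, matrix_sgts, population):
    """Rough estimate: local SGTs x SGTs in the matrix x population fraction."""
    return round(local_sgts * matrix_sgts * population)


# Access layer: Catalyst 3650 (4000 lines), administrative max of 20 local SGTs.
access_permit = estimate_lines(20, 630, 0.80)  # default "permit IP"
access_deny = estimate_lines(20, 630, 0.30)    # default "deny IP"

# Core: Catalyst 9600 (32000 lines), 110 static SGTs.
core_permit = estimate_lines(110, 630, 0.70)   # default "permit IP"
core_deny = estimate_lines(110, 630, 0.30)     # default "deny IP"

print(access_permit, access_deny)  # 10080 3780
print(core_permit, core_deny)      # 48510 20790
```

Only the default-deny figures stay below the 4000 and 32000 line limits.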

E. Conclusion

With careful planning it is possible to use a large number of SGTs in a campus network with an ISP-like structure.
Key learnings:

  • Disable sgt-caching globally
  • “no cts role-based enforcement” on internal links
  • Reuse of access-lists
  • Use specific access-lists only where necessary

Keeping the matrix population low is key to supporting a large number of active SGTs in the network.
Another key element in large-scale SGT deployments is to use SGT enforcement only for servers and IP subnets where switch line-rate forwarding and low delay are necessary.
If line rate is unnecessary, move these subnets behind another security device and classify them all with one SGT that signifies “already inspected by a superior security device”.
When changing from default “permit IP” to default “deny IP”, make sure to study the Cisco white paper “Cisco ISE Trustsec Allow-List Model (Default Deny IP) With SDA”.
In this scenario, keep in mind that network infrastructure systems like DHCP, DNS, DNAC and ISE are outside the SDA fabric, in a part of the network where Trustsec is enforced.
All switches therefore need a manually configured (CLI) local system SGT, plus all role-based access-lists and IP-SGT maps needed for the network to come up while each switch is retrieving its policy from ISE.
The policy from the ISE system takes priority as soon as it is loaded on the switch.
The catch is that the switch starts applying policy before the complete policy has been received, so there is a risk of installing a global “deny” rule if no manual “permit” rule has been configured on the switches.
The manual policy should be a mirror of the policy in the matrix for all infrastructure related access.
As a network evolves the TCAM population will change when SGTs are added or removed or when the matrix is changed.
When operating at the limits or close to the limits of the switch resource capabilities it is necessary to monitor the TCAM usage of all devices and define an administrative policy to ensure that TCAM usage does not exceed hardware limits.

F. References

https://www.cisco.com/c/en/us/support/docs/cloud-systems-management/dna-center/215516-Trustsec-whitelist-model-with-sda.html
Cisco Live presentations: BRKARC-2035.pdf, BRKCRS-2891.pdf, BRKARC-3863.pdf, BRKSEC-3690.pdf
https://www.cisco.com/c/dam/en/us/solutions/collateral/borderless-networks/Trustsec/C07-730151-00_overview_of_Trustsec_og.pdf