ACSv5 – RADIUS Attributes in Authz Responses
Posted by Jim in Tech Notes on December 30, 2011
This is a quick how-to guide on returning RADIUS attributes in ACSv5. After a tweet by Ethan Banks (@ecbanks) it occurred to me that the majority of network engineers out there are used to the ‘old school’ of ACS 4.x. I’ve only really known version 5 – I was actually part of the team who did the initial Beta testing and NAC-RADIUS inter-op testing, so I don’t really have a view of how different it is to version 4. But I found that’s a good thing; when I first started the Beta testing, I was working alongside an ‘old-school’ engineer who’d worked on v4 for a long time, and he was totally confused by v5’s new approach – for me, it all seemed to make sense.
Anyway – this was supposed to be a quick how-to returning RADIUS attributes – firstly, you can’t define a ‘custom’ IETF attribute, else it wouldn’t be an IETF one. But you can use one of the many pre-defined ones available in the Dictionary.
Q. What’s the Dictionary?
A. In ACSv5, any attribute or value that can be sent/received with an authentication or accounting interaction is defined in the Dictionary. A dictionary entry defines the attribute’s name and value type (integer, boolean, string..) as well as any pre-defined values.
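To make the idea of a dictionary entry concrete, here is a minimal Python sketch. It is not how ACS stores anything internally; the class and method names (`DictionaryAttribute`, `resolve`) are made up for illustration. The attribute names and the Service-Type numbers are the standard IETF ones.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DictionaryAttribute:
    """Hypothetical model of one dictionary entry: a name, a value type,
    and (optionally) a set of pre-defined named values."""
    name: str
    value_type: type
    predefined: Optional[Dict[str, int]] = None

    def resolve(self, value):
        """Return the value to send, or raise ValueError if it is not allowed."""
        if self.predefined is not None:
            # Enumerated attribute: only the pre-defined names are valid.
            if value not in self.predefined:
                raise ValueError(f"{value!r} is not a pre-defined value of {self.name}")
            return self.predefined[value]
        # Free-form attribute: only check the value type.
        if not isinstance(value, self.value_type):
            raise ValueError(f"{self.name} expects {self.value_type.__name__}")
        return value


# Service-Type is an integer attribute with pre-defined values.
SERVICE_TYPE = DictionaryAttribute(
    "Service-Type", int, {"Login": 1, "Framed": 2, "Administrative": 6}
)
# Filter-Id is a plain string attribute with no pre-defined values.
FILTER_ID = DictionaryAttribute("Filter-Id", str)

print(SERVICE_TYPE.resolve("Framed"))
print(FILTER_ID.resolve("guest-acl"))
```

The split between the two branches in `resolve` mirrors what you see later in the GUI: pre-defined values are picked from a list, anything else is typed in.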
So – on the left navigation under System Administration -> Configuration -> Dictionaries -> Protocols -> RADIUS -> RADIUS IETF you’ll find all the RADIUS IETF attributes that are available to ACS – either for use as a response, or to validate against in a rule.
If you’re wondering.. RADIUS VSA stands for ‘Vendor Specific Attributes’ – such as Cisco’s AV-PAIR or Microsoft’s CHAP or even Juniper’s ‘Allow commands’.

So how to make use of these..
RADIUS and other attributes are returned to an authenticating device when a rule in the ‘Authorization Policy’ is hit, however in the spirit of ‘object-orientated’ doo-hickies, they are not directly defined here. Instead you define an ‘Authorization Profile’ under Policy-Elements. The idea being, you can build up authorization responses (a set of attributes for example) and use them multiple times in multiple Authorization Policies.
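If it helps to see the ‘define once, reference many times’ idea as code, here is a rough Python sketch. It is only a mental model, not ACS’s implementation: `AuthorizationProfile`, `PolicyRule` and `authorize` are hypothetical names, the group/SSID conditions are invented, and the Tunnel-* attributes are just the usual IETF trio for handing back a VLAN.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class AuthorizationProfile:
    """A named, reusable bundle of RADIUS attributes."""
    name: str
    attributes: Dict[str, Any]


@dataclass
class PolicyRule:
    """One rule in an authorization policy: a condition plus the profile to return."""
    name: str
    condition: Callable[[Dict[str, Any]], bool]
    profile: AuthorizationProfile


def authorize(rules: List[PolicyRule], request: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the attributes of the first rule that is hit, or None if no rule matches."""
    for rule in rules:
        if rule.condition(request):
            return dict(rule.profile.attributes)
    return None


# Defined once, under 'policy elements'...
staff_vlan = AuthorizationProfile(
    "Staff-VLAN",
    {"Tunnel-Type": "VLAN", "Tunnel-Medium-Type": "802", "Tunnel-Private-Group-ID": "20"},
)

# ...and referenced from rules in two different policies.
wired_policy = [
    PolicyRule("wired-staff", lambda r: r.get("group") == "staff", staff_vlan),
]
wireless_policy = [
    PolicyRule(
        "wifi-staff",
        lambda r: r.get("group") == "staff" and r.get("ssid") == "corp",
        staff_vlan,
    ),
]

print(authorize(wired_policy, {"group": "staff"}))
print(authorize(wireless_policy, {"group": "staff", "ssid": "corp"}))
```

Change the VLAN in `staff_vlan` and both policies pick it up, which is the whole point of keeping the attributes out of the rules themselves.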
Confused yet?
Under Policy Elements -> Authorization and Permissions -> Network Access -> Authorization Profiles – create yourself a new profile and go to the ‘RADIUS Attributes’ tab:

Now you can select and ‘Add’ the attributes you want to use in your Auth Profile, remembering to set the value you want.

The value is set in the bottom box. If it’s a custom value, you should be able to enter string text; if it’s predefined or boolean, use the Select button to pick the value it should be.

If you Add an attribute and make a mistake – select it in the list and click ‘Edit’ – this will pull it back down into the boxes.. here you’ll edit it and click ‘Replace’ to apply the changes. (The wording of the buttons isn’t great, I’ll admit).
Name and Submit your new profile and it’ll now be available under ‘Authorization Policies’ for use.

This really was a ‘quick and dirty’ post. I’ll try to write up the whole ACSv5 approach to authentication, authorization and accounting in the next few days.
Cheers- Jim.
Good CoPP, Bad CoPP – Balanced Policing
Posted by Jim in Tech Notes on October 20, 2011
Right – this might be a bit long.. I haven’t yet worked out how to make a ‘short but succinct’ blog post..
I had one of those ‘tada’ or ‘eureka’ or ‘bloody hell, why didn’t that occur to me earlier’ moments this afternoon (it was the latter). You know what it’s like: you’re at the end of your tether trying to get something to work, you’ve been fumbling around for hours, and out of the corner of your brain a little flicker resembling a thought process occurs. You give it a moment to surface, chew on it and then ‘oh hell, I’m an idiot’ as you prove you’ve fixed your own mess of a problem. This happened while staring at a multicast convergence problem today – and it was all due to bad CoPP.
CoPP – or Control Plane Policing – is regarded by Cisco as a security feature/mechanism. It’s designed to protect the switch’s CPU from being overwhelmed by control-plane traffic (whether that traffic is legitimate, accidental or the likes of a DoS attack). The Catalyst 6500 had it – but no-one ever seemed to configure it. In the Nexus 7000 and all new NX-OS based switches, it’s a default configuration (unless you’re a monkey and choose ‘none’ during the startup script).
CoPP is configured using MQC and allows you to define classes of traffic that might head to the CPU and apply a policing policy to each. Usually, the default policies (lenient, moderate or strict) are fine for most network deployments, say 80% of them. For the other 20% you have to tweak the policy a little.
Case 1 Bad CoPP – A common [and in my opinion, stupid] method for a server detecting the loss of a gateway is by using ARP. In the old days, this wasn’t a problem – the 6500 would just churn through the ARPs and spit out responses, perhaps missing one or two. When a customer decided to test the Nexus 7000, they found that their servers kept seeing gateway losses. Turns out, the default policy was being exceeded by the sheer number of servers sending out ARPs. The customer moaned. Of course they would, they think the switch is bad and broken and.. anyways. So to get over the [stupid stupid idea] the CoPP is tuned to allow more ARP up to the CPU. It’s a solution to a problem that’s easy to implement, rather than fixing the fact that your servers don’t need to ARP for the gateway as you have HSRP (but hey, who am I to argue with the customer?).
Case 2 Good CoPP – A customer has a setup where they need a fast multicast-convergence time but are also receiving the same (S,G) streams on two different interfaces. Fast multicast-convergence means we need to register the multicast frames with the RP as quickly as possible, so the RP can learn and then (as we happen to be using PIM Anycast-RP) relay the PIM registers to other RPs. For this, we can increase the policing of PIM protocol messages (the default was 200pps, so we upped it to 600pps). This is fine, we’re just allowing the policy to scale upwards.
The trouble of balancing the CoPP came with the (S,G)s being received on two interfaces. In multicast we can only have one interface being the incoming interface – this incoming interface is determined by the RPF check (reverse path forwarding) and is programmed into hardware. Once programmed, all matching multicast on that interface is forwarded in hardware. The (S,G)s being received at the non-RPF interface would not match a hardware (FIB) entry – and thus would be forwarded up to CPU for software processing. The problem with this is we’re punting useless traffic up to the CPU, wasting CPU resources and preventing that inband bandwidth from being used for other things (such as that fast multicast convergence). When testing this initially, I identified IPMCMISS as the class which this useless, RPF-failing, traffic was hitting in CoPP and trimmed it right back to 10pps. When I went to do another convergence test, I found that convergence was super-slow, even though I had tuned the PIMREG upwards.
What I didn’t realise was that IPMCMISS doesn’t [just] match RPF-failing traffic – it actually matches any multicast traffic that triggers a ‘FIB-miss’ – this was the ‘bloody hell’ moment. Whenever we receive multicast traffic into hardware, and there’s no hardware-programmed FIB entry, it’s a FIB-miss – and this is punted to CPU for processing or software switching. FIB-miss would be triggered the first time we see an (S,G), which is how we get into the process of punting to CPU, PIM learning, inserting into MRIB, generating a PIM Register and programming the hardware. So by cutting away the bandwidth available to IPMCMISS, I was also reducing the chance of new (S,G) frames making it to the CPU for learning.
So to summarise – I now have to work out how to balance policing the useless traffic (and the CPU bandwidth it wastes) against the need to learn new (S,G)s. I would never condone opening CoPP up for something like ARP, it sounds silly to me.
The end.
PS – [Just an updated thought] – It’s worth noting that on the N7000 you can define different class-maps for IPMCMISS and RPF-failing traffic; you can’t yet do this on the N3000, and I have yet to check the N5x00.
class-map type control-plane match-any mc-rpf-fail
  match exception multicast rpf-failure
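To see why a shared class hurts and a separate RPF-failure class helps, here is a toy Python model of one second of FIB-miss traffic. Everything in it is an assumption made for illustration: the policer is a crude ‘first N packets per second’ budget rather than a real token bucket, the traffic mix (1000 punted packets, 10 of them first packets of new (S,G)s) is invented, and the 100pps figure for the split miss class is arbitrary. Only the 10pps limit comes from the story above.

```python
def police(packets, rate_pps):
    """Admit at most rate_pps packets from one second of arrivals, in arrival order."""
    return packets[:rate_pps]


def one_second_of_misses(total=1000, new_flow_every=100):
    """Build one second of punted multicast: mostly RPF failures from the
    duplicate stream, with an occasional first packet of a new (S,G)."""
    packets = []
    for i in range(total):
        if i % new_flow_every == new_flow_every // 2:
            packets.append(("new-sg", f"flow-{i}"))
        else:
            packets.append(("rpf-fail", "dup-stream"))
    return packets


arrivals = one_second_of_misses()

# Case 1: one shared class policed at 10pps. The RPF failures use up
# the budget before any new (S,G) packet arrives.
shared = police(arrivals, rate_pps=10)
learned_shared = [p for p in shared if p[0] == "new-sg"]

# Case 2: RPF failures get their own class at 10pps, and the remaining
# FIB-miss traffic gets a separate (assumed) 100pps budget.
rpf_fail = [p for p in arrivals if p[0] == "rpf-fail"]
fib_miss = [p for p in arrivals if p[0] == "new-sg"]
split = police(rpf_fail, rate_pps=10) + police(fib_miss, rate_pps=100)
learned_split = [p for p in split if p[0] == "new-sg"]

print("new (S,G)s reaching the CPU, shared class:", len(learned_shared))
print("new (S,G)s reaching the CPU, split classes:", len(learned_split))
```

In this made-up mix the shared class lets none of the new (S,G)s through, while the split classes let all ten through and still cap the useless traffic at 10pps. Real numbers will depend on the platform and on how the traffic is interleaved.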
VM-FEX and VXLAN
Posted by Jim in DC, Tech Notes, Virtualisation on October 13, 2011
So yesterday I had a chance to read up on both VXLAN (Virtual eXtensible LAN) and VM-FEX, as well as having a good discussion with Greg Ferro (@etherealmind) about VXLAN and he introduced me to the concept of OpenFlow.
My source for VM-FEX was a whitepaper by Cisco on ‘Unify Virtual and Physical Networking with Cisco Virtual Interface Card’ – which made things pretty easy to understand. The short story is: we attach vNICs to virtual machines using VMware’s DirectPath – the VM sees a NIC as normal, in vCenter it sees a vNIC, in reality it’s a hardware-based NIC emulation on the VIC. Instead of having a virtual switch on the host we do PTS (Pass Thru Switching) and the vNIC is bound to a VIF (Virtual InterFace) further up the path on a real physical switch. That VIF is presented just like a normal switchport from a configuration point of view.. ie, it looks the same as a switchport on the end of a FEX (Fabric Extender.. aka 2232, 2148 etc). This VM-FEX vNIC supports vMotion in vSphere 5 by doing some fancy stuff around the NIC registers and state information that’s on the VIC. Now, this VM-FEX technology only currently works inside UCS – so we’ll have to wait to see how/if it can be implemented outside of that.
Now to VXLAN.. When I read Coding Relic’s write-up on how VXLAN works (there’s three pages, but they’re all good), I couldn’t help thinking “This is OTV”. In fact, even after a quick discussion with Greg about the matter – I still think it’s OTV. The only difference is, we’re not terminating Layer-2 on a switch somewhere, we’re terminating it directly on the host machine. So now, these VXLANs only exist on the hosts – they don’t exist on the underlying infrastructure – which got me to thinking about how this scales. In a normal vSwitch/dvSwitch/1000v environment, the virtual-switch on the host only needs to learn the MAC addresses of the directly connected VMs – everything else is northbound on the physical infrastructure, so there’s only one way to send it (ignoring all the stuff about mac-pinning and port-channeling, blah). Now, we have VXLANs and only the hosts know what’s on that VXLAN – so essentially, the host now needs to have a bunch of MAC lookup tables (much like TCAM in a physical switch). Using similar control-plane methods as OTV, it learns the MAC addresses of VMs on other hosts via multicasts and then stores that information locally. The whole point of VXLAN is to break out of the 4096-VLAN limit and allow easy multi-tenancy – but how much overhead does learning all the MACs of all the VMs on all these new VXLANs add to the host itself? Of course, the obvious bad points around VXLAN are the lack of visibility of traffic on the underlying infrastructure, the fact that policy enforcement has to take place on the virtual-switch, and the added layer of troubleshooting.
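For a feel of the numbers, here is a small Python sketch. The segment arithmetic rests on the 12-bit VLAN ID and the 24-bit VXLAN segment ID (VNI). The `HostMacTable` class is purely a hypothetical model of the per-host state, not how any real virtual switch stores it, and the segment count, VM count, MAC addresses and host IPs are all invented.

```python
VLAN_ID_BITS = 12
VXLAN_VNI_BITS = 24

vlan_segments = 2 ** VLAN_ID_BITS      # 4096
vxlan_segments = 2 ** VXLAN_VNI_BITS   # 16777216
print(vlan_segments, vxlan_segments)


class HostMacTable:
    """Toy model of what a host must remember: for each (segment, MAC),
    which remote host the VM lives behind."""

    def __init__(self):
        self._entries = {}

    def learn(self, vni, mac, remote_host):
        self._entries[(vni, mac)] = remote_host

    def lookup(self, vni, mac):
        return self._entries.get((vni, mac))

    def __len__(self):
        return len(self._entries)


# Assumed scenario: this host takes part in 3 segments, each with 200 remote VMs.
table = HostMacTable()
for vni in range(5000, 5003):
    for vm in range(200):
        mac = f"00:50:56:00:{vni % 256:02x}:{vm:02x}"
        table.learn(vni, mac, remote_host=f"10.0.0.{vm % 50 + 1}")

print(len(table))  # 3 segments x 200 VMs = 600 entries
```

The table grows with (segments the host participates in) x (VMs per segment), which is state a plain vSwitch never had to hold for anything beyond its own local VMs.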
OpenFlow is the next topic on my reading list.. I’ve had a quick introduction by Greg but I need to do the reading too.









