Converting a NetApp FAS22xx, FAS25xx, or AFF system to Advanced Drive Partitioning

ONTAP 8.3 is out right now as a release candidate, which means you can install it, but systems aren’t shipping with it. If you’re installing a new one of these entry-level systems, consider carefully whether it’s the right choice. From my point of view it is, as you get much better storage efficiency. You should plan to do an upgrade from 8.3RC1 to 8.3 in Feb–Mar 2015, but with Clustered ONTAP, and the right client and share settings, it can be totally non-disruptive (even to CIFS). It’s worth noting, too, that for now at least, if you need to replace an ADP-partitioned drive, you will probably need to call support for assistance (but it’s a free call, and they love to learn new stuff).

If you want to convert straight out of the box, it’s pretty easy:

  1. Download the installable version of ONTAP 8.3 from support.netapp.com and put it in an HTTP-accessible location
  2. Press Ctrl-C at startup to get to the boot menu. Choose option 7 – install new software first – and enter the URL of the 8.3RC1 image on your web server
  3. Once 8.3 is installed on the internal boot device, boot into maintenance mode from the boot menu and unassign all drives from each node – this isn’t covered in NetApp’s current documentation, but is required (see the sketch after this list)
  4. Reboot each node, and then choose option 4 – wipeconfig. Once the wipe is finished, the system will use ADP on the internal drives and be ready to set up
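
The unassign step in maintenance mode looks roughly like this – from memory, so verify the exact syntax against the documentation for your release:

disk show -a
disk remove_ownership all

disk show -a lists every drive with its current owner, and disk remove_ownership all clears the ownership so the wipe in step 4 can lay the internal drives out with ADP.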

Setup of the FAS22xx and FAS25xx systems can be as easy or as complex as you’d like. They come with a USB key to run a GUI setup application, so you never need to touch the serial connection, but I’ve never used it, and the version on the key probably doesn’t support 8.3RC1, so I just do the initial node and cluster setup from the CLI. Another great benefit of 8.3 is that OnCommand System Manager is now built into the cluster management interface – so there’s no need to keep a machine around with the right versions of Java and Flash installed.
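
From the serial console, the whole thing is just the setup wizards plus the on-box GUI. A minimal sketch (the wizards prompt for all of the real values):

cluster setup

Run that on the first node and answer “create”, then on the second node and answer “join”. Once the cluster management LIF is up, the built-in System Manager is at https://<cluster-mgmt-ip>/ – nothing to install on the client side.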

Theoretically, you might be able to do a conversion to ADP with data in place by relocating all of the data to one node, unowning and reformatting the other, then re-joining it to the cluster. I haven’t been in a position to try it, but if you have, I’d be interested to know how it went (email me@thisdomain). Some caveats on an ADP conversion: you need at least 4 disks per node for ADP – the 3 current root aggregate drives, plus one spare. Ideally, relocate your data to the surviving node using vol move, not aggregate relocation. Then, once one node is converted, vol move everything to a data aggregate on the converted node, and repeat the process on the unconverted one.
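
For the evacuation step, vol move is straightforward. A sketch with made-up SVM, volume, and aggregate names:

volume move start -vserver svm1 -volume vol1 -destination-aggregate n1_aggr1
volume move show

volume move show lets you watch the transfer and cutover; the move is non-disruptive to clients, which is the whole point of preferring it over aggregate relocation here.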

Disk slicing gives you a very valid option of a true active/passive configuration, something that was never really practical even in 7-Mode. The root and data partitions do not need to be assigned to the same node: you can assign all the data partitions to one node, leaving just enough root partitions on the passive node, or go for the traditional active/active layout of two data aggregates – one per node. There are some pretty big caveats around disk spares and the importance of quickly replacing failed drives, but on a FAS2520 it’s probably worth it.
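
For the active/passive option, the data partitions get reassigned to the active node. A hedged sketch of the 8.3 syntax (disk and node names are examples – check the storage disk man pages for your release, and the partition generally needs to be a spare before you can move it):

storage disk show -partition-ownership
storage disk assign -disk 1.0.11 -owner node1 -data true

The first command shows which node owns the root and data partition of each drive; the second moves just the data partition of disk 1.0.11 to node1, leaving the root partition where it is.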

I have started writing a post on active/active vs active/passive configs a couple of times, but put it off in favour of waiting until ADP was available. The basic thought is that you want a node to be able to take over for its HA partner without service degradation, so you want to keep each node below 50% utilization – which is effectively the same as running all the workloads on a single system – but maybe you’ll accept some degradation in favour of getting more use out of the hardware. You have lots of choices. One thing to consider is processor affinity: with more processor cores there’s less need to schedule CPU time between volume operations, and running on more processors (i.e., both nodes) gives you access to more cores. But on a FAS2520, how many volumes are you likely to have?

Arista VLAN assignment, and MLAGs

I have done a few Arista deployments lately – they’re awesome, cheap, 10GbE switches. The EOS config is very similar to Cisco IOS, but there is one really important difference for my purposes: VLAN assignments on trunk ports.

On a Cisco switch, you could run the following commands:

switchport mode trunk
switchport trunk allowed vlan add 123,124,125

Arista switches will let you run these commands without error, but they won’t do what you expect. As soon as you set a port to be a trunk, EOS allows all VLANs on it by default. So on an EOS switch, the configuration is:

switchport mode trunk
switchport trunk allowed vlan none
switchport trunk allowed vlan add 123,124,125

The recommended way of configuring your VLANs is to define which “trunk groups” a VLAN belongs to (under the vlan configuration) and then assign ports to trunk groups (example below), but this IOS-like method also works. You can (and should) verify the 802.1Q trunking configuration of a port (or port-channel) by adding “trunk” to the show command:

show interface Eth7 trunk
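
For completeness, the trunk-group method mentioned above looks roughly like this (VLAN IDs and the group name are just examples):

vlan 123
   trunk group storage
vlan 124
   trunk group storage
interface Ethernet7
   switchport mode trunk
   switchport trunk group storage

A port in a trunk group only carries the VLANs that belong to that group, which makes it much harder to accidentally leak a VLAN onto a trunk.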

Arista has a very concise and well-written page on how to set up their virtual chassis MLAG configuration (similar to Cisco vPC, Brocade TRILL, etc.). One important point it doesn’t call out clearly: the MLAG peer link should carry ONLY the peer-link VLAN, and the peer-link VLAN can’t go to the uplink switches, or you will get a spanning tree shutdown.
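
The peer-link part of that, sketched with example VLAN, port-channel, and address values (check Arista’s MLAG guide for the exact syntax on your EOS version), keeps the peer VLAN in its own trunk group so it never appears on any other trunk:

vlan 4094
   trunk group mlag-peer
interface Port-Channel10
   description MLAG peer link
   switchport mode trunk
   switchport trunk group mlag-peer
interface Vlan4094
   ip address 10.255.255.1/30
no spanning-tree vlan 4094
mlag configuration
   domain-id mlag1
   local-interface Vlan4094
   peer-address 10.255.255.2
   peer-link Port-Channel10

show mlag on both switches should then report the peer as active.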


Clustered ONTAP 8.3 – No more dedicated root aggregate!

O frabjous day! Callooh! Callay! ONTAP 8.3 is out, and with it, the long promised demise of the dedicated root aggregate for lower end systems!

To re-cap: NetApp has always said to have a dedicated root aggregate. But until Clustered ONTAP, that was more of a recommendation – like, say, brushing your teeth morning, noon and night. When you only have 24 drives in a system, throwing away 6 of them just to boot the thing seems like a silly idea. The lower-end (FAS2xxx) systems represent a very large share of NetApp’s sales by controller count, and for these systems Clustered ONTAP was not a great fit because of it. With 8.3 being Clustered ONTAP only, there had to be a solution to this pretty serious and valid objection, and there is – Advanced Drive Partitioning (ADP).

What is ADP? Basically it’s partitioning drives, and being able to assign partitions to RAID groups and aggregates. Cool, right? Well, yes, mostly. ADP can be used on All-Flash-FAS (AFF), but that is out of scope for this post. There are some important things to be aware of for these lower end systems.

  1. Systems using ADP need an ADP-formatted spare, plus non-ADP spares for any other drives
  2. ADP can only be used for the internal drives on a FAS2[2,5]xx system
  3. ADP drives can only be part of a RAID group of other ADP drives
  4. SSDs can now be pooled between controllers (see the sketch below)!
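
On that last point, 8.3’s SSD storage pools split a set of SSDs into allocation units that either controller can use for Flash Pool caching. A rough sketch – pool, aggregate, and disk names here are invented, so check the storage pool man pages before copying it:

storage pool create -storage-pool ssd_pool1 -disk-list 1.0.20,1.0.21,1.0.22,1.0.23
storage pool show
storage aggregate add-disks -aggregate n1_aggr1 -storage-pool ssd_pool1 -allocation-units 2

Each allocation unit can go to either node’s aggregate, so two small controllers can finally share a handful of SSDs instead of each needing its own dedicated set.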

If a system is only using the internal drives, chances are it is a smaller system, and most of these caveats don’t matter. The issue comes when it is time to add a disk shelf. Consider the following ADP layout, assuming one data aggregate per controller:

[Figure: ADP-24-disks]

If we were to add a shelf of 24 disks and split it evenly between the controllers, we would need to do some thinking first. We can’t add the new disks to the ADP RAID groups, and we need a non-ADP spare for each controller. With ADP, and our 42 (18+24) SAS drives (21 per controller), we have used them like this:

  • N1_aggr0
  • N1_aggr1_rg0 – 6 data, 2 parity
  • N1_aggr1_rg1 – 9 data, 2 parity
  • N1 ADP Spare – 1
  • N1 Non ADP Spare – 1
  • N2_aggr0
  • N2_aggr1_rg0 – 6 data, 2 parity
  • N2_aggr1_rg1 – 9 data, 2 parity
  • N2 ADP Spare – 1
  • N2 Non ADP Spare – 1

For a total of:

  • 8 parity
  • 4 spare
  • 30 data

If we didn’t use ADP, we’d be using them like this:

  • N1_aggr0 – 1 root, 2 parity
  • N1_aggr1_rg0 – 15 data, 2 parity
  • N1 Non ADP Spare – 1
  • N2_aggr0 – 1 root, 2 parity
  • N2_aggr1_rg0 – 15 data, 2 parity
  • N2 Non ADP Spare – 1

For a total of:

  • 8 parity
  • 2 root
  • 2 spare
  • … annnd 30 data

I toyed with running the numbers on moving the SSD drives to the shelf, which would let us put more partitioned drives into the ADP RAID groups, but that still bites you in the end: you end up with the same number of RAID groups, just with less balanced sizes as more shelves are added.
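
For reference, bringing the new shelf’s whole disks into the existing data aggregate as a fresh RAID group would look something like this (aggregate name and disk count are illustrative, and the -raidgroup behaviour is worth checking against the man page for your release):

storage aggregate show-spare-disks
storage aggregate add-disks -aggregate n1_aggr1 -diskcount 11 -raidgroup new

show-spare-disks lists partitioned and whole-disk spares separately, which is how you confirm each node still has both an ADP spare and a non-ADP spare after the expansion.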

If we move to 2 shelves – 66 (18+24+24) SAS drives – we could use them like this with ADP:

  • N1_aggr0
  • N1_aggr1_rg0 – 6 data, 2 parity
  • N1_aggr1_rg1 – 9 data, 2 parity
  • N1_aggr1_rg2 – 10 data, 2 parity
  • N1 ADP Spare – 1
  • N1 Non ADP Spare – 1
  • N2_aggr0
  • N2_aggr1_rg0 – 6 data, 2 parity
  • N2_aggr1_rg1 – 9 data, 2 parity
  • N2_aggr1_rg2 – 10 data, 2 parity
  • N2 ADP Spare – 1
  • N2 Non ADP Spare – 1

For a total of:

  • 12 parity
  • 4 spare
  • 50 data

Or this without ADP:

  • N1_aggr0 – 1 root, 2 parity
  • N1_aggr1_rg0 – 15 data, 2 parity
  • N1_aggr1_rg1 – 10 data, 2 parity
  • N1 Non ADP Spare – 1
  • N2_aggr0 – 1 root, 2 parity
  • N2_aggr1_rg0 – 15 data, 2 parity
  • N2_aggr1_rg1 – 10 data, 2 parity
  • N2 Non ADP Spare – 1

For a total of:

  • 12 parity
  • 2 root
  • 2 spare
  • 50 data

At 3 shelves, the story changes. With 90 (18+24+24+24) SAS drives, we could use them like this with ADP:

  • N1_aggr0
  • N1_aggr1_rg0 – 6 data, 2 parity
  • N1_aggr1_rg1 – 9 data, 2 parity
  • N1_aggr1_rg2 – 10 data, 2 parity
  • N1_aggr1_rg3 – 10 data, 2 parity
  • N1 ADP Spare – 1
  • N1 Non ADP Spare – 1
  • N2_aggr0
  • N2_aggr1_rg0 – 6 data, 2 parity
  • N2_aggr1_rg1 – 9 data, 2 parity
  • N2_aggr1_rg2 – 10 data, 2 parity
  • N2_aggr1_rg3 – 10 data, 2 parity
  • N2 ADP Spare – 1
  • N2 Non ADP Spare – 1

For a total of:

  • 16 parity
  • 4 spare
  • 70 data

Or this without ADP:

  • N1_aggr0 – 1 root, 2 parity
  • N1_aggr1_rg0 – 19 data, 2 parity
  • N1_aggr1_rg1 – 18 data, 2 parity
  • N1 Non ADP Spare – 1
  • N2_aggr0 – 1 root, 2 parity
  • N2_aggr1_rg0 – 19 data, 2 parity
  • N2_aggr1_rg1 – 18 data, 2 parity
  • N2 Non ADP Spare – 1

For a total of:

  • 12 parity
  • 2 root
  • 2 spare
  • 74 data

So, a couple of conclusions:

  1. ADP is good for internal shelf only systems
  2. ADP is neutral for 1 or 2 shelf systems
  3. ADP is bad for 3+ shelf systems
  4. ADP is awesome for Flash Pools (not really a conclusion from this post, but trust me on it 😉)


As a footnote: savvy readers will notice I’ve got unequally sized RAID groups in some of these configs. With ONTAP 8.3, the Physical Storage Management Guide (page 107) now says:

All RAID groups in an aggregate should have a similar number of disks. The RAID groups do not have to be exactly the same size, but you should avoid having any RAID group that is less than one half the size of other RAID groups in the same aggregate when possible.

This is in comparison to the ONTAP 8.2 Physical Storage Management Guide (page 91), which says:

All RAID groups in an aggregate should have the same number of disks. If this is impossible, any RAID group with fewer disks should have only one less disk than the largest RAID group.