What can I do with my old NetApp hardware?

I had a chance today to go through some equipment in my lab pool and try some things I’d been thinking about for a while.

  • Q: If you pull the CF card out of a FAS30xx or FAS31xx system and put it in a PC, does it boot?
  • A: Yes, kind of. It’s a standard FAT16 card, with a standard boot loader on it. However, there is no console, so it just boots up with a flashing cursor, but plug your serial cable into your PC’s serial port and you can interact with it. I tried it in a USB CF reader, and all the kernel boot options refer to IDE devices. With an older system and an IDE to CF header, it might go further, but ONTAP’s boot process has platform checks, so it will probably fail at that point
LOADER> printenv

Variable Name        Value
-------------------- --------------------------------------------------
CPU_NUM_CORES        2
BOOT_CONSOLE         uart0a
BIOS_VERSION         1.3.0
BIOS_DATE            06/22/2010
SYS_MODEL            Vostro 220 Series
SYS_REV              �P�(
SYS_SERIAL_NUM       C384SK1
MOBO_MODEL           0P301D
MOBO_REV             A02
MOBO_SERIAL_NUM      ..CN7360495H03W1.
CPU_SPEED            3000
CPU_TYPE             Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
savenv               saveenv
ENV_VERSION          1
BIOS_INTERFACE       86A0
LOADER_VERSION       1.6.1
ARCH                 x86_64
BOARDNAME            Eaglelake
  • Q: Can I use a DS14MK2/DS14MK4/EXN2000 with Linux?
  • A: Yes! Plenty of people have done it. For FC devices, there is a problem of 520 byte sectors, but for SATA(ATA) devices, the use 512 byte ones natively, so no problem. Use a PCI or PCIe FC card like the LPE11002 ($10 on ebay), then install sg3-utils (ubuntu, check your distro for its name there), and use “sg_format -s 512” on any FC drives to convert them from 520 byte sectors to 512 byte sectors, then use the device like any other.

 

  • Q: What about DS4243/DS4246/DS2246 shelves with Linux?
  • A: This one I’m less sure of – but it seems like it should work. I got pretty close. They are just SAS expanders. I have put a NetApp X2065 PCIe SAS HBA into a Linux system, and it is recognised as a PMC8001 SAS HBA. Plugging the shelf in (single attachment) results in the drives being recognised (same 520 byte problem for SAS drives though). Was able to create a LVM PV on a couple of SATA drives, put it into a VG, and then create an LV, but when I tried formatting the LV, it failed when it got the stage of writing superblocks. It’s probably fixable, but I don’t have the time or need to do so. It is also worth mentioning that the PMC8001 is made for rack mount systems with high airflow – inside a standard PC it gets VERY VERY hot, very quickly.
  • Update: 2017-08 – I had someone email me about this, and Youtube mysteriously suggested this video on this very topic. After some back and forth, it looks like the trick to getting it working is to pull out the second IOM from the back of the system and single attach it. This may only be needed for SATA drives with the interposer board that makes them talk SAS. I know some people who have got the DS2246 with SAS drives working without having to do this.

 

  • Q: What happens if I put a FlashCache (PAM II 512GB) card into a PC?
  • A: Nothing. Linux detects the PCI vendor ID as NetApp, but then doesn’t assign a class, and just says product ID of 774c.

 

  • Q: What if I install Linux on a CF card, then put it into a FAS3170?
  • A: Stay tuned 😉 Standard ubuntu-core won’t fit onto the 1GB supplied CF card. I’m in the process of acquiring a larger one, and I’ll try.

Adding more disks to an ADP NetApp

I have a FrankenFAS2240, made up out of parts from about 5 different systems, totally unsupported. I set it up initially with 12 drives, and then got some more and wanted to grow the ADP setup.

By default, putting these drives into the enclosure, they showed up as broken. The solution to this is from this NetApp Communities post  – once the drives are labelwiped and set to spare, they are automatically partitioned.

From there, it’s just a matter of running disk assign for the data partitions, zeroing them, then adding them! Easy!

Re-ordering shelves for NetApp FAS

No-one is perfect. I recently added some shelves to the wrong location of a stack, breaking the design rule of a single speed transition between 3G and 6G shelves, and didn’t find out until after the new disks were added to an existing aggregate. Under normal circumstances, you can officially hot remove disk shelves from a system running ONTAP 8.2.1 or later. Assuming they don’t have data on them, which these ones did.

Fortunately ONTAP doesn’t require symmetric SAS topologies, so I did the following to resolve it:

  1. Aim to recable the IOM B stack
  2. Failover and take node 2 out of service
  3. Disconnect SAS cable from node 1 (yes, node 1) to IOM B
  4. Recable IOM B’s SAS stack
  5. Disconnect node 2’s connection to the IOM A stack
  6. Bring up node 2, failback
  7. At this point, node 1 and node 2 have different, non redundant topologies
  8. Failover node 1 to node 2
  9. Recable IOM A stack
  10. Reconnect redundant connections for IOM A and IOM B to bring node 2 back into MPHA
  11. Failback to node 1

Tada, all done, non-disruptive (system is iSCSI only – CIFS without SMB3 Continuous Availablity would result in session disconnects)

I am become death, destroyer of SANs

Most people want their SAN to keep data around, with maximum resiliency. But what do you do at the end of their lives?

From time to time I get called in to do the opposite of what most people care about from SANs – destroying them. ONTAP has built in sanitization options, which perform a combination of overwrites and zeroing of drives, to enable you to securely erase the drives, and with some NetApp models, like the FAS2240 and FAS255x’s, you can convert them into disk shelves.

Sanitizing all the drives in a controller is usually a two step process – you destroy the existing aggregates, create a new basic system on a small aggregate, then run sanitize on the remaining disks, then repeat, erasing the ones used for the root volume while you’re running the first sanitize.

But there’s an easier way – disable cf, offline all the volumes except vol0, take the system down, boot to maintenance mode, destroy all the aggregates, then reassign all the drives to one controller, and create a two disk RAID4 aggregate using the two drives that were the spares from each controller previously – they won’t have had data on them, so usually no need to sanitize. Boot into ONTAP and run through the initial setup wizard (there’s a bit of hand waving here about the exact process, as it differs between 7.x and 8.x), run the sanitize, and you’re done in a single step.

To do a shelf conversion without a sanitize, similar plan – offline volumes, disable cf, boot to maintenance mode, take ownership of all the drives (using disk reassign to reassign them from their partner), then destroy the aggregates, then remove ownership from all drives and shut the system down. Then, swap the PCM/IOMEs out for real IOMs, and attach as a new shelf. The new system will need to zero the spares before you can use the drives, and it is usually half the speed of doing it from option 4 in the special boot menu (which makes it about 17 hours for 3TB SATA), but the waiting game is all part of systems administration 😉

ONTAP 8.3 – Disk Assignment policy

ONTAP 8.3 further refines how and when disks are automatically assigned. There are now 4 options for the disk auto-assignment policy – bay, shelf, stack and default. For heavy reading, check out the ONTAP 8.3 Physical Storage Management guide.

If “bay” is chosen, disks in odd-number bays are assigned to the same controller, and disks in even numbered bays are assigned to the same controller. If “stack” is chosen, all drives in the same stack are assigned to the same controller, and if “shelf” is chosen, all drives in the same shelf are assigned to the same controller.

Default is an interesting one – on the FAS22xx and FAS25xx, it means “bay”, on everything else, it means “stack”. If you have a single stack on an 8020? Well, you’ll need to manually set the policy to “shelf”.

Aggregate level snapshots – turn them off!

A company we deal with had a pretty nasty problem recently. They were doing some major VMWare changes, including networking, and were using Storage VMotion to move from one datastore to another, on the same aggregate.

Sounds good, right? Well, there’s a well intentioned, historic, feature in NetApp’s Data ONTAP, of aggregate snapshots. The problem this company faced was due to aggregate level snapshots – while they are set to auto-delete, the blocks are not immediately freed (but the space looked available). There is a low priority free space reaper process that actually makes the blocks writable again. With the blocks unavailable, the aggregate was essentially full, leading to the usual result that WAFL exhaustion leads to, of glacial latency, which reduces the effectiveness of the space freeing process too, making it even worse. And in this case, a whole company going home. Background freeing of blocks is one of those features that makes sense – but as aggregates got bigger and bigger, and over 16TB, there exists a greater impact of whole-volume scan operations.

Normally you make snapshots of volumes, for connected system backups, recovery, etc. Aggregate level ones were a failsafe of last resort for ONTAP. From really early days, NetApp systems have had a battery to keep the NVMEM cache alive, even if the system looses power. When it comes back on, those changes are flushed to disk, and WAFL is once again consistent. The batteries used in the time I’ve dealt with NetApp have allowed for a 72 hour power outage. At times, however, this is not enough. ONTAP will boot, and find WAFL is inconsistent, and run essentially a fsck (wafliron). Usually that works. Sometimes, it doesn’t, and then you have a Very Big Problem. Aggregate level snapshots can save you here – the same way a LUN snapshot might save an attached system.

This all changed with the FAS8000 series – these systems have a battery too, but they use it to de-stage the NVMEM to an SSD. This means they can withstand outages longer than 72 hours. Which means there is no need for aggregate snapshots on the FAS8020, FAS8040, FAS8060 and FAS8080EX systems. I’d go so far as to turn them off on all systems. After the problems this company faced, I’ll be doing it for everyone.

There is one downside – aggregate snapshots can save you if you delete the wrong volume. You move any other volumes off the aggregate (yay cDOT), assuming you have another one, then revert the aggregate. This risk is usually addressed with snapvaults/snapmirrors/backup software, but it’s worth remembering. Aggregate snapshots are also used by syncmirror and Metrocluster, as outlined in this article, but in my market segment, these aren’t major uses.

This KB from NetApp recommends they be turned off for data aggregates, which like flow control on 10GbE ports, makes you wonder why this isn’t the default setting. So, check your systems folks, and turn off your aggregate snapshots, especially if you have FAS8000 systems.

Converting a NetApp FAS22xx or FAS25xx or AFF system to Advanced Drive Partitioning

ONTAP 8.3 is out right now as a release candidate, which means you can install it, but systems aren’t shipping with it. If you’re installing a new one of these entry level systems, consider carefully if it’s the right choice. From my point of view, it is, as you get much better storage efficiency. You should plan to do an upgrade from 8.3RC1 to 8.3 in Feb-Mar 2015, but with Clustered ONTAP, and the right client and share settings, it can be totally non disruptive (even to CIFS). It’s worth noting too, that for now at least, if you need to replace an ADP partitioned drive, you will probably need to call support for assistance (but it’s a free call, and they love to learn new stuff)

If you want to convert straight out of the box, it’s pretty easy:

  1. Download ONTAP 8.3 from support.netapp.com, the installable version, and put it in an http accessible location
  2. Control-C at startup time to get to the boot menu. Choose option 7 – install new software first, and enter the URL of the 8.3RC1 image from your webserver
  3. Once 8.3 is installed on the internal boot device, boot into maintenance mode from the boot menu and unassign all drives from each node – this isn’t covered in NetApp’s current documentation, but is required
  4. Reboot each node, and then choose option 4 – wipeconfig. Once the wipe is finished, the system will use ADP on the internal drives and be ready to setup

Setup of the FAS22xx and FAS25xx systems can be as easy or as complex as you’d like. They come with a USB key to run a GUI setup application, so you never need to touch the serial connection, but I’ve never used it, and the version on the key probably doesn’t support 8.3RC1, so I just do initial node and cluster setup from the CLI. Another great benefit with 8.3 is that there is now a built in OnCommand System Manager on the cluster management – so no need to ensure there’s always a machine with the right version of Java and Flash on.

Theoretically, you might be able to do a conversion to ADP with data in place by relocating all of the data to one node, and unowning and reformatting the other one, then re-joining to the cluster. I haven’t been in a position to try it, but if you have, I’d be interested to know how it went (email me@thisdomain). Some caveats on an ADP conversion – you need to have at least 4 disks per node for ADP – so the 3 current root aggr drives, and one spare. Ideally re-locate your data to the surviving node using vol move, not aggregate relocation. Then once one node is converted, vol move everything to a data aggregate the converted node, then do the same thing to the unconverted one.

Disk slicing gives you a very valid option of a true active/passive configuration, one which was never really possible with even 7 mode. The root and the data partitions do not need to be assigned to the same node. You can assign all the data partitions to one node, while just leaving enough root partitions on the passive node, or go for the traditional active/active of two data aggregates – one per node. There are some pretty big caveats for disk spares and the importance of quick replacement of failed drives, but on a FAS2520, it’s probably worth it.

I have started writing a post on active/active vs active/passive configs a couple of times, but put it off in favour of waiting till ADP was available. The basic thought is that you want a node to be able to takeover for its HA parter without service degradation, so you want to keep each node below 50% utilization, so it would be the same as running all workloads on a single system, but maybe you’ll accept some degradation in favour of getting more use out of the system. You have lots of choices. One thing to consider is processor affinity – with more processor cores, there’s less need to schedule CPU time to volume operations, and running on more processors (ie, both nodes) gives you access to more cores. But on a FAS2520, how many volumes are you likely to have?

Arista VLAN assignment, and MLAGs

I have done a few Arista deployments lately – they’re awesome, cheap, 10GbE switches. The EOS config is very similar to Cisco IOS, but there is one really important difference for my purposes, regarding VLAN assignments.

On a Cisco switch, you could run the following command:

switchport mode trunk
switchport trunk allowed vlan add 123,124,125

Aristas will let you run this command, without error, but it won’t do what you expect. As soon as you set a port to be a trunk, it allows all VLANs on it, without being told. So on an EOS switch, the configuration is:

switchport mode trunk
switchport trunk allowed vlan none
switchport trunk allowed vlan add 123,124,125

The recommended way of configuring your VLANs is to define which “trunk groups” a VLAN is in (under vlan configuration), then assign ports to trunk groups, but this IOS like method also works. You can (and should) verify the 802.1q trunking configuration of a port (or port-channel) by adding “trunk” after it:

show interface Eth7 trunk

Arista has this very concise and well written page on how to setup their virtual chassis MLAG configuration (like Cisco vPC, Brocade Trill, etc). One important key point it doesn’t note clearly at least – the MLAG peer link needs to ONLY have the peer-link VLAN on it, and the peer-link VLAN can’t go to the uplink switches, or you will get a spanning tree shutdown.

 

Clustered ONTAP 8.3 – No more dedicated root aggregate!

O frabjous day! Callooh! Callay! ONTAP 8.3 is out, and with it, the long promised demise of the dedicated root aggregate for lower end systems!

To re-cap – NetApp has always said – have a dedicated root aggregate. But until Clustered ONTAP, that was more of a recommendation, like, say, brush your teeth morning, noon and night. When you only have 24 drives in a system, throwing away 6 of them to boot the thing seems like a silly idea. The lower-end (FAS2xxx) systems represent a very large number of NetApp’s sales by controller count, and for these systems, Clustered ONTAP was not a great move because of it. With 8.3 being Clustered ONTAP only, there had to be a solution to this pretty serious and valid objection, and there is – Advanced Disk Partitioning (ADP).

What is ADP? Basically it’s partitioning drives, and being able to assign partitions to RAID groups and aggregates. Cool, right? Well, yes, mostly. ADP can be used on All-Flash-FAS (AFF), but that is out of scope for this post. There are some important things to be aware of for these lower end systems.

  1. Systems using ADP need an ADP formatted spare, and then non-ADP spares for any other drives
  2. ADP can only be used for internal drives on a FAS2[2,5]xx system
  3. ADP drives can only be part of a RAID group of ADP drives
  4. SSD’s can now be pooled between controllers!

If a system is only using the internal drives, chances are, it is going to be a smaller system, and most of these don’t matter. The issue comes when it is time to add a disk shelf. Consider the following ADP layout system, assuming one data aggregate per controller:

ADP-24-disksADP-24-disks

 

If we were to add a shelf of 24 disks, and split it evenly between controllers, we would need to do some thinking first. We can’t add it to the ADP RG, and we need a non-ADP spare, for each controller. With ADP, and our 42 (18+24) SAS drives (21 per controller), we have used them like this:

  • N1_aggr0
  • N1_aggr1_rg0 – 6 data, 2 parity
  • N1_aggr1_rg1 – 9 data, 2 parity
  • N1 ADP Spare – 1
  • N1 Non ADP Spare – 1
  • N2_aggr0
  • N2_aggr1_rg0 – 6 data, 2 parity
  • N2_aggr1_rg1 – 9 data, 2 parity
  • N2 ADP Spare – 1
  • N2 Non ADP Spare – 1

For a total of:

  • 8 parity
  • 4 spare
  • 30 data

If we didn’t use ADP, we’d be using them like this:

  • N1_aggr0 – 1 root, 2 parity
  • N1_aggr1_rg0 – 15 data, 2 parity
  • N1 Non ADP Spare – 1
  • N2_aggr0 – 1 root, 2 parity
  • N2_aggr1_rg0 – 15 data, 2 parity
  • N2 Non ADP Spare – 1

For a total of:

  • 8 parity
  • 4 spare
  • … annnd 30 data

I toyed with running the numbers on moving the SSD drives to the shelf, meaning we could have larger ADP partitions used in RAID groups, but that still bites you in the end, as you will end up with the same number of RAID groups, but less balanced sizes as more shelves are added.

If we move to 2 shelves – 66 (18+24+24) SAS drives, we could use them like this with ADP:

  • N1_aggr0
  • N1_aggr1_rg0 – 6 data, 2 parity
  • N1_aggr1_rg1 – 9 data, 2 parity
  • N1_aggr1_rg2 – 10 data, 2 parity
  • N1 ADP Spare – 1
  • N1 Non ADP Spare – 1
  • N2_aggr0
  • N2_aggr1_rg0 – 6 data, 2 parity
  • N2_aggr1_rg1 – 9 data, 2 parity
  • N2_aggr1_rg2 – 10 data, 2 parity
  • N2 ADP Spare – 1
  • N2 Non ADP Spare – 1

For a total of:

  • 12 parity
  • 4 spare
  • 50 data

Or this without ADP:

  • N1_aggr0 – 1 root, 2 parity
  • N1_aggr1_rg0 – 15 data, 2 parity
  • N1_aggr1_rg1 – 10 data, 2 parity
  • N1 Non ADP Spare – 1
  • N2_aggr0 – 1 root, 2 parity
  • N2_aggr1_rg0 – 15 data, 2 parity
  • N2_aggr1_rg1 – 10 data, 2 parity
  • N2 Non ADP Spare – 1

For a total of:

  • 12 parity
  • 50 data
  • 2 spare

At 3 shelves, the story changes.. 90 (18+24+24+24) SAS drives, we could use them like this with ADP:

  • N1_aggr0
  • N1_aggr1_rg0 – 6 data, 2 parity
  • N1_aggr1_rg1 – 9 data, 2 parity
  • N1_aggr1_rg2 – 10 data, 2 parity
  • N1_aggr1_rg3 – 10 data, 2 parity
  • N1 ADP Spare – 1
  • N1 Non ADP Spare – 1
  • N2_aggr0
  • N2_aggr1_rg0 – 6 data, 2 parity
  • N2_aggr1_rg1 – 9 data, 2 parity
  • N2_aggr1_rg2 – 10 data, 2 parity
  • N2_aggr1_rg3 – 10 data, 2 parity
  • N2 ADP Spare – 1
  • N2 Non ADP Spare – 1

For a total of:

  • 16 parity
  • 4 spare
  • 70 data

Or this without ADP:

  • N1_aggr0 – 1 root, 2 parity
  • N1_aggr1_rg0 – 19 data, 2 parity
  • N1_aggr1_rg1 – 18 data, 2 parity
  • N1 Non ADP Spare – 1
  • N2_aggr0 – 1 root, 2 parity
  • N2_aggr1_rg0 – 19 data, 2 parity
  • N2_aggr1_rg1 – 18 data, 2 parity
  • N2 Non ADP Spare – 1

For a total of:

  • 12 parity
  • 74 data
  • 2 spare

So, a couple of conclusions:

  1. ADP is good for internal shelf only systems
  2. ADP is neutral for 1 or 2 shelf systems
  3. ADP is bad for 3+ shelf systems
  4. ADP is awesome for Flashpools (not really a conclusion from this post, but trust me on it? 😉

 

As a footnote: savvy readers will notice I’ve got unequally sized RAID groups in some of these configs. With ONTAP 8.3, the Physical Storage Management Guide (page 107) now says:

All RAID groups in an aggregate should have a similar number of disks. The RAID groups do not have to be exactly the same size, but you should avoid having any RAID group that is less than one half the size of other RAID groups in the same aggregate when possible.

This is in comparison to ONTAP 8.2 Physical Storage Management Guide (page 91) which says:

All RAID groups in an aggregate should have the same number of disks. If this is impossible, any RAID group with fewer disks should have only one less disk than the largest RAID group.

 

 

Out-of-band Management ports on NetApp – e0M vs SP vs Serial (and BMC!)

One of the things I’ve seen new (and sometimes existing..) customers to NetApp be most confused about, are the various ways of connecting to the system for management.

Over the years, there have been a couple of different out of band management systems (RLM and BMC are the older systems, SP on the newer ones). This post focuses on systems with Service Processor, or SP, as used in the FAS2200, FAS2500, FAS3100, FAS3200, FAS6100, FAS6200 and FAS8000 families. Lets start by going through the physical ports on the back of the controller. Where the ports are varies slightly by model, but the icons are consistent.

netapp-management-ports

A common question is “ok, so the wrench port is e0M, why doesn’t it just say that?”. The short answer is that it isn’t – although you could be forgiven for making that guess. Even NetApp’s label set for Clustered ONTAP includes an e0M cable label, despite their systems not having a specific port labelled e0M. Let’s look at how the ports connect up, from the point of view of an administrator:

netapp-management-block

 

From this simplified block diagram, you can see how they all relate. The port on the outside of the box actually connects to a switch inside the box, and that has both ONTAP’s e0M and the Service Processor’s IP interface connected to it. It’s almost literally running Ethernet on the motherboard traces (it’s actually something called RMII, not normal 802.3, but close enough). The internal switch is unmanaged, which is why you can’t do VLANs over that port. To clarify some more – the service processor is an independent CPU, with its own RAM, flash and OS running on it. It talks to ONTAP very closely, obviously, and to sensors throughout the system, but it’s separate to the main kernel running on the x86 CPU that runs ONTAP.

On 7-mode systems, e0M is just another interface in ONTAP, but in Clustered ONTAP, it can only be used for management LIFs, not data LIFs (or Cluster LIFs). On the FAS2500 and FAS8000, the wrench port, and therefore e0M, are finally 1G, but on previous systems, it’s only 100M. On 7-mode systems, you have to be careful – you don’t want it on the same subnet as any of your data service IPs, or traffic might go out through it, instead of a 1G or 10G port. To stop this, set “options interface.blocked.mgmt_data_traffic on” for all systems (running ONTAP 8.0.2 or higher), but ideally put it on a different subnet. It’s best practice to have, at the very least, a different OOB subnet to data services.

From our diagram again, if you need to do something like monitor boot/shutdown/reboot during an ONTAP upgrade, you can either connect to the Serial Console or the SP IP – the output is the same. I’ve done lots of remote upgrades this way. Once the system is up, and the SP is configured, there’s almost never a need to use the Serial Console again. The SPs don’t talk to each other, so if one node is online and the other is offline, you can’t use the online node to connect to the offline one.

If you’re the type who like managing your 7-mode NetApp from the command line, you would normally SSH into the e0M IP address, while for Clustered ONTAP, you would normally SSH to the Cluster Management IP. You could go from the SP to the system console, but that will be limited to 9600bps output through the serial connection, and if you’re looking at a lot of text, or pasting a lot of text, that can be limiting. For using GUI applications like OnCommand System Manager, you connect to the e0M IP on 7-mode, and the Cluster Management IP on Clustered ONTAP Systems.

A final question I’ve heard is “what is that USB port for?”. Officially, for regular users, it’s unsupported. Unofficially, you can use it to charge your iPhone while its running in hotspot mode, or to power your Airconsole.

Could this all be made simpler? Well, there are good purposes for all of the different IPs and interfaces you might use, so I’m not 100% convinced it could be. Everything new is complex initially, but once you get a handle on it, it all makes sense. Hope this has helped you!

Edit: 2018-07-11

Since writing this article, we’ve released some new platforms, which enable the USB port while at the boot menu, have faster serial ports, and move from an SP to a BMC. They’re pretty similar, except the BMC doesn’t tap into a serial link to the ONTAP controller – it relays it over an internal network, and it doesn’t share the wrench port with e0M anymore.