Counterfeit Cisco SFPs

Probably not news to anyone, but there are counterfeit Cisco SFPs in the marketplace. It only matters for support purposes – all the SFPs are made to a standard, and it’s just a matter of what the label says past that point. As they say, if it quacks like a duck..

So how can you tell real from fake? Short answer – you can’t easily. If you buy through legit channels, you’re good. If you buy from eBay, there’s a pretty good chance they’re fake. If it’s from AliExpress.. well, I wouldn’t expect real ones.

Going through some old photos, I found this picture from last year from an install job I did – all of these SFPs came directly from Cisco, and they all look different.

Cisco SFP varieties

FAS2240 controller into DS4243/DS4246/DS424x chassis

If you have a venerable old FAS2240 or FAS255x, you can turn it into a disk shelf, by swapping the controllers with embedded IOM6 (IOM6E) for regular IOM3 or IOM6 controllers, and adding it as a shelf to a different controller.

The opposite however is not always true. It is not supported to turn a disk shelf into a FAS2240, but there are instances where it might be required or desirable, and I’ve seen people hit it a couple of times.

The FAS2240 PCM / IOM6E will work in any DS2246. However, while the DS4246 and DS4243 share an enclosure (the DS424), there are two revisions, and the FAS22xx/25xx only work in the newer one of them, which has better cross-midplane ventilation. Placing them in the older version results in a “FASXXX is not a supported platform” message and failure to boot.

The original version, X558A (430-00048) doesn’t support the embedded PCM/IOM6Es, while the X5560 (430-00061) does. If the shelf was shipped new after April 2012, it is probably the X5560. Some earlier shelves may be too.
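As a quick reference, the two chassis revisions above can be put into a small lookup. This is only a sketch of the claim in this post, not an official compatibility matrix, and the function name is made up for illustration.

```python
# Chassis part numbers as quoted in this post. The True/False values simply
# restate the post's claim about embedded PCM/IOM6E support.
CHASSIS_SUPPORTS_EMBEDDED_CONTROLLER = {
    "430-00048": False,  # X558A, original DS424x chassis
    "430-00061": True,   # X5560, revised chassis
}


def can_host_iom6e(part_number):
    """Return True or False for a known chassis, or None if it is not listed."""
    return CHASSIS_SUPPORTS_EMBEDDED_CONTROLLER.get(part_number.strip())


print(can_host_iom6e("430-00061"))
```

A None result just means the part number is not one of the two mentioned here, so check the label and the ship date instead.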

ONTAP – Why and why not to have one LIF per NFS volume

LIFs, or logical interfaces, are the interfaces from the outside world to the storage of a NetApp system. There is a many to one relationship of LIFs to ports. From the early days of Clustered ONTAP, NetApp has given advice to have one LIF per datastore on VMware. There are more general purpose use-cases for this as well.

But it’s not always worth it.

The justification for a 1:1 LIF to volume mapping has been to allow a volume to move between nodes, and to move the LIF to the other node, to avoid indirect access for longer than a few moments.

Indirect access is when IP traffic comes into one node (for example N1), while the volume is on another node (say N2 – but it could be on another HA pair in the same cluster). This means the data is pulled off disk on N2, goes over the cluster interconnect switching network, and then out of N1. This adds front end latency, and increases congestion on the cluster network, which in turn can delay cluster operations.
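To make the definition concrete, here is a minimal sketch that flags indirect access from two simple maps. The volume, LIF and node names are made up, and in practice you would fill the dictionaries from your own inventory; nothing here talks to ONTAP.

```python
def find_indirect_access(volume_home, lif_home, mounts):
    """Return the (volume, lif) pairs whose traffic crosses the cluster interconnect.

    volume_home: dict of volume name -> node currently hosting the volume
    lif_home:    dict of LIF name -> node currently hosting the LIF
    mounts:      iterable of (volume, lif) pairs, i.e. which LIF clients mount through
    """
    return [(vol, lif) for vol, lif in mounts if volume_home[vol] != lif_home[lif]]


# Hypothetical two-node example: ds2 lives on N2 but is mounted via a LIF on N1.
volume_home = {"ds1": "N1", "ds2": "N2"}
lif_home = {"lif_ds1": "N1", "lif_ds2": "N1"}
mounts = [("ds1", "lif_ds1"), ("ds2", "lif_ds2")]

print(find_indirect_access(volume_home, lif_home, mounts))  # [('ds2', 'lif_ds2')]
```

With one LIF per volume, fixing a flagged pair means migrating that one LIF. With a shared LIF, every volume behind it moves too.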

So, it seems like a good idea, right? Ok, if you have three datastores for VMware, for example – there are minimal overheads for having three IPs. But then – if you only have three datastores, how likely are you to move 1/3rd of the VMs from one node to the other? So that’s an argument for not doing it. But with 7 datastores, it’s much more likely to come up, and still, 7 to 10 IPs isn’t too bad. But if you have 50 datastores, it’s probably more than two nodes, so putting them all in place, managing the mapping datastores to LIFs.. there’s a lot of overhead.

Let’s have a look at WHY you might move a volume:

  1. Aggregate full – no more aggregates on original home node
  2. Controller CPU/IO high – balance workloads to another controller
  3. Equipment replacement – Moving off old equipment onto new equipment

In the third case, indirect access is ok, because it is temporary, so there’s no need for additional LIFs for that. For the other two cases, especially for VMware, there’s always the option of doing a storage vMotion to move all the VMs. For non VM workloads, it’s obviously going to be a different scenario – so the decision to weigh up is – how often do you as an admin think you’ll need to move only one or two volumes at a time? There is always an option of unmounting off a LIF on the source node and remounting from an IP on the destination.

So for my money – more than three datastores and fewer than ten, one LIF per datastore is probably fine. For anything else, I’d suggest just one NFS LIF per node (per SVM), and deal with preventing indirect access through other means. But I also don’t think it’s a “hard and fast” rule.

Selective LUN Mapping on ONTAP 8.3

We have a customer with a pretty kick-ass ONTAP environment that we built up last year – dual sites, each with 2x FAS8040 HA pairs in a cluster. This year we added an HA pair of AFF8080s with 48 x 3.84TB SSDs to each site, which included an upgrade to ONTAP 8.3.2.

We’re in the process of migrating from older FAS3270s with ONTAP 8.2 for these guys – we did a bunch of migrations last year, and we started again this year. Depending on application, workloads, etc we have a number of different methods for migration, but we got caught out last week with some LUN migrations.

Turns out there is a new feature in ONTAP 8.3, which is turned on by default for new and migrated LUNs – Selective LUN Mapping. SLM reduces host failover time by only announcing paths from the HA pair hosting the LUN. But it’s only turned on for new LUNs – existing ones still show all 12 paths (2 per node). This is a bit of an odd choice to my thinking – I think it should be optional if the system is already in production.
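The path counts are easy to sanity check. A rough sketch, assuming six nodes per site (two FAS8040 HA pairs plus the AFF8080 pair) and two paths per node as described above; the function name is made up for illustration.

```python
def visible_paths(nodes_in_cluster, paths_per_node, slm_enabled):
    """Estimate how many paths a host sees to a single LUN."""
    # With SLM, only the HA pair hosting the LUN (two nodes) advertises paths.
    # Without it, every node in the cluster does.
    reporting_nodes = min(2, nodes_in_cluster) if slm_enabled else nodes_in_cluster
    return reporting_nodes * paths_per_node


print(visible_paths(6, 2, slm_enabled=False))  # 12
print(visible_paths(6, 2, slm_enabled=True))   # 4
```

So a freshly migrated LUN showing 4 paths next to an old one showing 12 is expected behaviour, not a multipathing fault.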

So our excellent tech working on the project, thinking it was a bug, called NetApp Support – and spent way too long being told to upgrade HUK, DSM and MPIO. Needless to say.. this didn’t work. Kinda disappointing. I’m told there’s a magic phrase you can use – “I feel this call isn’t progressing fast enough, can you please transfer me to the duty manager?”. Has this ever worked for you? Let me know in the comments 😉

What can I do with my old NetApp hardware?

I had a chance today to go through some equipment in my lab pool and try some things I’d been thinking about for a while.

  • Q: If you pull the CF card out of a FAS30xx or FAS31xx system and put it in a PC, does it boot?
  • A: Yes, kind of. It’s a standard FAT16 card, with a standard boot loader on it. However, there is no console, so it just boots up with a flashing cursor, but plug your serial cable into your PC’s serial port and you can interact with it. I tried it in a USB CF reader, and all the kernel boot options refer to IDE devices. With an older system and an IDE to CF header, it might go further, but ONTAP’s boot process has platform checks, so it will probably fail at that point.
LOADER> printenv

Variable Name        Value
-------------------- --------------------------------------------------
CPU_NUM_CORES        2
BOOT_CONSOLE         uart0a
BIOS_VERSION         1.3.0
BIOS_DATE            06/22/2010
SYS_MODEL            Vostro 220 Series
SYS_REV              �P�(
SYS_SERIAL_NUM       C384SK1
MOBO_MODEL           0P301D
MOBO_REV             A02
MOBO_SERIAL_NUM      ..CN7360495H03W1.
CPU_SPEED            3000
CPU_TYPE             Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
savenv               saveenv
ENV_VERSION          1
BIOS_INTERFACE       86A0
LOADER_VERSION       1.6.1
ARCH                 x86_64
BOARDNAME            Eaglelake
  • Q: Can I use a DS14MK2/DS14MK4/EXN2000 with Linux?
  • A: Yes! Plenty of people have done it. For FC devices, there is a problem of 520 byte sectors, but SATA (ATA) devices use 512 byte sectors natively, so no problem. Use a PCI or PCIe FC card like the LPE11002 ($10 on eBay), then install sg3-utils (Ubuntu, check your distro for its name there), and use “sg_format -s 512” on any FC drives to convert them from 520 byte sectors to 512 byte sectors, then use the device like any other.
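If you have a whole shelf of FC drives to convert, it can help to generate the commands rather than type them. This sketch only builds the command line and does not run anything. The device path is made up, and the long options (--format, --size) are my understanding of sg3_utils, so check the sg_format man page on your distro before running it against a real drive.

```python
import shlex


def sg_format_command(device, sector_size=512):
    """Build (but do not run) an sg_format invocation for one drive."""
    if not device.startswith("/dev/"):
        raise ValueError("expected a device path such as /dev/sg3")
    # --size sets the new logical block size; --format requests the actual
    # low-level format (assumed from sg3_utils, verify locally).
    return ["sg_format", "--format", f"--size={sector_size}", device]


# /dev/sg3 is a hypothetical device name; list yours first (e.g. with lsscsi -g).
cmd = sg_format_command("/dev/sg3")
print(shlex.join(cmd))
```

A reformat wipes the drive and can take hours, so double-check each device path before running the output.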

 

  • Q: What about DS4243/DS4246/DS2246 shelves with Linux?
  • A: This one I’m less sure of – but it seems like it should work. I got pretty close. They are just SAS expanders. I have put a NetApp X2065 PCIe SAS HBA into a Linux system, and it is recognised as a PMC8001 SAS HBA. Plugging the shelf in (single attachment) results in the drives being recognised (same 520 byte problem for SAS drives though). I was able to create an LVM PV on a couple of SATA drives, put it into a VG, and then create an LV, but when I tried formatting the LV, it failed when it got to the stage of writing superblocks. It’s probably fixable, but I don’t have the time or need to do so. It is also worth mentioning that the PMC8001 is made for rack mount systems with high airflow – inside a standard PC it gets VERY VERY hot, very quickly.

 

  • Q: What happens if I put a FlashCache (PAM II 512GB) card into a PC?
  • A: Nothing. Linux detects the PCI vendor ID as NetApp, but then doesn’t assign a class, and just says product ID of 774c.

 

  • Q: What if I install Linux on a CF card, then put it into a FAS3170?
  • A: Stay tuned 😉 Standard ubuntu-core won’t fit onto the 1GB supplied CF card. I’m in the process of acquiring a larger one, and I’ll try.

Adding more disks to an ADP NetApp

I have a FrankenFAS2240, made up out of parts from about 5 different systems, totally unsupported. I set it up initially with 12 drives, and then got some more and wanted to grow the ADP setup.

When I put these drives into the enclosure, they showed up as broken by default. The solution to this is from this NetApp Communities post – once the drives are labelwiped and set to spare, they are automatically partitioned.

From there, it’s just a matter of running disk assign for the data partitions, zeroing them, then adding them! Easy!

Re-ordering shelves for NetApp FAS

No-one is perfect. I recently added some shelves to the wrong location of a stack, breaking the design rule of a single speed transition between 3G and 6G shelves, and didn’t find out until after the new disks were added to an existing aggregate. Under normal circumstances, you can officially hot remove disk shelves from a system running ONTAP 8.2.1 or later. Assuming they don’t have data on them, which these ones did.

Fortunately ONTAP doesn’t require symmetric SAS topologies, so I did the following to resolve it:

  1. Aim to recable the IOM B stack
  2. Failover and take node 2 out of service
  3. Disconnect SAS cable from node 1 (yes, node 1) to IOM B
  4. Recable IOM B’s SAS stack
  5. Disconnect node 2’s connection to the IOM A stack
  6. Bring up node 2, failback
  7. At this point, node 1 and node 2 have different, non redundant topologies
  8. Failover node 1 to node 2
  9. Recable IOM A stack
  10. Reconnect redundant connections for IOM A and IOM B to bring node 2 back into MPHA
  11. Failback to node 1

Tada, all done, non-disruptive (system is iSCSI only – CIFS without SMB3 Continuous Availability would result in session disconnects)

I am become death, destroyer of SANs

Most people want their SAN to keep data around, with maximum resiliency. But what do you do at the end of their lives?

From time to time I get called in to do the opposite of what most people care about from SANs – destroying them. ONTAP has built in sanitization options, which perform a combination of overwrites and zeroing of drives, to enable you to securely erase the drives, and with some NetApp models, like the FAS2240 and FAS255x’s, you can convert them into disk shelves.

Sanitizing all the drives in a controller is usually a two step process – you destroy the existing aggregates, create a new basic system on a small aggregate, then run sanitize on the remaining disks, then repeat, erasing the ones used for the root volume while you’re running the first sanitize.

But there’s an easier way – disable cf, offline all the volumes except vol0, take the system down, boot to maintenance mode, destroy all the aggregates, then reassign all the drives to one controller, and create a two disk RAID4 aggregate using the two drives that were the spares from each controller previously – they won’t have had data on them, so usually no need to sanitize. Boot into ONTAP and run through the initial setup wizard (there’s a bit of hand waving here about the exact process, as it differs between 7.x and 8.x), run the sanitize, and you’re done in a single step.

To do a shelf conversion without a sanitize, similar plan – offline volumes, disable cf, boot to maintenance mode, take ownership of all the drives (using disk reassign to reassign them from their partner), then destroy the aggregates, then remove ownership from all drives and shut the system down. Then, swap the PCM/IOM6Es out for real IOMs, and attach as a new shelf. The new system will need to zero the spares before you can use the drives, and it is usually half the speed of doing it from option 4 in the special boot menu (which makes it about 17 hours for 3TB SATA), but the waiting game is all part of systems administration 😉
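For a feel of what those numbers mean in throughput, here is a back-of-envelope sketch. It assumes decimal terabytes, that the 17 hours refers to zeroing from the new system, and that option 4 would therefore take about half that time.

```python
def zeroing_rate_mb_per_s(capacity_tb, hours):
    """Average write rate needed to zero a drive of capacity_tb in the given hours."""
    total_bytes = capacity_tb * 10**12  # decimal TB, as drive vendors count
    return total_bytes / (hours * 3600) / 10**6


print(round(zeroing_rate_mb_per_s(3, 17)))   # 49  (zeroing spares from the new system)
print(round(zeroing_rate_mb_per_s(3, 8.5)))  # 98  (assumed option 4 time, half of 17 hours)
```

Drives zero in parallel, so these are per-drive figures rather than a total for the shelf.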

ONTAP 8.3 – Disk Assignment policy

ONTAP 8.3 further refines how and when disks are automatically assigned. There are now 4 options for the disk auto-assignment policy – bay, shelf, stack and default. For heavy reading, check out the ONTAP 8.3 Physical Storage Management guide.

If “bay” is chosen, disks in odd-numbered bays are assigned to one controller, and disks in even-numbered bays are assigned to the other. If “stack” is chosen, all drives in the same stack are assigned to the same controller, and if “shelf” is chosen, all drives in the same shelf are assigned to the same controller.

Default is an interesting one – on the FAS22xx and FAS25xx, it means “bay”, on everything else, it means “stack”. If you have a single stack on an 8020? Well, you’ll need to manually set the policy to “shelf”.
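The rules above can be modelled in a few lines. This is a sketch of the behaviour as described in this post, not ONTAP’s actual logic. The function names and the disk dictionary layout are made up, and it only works out which disks end up grouped together, not which controller gets which group.

```python
def effective_policy(policy, platform):
    """Resolve 'default' to the policy this post says each platform uses."""
    if policy != "default":
        return policy
    # FAS22xx and FAS25xx default to "bay"; everything else defaults to "stack".
    return "bay" if platform.startswith(("FAS22", "FAS25")) else "stack"


def ownership_group(disk, policy, platform):
    """Return a key; disks that share a key are assigned to the same controller.

    disk is a dict with integer 'stack', 'shelf' and 'bay' fields (illustrative layout).
    """
    resolved = effective_policy(policy, platform)
    if resolved == "bay":
        return ("bay", "odd" if disk["bay"] % 2 else "even")
    if resolved == "shelf":
        return ("shelf", disk["stack"], disk["shelf"])
    if resolved == "stack":
        return ("stack", disk["stack"])
    raise ValueError(f"unknown policy: {policy}")


disks = [
    {"stack": 1, "shelf": 10, "bay": 0},
    {"stack": 1, "shelf": 11, "bay": 0},
]
# Single stack on an 8020: "default" puts both shelves in one group...
print({ownership_group(d, "default", "FAS8020") for d in disks})
# ...while "shelf" splits them into two groups.
print({ownership_group(d, "shelf", "FAS8020") for d in disks})
```

That is the single-stack 8020 problem in miniature: one group means every disk lands on one controller.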

Aggregate level snapshots – turn them off!

A company we deal with had a pretty nasty problem recently. They were doing some major VMware changes, including networking, and were using Storage vMotion to move from one datastore to another, on the same aggregate.

Sounds good, right? Well, there’s a well intentioned, historic feature in NetApp’s Data ONTAP: aggregate snapshots. The problem this company faced was due to aggregate level snapshots – while they are set to auto-delete, the blocks are not immediately freed (but the space looked available). There is a low priority free space reaper process that actually makes the blocks writable again. With the blocks unavailable, the aggregate was essentially full, leading to the usual result of WAFL exhaustion: glacial latency, which reduces the effectiveness of the space freeing process too, making it even worse. And in this case, a whole company going home. Background freeing of blocks is one of those features that makes sense – but as aggregates got bigger and bigger, and over 16TB, whole-volume scan operations have a greater impact.

Normally you make snapshots of volumes, for connected system backups, recovery, etc. Aggregate level ones were a failsafe of last resort for ONTAP. From really early days, NetApp systems have had a battery to keep the NVMEM cache alive, even if the system loses power. When it comes back on, those changes are flushed to disk, and WAFL is once again consistent. The batteries used in the time I’ve dealt with NetApp have allowed for a 72 hour power outage. At times, however, this is not enough. ONTAP will boot, find WAFL is inconsistent, and essentially run an fsck (wafliron). Usually that works. Sometimes, it doesn’t, and then you have a Very Big Problem. Aggregate level snapshots can save you here – the same way a LUN snapshot might save an attached system.

This all changed with the FAS8000 series – these systems have a battery too, but they use it to de-stage the NVMEM to an SSD. This means they can withstand outages longer than 72 hours. Which means there is no need for aggregate snapshots on the FAS8020, FAS8040, FAS8060 and FAS8080EX systems. I’d go so far as to turn them off on all systems. After the problems this company faced, I’ll be doing it for everyone.

There is one downside – aggregate snapshots can save you if you delete the wrong volume. You move any other volumes off the aggregate (yay cDOT), assuming you have another one, then revert the aggregate. This risk is usually addressed with snapvaults/snapmirrors/backup software, but it’s worth remembering. Aggregate snapshots are also used by SyncMirror and MetroCluster, as outlined in this article, but in my market segment, these aren’t major uses.

This KB from NetApp recommends they be turned off for data aggregates, which, like flow control on 10GbE ports, makes you wonder why this isn’t the default setting. So, check your systems folks, and turn off your aggregate snapshots, especially if you have FAS8000 systems.