Setting up an LVM LV quickly..

I’m playing with proxmox right now, and had a need to setup LVM locally on each node. I figure why not script it.. I’ve ended up with this abomination of a script. You probably want to do better, but it’s a start:

set -x;
lvmid=`hostname`_localLVM; 
pvcreate /dev/sda; 
pvs; 
vgcreate vg_$lvmid /dev/sda; 
vgs;
lvcreate -n lv_$lvmid -l 100%FREE vg_$lvmid;
lvs; 
mkdir -p /local/vg_$lvmid-lv_$lvmid; 
mkfs.ext4 /dev/mapper/vg_$lvmid-lv_$lvmid;
echo /dev/mapper/vg_$lvmid-lv_$lvmid /local/vg_$lvmid-lv_$lvmid ext4 defaults 0 0 >> /etc/fstab; 
mount /local/vg_$lvmid-lv_$lvmid; 
df -h

Or on one line..

set -x;lvmid=`hostname`_localLVM; pvcreate /dev/sda; pvs; vgcreate vg_$lvmid /dev/sda; vgs; lvcreate -n lv_$lvmid -l 100%FREE vg_$lvmid;lvs; mkdir -p /local/vg_$lvmid-lv_$lvmid; mkfs.ext4 /dev/mapper/vg_$lvmid-lv_$lvmid;echo /dev/mapper/vg_$lvmid-lv_$lvmid /local/vg_$lvmid-lv_$lvmid ext4 defaults 0 0 >> /etc/fstab; mount /local/vg_$lvmid-lv_$lvmid; df -h
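
If you want to make it slightly less of an abomination, here's an untested sketch of one improvement: take the target disk as an argument instead of hard-coding /dev/sda, and bail out if the disk already has anything on it.

disk=${1:?usage: $0 /dev/sdX}                     # target disk as an argument
if lsblk -no FSTYPE "$disk" | grep -q .; then     # any filesystem/LVM signatures?
  echo "$disk already has something on it, aborting" >&2
  exit 1
fi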

Hope this helps someone in the future.. maybe me!

NetApp H610S Fan speed too high – controlling with IPMI

I was chatting with someone on the NetApp Discord today – they’d just bought a used NetApp H610S aka NAF-1703 aka Quanta D52B-1U, and installed Proxmox on it, but the fans were running too high. This was a WAF issue, and they were looking for a way to calm it down.

We did some digging and found this document on the IPMI commands for a similar platform – but they didn’t work on this one.

Some digging around suggested that they needed to use

ipmitool raw 0x30 0x39 0x00 0x00

instead of

ipmitool raw 0x30 0x39 0x01 0x00

What’s the difference? Who knows, but it worked.
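
For what it's worth, if you're running this against the BMC over the network rather than from the host itself, the same raw bytes go over lanplus (the address and credentials here are placeholders):

ipmitool -I lanplus -H 10.0.0.50 -U admin -P admin raw 0x30 0x39 0x00 0x00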

Hope this helps someone!

7 Mode? In this economy?

I had two discussions about Data ONTAP 7-mode last week, which was a bit of a surprise, since it’s something NetApp has been working to help customers get away from for.. some time now. Eight years of really pushing it, and ten years since NetApp started providing Clustered ONTAP as an option.

You can totally understand it – data has GRAVITY. It’s heavy and hard to move. Those moves and cutovers need to be as seamless or as quick as possible (ideally both). And 7DOT was a platform people had a lot of experience with and understood, and change is difficult.

I’ve been in videos and given countless presentations on how to do 7toC migrations quickly and easily, and I’ve done a LOT of them, either personally or working with customers. But the end result is that some people still haven’t done it, it’s now 2023, and the remaining 7DOT users find themselves in a tough spot.

Last November, Microsoft made some AD changes, which means that to continue using 8.2.5P5 with Active Directory, you need to re-enable RC4 encryption. RC4 is.. not terribly secure, so I wouldn’t do that.

At the beginning of Feb 2023, NetApp stopped supporting the FAS255x and FAS80x0 controllers, which are the last generation to run 8.2.5, the last release of 7DOT, which itself is now in “self support”. Self support means NetApp won’t delete the webpages which help with troubleshooting. But once they’re gone in Jan 2026 (less than three years away.. and it can’t come soon enough), you’ll be stuck with some random university in Wollongong hosting an old copy..

ONTAP does everything 7DOT did except allowing direct FC connections (because it requires NPIV for FC LIF hosting) and providing an FTP server. The first is an easy fix (buy an FC switch.. if you’re still running 7DOT, you’re probably not averse to eBay purchases of infrastructure) and the second is a matter of setting up a small Linux VM somewhere if it’s a really big concern.

The best time to migrate off 7DOT was 2016. The second best time is now.

How to wipe a partitioned ADP NetApp system

With ONTAP 9, there is now an “option 9” in the boot menu that allows you to re-initialise a system to/from ADP, like wipeconfig.

Wiping an HA pair is a three-step process: first, option 9a removes the existing partition information; second, option 9b repartitions and re-initialises the node; and finally, on the node that was halted, boot it and wipe it (option 4) from its own boot menu.

*************************************************
* Advanced Drive Partitioning Boot Menu Options *
*************************************************
(9a) Unpartition disks and remove their ownership information.
(9b) Clean configuration and initialize node with partitioned disks.
(9c) Clean configuration and initialize node with whole disks.
(9d) Reboot the node.
(9e) Return to main boot menu.

The caveat is that one node has to be halted at the LOADER> prompt while you run the first two commands. That should be it!
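
As a rough sketch of the order of operations (node names are made up, and the exact prompts vary a little by release, so check the docs for yours):

LOADER-B>                      # node B sits halted at its LOADER prompt

LOADER-A> boot_ontap menu      # node A: boot to the boot menu
Selection? 9                   # Advanced Drive Partitioning menu
Selection? 9a                  # unpartition disks, remove ownership
Selection? 9b                  # repartition and re-initialise node A

LOADER-B> boot_ontap menu      # once node A is back, boot node B
Selection? 4                   # clean configuration and initialise node B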

Moving your Windows install to an SSD, breaking it, then fixing it.

I’d been putting this off long enough, but yesterday was the day! I was going to move our Windows install to an SSD.

And I did. But you shouldn’t.

If at all possible, do a fresh re-install of Windows on your new SSD, and move your data across. Download the Windows DVD Creator (also makes USB keys), or use the “Make a recovery disk” option in Windows to blatt the installer onto an 8GB USB key, and start fresh.

So why didn’t I do that? I like a challenge, and at this point I’m just being obstinate about not re-installing.

For a short period of time in 2009, my wife worked for a company in Canada that then got bought out by Microsoft. Like, a really short period of time – she “worked” for 3 weeks, then got 4 weeks severance… and her desk.. and her computer, which was a pretty speccy (for 2009) Dell, running Windows 7. It got case swapped, then we swapped the motherboard, then I moved it from a 750GB SATA HDD to a 2TB SATA SSHD (Hybrid HDD.. I wouldn’t recommend them, frankly), and in the process moved from MBR to GPT and BIOS to uEFI, all without re-installing. At this point, we’re in a different country, almost 8 years later; the machine is on its third case, second motherboard, and fourth graphics card, and it’s now running Windows 10, but it still hasn’t been reinstalled.

The first challenge – C:\ was a 750GB partition, and the new SSD was 500GB. Reboot into SysRescCD and use gparted to resize – except for some reason, it couldn’t unmount it. Mess around for a few reboots, and eventually boot with the option to cache everything into memory, and we’re good – resized down to 450GB.

Next challenge – the rest of the source drive isn’t empty – there’s another 750GB scratch partition, as well as two Linux partitions. This means I can’t just copy the entire disk to the new one. But I do need the GPT, EFI boot partition, and Windows partition, and they’re all in the first 500GB. Cue “dd if=/dev/sda of=/dev/sdb bs=4096 count=115500000”. Then load up “gdisk”, delete the entries for partitions that don’t exist, and away we go.
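
If you're reproducing this, the count is just however many 4096-byte blocks it takes to cover everything you need (GPT, EFI partition, and the resized C:). A quick sanity check of the number above, plus how you might work one out for your own layout (the device name is an example):

# 115500000 blocks x 4096 bytes is ~473GB, comfortably covering the GPT,
# the EFI partition, and the C: partition resized down to 450GB
echo $(( 115500000 * 4096 / 1000 / 1000 / 1000 ))GB    # => 473GB

# for your own layout, check where the last partition you need ends (in bytes)
parted /dev/sda unit B print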

And it works.. until I plug the old drive back in (even after deleting the old C: drive and EFI drive with SysRescCD..). Then it stops booting.

At this point, I’m pretty sure the EFI partition and BCD is hosed, so eventually I find this article – http://www.hasper.info/repair-a-destroyed-windows-7-uefi-boot-sector/ – it works for Windows 10, thankfully, and now everything is back working again, and speedy and on an SSD.
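
For the record, in case that link ever dies: the general shape of that repair, from a command prompt on Windows install or recovery media, is to give the EFI system partition a drive letter in diskpart and then rebuild the boot files with bcdboot. The disk and volume numbers below are placeholders for whatever diskpart shows on your machine, and this is the generic technique rather than necessarily the article's exact steps.

diskpart
list disk
select disk 0
list volume
select volume 3
assign letter=S
exit
bcdboot C:\Windows /s S: /f UEFI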

Most people’s Saturday nights don’t involve rewriting partition tables and fixing EFI. Perhaps that mine does is a sign I’m in the right career right now, working for a SAN/NAS vendor..

Data recovery from Apple FileVault / Encrypted Disk Images

I had a message a few weeks ago from a random guy in Italy, who had found a post of mine about rewriting GPT tables on OSX, and wondered if I could help him recover data from an encrypted disk image that had been corrupted when he tried to resize it, and which reported no valid partitions when he mounted it. That was an exciting exercise. Fortunately it’s easy to make a master copy of the image first, in case you screw things up.

First problem: if OSX fails to find any filesystems after attaching an encrypted disk image, it detaches it again. The solution is to attach it from the command line: “hdiutil attach /path/to/file -nomount”

Peering in with gdisk, it was clear there wasn’t a partition there. We then tried recreating the GPT, by using the metadata to create an identical image, and then create the correct partitions at the correct offsets. That didn’t work, unfortunately, but it was fun to try.

Eventually we left the image attached without attempting to mount filesystems, and he was able to use photorec to recover most of the files out of there. So it wasn’t perfect, but it worked in the end – it was a fun challenge, and troubleshooting this over Facebook Messenger added to it.
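
The rough shape of what ended up working, with the filenames, recovery path, and disk number as placeholders rather than his actual setup:

# work on a copy of the image, never the original
cp Encrypted.sparseimage recovery-copy.sparseimage

# attach it without mounting anything; note the /dev/diskN it reports
hdiutil attach recovery-copy.sparseimage -nomount

# point photorec at the attached device and let it carve out what it can
sudo photorec /d ~/recovered /dev/disk3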

If you go to enough effort to find me on Facebook, and you’re halfway to a solution, I’m sometimes up for one.

ONTAP – Why and why not to have one LIF per NFS volume

LIFs, or logical interfaces, are the interfaces from the outside world to the storage of a NetApp system. There is a many-to-one relationship of LIFs to ports. From the early days of Clustered ONTAP, NetApp’s advice has been to have one LIF per datastore on VMware, and there are more general-purpose use cases for this as well.

But it’s not always worth it.

The justification for a 1:1 LIF-to-volume mapping has been to allow a volume to move between nodes and have its LIF move to the new node with it, so that indirect access lasts no longer than a few moments.

Indirect access is when IP traffic comes into one node (for example N1), while the volume is on another node (say N2 – but it could be on another HA pair in another cluster). This means the data is pulled off disk on N2, goes over the cluster interconnect switching network, and then out of N1. This adds front end latency, and increases congestion on the cluster network, which in turn can delay cluster operations.

So, it seems like a good idea, right? If you have three datastores for VMware, for example, the overhead of three extra IPs is minimal. But then, if you only have three datastores, how likely are you to move a third of your VMs from one node to the other? That’s an argument against doing it. With 7 datastores it’s much more likely to come up, and 7 to 10 IPs still isn’t too bad. But if you have 50 datastores, they’re probably spread across more than two nodes, so putting all those LIFs in place and managing the datastore-to-LIF mapping becomes a lot of overhead.

Let’s have a look at WHY you might move a volume:

  1. Aggregate full – no more aggregates on original home node
  2. Controller CPU/IO high – balance workloads to another controller
  3. Equipment replacement – Moving off old equipment onto new equipment

In the third case, indirect access is OK because it’s temporary, so there’s no need for additional LIFs for that. For the other two cases, especially for VMware, there’s always the option of doing a Storage vMotion to move all the VMs. For non-VM workloads it’s obviously going to be a different scenario, so the decision to weigh up is: how often do you, as an admin, think you’ll need to move only one or two volumes at a time? There’s also always the option of unmounting from a LIF on the source node and remounting from an IP on the destination.

So for my money – more than three datastores and less than ten, one LIF per datastore is probably fine. For anything else, I’d suggest just one NFS LIF per node (per SVM), and deal with preventing indirect access through other means. But I also don’t think it’s a “hard and fast” rule.
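
For reference, the pattern that a 1:1 mapping enables looks roughly like this; the SVM, volume, LIF, node, and port names are invented, and the exact syntax shifts a little between ONTAP releases:

# move the datastore's volume to an aggregate on the other node
volume move start -vserver svm1 -volume ds_vol01 -destination-aggregate aggr1_node02

# re-home the datastore's dedicated LIF to follow it, then revert it there
network interface modify -vserver svm1 -lif ds_lif01 -home-node node02 -home-port e0d
network interface revert -vserver svm1 -lif ds_lif01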

Selective LUN Mapping on ONTAP 8.3

We have a customer with a pretty kick-ass ONTAP environment that we built up last year – dual sites, each with 2x FAS8040 HA pairs in a cluster. This year we added an HA pair of AFF8080s with 48 x 3.84TB SSDs to each site, which included an upgrade to ONTAP 8.3.2.

We’re in the process of migrating from older FAS3270s with ONTAP 8.2 for these guys – we did a bunch of migrations last year, and we started again this year. Depending on application, workloads, etc we have a number of different methods for migration, but we got caught out last week with some LUN migrations.

Turns out there is a new feature in ONTAP 8.3, turned on by default for new and migrated LUNs: Selective LUN Mapping. SLM reduces host failover time by only announcing paths from the HA pair hosting the LUN. But it’s only applied to new LUNs – LUNs that existed before the upgrade still show all 12 paths (2 per node). This is a bit of an odd choice to my thinking – I think it should be optional if the system is already in production.

So our excellent tech working on the project, thinking it was a bug, called NetApp Support – and spent way too long being told to upgrade HUK, DSM and MPIO. Needless to say.. this didn’t work. Kinda disappointing. I’m told there’s a magic phrase you can use – “I feel this call isn’t progressing fast enough, can you please transfer me to the duty manager?”. Has this ever worked for you? Let me know in the comments 😉
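
If you hit the same thing, the reporting-nodes commands are the ones to look at. Something like this (the vserver, LUN path, and igroup are placeholders) shows which nodes are advertising paths for a pre-existing LUN and then trims them back to the owning HA pair:

lun mapping show -vserver svm1 -path /vol/vol_db01/lun0 -fields reporting-nodes

lun mapping remove-reporting-nodes -vserver svm1 -path /vol/vol_db01/lun0 -igroup esx_hosts -remote-nodes true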

What can I do with my old NetApp hardware?

I had a chance today to go through some equipment in my lab pool and try some things I’d been thinking about for a while.

  • Q: If you pull the CF card out of a FAS30xx or FAS31xx system and put it in a PC, does it boot?
  • A: Yes, kind of. It’s a standard FAT16 card, with a standard boot loader on it. However, there is no video console, so it just boots up with a flashing cursor – plug a serial cable into your PC’s serial port and you can interact with it. I tried it in a USB CF reader, and all the kernel boot options refer to IDE devices. With an older system and an IDE-to-CF adapter, it might go further, but ONTAP’s boot process has platform checks, so it will probably fail at that point.
LOADER> printenv

Variable Name        Value
-------------------- --------------------------------------------------
CPU_NUM_CORES        2
BOOT_CONSOLE         uart0a
BIOS_VERSION         1.3.0
BIOS_DATE            06/22/2010
SYS_MODEL            Vostro 220 Series
SYS_REV              �P�(
SYS_SERIAL_NUM       C384SK1
MOBO_MODEL           0P301D
MOBO_REV             A02
MOBO_SERIAL_NUM      ..CN7360495H03W1.
CPU_SPEED            3000
CPU_TYPE             Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
savenv               saveenv
ENV_VERSION          1
BIOS_INTERFACE       86A0
LOADER_VERSION       1.6.1
ARCH                 x86_64
BOARDNAME            Eaglelake
  • Q: Can I use a DS14MK2/DS14MK4/EXN2000 with Linux?
  • A: Yes! Plenty of people have done it. For FC drives there is the problem of 520-byte sectors, but SATA (ATA) drives use 512-byte sectors natively, so no problem there. Use a PCI or PCIe FC card like the LPE11002 ($10 on eBay), install sg3-utils (that’s the Ubuntu package name; check your distro for its equivalent), and use sg_format on any FC drives to convert them from 520-byte sectors to 512-byte sectors, roughly as sketched below, then use the drives like any others.
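
A minimal sketch of that reformat step; the device names are examples, and sg_format is destructive and slow, so double-check which /dev/sg device is which before you run it:

apt install sg3-utils lsscsi              # Debian/Ubuntu package names
lsscsi -g                                 # map each disk to its /dev/sg* generic device
sg_format --format --size=512 /dev/sg3    # reformat one FC drive to 512-byte sectors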


  • Q: What about DS4243/DS4246/DS2246 shelves with Linux?
  • A: This one I’m less sure of – but it seems like it should work. I got pretty close. They are just SAS expanders. I have put a NetApp X2065 PCIe SAS HBA into a Linux system, and it is recognised as a PMC8001 SAS HBA. Plugging the shelf in (single attachment) results in the drives being recognised (same 520-byte problem for SAS drives though). I was able to create an LVM PV on a couple of SATA drives, put them into a VG, and then create an LV, but when I tried formatting the LV, it failed when it got to the stage of writing superblocks. It’s probably fixable, but I don’t have the time or need to do so. It is also worth mentioning that the PMC8001 is made for rack-mount systems with high airflow – inside a standard PC it gets VERY VERY hot, very quickly.
  • Update: 2017-08 – I had someone email me about this, and Youtube mysteriously suggested this video on this very topic. After some back and forth, it looks like the trick to getting it working is to pull out the second IOM from the back of the system and single attach it. This may only be needed for SATA drives with the interposer board that makes them talk SAS. I know some people who have got the DS2246 with SAS drives working without having to do this.


  • Q: What happens if I put a FlashCache (PAM II 512GB) card into a PC?
  • A: Nothing. Linux detects the PCI vendor ID as NetApp, but then doesn’t assign a class, and just says product ID of 774c.


  • Q: What if I install Linux on a CF card, then put it into a FAS3170?
  • A: Stay tuned 😉 Standard ubuntu-core won’t fit onto the 1GB supplied CF card. I’m in the process of acquiring a larger one, and I’ll try.

Adding more disks to an ADP NetApp

I have a FrankenFAS2240, made up out of parts from about 5 different systems, totally unsupported. I set it up initially with 12 drives, and then got some more and wanted to grow the ADP setup.

By default, when I put these drives into the enclosure, they showed up as broken. The solution came from this NetApp Communities post – once the drives are labelwiped and set back to spare, they are automatically partitioned.

From there, it’s just a matter of running disk assign for the data partitions, zeroing them, then adding them! Easy!
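
For completeness, the last few steps from the clustershell look roughly like this; the node, aggregate, and disk names are examples, and your partition numbering will differ:

# assign the new disks/partitions to the node that should own them
storage disk assign -disk 1.0.13 -owner frankenfas-01

# zero the newly assigned spares
storage disk zerospares

# grow the data aggregate with the new data partitions
storage aggregate add-disks -aggregate aggr1_node01 -diskcount 6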